Perl, Martin Lewis, 1927-, American physicist, b. New York City, Ph.D. Columbia, 1955. He was a professor at the Univ. of Michigan from 1955 to 1963, when he accepted a position at Stanford; he retired as professor emeritus there in 2004. Perl and Frederick Reines were jointly awarded the 1995 Nobel Prize in Physics for pioneering experimental contributions to lepton physics. Perl discovered (1975) the tau lepton, a subatomic particle that is similar to the electron but 3,500 times heavier and far less stable. Until Perl's discovery, there were only two known families of elementary building blocks in the standard model of particle physics; the tau lepton turned out to be the first-discovered member of a third family.
Larry Wall began work on Perl in 1987, while working as a programmer at Unisys, and released version 1.0 to the comp.sources.misc newsgroup on December 18, 1987. The language expanded rapidly over the next few years. Perl 2, released in 1988, featured a better regular expression engine. Perl 3, released in 1989, added support for binary data streams.
Originally, the only documentation for Perl was a single (increasingly lengthy) man page. In 1991, Programming Perl (known to many Perl programmers as the "Camel Book") was published, and became the de facto reference for the language. At the same time, the Perl version number was bumped to 4, not to mark a major change in the language, but to identify the version that was documented by the book.
Perl 4 went through a series of maintenance releases, culminating in Perl 4.036 in 1993. At that point, Wall abandoned Perl 4 to begin work on Perl 5.
Initial design of Perl 5 continued into 1994. The perl5-porters mailing list was established in May 1994 to coordinate work on porting Perl 5 to different platforms. It remains the primary forum for development, maintenance, and porting of Perl 5.
Perl 5 was released on October 17, 1994. It was a nearly complete rewrite of the interpreter, and added many new features to the language, including objects, references, lexical (my) variables, and modules. Importantly, modules provided a mechanism for extending the language without modifying the interpreter. This allowed the core interpreter to stabilize, even as it enabled ordinary Perl programmers to add new language features.
As of 2008, Perl 5 is still being actively maintained. Important features and some essential new language constructs have been added along the way, including Unicode support, threads, improved support for object-oriented programming, and many other enhancements.
On December 18, 2007, the 20th anniversary of Perl 1.0, Perl 5.10.0 was released. Perl 5.10.0 includes notable new features that bring it closer to Perl 6, among them a new switch statement (called "given/when"), regular expression updates, the "smart match operator" ~~, and more.
One of the most important events in Perl 5 history took place outside of the language proper, and was a consequence of its module support. On October 26, 1995, the Comprehensive Perl Archive Network (CPAN) was established as a repository for Perl modules and Perl itself. At the time of writing, it carries over 13,500 modules by over 6,500 authors. CPAN is widely regarded as one of the greatest strengths of Perl in practice.
Perl was originally named "Pearl", after the Parable of the Pearl from the Gospel of Matthew. Larry Wall wanted to give the language a short name with positive connotations; he claims that he considered (and rejected) every three- and four-letter word in the dictionary. He also considered naming it after his wife Gloria. Wall discovered the existing PEARL programming language before Perl's official release and changed the spelling of the name.
The name is normally capitalized (Perl) when referring to the language and uncapitalized (perl) when referring to the interpreter program itself, since Unix-like file systems are case-sensitive. Before the release of the first edition of Programming Perl, it was common to refer to the language as perl; Randal L. Schwartz, however, capitalised the language's name in the book to make it stand out better when typeset. This case distinction was subsequently documented as canonical.
There is contention about the all-caps spelling "PERL", which the documentation declares incorrect and some core community members even consider a sign of outsiders. While the name is occasionally taken as an acronym for Practical Extraction and Report Language (which appears at the top of the documentation), this expansion actually came after the name; several others have been suggested as equally canonical, including Wall's own humorous Pathologically Eclectic Rubbish Lister. Indeed, Wall claims that the name was intended to inspire many different expansions.
The camel symbol
Programming Perl, published by O'Reilly Media, features a picture of a camel on the cover, and is commonly referred to as The Camel Book. This image of a camel has become a general symbol of Perl.
O'Reilly owns the image as a trademark, but claims to use their legal rights only to protect the "integrity and impact of that symbol".
O'Reilly allows non-commercial use of the symbol, and provides Programming Republic of Perl logos and Powered by Perl buttons.
According to Larry Wall, Perl has two slogans. The first is "There's more than one way to do it", commonly known as TIMTOWTDI, and the second is "Easy things should be easy and hard things should be possible".
Perl also takes features from shell programming. All variables are marked with leading sigils, which unambiguously identify the data type (scalar, array, hash, etc.) of the variable in context. Importantly, sigils allow variables to be interpolated directly into strings. Perl has many built-in functions which provide tools often used in shell programming (though many of these tools are implemented by programs external to the shell) like sorting, and calling on system facilities.
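A minimal sketch of sigils and string interpolation as described above. The variable names and data are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical example data; the names are illustrative only.
my $name  = 'camel';                   # $ marks a scalar
my @humps = (1, 2);                    # @ marks an array
my %legs  = (camel => 4, bird => 2);   # % marks a hash

# Because of the sigils, variables can be interpolated directly
# into a double-quoted string.
my $line = "The $name has $legs{$name} legs and humps numbered @humps.";
print "$line\n";

# sort is one of the built-in functions that covers a typical shell task.
my @sorted = sort { $a <=> $b } (3, 1, 2);
print "@sorted\n";
```

An array interpolated into a string has its elements joined by single spaces by default, so the first line printed ends in "humps numbered 1 2."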
In Perl 5, features were added that support complex data structures, first-class functions (i.e., closures as values), and an object-oriented programming model. These include references, packages, class-based method dispatch, and lexically scoped variables, along with compiler directives (for example, the strict pragma). A major additional feature introduced with Perl 5 was the ability to package code as reusable modules. Larry Wall later stated that "The whole intent of Perl 5's module system was to encourage the growth of Perl culture rather than the Perl core."
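The Perl 5 features listed above might be sketched as follows. The data, the make_counter subroutine, and the Counter package are hypothetical names invented for this example.

```perl
use strict;   # the strict pragma, a compiler directive
use warnings;

# A reference allows nested data: a hash whose value is an array reference.
my %release = (major => [1, 2, 3, 4, 5]);   # illustrative data
my $latest  = $release{major}[-1];          # last element of the inner array

# A closure: the anonymous sub keeps access to the lexical $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}
my $next = make_counter(10);
my @seen = ($next->(), $next->(), $next->());

# A package serves as a class; bless ties a reference to it so that
# method calls are dispatched by class.
package Counter;
sub new {
    my ($class, %args) = @_;
    return bless { value => $args{start} || 0 }, $class;
}
sub value {
    my ($self) = @_;
    return $self->{value};
}

package main;
my $obj = Counter->new(start => 7);
print "$latest @seen ", $obj->value, "\n";   # prints "5 10 11 12 7"
```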
All versions of Perl do automatic data typing and memory management. The interpreter knows the type and storage requirements of every data object in the program; it allocates and frees storage for them as necessary using reference counting (so it cannot deallocate circular data structures without manual intervention). Legal type conversions—for example, conversions from number to string—are done automatically at run time; illegal type conversions are fatal errors.
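A short sketch of both points: automatic conversion between numbers and strings, and a circular structure that reference counting alone would not free. Using weaken from the core Scalar::Util module is one possible form of the "manual intervention" mentioned above, chosen here as an assumption rather than taken from the text. The data is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Number/string conversions happen automatically at run time.
my $sum  = "10" + 5;         # string used as a number
my $text = 10 . " apples";   # number used as a string

# Two hashes that point at each other form a cycle, so neither
# reference count would ever drop to zero by itself.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Weakening one link means it no longer counts as a reference,
# which lets the pair be freed once nothing else refers to it.
weaken($child->{parent});

print "$sum, $text\n";   # prints "15, 10 apples"
```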
The design of Perl can be understood as a response to three broad trends in the computer industry: falling hardware costs, rising labor costs, and improvements in compiler technology. Many earlier computer languages, such as Fortran and C, were designed to make efficient use of expensive computer hardware. In contrast, Perl is designed to make efficient use of expensive computer programmers.
Perl has many features that ease the programmer's task at the expense of greater CPU and memory requirements. These include automatic memory management; dynamic typing; strings, lists, and hashes; regular expressions; introspection and an eval() function.
Wall was trained as a linguist, and the design of Perl is very much informed by linguistic principles. Examples include Huffman coding (common constructions should be short), good end-weighting (the important information should come first), and a large collection of language primitives. Perl favors language constructs that are concise and natural for humans to read and write, even where they complicate the Perl interpreter.
Perl syntax reflects the idea that "things that are different should look different". For example, scalars, arrays, and hashes have different leading sigils. Array indices and hash keys use different kinds of braces. Strings and regular expressions have different standard delimiters. This approach can be contrasted with languages like Lisp, where the same S-expression construct and basic syntax is used for many different purposes.
Perl does not enforce any particular programming paradigm (procedural, object-oriented, functional, etc.) or even require the programmer to choose among them.
There is a broad practical bent to both the Perl language and the community and culture that surround it. The preface to Programming Perl begins, "Perl is a language for getting your job done." One consequence of this is that Perl is not a tidy language. It includes many features, tolerates exceptions to its rules, and employs heuristics to resolve syntactical ambiguities. Because of the forgiving nature of the compiler, bugs can sometimes be hard to find. Discussing the variant behaviour of built-in functions in list and scalar contexts, the perlfunc(1) manual page says "In general, they do what you want, unless you want consistency."
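The variant behaviour in list and scalar contexts can be illustrated with a small sketch. The array contents are hypothetical; localtime is a built-in function.

```perl
use strict;
use warnings;

my @langs = ('awk', 'sed', 'sh');   # illustrative data

# The same array yields different things depending on context.
my ($first) = @langs;   # list context: the elements themselves
my $count   = @langs;   # scalar context: the number of elements

# The built-in localtime also changes behaviour with context.
my @fields = localtime(0);   # list context: nine broken-down time fields
my $stamp  = localtime(0);   # scalar context: one human-readable string

print "first=$first count=$count fields=", scalar(@fields), " stamp=$stamp\n";
```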
Perl has several mottos that convey aspects of its design and use. One is "There's more than one way to do it." (TIMTOWTDI, usually pronounced 'Tim Toady'). Others are "Perl: the Swiss Army Chainsaw of Programming Languages" and "No unnecessary limits". A stated design goal of Perl is to make easy tasks easy and difficult tasks possible. Perl has also been called "The Duct Tape of the Internet".
There is no written specification or standard for the Perl language, and no plans to create one for the current version of Perl. There has only been one implementation of the interpreter. That interpreter, together with its functional tests, stands as a de facto specification of the language.
Perl has many and varied applications, compounded by the availability of many standard and third-party modules.
Perl is often used as a glue language, tying together systems and interfaces that were not specifically designed to interoperate, and for "data munging", i.e., converting or processing large amounts of data for tasks like creating reports. In fact, these strengths are intimately linked. The combination makes Perl a popular all-purpose tool for system administrators, particularly as short programs can be entered and run on a single command line.
With a degree of care, Perl code can be made portable across Windows and Unix. Portable Perl code is often used by suppliers of software (both COTS and bespoke) to simplify packaging and maintenance of software build and deployment scripts.
Graphical user interfaces (GUIs) may be developed using Perl. In particular, Perl/Tk is commonly used to enable user interaction with Perl scripts. Such interaction may be synchronous or asynchronous, using callbacks to update the GUI. For more information about the technologies involved, see Tk, Tcl, and WxPerl.
Perl is also widely used in finance and bioinformatics, where it is valued for rapid application development and deployment, and the ability to handle large data sets.
Perl is implemented as a core interpreter, written in C, together with a large collection of modules, written in Perl and C. The source distribution is, as of 2005, 12 MB when packaged in a tar file and compressed. The interpreter is 150,000 lines of C code and compiles to a 1 MB executable on typical machine architectures. Alternatively, the interpreter can be compiled to a link library and embedded in other programs. There are nearly 500 modules in the distribution, comprising 200,000 lines of Perl and an additional 350,000 lines of C code. (Much of the C code in the modules consists of character encoding tables.)
The interpreter has an object-oriented architecture. All of the elements of the Perl language—scalars, arrays, hashes, coderefs, file handles—are represented in the interpreter by C structs. Operations on these structs are defined by a large collection of macros, typedefs and functions; these constitute the Perl C API. The Perl API can be bewildering to the uninitiated, but its entry points follow a consistent naming scheme, which provides guidance to those who use it.
The execution of a Perl program divides broadly into two phases: compile time and run time. At compile time, the interpreter parses the program text into a syntax tree. At run time, it executes the program by walking the tree. The text is parsed only once, and the syntax tree is subject to optimization before it is executed, so the execution phase is relatively efficient. Compile-time optimizations on the syntax tree include constant folding and context propagation; peephole optimization is also performed. However, the compile-time and run-time phases may nest: BEGIN code blocks execute at compile time, while the eval function initiates compilation at run time. Both operations are an implicit part of a number of others; most notably, the use clause that loads libraries, known in Perl as modules, implies a BEGIN block.
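The nesting of the two phases can be observed with a small sketch. The @order array is a hypothetical name used only to record what runs when.

```perl
use strict;
use warnings;

our @order;   # records the order in which the pieces actually run

push @order, 'run time';

# Although it appears later in the file, this block runs first:
# a BEGIN block executes as soon as it has been compiled.
BEGIN { push @order, 'BEGIN block' }

# A string eval compiles and runs new code while the program is running.
my $answer = eval '6 * 7';
push @order, "eval gave $answer";

print join(' -> ', @order), "\n";
# prints "BEGIN block -> run time -> eval gave 42"
```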
Perl has a context-sensitive grammar which can be affected by code executed during an intermittent run-time phase. Therefore Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language. It is said that "only perl can parse Perl", meaning that only the Perl interpreter (perl) can parse the Perl language (Perl). The truth of this is attested to by the persistent imperfections of other programs that undertake to parse Perl, such as source code analyzers and auto-indenters, which have to contend not only with the many ways to express unambiguous syntactic constructs, but also with the fact that Perl cannot be parsed in the general case without executing it. Though successful in creating a Perl parser for document-related purposes, the PPI project determined that parsing Perl code as a document (retaining its integrity) and as executable code simultaneously was, in fact, not possible. Specifically, the author claimed that "parsing Perl suffers from the 'Halting Problem.'"
Perl is distributed with some 120,000 functional tests. These run as part of the normal build process, and extensively exercise the interpreter and its core modules. Perl developers rely on the functional tests to ensure that changes to the interpreter do not introduce bugs; conversely, Perl users who see the interpreter pass its functional tests on their system can have a high degree of confidence that it is working properly.
Maintenance of the Perl interpreter has become increasingly difficult over the years. The code base has been in continuous development since 1994. The code has been optimized for performance at the expense of simplicity, clarity, and strong internal interfaces. New features have been added, yet virtually complete backward compatibility with earlier versions is maintained. The size and complexity of the interpreter is a barrier to developers who wish to work on it.
Perl is free software, and is licensed under both the Artistic License and the GNU General Public License. Distributions are available for most operating systems. It is particularly prevalent on Unix and Unix-like systems, but it has been ported to most modern (and many obsolete) platforms. With only six reported exceptions, Perl can be compiled from source code on all Unix-like, POSIX-compliant or otherwise Unix-compatible platforms. However, this is rarely necessary, as Perl is included in the default installation of many popular operating systems.
Because of unusual changes required for the Mac OS Classic environment, a special port called MacPerl was shipped independently.
The CPAN carries a complete list of supported platforms with links to the distributions available on each.
Users of Microsoft Windows typically install one of the native binary distributions of Perl for Win32, most commonly ActivePerl. Compiling Perl from source code under Windows is possible, but most installations lack the requisite C compiler and build tools. This also makes it hard to install modules from the CPAN, particularly those that are partially written in C.
Users of the ActivePerl binary distribution are therefore dependent on the repackaged modules provided in ActiveState’s module repository, which are precompiled and can be installed with PPM. Limited resources for maintaining this repository have caused various long-standing problems.
To address this and other problems of Perl on the Windows platform, win32.perl.org was launched by Adam Kennedy on behalf of The Perl Foundation in June 2006. This is a community website for "all things Windows and Perl." A major aim of this project is to provide production-quality alternative Perl distributions that include an embedded C compiler and build tools, so as to enable Windows users to install modules directly from the CPAN. The production distribution in the family is known as Strawberry Perl, with research and experimental work done in a related Vanilla Perl distribution.
Another popular way of running Perl under Windows is provided by the Cygwin emulation layer. Cygwin provides a Unix-like environment on Windows, and both perl and cpan are conveniently available as standard pre-compiled packages in the Cygwin setup program. Since Cygwin also includes gcc, compiling Perl from source is also possible.
In Perl, the minimal Hello world program may be written as follows:
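Going by the description that follows (the string Hello, world!, a backslash-escaped newline, and no trailing semicolon), the minimal program would presumably read:

```perl
print "Hello, world!\n"
```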
This prints the string Hello, world! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash).
The canonical form of the program is slightly more verbose:
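A sketch of the canonical form, consistent with the description that follows (a shebang comment on the first line and a terminating semicolon on the second). The interpreter path /usr/bin/perl is an assumption; it varies between systems.

```perl
#!/usr/bin/perl
print "Hello, world!\n";
# The interpreter path on the first line is an assumption; adjust it locally.
```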
The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it’s called the shebang line. This tells Unix-like operating systems where to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. (Note that on Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter. In order to deal with such circumstances, perl detects the shebang line and parses it for switches, so it is not strictly true that the shebang line is ignored by the compiler.)
The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to or moved away from the end of a block or file without having to adjust semicolons.
Version 5.10 of Perl introduces a say function that implicitly appends a newline character to its output, making the minimal "Hello world" program even shorter:
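A sketch of the say version. Note that say is not enabled by default in a script: it has to be switched on, here with the feature pragma, so the file as a whole is only shorter when the feature is enabled some other way (for example with the -E command-line switch in place of -e).

```perl
use feature 'say';   # say must be enabled explicitly in a script
say 'Hello, world!'; # say appends the newline itself
```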
A hash, or associative array, is a map from strings to scalars; the strings are called keys and the scalars are called values.
A file handle is a map to a file, device, or pipe which is open for reading, writing, or both.
A subroutine is a piece of code that may be passed arguments, be executed, and return data.
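As a rough illustration of a hash and a subroutine, the following sketch stores release years mentioned in this article. The names %released and years_between are hypothetical.

```perl
use strict;
use warnings;

# A hash maps string keys to scalar values.
my %released = (
    'Perl 1' => 1987,
    'Perl 4' => 1991,
    'Perl 5' => 1994,
);

$released{'Perl 2'} = 1988;          # add a key/value pair
my $year  = $released{'Perl 5'};     # look up a value by its key
my @names = sort keys %released;     # keys have no fixed order, so sort them

# A subroutine receives its arguments in @_ and returns data.
sub years_between {
    my ($from, $to) = @_;
    return $released{$to} - $released{$from};
}
my $gap = years_between('Perl 1', 'Perl 5');

print "$_ => $released{$_}\n" for @names;
print "gap: $gap years\n";
```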
Most variables are marked by a leading sigil, which identifies the data type being accessed (not the type of the variable itself), except filehandles, which don't have a sigil. The same name may be used for variables of different data types, without conflict.
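A brief sketch of both points: one name shared by variables of different types, and the sigil following the data being accessed. The name x and the values are hypothetical.

```perl
use strict;
use warnings;

# Three distinct variables that merely share the name "x".
my $x = 'scalar';
my @x = ('first', 'second');
my %x = (key => 'value');

# The sigil follows the kind of data being accessed: a single element
# of @x or %x is a scalar, so it is written with $.
my $element = $x[0];     # square brackets: element of the array @x
my $value   = $x{key};   # curly braces: value from the hash %x

print "$x / $element / $value\n";   # prints "scalar / first / value"
```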
File handles and constants need not be uppercase, but it is a common convention because there is no sigil to denote them. Both are global in scope, but file handles are interchangeable with references to file handles, which can be stored in scalars, which in turn permit lexical scoping. Doing so is encouraged in Damian Conway's Perl Best Practices. As a convenience, the open function in Perl 5.6 and newer will autovivify undefined scalars to file handle references.
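A minimal sketch of a file handle held in a lexically scoped scalar. To stay self-contained it opens an in-memory string rather than a file on disk; that choice, and the variable names, are assumptions made for the example.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch needs nothing on disk; normally a
# path string would be passed to open instead of \$data.
my $data = "first line\nsecond line\n";

# $fh starts out undefined; open autovivifies it into a file handle
# reference, which is then subject to ordinary lexical scoping.
open(my $fh, '<', \$data) or die "cannot open: $!";
my @lines = <$fh>;   # list context: read all remaining lines
close($fh);

print scalar(@lines), " lines read\n";   # prints "2 lines read"
```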
Numbers are written in the bare form; strings are enclosed by quotes of various kinds.