Programming Perl

Programming Perl

4.4 7
by Larry Wall, Tom Christiansen, Jon Orwant

Perl is a powerful programming language that has grown in popularity since it first appeared in 1988. The first edition of this book, Programming Perl, hit the shelves in 1990, and was quickly adopted as the undisputed bible of the language. Since then, Perl has grown with the times, and so has this book.Programming Perl is not just a book about Perl.

…  See more details below


Perl is a powerful programming language that has grown in popularity since it first appeared in 1988. The first edition of this book, Programming Perl, hit the shelves in 1990, and was quickly adopted as the undisputed bible of the language. Since then, Perl has grown with the times, and so has this book.Programming Perl is not just a book about Perl. It is also a unique introduction to the language and its culture, as one might expect only from its authors. Larry Wall is the inventor of Perl, and provides a unique perspective on the evolution of Perl and its future direction. Tom Christiansen was one of the first champions of the language, and lives and breathes the complexities of Perl internals as few other mortals do. Jon Orwant is the editor ofThe Perl Journal, which has brought together the Perl community as a common forum for new developments in Perl.Any Perl book can show the syntax of Perl's functions, but only this one is a comprehensive guide to all the nooks and crannies of the language. Any Perl book can explain typeglobs, pseudohashes, and closures, but only this one shows how they really work. Any Perl book can say that my is faster than local, but only this one explains why. Any Perl book can have a title, but only this book is affectionately known by all Perl programmers as "The Camel."This third edition of Programming Perl has been expanded to cover version 5.6 of this maturing language. New topics include threading, the compiler, Unicode, and other new features that have been added since the previous edition.

Read More

Editorial Reviews

Danny Yee
Readers already familiar with Perl will presumably either own Programming Perl or have decided not to buy a copy, since it seems to be the only reference for the language. (It is certainly the standard one.) If you are thinking of learning Perl then you have a choice between using this book, using the companion volume Learning Perl, or hacking it out for yourself with the aid of the fairly comprehensive manual entry. Programming Perl worked fine for me, and it's probably the right way to go for anyone who can already program in C or shell. On the other hand, Learning Perl looks like a good textbook, and its existence makes Perl a suitable first language for those people who want to be able to write general purpose programs for their own use, rather than for commercial, scientific, or esoteric theoretical purposes.

The first two chapters of Programming Perl provide a basic introduction to Perl. The third and fourth are basically reference material, going in some detail through the syntax and semantics of the language and describing all of the functions available. The fifth and sixth chapters contain examples of useful Perl code fragments and real programs. The last chapter contains everything else. The appendices -- a BNF style grammar of Perl and a description of the Perl library functions -- improve the book's value as a reference, while the glossary will be helpful for those without a Unix and C background. Everything is liberally illustrated with examples, with the occasional redundancy doing no harm. The authors' sense of humor is always present, and they certainly don't take themselves too seriously -- Perl is the "Pathologically Eclectic Rubbish Lister," and the three cardinal virtues of a programmer are "laziness, impatience and hubris!"
Electronic Review of Computer Books

The inventor of the Perl programming language, along with several other knowledgeable sorts, update their authoritative introduction to Perl, first published in 1991 and updated in 1996. This heavily revised edition aims to be more accessible than its predecessor to those without a computer science background; details the newest developments in Perl; and is organized into smaller, more navigable sections. Includes an overview, detailed discussion of the guts of the language, technology that can make Perl do more, Perl programming as a human activity, and reference materials. Includes a glossary. Annotation c. Book News, Inc., Portland, OR (

Read More

Product Details

O'Reilly Media, Incorporated
Publication date:
Edition description:
Third Edition
Sales rank:
Product dimensions:
9.24(w) x 7.16(h) x 2.28(d)

Read an Excerpt

Chapter 18: Compiling

If you came here looking for a Perl compiler, you may be surprised to discover that you already have one--your perl program (typically /usr/bin/perl) already contains a Perl compiler. That might not be what you were thinking, and if it wasn't, you may be pleased to know that we do also provide code generators (which some well-meaning folks call "compilers"), and we'll discuss those toward the end of this chapter. But first we want to talk about what we think of as The Compiler. Inevitably there's going to be a certain amount of low-level detail in this chapter that some people will be interested in, and some people will not. If you find that you're not, think of it as an opportunity to practice your speed-reading skills.

Imagine that you're a conductor who's ordered the score for a large orchestral work. When the box of music arrives, you find several dozen booklets, one for each member of the orchestra with just their part in it. But curiously, your master copy with all the parts is missing. Even more curiously, the parts you do have are written out using plain English instead of musical notation. Before you can put together a program for performance, or even give the music to your orchestra to play, you'll first have to translate the prose descriptions into the normal system of notes and bars. Then you'll need to compile the individual parts into one giant score so that you can get an idea of the overall program.

Similarly, when you hand the source code of your Perl script over to perl to execute, it is no more useful to the computer than the English description of the symphony was to the musicians. Before yourprogram can run, Perl needs to compile these English-looking directions into a special symbolic representation. Your program still isn't running, though, because the compiler only compiles. Like the conductor's score, even after your program has been converted to an instruction format suitable for interpretation, it still needs an active agent to interpret those instructions.

The Life Cycle of a Perl Program

You can break up the life cycle of a Perl program into four distinct phases, each with separate stages of its own. The first and the last are the most interesting ones, and the middle two are optional. The stages are depicted in Figure 18.1.

  1. The Compilation Phase

    During phase 1, the compile phase, the Perl compiler converts your program into a data structure called a parse tree. Along with the standard parsing techniques, Perl employs a much more powerful one: it uses BEGIN blocks to guide further compilation. BEGIN blocks are handed off to the interpreter to be run as as soon as they are parsed, which effectively runs them in FIFO order (first in, first out). This includes any use and no declarations; these are really just BEGIN blocks in disguise. Any CHECK, INIT, and END blocks are scheduled by the compiler for delayed execution.

    Lexical declarations are noted, but assignments to them are not executed. All eval BLOCKs, s///e constructs, and noninterpolated regular expressions are compiled here, and constant expressions are pre-evaluated. The compiler is now done, unless it gets called back into service later. At the end of this phase, the interpreter is again called up to execute any scheduled CHECK blocks in LIFO order (last in, first out). The presence or absence of a CHECK block determines whether we next go to phase 2 or skip over to phase 4.

  2. The Code Generation Phase (optional)

    CHECK blocks are installed by code generators, so this optional phase occurs when you explicitly use one of the code generators (described later in "Code Generators"). These convert the compiled (but not yet run) program into either C source code or serialized Perl bytecodes--a sequence of values expressing internal Perl instructions. If you choose to generate C source code, it can eventually produce a file called an executable image in native machine language.[2]

    At this point, your program goes into suspended animation. If you made an executable image, you can go directly to phase 4; otherwise, you need to reconstitute the freeze-dried bytecodes in phase 3.

    • The Parse Tree Reconstruction Phase (optional)

      To reanimate the program, its parse tree must be reconstructed. This phase exists only if code generation occurred and you chose to generate bytecode. Perl must first reconstitute its parse trees from that bytecode sequence before the program can run. Perl does not run directly from the bytecodes; that would be slow.

    • The Execution Phase

      Finally, what you've all been waiting for: running your program. Hence, this is also called the run phase. The interpreter takes the parse tree (which it got either directly from the compiler or indirectly from code generation and subsequent parse tree reconstruction) and executes it. (Or, if you generated an executable image file, it can be run as a standalone program since it contains an embedded Perl interpreter.)

    At the start of this phase, before your main program gets to run, all scheduled INIT blocks are executed in FIFO order. Then your main program is run. The interpreter can call back into the compiler as needed upon encountering an eval STRING, a do FILE or require statement, an s///ee construct, or a pattern match with an interpolated variable that is found to contain a legal code assertion.

    When your main program finishes, any delayed END blocks are finally executed, this time in LIFO order. The very first one seen will execute last, and then you're done. (END blocks are skipped only if you exec or your process is blown away by an uncaught catastrophic error. Ordinary exceptions are not considered catastrophic.

Now we'll discuss these phases in greater detail, and in a different order.

Compiling Your Code

Perl is always in one of two modes of operation: either it is compiling your program, or it is executing it--never both at the same time. Throughout this book, we refer to certain events as happening at compile time, or we say that "the Perl compiler does this and that". At other points, we mention that something else occurs at run time, or that "the Perl interpreter does this and that". Although you can get by with thinking of both the compiler and interpreter as simply "Perl", understanding which of these two roles Perl is playing at any given point is essential to understanding why many things happen as they do. The perl executable implements both roles: first the compiler, then the interpreter. (Other roles are possible, too; perl is also an optimizer and a code generator. Occasionally, it's even a trickster--but all in good fun.)

It's also important to understand the distinction between compile phase and compile time, and between run phase and run time. A typical Perl program gets one compile phase, and then one run phase. A "phase" is a large-scale concept. But compile time and run time are small-scale concepts. A given compile phase does mostly compile-time stuff, but it also does some run-time stuff via BEGIN blocks. A given run phase does mostly run-time stuff, but it can do compile-time stuff through operators like eval STRING.

In the typical course of events, the Perl compiler reads through your entire program source before execution starts. This is when Perl parses the declarations, statements, and expressions to make sure they're syntactically legal. [3] If it finds a syntax error, the compiler attempts to recover from the error so it can report any other errors later in the source. Sometimes this works, and sometimes it doesn't; syntax errors have a noisy tendency to trigger a cascade of false alarms. Perl bails out in frustration after about 10 errors.

In addition to the interpreter that processes the BEGIN blocks, the compiler processes your program with the connivance of three notional agents. The lexer scans for each minimal unit of meaning in your program. These are sometimes called "lexemes", but you'll more often hear them referred to as tokens in texts about programming languages. The lexer is sometimes called a tokener or a scanner, and what it does is sometimes called lexing or tokenizing. The parser then tries to make sense out of groups of these tokens by assembling them into larger constructs, such as expressions and statements, based on the grammar of the Perl language. The optimizer rearranges and reduces these larger groupings into more efficient sequences. It picks its optimizations carefully, not wasting time on marginal optimizations, because the Perl compiler has to be blazing fast when used as a load-and-go compiler.

This doesn't happen in independent stages, but all at once with a lot of cross talk between the agents. The lexer occasionally needs hints from the parser to know which of several possible token types it's looking at. (Oddly, lexical scope is one of the things the lexical analyzer doesn't understand, because that's the other meaning of "lexical".) The optimizer also needs to keep track of what the parser is doing, because some optimizations can't happen until the parse has reached a certain point, like finishing an expression, statement, block, or subroutine.

You may think it odd that the Perl compiler does all these things at once instead of one after another, but it's really just the same messy process you go through to understand natural language on the fly, while you're listening to it or reading it. You don't wait till the end of a chapter to figure out what the first sentence meant. You could think of the following correspondences:

Computer LanguageNatural Language

Assuming the parse goes well, the compiler deems your input a valid story, er, program. If you use the -c switch when running your program, it prints out a "syntax OK" message and exits. Otherwise, the compiler passes the fruits of its efforts on to other agents. These "fruits" come in the form of a parse tree. Each fruit on the tree--or node, as it's called--represents one of Perl's internal opcodes, and the branches on the tree represent that tree's historical growth pattern. Eventually, the nodes will be strung together linearly, one after another, to indicate the execution order in which the run-time system will visit those nodes.

Each opcode is the smallest unit of executable instruction that Perl can think about. You might see an expression like $a = -($b + $c) as one statement, but Perl thinks of it as six separate opcodes. Laid out in a simplified format, the parse tree for that expression would look like Figure 18.2. The numbers represent the visitation order that the Perl run-time system will eventually follow.

Perl isn't a one-pass compiler as some might imagine. (One-pass compilers are great at making things easy for the computer and hard for the programmer.) It's really a multipass, optimizing compiler consisting of at least three different logical passes that are interleaved in practice. Passes 1 and 2 run alternately as the compiler repeatedly scurries up and down the parse tree during its construction; pass 3 happens whenever a subroutine or file is completely parsed. Here are those passes:

Pass 1: Bottom-Up Parsing
During this pass, the parse tree is built up by the yacc(1) parser using the tokens it's fed from the underlying lexer (which could be considered another logical pass in its own right). Bottom-up just means that the parser knows about the leaves of the tree before it knows about its branches and root. It really does figure things out from bottom to top in Figure 18.2, since we drew the root at the top, in the idiosyncratic fashion of computer scientists. (And linguists.)

As each opcode node is constructed, per-opcode sanity checks verify correct semantics, such as the correct number and types of arguments used to call built-in functions. As each subsection of the tree takes shape, the optimizer considers what transformations it can apply to the entire subtree now beneath it. For instance, once it knows that a list of values is being fed to a function that takes a specific number of arguments, it can throw away the opcode that records the number of arguments for functions that take a varying number of arguments. A more important optimization, known as constant folding, is described later in this section.

This pass also constructs the node visitation order used later for execution, which is a really neat trick because the first place to visit is almost never the top node. The compiler makes a temporary loop of opcodes, with the top node pointing to the first opcode to visit. When the top-level opcode is incorporated into something bigger, that loop of opcodes is broken, only to make a bigger loop with the new top node. Eventually the loop is broken for good when the start opcode gets poked into some other structure such as a subroutine descriptor. The subroutine caller can still find that first opcode despite its being way down at the bottom of the tree, as it is in Figure 18.2. There's no need for the interpreter to recurse back down the parse tree to figure out where to start.

Pass 2: Top-Down Optimizer

A person reading a snippet of Perl code (or of English code, for that matter) cannot determine the context without examining the surrounding lexical elements. Sometimes you can't decide what's really going on until you have more information. Don't feel bad, though, because you're not alone: neither can the compiler. In this pass, the compiler descends back down the subtree it's just built to apply local optimizations, the most notable of which is context propagation. The compiler marks subjacent nodes with the appropriate contexts (void, scalar, list, reference, or lvalue) imposed by the current node. Unwanted opcodes are nulled out but not deleted, because it's now too late to reconstruct the execution order. We'll rely on the third pass to remove them from the provisional execution order determined by the first pass.

Pass 3: Peephole Optimizer

Certain units of code have their own storage space in which they keep lexically scoped variables. (Such a space is called a scratchpad in Perl lingo.) These units include eval STRINGs, subroutines, and entire files. More importantly from the standpoint of the optimizer, they each have their own entry point, which means that while we know the execution order from here on, we can't know what happened before, because the construct could have been called from anywhere. So when one of these units is done being parsed, Perl runs a peephole optimizer on that code. Unlike the previous two passes, which walked the branch structure of the parse tree, this pass traverses the code in linear execution order, since this is basically the last opportunity to do so before we cut the opcode list off from the parser. Most optimizations were already performed in the first two passes, but some can't be.

Assorted late-term optimizations happen here, including stitching together the final execution order by skipping over nulled out opcodes, and recognizing when various opcode juxtapositions can be reduced to something simpler. The recognition of chained string concatenations is one important optimization, since you'd really like to avoid copying a string back and forth each time you add a little bit to the end. This pass doesn't just optimize; it also does a great deal of "real" work: trapping barewords, generating warnings on questionable constructs, checking for code unlikely to be reached, resolving pseudohash keys, and looking for subroutines called before their prototypes had been compiled.

Pass 4: Code Generation

This pass is optional; it isn't used in the normal scheme of things. But if any of the three code generators -- B::Bytecode, B::C, and B::CC -- are invoked, the parse tree is accessed one final time. The code generators emit either serialized Perl bytecodes used to reconstruct the parse tree later or literal C code representing the state of the compile-time parse tree.

Generation of C code comes in two different flavors. B::C simply reconstructs the parse tree and runs it using the usual runops() loop that Perl itself uses during execution. B::CC produces a linearized and optimized C equivalent of the run-time code path (which resembles a giant jump table) and executes that instead.

During compilation, Perl optimizes your code in many, many ways. It rearranges code to make it more efficient at execution time. It deletes code that can never be reached during execution, like an if (0) block, or the elsifs and the else in an if (1) block. If you use lexically typed variables declared with my ClassName $var or our ClassName $var, and the ClassName package was set up with the use fields pragma, accesses to constant fields from the underlying pseudohash are typo-checked at compile time and converted into array accesses instead. If you supply the sort operator with a simple enough comparison routine, such as {$a <=> $b} or {$b cmp $a}, this is replaced by a call to compiled C code.

Perl's most dramatic optimization is probably the way it resolves constant expressions as soon as possible. For example, consider the parse tree shown in Figure 18.2. If nodes 1 and 2 had both been literals or constant functions, nodes 1 through 4 would have been replaced by the result of that computation, something like Figure 18.3....

[2] Your original script is an executable file too, but it's not machine language, so we don't call it an image. An image file is called that because it's a verbatim copy of the machine codes your CPU knows how to execute directly.

[3] No, there's no formal syntax diagram like a BNF, but you're welcome to peruse the perly.y file in the Perl source tree, which contains the yacc(1) grammar Perl uses. We recommend that you stay out of the lexer, which has been known to induce eating disorders in lab rats.

Read More

Meet the Author

Larry Wall originally created Perl while a programmer at Unisys. He now works full time guiding the future development of the language as a researcher and developer at O'Reilly & Associates. Larry is known for his idiosyncratic and thought-provoking approach to programming, as well as for his groundbreaking contributions to the culture of free software programming. He is the principal author of the bestselling Programming Perl, known colloquially as "the Camel book."

Tom Christiansen is a freelance consultant specializing in Perl training and writing. After working for several years for TSR Hobbies (of Dungeons and Dragons fame), he set off for college where he spent a year in Spain and five in America, dabbling in music, linguistics, programming, and some half-dozen different spoken languages. Tom finally escaped UW-Madison with B.A.s in Spanish and computer science and an M.S. in computer science. He then spent five years at Convex as a jack-of-all-trades working on everything from system administration to utility and kernel development, with customer support and training thrown in for good measure. Tom also served two terms on the USENIX Association Board of directors. With over fifteen years' experience in UNIX system administration and programming, Tom presents seminars internationally. Living in the foothills above Boulder, Colorado, surrounded by mule deer, skunks, and the occasional mountain lion and black bear, Tom takes summers off for hiking, hacking, birding, music making, and gaming.

Jon Orwant, a well-known member of the Perl community, founded The Perl Journal and co-authored OReillys bestseller, Programming Perl, 3rd Edition.

Read More

Customer Reviews

Average Review:

Write a Review

and post it to your social network


Most Helpful Customer Reviews

See all customer reviews >

Programming Perl 4.5 out of 5 based on 0 ratings. 6 reviews.
gcraigm More than 1 year ago
This is essential reading for Perl programmers. There are tons of Perl references floating around and 'perldoc' is always under my fingertips. But when it comes right down to it, this book consistently has the quickest answer to most simple questions. If the answer isn't in here, there's usually something here that points you in the right direction. So if "Learning Perl" struck you as a bit pedestrian, but some of the recipes in "Perl Cookbook" went by too fast, then you want this book. I would loan you mine but I don't let it leave my desk. ;)
Anonymous More than 1 year ago
Anonymous More than 1 year ago
Anonymous More than 1 year ago
JKW More than 1 year ago
A project at work needed to translated from my previously written Python code into a Perl script. This book was instrumental to the success of the project! The overall tone is friendly and informed. A recommended read to anyone who has Perl programming needs.
Guest More than 1 year ago
I found this book to be well written, easy to understand, and extremely informative. It makes a great reference, but also takes the time to teach you the perl methodology of programming. You get all the in's and out's of the compiler since the creator of the language wrote the book. If you are going to buy just one Perl book, this is it. If you are new to programming, or just need an easy entry into Perl, I'd recommend O'Reilly's Learning Perl book, otherwise this is the ultimate, definitive Perl guide.