Perl Cookbook

( 1 )


Find a Perl programmer, and you'll find a copy of Perl Cookbook nearby. Perl Cookbook is a comprehensive collection of problems, solutions, and practical examples for anyone programming in Perl. The book contains hundreds of rigorously reviewed Perl "recipes" and thousands of examples ranging from brief one-liners to complete applications.The second edition of Perl Cookbook has been fully updated for Perl 5.8, with extensive changes for Unicode support, I/O layers, mod_perl, and new technologies that have emerged...

See more details below
Paperback (Second Edition)
$35.82 price
(Save 28%)$49.95 List Price

Pick Up In Store

Reserve and pick up in 60 minutes at your local store

Other sellers (Paperback)
  • All (26) from $1.99   
  • New (7) from $28.27   
  • Used (19) from $1.99   
Perl Cookbook

Available on NOOK devices and apps  
  • NOOK Devices
  • NOOK HD/HD+ Tablet
  • NOOK
  • NOOK Color
  • NOOK Tablet
  • Tablet/Phone
  • NOOK for Windows 8 Tablet
  • NOOK for iOS
  • NOOK for Android
  • NOOK Kids for iPad
  • PC/Mac
  • NOOK for Windows 8
  • NOOK for PC
  • NOOK for Mac
  • NOOK Study
  • NOOK for Web

Want a NOOK? Explore Now

NOOK Book (eBook)
$22.99 price
(Save 42%)$39.99 List Price


Find a Perl programmer, and you'll find a copy of Perl Cookbook nearby. Perl Cookbook is a comprehensive collection of problems, solutions, and practical examples for anyone programming in Perl. The book contains hundreds of rigorously reviewed Perl "recipes" and thousands of examples ranging from brief one-liners to complete applications.The second edition of Perl Cookbook has been fully updated for Perl 5.8, with extensive changes for Unicode support, I/O layers, mod_perl, and new technologies that have emerged since the previous edition of the book. Recipes have been updated to include the latest modules. New recipes have been added to every chapter of the book, and some chapters have almost doubled in size.Covered topic areas include:

  • Manipulating strings, numbers, dates, arrays, and hashes
  • Pattern matching and text substitutions
  • References, data structures, objects, and classes
  • Signals and exceptions
  • Screen addressing, menus, and graphical applications
  • Managing other processes
  • Writing secure scripts
  • Client-server programming
  • Internet applications programming with mail, news, ftp, and telnet
  • CGI and mod_perl programming
  • Web programming
Since its first release in 1998, Perl Cookbook has earned its place in the libraries of serious Perl users of all levels of expertise by providing practical answers, code examples, and mini-tutorials addressing the challenges that programmers face. Now the second edition of this bestselling book is ready to earn its place among the ranks of favorite Perl books as well.Whether you're a novice or veteran Perl programmer, you'll find Perl Cookbook, 2nd Edition to be one of the most useful books on Perl available. Its comfortable discussion style and accurate attention to detail cover just about any topic you'd want to know about. You can get by without having this book in your library, but once you've tried a few of the recipes, you won't want to.

This recommended compendium provides problem resolution techniques and coding options for 19 different topics. From common and easy to obscure and difficult, this cookbook of Perl recipes contains practical wisdom for UNIX and Windows 95 environments. You should note, this is NOT a Perl tutorial, it assumes you have a Perl background.

Read More Show Less

Editorial Reviews

From Barnes & Noble
The Barnes & Noble Review
Maybe you know your way around Perl, at least a little...but today you need a specific technique, or example, or solution. Where should you look? For years, Perl programmers have turned to O’Reilly’s Perl Cookbook. Now they have a better option: Perl Cookbook, Second Edition.

What’s new here? Loads. Just look at the spine: This edition’s about 300 pages fatter. Leading Perl experts Tom Christiansen and Nathan Torkington have created more than 80 new recipes. They’ve substantially updated another 100. They’ve added an entirely new chapter on mod_perl, Apache’s embedded Perl interpreter, covering everything from authentication and logging to advanced templating with Mason and the Template Toolkit. Their new chapter on XML covers everything from parsing and validation to transformation.

The whole book’s been updated for Perl 5.8.1, the robust, stable version that’s become the default standard while folks held their breath for Perl 6. And most of the code's been tested under BSD, Linux, and Solaris. (Except for the system programming examples, most of these recipes ought to work wherever Perl runs, including Windows and Mac OS X.)

While Perl Cookbook, Second Edition isn’t as a Perl tutorial, it’s organized so you can gradually deepen and solidify the skills you already have, even if they’re rudimentary. For example, the authors start with “recipes” for using Perl’s simplest data types and operators -- basic stuff, but invaluable to relative novices. There’s a full chapter on basics such as accessing substrings, parsing comma-separated data, and using Unicode strings. There are examples of representing floating point data, generating pseudo-random numbers, converting between numeric and string date formats, manipulating lists and arrays, and more. There’s also a detailed, start-to-finish demonstration of working with associative arrays, arguably Perl’s most useful data type.

Perl has always been an extraordinarily powerful pattern-matching tool. The authors offer nearly two dozen pattern matching recipes: for matching letters, words, multiple lines, nested and recursive patterns, and strings. You’ll find solutions for manipulating files, followed by four chapters on enhancing program flexibility and power -- including coverage of creating your own user-defined types.

If you can do it with Perl, chances are this book can help you do it better. There’s a full chapter on manipulating DBM files and using SQL and the DBI module to query and update external databases. There’s extensive coverage of process management and communication, and a full chapter on Perl sockets programming. In addition to the aforementioned mod_perl coverage, there are more than 50 recipes for building Internet applications and services: DNS, FTP, mail, LDAP, CGI, automated forms, cookies, HTTP, robots, and more.

In each chapter, Christiansen and Torkington start simple and move toward more complex solutions. Often, they present several approaches to solving the same or similar problems, outlining the trade-offs. As Perlfolk say, “There’s more than one way to do it.” But it’s amazing how often the best way is in here. Bill Camarda

Bill Camarda is a consultant, writer, and web/multimedia content developer. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks for Dummies, Second Edition.

Two experienced Perl writers present a collection of problems, solutions, and practical examples for Perl programmers. Topics range from beginner questions to techniques for experienced programmers. Covered topics include manipulation of strings, numbers, dates, and hashes, reading and writing text and binary files, pattern matching and text substitutions, and signals and exceptions. Others subjects are objects and classes, accessing text and SQL databases, graphical applications, writing secure scripts, and Internet applications programming. Annotation c. Book News, Inc., Portland, OR (
Read More Show Less

Product Details

  • ISBN-13: 9780596003135
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 8/28/2003
  • Edition description: Second Edition
  • Edition number: 2
  • Pages: 968
  • Sales rank: 590,580
  • Product dimensions: 7.00 (w) x 9.19 (h) x 1.90 (d)

Meet the Author

Tom Christiansen is a freelance consultant specializing in Perl training and writing. After working for several years for TSR Hobbies (of Dungeons and Dragons fame), he set off for college where he spent a year in Spain and five in America, dabbling in music, linguistics, programming, and some half-dozen different spoken languages. Tom finally escaped UW-Madison with B.A.s in Spanish and computer science and an M.S. in computer science. He then spent five years at Convex as a jack-of-all-trades working on everything from system administration to utility and kernel development, with customer support and training thrown in for good measure. Tom also served two terms on the USENIX Association Board of directors. With over fifteen years' experience in UNIX system administration and programming, Tom presents seminars internationally. Living in the foothills above Boulder, Colorado, surrounded by mule deer, skunks, and the occasional mountain lion and black bear, Tom takes summers off for hiking, hacking, birding, music making, and gaming.

Nathan Torkington is a banjo player, father, and husband. His crimes in the computing community include: coauthor of the Perl Cookbook, editor for O'Reilly and Associates, content coordinator for the Open Source Convention and Perl Conference, and project manager for perl6.

Read More Show Less

Read an Excerpt

From Chapter 12: Packages, Libraries, and Modules

Package is a compile-time declaration that sets the default package prefix for unqualified global identifiers, just as chdir sets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, or eval). The effect is also terminated by any subsequent package statement in the same scope. (See the following code.) All programs are in package main until they use a package statement to change this...

Unlike user-defined identifiers, built-in variables with punctuation names (like $_ and $.) and the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are all forced to be in package main when unqualified. That way things like STDIN, @ARGV, %ENV, and $_ are always the same no matter what package you're in; for example, @ARGV always means @main:: ARGV, even if you've used package to change the default package. A fully qualified @ElseWhere::ARGV would not (and carries no special built-in meaning). Make sure to localize $_ if you use it in your module.


The unit of software reuse in Perl is the module, a file that has a collection of related functions designed to be used by other programs and library modules. Every module has a public interface, a set of variables and functions that outsiders are encouraged to use. From inside the module, the interface is defined by initializing certain package variables that the standard Exporter module looks at. From outside the module, the interface is accessed by importing symbols as a side effect of the use statement. The public interface of a Perlmodule is whatever is documented to be public. in the case of undocumented interfaces, it's whatever is vaguely intended to be public. When we talk about modules in this chapter, and traditional modules in general, we mean those that use the Exporter.

The require or use statements both pull a module into your program, although their semantics are slightly different. require loads modules at runtime, with a check to avoid the redundant loading of a given module. use is like require, with two added properties: compile-time loading and automatic importing.

Modules included with use are processed at compile time, but require processing happens at run time. This is important because if a module that a program needs is missing, the program won't even start because the use fails during compilation of your script. Another advantage of compile-time use over run-time require is that function prototypes in the module's subroutines become visible to the compiler. This matters because only the compiler cares about prototypes, not the interpreter. (Then again, we don't usually recommend prototypes except for replacing built-in commands, which do have them.)

Use is suitable for giving hints to the compiler because of its compile-time behavior. A pragma is a special module that acts as directive to the compiler to alter how Pert compiles your code. A pragma's name is always all lowercase, so when writing a regular module instead of a program, choose a name that starts with a capital letter. Pragmas supported by Pert 5.004 include autouse, constant, diagnostics, integer, lib, locale, overload, sigtrap, strict, subs, and vars. Each has its own manpage.

The other difference between require and use is that use performs an implicit import on the included module's package. Importing a function or variable from one package to another is a form of aliasing; that is, it makes two different names for the same underlying thing. It's like linking in files from another directory to your current one by the command In In/somedir/somefile. Once it's linked in, you no longer have to use the full pathname to access the file. Likewise, an imported symbol no longer needs to be fully qualified by package name (or predeclared with use vars or use subs). You can use imported variables as though they were part of your package. If you imported $English: :OUTPUT_AUTOFLUSH in the current package, you could refer to it as $OUTPUT_AUTOFLUSH.

The required file extension for a Pert module is ".pm". The module named FileHandle would be stored in the file The full path to the file depends on your include path, which is stored in the global @INC variable. Recipe 12.7 shows how to manipulate this array to your own purposes.

If the module name itself contains one or more double colons, these are translated into your system's directory separator. That means that the File::Find module resides in the file File/ under most filesystems. For example...

Import/Export Regulations

The following is a typical setup for a hypothetical module named Cards::Poker that demonstrates how to manage its exports. The code goes in the file named within the directory Cards: that is, Cards/Pokerpm. (See Recipe 12.7 for where the Cards directory should reside.) Here's that file, with line numbers included for reference...

Line 1 declares the package that the module will put its global variables and functions in. Typically, a module first switches to a particular package so that it has its own place for global variables and functions, one that won't conflict with that of another program. This must be written exactly as the corresponding use statement will be written when the module is loaded.

Don't say package Poker just because the basename of your file is Rather, say package Cards::Poker because your users will say use Cards: :Poker. This common problem is hard to debug. If you don't make the package and use statements exactly the same, you won't see a problem until you try to call imported functions or access imported variables, which will be mysteriously missing.

Line 2 loads in the Exporter module, which manages your module's public interface as described below. Line 3 initializes the special, per-package array @ISA to contain the word 11 Exporter". When a user says use Cards:: Poker, Pert implicitly calls a special method, Cards::Poker->import(). You don't have an import method in your package, but that's OK, because the Exporter package does, and you're inheriting from it because of the assignment to @ISA (is a). Pert looks at the package's @ISA for resolution of undefined methods. inheritance is a topic of Chapter 13, Classes, Objects, and Ties. You may ignore it for now-so long as you put code as shown in lines 2 and 3 into each module you write.

Line 4 assigns the list ('&shuffle', ' @card_deck') to the special, per-package array @EXPORT. When someone imports this module, variables and functions listed in that array are aliased into the caller's own package. That way they don't have to call the function Poker::Deck::shuffle (23) after the import. They can just write shuffle (23) instead. This won't happen if they load Cards::Poker with require Cards::Poker; only a use imports.

Lines 5 and 6 set up the package global variables and functions to be exported. (We presume you'll actually flesh out their initializations and definitions more than in these examples.) You're free to add other variables and functions to your module as well, including ones you don't put in the public interface via @EXPORT. See Recipe 12.1 for more about using the Exporter.

Finally, line 7 is a simple 1, indicating the overall return value of the module. If the last evaluated expression in the module doesn't produce a true value, an exception will be raised. Trapping this is the topic of Recipe 12.2. Any old true value will do, like 6.02e23 or "Because tchrist and gnat told us to put this here"; however, 1 is the canonical true value used by almost every module.

Packages group and organize global identifiers. They have nothing to do with privacy. Code compiled in package Church can freely examine and alter variables in package State. Package variables are always global and are used for sharing. But that's okay, because a module is more than just a package; it's also a file, and files count as their own scope. So if you want privacy, use lexical variables instead of globals. This is the topic of Recipe 12.4...

Read More Show Less

Table of Contents

What's in This Book;
What's New in This Edition;
Platform Notes;
Other Books;
Conventions Used in This Book;
We'd Like to Hear from You;
Acknowledgments for the First Edition;
Acknowledgments for the Second Edition;
Chapter 1: Strings;
1.1 Introduction;
1.1 Accessing Substrings;
1.2 Establishing a Default Value;
1.3 Exchanging Values Without Using Temporary Variables;
1.4 Converting Between Characters and Values;
1.5 Using Named Unicode Characters;
1.6 Processing a String One Character at a Time;
1.7 Reversing a String by Word or Character;
1.8 Treating Unicode Combined Characters as Single Characters;
1.9 Canonicalizing Strings with Unicode Combined Characters;
1.10 Treating a Unicode String as Octets;
1.11 Expanding and Compressing Tabs;
1.12 Expanding Variables in User Input;
1.13 Controlling Case;
1.14 Properly Capitalizing a Title or Headline;
1.15 Interpolating Functions and Expressions Within Strings;
1.16 Indenting Here Documents;
1.17 Reformatting Paragraphs;
1.18 Escaping Characters;
1.19 Trimming Blanks from the Ends of a String;
1.20 Parsing Comma-Separated Data;
1.21 Constant Variables;
1.22 Soundex Matching;
1.23 Program: fixstyle;
1.24 Program: psgrep;
Chapter 2: Numbers;
2.1 Introduction;
2.1 Checking Whether a String Is a Valid Number;
2.2 Rounding Floating-Point Numbers;
2.3 Comparing Floating-Point Numbers;
2.4 Operating on a Series of Integers;
2.5 Working with Roman Numerals;
2.6 Generating Random Numbers;
2.7 Generating Repeatable Random Number Sequences;
2.8 Making Numbers Even More Random;
2.9 Generating Biased Random Numbers;
2.10 Doing Trigonometry in Degrees, Not Radians;
2.11 Calculating More Trigonometric Functions;
2.12 Taking Logarithms;
2.13 Multiplying Matrices;
2.14 Using Complex Numbers;
2.15 Converting Binary, Octal, and Hexadecimal Numbers;
2.16 Putting Commas in Numbers;
2.17 Printing Correct Plurals;
2.18 Program: Calculating Prime Factors;
Chapter 3: Dates and Times;
3.1 Introduction;
3.1 Finding Today's Date;
3.2 Converting DMYHMS to Epoch Seconds;
3.3 Converting Epoch Seconds to DMYHMS;
3.4 Adding to or Subtracting from a Date;
3.5 Difference of Two Dates;
3.6 Day in a Week/Month/Year or Week Number;
3.7 Parsing Dates and Times from Strings;
3.8 Printing a Date;
3.9 High-Resolution Timers;
3.10 Short Sleeps;
3.11 Program: hopdelta;
Chapter 4: Arrays;
4.1 Introduction;
4.1 Specifying a List in Your Program;
4.2 Printing a List with Commas;
4.3 Changing Array Size;
4.4 Implementing a Sparse Array;
4.5 Iterating Over an Array;
4.6 Iterating Over an Array by Reference;
4.7 Extracting Unique Elements from a List;
4.8 Finding Elements in One Array but Not Another;
4.9 Computing Union, Intersection, or Difference of Unique Lists;
4.10 Appending One Array to Another;
4.11 Reversing an Array;
4.12 Processing Multiple Elements of an Array;
4.13 Finding the First List Element That Passes a Test;
4.14 Finding All Elements in an Array Matching Certain Criteria;
4.15 Sorting an Array Numerically;
4.16 Sorting a List by Computable Field;
4.17 Implementing a Circular List;
4.18 Randomizing an Array;
4.19 Program: words;
4.20 Program: permute;
Chapter 5: Hashes;
5.1 Introduction;
5.1 Adding an Element to a Hash;
5.2 Testing for the Presence of a Key in a Hash;
5.3 Creating a Hash with Immutable Keys or Values;
5.4 Deleting from a Hash;
5.5 Traversing a Hash;
5.6 Printing a Hash;
5.7 Retrieving from a Hash in Insertion Order;
5.8 Hashes with Multiple Values per Key;
5.9 Inverting a Hash;
5.10 Sorting a Hash;
5.11 Merging Hashes;
5.12 Finding Common or Different Keys in Two Hashes;
5.13 Hashing References;
5.14 Presizing a Hash;
5.15 Finding the Most Common Anything;
5.16 Representing Relationships Between Data;
5.17 Program: dutree;
Chapter 6: Pattern Matching;
6.1 Introduction;
6.1 Copying and Substituting Simultaneously;
6.2 Matching Letters;
6.3 Matching Words;
6.4 Commenting Regular Expressions;
6.5 Finding the Nth Occurrence of a Match;
6.6 Matching Within Multiple Lines;
6.7 Reading Records with a Separator;
6.8 Extracting a Range of Lines;
6.9 Matching Shell Globs as Regular Expressions;
6.10 Speeding Up Interpolated Matches;
6.11 Testing for a Valid Pattern;
6.12 Honoring Locale Settings in Regular Expressions;
6.13 Approximate Matching;
6.14 Matching from Where the Last Pattern Left Off;
6.15 Greedy and Non-Greedy Matches;
6.16 Detecting Doubled Words;
6.17 Matching Nested Patterns;
6.18 Expressing AND, OR, and NOT in a Single Pattern;
6.19 Matching a Valid Mail Address;
6.20 Matching Abbreviations;
6.21 Program: urlify;
6.22 Program: tcgrep;
6.23 Regular Expression Grab Bag;
Chapter 7: File Access;
7.1 Introduction;
7.1 Opening a File;
7.2 Opening Files with Unusual Filenames;
7.3 Expanding Tildes in Filenames;
7.4 Making Perl Report Filenames in Error Messages;
7.5 Storing Filehandles into Variables;
7.6 Writing a Subroutine That Takes Filehandles as Built-ins Do;
7.7 Caching Open Output Filehandles;
7.8 Printing to Many Filehandles Simultaneously;
7.9 Opening and Closing File Descriptors by Number;
7.10 Copying Filehandles;
7.11 Creating Temporary Files;
7.12 Storing a File Inside Your Program Text;
7.13 Storing Multiple Files in the DATA Area;
7.14 Writing a Unix-Style Filter Program;
7.15 Modifying a File in Place with a Temporary File;
7.16 Modifying a File in Place with the -i Switch;
7.17 Modifying a File in Place Without a Temporary File;
7.18 Locking a File;
7.19 Flushing Output;
7.20 Doing Non-Blocking I/O;
7.21 Determining the Number of Unread Bytes;
7.22 Reading from Many Filehandles Without Blocking;
7.23 Reading an Entire Line Without Blocking;
7.24 Program: netlock;
7.25 Program: lockarea;
Chapter 8: File Contents;
8.1 Introduction;
8.1 Reading Lines with Continuation Characters;
8.2 Counting Lines (or Paragraphs or Records) in a File;
8.3 Processing Every Word in a File;
8.4 Reading a File Backward by Line or Paragraph;
8.5 Trailing a Growing File;
8.6 Picking a Random Line from a File;
8.7 Randomizing All Lines;
8.8 Reading a Particular Line in a File;
8.9 Processing Variable-Length Text Fields;
8.10 Removing the Last Line of a File;
8.11 Processing Binary Files;
8.12 Using Random-Access I/O;
8.13 Updating a Random-Access File;
8.14 Reading a String from a Binary File;
8.15 Reading Fixed-Length Records;
8.16 Reading Configuration Files;
8.17 Testing a File for Trustworthiness;
8.18 Treating a File as an Array;
8.19 Setting the Default I/O Layers;
8.20 Reading or Writing Unicode from a Filehandle;
8.21 Converting Microsoft Text Files into Unicode;
8.22 Comparing the Contents of Two Files;
8.23 Pretending a String Is a File;
8.24 Program: tailwtmp;
8.25 Program: tctee;
8.26 Program: laston;
8.27 Program: Flat File Indexes;
Chapter 9: Directories;
9.1 Introduction;
9.1 Getting and Setting Timestamps;
9.2 Deleting a File;
9.3 Copying or Moving a File;
9.4 Recognizing Two Names for the Same File;
9.5 Processing All Files in a Directory;
9.6 Globbing, or Getting a List of Filenames Matching a Pattern;
9.7 Processing All Files in a Directory Recursively;
9.8 Removing a Directory and Its Contents;
9.9 Renaming Files;
9.10 Splitting a Filename into Its Component Parts;
9.11 Working with Symbolic File Permissions Instead of Octal Values;
9.12 Program: symirror;
9.13 Program: lst;
Chapter 10: Subroutines;
10.1 Introduction;
10.1 Accessing Subroutine Arguments;
10.2 Making Variables Private to a Function;
10.3 Creating Persistent Private Variables;
10.4 Determining Current Function Name;
10.5 Passing Arrays and Hashes by Reference;
10.6 Detecting Return Context;
10.7 Passing by Named Parameter;
10.8 Skipping Selected Return Values;
10.9 Returning More Than One Array or Hash;
10.10 Returning Failure;
10.11 Prototyping Functions;
10.12 Handling Exceptions;
10.13 Saving Global Values;
10.14 Redefining a Function;
10.15 Trapping Undefined Function Calls with AUTOLOAD;
10.16 Nesting Subroutines;
10.17 Writing a Switch Statement;
10.18 Program: Sorting Your Mail;
Chapter 11: References and Records;
11.1 Introduction;
11.1 Taking References to Arrays;
11.2 Making Hashes of Arrays;
11.3 Taking References to Hashes;
11.4 Taking References to Functions;
11.5 Taking References to Scalars;
11.6 Creating Arrays of Scalar References;
11.7 Using Closures Instead of Objects;
11.8 Creating References to Methods;
11.9 Constructing Records;
11.10 Reading and Writing Hash Records to Text Files;
11.11 Printing Data Structures;
11.12 Copying Data Structures;
11.13 Storing Data Structures to Disk;
11.14 Transparently Persistent Data Structures;
11.15 Coping with Circular Data Structures Using Weak References;
11.16 Program: Outlines;
11.17 Program: Binary Trees;
Chapter 12: Packages, Libraries, and Modules;
12.1 Introduction;
12.1 Defining a Module's Interface;
12.2 Trapping Errors in require or use;
12.3 Delaying use Until Runtime;
12.4 Making Variables Private to a Module;
12.5 Making Functions Private to a Module;
12.6 Determining the Caller's Package;
12.7 Automating Module Cleanup;
12.8 Keeping Your Own Module Directory;
12.9 Preparing a Module for Distribution;
12.10 Speeding Module Loading with SelfLoader;
12.11 Speeding Up Module Loading with Autoloader;
12.12 Overriding Built-in Functions;
12.13 Overriding a Built-in Function in All Packages;
12.14 Reporting Errors and Warnings Like Built-ins;
12.15 Customizing Warnings;
12.16 Referring to Packages Indirectly;
12.17 Using h2ph to Translate C #include Files;
12.18 Using h2xs to Make a Module with C Code;
12.19 Writing Extensions in C with Inline::C;
12.20 Documenting Your Module with Pod;
12.21 Building and Installing a CPAN Module;
12.22 Example: Module Template;
12.23 Program: Finding Versions and Descriptions of Installed Modules;
Chapter 13: Classes, Objects, and Ties;
13.1 Introduction;
13.1 Constructing an Object;
13.2 Destroying an Object;
13.3 Managing Instance Data;
13.4 Managing Class Data;
13.5 Using Classes as Structs;
13.6 Cloning Constructors;
13.7 Copy Constructors;
13.8 Invoking Methods Indirectly;
13.9 Determining Subclass Membership;
13.10 Writing an Inheritable Class;
13.11 Accessing Overridden Methods;
13.12 Generating Attribute Methods Using AUTOLOAD;
13.13 Coping with Circular Data Structures Using Objects;
13.14 Overloading Operators;
13.15 Creating Magic Variables with tie;
Chapter 14: Database Access;
14.1 Introduction;
14.1 Making and Using a DBM File;
14.2 Emptying a DBM File;
14.3 Converting Between DBM Files;
14.4 Merging DBM Files;
14.5 Sorting Large DBM Files;
14.6 Storing Complex Data in a DBM File;
14.7 Persistent Data;
14.8 Saving Query Results to Excel or CSV;
14.9 Executing an SQL Command Using DBI;
14.10 Escaping Quotes;
14.11 Dealing with Database Errors;
14.12 Repeating Queries Efficiently;
14.13 Building Queries Programmatically;
14.14 Finding the Number of Rows Returned by a Query;
14.15 Using Transactions;
14.16 Viewing Data One Page at a Time;
14.17 Querying a CSV File with SQL;
14.18 Using SQL Without a Database Server;
14.19 Program: ggh—Grep Netscape Global History;
Chapter 15: Interactivity;
15.1 Introduction;
15.1 Parsing Program Arguments;
15.2 Testing Whether a Program Is Running Interactively;
15.3 Clearing the Screen;
15.4 Determining Terminal or Window Size;
15.5 Changing Text Color;
15.6 Reading Single Characters from the Keyboard;
15.7 Ringing the Terminal Bell;
15.8 Using POSIX termios;
15.9 Checking for Waiting Input;
15.10 Reading Passwords;
15.11 Editing Input;
15.12 Managing the Screen;
15.13 Controlling Another Program with Expect;
15.14 Creating Menus with Tk;
15.15 Creating Dialog Boxes with Tk;
15.16 Responding to Tk Resize Events;
15.17 Removing the DOS Shell Window with Windows Perl/Tk;
15.18 Graphing Data;
15.19 Thumbnailing Images;
15.20 Adding Text to an Image;
15.21 Program: Small termcap Program;
15.22 Program: tkshufflepod;
15.23 Program: graphbox;
Chapter 16: Process Management and Communication;
16.1 Introduction;
16.1 Gathering Output from a Program;
16.2 Running Another Program;
16.3 Replacing the Current Program with a Different One;
16.4 Reading or Writing to Another Program;
16.5 Filtering Your Own Output;
16.6 Preprocessing Input;
16.7 Reading STDERR from a Program;
16.8 Controlling Input and Output of Another Program;
16.9 Controlling the Input, Output, and Error of Another Program;
16.10 Communicating Between Related Processes;
16.11 Making a Process Look Like a File with Named Pipes;
16.12 Sharing Variables in Different Processes;
16.13 Listing Available Signals;
16.14 Sending a Signal;
16.15 Installing a Signal Handler;
16.16 Temporarily Overriding a Signal Handler;
16.17 Writing a Signal Handler;
16.18 Catching Ctrl-C;
16.19 Avoiding Zombie Processes;
16.20 Blocking Signals;
16.21 Timing Out an Operation;
16.22 Turning Signals into Fatal Errors;
16.23 Program: sigrand;
Chapter 17: Sockets;
17.1 Introduction;
17.1 Writing a TCP Client;
17.2 Writing a TCP Server;
17.3 Communicating over TCP;
17.4 Setting Up a UDP Client;
17.5 Setting Up a UDP Server;
17.6 Using Unix Domain Sockets;
17.7 Identifying the Other End of a Socket;
17.8 Finding Your Own Name and Address;
17.9 Closing a Socket After Forking;
17.10 Writing Bidirectional Clients;
17.11 Forking Servers;
17.12 Pre-Forking Servers;
17.13 Non-Forking Servers;
17.14 Multitasking Server with Threads;
17.15 Writing a Multitasking Server with POE;
17.16 Writing a Multihomed Server;
17.17 Making a Daemon Server;
17.18 Restarting a Server on Demand;
17.19 Managing Multiple Streams of Input;
17.20 Program: backsniff;
17.21 Program: fwdport;
Chapter 18: Internet Services;
18.1 Introduction;
18.1 Simple DNS Lookups;
18.2 Being an FTP Client;
18.3 Sending Mail;
18.4 Reading and Posting Usenet News Messages;
18.5 Reading Mail with POP3;
18.6 Simulating Telnet from a Program;
18.7 Pinging a Machine;
18.8 Accessing an LDAP Server;
18.9 Sending Attachments in Mail;
18.10 Extracting Attachments from Mail;
18.11 Writing an XML-RPC Server;
18.12 Writing an XML-RPC Client;
18.13 Writing a SOAP Server;
18.14 Writing a SOAP Client;
18.15 Program: rfrm;
18.16 Program: expn and vrfy;
Chapter 19: CGI Programming;
19.1 Introduction;
19.1 Writing a CGI Script;
19.2 Redirecting Error Messages;
19.3 Fixing a 500 Server Error;
19.4 Writing a Safe CGI Program;
19.5 Executing Commands Without Shell Escapes;
19.6 Formatting Lists and Tables with HTML Shortcuts;
19.7 Redirecting to a Different Location;
19.8 Debugging the Raw HTTP Exchange;
19.9 Managing Cookies;
19.10 Creating Sticky Widgets;
19.11 Writing a Multiscreen CGI Script;
19.12 Saving a Form to a File or Mail Pipe;
19.13 Program: chemiserie;
Chapter 20: Web Automation;
20.1 Introduction;
20.1 Fetching a URL from a Perl Script;
20.2 Automating Form Submission;
20.3 Extracting URLs;
20.4 Converting ASCII to HTML;
20.5 Converting HTML to ASCII;
20.6 Extracting or Removing HTML Tags;
20.7 Finding Stale Links;
20.8 Finding Fresh Links;
20.9 Using Templates to Generate HTML;
20.10 Mirroring Web Pages;
20.11 Creating a Robot;
20.12 Parsing a Web Server Log File;
20.13 Processing Server Logs;
20.14 Using Cookies;
20.15 Fetching Password-Protected Pages;
20.16 Fetching https:// Web Pages;
20.17 Resuming an HTTP GET;
20.18 Parsing HTML;
20.19 Extracting Table Data;
20.20 Program: htmlsub;
20.21 Program: hrefsub;
Chapter 21: mod_perl;
21.1 Introduction;
21.1 Authenticating;
21.2 Setting Cookies;
21.3 Accessing Cookie Values;
21.4 Redirecting the Browser;
21.5 Interrogating Headers;
21.6 Accessing Form Parameters;
21.7 Receiving Uploaded Files;
21.8 Speeding Up Database Access;
21.9 Customizing Apache's Logging;
21.10 Transparently Storing Information in URLs;
21.11 Communicating Between mod_perl and PHP;
21.12 Migrating from CGI to mod_perl;
21.13 Sharing Information Between Handlers;
21.14 Reloading Changed Modules;
21.15 Benchmarking a mod_perl Application;
21.16 Templating with HTML::Mason;
21.17 Templating with Template Toolkit;
Chapter 22: XML;
22.1 Introduction;
22.1 Parsing XML into Data Structures;
22.2 Parsing XML into a DOM Tree;
22.3 Parsing XML into SAX Events;
22.4 Making Simple Changes to Elements or Text;
22.5 Validating XML;
22.6 Finding Elements and Text Within an XML Document;
22.7 Processing XML Stylesheet Transformations;
22.8 Processing Files Larger Than Available Memory;
22.9 Reading and Writing RSS Files;
22.10 Writing XML;

Read More Show Less

First Chapter

Chapter 8: File Contents

The most brilliant decision in all of Unix
was the choice of a single character
for the newline sequence.

Mike O'Dell, only half jokingly


Before the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.

These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes.

Because programs read and write via byte streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.

Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses the standard C I/O library to implement reading and writing of variable-length records like lines, paragraphs, and words.

What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end of file, you can't change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (See Chapter 14, Database Access).

The most common files are text files, and the most common operations on text files are reading and writing lines. Use <FH> (or the internal function implementing it, readline ) to read lines, and use print to write them. These functions can also be used to read or write any record that has a specific record separator. Lines are simply records that end in "\n".

The <FH> operator returns undef on error or when end of the file is reached, so use it in loops like this:

 while (defined ($line = <DATAFILE>)) {     chomp $line;     $size = length $line;     print "$size\n";                # output size of line

Because this is a common operation and that's a lot to type, Perl gives it a shorthand notation. This shorthand reads lines into $_ instead of $line. Many other string operations use $_ as a default value to operate on, so this is more useful than it may appear at first:

 while (<DATAFILE>) {     chomp;     print length, "\n";             # output size of line }

Call <FH> in scalar context to read the next line. Call it in list context to read all remaining lines:

 @lines = <DATAFILE>;

Each time <FH> reads a record from a filehandle, it increments the special variable $. (the "current input record number"). This variable is only reset when close is called explicitly, which means that it's not reset when you reopen an already opened filehandle.

Another special variable is $/ , the input record separator. It is set to "\n", the default end-of-line marker. You can set it to any string you like, for instance "\0" to read null-terminated records. Read paragraphs by setting $/ to the empty string, "". This is almost like setting $/ to "\n\n", in that blank lines function as record separators, but "" treats two or more consecutive empty lines as a single record separator, whereas "\n\n" returns empty records when more than two consecutive empty lines are read. Undefine $/ to read the rest of the file as one scalar:

 undef $/; $whole_file = <FILE>;               # 'slurp' mode

The -0 option to Perl lets you set $/ from the command line:

 % perl -040 -e '$word = <>; print "First word is $word\n";'

The digits after -0 are the octal value of the single character that $/ is to be set to. If you specify an illegal value (e.g., with -0777 ) Perl will set $/ to undef. If you specify -00, Perl will set $/ to "". The limit of a single octal value means you can't set $/ to a multibyte string, for instance, "%%\n" to read fortune files. Instead, you must use a BEGIN block:

 % perl -ne 'BEGIN { $/="%%\n" } chomp; print if /Unix/i' fortune.dat

Use print to write a line or any other data. The print function writes its arguments one after another and doesn't automatically add a line or record terminator by default.

 print HANDLE "One", "two", "three"; # "Onetwothree" print "Baa baa black sheep.\n";     # Sent to default output handle

There is no comma between the filehandle and the data to print. If you put a comma in there, Perl gives the error message "No comma allowed after filehandle". The default output handle is STDOUT. Change it with the select function. (See the introduction to Chapter 7, File Access.)

All systems use the virtual "\n" to represent a line terminator, called a newline. There is no such thing as a newline character. It is an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Sometimes, this changes the number of characters in the strings you read and write. The conspiracy is revealed in Processing Binary Files .

Use the read function to read a fixed-length record. It takes three arguments: a filehandle, a scalar variable, and the number of bytes to read. It returns undef if an error occurred or else the number of bytes read. To write a fixed-length record, just use print.

 $rv = read(HANDLE, $buffer, 4096)         or die "Couldn't read from HANDLE : $!\n"; # $rv is the number of bytes read, # $buffer holds the data read

The truncate function changes the length of a file, which can be specified as a filehandle or as a filename. It returns true if the file was successfully truncated, false otherwise:

 truncate(HANDLE, $length)     or die "Couldn't truncate: $!\n"; truncate("/tmp/$$.pid", $length)     or die "Couldn't truncate: $!\n";

Each filehandle keeps track of where it is in the file. Reads and writes occur from this point, unless you've specified the O_APPEND flag (see Chapter 7, Opening a File). Fetch the file position for a filehandle with tell , and set it with seek. Because the stdio library rewrites data to preserve the illusion that "\n" is the line terminator, you cannot portably seek to offsets calculated by counting characters. Instead, only seek to offsets returned by tell.

 $pos = tell(DATAFILE); print "I'm $pos bytes from the start of DATAFILE.\n";

The seek function takes three arguments: the filehandle, the offset (in bytes) to go to, and a numeric argument indicating how to interpret the offset. 0 indicates an offset from the start of the file (the kind of value returned by tell ); 1, an offset from the current location (a negative number means move backwards in the file, a positive number means move forward); and 2, an offset from end of file.

 seek(LOGFILE, 0, 2)         or die "Couldn't seek to the end: $!\n"; seek(DATAFILE, $pos, 0)     or die "Couldn't seek to $pos: $!\n"; seek(OUT, -20, 1)           or die "Couldn't seek back 20 bytes: $!\n";

So far we've been describing buffered I/O. That is, <FH>, print, read, seek, and tell are all operations that use buffers for speed. Perl also provides unbuffered I/O operations: sysread, syswrite, and sysseek, all discussed in Chapter 7, File Access.

The sysread and syswrite functions are different from their <FH> and print counterparts. They both take a filehandle to act on, a scalar variable to either read into or write out from, and the number of bytes to read or write. They can also take an optional fourth argument, the offset in the scalar variable to start reading or writing at:

 $written = syswrite(DATAFILE, $mystring, length($mystring)); die "syswrite failed: $!\n" unless $written == length($mystring); $read = sysread(INFILE, $block, 256, 5); warn "only read $read bytes, not 256" if 256 != $read;

The syswrite call sends the contents of $mystring to DATAFILE. The sysread call reads 256 bytes from INFILE and stores them 5 characters into $block, leaving its first 5 characters intact. Both sysread and syswrite return the number of bytes transferred, which could be different than the amount of data you were attempting to transfer. Maybe the file didn't have all the data you thought it did, so you got a short read. Maybe the filesystem that the file lives on filled up. Maybe your process was interrupted part of the way through the write. Stdio takes care of finishing the transfer in cases of interruption, but if you use the sysread and syswrite calls, you must do it yourself. See Chapter 9, for an example of this.

The sysseek function doubles as an unbuffered replacement for both seek and tell. It takes the same arguments as seek, but it returns either the new position if successful or undef on error. To find the current position within the file:

 $pos = sysseek(HANDLE, 0, 1);       # don't change position die "Couldn't sysseek: $!\n" unless defined $pos;

These are the basic operations available to you. The art and craft of programming lies in using these basic operations to solve complex problems like finding the number of lines in a file, reversing the order of lines in a file, randomly selecting a line from a file, building an index for a file, and so on.

Reading Lines with Continuation Characters


You have a file with long lines split over two or more lines, with backslashes to indicate that a continuation line follows. You want to rejoin those split lines. Makefiles, shell scripts, and many other scripting or configuration languages let you break a long line into several shorter ones in this fashion.


Build up the complete lines one at a time until reaching one without a backslash:

 while (defined($line = <FH>) ) {     chomp $line;     if ($line =~ s/\\$//) {         $line .= <FH>;         redo unless eof(FH);     }     # process full record in $line here }


Here's an example input file:


You'd like to process that file with the escaped newlines ignored. That way the first record would in this case be the first two lines, the second record the next three lines, etc.

Here's how the algorithm works. The while loop reads lines, which may or may not be complete records -- they might end in backslash (and a newline). The substitution operator s/// tries to remove a trailing backslash. If the substitution fails, we've found a line without a backslash at the end. Otherwise, read another record, concatenate it onto the accumulating $line variable, and use redo to jump back to just inside the opening brace of the while loop. This lands us back on the chomp.

A common problem with files in this format is invisible blanks between the backslash and end of line. It would be more forgiving if the substitute were like this:

 if ($line =~ s/\\\s*$//) {      # as before }

Unfortunately, even if your program is forgiving, others doubtlessly aren't. Just remember to be liberal in what you accept and conservative in what you produce.

See Also

The chomp function in perlfunc (1) and in Chapter 3 of Programming Perl ; the redo keyword in the "Loop Control" sections of perlsyn (1) and Chapter 2 of Programming Perl

Counting Lines (or Paragraphs or Records) in a File


You need to compute the number of lines in a file.


Many systems have a wc program to count lines in a file:

 $count = `wc -l < $file`; die "wc failed: $?" if $?; chomp($count);

You could also open the file and read line-by-line until the end, counting lines as you go:

 open(FILE, "< $file") or die "can't open $file: $!"; $count++ while <FILE>; # $count now holds the number of lines read

Here's the fastest solution, assuming your line terminator really is "\n":

 $count += tr/\n/\n/ while sysread(FILE, $_, 2 ** 16);


Although you can use -s $file to determine the file size file in bytes, you generally cannot use it to derive a line count. See the Introduction to Chapter 9, for more on -s.

If you can't or don't want to call another program to do your dirty work, you can emulate wc by opening up and reading the file yourself:

 open(FILE, "< $file") or die "can't open $file: $!"; $count++ while <FILE>; # $count now holds the number of lines read

Another way of writing this is:

 open(FILE, "< $file") or die "can't open $file: $!"; for ($count=0; <FILE>; $count++) { }

If you're not reading from any other files, you don't need the $count variable in this case. The special variable $. holds the number of lines read since a filehandle was last explicitly close d:

 1 while <FILE>; $count = $.;

This reads all the records in the file and discards them.

To count paragraphs, set the global input record separator variable $/ to the empty string ( "" ) before reading to make <> read a paragraph at a time.

 $/ = '';            # enable paragraph mode for all reads open(FILE, $file) or die "can't open $file: $!"; 1 while <FILE>; $para_count = $.;

See Also

Your system's wc (1) manpage; the $/ entry in perlvar (1), and in the "Special Variables" section of Chapter 2 of Programming Perl ; the Introduction to Chapter 9.

Processing Every Word in a File


You need to do something to every word in a file, similar to the foreach function of csh.


Either split each line on whitespace:

 while (<>) {     for $chunk (split) {         # do something with $chunk     } }

Or use the m//g operator to pull out one chunk at a time:

 while (<>) {     while ( /(\w[\w'-]*)/g ) {         # do something with $1     } }


Decide what you mean by "word." Sometimes you want anything but whitespace, sometimes you only want program identifiers, and sometimes you want English words. Your definition governs which regular expression to use.

The preceding two approaches work differently. Patterns are used in the first approach to decide what is not a word. In the second, they're used to decide what is a word.

With these techniques, it's easy to make a word frequency counter. Use a hash to store how many times each word has been seen:

 # Make a word frequency count %seen = (); while (<>) {     while ( /(\w['\w-]*)/g ) {         $seen{lc $1}++;     }   # output hash in a descending numeric sort of its values foreach $word ( sort { $seen{$b} <=> $seen{$a} } keys %seen) {     printf "%5d %s\n", $seen{$word}, $word; }

To make the example program count line frequency instead of word frequency, omit the second while loop and do $seen{lc $_}++ instead:

 # Line frequency count %seen = (); while (<>) {     $seen{lc $_}++; foreach $line ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) {     printf "%5d %s", $seen{$line}, $line; }

Odd things that may need to be considered as words include "M.I.T.", "Micro$oft", "o'clock", "49ers", "street-wise", "and/or", "&", "c/o", "St.", "Tschüß", and "Niño". Bear this in mind when you choosing a pattern to match. The last two require you to place a use locale in your program and then employ \w for a word character in the current locale.

See Also

The split function in perlfunc (1) and in Chapter 3 of Programming Perl; Recipe 6.3; Recipe 6.23

Reading a File Backwards by Line or Paragraph


You want to process each line or paragraph of a text file in reverse.


Read all lines into an array, then process that array from the end to the start:

 @lines = <FILE>; while ($line = pop @lines) {     # do something with $line }

Or store an array of lines in reverse order:

 @lines = reverse <FILE>; foreach $line (@lines) {     # do something with $line }


The limitations of file access mentioned in this chapter's Introduction prevent you from reading a line at a time starting from the end. You must read the lines into memory, then process them in reverse order. Needless to say, this requires at least as much available memory as the size of the file.

The first technique moves through the array of lines, in reverse order. This destructively processes the array, popping an element off the end of the array each time through the loop. We could do it non-destructively with:

 for ($i = $#lines; $i != -1; $i--) {     $line = $lines[$i]; }

The second approach generates an array of lines that is already in reverse order. This array can then be processed non-destructively. We get the reversed lines because the assignment to @lines forces list context on reverse, which in turn forces it on <FILE>. <> in a list context returns a list of all lines in the file.

These approaches are easily extended to paragraphs just by changing $/:

 # this enclosing block keeps local $/ temporary {                local $/ = '';     @paragraphs = reverse <FILE>;   foreach $paragraph (@paragraphs) {     # do something

See Also

The reverse function in perlfunc (1) and in Chapter 3 of Programming Perl ; the $/ entry in perlvar (1), and in the "Special Variables" section of Chapter 2 of Programming Perl ; Recipe 4.10; Recipe 1.6

Trailing a Growing File


You want to read from a continually growing file, but the read fails when you reach the (current) end of file.


Read until the end of file. Sleep, clear the EOF flag, and read some more. Repeat until interrupted. To clear the EOF flag, either use seek:

 for (;;) {     while (<FH>) { .... }     sleep $SOMETIME;     seek(FH, 0, 1); }

or the IO::Handle module's clearerr method:

 use IO::Seekable;   for (;;) {     while (<FH>) { .... }     sleep $SOMETIME;     FH->clearerr(); }


When you read to the end of a file, an internal flag is set that prevents further reading. The most direct way to clear this flag is the clearerr method, if supported: it's in the IO::Handle and FileHandle modules.

 $naptime = 1;   use IO::Handle; open (LOGFILE, "/tmp/logfile") or die "can't open /tmp/logfile: $!"; for (;;) {     while (<LOGFILE>) { print }     # or appropriate processing     sleep $naptime;     LOGFILE->clearerr();            # clear stdio error flag }

If that simple approach doesn't work on your system, you may need to use seek . The seek code given above tries to move zero bytes from the current position, which nearly always works. It doesn't change the current position, but it should clear the end-of-file condition on the handle so that the next <LOGFILE> picks up new data.

If that still doesn't work (e.g., it relies on features of your C library's (so-called) standard I/O implementation), then you may need to use the following seek code, which remembers the old file position explicitly and returns there directly.

 for (;;) {     for ($curpos = tell(LOGFILE); <LOGFILE>; $curpos = tell(LOGFILE)) {         # process $_ here     }     sleep $naptime;     seek(LOGFILE, $curpos, 0);  # seek to where we had been }

On some kinds of filesystems, the file could be removed while you are reading it. If so, there's probably little reason to continue checking whether it grows. To make the program exit in that case, stat the handle and make sure its link count (the third field in the return list) hasn't gone to 0:

 exit if (stat(LOGFILE))[3] == 0

If you're using the File::stat module, you could write that more readably as:

 use File::stat; exit if stat(*LOGFILE)->nlink == 0;

See Also

The seek function in perlfunc (1) and in Chapter 3 of Programming Perl ; your system's tail (1) and stdio (3) manpages

Picking a Random Line from a File


You want to return a random line from a file.


Use rand and $. (the current line number) to decide which line to print:

 srand; rand($.) < 1 && ($line = $_) while <>; # $line is the random line


This is a beautiful example of a solution that may not be obvious. We read every line in the file but don't have to store them all in memory. This is great for large files. Each line has a 1 in N (where N is the number of lines read so far) chance of being selected.

Here's a replacement for fortune using this algorithm:

 $/ = "%%\n"; $data = '/usr/share/games/fortunes'; srand; rand($.) < 1 && ($adage = $_) while <>; print $adage;

If you know line offsets (for instance, you've created an index) and the number of lines, you can randomly select a line and jump to its offset in the file, But you usually don't have such an index.

Here's a more rigorous explanation of how the algorithm works. The function call rand ($.) picks a random number between 0 and the current line number. Therefore, you have a one in N chance, that is, 1/N, of keeping the Nth line. Therefore you've a 100% chance of keeping the first line, a 50% chance of keeping the second, a 33% chance of keeping the third, and so on. The question is whether this is fair for all N, where N is any positive integer.

First, some concrete examples, then abstract ones.

Obviously, a file with one line (N=1) is fair: you always keep the first line because 1/1 = 100%, making it fair for files of 1 line. For a file with two lines, N=2. You always keep the first line; then when reaching the second line, you have a 50% chance of keeping it. Thus, both lines have an equal chance of being selected, which shows that N=2 is fair. For a file with three lines, N=3. You have a one-third chance, 33%, of keeping that third line. That leaves a two-thirds chance of retaining one of the first two out of the three lines. But we've already shown that for those first two lines there's a 50-50 chance of selecting either one. 50 percent of two-thirds is one-third. Thus, you have a one-third chance of selecting each of the three lines of the file.

In the general case, a file of N+1 lines will choose the last line times and one of the previous N lines times. Dividing by N leaves us with for each the first N lines in our N+1 line file, and also for line number N+1. The algorithm is therefore fair for all N, where N is a positive integer.

We've managed to choose fairly a random line from a file with speed directly proportional to the size of the file, but using no more memory than it takes to hold the longest line, even in the worst case.

See Also

The $. entry in perlvar (1) and in the "Special Variables" section of Chapter 2 of Programming Perl ; Recipe 2.7; Recipe 2.8

Randomizing All Lines


You want to copy a file and randomly reorder its lines.


Read all lines into an array, shuffle the array using the algorithm from Chapter 4, and write the shuffled lines back out:

 # assumes the &shuffle sub from Chapter 4 while (<INPUT>) {     push(@lines, $_); @reordered = shuffle(@lines); foreach (@reordered) {     print OUTPUT $_; }


The easiest approach is to read all lines into memory and shuffle them there. Because you don't know where lines start in the file, you can't just shuffle a list of line numbers and then extract the lines in the order they'll appear in the shuffled file. Even if you did know their starts, it would probably still be slower because you'd be seek ing around in the file instead of simply reading it from start to finish.

See Also

Recipe 2.7; Recipe 2.8; Recipe 4.17

Reading a Particular Line in a File


You want to extract a single line from a file.


The simplest solution is to read the lines until you get to the one you want:

 # looking for line number $DESIRED_LINE_NUMBER $. = 0; do { $LINE = <HANDLE> } until $. == $DESIRED_LINE_NUMBER || eof;

If you are going to be doing this a lot and the file fits into memory, read the file into an array:

 @lines = <HANDLE>; $LINE = $lines[$DESIRED_LINE_NUMBER];

If you will be retrieving lines by number often and the file doesn't fit into memory, build a byte-address index to let you seek directly to the start of the line:

 # usage: build_index(*DATA_HANDLE, *INDEX_HANDLE) sub build_index {     my $data_file  = shift;     my $index_file = shift;     my $offset     = 0;       while (<$data_file>) {         print $index_file pack("N", $offset);         $offset = tell($data_file);     }   # usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER) # returns line or undef if LINE_NUMBER was out of range sub line_with_index {     my $data_file   = shift;     my $index_file  = shift;     my $line_number = shift;       my $size;               # size of an index entry     my $i_offset;           # offset into the index of the entry     my $entry;              # index entry     my $d_offset;           # offset into the data file       $size = length(pack("N", 0));     $i_offset = $size * ($line_number-1);     seek($index_file, $i_offset, 0) or return;     read($index_file, $entry, $size);     $d_offset = unpack("N", $entry);     seek($data_file, $d_offset, 0);     return scalar(<$data_file>);   # usage: open(FILE, "< $file")         or die "Can't open $file for reading: $!\n"; open(INDEX, "+>$file.idx")         or die "Can't open $file.idx for read/write: $!\n"; build_index(*FILE, *INDEX); $line = line_with_index(*FILE, *INDEX, $seeking);

If you have the DB_File module, its DB_RECNO access method ties an array to a file, one line per array element:

 use DB_File; use Fcntl;   $tie = tie(@lines, $FILE, O_RDWR, 0666, $DB_RECNO) or die  # extract it $line = $lines[$sought-1];


Each strategy has different features, useful in different circumstances. The linear access approach is easy to write and best for short files. The index method gives quick two-step lookup, but requires that the index be pre-built, so it is best when the file being indexed doesn't change often compared to the number of lookups. The DB_File mechanism has some initial overhead, but subsequent accesses are much faster than with linear access, so use it for long files that are accessed more than once and are accessed out of order.

It is important to know whether you're counting lines from 0 or 1. The $. variable is 1 after the first line is read, so count from 1 when using linear access. The index mechanism uses lots of offsets, so count from 0. DB_File treats the file's records as an array indexed from 0, so count lines from 0.

Here are three different implementations of the same program, print_line. The program takes two arguments, a filename, and a line number to extract.

The version in Example 8-1 simply reads lines until it finds the one it's looking for.

Example 8-1. print_line-v1 

 #!/usr/bin/perl -w # print_line-v1 - linear style   @ARGV == 2 or die "usage: print_line FILENAME LINE_NUMBER\n";   ($filename, $line_number) = @ARGV; open(INFILE, "< $filename") or die "Can't open $filename for reading: $!\n"; while (<INFILE>) {     $line = $_;     last if $. == $line_number; if ($. != $line_number) {     die "Didn't find line $line_number in $filename\n"; print;

The index version in Example 8-2 must build an index. For many lookups, you could build the index once and then use it for all subsequent lookups:

Example 8-2. print_line-v2

 #!/usr/bin/perl -w # print_line-v2 - index style # build_index and line_with_index from above @ARGV == 2 or     die "usage: print_line FILENAME LINE_NUMBER";   ($filename, $line_number) = @ARGV; open(ORIG, "< $filename")          or die "Can't open $filename for reading: $!";   # open the index and build it if necessary # there's a race condition here: two copies of this # program can notice there's no index for the file and # try to build one.  This would be easily solved with # locking $indexname = "$filename.index"; sysopen(IDX, $indexname, O_CREAT|O_RDWR)          or die "Can't open $indexname for read/write: $!"; build_index(*ORIG, *IDX) if -z $indexname;  # XXX: race unless lock   $line = line_with_index(*ORIG, *IDX, $line_number); die "Didn't find line $line_number in $filename" unless defined $line; print $line;

The DB_File version in Example 8-3 is indistinguishable from magic.

Example 8-3. print_line-v3 

 #!/usr/bin/perl -w # print_line-v3 - DB_File style use DB_File; use Fcntl;   @ARGV == 2 or     die "usage: print_line FILENAME LINE_NUMBER\n";   ($filename, $line_number) = @ARGV; $tie = tie(@lines, "DB_File", $filename, O_RDWR, 0666, $DB_RECNO)         or die "Cannot open file $filename: $!\n";   unless ($line_number < $tie->length) {     die "Didn't find line $line_number in $filename\n"   print $lines[$line_number-1];                         # easy, eh?

See Also

The documentation for the standard DB_File module (also in Chapter 7 of Programming Perl ); the tie function in perlfunc (1) and in Chapter 3 of Programming Perl ; the entry on $. in perlvar (1) and in the "Special Variables" section of Chatper 2 of Programming Perl

Processing Variable-Length Text Fields


You want to extract variable length fields from your input.


Use split with a pattern matching the field separators.

 # given $RECORD with field separated by PATTERN, # extract @FIELDS. @FIELDS = split(/PATTERN/, $RECORD);


The split function takes up to three arguments: PATTERN, EXPRESSION, and LIMIT. The LIMIT parameter is the maximum number of fields to split into. (If the input contains more fields, they are returned unsplit in the final list element.) If LIMIT is omitted, all fields (except any final empty ones) are returned. EXPRESSION gives the string value to split. If EXPRESSION is omitted, $_ is split. PATTERN is a pattern matching the field separator. If PATTERN is omitted, contiguous stretches of whitespace are used as the field separator and leading empty fields are silently discarded.

If your input field separator isn't a fixed string, you might want split to return the field separators as well as the data by using parentheses in PATTERN to save the field separators. For instance:

 split(/([+-])/, "3+5-2");

returns the values:

 (3, '+', 5, '-', 2)

To split colon-separated records in the style of the /etc/passwd file, use:

 @fields = split(/:/, $RECORD);

The classic application of split is whitespace-separated records:

 @fields = split(/\s+/, $RECORD);

If $RECORD started with whitespace, this last use of split would have put an empty string into the first element of @fields because split would consider the record to have an initial empty field. If you didn't want this, you could use this special form of split:

 @fields = split(" ", $RECORD);

This behaves like split with a pattern of /\s+/, but ignores leading whitespace.

When the record separator can appear in the record, you have a problem. The usual solution is to escape occurrences of the record separator in records by prefixing them with a backslash. See Chapter 1, Escaping Characters.

See Also

The split function in perlfunc (1) and in Chapter 3 of Programming Perl

Removing the Last Line of a File


You'd like to remove the last line from a file.


Read the file a line at a time and keep track of the byte address of the last line you've seen. When you've exhausted the file, truncate to the last address you saved:

 open (FH, "+< $file")               or die "can't update $file: $!"; while ( <FH> ) {     $addr = tell(FH) unless eof(FH); truncate(FH, $addr)                 or die "can't truncate $file: $!";


This is much more efficient than reading the file into memory all at once, since it only holds one line at a time. Although you still have to grope your way through the whole file, you can use this program on files larger than available memory.

See Also

The open and binmode functions in perlfunc (1) and in Chapter 3 of Programming Perl ; your system's open (2) and fopen (3) manpages

Processing Binary Files


Your system distinguishes between text and binary files. How do you?


Use the binmode function on the filehandle:



Not everyone agrees what constitutes a line in a text file, because one person's textual character set is another's binary gibberish. Even when everyone is using ASCII instead of EBCDIC, Rad50, or Unicode, discrepancies arise.

As mentioned in the Introduction, there is no such thing as a newline character. It is purely virtual, a figment of the operating system, standard libraries, device drivers, and Perl.

Under Unix or Plan9, a "\n" represents the physical sequence "\cJ" (the Perl double-quote escape for Ctrl-J), a linefeed. However, on a terminal that's not in raw mode, an Enter key generates an incoming "\cM" (a carriage return) which turns into "\cJ", whereas an outgoing "\cJ" turns into "\cM\cJ". This strangeness doesn't happen with normal files, just terminal devices, and it is handled strictly by the device driver.

On a Mac, a "\n" is usually represented by "\cM" ; just to make life interesting (and because the standard requires that "\n" and "\r" be different), a "\r" represents a "\cJ". This is exactly the opposite of the way that Unix, Plan9, VMS, CP/M, or nearly anyone else does it. So, Mac programmers writing files for other systems or talking over a network have to be careful. If you send out "\n", you'll deliver a "\cM", and no "\cJ" will be seen. Most network services prefer to receive and send "\cM\cJ" as a line terminator, but most accept merely a "\cJ".

Under VMS, DOS, or their derivatives, a "\n" represents "\cJ", similar to Unix and Plan9. From the perspective of a tty, Unix and DOS behave identically: a user who hits Enter generates a "\cM", but this arrives at the program as a "\n", which is "\cJ". A "\n" (that's a "\cJ", remember) sent to a terminal shows up as a "\cM\cJ".

These strange conversions happen to Windows files as well. A DOS text file actually physically contains two characters at the end of every line, "\cM\cJ". The last block in the file has a "\cZ" to indicate where the text stops. When you write a line like "bad news\n" on those systems, the file contains "bad news\cM\cJ", just as if it were a terminal.

When you read a line on such systems, it's even stranger. The file itself contains "bad news\cM\cJ", a 10-byte string. When you read it in, your program gets nothing but "bad news\n", where that "\n" is the virtual newline character, that is, a linefeed ( "\cJ" ). That means to get rid of it, a single chop or chomp will do. But your poor program has been tricked into thinking it's only read nine bytes from the file. If you were to read 10 such lines, you would appear to have read just 90 bytes into the file, but in fact would be at position 100. That's why the tell function must always be used to determine your location. You can't infer your position just by counting what you've read.

This legacy of the old CP/M filesystem, whose equivalent of a Unix inode stored only block counts and not file sizes, has frustrated programmers for decades, and no end is in sight. Because DOS is compatible with CP/M file formats, Windows with DOS, and NT with Windows, the sins of the fathers have truly been visited unto the children of the fourth generation.

You can circumvent the single "\n" terminator by telling Perl (and the operating system) that you're working with binary data. The binmode function indicates that data read or written through the given filehandle should not be mangled the way a text file would likely be on those systems.

 $gifname = "picture.gif"; open(GIF, $gifname)         or die "can't open $gifname: $!";   binmode(GIF);               # now DOS won't mangle binary input from GIF binmode(STDOUT);            # now DOS won't mangle binary output to STDOUT   while (read(GIF, $buff, 8 * 2**10)) {     print STDOUT $buff; }

Calling binmode on systems that don't make this distinction (including Unix, the Mac, and Plan 9) is harmless. Inappropriately doing so (such as on a text file) on systems that do (including MVS, VMS, and DOS, regardless of its GUI ) can mangle your files.

If you're not using binmode, the data you read using stdio (<>) will automatically have the native system's line terminator changed to "\n", even if you change $/. Similarly, any "\n" you print to the filehandle will be turned into the native line terminator. See this chapter's Introduction for more details.

If you want to get what was on the disk, byte for byte, you should set binmode if you're on one of the odd systems listed above. Then, of course, you also have to set $/ to the real record separator if you want to use <> on it.

See Also

The open and binmode functions in perlfunc (1) and in Chapter 3 of Programming Perl ; your system's open (2) and fopen (3) manpages

Using Random-Access I/O


You have to read a binary record from the middle of a large file but don't want to read a record at a time to get there.


Once you know the record's size, multiply it by the record number to get the byte address, and then seek to that byte address and read the record:

 $ADDRESS = $RECSIZE * $RECNO; seek(FH, $ADDRESS, 0) or die "seek:$!"; read(FH, $BUFFER, $RECSIZE);


The Solution assumes the first record has a RECNO of 0. If you're counting from one, use:


This won't work on a text file unless all lines are the same length. This is rarely the case.

See Also

The seek function in perlfunc (1) and in Chapter 3 of Programming Perl ; Recipe 8.13

Updating a Random-Access File


You want to read in an old record from a binary file, change its values, and write back the record.


After read ing the old record, pack up the updated values, seek to the previous address, and write it back.

 use Fcntl;                          # for SEEK_SET and SEEK_CUR   $ADDRESS = $RECSIZE * $RECNO; seek(FH, $ADDRESS, SEEK_SET)        or die "Seeking: $!"; read(FH, $BUFFER, $RECSIZE) == $RECSIZE                                     or die "Reading: $!"; @FIELDS = unpack($FORMAT, $BUFFER); # update fields, then $BUFFER = pack($FORMAT, @FIELDS); seek(FH, -$RECSIZE, SEEK_CUR)       or die "Seeking: $!"; print FH $BUFFER; close FH                            or die "Closing: $!";


You don't have to use anything fancier than print in Perl to output a record. Remember that the opposite of read is not write but print, although oddly enough, the opposite of sysread actually is syswrite. ( split and join are opposites, but there's no speak to match listen, no resurrect for kill, and no curse for bless.)

The example program shown in Example 8-4, weekearly, takes one argument: the user whose record you want to backdate by a week. (Of course, in practice, you wouldn't really want to (nor be able to!) mess with the system accounting files.) This program requires write access to the file to be updated, since it opens the file in update mode. After fetching and altering the record, it packs it up again, skips backwards in the file one record, and writes it out.

Example 8-4. weekearly

 #!/usr/bin/perl # weekearly -- set someone's login date back a week use User::pwent; use IO::Seekable;   $typedef = 'L A12 A16';         # linux fmt; sunos is "L A8 A16" $sizeof  = length(pack($typedef, ())); $user    = shift(@ARGV) || $ENV{USER} || $ENV{LOGNAME};   $address = getpwnam($user)->uid * $sizeof;   open (LASTLOG, "+</var/log/lastlog")     or die "can't update /usr/adm/lastlog: $!"; seek(LASTLOG, $address, SEEK_SET)     or die "seek failed: $!"; read(LASTLOG, $buffer, $sizeof) == $sizeof     or die "read failed: $!";   ($time, $line, $host) = unpack($typedef, $buffer); $time  -= 24 * 7 * 60 * 60;         # back-date a week $buffer = pack($typedef, $time, $line, $time);   seek(LASTLOG, -$sizeof, SEEK_CUR)   # backup one record     or die "seek failed: $!"; print LASTLOG $record;      close(LASTLOG)     or die "close failed: $!";

See Also

The open, seek, read, pack, and unpack functions in the perlfunc (1) and in Chapter 3 of Programming Perl ; Recipe 8.12; Recipe 8.14

Reading a String from a Binary File


You want to read a NUL-terminated string from a file, starting at a particular address.


Set $/ to an ASCII NUL, and read the string with <>:

 $old_rs = $/;                       # save old $/ $/ = "\0";                          # NULL seek(FH, $addr, SEEK_SET)           or die "Seek error: $!\n"; $string = <FH>;                     # read string chomp $string;                      # remove NULL $/ = $old_rs;                       # restore old $/

You can use local to save and restore $/ if you want:

     local $/ = "\0";     # ... }                           # $/ is automatically restored


The example program shown in Example 8-5, bgets, accepts a filename and one or more byte addresses as arguments. Decimal, octal, or hexadecimal addresses may be specified. For each address, the program reads and prints the NULL- or EOF-terminated string at that position:

Example 8-5. bgets

 #!/usr/bin/perl  #  bgets - get a string from an address in a binary file use IO::Seekable; ($file, @addrs) = @ARGV             or die "usage: $0 addr ..."; open(FH, $file)                     or die "
cannot open $file: $!"; $/ = "\000"; foreach $addr (@addrs) {     $addr = oct $addr if $addr =~ /^0/;     seek(FH, $addr, SEEK_SET)         or die "can't seek to $addr in $file: $!";     printf qq{%#x %#o %d "%s"\n}, $addr, $addr, $addr, scalar <>;

Here's a simple implementation of the Unix strings program:

Example 8-6. strings 

 #!/usr/bin/perl #  strings - pull strings out of a binary file $/ = "\0"; while (<>) {     while (/([\040-\176\s]{4,})/g) {         print $1, "\n";     } }

See Also

The seek, getc, and ord functions in perlfunc (1) and in Chapter 3 of Programming Perl ; the discussion of qq// in the "Quote and Quote-Like Operators" section of the perlop (1) manpage, and in the "Pick Your Own Quotes" section of Chapter 2 of Programming Perl

Reading Fixed-Length Records


You want to read a file whose records have a fixed length.


Use read and unpack:

 # $RECORDSIZE is the length of a record, in bytes. # $TEMPLATE is the unpack template for the record # FILE is the file to read from # @FIELDS is an array, one element per field   until ( eof(FILE) ) {     read(FILE, $record, $RECORDSIZE) == $RECORDSIZE         or die "short read\n";     @FIELDS = unpack($TEMPLATE, $record); }


Because the file in question is not a text file, you can't use <FH> or IO:: modules' getline method to read in records. Instead, you must simply read a particular number of bytes into a buffer. This buffer then contains one record's data, which you decode using unpack with the right format.

For binary data, the catch is often determining the right format. If you're reading data written by a C program, this can mean peeking at C include files or manpages describing the structure layout, and this requires knowledge of C. It also requires that you become unnaturally chummy with your C compiler, because otherwise it's hard to predict field padding and alignment (such as the x2 in the format used in See Program: tailwtmp. ). If you're lucky enough to be on a Berkeley Unix system or a system supporting gcc, then you may be able to use the c2ph tool distributed with Perl to cajole your C compiler into helping you with this.

The tailwtmp program at the end of this chapter uses the format described in utmp (5) under Linux and works on its /var/log/wtmp and /var/run/utmp files. Once you commit to working in a binary format, machine dependencies creep in fast. It probably won't work unaltered on your system, but the procedure is still illustrative. Here is the relevant layout from the C include file on Linux:

 #define UT_LINESIZE           12 #define UT_NAMESIZE           8 #define UT_HOSTSIZE           16   struct utmp {                       /* here are the pack template codes */     short ut_type;                  /* s for short, must be padded      */     pid_t ut_pid;                   /* i for integer                    */     char ut_line[UT_LINESIZE];      /* A12 for 12-char string           */     char ut_id[2];                  /* A2, but need x2 for alignment    */     time_t ut_time;                 /* l for long                       */     char ut_user[UT_NAMESIZE];      /* A8 for 8-char string             */     char ut_host[UT_HOSTSIZE];      /* A16 for 16-char string           */     long ut_addr;                   /* l for long                       */ };

Once you figure out the binary layout, feed that (in this case, "s x2 i A12 A2 x2 l A8 A16 l" ) to pack with an empty field list to determine the record's size. Remember to check the return value of read when you read in your record to make sure you got back the number of bytes you asked for.

If your records are text strings, use the "a" or "A" unpack templates.

Fixed-length records are useful in that the n th record begins at byte offset SIZE * ( n -1) in the file, where SIZE is the size of a single record. See the indexing code in Reading a Particular Line in a File for an example of this.

See Also

The unpack, pack, and read functions in perlfunc (1) and in Chapter 3 of Programming Perl ; Recipe 1.1

Reading Configuration Files


You want to allow the users of your program to change its behavior through configuration files.


Either process a file in trivial VAR=VALUE format, setting a hash key-value pair for each setting:

 while (<CONFIG>) {     chomp;                  # no newline     s/#.*//;                # no comments     s/^\s+//;               # no leading white     s/\s+$//;               # no trailing white     next unless length;     # anything left?     my ($var, $value) = split(/\s*=\s*/, $_, 2);     $User_Preferences{$var} = $value; }

Or better yet, treat the config file as full Perl code:

 do "$ENV{HOME}/.progrc";


The first solution lets you read in config files in a trivial format like this (comments and blank lines are allowed):

 # set class C net NETMASK = MTU     = 296      DEVICE  = cua1 RATE    = 115200 MODE    = adaptive

After you're done, you can pull in a setting by something like $User_Preferences{"RATE"} to find the value 115200. If you wanted the config file to directly set a variable in your program using that name, instead of assigning to the hash, do this:

 no strict 'refs'; $$var = $value;

and the $RATE variable would contain 115200.

The second solution uses do to pull in raw Perl code directly. When used with an expression instead of a block, do interprets the expression as a filename. This is nearly identical to using require, but without risk of taking a fatal exception. In the second format, the config file would look like:

 # set class C net $NETMASK = ''; $MTU     = 0x128; # Brent, please turn on the modem $DEVICE  = 'cua1'; $RATE    = 115_200; $MODE    = 'adaptive';

If you don't see the point of having extra punctuation and live code, consider this: you can have all of Perl at your disposal. You can now add arbitrary logic and tests to your simple assignments:

 if ($DEVICE =~ /1$/) {     $RATE = 28_800; } else {     $RATE = 115_200; }

Many programs support system and personal configuration files. If you want the user's choices to override the system ones, load the user file second:

 $APPDFLT = "/usr/local/share/myprog";   do "$APPDFLT/"; do "$ENV{HOME}/.myprogrc";

If you want to ignore the system config file when the user has their own, test the return value of the do.

 do "$APPDFLT/"     or do "$ENV{HOME}/.myprogrc";

You might wonder what context those files will be executed under. They will be in the same package that do itself was compiled into. Typically you'll direct users to set particular variables, which, being unqualified globals, will end up in the current package. If you'd prefer unqualified variables go into a particular package, do this:

 { package Settings; do "$ENV{HOME}/.myprogrc" } 

As with a file read in using require or use, those read in using do count as a separate and unrelated lexical scope. That means the configuration file can't access its caller's lexical ( my ) variables, nor can the caller find any such variables that might have been set in the file. It also means that the user's code isn't held accountable to a pragma like use strict or use integer that may be in effect in the caller.

If you don't want clean partitioning of variable visibility, you can get the config file's code executed in your own lexical scope. If you have a cat program or its technical equivalent handy, you could write yourself a hand-rolled do:

 eval `cat $ENV{HOME}/.myprogrc`;

We've never actually seen anyone (except Larry) use that approach in production code.

For one thing, do is a lot easier to type. Also, it respects the @INC path, which is normally searched if a full path is not specified, but, unlike using a require, no implicit error checking happens under do. This means you don't have to wrap it in an eval to catch exceptions that would otherwise cause your program to die, because do already functions as an eval.

You can still check for errors on your own if you'd like:

 $file = ""; unless ($return = do $file) {     warn "couldn't parse $file: $@"         if $@;     warn "couldn't do $file: $!"            unless defined $return;     warn "couldn't run $file"               unless $return; }

This is much simpler for the programmer to source in code than it would be to invent and then parse a complicated, new syntax. It's also much easier on the user than forcing them to learn the syntax rules of yet another configuration file. Even better, you give the user access to a powerful algorithmic programming language.

One reasonable concern is security. How do you know that the file hasn't been tampered with by someone other than the user? The traditional approach here is to do nothing, trusting the directory and file permissions. Nine times out of ten, this is also the right approach. Most projects just aren't worth being that paranoid over. For those that are, see the next recipe.

See Also

The eval and require functions in perlfunc (1) and in Chapter 3 of Programming Perl ; Recipe 8.17; Recipe 10.12

Testing a File for Trustworthiness


You want to read from a file, perhaps because it has configuration information. You only want to use the file if it can't be written to (or perhaps not even be read from) by anyone else than its owner.


Use the stat call to retrieve ownership and file permissions information. You can use the built-in version, which returns a list:

 ( $dev, $ino, $mode, $nlink,    $uid, $gid, $rdev, $size,    $atime, $mtime, $ctime,    $blksize, $blocks )       = stat($filename)         or die "no $filename: $!";   $mode &= 07777;             # discard file type info

Or you can use the by-name interface in:

 $info = stat($filename)     or die "no $filename: $!"; if ($info->uid == 0) {     print "Superuser owns $filename\n"; }  if ($info->atime > $info->mtime) {     print "$filename has been read since it was written.\n"; }


Usually you trust users to set file permissions as they wish. If they want others to read their files, or even to write to them, that's their business. Applications like editors, mailers, and shells are often more discerning, though, refusing to evaluate code in configuration files if anyone but the owner can write to them. This helps avoid Trojan horses attacks. Security-minded programs like ftp and rlogin may even reject config files that can be read by anyone but their owner.

If the file is writable by someone other than the owner or is owned by someone other than the current user or the superuser, it shouldn't be trusted. To figure out file ownership and permissions, the stat function is used. The following function returns true if the file is deemed safe and false otherwise. If the stat fails, undef is returned.

 use File::stat;   sub is_safe {     my $path = shift;     my $info = stat($path);     return unless $info;       # owner neither superuser nor me      # the real uid is in stored in the $< variable     if (($info->uid != 0) && ($info->uid != $<)) {         return 0;     }       # check whether group or other can write file.     # use 066 to detect either reading or writing     if ($info->mode & 022) {   # someone else can write this         return 0 unless -d _;  # non-directories aren't safe             # but directories with the sticky bit (01000) are         return 0 unless $info->mode & 01000;             }     return 1; }

A directory is considered safe even if others can write to it, provided that its mode 01000 (owner delete only) bit is set.

Careful programmers also ensure that no enclosing directory is writable. This is due to systems with the " chown giveaway" problem in which any user can give away a file they own and make it owned by someone else. The following function handles that by using the is_safe function to check every enclosing directory up to the root if it detects that you have the chown problem, for which it queries the POSIX::sysconf. If you don't have an unrestrictedversion of chown, the is_verysafe subroutine just calls is_safe. If you do have the problem, it walks up the filesystem tree until it reaches the root.

 use Cwd; use POSIX qw(sysconf _PC_CHOWN_RESTRICTED); sub is_verysafe {     my $path = shift;     return is_safe($path) if sysconf(_PC_CHOWN_RESTRICTED);     $path = getcwd() . '/' . $path if $path !~ m{^/};     do {         return unless is_safe($path);         $path =~ s#([^/]+|/)$##;               # dirname         $path =~ s#/$## if length($path) > 1;  # last slash     } while length $path;       return 1; }

To use this in a program, try something like this:

 $file = "$ENV{HOME}/.myprogrc"; readconfig($file) if is_safe($file);

This has potential for a race condition, because it's presumed that the hypothetical readconfig function will open the file. Between the time when is_safe checks the file's stats and when readconfig opens it, something wicked could theoretically occur. To avoid this, pass is_safe the already open filehandle, which is set up to handle this:

 $file = "$ENV{HOME}/.myprogrc"; if (open(FILE, "< $file")) {      readconfig(*FILE) if is_safe(*FILE); }

You would still have to arrange for readconfig to accept a filehandle instead of a filename, though.

Program: tailwtmp

Every time a user logs into or out of a Unix system, a record is added to the wtmp file. You can't use the normal tail program on it because it's in binary format. The tailwtmp program in Example 8-7 knows the format of the binary file and shows every new record as it appears. You'll have to adjust the pack format for your own system.

Example 8-7. tailwtmp 

 #!/usr/bin/perl # tailwtmp - watch for logins and logouts;  # uses linux utmp structure, from utmp(5) $typedef = 's x2 i A12 A4 l A8 A16 l'; $sizeof = length pack($typedef, () ); use IO::File; open(WTMP, '/var/log/wtmp') or die "can't open /var/log/wtmp: $!"; seek(WTMP, 0, SEEK_END); for (;;) {          while (read(WTMP, $buffer, $sizeof) == $sizeof) {         ($type, $pid,  $line, $id, $time, $user, $host, $addr)              = unpack($typedef, $buffer);         next unless $user && ord($user) && $time;         printf "%1d %-8s %-12s %2s %-24s %-16s %5d %08x\n",              $type,$user,$line,$id,scalar(localtime($time)),              $host,$pid,$addr;     }      for ($size = -s WTMP; $size == -s WTMP; sleep 1) {}     WTMP->clearerr();

Program: tctee

Not all systems support the classic tee program for splitting output pipes to multiple destinations. This command sends the output from someprog to /tmp/output and to the mail pipe beyond.

 % someprog | tee /tmp/output | Mail -s 'check this'

This program helps not only users who aren't on Unix systems and don't have a regular tee. It also helps those who are, because it offers features not found on other version of tee.

The four flag arguments are -i to ignore interrupts, -a to append to output files, -u for unbuffered output, and -n to omit copying the output on to standard out.

Because this program uses Perl's magic open, you can specify pipes as well as files.

 % someprog | tctee f1 "|cat -n" f2 ">>f3"

That sends the output from someprog to the files f1 and f2, appends it to f3, sends a copy to the program cat -n, and also produces the stream on standard output.

The program in Example 8-8 is one of many venerable Perl programs written nearly a decade ago that still runs perfectly well. If written from scratch now, we'd probably use strict, warnings, and ten to thirty thousand lines of modules. But if it ain't broke . . .

Example 8-8. tctee

 #!/usr/bin/perl # tctee - clone that groks process tees  # perl3 compatible, or better.   while ($ARGV[0] =~ /^-(.+)/ && (shift, ($_ = $1), 1)) {     next if /^$/;     s/i// && (++$ignore_ints, redo);     s/a// && (++$append,      redo);     s/u// && (++$unbuffer,    redo);     s/n// && (++$nostdout,    redo);     die "usage tee [-aiun] [filenames] ...\n";   if ($ignore_ints) {     for $sig ('INT', 'TERM', 'HUP', 'QUIT') { $SIG{$sig} = 'IGNORE'; }   $SIG{'PIPE'} = 'PLUMBER'; $mode = $append ? '>>' : '>'; $fh = 'FH000';   unless ($nostdout) {     %fh = ('STDOUT', 'standard output'); # always go to stdout     }   $| = 1 if $unbuffer;   for (@ARGV) {     if (!open($fh, (/^[^>|]/ && $mode) . $_)) {         warn "$0: cannot open $_: $!\n"; # like sun's; i prefer die         $status++;         next;     }     select((select($fh), $| = 1)[0]) if $unbuffer;     $fh{$fh++} = $_;   while (<STDIN>) {     for $fh (keys %fh) {         print $fh $_;     }   for $fh (keys %fh) {     next if close($fh) || !defined $fh{$fh};     warn "$0: couldnt close $fh{$fh}: $!\n";     $status++;   exit $status;   sub PLUMBER {     warn "$0: pipe to \"$fh{$fh}\" broke!\n";     $status++;     delete $fh{$fh};

Program: laston

When you log in to a Unix system, it tells you when you last logged in. That information is stored in a binary file called lastlog. Each user has their own record; UID 8 is at record 8, UID 239 at record 239, and so on. To find out when a given user last logged in, convert their login name to a number, seek to their record in that file, read, and unpack. Doing so with shell tools is very hard, but it's very easy with the laston program. Here's an example:

 % laston gnat 
gnat  UID 314 at Mon May 25 08:32:52 1998 on ttyp0 from

The program in Example 8-9 is much newer than the tctee program in Example 8-8, but it's less portable. It uses the Linux binary layout of the lastlog file. You'll have to change this for other systems.

Example 8-9. laston

 #!/usr/bin/perl # laston - find out when given user last logged on use User::pwent; use IO::Seekable qw(SEEK_SET);   open (LASTLOG, "/var/log/lastlog") or die "can't open /usr/adm/lastlog: $!";   $typedef = 'L A12 A16';  # linux fmt; sunos is "L A8 A16" $sizeof  = length(pack($typedef, ()));   for $user (@ARGV) {
Read More Show Less

Customer Reviews

Average Rating 5
( 1 )
Rating Distribution

5 Star


4 Star


3 Star


2 Star


1 Star


Your Rating:

Your Name: Create a Pen Name or

Barnes & Review Rules

Our reader reviews allow you to share your comments on titles you liked, or didn't, with others. By submitting an online review, you are representing to Barnes & that all information contained in your review is original and accurate in all respects, and that the submission of such content by you and the posting of such content by Barnes & does not and will not violate the rights of any third party. Please follow the rules below to help ensure that your review can be posted.

Reviews by Our Customers Under the Age of 13

We highly value and respect everyone's opinion concerning the titles we offer. However, we cannot allow persons under the age of 13 to have accounts at or to post customer reviews. Please see our Terms of Use for more details.

What to exclude from your review:

Please do not write about reviews, commentary, or information posted on the product page. If you see any errors in the information on the product page, please send us an email.

Reviews should not contain any of the following:

  • - HTML tags, profanity, obscenities, vulgarities, or comments that defame anyone
  • - Time-sensitive information such as tour dates, signings, lectures, etc.
  • - Single-word reviews. Other people will read your review to discover why you liked or didn't like the title. Be descriptive.
  • - Comments focusing on the author or that may ruin the ending for others
  • - Phone numbers, addresses, URLs
  • - Pricing and availability information or alternative ordering information
  • - Advertisements or commercial solicitation


  • - By submitting a review, you grant to Barnes & and its sublicensees the royalty-free, perpetual, irrevocable right and license to use the review in accordance with the Barnes & Terms of Use.
  • - Barnes & reserves the right not to post any review -- particularly those that do not follow the terms and conditions of these Rules. Barnes & also reserves the right to remove any review at any time without notice.
  • - See Terms of Use for other conditions and disclaimers.
Search for Products You'd Like to Recommend

Recommend other products that relate to your review. Just search for them below and share!

Create a Pen Name

Your Pen Name is your unique identity on It will appear on the reviews you write and other website activities. Your Pen Name cannot be edited, changed or deleted once submitted.

Your Pen Name can be any combination of alphanumeric characters (plus - and _), and must be at least two characters long.

Continue Anonymously
Sort by: Showing all of 5 Customer Reviews
  • Posted October 10, 2009

    I Also Recommend:

    Use It Every Day

    I just started a new job where perl is a main language and I needed something to boost my perl quickly. I use this book every day, many times each day. A lot more general than most of the other cookbooks I have used. It has really helped with my understanding of perl much more than the camel book. Maybe it is because I respond to concrete examples better.

    Was this review helpful? Yes  No   Report this review
  • Posted February 5, 2009

    Great Cookbook

    With twenty-two chapters spanning a variety of topics, the Perl Cookbook is enough to satiate any Perl programmer. Although not my top recommendation for a beginner, the book does an excellent job of teaching the basics for those who need it. For a more seasoned programmer this book does not disappoint; the topics covered in later chapters are sure to pique your interest. More importantly, however, it ensures that whatever your task may be, it is done with equal weight placed on both speed and practicality.<BR/><BR/>The book begins by discussing the various ways to manipulate strings, numbers, arrays and hashes. It conveys various examples of many common tasks that serve as a strong base for future Perl programs. The book then shifts attention to the all-powerful regular expressions, providing numerous commonly used (and often forgotten) examples, not to mention the all-encompassing Regular Expression Grab Bag, a three-page spread of expressions that often sit on the tip of your tongue. From here the book begins to pick up pace and diagram proper programming etiquette and design for subroutines, packages, libraries, modules, classes and objects.<BR/><BR/>My favorite part of the book comes near the end in the form of sockets and client/server communication. The book quickly gave me the answers I needed on how to establish a proper client and server as well as several options and explanations along the way. Instead of simply telling me what I needed to do, the book gave me several options and explained the pros and cons of each (a common theme I enjoyed throughout the book).<BR/><BR/>In short, this book allowed me to learn the ins and outs of Perl at my own pace, making this a wonderful cookbook for any Perl chef.

    Was this review helpful? Yes  No   Report this review
  • Anonymous

    Posted April 4, 2000

    Thanks a lot!!

    I thought I had a great idea for a CGI script, but when I opened the Perl Cookbook, I found bits of code from about four different recipes that described my previously innovative script idea exactly. So thanks a lot guys!! Seriously, I learn by solving problems and this book not only taught me how to program concisely and efficiently, but also gave me solutions to many real world problems. Thanks, guys!

    Was this review helpful? Yes  No   Report this review
  • Anonymous

    Posted June 26, 2010

    No text was provided for this review.

  • Anonymous

    Posted November 9, 2009

    No text was provided for this review.

Sort by: Showing all of 5 Customer Reviews

If you find inappropriate content, please report it to Barnes & Noble
Why is this product inappropriate?
Comments (optional)