Writing Apache Modules with PERL and C by Doug MacEachern, Lincoln Stein | Paperback | Barnes & Noble

by Doug MacEachern, Lincoln Stein

Apache is the most popular web server on the Internet because it is free, reliable, and extensible. The availability of the source code and the modular design of Apache make it possible to extend web server functionality through the Apache API. For the most part, however, the Apache API has only been available to C programmers, and requires rebuilding the Apache server from source. mod_perl, the popular Apache module used primarily for enhanced CGI performance, changed all that by making the Apache API available to Perl programmers. With mod_perl, it becomes simple to develop Apache modules with Perl and install them without having to rebuild the web server. Writing Apache Modules with Perl and C shows how to extend web server capabilities regardless of whether the programming language is Perl or C. The book explains the design of Apache, mod_perl, and the Apache API. It then demonstrates how to use them to perform tasks like the following:

  • Rewriting CGI scripts as Apache modules to vastly improve performance
  • Server-side filtering of HTML documents, to embed special markup or code (much like SSI)
  • Enhancing server log functionality
  • Converting file formats on the fly
  • Implementing dynamic navigation bars
  • Incorporating database access into CGI scripts
  • Customizing access control and authorization to block robots or to use an external database for passwords
The authors are Lincoln Stein and Doug MacEachern. Lincoln is the successful author of How to Set Up and Maintain a World Wide Web Site and the developer of the widely used Perl CGI.pm module. Doug is a consultant and the creator of the innovative mod_perl Apache module.

Product Details

O'Reilly Media, Incorporated
Product dimensions: 7.00(w) x 9.22(h) x 1.22(d)

Read an Excerpt

Chapter 7: Other Request Phases

The previous chapters have taken you on a wide-ranging tour of the most popular and useful areas of the Apache API. But we're not done yet! The Apache API allows you to customize URI translation, logging, the handling of proxy transactions, and the manner in which HTTP headers are parsed. There's even a way to incorporate snippets of Perl code directly into HTML pages that use server-side includes.

We've already shown you how to customize the response, authentication, authorization, and access control phases of the Apache request cycle. Now we'll fill in the cracks. At the end of the chapter, we show you the Perl server-side include system, and demonstrate a technique for extending the Apache Perl API by subclassing the Apache request object itself.

The Child Initialization and Exit Phases

Apache provides hooks into child process initialization and exit handling. The child process initialization handler, installed with PerlChildInitHandler, is called just after the main server forks off a child but before the child has processed any incoming requests. The child exit handler, installed with PerlChildExitHandler, is called just before the child process is destroyed.

You might need to install handlers for these phases in order to perform some sort of module initialization that won't survive a fork. For example, the Apache::DBI module has a child init handler that initializes a cache of per-child database connections, and the Apache::Resource module steps in during this phase to set up resource limits on the child processes. The latter is configured in this way:

PerlChildInitHandler Apache::Resource

As with other handlers, you can install a child init handler programmatically using Apache::push_handlers(). However, because the child init phase comes so early, the only practical place to do this is from within the parent process, in a Perl startup file configured with a PerlModule or PerlRequire directive. For example, here's how to install an anonymous subroutine that will execute during child initialization to choose a truly random seed value for Perl's random number generator:

use Math::TrulyRandom ();
Apache->push_handlers(PerlChildInitHandler => sub {
    srand Math::TrulyRandom::truly_random_value();
});

Install this piece of code in the Perl startup file. Because it reseeds the random number generator separately in each child, each child process produces a different sequence of random numbers when the built-in rand() function is called.
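To see why per-child reseeding matters, here is a small C model of the situation rather than mod_perl code. After fork(), every child starts with a copy of the parent's generator state, so without a child init hook they all draw the same numbers. The sketch assumes a toy linear congruential generator in place of Perl's rand() and uses hypothetical simulated process IDs as the per-child seeds; the handler above uses Math::TrulyRandom instead.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit linear congruential generator standing in for rand().
 * The multiplier is odd, so this update is a one-to-one mapping on
 * 32-bit states: different seeds always give different first values. */
static uint32_t toy_rand(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state;
}

/* Simulates n children forked from one parent. Each child starts with a
 * copy of the parent's generator state, as it would after fork(). If
 * reseed_per_child is nonzero, each child first reseeds from its own
 * (hypothetical) process ID, playing the role of a child init handler.
 * The first value each child draws is stored in out[i]. */
static void first_draws(uint32_t parent_state, const uint32_t *child_pids,
                        int n, int reseed_per_child, uint32_t *out)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        uint32_t state = parent_state;      /* inherited across fork() */
        if (reseed_per_child)
            state = child_pids[i];          /* per-child reseed */
        out[i] = toy_rand(&state);
    }
}
```

Without reseeding, every simulated child reports the same first value; with distinct per-child seeds, the values are guaranteed to differ because the toy update step is one-to-one.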

The child exit phase complements the child initialization phase. Child processes may exit for various reasons: the MaxRequestsPerChild limit may have been reached, the parent server was shut down, or a fatal error occurred. This phase gives modules a chance to tidy up after themselves before the process exits.

The most straightforward way to install a child exit handler is with the explicit PerlChildExitHandler directive, as in:

PerlChildExitHandler Apache::Guillotine

During the child exit phase, mod_perl invokes the Perl API function perl_destruct() to run the contents of END blocks and to invoke the DESTROY method for any global objects that have not already gone out of scope. Refer to the section "Special Global Variables, Subroutines and Literals" in Chapter 9 for details.
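Perl runs END blocks in reverse order of definition, and C's atexit() handlers likewise run in reverse order of registration. The following C sketch models a child's exit hook list with that last-in, first-out behavior. The structures and function names are hypothetical and are not part of the Apache or mod_perl APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXIT_HOOKS 16

typedef void (*exit_hook)(void *data);

/* A child's list of cleanup hooks, in registration order. */
struct exit_hooks {
    exit_hook fn[MAX_EXIT_HOOKS];
    void *data[MAX_EXIT_HOOKS];
    size_t count;
};

/* Registers a cleanup to run when the (simulated) child exits. */
static void exit_hooks_add(struct exit_hooks *h, exit_hook fn, void *data)
{
    assert(h->count < MAX_EXIT_HOOKS);
    h->fn[h->count] = fn;
    h->data[h->count] = data;
    h->count++;
}

/* Runs the cleanups last-in, first-out and leaves the list empty. */
static void exit_hooks_run(struct exit_hooks *h)
{
    while (h->count > 0) {
        h->count--;
        h->fn[h->count](h->data[h->count]);
    }
}

/* Example hook: appends a one-character tag to a shared log so that the
 * order in which hooks ran can be inspected afterwards. */
struct tag_log { char buf[MAX_EXIT_HOOKS + 1]; size_t len; };
struct tag_arg { struct tag_log *log; char tag; };

static void log_tag(void *data)
{
    struct tag_arg *a = data;
    assert(a->log->len < MAX_EXIT_HOOKS);
    a->log->buf[a->log->len++] = a->tag;
    a->log->buf[a->log->len] = '\0';
}
```

Registering hooks tagged A, B, and C and then running the list logs them as C, B, A, so resources acquired last are released first.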

Note: neither child initialization nor exit hooks are available on Win32 platforms, because the Win32 port of Apache uses a single process.

The Post Read Request Phase

When a listening server receives an incoming request, it reads the HTTP request line and parses any HTTP headers sent along with it. Provided that what's been read is valid HTTP, Apache gives modules an early chance to step in during the post_read_request phase, known to the Perl API world as the PerlPostReadRequestHandler. This is the very first callback that Apache makes when serving an HTTP request, and it happens even before URI translation turns the requested URI into a physical pathname.

The post_read_request phase is a handy place to initialize per-request data that will be available to other handlers in the request chain. Because of its usefulness as an initialization routine, mod_perl provides the directive PerlInitHandler as a more readable alias for PerlPostReadRequestHandler.
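The sketch below models, in C, why an early phase suits initialization: whatever the post_read_request handler stores in the request record is already there when later phases run. The struct request type, the handler names, and the two-phase pipeline are simplifications invented for illustration; Apache's own request record is request_rec, and the real phase list is much longer.

```c
#include <assert.h>
#include <stddef.h>

/* A simplified request record; in Apache this role is played by request_rec. */
struct request {
    const char *uri;
    long request_id;        /* per-request data set during post_read_request */
    long id_seen_by_later;  /* what a later phase observed */
};

typedef void (*phase_handler)(struct request *r);

static long requests_served = 0;   /* per-process counter */

/* post_read_request: runs first, so it can set up per-request data. */
static void init_handler(struct request *r)
{
    r->request_id = ++requests_served;
}

/* URI translation: runs later and can rely on what the init handler set. */
static void translate_handler(struct request *r)
{
    assert(r->request_id != 0);
    r->id_seen_by_later = r->request_id;
}

/* Runs the phases in the order the text gives: post_read_request before
 * URI translation. All other phases are omitted from this sketch. */
static void run_request(struct request *r)
{
    phase_handler phases[] = { init_handler, translate_handler };
    for (size_t i = 0; i < sizeof phases / sizeof phases[0]; i++)
        phases[i](r);
}
```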

Since the post_read_request phase happens before URI translation, PerlPostReadRequestHandler cannot appear in <Location>, <Directory>, or <Files> sections. However, the PerlInitHandler directive is a bit special. When it appears outside a directory section, it acts as an alias for PerlPostReadRequestHandler, as just described. When it appears within a directory section, however, it acts as an alias for PerlHeaderParserHandler (discussed later in this chapter), allowing for per-directory initialization. In other words, wherever you put PerlInitHandler, it will act the way you expect.
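The context-dependent behavior of PerlInitHandler can be summarized as a small lookup. This C sketch only models the rules stated above, using a hypothetical resolve_phase() function; it is not how mod_perl actually parses its configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the request phase a handler directive attaches to, given whether
 * it appears inside a <Directory>, <Location>, or <Files> section.
 * Returns NULL if the directive is not allowed in that context or is
 * unknown to this sketch. */
static const char *resolve_phase(const char *directive, int in_dir_section)
{
    assert(directive != NULL);
    if (strcmp(directive, "PerlPostReadRequestHandler") == 0)
        return in_dir_section ? NULL : "post_read_request";
    if (strcmp(directive, "PerlInitHandler") == 0)
        return in_dir_section ? "header_parser" : "post_read_request";
    if (strcmp(directive, "PerlHeaderParserHandler") == 0)
        return "header_parser";   /* treated as valid in both contexts here */
    return NULL;
}
```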

Several optional Apache modules install handlers for the post_read_request phase. For example, the mod_unique_id module steps in here to create the UNIQUE_ID environment variable. When the module is activated, this variable is unique to each request over an extended period of time, and so is useful for logging and the generation of session IDs (see Chapter 5). Perl scripts can get at the value of this variable by reading $ENV{UNIQUE_ID}, or by calling $r->subprocess_env('UNIQUE_ID').

mod_setenvif also steps in during this phase to allow you to set environment variables based on the incoming client headers. For example, this directive will set the environment variable LOCAL_REFERRAL to true if the Referer header matches a certain regular expression:

SetEnvIf Referer \.acme\.com LOCAL_REFERRAL
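Conceptually, SetEnvIf applies a regular expression to the named request header and sets the variable when the pattern matches. The C sketch below assumes POSIX extended regular expressions (regcomp() and regexec()) as a stand-in for the regex engine Apache actually uses, and the function name setenvif_matches() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if header_value matches the extended regular expression
 * pattern (so the variable would be set), 0 if it does not, and -1 if
 * the pattern fails to compile. */
static int setenvif_matches(const char *header_value, const char *pattern)
{
    regex_t re;
    int rc;

    assert(header_value != NULL && pattern != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, header_value, 0, NULL, 0);   /* unanchored search */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The match is unanchored: http://www.acme.com/products.html matches the pattern \.acme\.com, but http://acme.com/ does not, because the character before "acme" is a slash rather than a literal dot.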

mod_perl itself uses the post_read_request phase to process the PerlPassEnv and PerlSetEnv directives, allowing environment variables to be passed to modules that execute early in the request cycle. The built-in Apache equivalents, PassEnv and SetEnv, don't get processed until the fixup phase, which may be too late. The Apache::StatINC module, which watches .pm files for changes and reloads them if necessary, is also usually installed into this phase:

PerlPostReadRequestHandler Apache::StatINC
# same thing, but easier to type:
PerlInitHandler Apache::StatINC
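At its core, a reloader like Apache::StatINC needs a modification-time check. The following C sketch assumes that model: it remembers the last mtime seen for a file and reports a change when stat() returns a different one. The function names are hypothetical, and the reload step itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Compares a file's current mtime with the one recorded last time.
 * Returns 1 if it changed (and records the new value), 0 otherwise.
 * A recorded value of 0 means "never seen": the first sighting is
 * recorded and not treated as a change. */
static int mtime_changed(time_t current, time_t *last_mtime)
{
    assert(last_mtime != NULL);
    if (*last_mtime == 0) {
        *last_mtime = current;
        return 0;
    }
    if (current != *last_mtime) {
        *last_mtime = current;
        return 1;        /* the caller would reload the module here */
    }
    return 0;
}

/* Applies the check to a real file. Returns -1 if stat() fails. */
static int module_changed(const char *path, time_t *last_mtime)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return mtime_changed(st.st_mtime, last_mtime);
}
```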

The URI Translation Phase

One of the Web's virtues is its Uniform Resource Identifier (URI) and Uniform Resource Locator (URL) standards. End users never know for sure what is sitting behind a URI. It could be a static file, a dynamic script, a proxied request, or something even more esoteric. The file or program behind a URI may change over time, but this too is transparent to the end user.

Much of Apache's power and flexibility comes from its highly configurable URI translation phase, which comes relatively early in the request cycle, after the post_read_request and before the header_parser phases. During this phase, the URI requested by the remote browser is translated into a physical filename, which may in turn be returned directly to the browser as a static document, or passed on to a CGI script or Apache API module for processing. During URI translation, each module that has declared its interest in handling this phase is given a chance to modify the URI. The first module to handle the phase (i.e., to return something other than a status of DECLINED) terminates the phase. This prevents several URI translators from interfering with one another by trying to map the same URI onto several different file paths.
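The first-non-DECLINED rule can be sketched as a loop over a list of translation callbacks. The OK and DECLINED values mirror those in Apache's httpd.h (0 and -1), but the request structure, the example translators, and run_translate_name() are simplified stand-ins rather than Apache's real hook machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return codes modeled on Apache's httpd.h. */
#define OK        0
#define DECLINED -1

struct request {
    const char *uri;
    const char *handled_by;   /* which translator claimed the URI */
};

typedef int (*translate_fn)(struct request *r);

/* Offers the request to each translator in turn. The first one that
 * returns anything other than DECLINED ends the phase, and its status
 * is returned; if all of them decline, DECLINED is returned. */
static int run_translate_name(struct request *r, const translate_fn *handlers,
                              size_t n)
{
    assert(r != NULL && handlers != NULL);
    for (size_t i = 0; i < n; i++) {
        int status = handlers[i](r);
        if (status != DECLINED)
            return status;
    }
    return DECLINED;
}

/* Example translator that claims only URIs under /icons/. */
static int icons_translator(struct request *r)
{
    if (strncmp(r->uri, "/icons/", 7) != 0)
        return DECLINED;
    r->handled_by = "icons";
    return OK;
}

/* Example catch-all translator that accepts every URI. */
static int fallback_translator(struct request *r)
{
    r->handled_by = "fallback";
    return OK;
}
```

Because the loop stops at the first translator that does not decline, the order of registration decides which module wins when two of them could map the same URI.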

By default, two URI translation handlers are installed in stock Apache distributions. The mod_alias module looks for the existence of several directives that may apply to the current URI. These include Alias, ScriptAlias, Redirect, AliasMatch, and other directives. If it finds one, it uses the directive's value to map the URI to a file or directory somewhere on the server's physical file system. Otherwise, the request falls through to the http_core module (where the default response handler is also found). http_core simply appends the URI to the value of the DocumentRoot configuration directive, forming a file path relative to the document root.
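Here is a C sketch of the two default translation strategies just described: a prefix substitution in the spirit of Alias, and http_core's DocumentRoot-plus-URI fallback. The function names and example paths are hypothetical, and mod_alias's real matching rules are stricter than the plain prefix comparison used here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OK                          0
#define DECLINED                   -1
#define HTTP_INTERNAL_SERVER_ERROR 500

/* Alias-style translation: if uri starts with prefix, replace the prefix
 * with dir. Uses a plain prefix comparison only. */
static int alias_translate(const char *uri, const char *prefix,
                           const char *dir, char *out, size_t outlen)
{
    size_t plen = strlen(prefix);
    int n;

    assert(uri && prefix && dir && out);
    if (strncmp(uri, prefix, plen) != 0)
        return DECLINED;            /* let the next translator try */
    n = snprintf(out, outlen, "%s%s", dir, uri + plen);
    return (n >= 0 && (size_t)n < outlen) ? OK : HTTP_INTERNAL_SERVER_ERROR;
}

/* http_core-style fallback: append the URI to DocumentRoot. */
static int core_translate(const char *uri, const char *docroot,
                          char *out, size_t outlen)
{
    int n;

    assert(uri && docroot && out);
    n = snprintf(out, outlen, "%s%s", docroot, uri);
    return (n >= 0 && (size_t)n < outlen) ? OK : HTTP_INTERNAL_SERVER_ERROR;
}
```

Running alias_translate() first and falling back to core_translate() only when it returns DECLINED reproduces the order described above.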

The optional mod_rewrite module implements a much more comprehensive URI translator that allows you to slice and dice URIs in various interesting ways. It is extremely powerful, but uses a series of pattern matching conditions and substitution rules that can be difficult to get right.

Once a translation handler has done its work, Apache walks along the returned filename path in the manner described in Chapter 4, finding where the path part of the URI ends and the additional path information begins. This phase of processing is performed internally and cannot be modified by the module API.
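Although this step is internal to Apache, the idea is easy to model. The C sketch below assumes a caller-supplied existence check (in a server this would be a stat() call) and drops trailing components from the translated filename until the remaining prefix exists; whatever was dropped becomes the additional path information. The names and the pretend file system are hypothetical, and this is not Apache's implementation.

```c
#include <assert.h>
#include <string.h>

/* Does this path name an existing file or directory? */
typedef int (*exists_fn)(const char *path);

/* Walks back along filename, dropping trailing components until the
 * remaining prefix exists. Returns the offset where the additional path
 * information begins: strlen(filename) if the whole path exists, 0 if
 * no prefix exists. */
static size_t path_info_offset(const char *filename, exists_fn exists)
{
    char buf[1024];
    size_t len = strlen(filename);

    assert(len < sizeof buf);
    memcpy(buf, filename, len + 1);
    while (len > 0) {
        if (exists(buf))
            return len;
        while (len > 0 && buf[len - 1] != '/')   /* chop last component */
            len--;
        if (len > 0)
            len--;                               /* and its leading slash */
        buf[len] = '\0';
    }
    return 0;
}

/* Example predicate for a pretend file system. */
static int demo_exists(const char *path)
{
    static const char *const known[] = {
        "/htdocs", "/htdocs/cgi", "/htdocs/cgi/script.cgi"
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(path, known[i]) == 0)
            return 1;
    return 0;
}
```

For /htdocs/cgi/script.cgi/extra/info, the walk stops at /htdocs/cgi/script.cgi, leaving /extra/info as the additional path information.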

In addition to their intended role in transforming URIs, translation handlers are sometimes used to associate certain types of URIs with specific upstream handlers. We'll see examples of this later in this chapter when we discuss creating custom proxy services.

A Very Simple Translation Handler

Let's look at an example. Many of the documents browsed on a web site are files that are located under the configured DocumentRoot. That is, the requested URI is a filename relative to a directory on the hard disk. Just so you can see how simple a translation handler's job can be, we present a Perl version of Apache's default translation handler found in the http_core module....

Meet the Author

Doug MacEachern has been addicted to Perl and web servers since early 1994 when he was introduced to Plexus as a student employee at the University of Arizona. Soon after returning to his home town of Boston, Massachusetts, and entering the "real world," he discovered the Apache web server, and since early 1996, he has been gluing Perl into all its nooks and crannies. His day job has consisted of integrating various other technologies with the Web, including DCE, Kerberos, and GSSAPI, but Perl has been the only one he cannot let go of. Doug has continued as a developer disguised as a consultant since the start of 1998, spending most of his time between Auckland, New Zealand, and San Francisco, California, with time at home in Boston during the warmer months. Doug likes to spend his time away from software—far, far away, sailing on the ocean, diving below it, or simply looking at it from a warm, sandy beach where technology doesn't go much beyond thatched huts and blenders.

Lincoln Stein is an assistant investigator at Cold Spring Harbor Laboratory, where he develops databases and user interfaces for the Human Genome Project using the Apache server and its module API. He is the author of several books about programming for the Web, including The Official Guide to CGI.pm, How to Set Up and Maintain a Web Site, and Web Security: A Step-by-Step Reference Guide.
