Writing Apache Modules with Perl and C: The Apache API and mod_perl
746Writing Apache Modules with Perl and C: The Apache API and mod_perl
746Paperback
-
PICK UP IN STORECheck Availability at Nearby Stores
Available within 2 business hours
Related collections and offers
Overview
- Rewriting CGI scripts as Apache modules to vastly improve performance
- Server-side filtering of HTML documents, to embed special markup or code (much like SSI)
- Enhancing server log functionality
- Converting file formats on the fly
- Implementing dynamic navigation bars
- Incorporating database access into CGI scripts
- Customizing access control and authorization to block robots or to use an external database for passwords
Product Details
ISBN-13: | 9781565925670 |
---|---|
Publisher: | O'Reilly Media, Incorporated |
Publication date: | 04/28/1999 |
Pages: | 746 |
Product dimensions: | 7.00(w) x 9.19(h) x 1.22(d) |
About the Author
Lincoln Stein is an assistant investigator at Cold Spring Harbor Laboratory, where he develops databases and user interfaces for the Human Genome Project using the Apache server and its module API. He is the author of several books about programming for the Web, including The Official Guide to CGI.pm, How to Set Up and Maintain a Web Site, and Web Security: A Step-by-Step Reference Guide.
Read an Excerpt
Chapter 7: Other Request Phases
The previous chapters have taken you on a wide-ranging tour of the most popular and useful areas of the Apache API. But we're not done yet! The Apache API allows you to customize URI translation, logging, the handling of proxy transactions, and the manner in which HTTP headers are parsed. There's even a way to incorporate snippets of Perl code directly into HTML pages that use server-side includes.We've already shown you how to customize the response, authentication, authorization, and access control phases of the Apache request cycle. Now we'll fill in the cracks. At the end of the chapter, we show you the Perl server-side include system, and demonstrate a technique for extending the Apache Perl API by subclassing the Apache request object itself.
The Child Initialization and Exit Phases
Apache provides hooks into the child process initialization and exit handling. The child process initialization handler, installed with PerlChildInitHandler is called just after the main server forks off a child but before the child has processed any incoming requests. The child exit handler, installed with PerlChildExitHandler, is called just before the child process is destroyed.
You might need to install handlers for these phases in order to perform some sort of module initialization that won't survive a fork. For example, the Apache::DBI module has a child init handler that initializes a cache of per-child database connections, and the Apache::Resource module steps in during this phase to set up resource limits on the child processes. The latter is configured in this way:
PerlChildInitHandler Apache::Resource
Like other handlers, you can install a child init handler programatically using Apache::push_handlers(). However, because the child init phase comes so early, the only practical place to do this is from within the parent process, in a Perl startup file configured with a PerlModule or PerlRequire directive. For example, here's how to install an anonymous subroutine that will execute during child initialization to choose a truly random seed value for Perl's random number generator:
use Math::TrulyRandom (); Apache->push_handlers(PerlChildInitHandler => sub { });
Install this piece of code in the Perl startup file. By changing the value of the random number seed on a per-child basis, it ensures that each child process produces a different sequence of random numbers when the built in rand() function is called.
The child exit phase complements the child intialization phase. Child processes may exit for various reasons: the MaxRequestsPerChild limit may have been reached, the parent server was shutdown, or a fatal error occurred. This phase gives modules a chance to tidy up after themselves before the process exits.
The most straightforward way to install a child exit handler is with the explicit PerlChildExitHandler directive, as in:
PerlChildExitHandler Apache::Guillotine
During the child exit phase, mod_perl invokes the Perl API function, perl_destruct()* to run the contents of END blocks and to invoke the DESTROY method for any global objects that have not gone out of scope already. Refer to the Chapter 9 section Special Global Variables, Subroutines and Literals for details.
Note: neither child initialization nor exit hooks are available on Win32 platforms for the reason that the Win32 port of Apache uses a single process.
The Post Read Request Phase
When a listening server receives an incoming request, it reads the HTTP request line and parses any HTTP headers sent along with it. Provided that what's been read is valid HTTP, Apache gives modules an early chance to step in during the post_read_request phase, known to the Perl API world as the rlPostReadRequestHandler. This is the very first callback that Apache makes when serving an HTTP request, and it happens even before URI translation turns the requested URI into a physical pathname.
The post_read_request phase is a handy place to initialize per-request data that will be available to other handlers in the request chain. Because of its usefulness as an initialize routine, mod_perl provides the directive PerlInitHandler as a more readable alias to PerlPostReadRequestHandler.
Since the post_read_request phase happens before URI translation, PerlPostReadRequestHandler cannot appear in Several optional Apache modules install handlers for the post_read_request phase. For example, the mod_unique_id module steps in here to create the UNIQUE_ID environment variable. When the module is activated, this variable is unique to each request over an extended period of time, and so is useful for logging and the generation of session IDs (see Chapter 5). Perl scripts can get at the value of this variable by reading $ENV{UNIQUE_ID}, or by calling $r->subprocess_env('UNIQUE_ID'). mod_setenvif also steps in during this phase to allow you to set enviroment variables based on the incoming client headers. For example, this directive will set the environment variable LOCAL_REFERRAL to true if the Referer header matches a certain regular expression: SetEnvIf Referer \.acme\.com LOCAL_REFERRAL mod_perl itself uses the post_read_request phase to process the PerlPassEnv and PerlSetEnv directives, allowing environment variables to be passed to modules that execute early in the request cycle. The built-in Apache equivalents, PassEnv and SetEnv don't get processed until the fixup phase, which may be too late. The Apache::StatINC module, which watches .pm files for changes and reloads them if necessary, is also usually installed into this phase: PerlPostReadRequestHandler Apache::StatINC PerlInitHandler Apache::StatINC # same thing, but easier to type The URI Translation Phase One of the Web's virtues is its Uniform Resource Identifier (URI) and Uniform Resource Locator (URL) standards.* End users never know for sure what is sitting behind a URI. It could be a static file, a dynamic script, a proxied request, or something even more esoteric. The file or program behind a URI may change over time, but this too is transparent to the end user. Much of Apache's power and flexibility comes from its highly configurable URI translation phase, which comes relatively early in the request cycle, after the post_read_request and before the header_parser phases. During this phase, the URI requested by the remote browser is translated into a physical filename, which may in turn be returned directly to the browser as a static document, or passed on to a CGI script or Apache API module for processing. During URI translation, each module that has declared its interest in handling this phase is given a chance to modify the URI. The first module to handle the phase (i.e. return something other than a status of DECLINED) terminates the phase. This prevents several URI translators from interfering with one another by trying to map the same URI onto several different file paths. By default, two URI translation handlers are installed in stock Apache distributions. The mod_alias module looks for the existence of several directives that may apply to the current URI. These include Alias, ScriptAlias, Redirect, AliasMatch, and other directives. If it finds one, it uses the directive's value to map the URI to a file or directory somewhere on the server's physical file system. Otherwise, the request falls through to the http_core module (where the default response handler is also found). http_core simply appends the URI to the value of the DocumentRoot configuration directive, forming a file path relative to the document root. The optional mod_rewrite module implements a much more comprehensive URI translator that allows you to slice and dice URIs in various interesting ways. It is extremely powerful, but uses a series of pattern matching conditions and substitution rules that can be difficult to get right. Once a translation handler has done its work, Apache walks along the returned filename path in the manner described in Chapter 4, finding where the path part of the URI ends and the additional path information begins. This phase of processing is performed internally and cannot be modified by the module API. In addition to their intended role in transforming URIs, translation handlers are sometimes used to associate certain types of URIs with specific upstream handlers. We'll see examples of this later in this chapter when we discuss creating custom proxy services. A Very Simple Translation Handler Let's look at an example. Many of the documents browsed on a web site are files that are located under the configured DocumentRoot. That is, the requested URI is a filename relative to a directory on the hard disk. Just so you can see how simple a translation handler's job can be, we present a Perl version of Apache's default translation handler found in the http_core module....