Essential Apache for Web Professionals

Paperback

Product Details

ISBN-13: 9780130649300
Publisher: Pearson Education
Publication date: 12/10/2001
Series: Essentials for Web Professionals Series
Pages: 242
Product dimensions: 6.00(w) x 9.00(h) x 0.71(d)

Table of Contents

Introduction
Chapter 1: Installation
Introduction
Operating System
Source Code versus Binary Distributions
Obtaining Apache
Unpacking the Distributions
Unpacking Unix Distributions
Windows
Compiling Apache
Manual Compilation
Background Information: make
The APACI Method
Changing the Default Configuration
Installing Apache on Unix
Higher Security Installation
Dynamic Shared Objects
Preparing Apache for DSOs
Compiling Shared Object Modules with configure
Compiling Shared Object Modules with apxs
Using Shared Objects
Binary Distributions
Windows Installation
Basic Installation
Installing as Windows Service
Commercial Distributions
Commercial Apache via Linux Distribution
Commercial Apache
Recap
Chapter 2: Basic Apache
Introduction
Directives
Global Behavior Directives
ServerType
ServerRoot
PidFile
ScoreBoardFile
Timeout
KeepAlive
MaxKeepAliveRequests
KeepAliveTimeout
MaxSpareServers
MinSpareServers
StartServers
MaxClients
MaxRequestsPerChild
Listen
BindAddress
LoadModule
Limiting Scope with Container Directives
Limiting Scope to a Directory via <Directory> and <DirectoryMatch>
Limiting Scope to a Directory via .htaccess Files
Limiting Scope to a URL with <Location> and <LocationMatch>
Limiting Scope to a Virtual Host
Limiting Scope by <File> and <FileMatch>
Main Server Configuration Directives
Port
User and Group
ServerAdmin
ServerName
DocumentRoot
Options
AllowOverride
AccessFileName
Order, Allow, Deny
<IfModule>
UserDir
DirectoryIndex
CacheNegotiatedDocs
ClearModuleList
AddModule
MIME Types
AddType
AddHandler
TypesConfig
DefaultType
MIMEMagicFile
Logging
HostnameLookups
ErrorLog
LogLevel
TransferLog
LogFormat Variables
Resetting Logs
Custom Output and Indexing
BrowserMatch
IndexOptions
AddIcon, AddIconByType, AddIconByEncoding
DefaultIcon
AddDescription
ReadmeName
HeaderName
IndexIgnore
Windows-Specific Configuration
Differences from Unix
MaxRequestsPerChild
ThreadsPerChild
Starting, Restarting, and Stopping
Notes for Win32 Users
Starting Apache
apachectl
Starting Apache (Windows)
Restarting Apache (Unix)
Signals
Restarting with apachectl
Restarting Apache (Windows)
Stopping Apache (Unix)
Stopping Apache (Windows)
Troubleshooting
fcntl: F_SETLKW: No record locks available
Cannot determine host name. Use ServerName directive to set it manually
setgid: Invalid argument
Linux Problems
Windows Problems
Error 1067
Recap
Chapter 3: Hosting Multiple Sites
Introduction
Prerequisites
Ports
Port Directives
IP Addresses
Virtual Hosting by Name
NameVirtualHost
<VirtualHost>
Default Virtual Host
Configuration Tip
Virtual Hosting by IP
Combining Name- and IP-Based Virtual Hosts
Suggestions for Virtual Host Configuration
User Home Pages
UserDir some_subdirectory
UserDir /an/absolute/path
UserDir /absolute/path/*/with/wildcard
Recap
Chapter 4: Dynamic Content
Introduction
Server Side Includes
Enabling SSI
XBitHack
SSI Keywords
config
echo
exec
fsize
flastmod
if and elif
include
printenv
set
CGI
Enabling CGI by Location
Enabling CGI by File Type
Debugging CGI
CGI Environment Variables
Controlling Resource Usage
FastCGI
Obtaining FastCGI
FastCgiIpcDir
mod_perl
Installing mod_perl
Compiling mod_perl into httpd
Creating a mod_perl DSO
Using mod_perl
Apache::ASP
mod_python
Installing
Configuring Apache
PythonDebug
PHP
Installing PHP
Configuring Apache for PHP
Recap
Chapter 5: Advanced Topics
Introduction
Performance Tuning
mod_status
Excessive Name Resolution
Excessive Logging
Generating Detailed Process Information
The vmstat Unix Utility
Active Servers
Trimming httpd
.htaccess Files
Logging
Enable KeepAlives
Web Databases
MySQL
Server Configuration
Database Access via Basic CGI
Database Access via Embedded CGI Script Interpreters
Database Access via Commercial Product
Load Balancing
Round Robin DNS
mod_rewrite
Module Creation
mod_perl
The Apache API
Creating Handlers
The Request Object
A Basic Module
Invoking the Basic Module
Perl API Configuration Directives
Handler Directives
Performance Considerations
Recap
Appendix A: Directive Listing
Appendix B: HTTP Status Codes
Index

Preface

Introduction

This book is a discussion of the installation, configuration, and maintenance of the Apache Web server. At this writing, Apache is the most popular Web server in the world. Apache is open-source software; among other things, that means it is available for download at no cost.

The source code also is included with most distributions. If you choose, you can modify Apache to suit your needs. This feature has led to a rich variety of third-party add-ons. Many talented programmers have chosen to make their work available to the general public.

Apache is high-quality software. It is rare to encounter an error in the source code itself. If you do encounter a problem, technical support is available from a variety of outlets on the Internet and in bookstores. Some companies also provide phone support for a fee.

This book will teach you how to use the Apache Web server. The discussion assumes a basic familiarity with computer concepts, but if you use computers at all, you should find the book accessible. In this chapter, I first provide a brief discussion of some general networking concepts. If you've been working with networks for a while, feel free to skip this section. If not, you should review it, as the discussion in later chapters presumes a familiarity with the terms introduced here. The last portion of this chapter lays out the typographical conventions of the rest of the book.

Basic Concepts

In this section I will introduce several fundamental concepts of networking software in general. Next I'll cover several concepts that are specific to Apache.

Web Servers

Apache is a Web server. A Web server is a piece of software that responds to the requests of Web browsers. When you type a URL in the address window of your Web browser, an intricate question-and-answer sequence is initiated between your browser and various Internet services. In order to understand the material in this book, you need to have some understanding of these processes, so I will explain them first.

IP ADDRESSES

If you're even peripherally involved in the computer industry, you're probably familiar with the concept of an IP address. An IP address is a sequence of four numbers, each ranging in value from 0 to 255, which are separated by periods. The following is an example of an IP address:

192.168.100.1

You will probably notice that most of the examples in this book use addresses in the range of 192.168.100.1 to 192.168.100.255. These aren't real addresses, at least not ones you can get to from the Internet. They are part of a range of addresses that was set aside for private networks not connected to the Internet. As such, they are perfect for examples: because they are not real, they cannot be hacked.
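If you want to check for yourself whether an address falls in one of the private ranges, a few lines of Python will do it. This is only a sketch using Python's standard ipaddress module; the function name describe is my own invention for illustration.

```python
import ipaddress


def describe(address: str) -> str:
    """Classify a dotted-quad IPv4 address as private or public."""
    # IPv4Address raises ValueError if any of the four numbers is outside 0-255.
    ip = ipaddress.IPv4Address(address)
    return "private" if ip.is_private else "public"


print(describe("192.168.100.1"))  # prints "private"
```

An address such as 256.1.1.1 is rejected with a ValueError, because each of the four numbers must fit in the range 0 to 255.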

NAME RESOLUTION

As members of the browsing public, we are accustomed to thinking of Web addresses in terms of their domain names. A domain name is an address of the form:

www.stitch.com

You might be surprised to learn that those names are not of much use to your computer. Computers almost never care about English names for things. In order to connect to your Apache server and start downloading information, the Web browser that wants to be your client must know two things about you:

  • the IP address of your machine
  • the port that your server is monitoring

However, in all likelihood, when users try to connect to your Web site, all they have is your domain name. How do we get from the

www.stitch.com

printed on your business card to the IP address and port number the networking software uses?

The first step in the process is name resolution. Name resolution is the process of looking up the IP address associated with a domain name. Name resolution usually occurs without any help from the end user. When you install networking software on your PC (such as the kind provided by your Internet service provider: AOL, Earthlink, and so on), part of the installation process is to tell your machine where to go when it needs some name resolution done.
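To see name resolution from a program's point of view, here is a minimal sketch that asks the operating system's resolver for the IPv4 addresses behind a name. It uses Python's standard socket.getaddrinfo call; the helper name resolve is hypothetical, and what comes back depends entirely on how your own machine's resolver is configured.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve("localhost") typically returns the loopback address on most machines, while a name the resolver does not know raises socket.gaierror.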

Usually, the machines that perform name resolution are large, powerful server machines that are dedicated to that one task. Most of them run software that implements the Domain Name System (DNS). Not every machine that runs DNS contains every single address of the Internet. DNS servers store only the addresses that are most popular among their client bases. When they are asked to resolve a domain name with which they are not familiar, they pass the question on to another DNS server. The details of the name resolution process aren't really important to you as an administrator. The key point to remember is this:

When you decide to add a new Web site to your server, you must make sure the Internet at large knows that the domain name you are supporting is associated with the IP address of your server. The actual mechanics of this process are probably outside of your control. In practice, DNS registration usually is accomplished by picking up the phone and calling your Internet service provider (ISP). Tell them that you want the domain name you are hosting to be registered in DNS as belonging to your IP address. Generally this process takes a couple of hours on hold and $50 or so. You also should allow a couple of days for news of the change to travel from the DNS software of your ISP out to the world at large.

Once you have found an unclaimed domain name you can live with and have registered it with DNS, the worst is over.

PORTS

Let us assume that the example browser has contacted a DNS server and that name resolution has been completed successfully. Now the browser knows the IP address of the machine with which it wants to communicate.

However, you may recall that earlier I said that, in order to make a network connection, the client browser also needs to know what port the Web server will be listening on. The machine associated with the IP address you found may be running multiple network services (ftp, telnet, etc.). Each of these services must respond to different requests in different ways. How does the server keep them separated? The answer is ports.

A port is a secondary number associated with an IP address. Ports come in the range of 1 to 65535. Rather than asking each individual machine which service it associates with which port, it has become customary for all machines connected to the Internet to use the same port for the same services. The term for this custom is well-known port. The well-known port for Web service is number 80. When connecting across the secure socket layer (SSL), port 443 is used.
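The way a client settles on a port can be sketched in a few lines. The table below is a small hand-written sample of well-known ports, not a complete registry, and the function name port_for is hypothetical. The idea is simply that a port written into the URL wins, and otherwise the well-known port for the service is used.

```python
from urllib.parse import urlsplit

# A hand-picked sample of well-known ports, for illustration only.
WELL_KNOWN_PORTS = {"ftp": 21, "telnet": 23, "http": 80, "https": 443}


def port_for(url: str) -> int:
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the URL names a port explicitly
    return WELL_KNOWN_PORTS[parts.scheme]  # fall back to the customary port


print(port_for("http://www.stitch.com/"))       # prints 80
print(port_for("http://www.stitch.com:8080/"))  # prints 8080
```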

SOCKETS

A socket is a network programming construct that enables two machines to communicate across a network. A socket is defined by the IP address of the originating machine, the IP address of the terminating machine, and the port they are using to communicate. Socket connections are requested by the client browser. If there is a server process (such as Apache) on the machine at the IP address requested by the client, monitoring the well-known port associated with Web connections, that server will accept the connection. At that point, a socket is created. The actual transmission of Web pages occurs across the socket connection.
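The following sketch shows a socket connection being requested, accepted, and used, all on one machine over the loopback address. It is an illustration under some assumptions of my own: the server is a toy that echoes one message back, it listens on whatever free port the operating system hands out rather than on port 80, and the names serve_once and demo are hypothetical.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, echo back what the client sent, then close."""
    conn, _client_addr = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def demo() -> tuple[bytes, tuple, tuple]:
    # Port 0 asks the OS for any free port; a real Web server would use 80.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        server_addr = listener.getsockname()
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        # The client requests the connection; the server accepts it.
        with socket.create_connection(server_addr) as client:
            client.sendall(b"hello")
            reply = client.recv(1024)
            # The two endpoints that identify this connection.
            local, remote = client.getsockname(), client.getpeername()
        worker.join()
    return reply, local, remote
```

Calling demo() returns the echoed bytes along with the address and port pair at each end of the connection. Both ends share the IP address 127.0.0.1 here, so it is the differing port numbers that tell them apart.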

PROTOCOL

The term protocol, as it is used in computer science, is derived from the term as it is used in human interaction. Just as diplomats and debutantes have all sorts of rituals they perform to facilitate a smooth interaction between parties, so do computers. The idea is that computers aren't versatile enough to improvise, so the order and nature of each request, and each response to each request, must be rigidly defined.

To give you just a rough idea of what I'm talking about, the first thing a client browser does after the server has accepted its connection is to send a request that names the data it wants and the version of the protocol it is using. The server uses this information to fine-tune its response, which begins with a status line stating the server's own protocol version and is followed by either a Web page or an error message. The client displays the data it received and the cycle repeats itself.
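To make the rigidity concrete, here is a rough sketch of the text that travels across the socket in an HTTP/1.0 exchange, where the client speaks first. The helper names build_request and parse_status_line are my own, and the sample response is typed in by hand rather than captured from a real server.

```python
def build_request(path: str, host: str) -> bytes:
    """Build the bytes a client sends to ask for one page."""
    # Request line, one header, then a blank line to end the request.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_request("/index.html", "www.stitch.com"))
print(parse_status_line(b"HTTP/1.0 404 Not Found\r\n\r\n"))
```

Every space and line ending matters: a request that omits the final blank line leaves the server waiting for more headers.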

All network services use some sort of protocol. Sometimes, as in the case of File Transfer Protocol (ftp) and HyperText Transfer Protocol (http), the names reflect this. The protocol associated with the World Wide Web is http.

It's worth noting that the http protocol is not absolutely ideal. At the time it was created in 1990, something called "the Internet" did exist; it was largely the province of academics and lonely single men. The relentless hype that came to characterize it in the mid-1990s was still years in the future. The most popular Internet applications at the time were newsgroups and bulletin boards, both of which were, for the most part, text only. This was partly a function of bandwidth: modems at the time were glacially slow compared to what's available today. At 300 baud, even text-only messages took an achingly long time to download, and image files were out of the question.

Modem speed improved, of course. At about the same time, a guy at CERN (a European research center) named Tim Berners-Lee developed a piece of software that would exploit both the increasing speed of modems and the graphical user interface (GUI) capabilities of modern operating systems. His http enabled the user to access data, including pictures, across a network using an intuitive, point-and-click interface.

This was truly a brilliant idea, and it took off immediately. However, in retrospect, it may have taken off too quickly. Let me preface these next few sentences with a disclaimer: I am about to indulge in some shameless Monday-morning quarterbacking. I was a computer science student during this period, and I had access to the same sorts of resources Tim Berners-Lee did. The main difference between us is that I was the one who failed to invent the World Wide Web.

Having said that, I will go ahead and point out that http contained no provision for the secure transfer of data, no provision for the execution of scripts on either the client or the server side, and only rudimentary graphic-formatting capabilities. For the last 11 years or so, the computer science community has expended enormous energy trying to find a way to retrofit these capabilities into http. The solutions that have been developed are certainly functional, but no one ever describes them as elegant.

To be fair, I don't think that anybody at the time had any idea just how huge the Web was going to be. If they had, they might have spent a bit more time refining the protocols before releasing them on an unsuspecting public.

How Apache Works

Usually, Web servers handle requests from many browsers simultaneously. If a single server process were to handle all of the incoming requests, a great deal of overhead would be incurred in keeping track of who wants what, what stage of the protocol they are in, and so forth. In the UNIX environment, there is a simpler way: On UNIX systems, each client is assigned its own individual server process.

How does this work? When Apache is started, the first thing it does is check whether it is the first such process on the machine. The first process, called the parent, has rights and responsibilities that the other processes do not have. Specifically, it is responsible for creating copies of itself, called child processes, to handle user requests. It also is responsible for killing the child processes off as necessary. As an Apache server administrator, you have the ability to control the number of these processes.
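The parent/child arrangement can be sketched with the Unix fork call. This is not Apache's code, only a toy illustration under my own assumptions: the function name run_children is hypothetical, each child exits immediately instead of serving requests, and the example runs only on Unix-like systems because it relies on os.fork.

```python
import os


def run_children(count: int) -> list[int]:
    """Fork `count` children, then wait for each and collect its exit code."""
    pids = []
    for index in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: a real server would loop here handling requests.
            # This toy child just exits, using its index as the exit code.
            os._exit(index)
        pids.append(pid)  # Parent: remember the child so it can be managed.
    # Parent: wait for every child so none is left behind as a zombie.
    return [os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids]
```

Calling run_children(3) creates three copies of the running process and returns their exit codes once all of them have finished. The count argument plays the role of the process limits that an administrator sets through directives.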

Apache on Windows is slightly different. On Windows, Apache relies on multiple threads within a single process to handle all user requests. The Apache program has a lot of assumptions about parent and child processes that were difficult to remove when the Windows port was performed, so there is a parent process as well. Note, however, that the parent/child model is not optimal.

Directives

Apache is a versatile piece of software. It alters its behavior at run time based on the values of hundreds or even thousands of different variables stored in its configuration file. These variables are called directives. Most of this book is concerned with defining what these directives do, what their possible values are, and how you can best exploit them to suit your needs.

Even the simplest Apache server will need to have dozens of directives set. Rather than type the directives in when the server process is invoked, as is common with Unix command line utilities, Apache stores the directives in a configuration file. This configuration file is a plain old text file. You can edit it with your favorite text editor, copy it at will, and generally treat it as you would any other text file.
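Because the configuration file is plain text, even a short script can read it. The sketch below is a deliberately simplified reader, not Apache's own parser: it handles only one-line "Name value" directives and comment lines, and it ignores container sections and repeated directives. The directive values in SAMPLE_CONFIG are hypothetical, chosen only for illustration.

```python
# Hypothetical directive values, for illustration only.
SAMPLE_CONFIG = """\
# Lines beginning with # are comments
ServerRoot /usr/local/apache
Port 80
ServerName www.stitch.com
KeepAlive On
MaxClients 150
"""


def parse_directives(text: str) -> dict[str, str]:
    """Map each simple one-line directive name to its value."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # name, then the rest of the line
        directives[parts[0]] = parts[1] if len(parts) > 1 else ""
    return directives


print(parse_directives(SAMPLE_CONFIG)["ServerName"])  # prints www.stitch.com
```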

In order for any changes you make in the configuration file to take effect, you must restart the server process. The details of how to do this are discussed in Chapter 2.

Modules

Apache distributions all come with the same chunk of basic functionality, called the core, enabled by default. This functionality includes the ability to do such basic tasks as read its configuration file, perform rudimentary access control, and find the Web pages it is supposed to be serving.

Each of these (and many other) tasks is handled by its own clearly defined section of code. These sections of code are called modules. Apache is designed so that you can use only the modules you really need and discard the rest.

In order to fully exploit the modular capabilities of Apache, you will need to create an executable program from the source code provided with the distribution. The process of creating an executable program from source code is called compilation. The program you end up compiling is called httpd. The compilation process is discussed in detail in Chapter 1.

It's worth emphasizing here that Apache is httpd. The terms will be used interchangeably throughout this book and all other Apache documentation. Why don't we just call httpd "Apache"? That's a fair question. The code that eventually became the Apache server is descended from a program called the HyperText Transfer Protocol Daemon. The name Apache is one of the weak jokes common among programmers: it refers to the fact that early versions of the server required a lot of software patches in order to run correctly. By the time the name Apache was coined, using the label httpd for the running server process was unassailably entrenched in both the source code and documentation.

Perhaps the best way to build a module is as a Dynamic Shared Object (or DSO). A DSO is a module that can be added to or removed from the httpd executable as the server is being started simply by changing a few directives in the configuration file. This is an amazingly handy ability. Compiling a module as a DSO is slightly more complicated than compiling it into a static server process, but it is a smart investment of time. The details of compiling DSOs also are discussed in Chapter 1.

Handlers

Modules sometimes provide specific handlers, which are methods of processing files or requests in an unusual way. Sometimes handlers are named so that they can be referred to in configuration directives. Named handlers and their associated modules are listed in Table 0-1.

TABLE 0-1 Named Handlers

Handler        Module           Effect
send-as-is     mod_asis         Serve file and headers as-is
cgi-script     mod_cgi          Attempt to execute and serve output
imap-file      mod_imap         Imagemap rule file
server-info    mod_info         Display server configuration information
server-parsed  mod_include      Locate and replace server-side includes
server-status  mod_status       Display server status information
type-map       mod_negotiation  Parse as type map file

As I implied in the discussion of modules, you must include a module in the current httpd executable before you can access its handler.
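The dependency between handlers and modules can be expressed as a simple lookup. The dictionary below restates Table 0-1; the function name usable_handlers and the idea of passing in a set of module names are my own illustration, not something Apache provides.

```python
# Named handlers and the modules that provide them, as listed in Table 0-1.
HANDLER_MODULES = {
    "send-as-is": "mod_asis",
    "cgi-script": "mod_cgi",
    "imap-file": "mod_imap",
    "server-info": "mod_info",
    "server-parsed": "mod_include",
    "server-status": "mod_status",
    "type-map": "mod_negotiation",
}


def usable_handlers(included_modules: set[str]) -> list[str]:
    """List the named handlers available given the modules built into httpd."""
    return sorted(
        handler
        for handler, module in HANDLER_MODULES.items()
        if module in included_modules
    )


print(usable_handlers({"mod_cgi", "mod_status"}))
# prints ['cgi-script', 'server-status']
```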

MIME Types

MIME is an acronym for Multipurpose Internet Mail Extensions. The idea behind MIME types is to enable a program to determine what kind of data a file contains by looking at the file's extension. Apache comes with a default mechanism that enables you to define how MIME types will be presented to the client. Like everything else in Apache, this mechanism is fully configurable.
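The extension-to-type lookup is easy to try out. The sketch below uses Python's standard mimetypes module, which keeps its own table and is separate from Apache's configuration. The function name content_type and the fallback of text/plain for unrecognized files are assumptions of mine.

```python
import mimetypes


def content_type(filename: str, default: str = "text/plain") -> str:
    """Guess a file's MIME type from its extension, with a fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default  # no recognizable extension: use the fallback


print(content_type("index.html"))  # prints text/html
print(content_type("logo.gif"))    # prints image/gif
```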

Conventions of This Book

Throughout this book you'll find example commands and configuration directives, always accompanied by at least some explanation and sometimes by example output. In general, I don't provide detailed syntax information for directives and system commands in the regular text. That sort of thing is found in the Appendices, particularly Appendix A (Core Directives) and Appendix B (Other Directives). I hope you'll be able to glean the general nature of any command with which you are unfamiliar from the context.

The success or failure of any given Apache transaction depends on the internal server configuration, the content being transferred, the configuration of the underlying operating system, and the vagaries of the network support services. Given that, it is impossible to say with absolute certainty that the examples presented herein will run on your particular machine. You have my solemn vow that I typed each and every one of them in and they worked for me.

If you have any questions, comments, corrections, or suggestions for improvement, please feel free to contact me at:

s_hawkins@mindspring.com

Additional information about this and other books in Prentice Hall PTR's Essential Web Series can be found at:

www.phptr.com/essential/

Recap

A Web server is a piece of software that monitors an IP address and port and uses the http protocol to respond to requests from client browsers. The Web pages are served across a network connection called a socket.

The behavior of the Apache server is controlled by variables called directives stored in a configuration file. Apache is not a single process but, rather, a collection of nearly identical child processes that are created and destroyed by a parent.

Apache is composed of modules that may be included in the server process at the discretion of the administrator. Some modules provide handlers, which are methods of processing files or requests in a nonstandard way.
