Product Details
ISBN-13: | 9780072191837 |
---|---|
Publisher: | McGraw-Hill/Osborne Media |
Publication date: | 09/05/2001 |
Series: | Network Professional's Library |
Pages: | 560 |
Product dimensions: | 7.03(w) x 9.74(h) x 1.23(d) |
Read an Excerpt
Excerpt from Chapter 1:
History and Background of Apache
Despite its dominance and importance today, the World Wide Web (WWW) is a relative newcomer to networked computing, having been developed only in the middle 1990s. Despite its late start, the Web has become the service synonymous with "Internet" to millions of users worldwide. Whether you've been around the Internet since the early days (and remember Gopher and other pre-Web services) or you arrived on the scene after the Web had become the most popular service for Internet users, running neck and neck with electronic mail, you know people want fast and reliable access to the millions of Web pages out there.
While you can't guarantee reliable service on the user's end, you can make sure your own pages are served rapidly and your Web presence is stable, whether you're running a small Web server out of your dining room or you're part of an administrative team operating a server that offers thousands of pages for millions of daily hits. The secret to a stable Web presence is choosing the right Web server for your site: the Apache Server. Over 60 percent of sites on the Web use Apache or one of its derivatives to power their pages. In this chapter, you learn why Web administrators choose Apache, as well as what makes it so powerful and unique.
WHAT IS APACHE?
At its most basic, the Apache Server is a standards-compliant Web server. This means the Apache Server supports the requirements of the HTTP 1.1 standard, a document that defines the method by which files encoded in Hypertext Markup Language (HTML) are moved across computer networks.
TIP: HTTP is an acronym for Hyper Text Transfer Protocol.
The term server means Apache responds to requests from other programs, but doesn't provide documents of its own volition. That is, when you open a Web browser, such as Netscape, and type http://www.apache.org into the text box and then press ENTER, your browser contacts the server at apache.org and requests the default page for that site. The server responds to the request with the file you want to see, which the browser then formats and displays. Figure 2-1 shows the basic process.
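The request-and-response exchange described above can be sketched in a few lines of Python. This is only an illustration, not part of the book and not Apache itself: it uses the Python standard library's http.server as a stand-in server on the local machine, and the names PAGE, DefaultPageHandler, and fetch_default_page are hypothetical, chosen for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical default page served by the stand-in server (not Apache).
PAGE = b"<html><body>Default page</body></html>"


class DefaultPageHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same default page."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # The server only responds to a request; it never sends
        # a document of its own volition.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Keep the example quiet; a real server would log the request.
        pass


def fetch_default_page():
    """Start a local server, request "/", and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DefaultPageHandler)  # port 0: any free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # The client plays the role of the browser: it contacts the
        # server and asks for the default page.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_default_page()
    print(status, body.decode("ascii"))
```

A browser does the same thing on a larger scale: it sends a GET request for a path, receives a status line, headers, and the file, and then formats the file for display.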
NOTE: These standards are maintained by the World Wide Web Consortium (W3C), a nonprofit group that works to develop standards for both HTTP and HTML. In Chapter 13, "Serving Compliant HTML," you learn more about working with standards and why they're critical to administrators and their sites.
Apache is more than a simple Web server, though. The true power behind the Apache Server lies in its modularity. The core of the server is actually quite small, serving as the central component of the program, but not providing a lot of extra functions. Those functions are added as modules, individual pieces of code that permit the server to handle a particular type of request or file in the appropriate way. Chapter 5, "Apache Modules," covers the range of available modules, while Chapter 8, "Dealing with Innovation: mod_perl, A Case Study," explains one popular module in great detail. If you plan to run Apache in any serious way, you'll find its modularity means you only need to install the functions you plan to use, without wasting machine cycles on functions you don't need.
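The idea of a small core with functions added as modules can be illustrated with a toy sketch in Python. This is an assumption-laden analogy only: Apache's real modules are written against its own C interfaces, and the names Core, load_module, serve_html, and serve_text below are hypothetical, invented for this example.

```python
class Core:
    """A toy 'core' that knows nothing about file types on its own."""

    def __init__(self):
        # Maps a file extension to the handler function that serves it.
        self._handlers = {}

    def load_module(self, extension, handler):
        """Register a handler (a 'module') for one file extension."""
        self._handlers[extension] = handler

    def handle(self, path):
        """Dispatch a request to the matching module, if one is loaded."""
        extension = path.rsplit(".", 1)[-1] if "." in path else ""
        handler = self._handlers.get(extension)
        if handler is None:
            return "no module loaded for " + path
        return handler(path)


# Two hypothetical modules; only the ones you load cost anything.
def serve_html(path):
    return "serving " + path + " as text/html"


def serve_text(path):
    return "serving " + path + " as text/plain"


core = Core()
core.load_module("html", serve_html)

if __name__ == "__main__":
    print(core.handle("index.html"))
    print(core.handle("notes.txt"))  # no module loaded until serve_text is registered
```

The point of the sketch is the design choice, not the mechanics: the core stays small, and a function that is never loaded is never consulted.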
DEVELOPMENT AND HISTORY OF THE APACHE PROJECT
The Apache Server is the creation of a large group of programmers and developers who work together to build and strengthen Apache and its modules, as well as to incorporate new technologies into the server. The Apache project started in 1995 as an attempt to upgrade the original HTTP daemon (httpd) developed at the National Center for Supercomputing Applications by Rob McCool. Because McCool had taken a new job in 1994, nobody at NCSA had taken over the project, so httpd was languishing at a time when Web programming was starting to take off.
Web administrators were working on httpd on their own, and they began to share their patches and hacks with each other in an attempt to strengthen httpd without McCool's input. Soon, eight programmers announced the formation of the Apache Group, which would serve as a central node for httpd development. They took all the patches they could find and incorporated them into httpd code, releasing the first Apache server distribution in April 1995 as version 0.6.2. Testing and writing new code occupied the Apache group (including NCSA programmers) for the remainder of 1995, and after two more beta releases, Apache 1.0 was released in December 1995. Within a year, Apache was the most popular server being used on the Web. This popularity hasn't slowed, with Apache itself now serving 60 percent of Web sites and its derivatives adding another 3 or 4 percent to that total. Apache is currently in beta for version 2.0, with the most recent stable release being 1.3.
NOTE: This book is written using both Apache 2.0 and Apache 1.3. Since the 2.0 release is still under construction and is released only as beta software, those running Web sites that require reliability may need to stay with the current stable release (Apache 1.3) until the 2.0 version is released as stable. Significant differences from 1.3 are noted in this book, but some processes given here for 2.0 may not work on 1.3 installations.
At the end of 1999, the Apache developers took a somewhat unusual step. The server had become so popular, a more bureaucratic structure was needed to manage the project and its work. So, the Apache Software Foundation was established under United States law as a fully nonprofit organization. The foundation can receive donations, distribute funds to developers or other recipients, and manage the growth of Apache in an organized manner. Perhaps even more important, the foundation is considered a separate legal entity, apart from any people involved in the project. The foundation can enter into contracts, participate in legal action, and even sue or be sued, though one hopes that will never be necessary!
OPEN SOURCE SOFTWARE
Working with Apache without learning something about the Open Source or Free Software community is nearly impossible. Apache is often touted as one of the biggest successes to come out of this community, and the project has stayed faithful to its roots as the server has become more widely used. But what is Free Software, and why is it important?
At their most general, the terms Free Software and Open Source refer to software developed by volunteers and distributed with a license that's simultaneously restrictive and open. Free Software licenses usually require the user to contribute any changes made to the program back to the development community. They also require the full code base be distributed openly, holding nothing back as a "trade secret." Many programs released under such licenses, like Apache, are also distributed free of charge.
NOTE: Free Software doesn't always mean "no cost" software. The "free" refers to the way in which the code base, and improvements to the code base, must circulate among users and developers. People in the community use the phrase "free speech, not free beer" to indicate a difference exists between sharing without restraint and sharing without payment.
The Free Software movement is the brainchild of Richard Stallman, an MIT computer scientist who spent much of the 1970s decrying the rise of commercial software that hid its code from users and administrators. Without access to the code, Stallman knew administrators would have to rely on the software companies to fix bugs and produce upgrades. These upgrades would be generic and not always useful for a particular administrator's needs. So, Stallman began working on projects that would be released freely to the computing community and has continued to do so for the last quarter-century. He also created a foundation, called the Free Software Foundation, which helps people write Free Software and get it distributed.
Many of Stallman's programs are now considered integral parts of a Unix system, which is ironic because his project name, GNU, stands for Gnu's Not Unix. Stallman wasn't the only person working on such programs, though. A robust international community of programmers, hackers, and students was building an amazing array of programs. The rise of the Internet and its growing availability to people outside the military and academic networks helped with this explosion of code. However, the catalyst for truly amazing growth came when a Finnish college student, Linus Torvalds, released the first version of a new operating system called Linux.
NOTE: You'll see Unix spelled both with the capital U and in all capital letters, as in UNIX. The latter is a registered trademark, while the former has become the general way to describe UNIX-based operating systems, which may or may not contain part of the code in the AT&T copyrighted UNIX. In this book, the Unix spelling is used.
Linux was a version of an older Unix-based operating system called Minix, but it was developed and released under a GNU-derived license. One major innovation was that Linux could run on a variety of hardware, a far cry from the days when individual computers arrived with their own unique operating systems. The wide distribution of Linux meant a large user base was available to work with new programs and to generate data that would work as independent of the hardware platform as possible. With a Free and flexible operating system now available, the community exploded . . . and business began to take note.
Unfortunately (or fortunately, depending on the side you take), Stallman's insistence on the term "Free Software" wasn't the best marketing tool. Businesses weren't comfortable with the concept of "free," thinking free code might be worth exactly what was paid for it. The programs were good and competitive, but the perception was a problem. Enter Eric Raymond, a programmer active in the Free Software community who identified this problem. In his landmark essay "The Cathedral and the Bazaar," Raymond suggested the term "open source" as a replacement. Open Source would carry the same connotations of open development and the distribution of source code, but would remove any financial or moral implications from the software's description. What term you use is up to you, but you should be aware of the shadings behind each description.
NOTE: If you're interested in learning more about this community, you can find out a lot by searching the Web and by reading the writings of both Stallman and Raymond. Raymond's book, The Cathedral and the Bazaar (O'Reilly & Associates, 2000), is a collection of his most important essays, which are also available on his Web site: http://www.tuxedo.org/~esr/writings/. You can learn more about Stallman's views by reading through the GNU site at http://www.gnu.org....
Table of Contents
Acknowledgments | xxiii | |
Introduction | xxv | |
Part I | Installing Apache | |
1 | History and Background of Apache | 3 |
What Is Apache? | 4 | |
Development and History of the Apache Project | 5 | |
Open Source Software | 6 | |
How Apache Works | 9 | |
Features of Apache 2.0 | 10 | |
Summary | 11 | |
2 | Preparing for Apache | 13 |
Locating and Downloading Apache | 14 | |
Preparing the Web Server Machine | 16 | |
Identifying and Removing Prior Servers | 18 | |
Using Apache with Unix | 20 | |
Upgrading from Earlier Versions of Apache | 24 | |
Identifying Previous Apache Installations | 24 | |
Should You Upgrade? | 27 | |
Summary | 28 | |
3 | Installing Apache | 29 |
Installing Apache from Binaries | 30 | |
Installing Apache from Source Code | 35 | |
Summary | 44 | |
4 | Running a Heterogeneous Network | 47 |
Samba for Windows Users | 48 | |
netatalk for Macintosh Users | 51 | |
When You Run Multiple Flavors of Unix | 57 | |
Summary | 60 | |
5 | Apache Modules | 61 |
How Apache Modules Work | 62 | |
The Default Modules | 63 | |
Locating Modules Not Included with Basic Packages | 86 | |
Installing Modules | 87 | |
Summary | 88 | |
Part II | Configuring and Running Apache | |
6 | Configuring and Testing Apache | 91 |
The Apache Configuration Files | 93 | |
Configuring Apache for Unix | 93 | |
Configuring Apache for Windows | 116 | |
The apachectl Utility | 118 | |
Summary | 119 | |
7 | Managing the Apache Server | 121 |
Controlling Apache with Direct Commands | 122 | |
Using apachectl | 125 | |
Starting Apache Automatically At System Boot | 127 | |
Defining the File System | 132 | |
Summary | 135 | |
8 | Dealing with Innovation (mod_perl: A Case Study) | 137 |
When to Use a New Idea | 139 | |
Finding New Modules and Shortcuts | 140 | |
The mod_perl Module | 151 | |
Security Versus Innovation | 154 | |
Summary | 155 | |
Part III | Apache Administration | |
9 | Logs | 159 |
Apache Logs | 160 | |
Finding the Logs | 161 | |
How to Read Logs | 162 | |
Configuring Logs | 162 | |
The mod_log_config Module | 167 | |
Useful Log Tricks | 168 | |
Summary | 172 | |
10 | Disk Management | 173 |
File system Management | 174 | |
Disk Partitions | 175 | |
Moving Content | 176 | |
Disk Quotas | 179 | |
File and Directory Permissions | 180 | |
Summary | 183 | |
11 | Performance Tuning | 185 |
Why Tune? | 186 | |
Streamlining Your Apache Installation | 188 | |
Unnecessary Modules | 194 | |
Load Balancing | 195 | |
Tracking Site Use | 197 | |
Summary | 199 | |
12 | Dealing with Users | 201 |
The Human Side of Administration | 202 | |
Setting Quotas | 203 | |
Setting Policies | 204 | |
Unix User Management | 206 | |
Summary | 208 | |
13 | Serving Compliant HTML | 209 |
What Is the World Wide Web Consortium? | 210 | |
HTML Standards | 211 | |
Setting Appropriate Server Policies | 225 | |
Summary | 226 | |
Part IV | Beyond the Basics: Advanced Apache Topics | |
14 | MIME and Other Encoding | 229 |
What Is MIME? | 230 | |
MIME Types and Apache Configuration | 237 | |
Character Sets | 256 | |
Summary | 259 | |
15 | CGI: The Common Gateway Interface | 261 |
The Common Gateway Interface | 262 | |
CGI and Apache | 263 | |
Obtaining CGI Scripts | 268 | |
Uses for CGI on Your Site | 270 | |
CGI and Security | 276 | |
Writing Your Own CGI Scripts | 278 | |
Summary | 280 | |
16 | Image Maps | 281 |
Web Navigation | 283 | |
Constructing Image Maps | 284 | |
Enabling Image Maps | 289 | |
Serving Image Maps: mod_imap | 290 | |
Maintaining Accessibility | 293 | |
Summary | 294 | |
17 | Using Apache to Save Time: SSI and CSS | 295 |
Server Side Includes | 296 | |
Configuring SSI | 298 | |
Working with SSI Variables | 302 | |
SSI Commands | 303 | |
Cascading Style Sheets | 306 | |
Making Web Pages Accessible | 309 | |
Summary | 310 | |
18 | Virtual Domain Hosting | 311 |
Virtual Domains | 312 | |
Should You Host Virtual Domains? | 313 | |
Working with the Domain Name Server | 315 | |
Configuring Virtual Domains | 317 | |
Virtual Domain Services: E-Mail | 322 | |
Summary | 323 | |
19 | E-Commerce | 325 |
What Is E-Commerce, Anyway? | 327 | |
Security and E-Commerce | 329 | |
Adding E-Commerce Elements to Your Site | 332 | |
Choosing an E-Commerce Provider | 336 | |
Summary | 339 | |
Part V | Security and Apache | |
20 | Basic Security Concerns | 343 |
Security Self-Evaluation | 344 | |
Access | 346 | |
Availability | 347 | |
Resources | 348 | |
Software and Practices for Secure Operation | 350 | |
Summary | 354 | |
21 | What to Do If You Get Cracked | 355 |
Noticing the Crack | 356 | |
Finding and Fixing Vulnerabilities | 358 | |
Preventive Measures | 359 | |
Security Breach Checklists | 360 | |
Summary | 367 | |
22 | SSL: The Secure Socket Layer | 369 |
What Is SSL? | 370 | |
How SSL Works with Apache | 377 | |
Using SSL as a Module | 379 | |
Summary | 381 | |
23 | Firewalls and Proxies | 383 |
What Is a Firewall? | 384 | |
Choosing a Firewall | 387 | |
Firewall Structures | 388 | |
Administering a Firewall | 395 | |
What Is a Proxy? | 395 | |
Choosing and Compiling a Proxy Package | 396 | |
Configuring a SOCKS Proxy | 397 | |
The mod_proxy Module | 398 | |
Summary | 399 | |
Part VI | Appendices | |
A | Internet Resources | 403 |
Web Sites | 404 | |
Newsgroups | 408 | |
Mailing Lists | 410 | |
Getting Involved with the Apache Community | 412 | |
Related Resources | 412 | |
B | Using a Unix Text Editor | 417 |
GNU Emacs | 424 | |
pico | 429 | |
Summary | 432 | |
C | Glossary | 433 |
A | 434 | |
B | 434 | |
C | 435 | |
D | 438 | |
E | 439 | |
F | 439 | |
G | 440 | |
H | 440 | |
I | 442 | |
L | 443 | |
M | 443 | |
N | 445 | |
O | 445 | |
P | 446 | |
Q | 448 | |
R | 448 | |
S | 449 | |
T | 452 | |
U | 453 | |
V | 453 | |
W | 454 | |
X | 454 | |
D | Common Unix Commands | 455 |
E | Apache Configuration Files | 479 |
httpd-std.conf | 481 | |
httpd-win.conf | 500 | |
highperformance-std.conf | 518 | |
Index | 521 |
Introduction
Everybody loves the Web. Many people think the Web is the Internet because it's the most widely advertised Internet service and the subject of much business experimentation over the past few years. Even though the Web is only one of several critical Internet services (along with e-mail, file transfer, and other useful technologies), it has certainly become a critical part of many people's daily lives and work. This is an amazing fact, but it's even more astonishing when you realize the Web is a new technology, developed and popularized within the last ten years!
While most people use the Web frequently and familiarly, far fewer are aware of the software that gets Web pages onto their monitors. Sure, everyone knows about Web browsers, but the servers that talk to the browsers and hand over the requested files are much more anonymous. However, without Web servers, no Web exists. A number of Web servers are available to the would-be Web administrator, from the complex and highly configured commercial servers sold as part of an e-commerce package to the most bare-bones and terse servers designed for test needs. Chapter 2, "Preparing for Apache," introduces some of these Web servers.
The two most popular Web servers, though, are Microsoft's Internet Information Server (IIS) and the Apache Web server. In fact, Apache is the most popular Web server in the world. It runs more than half the world's Web sites, and it performs well on rigorous benchmarking and performance tests. While IIS has the edge in some all-Microsoft networks, even the most hardcore Microsoft administrators often run Apache for their Web sites. To add to the popularity of the Apache server, you can download the software free. The source code is also openly available, meaning a constant and enthusiastic development community is building new features and functions for the server, and Apache is thoroughly tested in real-world situations and installations.
Obviously, this is a book about Apache, so you might expect me to be partial. Yes, I tend to think freely developed software has the edge on a lot of commercial software, but that's not the point with Apache. Apache is simply a better Web server than anything else out there. It's robust, streamlined, modular, responsive, and stable. That's the recipe for a darned good piece of software, which is precisely what Apache is. In this book, you find little preaching about Free Software or Open Source (though Chapter 1, "History and Background of Apache," contains an introduction to the topic, so you understand the community that created Apache).
Instead, we explore one of the two most popular and successful freely developed programs and how it can work for you. My hope is learning more about Apache will dispel some of the myths you might believe about noncommercial software, and that you'll consider other such software for your system as well.
TIP: The other freely developed success is the Linux operating system. Both Apache and Linux come from dedicated and committed communities, which work on the projects as hobby and passion.
No matter the reason why you've chosen the Apache Web server, or the reason you chose this book, you can find something in it to challenge your skills and meet your needs. Apache is a great piece of software and I hope you share my enthusiasm for it after you finish this book. Please be aware, there are worlds beyond what's covered here. In particular, this book doesn't cover dynamic content served from databases, and it gives little room to module programming and advanced scripting. Other valuable books cover such topics. This book is an introduction and a guide to basic Apache administration, and I hope you continue to explore other topics once the basics are under your belt.
WHAT'S IN THE BOOK
This book is divided into six parts. The first three sections deal with the basic tasks involved in running Apache, while the last three introduce more extended topics and provide helpful information you can use as a resource. If you're completely new to Apache, start at the beginning and read the first two parts before you install the server, using the remaining parts to bolster your knowledge as you gain experience. If you're more experienced with Unix servers in general, you may choose to skip the first two sections (or use them as a reference) and move to the fourth and fifth sections to expand your knowledge about Web-related topics. All readers can use the appendices in Part VI as support for the rest of the book.
TIP: Two Tables of Contents appear at the beginning of the book. One is a chapter listing, while the second is expanded and contains the various subheadings of each chapter. Skim the expanded Table of Contents to learn more about the topics covered in each chapter and each part of the book.
Part I, Installing Apache, starts from the beginning, with an introduction to open software and to the Apache server itself. This part also includes practical information on preparing your machine for the Apache server, locating a recent copy of the software, and installing the server. Other chapters in this part introduce software that can help you run a network that includes more than one operating system, as well as introduce Apache's modular construction and the various modules that perform different functions for the server.
Part II, Configuring and Running Apache, is the next step after successfully installing the server. Part II contains extensive information about configuring the server to meet your particular needs, as well as help in testing your configuration and fixing any problems that might occur. Once the server is configured properly, you're ready to manage and operate the server to provide Web pages to your visitors. This part of the book concludes with an introduction to the ongoing world of Apache development, including numerous modules that provide extended features to the server.
Part III, Apache Administration, focuses on the basic tasks involved in being a Web administrator. Chapters on Apache logs, basic Unix disk management, and performance tuning can help you understand your server and site traffic, as well as keep your installation running as smoothly as possible. This part also contains a chapter on dealing with your user base and setting up appropriate user policies, plus a chapter on the HTML standard and why you should attempt to serve HTML code that's as standard-compliant as possible.
Part IV, Beyond the Basics: Advanced Apache Topics, moves to topics that are of interest to a Web administrator, but that aren't required to run the Apache server. This part begins with an explanation of the MIME standard and text types, including character sets, which you can serve on your site. In this part, you also find an introduction to CGI scripts, image maps, server-side includes, and cascading style sheets. These are all page design techniques, but ones that require some attention from you as the site administrator. Here, you also learn how to host virtual domains from your regular site. This section of the book concludes with an introduction to e-commerce and its complications.
Part V, Security and Apache, concludes the main part of the book. Security is an integral consideration for any Web administrator. In this section, you learn about some basic security concerns and precautions, and what to do if your site is cracked. This section also contains an introduction to Secure Sockets Layer (SSL) technology, and explains how to set up firewalls and proxies to further secure your Web server.
The final part of this book, Part VI, contains five appendices for further information. Appendix A is a list of some helpful Internet resources for the Web administrator. Appendix B offers instruction for several popular Unix text editors, which you need when you configure Apache. Appendix C is a glossary, while Appendix D contains a list of commonly used Unix commands. Finally, Appendix E contains the text of Apache's configuration files.
WHO SHOULD READ THIS BOOK
No one "ideal reader" exists for this book. Yes, the material here is targeted at the beginning to the intermediate user of Apache, but enough information is contained in the book that almost anyone should be able to find it useful. The absolute beginner with both Unix and Apache can find help in working with a new operating system, as well as with the server software, while the more experienced administrator might find Part IV or V the most useful. Regardless of your level of experience, this book should be of help to you. That said, I did make some assumptions about you, the reader, as I wrote:
- You have more than an academic interest in running a Web server, whether you want to serve a personal noncommercial site from a home network or you're involved in administering an extensive and high-profile Web site at your workplace;
- You have access to an always-on, high-speed Internet connection and a Pentium-level computer with sufficient RAM (or the equivalent Macintosh set-up);
- You have, or are willing to install, a Unix variant as your operating system;
- You know, or are willing to learn, the rudiments of working on a Unix machine.
NOTE: Although a vast number of personal pages and sites are on the Web, most of those pages aren't hosted by the individual who owns them. Instead, the pages are served by the individual's ISP, a third-party Web hosting service, or one of the free Web hosts, like Yahoo! GeoCities or Angelfire. In that sense, the pages are hosted by a commercial entity and not by an individual.
Thus, I made an additional set of assumptions about readers who have a professional interest in running Apache:
- You manage computer resources for a nontechnical user base.
- You have access to multiple machines on an internal network.
- You serve (or plan to serve) a Web site that's critical to your company's work.
- You might not have the ultimate authority over the files served from the site.
- You have root access to the machine that hosts the Web server.
HOW TO USE THE BOOK
Books are easy to use. Just pick one up and start reading! In the case of technical books, though, some additional information can help before you get too deeply into the subject matter. As with most technical books, this one uses a certain set of conventions to indicate particular kinds of information:
A word in italics is a new or important word, which is usually defined within the next sentence. So, you might see a sentence like this: "Installing Apache on a Unix machine requires that you have root access. The root account is the administrative account under Unix, and has special privileges that normal user accounts don't have, such as installing and running server programs." Many of the italicized terms in this book are also found in Appendix C, the glossary.
URLs are shown in boldface. Be aware, though, many of the URLs in this book are fanciful and don't refer to real sites. They're used as examples. However, a number of URLs throughout the book point you to useful sites or extra information that can help you with running Apache.
Words or phrases in the Courier font are direct Unix commands, file or directory names, or full directory paths. Some Courier text is set off from the text paragraphs surrounding it, as in
Code is often shown in this format.
Lengthy text files or bits of code are usually set off like this.
Some text is printed inside a box with a shaded edge, which is called a sidebar. These sidebars contain information that adds to the chapter, but that didn't flow neatly with the main chapter text. A sidebar might explain a deeper technology topic or provide some background on a particular Apache function. You can also see special paragraphs in the text labeled Tip, Note, or Caution:
TIP: A Tip is an extra bit of information that might interest you. Tips might contain links to more specialized Web pages, a piece of Unix or Apache history, or some other item that's interesting, though not critical to running Apache.
NOTE: A Note is something you should know before you begin working with the subject under discussion. Notes might be configuration details, additional commands, or other information to enhance your understanding of the topic.
CAUTION: Relatively few Cautions are in this book, but pay attention to the ones you find. A caution is a warning, whether about Apache itself, the Web, or some function on your Unix machine. Read the cautions carefully, so you can avoid the pitfalls they describe.