Web Performance Tuning

by Patrick Killelea

For as long as there's been a Web, people have been trying to make it faster. The maturation of the Web has meant more users, more data, more bells and whistles, and consequently longer waits on the Web. Improved performance has become one of the most important factors in determining the usability of both the Web in general and of individual sites in particular.

Web Performance Tuning is about getting the best performance from the Web. This book isn't just about tuning the web server software; it's also about getting optimal performance from a browser, tuning the hardware (on both the server and browser ends), and maximizing the capacity of the network itself.

Web Performance Tuning hits the ground running, giving concrete advice for quick results: the "blunt instruments" for improving crippled performance right away. The book then takes a breath and pulls back to give a conceptual background of the principles of computing performance. The latter half of the book approaches each element of a web transaction, from client to network to server, to examine the weak links in the chain and how to strengthen them.

Tips include:

  • Using simultaneous downloads to locate bottlenecks
  • Adjusting TCP for better web performance
  • Reducing the impact of DNS
  • Upgrading device drivers
  • Using alternatives to CGI
  • Locating the web server strategically
  • Minimizing browser cache lookups
  • Avoiding symbolic links for web content

Editorial Reviews

The Barnes & Noble Review
The Web may never be fast enough, but a whole lot's been learned in the past few years about building faster web sites. This book brings it all together, from optimizing content to tuning servers, scaling network infrastructure to building faster JavaServer Pages. This is in-depth stuff: detailed examples, measurement techniques, performance graphs, and dozens of solutions -- both "blunt instruments" and "scalpels."

Patrick Killelea begins with quick, preliminary recommendations for both the server and browser side: techniques that will make a significant difference in many, if not most, environments. Next, he reviews the planning and analysis techniques for identifying problems and acting proactively. You'll learn how to plan bandwidth, server, and memory capacity; automatically monitor each key performance parameter; test loads; and account for both reliability and security. Detailed case studies address several of the most widespread problems, including uncontrolled growth in database tables; logging delays caused by reverse DNS lookups; and database connection pool limitations.

Killelea then systematically reviews every link in the chain of Web performance: architecture, browsers, client and server operating systems and hardware; network connections; TCP/IP configuration; server applications; CGI; content; and much more. Killelea doesn't mince words: Java, he says, will never be adequate on the client side, but there are a raft of techniques for improving its performance on the server side (profiling, JITs, static compilation; adjusting runtime options). Whatever your role in maximizing web performance, whatever your application, you'll find this book indispensable. (Bill Camarda)

Bill Camarda is a consultant, writer, and web/multimedia content developer with nearly 20 years' experience in helping technology companies deploy and market advanced software, computing, and networking products and services. He served for nearly ten years as vice president of a New Jersey–based marketing company, where he supervised a wide range of graphics and web design projects. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks For Dummies®, Second Edition.

Product Details

Publisher: O'Reilly Media, Incorporated
Edition description: Older Edition
Product dimensions: 7.00(w) x 9.13(h) x 0.80(d)

Read an Excerpt

Chapter 11: Server Hardware

Here we revisit computer hardware from the server perspective. Even though each client receives exactly as many bytes as the server sends, the server hardware needs to be more powerful than client hardware because the servers must be capable of handling many clients simultaneously. On the other hand, it is common for small web sites to overestimate just how much server power they really need. If your server is handling only one client every several seconds, then you can probably make do with the same hardware that would make a good web client. For the majority of sites, the network connection is more likely than server hardware to be the limiting factor.

Server tuning is the subject of many entire books, and the subject is much larger than I can present in a single chapter. For in-depth detail, some good books on the subject are: System Performance Tuning, by Mike Loukides (O'Reilly & Associates); Sun Performance and Tuning, 2nd Edition, by Adrian Cockcroft (Prentice Hall); Configuration and Capacity Planning for Solaris Servers, by Brian Wong (Prentice Hall); and Optimizing Windows NT, by Russ Blake (Microsoft Press).

How Server Hardware Is Different

Box on a Wire

A web server is essentially remote storage that copies data from its RAM or disk to the network connection upon request. It may not be a simple copy, since dynamic content or database access may be involved, but from the user's point of view, your web server is just one more mass storage device. Now, does a disk drive have a windowing system? No. Similarly, your web server does not need a windowing system, a video card, a monitor, or even a keyboard! In fact, a windowing system occupies a great deal of RAM and CPU time, so it is a drain on server performance. You don't have any choice if you're using a Windows or Mac web server, but on Unix systems you can simply turn off X Windows. NT and Unix have an additional reason not to use a windowing system: the currently active window has a higher execution priority than other processes. On Solaris, for example, processes belonging to the currently selected window are bumped up in priority by 10 points (out of 100 or so). If you're not very careful with the windowing system, you can hurt web server performance simply by moving the mouse. It is better to do web server administration remotely over one or more telnet sessions from a different computer.

Web servers without monitors are known as headless servers.

Good I/O

The fundamental distinguishing feature of server hardware is high-performance I/O. Commodity PC hardware is limited by its legacy I/O subsystem, while server hardware is designed around I/O and can easily have ten times the I/O performance of the best PCs.

Multiple Busses

Servers usually have separate busses for L2 cache, I/O, RAM, and peripherals. This reduces contention and allows the use of appropriate hardware for each bus. Server busses may be packet switched, in the sense that a request is made over the bus and the bus is released until the response is ready, allowing requests to be interleaved with responses and improving throughput. Bus throughput is critical for servers, because a great deal of what a server does is simply copy data between network devices and storage devices.

Fast Disks

Servers should have separate high-speed SCSI disks for content and logging. IDE disks are not acceptable. Striping data over disk arrays is highly recommended to allow seeks to proceed in parallel.
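To make the striping idea concrete, here is a minimal sketch of how a logical block number might be mapped onto a disk array. It assumes plain round-robin striping (RAID 0 style); the function name `locate_block` and the `stripe_blocks` parameter are hypothetical and chosen only for illustration, and real array controllers may lay data out differently.

```python
def locate_block(logical_block: int, num_disks: int, stripe_blocks: int = 1):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes simple round-robin striping: each stripe unit of
    `stripe_blocks` consecutive blocks goes to the next disk in turn.
    """
    if logical_block < 0 or num_disks < 1 or stripe_blocks < 1:
        raise ValueError("block must be >= 0; disks and stripe size >= 1")
    # Which stripe unit the block falls in, and its position inside that unit.
    stripe_number, within = divmod(logical_block, stripe_blocks)
    disk = stripe_number % num_disks
    # Each full pass over all disks adds one stripe unit to every disk.
    offset = (stripe_number // num_disks) * stripe_blocks + within
    return disk, offset


if __name__ == "__main__":
    # Consecutive blocks land on different disks, so their seeks can overlap.
    for block in range(8):
        print(block, locate_block(block, num_disks=4))
```

With four disks and a stripe unit of one block, logical blocks 0 through 3 fall on disks 0 through 3, which is why a run of sequential requests can be serviced by several disk arms at once.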

Lots of Memory

Servers should have large amounts of RAM to reduce disk accesses. A good rule is to allow enough RAM to hold the complete OS and the most frequently accessed parts of your data set. Servers also tend to have large L1 and L2 caches, and may have the cache split between data and instruction caches, because data and instructions have different access patterns. The only memory faster than L1 cache is the set of registers on the CPU. Many megabytes of L2 cache is becoming common.

Unfortunately, the effectiveness of caching for server CPUs is reduced by the context switching that happens with every network interrupt. httpd code and network handling code displace each other from the caches.


A server should be scalable to smoothly handle an increasing workload. Unix workstations have far more capacity for scaling by adding CPUs and RAM than PCs. Unix workstations scale up to 64 or 128 CPUs, depending on whom you ask, while PC hardware cannot generally handle the contention between more than 4 CPUs, as of this writing. Workstations also have better I/O bandwidth and more RAM expandability.

Network Interface Card

The Network Interface Card (NIC) provides the connection between the network cable and the server's bus. NICs fill a conceptually simple niche, but their variety reflects the many permutations possible between network cable, cable signalling protocol, and host computer bus. NICs take an incoming serial stream of bits and output a parallel stream onto the bus, and vice versa. Until recently, it could be assumed that the network connection would be far slower than the CPU and bus, but LAN network speeds have been increasing faster than CPU and bus speeds, so it is no longer a safe bet that your network card can be handled by your machine. Still, at the interface to the Internet, you can be fairly sure that your server will be more constrained by Internet access than by any other component, save perhaps disk.

NICs have on-board buffers, and a bigger buffer always gives you more flexibility. The buffer has historically been important for holding outgoing data until the network can deal with it all, but as mentioned, that situation is reversing, so the buffers will in the future tend to hold incoming data, waiting for the computer. In either case, a larger buffer makes a buffer overflow and consequent data loss less likely. Lost TCP/IP data is simply retransmitted, adding to overhead. Typically, 8-bit Ethernet cards have 8K buffers, while 16-bit cards have 16K buffers.

When a NIC has a complete unit of data from the network and is ready to forward it on to the computer's bus, it generates a hardware interrupt, which forces the CPU to save its current state and run the network card interrupt handler, which retrieves the data from the NIC's buffer and fills a data structure in memory. Therefore, a critical performance factor is how many interrupts per second the CPU, memory, and bus can handle from the NIC.
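A rough feel for the interrupt load can be had with a back-of-the-envelope calculation. The sketch below assumes one interrupt per received frame, full-size 1500-byte frames, a saturated link, and no allowance for Ethernet framing overhead; the function name is hypothetical and the results are upper-bound estimates, not measurements.

```python
def interrupts_per_second(link_mbps: float, frame_bytes: int = 1500) -> float:
    """Estimate frames (and so interrupts) per second on a saturated link.

    Assumes one interrupt per frame and ignores preamble, headers,
    and inter-frame gaps, so real rates will differ somewhat.
    """
    if link_mbps <= 0 or frame_bytes <= 0:
        raise ValueError("link speed and frame size must be positive")
    bits_per_frame = frame_bytes * 8
    return link_mbps * 1_000_000 / bits_per_frame


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbps: about {interrupts_per_second(mbps):,.0f} interrupts/s")
```

Smaller frames raise the rate proportionally, so a link carrying many short packets can demand far more interrupts per second than these full-frame figures suggest.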

Another important measure of a server is how quickly it can get data from RAM or disk out to the network interface. This involves copying data from one place in memory to another, which is typical of server activity. Data is copied from the server's memory to the network interface card memory. Given a 1500-byte outgoing Ethernet packet, the OS must copy it (probably 4 bytes at a time) from RAM or cache out to the NIC buffer, so this copy would require 375 bus cycles to complete. The bcopy or memcpy library calls are often used here, so the efficiency of your server's implementation of these library calls is significant. This is also where the implementation of TCP/IP in your kernel becomes significant. If you have a poor implementation, it probably means the wait between the NIC's interrupt and the retrieval of a packet from the NIC's buffer is large, so additional packets arriving on the NIC may not find sufficient buffer space and may be dropped or overrun data in the buffer. This results in a costly retransmission of the lost packet.
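The 375-cycle figure follows directly from dividing the packet size by the transfer width. The sketch below reproduces that arithmetic; it assumes one transfer per bus cycle with no wait states, and the 33MHz clock used for the timing estimate is only an illustrative choice. The function names are hypothetical.

```python
def bus_cycles_to_copy(packet_bytes: int, bytes_per_cycle: int = 4) -> int:
    """Bus cycles needed to copy a packet, rounding up any partial transfer."""
    if packet_bytes < 0 or bytes_per_cycle < 1:
        raise ValueError("packet size must be >= 0 and width >= 1")
    # Ceiling division without floating point.
    return -(-packet_bytes // bytes_per_cycle)


def copy_time_microseconds(cycles: int, bus_mhz: float) -> float:
    """Time for `cycles` bus cycles at `bus_mhz`, assuming one transfer per cycle."""
    return cycles / bus_mhz


if __name__ == "__main__":
    cycles = bus_cycles_to_copy(1500, 4)
    print(cycles, "cycles")
    print(f"{copy_time_microseconds(cycles, 33):.2f} microseconds at 33MHz")
```

Doubling the transfer width to 8 bytes halves the cycle count (rounded up to 188), which is one reason wider busses matter for copy-heavy server workloads.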

You will get the best performance from the most recent network cards. Many network cards can now be upgraded by loading new code into their flash memory. The latest non-beta release of this code should give you the best performance.

It is possible to sidestep the use of the CPU for retrieving NIC buffer data by using a "busmastering" NIC, which is capable of moving data directly between the NIC buffer and the machine's memory without interrupting the processor. Busmastering cards have a performance advantage over non-busmastering cards, but are more expensive, because they need more on-card intelligence. Intel has specified a method for interfacing NICs directly to PC hard disk, called the I2O specification, which will need operating system support. I2O should be available by the time you read this.


A bus is a set of parallel wires (usually 32, 64, 128, or 256 wires, plus error and protocol handling wires) embedded in a board forming the backbone of the computer. Other components, including CPU, disk, memory, and network cards, are connected to each other by their shared bus.

There may be more than one bus in a computer. PCs may have only one bus connecting everything. Server hardware, however, typically has at least two separate busses: a high-speed bus for connecting memory to the CPU, and a slower bus for connecting I/O to the CPU. System busses lag CPU speed by a large margin, meaning that CPUs spend a great many cycles simply sitting and waiting for the bus to catch up. On the other hand, busses are usually faster than network connections. As already mentioned, this has been changing recently. Fast Ethernet, for example, runs at 100Mbps, which is more than ISA or EISA busses can handle. Gigabit Ethernet runs at 1000Mbps, which is even more of a challenge. At gigabit rates, the server bus and CPU generally become the bottleneck, especially if the CPU is trying to do database access or run CGI applications at the same time.

While a throughput of 1056Mbps from a 32-bit 33MHz PCI bus is technically possible, your true throughput will be far lower because of contention, network packet overhead, OS implementation, and many other issues. 10Mbps is good TCP/IP throughput for a PC. A Sun Ultra 1 should get much better than 40Mbps of TCP/IP throughput. (The advertised rates you see will be the far higher theoretical rates.) The 66MHz PCI bus exceeds memory access speeds, moving the bottleneck to RAM.
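The theoretical figure is simply bus width multiplied by clock rate, and comparing it with the practical TCP/IP numbers shows how wide the gap is. This is a minimal sketch with hypothetical function names; it treats the clock as exactly 33MHz or 66MHz and assumes one transfer per cycle.

```python
def peak_bus_mbps(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bus throughput in Mbps: one full-width transfer per cycle."""
    return width_bits * clock_mhz


def fraction_of_peak(observed_mbps: float, peak_mbps: float) -> float:
    """Observed throughput as a fraction of the theoretical peak."""
    return observed_mbps / peak_mbps


if __name__ == "__main__":
    pci_32_33 = peak_bus_mbps(32, 33)   # the 32-bit 33MHz case from the text
    pci_64_66 = peak_bus_mbps(64, 66)   # the 64-bit 66MHz case
    print(pci_32_33, pci_64_66)
    # The practical TCP/IP figures quoted in the text, relative to 32-bit PCI:
    for observed in (10, 40):
        print(f"{observed} Mbps is {fraction_of_peak(observed, pci_32_33):.1%} of peak")
```

On these assumptions, 10Mbps of real TCP/IP throughput is under 1 percent of the 32-bit 33MHz peak, which illustrates why advertised bus rates say little about delivered network performance.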

Multiple PCI busses, provided on some Compaq PCs, may give you parallel access to peripheral devices. Sun uses the IEEE 1496 standard for its peripheral SBus, but recently started building machines with PCI peripheral busses, so you can use off-the-shelf PCI cards if you install Sun-specific device drivers. Sun implements 64-bit PCI at 66MHz for the throughput needed for 622Mbps ATM, gigabit Ethernet, and Fibrechannel....

Meet the Author

Patrick Killelea currently works for a major on-line brokerage, but he won't say which one. He spends his days writing monitoring and load testing tools, and proclaiming the web to be the one true front end because of its simplicity, portability, and performance. He thinks Microsoft is not to be trusted with your back end. Patrick knows there are huge web performance improvements yet to be realized using the details of existing open protocols. He is a fan of T/TCP and hopes one day to set up a connection and deliver an entire web page all in a single packet. Patrick spends his evenings playing with his wife and kids, and is interested in etymologies, obscure religions, and pan-seared salmon with mixed greens and a nice merlot. He likes to get e-mail about web and Java performance issues. Please visit his web site at patrick.net.
