Web Performance Tuning: Speeding up the Web

by Patrick Killelea

Paperback (Second Edition)

$44.99 

Overview

As long as there's been a Web, people have been trying to make it faster. The maturation of the Web has meant more users, more data, more features, and consequently longer waits on the Web. Improved performance has become a critical factor in determining the usability of the Web in general and of individual sites in particular. Web Performance Tuning, 2nd Edition is about getting the best possible performance from the Web. This book isn't just about tuning web server software; it's also about streamlining web content, getting optimal performance from a browser, tuning both client and server hardware, and maximizing the capacity of the network itself. Web Performance Tuning hits the ground running, giving concrete advice for quick results — the "blunt instruments" for improving crippled performance right away. The book then shifts gears to give a conceptual background of the principles of computing performance. The latter half of the book examines each element of a web transaction — from client to network to server — to find the weak links in the chain and show how to strengthen them. In this second edition, the book has been significantly expanded to include:
  • New chapters on Web site architecture, security, reliability, and their impact on performance
  • Detailed discussion of scalability of Java on multi-processor servers
  • Perl scripts for writing web performance spiders that handle logins, cookies, SSL, and more
  • Detailed instructions on how to use Perl DBI and the open source program gnuplot to generate performance graphs on the fly
  • Coverage of rstat, a Unix-based open source utility for gathering performance statistics remotely
In addition, the book includes many more examples and graphs of real-world performance problems and their solutions, and has been updated for Java 2. This book is for anyone who has waited too long for a web page to display, or watched the servers they manage slow to a crawl. It's about making the Web more usable for everyone.

Product Details

ISBN-13: 9780596001728
Publisher: O'Reilly Media, Incorporated
Publication date: 03/28/2002
Edition description: Second Edition
Pages: 480
Product dimensions: 7.00(w) x 9.19(h) x 1.14(d)

About the Author

Patrick Killelea currently works for a major on-line brokerage, but he won't say which one. He spends his days writing monitoring and load testing tools, and proclaiming the web to be the one true front end because of its simplicity, portability, and performance. He thinks Microsoft is not to be trusted with your back end. Patrick knows there are huge web performance improvements yet to be realized using the details of existing open protocols. He is a fan of T/TCP and hopes one day to set up a connection and deliver an entire web page all in a single packet. Patrick spends his evenings playing with his wife and kids, and is interested in etymologies, obscure religions, and pan-seared salmon with mixed greens and a nice merlot. He likes to get e-mail about web and Java performance issues. Please visit his web site at patrick.net.

Read an Excerpt


Chapter 11: Server Hardware

Here we revisit computer hardware from the server perspective. Even though each client receives exactly as many bytes as the server sends, the server hardware needs to be more powerful than client hardware because the servers must be capable of handling many clients simultaneously. On the other hand, it is common for small web sites to overestimate just how much server power they really need. If your server is handling only one client every several seconds, then you can probably make do with the same hardware that would make a good web client. For the majority of sites, the network connection is more likely than server hardware to be the limiting factor.

Server tuning is the subject of many entire books, and the subject is much larger than I can present in a single chapter. For in-depth detail, some good books on the subject are: System Performance Tuning, by Mike Loukides (O'Reilly & Associates); Sun Performance and Tuning, 2nd Edition, by Adrian Cockcroft (Prentice Hall); Configuration and Capacity Planning for Solaris Servers, by Brian Wong (Prentice Hall); and Optimizing Windows NT, by Russ Blake (Microsoft Press).

How Server Hardware Is Different

Box on a Wire

A web server is essentially remote storage that copies data from its RAM or disk to the network connection upon request. It may not be a simple copy, since dynamic content or database access may be involved, but from the user's point of view, your web server is just one more mass storage device. Now, does a disk drive have a windowing system? No. Similarly, your web server does not need a windowing system, a video card, a monitor, or even a keyboard! In fact, a windowing system occupies a great deal of RAM and CPU time, so it is a drain on server performance. You don't have any choice if you're using a Windows or Mac web server, but on Unix systems you can simply turn off X Windows. NT and Unix have an additional reason not to use a windowing system: the currently active window has a higher execution priority than other processes. On Solaris, for example, processes belonging to the currently selected window are bumped up in priority by 10 points (out of 100 or so). If you're not very careful with the windowing system, you can hurt web server performance simply by moving the mouse. It is better to do web server administration remotely over one or more telnet sessions from a different computer.

Web servers without monitors are known as headless servers.

Good I/O

The fundamental distinguishing feature of server hardware is high-performance I/O. Commodity PC hardware is limited by its legacy I/O subsystem, while server hardware is designed around I/O and can easily have ten times the I/O performance of the best PCs.

Multiple Busses

Servers usually have separate busses for L2 cache, I/O, RAM, and peripherals. This reduces contention and allows the use of appropriate hardware for each bus. Server busses may be packet switched, in the sense that a request is made over the bus and the bus is released until the response is ready, allowing requests to be interleaved with responses and improving throughput. Bus throughput is critical for servers, because a great deal of what a server does is simply copy data between network devices and storage devices.

Fast Disks

Servers should have separate high-speed SCSI disks for content and logging. IDE disks are not acceptable. Striping data over disk arrays is highly recommended to allow seeks to proceed in parallel.
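
To illustrate why striping lets seeks proceed in parallel, here is a minimal sketch of how a striped array might map logical blocks onto disks. The round-robin layout, the one-block stripe unit, the four-disk array, and the name stripe_location are all assumptions made for illustration; real controllers use larger stripe units and their own layouts.

```python
from typing import Tuple


def stripe_location(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes simple round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive logical blocks spread over a hypothetical four-disk array.
layout = [stripe_location(block, 4) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Under this layout, any four consecutive blocks land on four different disks, so a sequential read can keep every disk arm busy at once instead of queueing all requests on one spindle.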

Lots of Memory

Servers should have large amounts of RAM to reduce disk accesses. A good rule is to allow enough RAM to hold the complete OS and the most frequently accessed parts of your data set. Servers also tend to have large L1 and L2 caches, and may have the cache split between data and instruction caches, because data and instructions have different access patterns. The only memory faster than L1 cache is the set of registers on the CPU. Many megabytes of L2 cache is becoming common.

Unfortunately, the effectiveness of caching for server CPUs is reduced by the context switching that happens with every network interrupt. httpd code and network handling code displace each other from the caches.

Scalability

A server should be scalable to smoothly handle an increasing workload. Unix workstations have far more capacity for scaling by adding CPUs and RAM than PCs. Unix workstations scale up to 64 or 128 CPUs, depending on whom you ask, while PC hardware cannot generally handle the contention between more than 4 CPUs, as of this writing. Workstations also have better I/O bandwidth and more RAM expandability.

Network Interface Card

The Network Interface Card (NIC) provides the connection between the network cable and the server's bus. NICs fill a conceptually simple niche, but their variety reflects the many permutations possible between network cable, cable signalling protocol, and host computer bus. NICs take an incoming serial stream of bits and output a parallel stream onto the bus, and vice versa. Until recently, it could be assumed that the network connection would be far slower than the CPU and bus, but LAN network speeds have been increasing faster than CPU and bus speeds, so it is no longer a safe bet that your network card can be handled by your machine. Still, at the interface to the Internet, you can be fairly sure that your server will be more constrained by Internet access than by any other component, save perhaps disk.

NICs have on-board buffers, and a bigger buffer always gives you more flexibility. The buffer has historically been important for holding outgoing data until the network can deal with it all, but as mentioned, that situation is reversing, so the buffers will in the future tend to hold incoming data, waiting for the computer. In either case, a larger buffer makes a buffer overflow and consequent data loss less likely. Lost TCP/IP data is simply retransmitted, adding to overhead. Typically, 8-bit Ethernet cards have 8K buffers, while 16-bit cards have 16K buffers.
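
The effect of buffer size on data loss can be sketched with a toy simulation. This is only a rough model under stated assumptions: packets are 1500 bytes, they arrive in bursts of eight, the host drains a fixed 3000 bytes per tick, and anything that does not fit is dropped. The burst pattern, the drain rate, and the name dropped_packets are hypothetical; only the 8K and 16K buffer sizes come from the text.

```python
def dropped_packets(arrivals, buffer_bytes, drain_bytes_per_tick, packet_bytes=1500):
    """Count packets dropped by a fixed-size buffer.

    arrivals is a list giving the number of packets arriving at each tick.
    Within a tick, packets are queued first (or dropped if they do not fit),
    then the host drains up to drain_bytes_per_tick from the buffer.
    """
    queued = 0
    dropped = 0
    for count in arrivals:
        for _ in range(count):
            if queued + packet_bytes <= buffer_bytes:
                queued += packet_bytes
            else:
                dropped += 1  # buffer overflow: this packet must be retransmitted
        queued = max(0, queued - drain_bytes_per_tick)
    return dropped


# Two hypothetical bursts of eight packets, four ticks apart.
burst = [8, 0, 0, 0, 8, 0, 0, 0]
for size in (8 * 1024, 16 * 1024):
    print(f"{size // 1024}K buffer: {dropped_packets(burst, size, 3000)} packets dropped")
```

In this particular scenario the smaller buffer cannot absorb a full burst while the larger one can, which is the sense in which a larger buffer makes overflow less likely.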

When a NIC has a complete unit of data from the network and is ready to forward it on to the computer's bus, it generates a hardware interrupt, which forces the CPU to save its current state and run the network card interrupt handler, which retrieves the data from the NIC's buffer and fills a data structure in memory. Therefore, a critical performance factor is how many interrupts per second the CPU, memory, and bus can handle from the NIC.
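
For a rough feel of the interrupt rates involved, the following sketch estimates interrupts per second at a given link speed. It assumes one interrupt per full-sized 1500-byte frame, a saturated link, and no framing overhead; the function name interrupts_per_second is hypothetical, and real drivers and cards may behave differently.

```python
def interrupts_per_second(link_bits_per_second, frame_bytes=1500):
    """Estimate interrupts per second, assuming one interrupt per frame
    on a fully loaded link and ignoring framing overhead."""
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    return link_bits_per_second / (frame_bytes * 8)


# Link speeds in bits per second: 10Mbps, 100Mbps, and 1000Mbps.
for mbps in (10, 100, 1000):
    rate = interrupts_per_second(mbps * 1_000_000)
    print(f"{mbps}Mbps link: about {rate:.0f} interrupts per second")
```

Smaller frames make matters worse, since the same link speed then delivers more frames, and therefore more interrupts, per second.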

Another important measure of a server is how quickly it can get data from RAM or disk out to the network interface. This involves copying data from one place in memory to another, which is typical of server activity. Data is copied from the server's memory to the network interface card memory. Given a 1500-byte outgoing Ethernet packet, the OS must copy it (probably 4 bytes at a time) from RAM or cache out to the NIC buffer, so this copy would require 375 bus cycles to complete. The bcopy or memcpy library calls are often used here, so the efficiency of your server's implementation of these library calls is significant. This is also where the implementation of TCP/IP in your kernel becomes significant. If you have a poor implementation, it probably means the wait between the NIC's interrupt and the retrieval of a packet from the NIC's buffer is large, so additional packets arriving on the NIC may not find sufficient buffer space and may be dropped or overrun data in the buffer. This results in a costly retransmission of the lost packet.
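
The 375-cycle figure is simple division, and the same arithmetic shows how a wider transfer would change it. This sketch assumes one transfer per bus cycle, as the text implies; the function name bus_cycles_for_copy and the 8-byte comparison are illustrative additions.

```python
def bus_cycles_for_copy(packet_bytes, bytes_per_cycle):
    """Bus cycles needed to copy a packet, assuming one transfer per cycle.

    Rounds up, because a partial final transfer still costs a full cycle.
    """
    if bytes_per_cycle <= 0:
        raise ValueError("bytes per cycle must be positive")
    return -(-packet_bytes // bytes_per_cycle)  # integer ceiling division


# The case from the text: a 1500-byte packet copied 4 bytes at a time.
print(bus_cycles_for_copy(1500, 4))
# A hypothetical 8-byte-wide transfer, for comparison.
print(bus_cycles_for_copy(1500, 8))
```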

You will get the best performance from the most recent network cards. Many network cards can now be upgraded by loading new code into their flash memory. The latest non-beta release of this code should give you the best performance.

It is possible to sidestep the use of the CPU for retrieving NIC buffer data by using a "busmastering" NIC, which is capable of moving data directly between the NIC buffer and the machine's memory without interrupting the processor. Busmastering cards have a performance advantage over non-busmastering cards, but are more expensive, because they need more on-card intelligence. Intel has specified a method for interfacing NICs directly to PC hard disk, called the I2O specification, which will need operating system support. I2O should be available by the time you read this.

Bus

A bus is a set of parallel wires (usually 32, 64, 128, or 256 wires, plus error and protocol handling wires) embedded in a board forming the backbone of the computer. Other components, including CPU, disk, memory, and network cards, are connected to each other by their shared bus.

There may be more than one bus in a computer. PCs may have only one bus connecting everything. Server hardware, however, typically has at least two separate busses: a high-speed bus for connecting memory to the CPU, and a slower bus for connecting I/O to the CPU. System busses lag CPU speed by a large margin, meaning that CPUs spend a great many cycles simply sitting and waiting for the bus to catch up. On the other hand, busses are usually faster than network connections. As already mentioned, this has been changing recently. Fast Ethernet, for example, runs at 100Mbps, which is more than ISA or EISA busses can handle. Gigabit Ethernet runs at 1000Mbps, which is even more of a challenge. At gigabit rates, the server bus and CPU generally become the bottleneck, especially if the CPU is trying to do database access or run CGI applications at the same time.

While a throughput of 1056Mbps from a 32-bit 33MHz PCI bus is technically possible, your true throughput will be far lower because of contention, network packet overhead, OS implementation, and many other issues. 10Mbps is good TCP/IP throughput for a PC. A Sun Ultra 1 should get much better than 40Mbps of TCP/IP throughput. (The advertised rates you see will be the far higher theoretical rates.) The 66MHz PCI bus exceeds memory access speeds, moving the bottleneck to RAM.
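
The theoretical figure is just bus width multiplied by clock rate, and comparing it with the measured TCP/IP numbers shows how large the gap is. The sketch below assumes one transfer per clock cycle; the function name peak_mbps is hypothetical, and the 10Mbps and 40Mbps figures are the ones quoted above.

```python
def peak_mbps(width_bits, clock_mhz):
    """Theoretical peak bus throughput in Mbps, assuming one transfer per clock."""
    return width_bits * clock_mhz


pci_32_33 = peak_mbps(32, 33)  # the 32-bit 33MHz PCI bus from the text
pci_64_66 = peak_mbps(64, 66)  # 64-bit PCI at 66MHz, for comparison
print(f"32-bit 33MHz PCI peak: {pci_32_33}Mbps")
print(f"64-bit 66MHz PCI peak: {pci_64_66}Mbps")

# Measured TCP/IP throughput as a share of the 32-bit 33MHz theoretical peak.
for label, measured in (("PC", 10), ("Sun Ultra 1", 40)):
    share = measured / pci_32_33
    print(f"{label}: {measured}Mbps is {share:.1%} of the theoretical peak")
```

Even the better of the two measured figures is only a few percent of the theoretical peak, which is why advertised bus rates say little about the throughput a server will actually deliver.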

Multiple PCI busses, provided on some Compaq PCs, may give you parallel access to peripheral devices. Sun uses the IEEE 1496 standard for its peripheral SBus, but recently started building machines with PCI peripheral busses, so you can use off-the-shelf PCI cards if you install Sun-specific device drivers. Sun implements 64-bit PCI at 66MHz for the throughput needed for 622Mbps ATM, gigabit Ethernet, and Fibrechannel....

Table of Contents

  • Preface
  • Preliminary Considerations
    • Chapter 1: The Quick and the Dead
    • Chapter 2: Web Site Architecture
    • Chapter 3: Capacity Planning
    • Chapter 4: Performance Monitoring
    • Chapter 5: Load Testing
    • Chapter 6: Performance Analysis
    • Chapter 7: Reliability
    • Chapter 8: Security
    • Chapter 9: Case Studies
    • Chapter 10: Principles and Patterns
  • Tuning in Depth
    • Chapter 11: Browsers
    • Chapter 12: Client Operating System
    • Chapter 13: Client Hardware
    • Chapter 14: Lines and Terminators
    • Chapter 15: Network Protocols
    • Chapter 16: Server Hardware
    • Chapter 17: Server Operating System
    • Chapter 18: Server Software
    • Chapter 19: Content
    • Chapter 20: Custom Applications
    • Chapter 21: Java
    • Chapter 22: Databases
  • Web Performance Product Lists and Reviews
  • Colophon