HTTP Pocket Reference
This book describes HTTP, the Hypertext Transfer Protocol. It provides a high-level description of how the protocol works, along with reference information on client requests and server responses. Included are dumps of HTTP transactions, as well as tabular data that summarizes most of the standardized parameters used in HTTP.
The HTTP Pocket Reference is intended for system administrators, web site developers, and software engineers. With an understanding of HTTP, system administrators will have a better understanding of web site configuration and debugging. Web site designers can implement services that make better use of the protocol and streamline web client and server interaction. Software engineers who need to implement HTTP will find this book useful for its short, concise description of the protocol.
What Is HTTP?
HTTP is the protocol behind the World Wide Web. With every web transaction, HTTP is invoked. HTTP is behind every request for a web document or graphic, every click of a hypertext link, and every submission of a form. The Web is about distributing information over the Internet, and HTTP is the protocol used to do so.
HTTP is useful because it provides a standardized way for computers to communicate with each other. HTTP specifies how clients request data, and how servers respond to these requests. By understanding how HTTP works, you'll be able to:
- Manually query web servers and receive low-level information that typical web browsers hide from the user. With this information, you can better understand the configuration and capabilities of a particular server, and debug configuration errors with the server or programming errors in programs invoked by the web server.
- Understand the interaction between web clients (browsers, robots, search engines, etc.) and web servers.
- Streamline web services to make better use of the protocol.
This section presents an example of a common web transaction, showing the HTTP exchanged between the client and server program.
Given the following URL:
http://hypothetical.ora.com:80/
The browser interprets the URL as follows:
http://
Use HTTP, the Hypertext Transfer Protocol.
hypothetical.ora.com
Contact a computer over the network with the hostname of hypothetical.ora.com.
:80
Connect to the computer at port 80. The port number can be any legitimate IP port number: 1 through 65535, inclusively.* If the colon and port number are omitted, the port number is assumed to be HTTP's default port number, which is 80.
/
Anything after the hostname is regarded as a document path. In this example, the document path is /.
* Assuming IP version 4 addressing, which is the most common version of IP currently in use.
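The interpretation above can be reproduced with a few lines of Python. This is only a sketch: the URL string is assembled from the components just described, the helper name interpret_url is made up for illustration, and the parsing is delegated to the standard library's urllib.parse.urlsplit rather than to anything a real browser uses.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # HTTP's default port, used when the URL omits one


def interpret_url(url):
    """Split a URL into the pieces a browser acts on (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        # urlsplit reports None when no port is written in the URL
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        # an empty path is treated here as the root document
        "path": parts.path or "/",
    }


print(interpret_url("http://hypothetical.ora.com:80/"))
print(interpret_url("http://hypothetical.ora.com"))
```

Both calls should report the same scheme, hostname, port, and path, since the second URL simply leaves the port and document path to their defaults.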
So the browser connects to hypothetical.ora.com on port 80 using the HTTP protocol. The message that the browser sends to the server is:
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE
Host: hypothetical.ora.com
Connection: Keep-Alive
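To make the shape of such a message concrete, here is a minimal sketch that assembles a request as bytes. The function name build_request is hypothetical, and the Accept and User-Agent headers are left out because their values are cut short in this excerpt. On the wire, each line of an HTTP/1.1 request ends with a carriage return and line feed, and an empty line marks the end of the headers.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble an HTTP request message as bytes (illustrative helper)."""
    lines = [f"{method} {path} {version}"]
    lines.extend(f"{name}: {value}" for name, value in headers)
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/",
    [
        ("Accept-Language", "en-us"),
        ("Accept-Encoding", "gzip, deflate"),
        ("Host", "hypothetical.ora.com"),
        ("Connection", "Keep-Alive"),
    ],
)
print(request.decode("ascii"))
```

Bytes like these are what a client would write to the TCP connection once it has connected to the server's port.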
Let's look at what these lines are saying:
1. The first line of this request (GET / HTTP/1.1) requests a document at / from the server. HTTP/1.1 is given as the version of the HTTP protocol that the browser uses.
2. The second line tells the server what kind of documents are accepted by the browser.
3. The third line indicates that the preferred language is English. This header allows the client to specify a preference for one or more languages, in the event that a server has the same document in multiple languages.
4. The fourth line indicates that the client understands how to interpret a server response that is compressed with the gzip or deflate algorithm.
5. In the fifth line, beginning with the string User-Agent, the client identifies itself as Mozilla version 4.0, running on Windows NT. In parentheses it mentions that it is really Microsoft Internet Explorer version 5.01.
6. The sixth line tells the server what the client thinks the server's hostname is. This header is mandatory in HTTP 1.1, but optional in HTTP 1.0. Since the server may have multiple hostnames, the client indicates which hostname is being requested. In this environment, a web server can have a different document tree for each hostname assigned to it. If the client hasn't specified the server's hostname, the server may be unable to determine which document tree to use.
7. The seventh line (Connection:) tells the server to keep the TCP connection open until explicitly told to disconnect. Under HTTP 1.1, the default server behavior is to keep the connection open until the client specifies that the connection should be closed. The standard behavior in HTTP 1.0 is to close the connection after the client's request. See the discussion under "Persistent Connections" later in this book for details.
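The server-side decisions described for the Host and Connection headers can be sketched roughly as follows. Everything named here is an assumption made for illustration: the hostname-to-directory table, the fallback document tree, and both function names are invented, and real servers handle more cases (for example, several tokens in one Connection header). Header names are looked up in lowercase because HTTP header names are case-insensitive.

```python
# Hypothetical mapping of hostnames to document trees; the paths are made up.
DOCUMENT_TREES = {
    "hypothetical.ora.com": "/var/www/hypothetical",
    "another.example": "/var/www/another",
}
DEFAULT_TREE = "/var/www/default"  # assumed fallback when no Host is usable


def choose_document_tree(headers):
    """Pick a document tree from the Host header (keys are lowercase)."""
    host = headers.get("host")
    if host is None:
        # Without a Host header the server cannot tell the hostnames apart.
        return DEFAULT_TREE
    hostname = host.partition(":")[0].lower()  # drop an optional :port
    return DOCUMENT_TREES.get(hostname, DEFAULT_TREE)


def keep_connection_open(version, headers):
    """Apply the default persistence rule for each protocol version."""
    connection = headers.get("connection", "").lower()
    if version == "HTTP/1.1":
        # HTTP 1.1: stay open unless the client asks to close.
        return connection != "close"
    # HTTP 1.0: close unless the client asks to keep the connection alive.
    return connection == "keep-alive"


headers = {"host": "hypothetical.ora.com", "connection": "Keep-Alive"}
print(choose_document_tree(headers))
print(keep_connection_open("HTTP/1.1", headers))
```

Falling back to a default tree is one possible policy; a server could just as well reject a request that arrives without a Host header.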
Together, these seven lines constitute a request. Lines two through seven are request headers. The section "Headers" discusses each header in more detail...
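The split between the first line and the request headers can be illustrated with a small parser. This is a simplified sketch, not a complete HTTP parser: parse_request is a made-up name, the sample message leaves out the two headers whose values are cut short in this excerpt, and details such as repeated headers are ignored.

```python
def parse_request(raw):
    """Split a request message into its request line and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the empty line
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = (
    "GET / HTTP/1.1\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Host: hypothetical.ora.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(raw)
print(method, path, version)
for name, value in headers.items():
    print(f"{name}: {value}")
```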