Better the Nutshell than the Nuthouse
Back when the Earth was young, "PC" meant "personal computer" rather than "IBM compatible," and the World-Wide-Web was only a gleam in Tim Berners-Lee's eye, O'Reilly built its reputation by recruiting UNIX wizards and gurus to write thoughtful, carefully edited, highly structured "Nutshell" handbooks on complex, esoteric topics such as COFF, DNS, BIND, and sendmail. WebMaster in a Nutshell marks a return to those roots, but arrives in a less civilized age, when the pace of change is frantic, and the competition for shelf space and mind-share intense.
Definitive Guide, and Gundavaram's CGI Programming on the World Wide Web. As with other O'Reilly handbooks, there's wildlife on the cover, but this time it's a spider instead of some obscure vertebrate, and a rather nasty-looking rascal at that.
I suppose the relevance of the remaining two sections depends largely on what O'Reilly sees as the audience for this book. Certainly most people who call themselves WebMasters these days would have little occasion to turn to the HTTP and Server Management sections; routine operation of a commercial Web server rarely brings one into intimate contact with HTTP protocol issues, and the Server Management discussion is limited to UNIX HTTP daemons and O'Reilly's WebSite product for Windows NT and Windows-95.
After putting WebMaster in a Nutshell to the test in my own work environment over the last couple of months, my impression is that the authors tried to cover too much ground. The overall concept is reasonable, and the book is definitely useful in its present form, but I hope that O'Reilly will rethink the contents and audience carefully before releasing the next edition.
-- Dr. Dobb's Electronic Review of Computer Books
Read an Excerpt
Chapter 1: Introduction
This book is a compilation of some fairly diverse reference material. What links these topics is that they are crucial knowledge for today's webmaster in a Unix environment.
In this chapter, we give the world's quickest introduction to web technology and the role of the webmaster who breathes life into each web document. If you want to learn more about the history of the Web, how to make your web pages "cool," the social impact of the Internet, or how to make money online, this is the wrong book.
This is a book by impatient writers for impatient readers. We're less interested in the hype of the Web than we are in what makes it actually tick. We'll leave it to the pundits to predict the future of the Web or to declare today's technology already outdated. Too much analysis makes our heads spin; we just want to get our web sites online.
The Web in a Nutshell
We've organized this book in a roughly "outside-in" fashion, that is, with the outermost layer (HTML) first and the innermost layer (the server itself) last. But since it's a good idea for all readers to know how everything fits together, let's take a minute to breeze through a description of the Web from the inside out: no history, no analysis, just the technology basics.
Clients and Servers
The tool most people use on the Web is a browser, such as Netscape Navigator, Internet Explorer, Opera, Mosaic, or Lynx. Web browsers work by connecting over the Internet to remote machines, requesting specific documents, and then formatting the documents they receive for viewing on the local machine.
The language, or protocol, used for web transactions is Hypertext Transfer Protocol, or HTTP. The remote machines containing the documents run HTTP servers that wait for requests from browsers and then return the specified document. The browsers themselves are technically HTTP clients.
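The client and server roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the handler class, the `PAGE` document, and the `fetch_once` helper are hypothetical names, and the server is a throwaway one bound to the local machine rather than a real web site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello, Web</h1></body></html>"  # hypothetical document


class DocumentHandler(BaseHTTPRequestHandler):
    """Server side: wait for a request, then return the document."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def fetch_once(path="/products.html"):
    """Start a local server on a free port and fetch one document from it."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: connect, request a specific document, read the reply.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_once()
    print(status, content_type, body.decode())
```

A browser does the same thing as `fetch_once`, then goes one step further and formats the returned HTML for display.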
Uniform Resource Locators (URLs)
One of the most important things to grasp when working on the Web is the format for URLs. A URL is basically an address on the Web, identifying each document uniquely (for example, http://www.oreilly.com/products.html). Since URLs are so fundamental to the Web, we discuss them here in a little detail. The simple syntax for a URL is:
http://host/path
host
The host to connect to, e.g., www.oreilly.com or www.altavista.com. (While many web servers run on hosts beginning with www, the www prefix is just a convention.)
path
The document requested on that server. This is not the same as the filesystem path, as its root is defined by the server.
Most URLs you encounter follow this simple syntax. A more generalized syntax, however, is:
scheme://host/path/extra-path-info?query-info
scheme
The protocol that connects to the site. For web sites, the scheme is http; for FTP, the scheme is ftp.
extra-path-info and query-info
Optional information used by CGI programs. See Chapter 12, CGI Overview, for more information.
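To see these pieces concretely, here is a minimal sketch that splits a URL into its scheme, host, path, and query information using Python's standard `urllib.parse` module. The `describe_url` helper and the CGI-style URL in the usage lines are made up for illustration; they do not come from the book.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Split a URL into the parts discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,          # e.g. http or ftp
        "host": parts.hostname,          # the host to connect to
        "path": parts.path,              # document path, plus any extra path info
        "query": parse_qs(parts.query),  # query information as a dict of lists
    }


if __name__ == "__main__":
    print(describe_url("http://www.oreilly.com/products.html"))
    # Hypothetical CGI-style URL with query information:
    print(describe_url("http://www.oreilly.com/cgi-bin/search/books?topic=web"))
```

Note that a generic parser cannot tell where the document path ends and the extra path information begins. Only the server knows which leading portion of the path names a program, so both arrive here in the single `path` field.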
HTML documents also often use a "shorthand" for linking to other documents on the same server, called a relative URL. An example of a relative URL is images/webnut.gif. The browser knows to translate this into complete URL syntax before sending the request. For example, if http://www.oreilly.com/books/webnut.html contains a reference to images/webnut.gif, the browser reconstructs the relative URL as a full (or absolute) URL, http://www.oreilly.com/books/images/webnut.gif, and requests that document independently (if needed).
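The reconstruction a browser performs can be reproduced with `urljoin` from Python's standard library. This sketch uses the base document and relative URL from the example above; the last two relative forms are extra cases added here for illustration.

```python
from urllib.parse import urljoin

# The document that contains the link (from the example above).
base = "http://www.oreilly.com/books/webnut.html"

# A relative URL is resolved against the directory of the base document.
absolute = urljoin(base, "images/webnut.gif")
print(absolute)  # http://www.oreilly.com/books/images/webnut.gif

# Extra cases, not from the text: a leading slash resolves from the
# server root, and ".." climbs one directory level.
from_root = urljoin(base, "/images/webnut.gif")
parent = urljoin(base, "../index.html")
print(from_root)  # http://www.oreilly.com/images/webnut.gif
print(parent)     # http://www.oreilly.com/index.html
```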
Often in this book, you'll see us refer to a URI, not a URL. A URI (Uniform Resource Identifier) is a superset of URL, in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax in practice is URL; so while purists might complain, you can safely assume that "URI" is synonymous with "URL" and not go wrong (yet).
While web documents can conceivably be in any format, the universal standard is Hypertext Markup Language (HTML), a language for creating formatted text interspersed with images, sounds, animation, and hypertext links to other documents anywhere on the Web. Chapter 2, HTML Overview, through Chapter 8, Color Names and Values, cover the most current version of HTML.
In 1996, a significant extension to HTML was developed in the form of Cascading Style Sheets (CSS). Cascading Style Sheets allow web site developers to associate a number of style-related characteristics (such as font, color, spacing, etc.) with a particular HTML tag. This enables HTML authors to create a consistent look and feel throughout a set of documents. Chapter 9, Cascading Style Sheets, provides an overview of and a reference to CSS.
While HTML remains the widespread choice for web site development, there is also an heir apparent called XML (Extensible Markup Language). XML is a metalanguage that allows you to define your own document tags. While XML's development remains highly volatile, Chapter 10, XML, gives you the basics.
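As a small illustration of what "define your own document tags" means, the sketch below parses a document whose tags (`catalog`, `book`, `title`, `topic`) and `isbn` value are invented for this example. Nothing about this vocabulary comes from the book; it only shows that a generic XML parser can read tags the author made up.

```python
import xml.etree.ElementTree as ET

# A document using an invented tag vocabulary (hypothetical example data).
CATALOG = """\
<catalog>
  <book isbn="hypothetical-001">
    <title>WebMaster in a Nutshell</title>
    <topic>HTML</topic>
    <topic>CGI</topic>
  </book>
</catalog>
"""


def book_topics(xml_text):
    """Map each book title to the list of topics listed under it."""
    root = ET.fromstring(xml_text)
    return {
        book.findtext("title"): [topic.text for topic in book.findall("topic")]
        for book in root.findall("book")
    }


if __name__ == "__main__":
    print(book_topics(CATALOG))
```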
The HTTP Protocol
In between clients and servers is the network, which uses TCP (Transmission Control Protocol) and IP (Internet Protocol) to transmit data and find servers and clients. On top of TCP/IP, clients and servers use the HTTP protocol to communicate. Chapter 17, HTTP, gives details on the HTTP protocol, which you must understand to write CGI programs and server scripts, administer a web site, and handle just about any other part of working with a server.
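The layering can be made visible by skipping the HTTP library and writing the request directly onto a TCP connection. The sketch below is an assumption-laden illustration, not a recommended way to write a client: `HelloHandler` and `raw_http_get` are hypothetical names, and the server is a local throwaway that speaks HTTP/1.0 and closes the connection after each reply.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local server used only so the example has something to talk to."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def raw_http_get(path="/"):
    """Send an HTTP request by hand over a TCP socket and return the raw reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # TCP/IP layer: open a byte stream to the server's address and port.
        with socket.create_connection((host, port), timeout=5) as sock:
            # HTTP layer: the request is plain text written onto that stream.
            request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
            sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closed the connection: reply is complete
                    break
                chunks.append(data)
    finally:
        server.shutdown()
        server.server_close()
    return b"".join(chunks).decode("iso-8859-1")


if __name__ == "__main__":
    print(raw_http_get("/"))
```

The reply that comes back is also plain text: a status line, a block of headers, a blank line, and then the document itself.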
The runaway leader among Unix-based web servers is Apache. Chapter 18, Apache Configuration, deals with configuring Apache, while Chapter 19, Apache Modules, discusses the various Apache modules. Regardless of the type of server you're running, there are various measures you can take to maximize its efficiency. Chapter 20, Server Performance, describes a number of these server optimization techniques....