HTML, XHTML, and CSS Bible
By Bryan Pfaffenberger, Steven M. Schafer, Chuck White, and Bill Karow
John Wiley & Sons ISBN: 0-7645-5739-4
Chapter One
In This Chapter
Introducing the World Wide Web
How the Web Works
Where HTML Fits in
Creating an HTML Document
Introducing the Web and HTML
This chapter addresses the questions most people have when they're getting started with HTML/XHTML, such as what is the difference between HTML and XHTML, and when do Cascading Style Sheets (CSS) come into play? If you're already familiar with the basic concepts discussed here, you can get started with practical matters in Chapter 2. Still, I encourage you to at least skim this chapter, making sure you understand the very important distinction between structure and presentation (see What Is CSS?) and how HTML, XML, and XHTML are related (see What Is XHTML?).
What Is the World Wide Web?
The World Wide Web (the Web, for short) is a network of computers able to exchange text, graphics, and multimedia information via the Internet. By sitting at a computer that is attached to the Web, using either a dialup phone line or a much faster broadband connection (Ethernet, cable, or DSL), you can visit Web-connected computers next door, at a nearby university, or halfway around the world. And you can take full advantage of the resources these computers make available, including text, graphics, videos, sounds, and animation. Think of the Web as the multimedia version of the Internet, and you'll be right on the mark.
How Does the Web Work?
The computers that make all these Web pages available are called Web servers. On any computer that's connected to the Web, you can run an application called a Web browser. Technically, a Web browser is a Web client: a program that's able to contact a Web server and request information. When the Web server receives the request, it looks for the requested information within its file system and sends it out via the Internet.
Web servers and Web clients all speak a common "language," called Hypertext Transfer Protocol (HTTP). (HTTP isn't really a language like the ones people speak. It's a protocol: a set of rules or procedures that enables computers to exchange information over the Web.) Regardless of where these computers reside (China, Norway, or Austin, Texas), they can communicate with each other through HTTP.
The following illustrates how HTTP works (see Figure 1-1):
* Most Web pages contain hyperlinks, which are specially formatted words or phrases that enable you to access another page on the Web. Although the hyperlink usually doesn't make the address of this page visible, it contains all the information needed for your computer to request a Web page from another computer.
* When you click the hyperlink, your computer sends a message called an HTTP request. This message says, in effect, "Please send me the Web page that I want."
* The Web server receives the request, and looks within its stored files for the Web page you requested. When it finds the Web page, it sends it to your computer, and your Web browser displays it. If the page isn't found, you see an error message, which probably includes the HTTP code for this error: 404, "Not Found."
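The request-and-response exchange described above can be tried out on a single machine. The following is a minimal sketch in Python, not a production server: the page content, the paths, and the names PAGES, PageHandler, fetch, and demo are all made up for illustration. A tiny Web server runs in the background, and a client asks it for one page that exists and one that doesn't.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "stored files" for this sketch: request path -> page content.
PAGES = {"/index.html": b"<html><body>Hello, Web!</body></html>"}


class PageHandler(BaseHTTPRequestHandler):
    """Answers GET requests from the PAGES dictionary."""

    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # The requested page isn't stored here: reply with status 404.
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the sketch quiet


def fetch(port, path):
    """Send one HTTP GET request; return (status code, reason, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()


def demo():
    """Start the server, request a stored page and a missing one."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        found = fetch(server.server_port, "/index.html")
        missing = fetch(server.server_port, "/no-such-page.html")
    finally:
        server.shutdown()
        server.server_close()
    return found, missing


if __name__ == "__main__":
    for status, reason, body in demo():
        print(status, reason)
```

A real browser does the same thing on a larger scale: clicking a hyperlink triggers the request, and the status code in the reply tells the browser whether it received a page or an error.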
What Is Hypertext?
You probably noticed the word "hypertext" in the spelled-out version of HTTP, Hypertext Transfer Protocol. Originated by computing pioneer Theodore Nelson, the term "hypertext" doesn't mean "text that can't sit still," although some Web authors do use a much-despised HTML code that makes the text blink on-screen. Instead, the term is an analogy to a time-honored (but physically impossible) science fiction concept, the hyperspace jump, which enables a starship to go immediately from one star system to another. Hypertext is a type of text that contains hyperlinks (or just links for short), which enable the reader to jump from one hypertext page to another. You may also hear the word hypermedia. A hypermedia system works just like hypertext, except that it includes graphics, sounds, videos, and animation as well as text.
In contrast to ordinary text, hypertext gives readers the ability to choose their own path through the material that interests them. A book is designed to be read in sequence: Page 2 follows page 1, and so on. Sure, you can skip around, but books don't provide much help, beyond including an index. Computer-based hypertexts let readers jump around all they want. The computer part is important because it's hard to build a hypertext system out of physical media, such as index cards or pieces of paper.
The Web is a giant computer-based hypermedia system, and you've probably already done lots of jumping around from one page to another on the Web; it's called surfing. If one Web page doesn't seem all that interesting once you visit, you can click another link that seems more related to your needs (and so on). The Web makes surfing so easy that you'll need to give some thought to keeping people on your sites, keeping them engaged and interested, so they won't surf away!
Where Does HTML Fit In?
Hypertext Markup Language (HTML) enables you to mark up text so that it can function as hypertext on the Web. The term markup comes from printing; editors mark up manuscript pages with funny-looking symbols that tell the printer how to print the page. HTML consists of its own set of funny-looking symbols that tell Web browsers how to display the page. These symbols, called elements, include the ones needed to create hyperlinks.
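To make the idea of markup concrete, here is a small sketch in Python that reads a fragment of HTML and picks out its hyperlinks, much as a browser must do before it can make them clickable. The sample page, the example.com address, and the names LinkCollector, PAGE, and find_links are invented for this illustration; only the html.parser module comes from Python's standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (address, link text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # address of the link currently being read, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# A made-up page: the <a> element marks "next chapter" as a hyperlink.
PAGE = """<html>
<body>
<p>Read the <a href="http://www.example.com/chapter2.html">next chapter</a>.</p>
</body>
</html>"""


def find_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    for address, text in find_links(PAGE):
        print(text, "->", address)
```

Notice that the reader of the page sees only the words "next chapter"; the address travels inside the markup.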
The invention of HTML
HTML and HTTP were both invented by Tim Berners-Lee, who was then working as a computer and networking specialist at a Swiss research institute. He wanted to give the institute's researchers a simple markup language that would enable them to share their research papers via the Internet. Berners-Lee based HTML on Standard Generalized Markup Language (SGML), an international standard for marking up text for presentation on a variety of physical devices. The basic idea of SGML is that the document's structure should be separated from its presentation:
* Structure refers to the various components or parts of a document that authors create, such as titles, paragraphs, headings, and lists. For example, you're reading an item in an unordered list, as it is termed in SGML (most people use the more familiar bulleted list). In SGML, you mark up this item as a bulleted list, but you don't say anything about how it's supposed to look. That's left up to whatever device displays or prints the marked-up file.
* Presentation refers to the way these various components are actually displayed by a given media device, such as a computer or a printer. For example, this book displays this bulleted list item with an indentation and other special formatting.
What's so great about separating structure from presentation? There are several very important advantages:
* Authors usually aren't very good designers. It's wise, especially in large organizations, to let writers compose their documents, and let designers worry about how the documents are supposed to look. That's particularly true when an organization has a corporate look or style, such as Apple Computer's standard typeface, which you'll see in all of its documents. The designers make sure that every document produced within the organization conforms to that style. So SGML doesn't contain any features that control presentation.
* If markup consists of structure alone, the document's appearance can be changed quickly. All that's necessary is to change the presentation settings on whatever device is displaying the document.
* Documents containing only structural markup are much easier and cheaper to maintain. When presentation markup is included along with structural markup, the document becomes an unmanageable mess, and maintenance costs skyrocket.
* If a document contains only structural markup, it is more accessible to people with limited vision or other physical limitations. For example, a document marked up structurally might be printed by a Braille printer or read aloud by a text reader for those with limited vision.
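The separation described in this list can be sketched in a few lines of Python. Everything here is hypothetical and greatly simplified: the document is a list of (kind, text) pairs that record structure only, and each "device" supplies its own presentation rules. The names DOCUMENT, SCREEN_STYLE, SPEECH_STYLE, and present are assumptions made for the example, not part of SGML or HTML.

```python
# Structure only: what each part of the document is, not how it looks.
DOCUMENT = [
    ("heading", "Shopping"),
    ("list_item", "Milk"),
    ("list_item", "Bread"),
]

# Presentation lives elsewhere: one set of rules per output device.
SCREEN_STYLE = {"heading": "== {} ==", "list_item": "  * {}"}
SPEECH_STYLE = {"heading": "Heading: {}.", "list_item": "Item: {}."}


def present(document, style):
    """Render the same structural document using a device's style rules."""
    return "\n".join(style[kind].format(text) for kind, text in document)


if __name__ == "__main__":
    print(present(DOCUMENT, SCREEN_STYLE))
    print(present(DOCUMENT, SPEECH_STYLE))
```

Changing how every list item looks means editing one style rule; the document itself is never touched.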
Sounds great, right? Still, from the beginning, HTML didn't make the structure versus presentation distinction as clearly as SGML purists would have liked. And as HTML developed and the Internet became a commercial network, Web authors demanded more tools to make their documents look attractive on-screen. The companies that make Web browsers responded by introducing new, nonstandardized HTML elements that contained presentation information. By 1996, many Web experts were worried that HTML standards were spiraling out of control. The newly founded World Wide Web Consortium (W3C), hoping to keep at least some kind of standard in place, tried to standardize existing practices, including the mixing of presentation and structure. The result was the W3C's HTML 3.2 standard, which is still widely used. But organizations found that HTML 3.2 exposed them to excessive maintenance costs. The SGML purists were right: Structure and presentation should have been kept separate.
A short history of HTML
To date, HTML has gone through four major standards, the latest being 4.01. In addition to the HTML standards, Cascading Style Sheets and XML have also made valuable contributions to Web standards.
The following sections provide a brief overview of the various versions and technologies.
HTML 1.0
HTML 1.0 was never formally specified by the W3C because the W3C came along too late. HTML 1.0 was the original specification Mosaic 1.0 used, and it supported few elements. What you couldn't do on a page is more interesting than what you could do. You couldn't set the background color or background image of the page. There were no tables or frames. You couldn't dictate the font. All inline images had to be GIFs; JPEGs were used for out-of-line images. And there were no forms.
Every page looked pretty much the same: gray background and Times Roman font. Links were indicated in blue until you'd visited them, and then they were red. Because scanners and image-manipulation software weren't as available then, the image limitation wasn't a huge problem. HTML 1.0 was only implemented in Mosaic and Lynx (a text-only browser that runs under UNIX).
HTML 2.0
Huge strides forward were made between HTML 1.0 and HTML 2.0. An HTML 1.1 actually did exist, created by Netscape to support what its first browser could do. Because only Netscape and Mosaic were available at the time (both written under the leadership of Marc Andreessen), browser makers were in the habit of adding their own new features and creating names for HTML elements to use those features.
Between HTML 1.0 and HTML 2.0, the W3C also came into being, under the leadership of Tim Berners-Lee, founder of the Web. HTML 2.0 was a huge improvement over HTML 1.0. Background colors and images could be set. Forms became available with a limited set of fields, but nevertheless, for the first time, visitors to a Web page could submit information. Tables also became possible.
HTML 3.2
Why no 3.0? The W3C couldn't get a specification out in time for agreement by the members. HTML 3.2 was vastly richer than HTML 2.0. It included support for style sheets (CSS level 1). Even though CSS was supported in the 3.2 specification, the browser manufacturers didn't support CSS well enough for a designer to make much use of it. HTML 3.2 expanded the number of attributes that enabled designers to customize the look of a page (exactly the opposite of HTML 4). HTML 3.2 didn't include support for frames, but the browser makers implemented them anyway.
A page with two frames is actually processed like three separate pages within your browser. The outer page is the frameset. The frameset tells the browser which pages go where in the browser window. Implementing frames can be tricky, but frames can also be an effective way to implement a Web site. A common use for frames is navigation in the left pane and content in the right.
HTML 4.0
What does HTML 4.0 add? Not so much new elements (although those do exist) as a rethinking of the direction HTML is taking. Up until now, HTML has encouraged interjecting presentation information into the page. HTML 4.0 now clearly deprecates any uses of HTML that relate to forcing a browser to format an element a certain way. All formatting has been moved into the style sheets. With formatting information strewn throughout the pages, HTML 3.2 had reached a point where maintenance was expensive and difficult. This movement of presentation out of the document, once and for all, should facilitate the continued rapid growth of the Web.
Use the W3C's MarkUp Validation Service, available at validator.w3.org/, to check your HTML against most of the versions mentioned in this chapter.
XML 1.0
Extensible Markup Language (XML) was originally designed to meet the needs of large-scale electronic publishing. As such, it was designed to help separate structure from presentation and provide enough power and flexibility to be applicable in a variety of publishing applications. In fact, many modern word processing programs contain XML components or even export their documents in XML-compliant formats.
CSS 1.0 and 2.0
Cascading Style Sheets (CSS) were designed to help move formatting out of the HTML specification. Much like styles in a word processing program, CSS provides a mechanism to easily specify and change formatting without changing the underlying code. The "cascade" in the name comes from the fact that the specification allows multiple style sheets to interact, so that individual Web documents can be formatted slightly differently from their kin (following department document guidelines but still adhering to the company standards, for example). The second version of CSS (2.0) builds on the capabilities of the first version, adding more attributes and properties for a Web designer to draw upon.
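The company-and-department example can be sketched in Python as a simple merge of style sheets, where a later sheet overrides an earlier one property by property. This is a deliberately simplified model: the real CSS cascade also weighs factors such as where a rule comes from and how specific its selector is. The sheet contents and the names COMPANY_SHEET, DEPARTMENT_SHEET, and cascade are assumptions made for this illustration.

```python
# Hypothetical style sheets: selector -> {property: value}.
COMPANY_SHEET = {
    "h1": {"color": "navy", "font-family": "Helvetica"},
    "p": {"color": "black"},
}
DEPARTMENT_SHEET = {
    "h1": {"color": "maroon"},  # the department changes only the heading color
}


def cascade(*sheets):
    """Combine style sheets in order; later sheets override earlier ones."""
    result = {}
    for sheet in sheets:
        for selector, declarations in sheet.items():
            # Merge property by property so untouched settings survive.
            result.setdefault(selector, {}).update(declarations)
    return result


if __name__ == "__main__":
    combined = cascade(COMPANY_SHEET, DEPARTMENT_SHEET)
    for selector, declarations in combined.items():
        print(selector, declarations)
```

The department's headings turn maroon but keep the company typeface, and paragraphs still follow the company standard untouched.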
HTML 4.01
HTML 4.01 is a minor revision of the HTML 4.0 standard. In addition to fixing errors identified since the inception of 4.0, HTML 4.01 also provides the basis for meanings of XHTML elements and attributes, reducing the size of the XHTML 1.0 specification.
XHTML 1.0
Extensible HyperText Markup Language (XHTML) is the first specification for the HTML and XML cross-breed. XHTML was created to be the next generation of markup languages, infusing the standard of HTML with the extensibility of XML.
Excerpted from HTML, XHTML, and CSS Bible by Bryan Pfaffenberger, Steven M. Schafer, Chuck White, and Bill Karow. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.