Introduction
Why this book is needed
“I am left with the feeling that all of the sites I have created are 50% elegance, and 50% nasty kludge.”
This quote from a recent Slashdot discussion on PHP development resonated with the audience. Indeed, as many would attest, a web site usually starts simple but quickly grows into a complex, convoluted mess — where you are afraid of making a change for fear of breaking something else.
Why should a web site so quickly become a nightmare of unmaintainable code, visual and semantic inconsistencies, and outright errors embarrassingly visible to the whole world? Many reasons could be quoted, from limitations specific to the particular web development platforms (such as PHP, ASP, or Perl) to fundamental drawbacks of the “web site as an application” paradigm.
This book is devoted to one very important way in which the majority of today’s web sites are broken — and, of course, to the technology which (if correctly applied) can mend this breakage. The problem I’m speaking about is the lack of a consistently semantic and media-independent representation of web site content; the technology that can help you solve this problem is
Say what you mean.
say what you mean in content markup.
The word content is the key.
In fact, a lot of the approaches in this book apply not only to web sites but to any
as such, abstracting it not only from its presentation but from any processing requirements as well.
What you will find in this book
This is not a general
- structuring your web site content into cleanly separated semantic layers;
- developing a custom
- automatically validating both markup and structure of content;
- transforming content from source
- integrating the content markup and transformation system with existing web development frameworks and other software.
Building the backbone. The point of using
content from presentation; the above items cover the complete transition from the former to the latter. Simply put, we focus on developing the best source markup for your content and programming the most efficient transformation into your chosen presentation style.
Usability and portability. In a web development context, the term usability normally refers to how easy a web site is for a visitor to use. In this book, however, I would like to redefine this term by focusing on a different aspect of usability that is too often ignored: the usability of a web site for its developers, authors, editors, and maintainers. With the Web growing more and more collaborative, this aspect is becoming critical.
Another important theme of the book is portability. Again, this term usually describes a web site’s viewability and functionality across browsers and platforms. It is no less important, however, that before a web site gets to your browser, it must be developed and authored, often in different environments and on different platforms. We touch on this server-side aspect of portability with regard to the
Who this book is for
Everyone interested in web development or in practical
You need to have a basic knowledge of
How this book is organized
Perhaps you’ve already thumbed through the book, so you might have noticed that it breaks into three main parts. The first part is composed mostly of text and diagrams; the second features lots of example code; the last displays a number of screenshots. This sequence metaphorically reflects the path that we’ll follow from manipulating abstract notions, to writing practical markup and code, to launching and maintaining a final working web site.
Chapter 1 is mostly theoretical; we’ll spend some time discussing the basic premises of
The topics of this chapter include the principles behind using
Chapter 2 is dedicated to the foundation of an
source definition. This includes schemas for all document types used by the web site’s
In this chapter, we’ll look at different schema languages and discuss the implementation options for those parts of the source definition that a schema cannot handle. We will also examine the common generic markup constructs, the best approaches to their schematization, and a number of corresponding pitfalls. For instance, in this chapter you’ll find insights into the eternal “child elements vs. attributes” dilemma.
Chapter 3 is the practical complement to the previous chapter. Here, we’ll use the approaches of Chapter 2 to mark up some real web site source documents. Most common elements of web pages, such as text blocks, headings, links, and images, are considered. In most cases, existing standardized vocabularies that you can borrow from are mentioned.
Some important concepts of the book, such as abbreviating addresses, are introduced in this chapter. This is also the chapter where markup examples start appearing in large numbers, so if you prefer to learn by looking at examples, you might want to start your reading from this chapter. The last section of the chapter presents summary examples of a page document, a master document, and a Schematron schema that validates both types of documents.
Chapter 4 is the first of the two XSLT chapters. It is an introduction aimed at a developer who has had some experience with XSLT 1.0. Here, we’ll discuss some of the new stuff that is being introduced in XSLT 2.0 and XPath 2.0 as well as the existing XSLT extensions. A detailed analysis is devoted to the important issue of adapting traditional algorithms to XSLT, which is a functional language without an assignment operator.
Chapter 5 is the core of the book — the practical XSLT chapter and the largest of all of them. Lots of XSLT code examples show all aspects of an
This chapter not only uses but extends XSLT. We’ll see how a few simple Java classes may drastically advance the capabilities of an XSLT stylesheet. These extensions are used in Chapter 5 for all kinds of tasks, from generating bitmap images via SVG to batch processing all page documents of a site. Again, a section with complete listings of the stylesheet and related bits of code summarizes this chapter.
Chapter 6 is where the screenshots are. It is devoted to all kinds of software that will help you run your
Sections of this chapter discuss the existing
Chapter 7 is concerned with integrating the
Designing your own book is a mindbending experience (something that songwriters who author both music and lyrics would probably agree with). In my book, I tried to make the text look rich but consistent, pleasantly dense but varied. Some of the solutions that I came up with may deserve a few words.
Running in from aside. Three levels of numbered headings are used within each chapter. In addition, unnumbered bold run-in headings are often used (as in this paragraph) to break the text into even smaller, manageable pieces.
Semantically, the run-ins are closer to margin notes than headings; usually their goal is not to state the subject but to provide a remark, an aside, a metaphor related to the topic of discussion. Hopefully these run-in headings are memorable enough to serve as landmarks facilitating navigation.
Small but not least. Some paragraphs, with or without run-in headings, are set in a smaller type. They present material that may be skipped in the first reading without any damage to understanding. You can treat the smaller-type fragments as extended footnotes or sidebars.
Cross-references. Bold gray numbers (such as 3.9) refer to numbered sections of the book. The running headers and footers should make it easy to find the referenced sections; however, for references that jump especially far, page numbers are also provided.
Syntax coloring without colors. Unlike most computer books with code listings, this one makes use of a concept that has long been commonplace in text editors: syntax coloring. Of course, a black-and-white book page is not really capable of color (except for shades of gray), but instead it can freely use font faces that usually look nicer on paper than on a computer screen. Thus, I have attempted to make code in the book at least as readable as it is in a good text editor by consistently “coloring” syntactic constructions with different font faces.
Essential URLs. All web addresses are given in footnotes in an abbreviated form without http://, index.html, or trailing slashes.
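The abbreviation rule above can be sketched as a small function. This is only an illustration in Python, not the tool used to produce the book; the function name abbreviate_url and the sample address are hypothetical.

```python
def abbreviate_url(url: str) -> str:
    """Abbreviate a web address as described for the footnotes (illustrative sketch)."""
    # Drop the http:// scheme prefix, if present.
    if url.startswith("http://"):
        url = url[len("http://"):]
    # Drop a trailing index.html file name, but only as a whole path segment.
    if url.endswith("/index.html"):
        url = url[: -len("index.html")]
    # Drop any trailing slashes that remain.
    return url.rstrip("/")


# Hypothetical address, used only to show the three rules acting together.
short = abbreviate_url("http://www.example.org/docs/index.html")
```

Here short holds www.example.org/docs; an address that is already abbreviated passes through unchanged.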
Slash what? I use forward slashes (/) and not backslashes (\) as directory separators for both Windows and Unix (the latter including Mac OS X). The rationale is simple: Forward slashes are standard on Unix and in URLs, and most Windows tools understand both kinds of slashes anyway.
Notes on terminology
The terminology used in the book is basically standard. Sometimes I simplify the accepted terminology in order to make it more accessible, or I use my own terms instead of those used in authoritative sources; all such cases are noted. Some important terms that may appear confusing or are often misunderstood are commented on below.
Element type, element, or tag? When speaking of
element and an element type. Sometimes, a tag is also confused with an element. Note that an element cannot have a name — only an element type can; still, we can refer to an element by its element type name if we identify which of the elements of this type is in question. In the XSLT context, an element from the XSLT namespace (e.g., xsl:template) is often called an instruction.
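The distinction can be made concrete with a short sketch. It uses Python's standard xml.etree.ElementTree module purely for illustration; the sample markup and the names page and para are hypothetical and are not taken from the book's vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragment: two elements that share one element type.
SOURCE = "<page><para>First</para><para>Second</para></page>"

root = ET.fromstring(SOURCE)
paras = root.findall("para")

# There is a single element type name here ...
type_names = {p.tag for p in paras}

# ... but two distinct elements of that type, each with its own content.
texts = [p.text for p in paras]
distinct = paras[0] is not paras[1]
```

The name para belongs to the element type; to single out one element we have to say which one we mean, for example "the first para of the page".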
Stylesheet or transformation? The word stylesheet may be misleading when applied to an XSLT program that transforms one
transformation would be more appropriate. (Note that xsl:stylesheet and xsl:transform are both acceptable as the root element of an XSLT stylesheet.) Still, backed by tradition, I mostly use “stylesheet” or, sometimes, “transformation stylesheet” when referring to the XSLT component of a web site setup.
Stylesheet or style sheet? To avoid confusion with XSLT stylesheets, CSS style sheets are always spelled thus; this is conformant with both XSLT and CSS specifications.
Document, instance, page, or file? Document is a generic term, but I use it only to refer to
pages. Instance is another term often used in
file; a document is not necessarily stored in a file at all. Therefore, “file” is used only when real files, handled by the operating system, are involved.
Document element or root element? The XSLT specification uses the term document element with the meaning of root element. I use the latter term as more descriptive, even though it may be slightly confusing from an XSLT viewpoint because the “root node” of XPath (/) is the parent of the node corresponding to the “root element” (e.g., /page).
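The parent-child relationship described above can be checked with a short sketch. It assumes that the Document node of Python's standard xml.dom.minidom module is a fair stand-in for the XPath root node; the two are defined by different specifications, so treat this as an analogy rather than an exact equivalence. The sample markup is hypothetical.

```python
from xml.dom import minidom

# Hypothetical page document with a single root element named "page".
doc = minidom.parseString("<page><title>Home</title></page>")

# The Document node plays the role of the XPath root node (/).
root_node = doc

# The root element (the "document element" in XSLT terms) is its child.
root_element = doc.documentElement

root_element_name = root_element.tagName
parent_is_root_node = root_element.parentNode is root_node
root_node_has_parent = root_node.parentNode is not None
```

The root element is named page and has a parent, whereas the root node itself has neither a name of that kind nor a parent.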
is the W3C recommendation for a schema language. Unfortunately, its name is way too generic for its own good. Even the capital S in “Schema” cannot prevent confusion when you have to speak about
XSDL schemas to refer to specific schema definitions.
Yet another abbreviation you may have seen used for the same language is WXS, standing for W3C
URI or URL? This one may confuse even experts at times. URI is a more general term than URL, but the difference between them — i.e., those URIs that are not URLs — is so insignificant that for practical purposes, these terms are interchangeable. See RFC 2396 for more details.
HTML or XHTML? Since this book views HTML mostly as a result of an XSLT transformation, what I mean when speaking of HTML may actually be either HTML or XHTML (any versions). With XSLT, you can output both formats, and modern browsers do not have any problems with either. When there’s a meaningful distinction between HTML and XHTML, this is noted.
“Data is” or “data are”? Formally, data is the plural of datum. In modern English, however, using “data” as singular is more common, as evidenced by statistics reported by Internet search engines. In this book “data” is used as singular.
How this book was created
“Practice what you preach.” “Eat your own dogfood.” One way or the other, this book itself uses many of the techniques it describes.
The text of the book was written directly in
The design for the book was also created by me, with elements borrowed from the other books in the series that we worked on using the same