XML in a Nutshell: A Desktop Quick Reference

by Elliotte Rusty Harold, W. Scott Means

XML, the Extensible Markup Language, is a W3C endorsed standard for document markup. Because of its ability to deliver portable data, XML is positioned to be a key web application technology.

Given the complexity and incredible potential of this powerful markup language, it is clear that every serious developer using XML for data or text formatting and transformation will need a comprehensive, easy-to-access desktop reference in order to take advantage of XML's full potential. XML in a Nutshell will assist developers in formatting files and data structures correctly for use in XML documents.

XML defines a basic syntax used to mark up data with simple, human-readable tags, and provides a standard format for computer documents. This format is flexible enough to be customized for transforming data between applications as diverse as web sites, electronic data interchange, voice mail systems, and wireless devices, to name a few.

Developers can either write their own programs that interact with, massage, and manipulate the data in XML documents, or they can use off-the-shelf software like web browsers and text editors to work with XML documents. Either choice gives them access to a wide range of free libraries in a variety of languages that can read and write XML.

The XML specification defines the exact syntax this markup must follow: how elements are delimited by tags, what a tag looks like, what names are acceptable for elements, where attributes are placed, and so forth. XML doesn't have a fixed set of tags and elements that are supposed to work for everybody in all areas of interest for all time. It allows developers and writers to define the elements they need as they need them.

Although XML is quite flexible in the elements it allows to be defined, it is quite strict in many other respects. XML in a Nutshell covers the fundamental rules that all XML documents and authors must adhere to, detailing the grammar that specifies where tags may be placed, what they must look like, which element names are legal, how attributes attach to elements, and much more.

Editorial Reviews

The Barnes & Noble Review
Trying to capture all of XML in one compact book is like trying to capture the universe moments after the Big Bang: Things are expanding in an awful hurry. But this tutorial and reference comes close.

Elliotte Rusty Harold and W. Scott Means organize XML in a Nutshell -- and by extension XML itself -- into four sections. The first offers clear, to-the-point explanations of the key concepts every XML user and developer needs to understand -- from elements and attributes to well-formedness, DTDs to namespaces. Section II covers the XML technologies most widely used in what the authors call "narrative-centric" documents -- from web pages through gigantic Defense Department documentation manuals. Here's where they introduce XHTML, basic XSL Transformations, XPath, XLinks, XPointers, Cascading Style Sheets, and the evolving XSL Formatting Objects specification.

Section III addresses XML's role as the data format of choice for Internet-based information sharing and storage. Harold and Means introduce XML programming models, then offer quick introductions to both DOM and SAX.

Like most O'Reilly ...in a Nutshell books, XML in a Nutshell ends with a comprehensive reference section, presenting syntax, descriptions, attributes, and in the case of DOM, Java bindings and example code.

There are a few things missing -- for example, coverage of SOAP, and of specialized XML applications such as MathML. But overall, the book succeeds admirably in its goal: to become your first-line source whenever you need to learn something new about XML. (Bill Camarda)

Bill Camarda is a consultant and writer with nearly 20 years' experience in helping technology companies deploy and market advanced software, computing, and networking products and services. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks For Dummies®, Second Edition.

Product Details

Publisher: O'Reilly Media, Incorporated
Series: O'Reilly Nutshell Series
Edition description: Older Edition
Product dimensions: 6.01(w) x 8.99(h) x 1.01(d)

Read an Excerpt

Chapter 9: XPath

XPath is a non-XML language used to identify particular parts of XML documents. XPath lets you write expressions that refer to the document's first person element, the seventh child element of the third person element, the ID attribute of the first person element whose contents are the string "Fred Jones," all xml-stylesheet processing instructions in the document's prolog, and so forth. XPath indicates nodes by position, relative position, type, content, and several other criteria. XSLT uses XPath expressions to match and select particular elements in the input document for copying into the output document or further processing. XPointer uses XPath expressions to identify the particular point in or part of an XML document that an XLink links to.
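As a rough illustration of the kinds of expressions described above, the following sketch uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath (for example, it cannot return attribute nodes directly, so the attribute is read from the selected element instead). The sample document and its id values are hypothetical and are not the book's Example 9-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """<people>
  <person id="p1"><name>Alan Turing</name><profession>mathematician</profession></person>
  <person id="p2"><name>Fred Jones</name><profession>plumber</profession></person>
  <person id="p3"><name>Richard Feynman</name><profession>physicist</profession></person>
</people>"""

root = ET.fromstring(DOC)  # root is the <people> element

# The first person element (XPath positions start at 1, not 0).
first = root.find("person[1]")

# The person element whose name child contains the string "Fred Jones".
fred = root.find("person[name='Fred Jones']")

# The profession child of the third person element.
third_profession = root.find("person[3]/profession")

print(first.get("id"))
print(fred.get("id"))
print(third_profession.text)
```

A full XPath 1.0 engine would let the second query end in /@id to select the attribute node itself; ElementTree's subset stops at the element.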

XPath expressions can also represent numbers, strings, or Booleans, so XSLT stylesheets can carry out simple arithmetic for numbering and cross-referencing figures, tables, and equations. String manipulation in XPath lets XSLT perform tasks like making the title of a chapter uppercase in a headline, but mixed case in a reference in the body text.

The Tree Structure of an XML Document

An XML document is a tree made up of nodes. Some nodes contain other nodes. One root node ultimately contains all other nodes. XPath is a language for picking nodes and sets of nodes out of this tree. From the perspective of XPath, there are seven kinds of nodes:

  • The root node

  • Element nodes

  • Text nodes

  • Attribute nodes

  • Comment nodes

  • Processing instruction nodes

  • Namespace nodes

Note the constructs not included in this list: CDATA sections, entity references, and document type declarations. XPath operates on an XML document after these items have merged into the document. For instance, XPath cannot identify the first CDATA section in a document or tell whether a particular attribute value was included directly in the source element start tag or merely defaulted from the declaration of the attribute in the DTD.
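The effect of this merging can be seen in most parsers, not only in XPath engines. The sketch below, which assumes Python's standard xml.etree.ElementTree module and two made-up note elements, parses the same text written once with a CDATA section and once with escaped characters. After parsing, nothing records which form the source used.

```python
import xml.etree.ElementTree as ET

# The same content written two ways; both documents are hypothetical.
with_cdata = ET.fromstring("<note><![CDATA[1 < 2]]> &amp; so on</note>")
with_escapes = ET.fromstring("<note>1 &lt; 2 &amp; so on</note>")

# Both parse to a single run of text; the CDATA boundaries are gone.
print(repr(with_cdata.text))
print(with_cdata.text == with_escapes.text)
```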

Consider the document in Example 9-1. This document exhibits all seven types of nodes. Figure 9-1 is a diagram of this document's tree structure....

...The XPath data model has several nonobvious features. First, the tree's root node is not the same as its root element. The tree's root node contains the entire document, including the root element and comments and processing instructions that occur before the root element start tag or after the root element end tag. In Example 9-1, the root node contains the xml-stylesheet processing instruction and the root element people.
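The DOM draws a similar distinction between the document node and the document element, so it can serve as a loose analogy for the root node and root element. The sketch below assumes Python's standard xml.dom.minidom module and a hypothetical document (not Example 9-1); the DOM is a different data model from XPath's, so this is an illustration rather than an XPath implementation.

```python
from xml.dom import minidom

# Hypothetical document with a processing instruction and comments
# outside the root element.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="people.css"?>
<!-- leading comment -->
<people><person>Alan Turing</person></people>
<!-- trailing comment -->"""

doc = minidom.parseString(DOC)

# The document node plays the part of the XPath root node: its children
# include the processing instruction and comments as well as the element.
top_level = [node.nodeName for node in doc.childNodes]
print(top_level)

# The root element is only one of those children.
print(doc.documentElement.nodeName)
```

Note that the XML declaration does not show up among the children, which matches the XPath model's treatment of it.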

The XPath data model does not include everything in the document. In particular, the XML declaration and DTD are not addressable via XPath. However, if the DTD provides default values for any attributes, then XPath recognizes those attributes. The homepage element has an xlink:type attribute supplied by the DTD. Similarly, any references to parsed entities are resolved. Entity references, character references, and CDATA sections are not individually identifiable, though any data they contain is addressable. For example, XSLT does not enable you to make all text in CDATA sections bold because XPath doesn't know what text is and isn't part of a CDATA section.

Finally, xmlns attributes are reported as namespace nodes. They are not considered attribute nodes, though a non-namespace-aware parser will see them as such. Furthermore, these nodes are attached to every element and attribute node for which that declaration has scope. They are not just attached to the single element where the namespace is declared.
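A namespace-aware parser shows the same split between declarations and ordinary attributes. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical document. ElementTree does not expose namespace nodes as such, but it does show that the xmlns:xlink declaration is not reported as an attribute and that its scope reaches a descendant two levels down.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace is declared once, on the root element.
DOC = """<people xmlns:xlink="http://www.w3.org/1999/xlink">
  <person><homepage xlink:href="http://example.org/"/></person>
</people>"""

root = ET.fromstring(DOC)
homepage = root.find("person/homepage")

# The declaration is not among the root element's attributes.
print(root.attrib)

# The declaration is still in scope on the descendant, where the prefix
# has been resolved to the full namespace name.
print(homepage.attrib)
```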

Location Paths

The most useful XPath expression is a location path. A location path uses at least one location step to identify a set of nodes in a document. This set may be empty, contain a single node, or contain several nodes. These nodes can be element, attribute, namespace, text, comment, processing instruction, root nodes, or any combination of them.
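The three possible sizes of the selected set can be sketched with Python's standard xml.etree.ElementTree module, whose findall method accepts a limited subset of location paths and returns a list of matching elements. The document and element names below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
DOC = (
    "<people>"
    "<person><name>Alan Turing</name></person>"
    "<person><name>Richard Feynman</name></person>"
    "<note/>"
    "</people>"
)

root = ET.fromstring(DOC)

none_found = root.findall("address")      # no such child: empty set
one_found = root.findall("note")          # exactly one match
several_found = root.findall("person/name")  # two location steps, two matches

print(len(none_found), len(one_found), len(several_found))  # prints: 0 1 2
```

An empty result is not an error; a location path that matches nothing simply selects the empty set.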

The Root Location Path

The simplest location path is the one that selects the document's root node. This path is simply the forward slash /. (You'll notice that a lot of XPath syntax was deliberately chosen to be similar to the syntax used by the Unix shell. Here / is the root of a Unix filesystem and / is the root node of an XML document.) For example, this XSLT template uses the XPath pattern / to match the entire input document tree and wrap it in an html element...

Meet the Author

Elliotte Rusty Harold is originally from New Orleans, to which he returns periodically in search of a decent bowl of gumbo. However, he currently resides in the Prospect Heights neighborhood of Brooklyn with his wife Beth and dog Thor. He's a frequent speaker at industry conferences including Software Development, Dr. Dobb's Architecture & Design World, SD Best Practices, Extreme Markup Languages, and too many user groups to count. His open source projects include the XOM Library for processing XML with Java and the Amateur media player.
