The Barnes & Noble Review
Trying to capture all of XML in one compact book is like trying to capture the universe moments after the Big Bang: Things are expanding in an awful hurry. But this tutorial and reference comes close.
Elliotte Rusty Harold and W. Scott Means organize XML in a Nutshell (and, by extension, XML itself) into four sections. The first offers clear, to-the-point explanations of the key concepts every XML user and developer needs to understand, from elements and attributes to well-formedness, and from DTDs to namespaces. Section II covers the XML technologies most widely used in what the authors call "narrative-centric" documents, from web pages to gigantic Defense Department documentation manuals. Here's where they introduce XHTML, basic XSL Transformations, XPath, XLinks, XPointers, Cascading Style Sheets, and the evolving XSL Formatting Objects specification.
Section III addresses XML's role as the data format of choice for Internet-based information sharing and storage. Harold and Means introduce XML programming models, then offer quick introductions to both DOM and SAX.
Like most O'Reilly ...in a Nutshell books, XML in a Nutshell ends with a comprehensive reference section, presenting syntax, descriptions, attributes, and, in the case of DOM, Java bindings and example code.
There are a few things missing -- for example, coverage of SOAP, and of specialized XML applications such as MathML. But overall, the book succeeds admirably in its goals: to become your first-line source whenever you need to learn something new about XML. (Bill Camarda)
Bill Camarda is a consultant and writer with nearly 20 years' experience in helping technology companies deploy and market advanced software, computing, and networking products and services. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks For Dummies®, Second Edition.
Read an Excerpt
Chapter 9: XPath
XPath is a non-XML language used to identify particular parts of
XML documents. XPath lets you write expressions that refer to a person
element in the document, the seventh child element of a person element,
the ID attribute of the first element whose contents are the string
"Fred Jones", all xml-stylesheet processing instructions in the
document's prolog, and so forth. XPath indicates nodes by position,
relative position, type, content, and several other criteria. XSLT uses
XPath expressions to match and select particular elements in the input
document for copying into the output document or for further processing.
XPointer uses XPath expressions to identify the particular point in, or
part of, an XML document that an XLink links to.
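Python's standard-library xml.etree.ElementTree implements only a small subset of XPath 1.0, but that subset is enough to sketch two of the selections just described. The people/person document and its id values below are hypothetical. Note that ElementTree evaluates paths relative to an element rather than from XPath's root node, and it cannot select processing instructions at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and id values are invented.
people = ET.fromstring(
    "<people>"
    "<person id='p1'>Alan Turing</person>"
    "<person id='p2'>Fred Jones</person>"
    "<person id='p3'>Fred Jones</person>"
    "</people>"
)

# The first person element; positions are 1-based, as in XPath.
first = people.find("person[1]")

# The id attribute of the first person element whose complete text content
# is the string "Fred Jones" (the [.='text'] predicate needs Python 3.7+).
fred = people.find("person[.='Fred Jones']")
fred_id = fred.get("id")

print(first.text, fred_id)
```

find() returns only the first match, so p2 is chosen even though p3 has the same content; findall() would return both.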
XPath expressions can also represent numbers, strings, or
Booleans, so XSLT stylesheets can carry out simple arithmetic for numbering
and cross-referencing figures, tables, and equations. String manipulation in
XPath lets XSLT perform tasks such as making a chapter title uppercase in a
headline but leaving it in mixed case in a cross-reference in the body text.
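XPath 1.0 has no dedicated uppercase function; the usual idiom is its translate() function, which maps each character of a string through two parallel character lists. The following is a minimal Python emulation of those translate() rules. The function name xpath_translate is our own, and because only ASCII letters are passed in, this is a sketch of the idiom rather than full Unicode case conversion.

```python
import string


def xpath_translate(value: str, from_chars: str, to_chars: str) -> str:
    """Emulate XPath 1.0 translate(value, from_chars, to_chars)."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # the first occurrence of a character determines its mapping
        # A character with no counterpart in to_chars is removed (mapped to None);
        # extra characters at the end of to_chars are simply ignored.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return value.translate(table)


# An XSLT stylesheet might uppercase a title for a headline with
# translate(title, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ').
headline = xpath_translate(
    "The Tree Structure", string.ascii_lowercase, string.ascii_uppercase
)
print(headline)  # THE TREE STRUCTURE

# Characters listed in from_chars but missing from to_chars are deleted.
print(xpath_translate("--aaa--", "-", ""))  # aaa
```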
The Tree Structure of an XML Document
An XML document is a tree made up of nodes. Some nodes contain
other nodes. One root node ultimately contains all other nodes. XPath is a
language for picking nodes and sets of nodes out of this tree. From the
perspective of XPath, there are seven kinds of nodes:
- The root node
- Element nodes
- Text nodes
- Attribute nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes
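The DOM built by Python's standard xml.dom.minidom is not the XPath data model. Still, mapping its node types onto the seven kinds above makes the list concrete. In this sketch the helper names xpath_kind and count_kinds are hypothetical, and so is the sample document, including the people.css stylesheet and the person data. Two simplifications are worth flagging. First, each namespace declaration is counted once, on the element that carries it, whereas XPath attaches a namespace node to every element in its scope. Second, a text node adjacent to a CDATA section would be counted twice here, although XPath would see a single merged text node.

```python
from collections import Counter
from xml.dom import Node, minidom


def xpath_kind(node):
    """Map a minidom node to the XPath 1.0 node kind it corresponds to, or None."""
    kind = node.nodeType
    if kind == Node.DOCUMENT_NODE:
        return "root"
    if kind == Node.ELEMENT_NODE:
        return "element"
    if kind in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return "text"  # XPath has no separate CDATA node kind
    if kind == Node.ATTRIBUTE_NODE:
        # xmlns and xmlns:prefix attributes are namespace nodes in XPath.
        if node.name == "xmlns" or node.name.startswith("xmlns:"):
            return "namespace"
        return "attribute"
    if kind == Node.COMMENT_NODE:
        return "comment"
    if kind == Node.PROCESSING_INSTRUCTION_NODE:
        return "processing-instruction"
    return None  # e.g. the document type declaration, which XPath ignores


def count_kinds(doc):
    """Count the nodes of a parsed document by XPath node kind."""
    counts = Counter()

    def walk(node):
        kind = xpath_kind(node)
        if kind is not None:
            counts[kind] += 1
        if node.nodeType == Node.ELEMENT_NODE:
            # Attributes are not children in the DOM, so visit them separately.
            for i in range(node.attributes.length):
                counts[xpath_kind(node.attributes.item(i))] += 1
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return counts


SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="people.css"?>'
    "<!-- a comment in the prolog -->"
    "<!DOCTYPE people>"
    '<people xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<person born="1912">Alan Turing</person>'
    "</people>"
)
counts = count_kinds(minidom.parseString(SAMPLE))
print(sorted(counts.items()))
```

The XML declaration and the document type declaration produce no counted nodes, matching the point that XPath cannot address them.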
Note the constructs not included in this list: CDATA sections,
entity references, and document type declarations. XPath operates on an XML
document after these items have been resolved and merged into it. For
instance, XPath cannot identify the first CDATA section in a document, or tell
whether a particular attribute value was included directly in the source
element's start-tag or merely defaulted from the attribute's declaration in
the DTD.
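A parser hands XPath-style tools the document only after this merging has happened. The short sketch below uses Python's xml.etree.ElementTree and a hypothetical document containing one internal entity and one CDATA section. It shows that both have disappeared by the time the tree is built, leaving a single run of character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an internal entity reference and a CDATA section
# side by side inside one element.
person = ET.fromstring(
    '<!DOCTYPE person [<!ENTITY who "Fred Jones">]>'
    "<person>&who; <![CDATA[<wrote> & <ran>]]></person>"
)

# The entity has been replaced by its text and the CDATA content is plain
# character data, so nothing records which characters came from where.
print(person.text)
print(len(person))  # number of child elements
```

Because the CDATA content is ordinary text after parsing, <wrote> and <ran> never become child elements.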
Consider the document in Example 9-1. This document exhibits all seven types
of nodes. Figure 9-1 is a diagram of
this document's tree structure....
...The XPath data model has several non-obvious features. First, the
tree's root node is not the same as its root element. The tree's root node
contains the entire document, including the root element and any comments
and processing instructions that occur before the root element start-tag or
after the root element end-tag. In Example 9-1, the root node contains the
xml-stylesheet processing instruction and the root element.
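In Python's xml.dom.minidom, the Document object plays roughly the role of XPath's root node, and its documentElement attribute is the root element. The following sketch uses a hypothetical document, with an illustrative people.css stylesheet and illustrative element names. It shows that the root element is only one child of the root node, alongside the processing instruction before it and the comment after it.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="people.css"?>'
    "<people><person>Alan Turing</person></people>"
    "<!-- a comment after the root element end-tag -->"
)

root_node = doc                     # corresponds to XPath's root node
root_element = doc.documentElement  # the people element

# The root node's children: a processing instruction, the root element,
# and a comment that follows the root element.
for child in root_node.childNodes:
    print(child.nodeType, child.nodeName)
```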
The XPath data model does not include everything in the document.
In particular, the XML declaration and the DTD are not addressable via XPath.
However, if the DTD provides default values for any attributes, then XPath
recognizes those attributes. For example, the homepage element has an
xlink:type attribute supplied by the DTD. Similarly, any references to parsed
entities are resolved. Entity references, character references, and CDATA
sections are not individually identifiable, though any data they contain is
addressable. For example, you cannot use XSLT to make all the text in CDATA
sections bold, because XPath doesn't know which text is and isn't part of a
CDATA section.
Namespace declarations (xmlns and xmlns:prefix attributes) are reported
as namespace nodes. They are not considered attribute nodes, though a
non-namespace-aware parser will see them as such. Furthermore, these nodes
are attached to every element for which that declaration has scope, not just
to the single element where the namespace is declared.
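To make the scoping rule concrete, here is a minimal sketch that computes, for one element, the namespace nodes XPath 1.0 would attach to it. It uses xml.dom.minidom, which reports xmlns attributes as ordinary DOM attributes. The function name namespace_nodes and the sample document, including the xlink:type value "simple", are hypothetical. The sketch also encodes two XPath 1.0 rules not spelled out above: xmlns="" undeclares the default namespace, and every element has a namespace node for the built-in xml prefix.

```python
from xml.dom import Node, minidom

XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"


def namespace_nodes(element):
    """Return {prefix: uri} for the namespace nodes XPath 1.0 would give element.

    The empty-string key stands for the default namespace.
    """
    in_scope = {}
    node = element
    # Walk from the element up through its ancestors; stop at the Document node.
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            if attr.name == "xmlns":
                prefix = ""
            elif attr.name.startswith("xmlns:"):
                prefix = attr.name[len("xmlns:"):]
            else:
                continue  # an ordinary attribute, not a namespace declaration
            # The nearest declaration of a prefix wins over any on an ancestor.
            in_scope.setdefault(prefix, attr.value or "")
        node = node.parentNode
    # xmlns="" undeclares the default namespace, so no namespace node results.
    if in_scope.get("") == "":
        del in_scope[""]
    # Every element also has a namespace node for the built-in xml prefix.
    in_scope["xml"] = XML_NAMESPACE
    return in_scope


doc = minidom.parseString(
    '<people xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<person><homepage xlink:type="simple"/></person>'
    "</people>"
)
for tag in ("people", "person", "homepage"):
    element = doc.getElementsByTagName(tag)[0]
    print(tag, sorted(namespace_nodes(element)))
```

The xlink declaration appears once, on people, yet all three elements receive an xlink namespace node.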
The most useful XPath expression is a location path. A location path uses
at least one location step to identify a set of nodes in a document. This set
may be empty, contain a single node, or contain several nodes. These nodes can
be element, attribute, namespace, text, comment, processing-instruction, or
root nodes, or any combination of them.
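The point that a location path yields a node set of any size can be sketched with the limited XPath subset in Python's xml.etree.ElementTree. The document and the born values below are hypothetical. Unlike full XPath, ElementTree's findall() returns only elements, and it reports an empty result as an empty list rather than an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
people = ET.fromstring(
    "<people>"
    "<person born='1912'><name>Alan Turing</name></person>"
    "<person born='1918'><name>Richard Feynman</name></person>"
    "</people>"
)

several = people.findall("person")               # two element nodes
single = people.findall("person[@born='1912']")  # exactly one
empty = people.findall("person[@born='2000']")   # an empty list, not an error
names = [n.text for n in people.findall(".//name")]  # descendants at any depth

print(len(several), len(single), len(empty), names)
```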
The Root Location Path
The simplest location path is the one that selects the document's
root node. This path is simply the forward slash, /. (You'll notice that a lot
of XPath syntax was deliberately chosen to resemble the syntax used by the
Unix shell. There, / is the root of a Unix filesystem; here, / is the root
node of an XML document.) For example, this XSLT template uses the XPath pattern
/ to match the entire input document tree and wrap it in an