XML in a Nutshell: A Desktop Quick Reference
Overview
If you're a developer working with XML, you know there's a lot to learn, and the XML space is evolving almost moment by moment. But you don't need to commit every XML syntax, API, or XSLT transformation to memory; you only need to know where to find it. And if it's a detail that has to do with XML or its companion standards, you'll find it in the updated third edition of XML in a Nutshell: clear, concise, useful, and well organized.
With XML in a Nutshell beside your keyboard, you'll be able to:
- Quick-reference syntax rules and usage examples for the core XML technologies, including XML, DTDs, XPath, XSLT, SAX, and DOM
- Develop an understanding of well-formed XML, DTDs, namespaces, Unicode, and W3C XML Schema
- Gain a working knowledge of key technologies used for narrative XML documents such as web pages, books, and articles, including XSLT, XPath, XLink, XPointer, CSS, and XSL-FO
- Build data-intensive XML applications
- Understand the tools and APIs necessary to build data-intensive XML applications and process XML documents, including the event-based Simple API for XML (SAX2) and the tree-oriented Document Object Model (DOM)
Product Details
ISBN-13: | 9781449379049 |
---|---|
Publisher: | O'Reilly Media, Incorporated |
Publication date: | 09/23/2004 |
Series: | In a Nutshell (O'Reilly) |
Sold by: | Barnes & Noble |
Format: | eBook |
Pages: | 714 |
File size: | 7 MB |
About the Author
Elliotte Rusty Harold is an adjunct professor of computer science at Polytechnic University in Brooklyn, New York, where he lectures on object-oriented programming and XML. His Cafe con Leche Web site has become one of the most popular sites for information on XML. In addition, he is the author and coauthor of numerous books, the most recent of which are "The XML Bible" (John Wiley & Sons, 2001) and "XML in a Nutshell" (O'Reilly, 2002).
W. Scott Means began his career as a software developer with Microsoft in 1988, joining the company at the age of 17. He is currently serving as President and CEO of Enterprise Web Machines, a South Carolina-based Internet software product and services company.
Read an Excerpt
Chapter 9: XPath
XPath is a non-XML language used to identify particular parts of XML documents. XPath lets you write expressions that refer to the document's first person element, the seventh child element of the third person element, the ID attribute of the first person element whose contents are the string "Fred Jones," all xml-stylesheet processing instructions in the document's prolog, and so forth. XPath indicates nodes by position, relative position, type, content, and several other criteria. XSLT uses XPath expressions to match and select particular elements in the input document for copying into the output document or further processing. XPointer uses XPath expressions to identify the particular point in or part of an XML document that an XLink links to.
XPath expressions can also represent numbers, strings, or Booleans, so XSLT stylesheets can carry out simple arithmetic for numbering and cross-referencing figures, tables, and equations. String manipulation in XPath lets XSLT perform tasks like making the title of a chapter uppercase in a headline, but mixed case in a reference in the body text.
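Selection by position and by content, as described at the start of this chapter, can be tried out with a minimal sketch like the following. It assumes Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath, and a small hypothetical people document that stands in for the book's own example. The element names, ID values, and text are all illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document; it is not the book's Example 9-1.
SAMPLE = """
<people>
  <person ID="p1">Alice Smith</person>
  <person ID="p2">Fred Jones</person>
  <person ID="p3">Fred Jones</person>
</people>
"""

people = ET.fromstring(SAMPLE)

# Select by position: the first person child of the root element.
first_person = people.find("person[1]")

# Select by content, then read an attribute: the ID of the first
# person element whose complete text is "Fred Jones".
# (The [.='text'] predicate needs Python 3.7 or later.)
fred = people.find("person[.='Fred Jones']")
fred_id = fred.get("ID")

print(first_person.text)
print(fred_id)
```

ElementTree paths are evaluated relative to the element they are called on, so the paths here start at people rather than at the document root.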
The Tree Structure of an XML Document
An XML document is a tree made up of nodes. Some nodes contain other nodes. One root node ultimately contains all other nodes. XPath is a language for picking nodes and sets of nodes out of this tree. From the perspective of XPath, there are seven kinds of nodes:
- The root node
- Element nodes
- Text nodes
- Attribute nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes
Note the constructs not included in this list: CDATA sections, entity references, and document type declarations. XPath operates on an XML document after these items have merged into the document. For instance, XPath cannot identify the first CDATA section in a document or tell whether a particular attribute value was included directly in the source element start tag or merely defaulted from the declaration of the attribute in the DTD.
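The merging described above can be observed in most tree-building parsers, not only in XPath engines. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical one-element document. It shows a CDATA section and an entity reference arriving as ordinary text, with no trace of how they were written in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing plain text, a CDATA section,
# and a predefined entity reference.
SAMPLE = "<note>one <![CDATA[<two>]]> &amp; three</note>"

note = ET.fromstring(SAMPLE)

# The parser hands back a single run of text; the CDATA boundaries
# and the entity reference are no longer identifiable.
merged_text = note.text

print(merged_text)
```

ElementTree's tree is not the XPath data model, but on this point the two agree: only the resolved text is available.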
Consider the document in Example 9-1. This document exhibits all seven types of nodes. Figure 9-1 is a diagram of this document's tree structure....
...The XPath data model has several nonobvious features. First, the tree's root node is not the same as its root element. The tree's root node contains the entire document, including the root element and comments and processing instructions that occur before the root element start tag or after the root element end tag. In Example 9-1, the root node contains the xml-stylesheet processing instruction and the root element people.
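The difference between the root node and the root element can be illustrated with a DOM tree, whose Document node plays roughly the same role as the XPath root node. This is a sketch using Python's standard xml.dom.minidom module and a hypothetical document. It is an analogy to the XPath data model, not an XPath evaluation.

```python
from xml.dom import minidom

# Hypothetical stand-in document; it is not the book's Example 9-1.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="people.css"?>
<!-- comment before the root element -->
<people><person>Fred Jones</person></people>
<!-- comment after the root element -->"""

doc = minidom.parseString(SAMPLE)

# The Document node stands in for the XPath root node: its children are
# the processing instruction and comment from the prolog, the root
# element, and the trailing comment. The XML declaration is not a child.
root_children = [child.nodeName for child in doc.childNodes]

# The root element is only one of those children.
root_element = doc.documentElement

print(root_children)
print(root_element.tagName)
```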
The XPath data model does not include everything in the document. In particular, the XML declaration and DTD are not addressable via XPath. However, if the DTD provides default values for any attributes, then XPath recognizes those attributes. The homepage element has an xlink:type attribute supplied by the DTD. Similarly, any references to parsed entities are resolved. Entity references, character references, and CDATA sections are not individually identifiable, though any data they contain is addressable. For example, XSLT does not enable you to make all text in CDATA sections bold because XPath doesn't know what text is and isn't part of a CDATA section.
Finally, xmlns attributes are reported as namespace nodes. They are not considered attribute nodes, though a non-namespace-aware parser will see them as such. Furthermore, these nodes are attached to every element and attribute node for which that declaration has scope. They are not just attached to the single element where the namespace is declared.
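A namespace-aware parser shows the same separation between namespace declarations and ordinary attributes. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical people document with an XLink-bearing homepage element. ElementTree does not expose namespace nodes as such, so this only shows that the xmlns:xlink declaration is not reported as an attribute and that its binding applies to a descendant element.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical document: the namespace is declared on the root element
# but used on a child element.
SAMPLE = (
    '<people xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<homepage xlink:type="simple" xlink:href="http://example.com/"/>'
    '</people>'
)

people = ET.fromstring(SAMPLE)
homepage = people.find("homepage")

# The xmlns:xlink declaration does not appear among the attributes.
root_attributes = dict(people.attrib)

# The declaration is in scope on the child, so its xlink:type attribute
# is reported under the full namespace URI.
homepage_type = homepage.get("{%s}type" % XLINK)

print(root_attributes)
print(homepage_type)
```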
Location Paths
The most useful XPath expression is a location path. A location path uses at least one location step to identify a set of nodes in a document. This set may be empty, contain a single node, or contain several nodes. These nodes can be element, attribute, namespace, text, comment, processing instruction, root nodes, or any combination of them.
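As a rough illustration of a location path returning an empty set, a single node, or several nodes, here is a sketch using the limited path support in Python's standard xml.etree.ElementTree module. The document and its element names are hypothetical. ElementTree paths can select only element nodes, so the other node types mentioned above are out of reach here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document; it is not the book's Example 9-1.
SAMPLE = """
<people>
  <person><name>Alice Smith</name></person>
  <person><name>Fred Jones</name><hobby>chess</hobby></person>
</people>
"""

people = ET.fromstring(SAMPLE)

# Each path is made of two location steps separated by a slash.
several = people.findall("person/name")       # matches in both person elements
single = people.findall("person/hobby")       # matches in one person element
empty = people.findall("person/profession")   # matches nothing

print(len(several), len(single), len(empty))
```

An empty result is not an error; the path matched no nodes in this document.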
The Root Location Path
The simplest location path is the one that selects the document's root node. This path is simply the forward slash /. (You'll notice that a lot of XPath syntax was deliberately chosen to be similar to the syntax used by the Unix shell. Here / is the root of a Unix filesystem and / is the root node of an XML document.) For example, this XSLT template uses the XPath pattern / to match the entire input document tree and wrap it in an html element...
Table of Contents
- Preface
- Part I: XML Concepts
- Chapter 1: Introducing XML
- What XML Offers
- Portable Data
- How XML Works
- The Evolution of XML
- Chapter 2: XML Fundamentals
- XML Documents and XML Files
- Elements, Tags, and Character Data
- Attributes
- XML Names
- Entity References
- CDATA Sections
- Comments
- Processing Instructions
- The XML Declaration
- Checking Documents for Well-Formedness
- Chapter 3: Document Type Definitions
- Validation
- Element Declarations
- Attribute Declarations
- General Entity Declarations
- External Parsed General Entities
- External Unparsed Entities and Notations
- Parameter Entities
- Conditional Inclusion
- Two DTD Examples
- Locating Standard DTDs
- Chapter 4: Namespaces
- The Need for Namespaces
- Namespace Syntax
- How Parsers Handle Namespaces
- Namespaces and DTDs
- Chapter 5: Internationalization
- The Encoding Declaration
- Text Declarations
- XML-Defined Character Sets
- Unicode
- ISO Character Sets
- Platform-Dependent Character Sets
- Converting Between Character Sets
- The Default Character Set for XML Documents
- Character References
- xml:lang
- Part II: Narrative-Centric Documents
- Chapter 6: XML as a Document Format
- SGML's Legacy
- Narrative Document Structures
- TEI
- DocBook
- Document Permanence
- Transformation and Presentation
- Chapter 7: XML on the Web
- XHTML
- Direct Display of XML in Browsers
- Authoring Compound Documents with Modular XHTML
- Prospects for Improved Web Search Methods
- Chapter 8: XSL Transformations
- An Example Input Document
- xsl:stylesheet and xsl:transform
- Stylesheet Processors
- Templates
- Calculating the Value of an Element with xsl:value-of
- Applying Templates with xsl:apply-templates
- The Built-in Template Rules
- Modes
- Attribute Value Templates
- XSLT and Namespaces
- Other XSLT Elements
- Chapter 9: XPath
- The Tree Structure of an XML Document
- Location Paths
- Compound Location Paths
- Predicates
- Unabbreviated Location Paths
- General XPath Expressions
- XPath Functions
- Chapter 10: XLinks
- Simple Links
- Link Behavior
- Link Semantics
- Extended Links
- Linkbases
- DTDs for XLinks
- Chapter 11: XPointers
- XPointers on URLs
- XPointers in Links
- Bare Names
- Child Sequences
- Points
- Ranges
- Chapter 12: Cascading Stylesheets (CSS)
- The Three Levels of CSS
- CSS Syntax
- Associating Stylesheets with XML Documents
- Selectors
- The Display Property
- Pixels, Points, Picas, and Other Units of Length
- Font Properties
- Text Properties
- Colors
- Chapter 13: XSL Formatting Objects (XSL-FO)
- XSL Formatting Objects
- The Structure of an XSL-FO Document
- Master Pages
- XSL-FO Properties
- Choosing Between CSS and XSL-FO
- Part III: Data-Centric Documents
- Chapter 14: XML as a Data Format
- Programming Applications of XML
- Describing Data
- Support for Programmers
- Chapter 15: Programming Models
- Event- Versus Object-Driven Models
- Programming Language Support
- Non-Standard Extensions
- Transformations
- Processing Instructions
- Links and References
- Notations
- What You Get Is Not What You Saw
- Chapter 16: Document Object Model (DOM)
- DOM Core
- DOM Strengths and Weaknesses
- Parsing a Document with DOM
- The Node Interface
- Specific Node Types
- The DOMImplementation Interface
- A Simple DOM Application
- Chapter 17: SAX
- The ContentHandler Interface
- SAX Features and Properties
- Part IV: Reference
- Chapter 18: XML 1.0 Reference
- How to Use This Reference
- Annotated Sample Documents
- Key to XML Syntax
- Well-Formedness
- Validity
- Global Syntax Structures
- DTD (Document Type Definition)
- Document Body
- XML Document Grammar
- Chapter 19: XPath Reference
- The XPath Data Model
- Datatype
- Location Paths
- Predicates
- XPath Functions
- Chapter 20: XSLT Reference
- The XSLT Namespace
- XSLT Elements
- XSLT Functions
- Chapter 21: DOM Reference
- Object Hierarchy
- Object Reference
- Chapter 22: SAX Reference
- The org.xml.sax Package
- The org.xml.sax.helpers Package
- SAX Features and Properties
- The org.xml.sax.ext Package
- Chapter 23: Character Sets
- Character Tables
- HTML4 Entity Sets
- Other Unicode Blocks
- Index