The XML Handbook

by Charles F. Goldfarb, Paul Prescod

Paperback(Older Edition)

Overview

The XML Handbook is the definitive entry point to XML for Web professionals (content developers, managers, and programmers), but you needn't be a programmer to read it. Although XML, like HTML, is derived from SGML (which was invented by one of the authors), XML has so many more uses than HTML that an XML book must be much more than a markup tutorial.

There are three major divisions:

  1. A 64-page non-technical introduction to XML. It covers the reasons for XML, how it is affecting the Web, and how XML is used in the real world. Just enough of the language is taught in this section for the reader to understand the next section.
  2. 358 pages of detailed descriptions of the full range of XML applications: three-tier Web applications, data interchange, content management, e-commerce, Web publishing, extended linking, etc. ... plus walk-throughs of XML tools. All of these are illustrated extensively with screen shots and examples.
  3. 130 pages of tutorials on XML, XLink, and XSL, plus 180 pages of other technical information. The tutorials are fun and friendly, but comprehensive, precise, and technically accurate.

The book includes a CD-ROM with 55 no-time-limit XML freeware programs, trial versions of major XML products, and all the XML-related specs.

Product Details

ISBN-13: 9780130550682
Publisher: Pearson Education
Publication date: 10/30/2000
Series: Definitive XML Series from Charles F. Goldfarb
Edition description: Older Edition
Pages: 1056
Product dimensions: 6.94(w) x 9.11(h) x 1.58(d)

About the Author


Charles F. Goldfarb is the father of markup languages, a term that he coined in 1970. He is the inventor of SGML, the International Standard on which both XML and HTML are based. Find him on the Web at www.xmltimes.com.

Paul Prescod is a leading XML software developer for ActiveState and a member of the W3C group that developed XML.

Read an Excerpt

XML in the real world

  • Real-world concepts
  • Application scenarios
  • Case studies
  • XML tools
  • Jargon demystified

Applications are the reason for using technology, so it makes sense to get a good idea of what XML is used for before digging into the details of the language.

And since XML may be somewhat different from the technologies that you are accustomed to using, it is also helpful to see how people actually work with it: how the tools are used.

We're going to cover those subjects at length in the next three parts of the book. In preparation, we need to examine some often elusive but vital concepts relating to real-world use of XML.

4.1 | Is XML for documents or for data?

What is a document? My dictionary says:

"Something written, inscribed, engraved, etc., which provides evidence or information or serves as a record."

Documents come in all shapes and sizes and media, as you can see in Figure 4-1. Here are some you may have encountered:

  • Long documents: books, manuals, product specifications
  • Broadsides: catalog sheets, posters, notices
  • Forms: registration, application, etc.
  • Letters: email, memos
  • Records: Acme Co., Part# 732, reverse widget, $32.50, 5323 in stock
  • Messages: job complete, update accepted

An e-commerce transaction, such as a purchase, might involve several of these. A buyer could start by sending several documents to a vendor:

  • Covering note: a letter
  • Purchase order: a form
  • Attached product specification: a long document

The vendor might respond with several more documents:

  • Formal acknowledgment: a message
  • Thank you note: a letter
  • Invoice: a form

The beauty of XML is that the same software can process all of this diversity. Whatever you can do with one kind of document you can do with all the others. The only time you need additional tools is when you want to do different kinds of things, not when you want to work with different kinds of documents. And there are lots of things that you can do.
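As a minimal sketch of that point (not taken from the book), the Python fragment below applies one generic routine to a letter and to a record. It assumes Python's standard-library `xml.etree.ElementTree` parser; the markup and the helper name `element_type_counts` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def element_type_counts(text):
    """Count elements by element type name, whatever the document type."""
    return Counter(element.tag for element in ET.fromstring(text).iter())

# Two very different kinds of document (the markup is hypothetical).
letter = "<letter><to>Acme Co.</to><para>Our order is attached.</para><para>Thank you.</para></letter>"
record = "<record><part>732</part><stock>5323</stock></record>"

letter_counts = element_type_counts(letter)
record_counts = element_type_counts(record)
```

The routine never needs to know which kind of document it was given; only a different kind of task would call for different code.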

4.2 | Endless spectrum of application opportunities

Sorry about that, we've been reading too many marketing brochures. But it's true, nevertheless. At one end of the spectrum we have the grand old man of generalized markup, POP (Presentation-Oriented Publishing). You can see him in Figure 4-2.

At the other end of the spectrum is that darling of the data processors, MOM (Message-Oriented Middleware). She smiles radiantly from Figure 4-3.

Let's take a closer look at both of them.

4.2.1 Presentation-oriented publishing
POP was the original killer app for SGML, XML's parent, because it saves so much money for enterprises with Web-sized document collections.

POP documents are chiefly written by humans for other humans to read. Instead of creating formatted renditions, as in word processors or desktop publishing programs, XML POP users create unformatted abstractions. That means the document file captures what is in the document, but not how it is supposed to look.

To get the desired look, the POP user creates a stylesheet, a set of commands that tell a program how to format (and/or otherwise process) the document. The power of XML in this regard is that you don't need to choose just one look: you can have a separate stylesheet for every purpose. At a minimum, you might want one for print, one for CD-ROM, and another for a Web site.

POP documents tend to be (but needn't be) long-lived, large, and with complex structures. When delivered in electronic media, they may be interactive. How they will be rendered is of great importance, but, because XML is used, the rendition information can be, and is, kept distinct from the abstract data.
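The separation of abstraction from rendition can be sketched in a few lines of Python. This is only an illustration: the two functions stand in for stylesheets (a real POP system would more likely use a stylesheet language such as XSL), and the element type names and function names are assumptions.

```python
import xml.etree.ElementTree as ET

# An unformatted abstraction: it records what the content is, not how it looks.
# The element type names are hypothetical.
SOURCE = "<notice><title>Office closed</title><body>We reopen on Monday.</body></notice>"

def render_print(doc):
    """One stand-in 'stylesheet': plain text suited to a printed page."""
    title = doc.findtext("title")
    return f"{title.upper()}\n{'=' * len(title)}\n{doc.findtext('body')}"

def render_web(doc):
    """Another stand-in 'stylesheet': HTML for a Web site, from the same source."""
    return f"<h1>{doc.findtext('title')}</h1><p>{doc.findtext('body')}</p>"

doc = ET.fromstring(SOURCE)
print_rendition = render_print(doc)
web_rendition = render_web(doc)
```

Adding a third look means adding a third function; the source document is untouched.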

4.2.2 Message-oriented middleware

MOM is the killer app (actually, a technology that drives lots of killer apps) for XML on the Web.

Middleware, as you might suspect from the name, is software that comes between two other programs. It acts like your interpreter/guide might if you were to visit someplace where you couldn't speak the language and had no idea of the local customs. It talks in the native tongue, using the native customs, and translates the native replies (the messages) into your language.

MOM documents are chiefly generated by programs for other programs to read.

Instead of writing specialized programs (clients) to access particular databases or other data sources (servers), XML MOM users break the old two-tier client/server model. They introduce a third tier, the middle tier, that acts as a data integrator. The middle-tier server does all the talking to the data sources and sends their messages in XML to the client.

That means the client can read data from anywhere, but only has to understand data that is in XML documents. The XML markup provides information about the data (i.e., metadata) that was in the original data source schema, like the database table name and field names (also called cell or column names).
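A rough sketch of what a middle tier does, assuming an in-memory SQLite database as the data source and Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table layout, the `record` wrapper element, and the function name `rows_to_xml` are all hypothetical; the sample row reuses the Acme record listed earlier.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table):
    """Read every row of `table` and express it as an XML message.

    The table name and column names (schema metadata) become element
    type names; the cell values become the data.
    """
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        record = ET.SubElement(root, "record")
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (number TEXT, description TEXT, price REAL, stock INTEGER)")
conn.execute("INSERT INTO parts VALUES ('732', 'reverse widget', 32.5, 5323)")
message = rows_to_xml(conn, "parts")
```

The client that receives `message` needs no knowledge of SQLite; it only has to read XML.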

The MOM user typically doesn't care much about rendition. He does care, though, about extracting the original data accurately and making some use of the metadata. His client software, instead of having a specialized module for each data source, has a single XML parser module. The parser is the program that separates the markup from the data, just as it does in POP applications.

And, just as in POP applications, there can be a stylesheet, a set of commands that tell a program how to process the document. It may not look much like a POP stylesheet (it might look more like a script or program), but it performs the same function. And, as with POP stylesheets, there can be different MOM stylesheets for different document types, or to do different things with message documents of a single document type.

There is an extra benefit to XML three-tier MOM applications in a networked environment. For many applications, the middle-tier server can collect all of the relevant data at once and send it in a single document to the client. Further querying, sorting, and other processing can then take place solely on the client system. That not only cuts down Web traffic and overhead, but it vastly improves the end-user's perceived performance and his satisfaction with the experience.
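To illustrate the client-side processing, here is a small sketch that sorts and filters a message entirely on the client, with no further requests to the server. The message content and element type names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A single message document, as the middle tier might send it (hypothetical data).
MESSAGE = """<parts>
  <record><number>732</number><price>32.50</price></record>
  <record><number>118</number><price>4.75</price></record>
  <record><number>905</number><price>12.00</price></record>
</parts>"""

records = ET.fromstring(MESSAGE).findall("record")

# Sort locally by price.
by_price = sorted(records, key=lambda r: float(r.findtext("price")))
cheapest_first = [r.findtext("number") for r in by_price]

# Query locally: part numbers priced under 20.
under_20 = [r.findtext("number") for r in records if float(r.findtext("price")) < 20]
```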

MOM documents tend to be (but needn't be) short-lived, non-interactive, small, and with simple structures.

4.2.3 Opposites are attracted

To XML, that is!

How is it that XML can be optimal for two such apparently extreme opposites as MOM and POP? The answer is, the two are not really different where it counts.

In both cases, we start with abstract information. For POP, it comes from a human author's head. For MOM, it comes from a database. But either way, the abstract data is marked up with tags and becomes a document.

Here is a terminally cute mnemonic for this very important relationship: Data + Markup = DocuMent

Aren't you sorry you read it? Now you'll never forget it. But XML DocuMents are special. An application can do three kinds of processing with one:

  • Parse it, in order to extract the original data. This can be done without information loss because XML represents both metadata and data, and it lets you keep the abstractions distinct from rendition information.
  • Render it, so it can be presented in a physical medium that a human can perceive. It can be rendered in many different ways, for delivery in multiple media such as screen displays, print, Braille, spoken word, and so on.
  • Hack it, meaning process it as plain text without parsing. Hacking might involve cutting and pasting into other XML documents, or scanning the markup to get some information from it without doing a real parse.
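The three kinds of processing can be sketched side by side. This is an illustration only, using Python's standard `re` and `xml.etree.ElementTree` modules; the sample document and its names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DOCUMENT = '<part number="732"><name>reverse widget</name><price>32.50</price></part>'

# Parse it: a real parser separates the markup from the data.
part = ET.fromstring(DOCUMENT)
data = {child.tag: child.text for child in part}

# Render it: present the same data in a form a human can read.
rendition = f"Part #{part.get('number')}: {data['name']} (${data['price']})"

# Hack it: scan the text for a markup string without doing a real parse.
match = re.search(r"<price>([^<]*)</price>", DOCUMENT)
hacked_price = match.group(1) if match else None
```

The hack is quick but fragile: it would miss a price element that carried attributes, which a real parse handles without extra effort.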

The real revelation here is that data and documents aren't opposites. Far from it: they are actually two states of the same information.

The real difference between the two is that when data is in a database, the metadata about its structure and meaning (the schema) is stored according to the proprietary architecture of the database. When the data becomes a document, the metadata is stored as markup.

A mixture of markup and data must be governed by the rules of some notation. XML and SGML are notations, as are RTF and Word file format. The rules of the notation determine how a parser will interpret the document text to separate the data from the markup.

Notations are not just for complete documents. There are also data object notations, such as GIF, TIFF, and EPS, that are used to represent such things as graphics, video (e.g., MPEG), and audio (e.g., AVI). Document notations usually allow their documents to contain data objects, such as pictures, that are in the object's own data object notations.

Data object notations are usually (not always) in binary; that is, they are built up from low-level ones and zeros. Document notations, however, are frequently character-based. XML is character-based, which is why it can be hacked.

In fact, a design objective of XML was to support the "desperate Perl hacker": someone who needs to write a program in a hurry, using a scripting language like Perl, and who doesn't use a real XML parser. Instead, his program scans the XML document as though it were plain text. The program might search for markup strings, but can also search for data.

A hacker often uses cues that have special meaning to him, like giving special treatment to a tag that occurs at the start of a line, even though those cues have no meaning to a parser. That's why serious hackers do their XML editing with programs that can preserve a document's source and reproduce it character-for-character. They don't let the software decide which characters are important enough to preserve.

Since databases and documents are really the same, and MOM and POP applications both use XML documents, there are lots of opportunities for synergy.

4.2.4 MOM and POP: They're so great together!

Classically, MOM and POP were radically different kinds of applications, each doing things its own way with different technologies and mental models. But POP applications frequently need to include database data in their document content; think of an automotive maintenance manual that has to get the accurate part numbers from a database.

Similarly, MOM applications need to include human-written components. When the dealer asks for price and availability of the automotive parts you need, the display might include a description as well.

With the advent of generalized markup, the barriers to doing MOM-like things in POP applications began to disappear. Some of the POP-like applications you'll read about in the next part of the book appear to have invented the middle tier on their own. And now, with the advent of XML, MOM applications can easily incorporate POP functionality as well.

In fact, we'd go so far as to say there is no longer a difference in kind between the two, only a difference in degree. There really is an endless spectrum of application opportunities. It is a multi-dimensional spectrum where applications need not be implemented differently just because they process different document types. The real differentiators are other document characteristics, like persistency, size, interactivity, structural complexity, percentage of human-written content, and the importance of eventual presentation to humans.

At the extremes, some applications may call for specialized (or optimized) techniques, but the broad central universe of applications can all be implemented similarly. Much of the knowledge that POP application developers have acquired over the years is now applicable to MOM applications, and vice versa. Keep that in mind as you read the application descriptions and case studies.

That cross-fertilization is true of products and their underlying technologies as well. All of the product descriptions in this book should be of interest, whether you think of your applications as chiefly being MOM or being POP. It is the differences in functionality and design that should cause you to choose one product over another, not their marketing thrust or apparent orientation. We've included detailed usage examples for leading tools in each category so you can look beyond the labels.

4.3 | XML tools

Our coverage of tools falls into three broad categories.

Editing and composition
These are the classic tools of POP applications, but now with applicability to the MOM world as well. Editors are used for creating and revising documents. Composition tools produce renditions, but composition functionality is sometimes included in editors.

Content management
A major benefit of XML is the ability to store and work with components of documents, rather than only being able to deal with the document as a whole. These tools use databases to store information components so they can be controlled, managed, and assembled into end-products in the same way as components of automobiles, aircraft, or other complex devices. Think of them as the MOM and POP store.

Middle-tier tools
These are the vital MOM application tools for creating middle-tier servers. They integrate data sources and allow applications to interoperate.

In each category, we cover a number of products with detailed usage examples. Although there is often functional overlap among them, each has unique strengths that are targeted towards a particular kind of use. We've tried to emphasize those differences in order to discuss different tool characteristics in each chapter.

There is also a survey of tools that are available for free, in categories such as XML parsers, XSL engines, converters, and viewers. Some 55 of them are on the CD-ROM accompanying this book.

Tool capabilities are also discussed in the application scenarios and case studies.

4.4 | XML jargon demystifier

One of the problems in learning a new technology like XML is getting used to the jargon. A good book will hold you by the hand, introduce terms gradually, and use them precisely and consistently.

Out in the real world, though, people use imprecise terminology that often makes it hard to understand things, let alone compare products. And, unlike authors, they sometimes just plain get things wrong.

For example, you may see statements like "XML documents are either well-formed or valid." As you've learned from this book, that simply isn't true. All XML documents are well-formed; some of them are also valid.
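A minimal sketch of the well-formedness half of that distinction, assuming Python's standard-library `xml.etree.ElementTree` parser. That parser checks well-formedness but does not validate against a DTD, so checking validity would need a validating parser, which is not shown here. The helper name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

complete = is_well_formed("<greeting>Hello World</greeting>")
missing_end_tag = is_well_formed("<greeting>Hello World")
```

Text that fails this check is not a "merely well-formed" or "invalid" XML document; it is not an XML document at all.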

In this book, we've taken pains to edit the application and tool chapters to use consistent and accurate terminology. However, for product literature and other documents you read, the mileage may vary. So we've prepared a handy guide to the important XML jargon, both right and wrong. Think of it as a MOM application for XML knowledge.

4.4.1 Structured vs. unstructured

XML documents are frequently referred to as "structured", while other text, such as rendition notations like RTF, is called "unstructured".

In fact, renditions can have a rich structure, composed of elements like pages, columns, and blocks. The real distinction being made is between abstract and rendered. Keep that in mind when you read about "structured" and "unstructured", even in this book.

4.4.2 Tag vs. element

Tags aren't the same thing as elements. Tags describe elements.

The package, metaphorically speaking, is an element. The contents of the package is the content of an element. The tag describes the element. It contains two names:

  • the element type name (Wristwatch), which says what type of element it is, and
  • a unique identifier, or ID (WW42-3729), which says which particular element it is.

A tag could also include attributes describing other properties of the element, such as Manufacturer="Hy TimePiece Company". When people talk about a tag name:

Figure 4-5 What's in a tag?
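The distinction can be made concrete with a short sketch using Python's standard `xml.etree.ElementTree`. The markup is a hypothetical rendering of the wristwatch package described above, and the attribute name `id` is an assumption. Note that the library itself names the attribute holding the element type name `tag`, a small instance of the loose usage this section describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the package in the text; the attribute names are assumptions.
TEXT = ('<Wristwatch id="WW42-3729" Manufacturer="Hy TimePiece Company">'
        'contents of the package</Wristwatch>')

element = ET.fromstring(TEXT)

element_type_name = element.tag         # what type of element it is
unique_id = element.get("id")           # which particular element it is
maker = element.get("Manufacturer")     # another property, given as an attribute
content = element.text                  # the content of the element

# The start-tag is only the text up to the first ">"; the element is the whole package.
start_tag = TEXT[:TEXT.index(">") + 1]
```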

4.4.3 Document type, DTD, and markup declarations

A document type is a class of similar documents, like telephone books, technical manuals, or (when they are marked up as XML) inventory records.

A document type definition (DTD) is the set of rules for using XML to represent documents of a particular type. These rules might exist only in your mind as you create a document, or they may be written out.

Markup declarations, such as those in Example 4-1, are XML's way of writing out DTDs.

Example 4-1 Markup declarations in the file greeting.dtd.

It is easy to mix up these three constructs: a document type, XML's markup rules for documents of that type (the DTD), and the expression of those rules (the markup declarations). It is necessary to keep the constructs separate if you are dealing with two or more of them at the same time, as when discussing alternative ways to express a DTD. But most of the time, even in this book, "DTD" will suffice for referring to any of the three.

4.4.4 Document, XML document, and document instance

The term "document" has two distinct meanings in XML.

Consider a really short XML document that might be rendered as: Hello World

In one sense, the abstract message you get in your mind when you read the rendition is the real document. Communicating that abstraction is the reason for using XML in the first place.

In a formal, syntactic sense, though, the complete text (markup + data, remember) of Example 4-2 is the XML document. Perhaps surprisingly, that includes the markup declarations for its DTD in Example 4-1. The XML document, in other words, is a character string that represents the real document.

In this example, much of that string consists of the markup declarations, which express the greeting DTD. Only the last four lines describe the real document, which is an instance of a greeting. Those lines are called the document instance.
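As a hedged illustration of the split between markup declarations and document instance, the sketch below builds a stand-in greeting document. It is not the book's Example 4-1 or 4-2: the element type names `salutation` and `addressee`, and the choice to place the declarations inside the document, are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# A stand-in greeting document; the declarations and element type names are assumed.
XML_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (salutation, addressee)>
  <!ELEMENT salutation (#PCDATA)>
  <!ELEMENT addressee (#PCDATA)>
]>
<greeting>
  <salutation>Hello</salutation>
  <addressee>World</addressee>
</greeting>"""

# The whole string is the XML document; the part from the root start-tag on is the instance.
instance_start = XML_DOCUMENT.index("<greeting>")
declarations = XML_DOCUMENT[:instance_start]
document_instance = XML_DOCUMENT[instance_start:]

# Parsing recovers the data, from which a rendition can be produced.
root = ET.fromstring(XML_DOCUMENT)
rendition = " ".join(child.text for child in root)
```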

Example 4-2 A greeting document.

4.4.5 Coding, encoding, and markup

People refer to computer programs as code, and to the act of programming as coding.

There is also the word encoding, which refers to the way that characters are represented as ones and zeros in computer storage. XML has a declaration for specifying an encoding.
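A small sketch of the encoding declaration at work, assuming Python's standard `xml.etree.ElementTree`. The same one-word document is stored under two encodings; the element type name `word` is made up for the example.

```python
import xml.etree.ElementTree as ET

# The same abstract document stored under two character encodings.
# "\u00e9" is the character e-acute.
latin1_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\u00e9</word>'.encode("iso-8859-1")
utf8_bytes = '<?xml version="1.0" encoding="UTF-8"?><word>caf\u00e9</word>'.encode("utf-8")

# The encoding declaration tells the parser how to turn the stored bytes into characters.
word_from_latin1 = ET.fromstring(latin1_bytes).text
word_from_utf8 = ET.fromstring(utf8_bytes).text
```

The stored ones and zeros differ, yet both parse to the same characters, which is why encoding is a storage matter and not a synonym for markup.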

You'll often see (in places other than this book) phrases like "XML-encoded data", "coded in HTML", or "XML coding".

But using XML isn't coding. Not in the sense of programming, and not in the sense of character encoding. What those phrases mean is "XML document", "marked up in HTML", and "XML markup".

4.5 | Conclusion

We've covered the key concepts of XML itself, and of the ways in which it is used in the real world. Now we are ready to examine those real-world uses in depth, with application scenarios, case studies of actual users, and detailed descriptions of the tools of the tag trade.

Table of Contents

Preface
Foreword
Prolog

I. THE WHO, WHAT, AND WHY OF XML.

1. Why XML?
Introductory Discussion. Text Formatters and SGML. XML Markup. Road to XML. EDI, EAI and Other TLAs. Conclusion.

2. Just Enough XML.
Introductory Discussion. The Goal. Elements: The Logical Structure. Unicode: The Character Set. Entities: The Physical Structure. Markup. Document Types. Well-Formedness and Validity. Hyperlinking. Stylesheets. Programming Interfaces and Models. XML and Protocols. Conclusion.

3. The XML Usage Spectrum.
Introductory Discussion. Is XML for Documents or for Data? A Wide Spectrum of Application Opportunities. Opposites Are Attracted. MOM and POP - They're So Great Together! Conclusion.

4. Better Browsing through XML.
Introductory Discussion. Beyond HTML. Database Publishing. Multimedia. Metadata. Content Syndication. Science on the Web. Portals and Personalization. Alternative Delivery Platforms. Conclusion.

5. Taking Care of E-Business.
Introductory Discussion. Commerce Frameworks. Going Vertical. Repository Stories. Conclusion.

6. XML Jargon Demystifier™.
Introductory Discussion. Structured Vs Unstructured. Tag Vs Element. Document Type, DTD, and Markup Declarations. Document, XML Document, and Instance. Schema and Schema Definition. What's the Meta? Notations and Characters. Coding, Encoding, and Markup. Documents and Data. And in Conclusion.

II. MIDDLE-TIER SERVERS.


7. Personalized Frequent-Flyer Website.
Introductory Discussion. Client/Server Frequent-Flyer Sites. What's Wrong with This Web Model? A Better Model for Doing Business on the Web. An XML-Enabled Frequent-Flyer Website. Understanding the Softland Air Scenario. Towards the Brave New Web.

8. Building an Online Auction Website.
Application Discussion. Getting Data from the Middle Tier. Building the User Interface. Updating the Data Source from the Client. Conclusion.

9. Anatomy of an Information Server.
Tool Discussion. E-Business applications. Requirements for an Information Server. How excelon Works.

10. Wells Fargo & Company.
Case Study. Website Requirements. The Challenge: Leverage All the Information. The New Intranet System. How the System Works. Conclusion.

11. Bidirectional Information Flow.
Application Discussion. Infoshark Plays Its CARD! Application Scenario: Metro Police. Other Features of CARD.

III. E-COMMERCE.


12. From EDI to IEC: The New Web Commerce.
Introductory Discussion. What is EDI? The Value of EDI. Traditional EDI: Built on Outdated principles. Leveraging XML and the Internet. Conclusion.

13. XML and EDI: Working Together.
Introductory Discussion. What is Integrated E-Commerce? Traditional EDI and XML Compared. An XML-EDI Trading System. The Future of E-Commerce.

14. Collaboration in an E-Commerce Supply Web.
Application Discussion. It's All about Collaboration! Modes of E-Commerce. An E-Commerce Scenario.

15. Lead Tracking by Web and Email.
Case Study. The Challenge. The Solution.

16. An Information Pipeline for Petrochemicals.
Case Study. The Petrochemical Marketplace. Integrating with XML. Achieving a Free-Flowing Information Pipeline. Conclusion.

IV. PORTALS.


17. Enterprise Information Portals (EIP).
Introductory Discussion. Information Is the Global Economy. Enterprise Information Challenges. Enterprise Information Portals. A Framework for Portals. Summary.

18. Portal Servers for E-Business.
Tool Discussion. Portal Server Requirements. Architecture of an E-Business Portal Server. Other Portal Server Facilities.

19. RxML: Your Prescription for Healthcare.
Case Study. Doing as Well as Can Be Expected - Not! The Prescription: a Health Portal System. Connectivity Counts. Aggregation Adds Value. Personalization Assures Usability. Linking Up the Supply Chain. Conclusion.

V. SYNDICATION.


20. XMLNews: A Syndication Document Type.
Application Discussion. Structure of a News Story. Structure of an Xmlnews-Story Document. Rich Inline Markup. Media Objects.

21. Wavo Corporation.
Case Study. The Challenge. Wavo's Mediaxpress Service. Summary.

22. Information and Content Exchange (ICE).
Application Discussion. Beyond the Newswire. Syndication Requirements. ICE: A Cool and Solid Solution! An ICE Scenario.

VI. PUBLISHING.


23. Frank Russell Company.
Case Study. Background. Project Strategy Considerations. Identifying the Needs. Create an Abstract Architecture. Implement Applications. Conclusion.

24. PC World Online.
Case Study. The Challenge. Templates and Databases Were Not Enough. XML Provides a Solution. Results and Benefits. Summary.

25. MTU-DaimlerChrysler Aerospace.
Case Study. The Challenge. The Solution. The Result.

VII. CONTENT MANAGEMENT.


26. Tweddle Litho Company.
Case Study. Auto Manufacturing Is Large-Scale Publishing. Global Markets, Global Information. Needed: An XML Component Management System. Improving the Translation Process. One Source, Multiple Delivery Formats. Conclusion.

27. Efficient Content Management.
Tool Discussion. How Today's Process Works. How to Make the Process Efficient. Conclusion.

28. Document Storage and Retrieval.
Tool Discussion. Storage Strategies. Indexing and Retrieval. Conclusion.

29. Enterprise Data Management.
Tool Discussion. Applications and XML. Requirements for Enterprise Data Management. XML Database Operations. Internet File System. An E-Commerce Example. Conclusion.

VIII. CONTENT ACQUISITION.


30. Developing Reusable Content.
Application Discussion. The Content Developer's Dilemma. Content Development Strategy. Editing XML Abstractions. Linking and Navigation.

31. Converting Renditions to Abstractions.
Application Discussion. Concepts of Document Conversion. The Conversion Process.

32. Planning for Document Conversion.
Application Discussion. The Data Conversion Laboratory Methodology. Phase 1: Concept and Planning. Phase 2: Proof-of-Concept. Phase 3: Analysis, Design and Engineering. Phase 4: Production. Conclusion.

33. XML Mass-Conversion Facility.
Tool Discussion. The Challenge. The Solution. Conclusion.

34. Integrating Legacy Data.
Application Discussion. What Is Legacy Data? E-Commerce with Legacy Data. Legacy Data Flow. Legacy Data Challenges.

IX. SCHEMAS.


35. Building a Schema for a Product Catalog.
Friendly Tutorial. Online Catalog Requirements. Design Considerations. Datatypes. The Design. Schema Definition Notations. A Sample Document. Conclusion.

36. Schema Management at Major Bank.
Case Study. The Situation. Schema Management as a Solution. The Plan of Attack. Conclusion.

37. Building Your E-Commerce Vocabulary.
Tool Discussion. Why Do You Need an E-Commerce Vocabulary? Where Do Schemas Come From? Capturing Existing Business Semantics. Reuse for E-Commerce.

38. Repositories and Vocabularies.
Resource Description. Repositories. Public Vocabularies.

X. STYLESHEETS.


39. The Role of Stylesheets.
Tool Discussion. The Need for Intelligent Publications. Creating a Stylesheet. Delivering the Results.

40. A Stylesheet-Driven Tutorial Generator.
Case Study. Touring a Tutorial. The Tutorial XML Document. Generating the Tutorial. Conclusion.

41. Designing Website Stylesheets.
Application Discussion. Server Delivery Strategy. Designing Document Types for Navigation. Filtering with XSL. Rendering XML Documents as Speech. Conclusion.

XI. NAVIGATION.


42. Extended Linking.
Application Discussion. The Shop Notes Application. Other Applications of Extended Linking. Strong Link Typing. Conclusion.

43. Topic Maps: Knowledge Navigation Aids.
Friendly Tutorial. Topic Maps in a Nutshell. Applications of Topic Maps. Tool Support for Topic Maps. Conclusion.

44. Application Integration Using Topic Maps.
Application Discussion. Distributed Objects. Architecture for Application Integration. A Simple Workflow Example. A Compound Workflow Example. Conclusion.

XII. INFRASTRUCTURE.


45. Java Technology for XML Development.
Tool Discussion. SAX and DOM Implementations. XML Middleware Services.

46. Building a Rich-Media Digital Asset Manager.
Application Discussion. Architecture of a Rich-Media Digital Asset Manager. Object-Oriented Messaging. Scripting with XML. Element Structure and Storage Structure. XML-Based Rich-Media Distribution.

47. New Directions for XML Applications.
Application Discussion. Performance Analysis. A Clean Solution with SOAP! Coming Soon to a Television Near You … Performance Enhancement.

XIII. XML TUTORIALS.


48. XML Basics.
Friendly Tutorial. Syntactic Details. Prolog Vs Instance. The Logical Structure. Elements. Attributes. The Prolog. Markup Miscellany. Summary.

49. Creating a Document Type Definition.
Friendly Tutorial. Document Type Declaration. Internal and External Subset. Element Type Declarations. Element Type Content Specification. Content Models. Attributes. Notation Declarations.

50. Entities: Breaking Up Is Easy to Do.
Tad Tougher Tutorial. Overview. Entity Details. Classifications of Entities. Internal General Entities. External Parsed General Entities. Unparsed Entities. Internal and External Parameter Entities. Markup May Not Span Entity Boundaries. External Identifiers. Conclusion.

51. Advanced Features of XML.
Friendly Tutorial. Conditional Sections. Character References. Processing Instructions. Special Attributes and Newlines. Standalone Document Declaration. Is That All There Is?

52. Reading the XML Specification.
Tad Tougher Tutorial. A Look at XML's Grammar. Constant Strings. Names. Occurrence Indicators. Combining Rules. Conclusion.

XIV. RELATED TUTORIALS.


53. Namespaces.
Friendly Tutorial. Problem Statement. The Namespaces Solution. Namespace Prefixes. Scoping. Attribute Names. Namespaces and DTDs. Are Namespaces a Good Thing?

54. XML Path Language (XPath).
Tad Tougher Tutorial. XPath Applications. User Scenarios. Specifications Built on XPath. The XPath Data Model. Sources of the Model. Tree Addressing. Node Tree Construction. Node Types. Location Paths. Basic Concepts. Anatomy of a Step. Our Story So Far. Predicates. ID Function. Conclusion.

55. Extensible Stylesheet Language (XSL).
Friendly Tutorial. Transformation vs rendition. Formatting objects. In the meantime. XSL stylesheets. Rules, patterns and templates. Creating a stylesheet. Document-level template rule. Literal result elements. Extracting data. The apply templates instruction. Handling optional elements. Reordering the output. Sharing a template rule. Data content. Handling inline elements. Final touches. Top-level instructions. Stylesheet combination. Keys. Whitespace handling. Output descriptions. Numeric formats. Attribute sets. Namespace alias. Variables and parameters. XSL formatting objects. Referencing XSL stylesheets. Conclusion.

56. XML Pointer Language (XPointer).
Friendly Tutorial. XPointers: The Reason Why. Uniform Resource Identifiers. URI References. ID References with XPointers. XPointer Abbreviations. Extensions to XPath. Ranges. Point Functions. Other Extension Functions. Multiple XPointer Parts. The Role of XPointers. Conclusion.

57. XML Linking Language (XLink).
Friendly Tutorial. Basic Concepts. Simple Links. Link Roles. Is This for Real? Link Behaviors. Extended Links. Locator Elements. Arcs. Linkbases. Conclusion.

58. Datatypes.
Friendly Tutorial. Datatype Requirements. XML Schema Datatypes. Built-In Datatypes. User-Derived Datatypes. Using Datatypes. XML Schema Definition Language (XSDL). XML DTDs. Conclusion.

59. XML Schema (XSDL).
Tad Tougher Tutorial. DTDs and Schemas. Next Generation Schemas. XSDL Syntax. A Simple Sample Schema. Baseline DTD. Declaring an Element Type. Declaring Attributes. Declaring Schema Conformance. Additional Capabilities. Locally-Scoped Element Types. Element Types Versus Types. Schema Inclusion. Other Capabilities.

XV. RESOURCES.


60. Free Resources on the CD-ROM.
Resource Description. Software Featured on the Covers. XMLSolutions Corporation Free Software. IBM alphaWorks XML Software Suite. Adobe FrameMaker+SGML XML/SGML Editor/Formatter. eXcelon Stylus XSL Stylesheet Manager. Extensibility XML Authority Schema Editor. Infoshark Viewshark XML Relational Data Viewer. Arbortext Adept Editor LE. Enigma INSIGHT XML Publishing Software. IBM alphaWorks. The alphaWorks Idea. XML at alphaWorks. An Extravagance of Free XML Software. Parsers and Engines. Editing and Composition. Control Information Development. Conversion. Electronic Delivery. Document Storage and Management. The XML Spectacular. W3C Base Standards. W3C XML Applications. Other Specifications.

61. Other XML-Related Books.
Resource Description. Program Development with XML. Websites and Internet. DTDs and Schemas. XML Reference. An Awesomely Unique XML/SGML Application. Learning the Foundations of XML.

Index.

Interviews

Author Essay

The World Wide Web is undergoing a radical change that is introducing wonderful services for users and amazing new opportunities for web site developers and businesses.

HTML—the hypertext markup language—made the Web the world's library. Now its sibling, XML—the extensible markup language—has begun to make the Web the world's commercial and financial hub. XML is a brand-new W3C [World Wide Web Consortium] recommendation, but already there are millions of XML files out there, with more coming online every day.

You can see why by comparing XML and HTML. Both are based on SGML, the international standard for structured information, but look at the difference:

In HTML:

<p>P266 Laptop
<br>Friendly Computer Shop
<br>$1438

In XML:

<product>
<model>P266 Laptop</model>
<dealer>Friendly Computer Shop</dealer>
<price>$1438</price>
</product>

Both of these may look the same in your browser, but the XML data is smart data. HTML tells how the data should look, but XML tells you what it means.

With XML, your browser knows there is a product, and it knows the model, dealer, and price. From a group of these it can show you the cheapest product or closest dealer without going back to the server.
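To make that concrete, here is a minimal sketch of how a client program might pick the cheapest offer from a group of products without another trip to the server. It is an illustration only, not code from the book: the catalog wrapper element, the second dealer, and the function names are all assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the <catalog> wrapper and the second product are
# invented for illustration; only the first product comes from the essay.
CATALOG = """
<catalog>
  <product>
    <model>P266 Laptop</model>
    <dealer>Friendly Computer Shop</dealer>
    <price>$1438</price>
  </product>
  <product>
    <model>P266 Laptop</model>
    <dealer>Hypothetical Hardware</dealer>
    <price>$1399</price>
  </product>
</catalog>
"""


def parse_price(text):
    """Turn a price string such as "$1438" into a number."""
    return float(text.strip().lstrip("$").replace(",", ""))


def cheapest_product(xml_text):
    """Return the model, dealer, and price of the lowest-priced product."""
    root = ET.fromstring(xml_text.strip())
    products = root.findall("product")
    best = min(products, key=lambda p: parse_price(p.findtext("price")))
    return {
        "model": best.findtext("model"),
        "dealer": best.findtext("dealer"),
        "price": parse_price(best.findtext("price")),
    }


if __name__ == "__main__":
    print(cheapest_product(CATALOG))
```

Because each value sits in its own named element, the program can find the price directly instead of guessing which line of text holds it, which is the difficulty the HTML version presents.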

Unlike HTML, with XML you create your own tags, so they describe exactly what you need to know. Because of that, your client-side applications can access data sources anywhere on the Web, in any format. New "middle-tier" servers sit between the data sources and the client, translating everything into your own task-specific XML.

But XML data isn't just smart data, it's also a smart document. That means when you display the information, the model name can be a different font from the dealer name, and the lowest price can be highlighted in green. Unlike HTML, where text is just text to be rendered in a uniform way, with XML text is smart, so it can control the rendition.

And you don't have to decide whether your information is data or documents; in XML, it is always both at once. You can do data processing or document processing or both at the same time.

With that kind of flexibility, it's no wonder that we're starting to see a brave new Web of smart, structured information. Your broker sends your account data to Quicken using XML. Your push technology channel definitions are in XML. Everything from math to multimedia, chemistry to CommerceNet, is using XML or is preparing to start.

Say goodbye to dumb data. Welcome to the brave new XML Web!

Charles F. Goldfarb's Top 10 Computer Books

XML isn't HTML with a capital X. It requires new ways of thinking about web content. The authors of these books have gotten the message and know how to share it with you. I recruited them personally for my book series because I know they are genuine experts. We worked together to make their books accurate and clear, which is why I am able to recommend them. The list includes books on both XML and SGML—and one for people who want to get inside a secure system (or keep others out of their own!)

  1. The XML Handbook by Charles F. Goldfarb and Paul Prescod
  2. XML by Example: Building E-Commerce Applications by Sean McGrath
  3. Structuring XML Documents by David Megginson
  4. Designing XML Internet Applications by Michael Leventhal, David Lewis, and Matthew Fuchs
  5. XML: The Annotated Specification by Bob DuCharme
  6. The XML and SGML Cookbook: Recipes for Structured Information by Rick Jelliffe
  7. SGML Buyer's Guide by Charles F. Goldfarb, Steve Pepper, and Chet Ensign
  8. SGML: The Billion Dollar Secret by Chet Ensign
  9. The SGML Handbook by Charles F. Goldfarb
  10. Top Secret Intranet: How U.S. Intelligence Built Intelink—the World's Largest, Most Secure Network by Fredrick Thomas Martin

Preface

XML is taking over the world!

I saw the proof a few days ago in the newsletter for a major mutual fund. A profile of one of the fund's top holdings praised the company's great prospects because of its leadership in XML technology—and the article didn't even explain what XML is!

If a financial analyst thinks that XML is common knowledge, can world domination be far behind?

I'm delighted at the analyst's enthusiasm, but I don't think the knowledge is all that common yet—which is why Paul and I wrote this book. We know—and we want to share with you—the reasons why XML is taking over the world. We want you to understand how it is enabling all sorts of wonderful services for Web users and amazing new opportunities for website developers and businesses.

HTML—the HyperText Markup Language—made the Web the world's library. XML—the Extensible Markup Language—is its sibling, and it is making the Web the world's commercial and financial hub.

In the process, the Web is becoming much more than a static library. Increasingly, users are accessing the Web for "Web pages" that aren't actually on the shelves. Instead, the pages are generated dynamically from information available to the Web server. That information can come from databases on the Web server, from the site owner's enterprise databases, or even from other websites.

And that dynamic information needn't be served up raw. It can be analyzed, extracted, sorted, styled, and customized to create a personalized Web experience for the end-user. For this kind of power and flexibility, XML is the markup language of choice.

You can see why by comparing XML and HTML. Both are based on SGML—the International Standard for structured information—but look at the difference:

In HTML:

<p>P200 Laptop
<br>Friendly Computer Shop
<br>$1438

In XML:

<product>
<model>P200 Laptop</model>
<dealer>Friendly Computer Shop</dealer>
<price>$1438</price>
</product>

Both of these may appear the same in your browser, but the XML data is smart data. HTML tells how the data should look, but XML tells you what it means.

With XML, your browser knows there is a product, and it knows the model, dealer, and price. From a group of these it can show you the cheapest product or closest dealer without going back to the server.

Unlike HTML, with XML you create your own tags, so they describe exactly what you need to know. Because of that, your client-side applications can access data sources anywhere on the Web, in any format. New "middle-tier" servers sit between the data sources and the client, translating everything into your own task-specific XML.

But XML data isn't just smart data, it's also a smart document. That means when you display the information, the model name can be a different font from the dealer name, and the lowest price can be highlighted in green. Unlike HTML, where text is just text to be rendered in a uniform way, with XML text is smart, so it can control the rendition.
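As a rough sketch of what "controlling the rendition" could look like, the following program gives the model and the dealer different styling and colors only the lowest price green. It is an assumption-laden illustration rather than the book's method: the products wrapper element, the second dealer, the function names, and the choice of HTML output with inline styles are all invented for the example. A real system of the period would more likely use a stylesheet language such as XSL.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data: the <products> wrapper and the second product are
# invented for illustration; only the first product comes from the preface.
PRODUCTS = """
<products>
  <product>
    <model>P200 Laptop</model>
    <dealer>Friendly Computer Shop</dealer>
    <price>$1438</price>
  </product>
  <product>
    <model>P200 Laptop</model>
    <dealer>Hypothetical Hardware</dealer>
    <price>$1399</price>
  </product>
</products>
"""


def price_value(product):
    """Read a product's <price> element as a number."""
    return float(product.findtext("price").strip().lstrip("$"))


def render(xml_text):
    """Render each product as HTML, marking the lowest price in green."""
    root = ET.fromstring(xml_text.strip())
    products = root.findall("product")
    lowest = min(price_value(p) for p in products)
    blocks = []
    for p in products:
        # Only the product with the lowest price gets the green style.
        highlight = ' style="color: green"' if price_value(p) == lowest else ""
        blocks.append(
            "<p><b>{model}</b><br><i>{dealer}</i><br>"
            "<span{highlight}>{price}</span></p>".format(
                model=escape(p.findtext("model")),
                dealer=escape(p.findtext("dealer")),
                highlight=highlight,
                price=escape(p.findtext("price")),
            )
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render(PRODUCTS))
```

The styling decisions depend on element names and values, not on the position of the text, so the same data could be rendered in a completely different way by swapping out the rendering function.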

And you don't have to decide whether your information is data or documents; in XML, it is always both at once. You can do data processing or document processing or both at the same time.

With that kind of flexibility, it's no wonder that we're starting to see a Brave New Web of smart, structured information. Your broker sends your account data to Quicken using XML. Your "push" technology channel definitions are in XML. Everything from math to multimedia, chemistry to CommerceNet, is using XML or is preparing to start.

You should be too!

Welcome to the Brave New XML Web.

What about SGML?

This book is about XML. You won't find feature comparisons to SGML, or footnotes with nerdy observations like "the XML empty-element tag does not contradict the rule that every element has a start-tag and an end-tag because, in SGML terms, it is actually a start-tag followed immediately by a null end-tag."

Nevertheless, for readers who use SGML, it is worth addressing the question of how XML and SGML relate. There has been a lot of speculation about this.

Some claim that XML will replace SGML because there will be so much free and low-cost software. Others assert that XML users, like HTML users before them, will discover that they need more of SGML and will eventually migrate to the full standard.

Both assertions are nonsense ... XML and SGML don't even compete.

XML is a simplified subset of SGML. The subsetting was optimized for the Web environment, which implies data-processing-oriented (rather than publishing-oriented), short life-span (in fact, usually dynamically-generated) information. The vast majority of XML documents will be created by computer programs and processed by other programs, then destroyed. Humans will never see them.

Eliot Kimber, who was a member of both the XML and SGML standards committees, says:

There are certain use domains for which XML is simply not sufficient and where you need the additional features of SGML. These applications tend to be very large scale and of long term; e.g., aircraft maintenance information, government regulations, power plant documentation, etc.
Any one of them might involve a larger volume of information than the entire use of XML on the Web. A single model of commercial aircraft, for example, requires some four million unique pages of documentation that must be revised and republished quarterly. Multiply that by the number of models produced by companies like Airbus and Boeing and you get a feel for the scale involved.

I agree with Eliot. I invented SGML, I'm proud of it, and I'm awed that such a staggering volume of the world's mission-critical information is represented in it.

I'm thrilled that it has been such an enabler of the Web that the Society for Technical Communication awarded joint Honorary Fellowships to the Web's inventor, Tim Berners-Lee, and myself in recognition of the synergy.

But I'm also proud of XML. I'm proud of my friend Jon Bosak who made it happen, and I'm excited that the World Wide Web is becoming XML-based.

If you are new to XML, don't worry about any of this. All you need to know is that the XML subset of SGML has been in use for a decade or more, so you can trust it.

SGML still keeps the airplanes flying, the nuclear plants operating safely, and the defense departments in a state of readiness. You should look into it if you produce documents on the scale of an Airbus or Boeing. For the rest of us, there's XML.

About our sponsors

With all the buzz surrounding a hot technology like XML, it can be tough for a newcomer to distinguish the solid projects and realistic applications from the fluff and the fantasies. It is tough for authors as well, to keep track of all that is happening in the brief time we can steal from our day jobs.

The solution to both problems was to seek support and expert help from our friends in the industry. We know the leading companies in the XML arena and knew they had experience with both proven and leading-edge applications and products.

In the usual way of doing things, had we years to write this book, we would have interviewed each company to learn about its products and/or application experiences, written the chapters, asked the companies to review them, etc., and gone on to the next company. To save time and improve accuracy, we engaged in parallel processing. I spoke with the sponsors, agreed on subject matter for their chapters, and asked them to write the first draft.

All sponsored chapters are identified with the name of the sponsor, and sometimes with the names of the experts who prepared the original text. I used their materials as though they were my own interview notes—editing, rewriting, deleting, and augmenting as necessary to achieve my objective for the chapter in the context of the book, with consistent terminology and an objective factual style. I'd like to take this opportunity to thank these experts publicly for being so generous with their time and knowledge.

The sponsorship program was directed by Linda Burman, the president of L. A. Burman Associates, a consulting company that provides marketing and business development services to the XML and SGML industries.

We are grateful to our sponsors just as we are grateful to you, our readers. Both of you together make it possible for The XML Handbook to exist. In the interests of everyone, we make our own editorial decisions and we don't recommend or endorse any product or service offerings over any others.

Our 27 sponsors are:

  • Adobe Systems Incorporated, ...

Customer Reviews

Most Helpful Customer Reviews

The XML Handbook 2 out of 5 based on 0 ratings. 1 reviews.
Guest More than 1 year ago
This is a fine theory book, but if you want applicable and practical information, this is not the book for you. Tells you what XML is (duh) but doesn't tell you what to do with it.