Chapter 2: Of DTDs and Schemas
- Understanding basic schema structure
- Examining the basics behind DTDs
- Comparing schemas and DTDs
- Looking toward the future
This is not to say that DTDs have outlived their usefulness; however, schemas are giving them a run for their money. As developers, we now have options for validation. Two options come from the W3C: XML Schema and XML DTDs. However, you're not limited to only those two options. In addition to XML Schema and XML DTDs, there are several other schema languages that are circulating throughout the XML community. Two worth noting are REgular LAnguage description for XML Next Generation (RELAX NG) and Schematron, which are both lightweight schema languages that offer functionality similar to that of XML Schema. Find out more about RELAX NG and Schematron in Chapter 12.
In this chapter, we focus on basic underlying concepts of XML Schema and XML DTDs. It's important to understand the strengths and weaknesses of both approaches before you choose the appropriate validation tool for your application.
Understanding DTD Structures and Functions
DTDs have provided structure to XML documents for a long time. Whereas flexibility is one of XML's primary strengths, there are instances where structure is important, even a requirement. Defining a document model provides a structure to which documents must conform. E-commerce and B2B transactions are two common scenarios that require strict document models.
There are several reasons you might want to use a validation mechanism:
- If multiple developers will be working with the document model, the DTD would provide a framework from which they can work.
- If your document model contains required elements (such as a price for your product), DTDs allow you to define element and attribute behavior.
- If you're developing a document model that will continue to evolve, a DTD could help guide that process.
A DTD can do the following:
- Provide a structural framework for documents
- Define a content model for elements
- Declare a list of allowable attributes for each element
- Allow for limited datatyping for attribute values
- Provide default values for attributes
- Define a mechanism for creating reusable chunks of data, with some limitations
- Provide a mechanism for passing non-XML data to the appropriate processor
- Allow you to use conditional sections to mark declarations for inclusion or exclusion
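The last item in the list is easier to picture with a short sketch. The fragment below is an assumed external DTD subset, not a listing from this chapter; the parameter entity names draft and final and the notes element are hypothetical. It shows one common way to switch between two versions of a declaration by using INCLUDE and IGNORE sections.

```xml
<!-- Sketch of an external DTD subset; draft, final, and notes are hypothetical names. -->

<!-- Switches: exchange the two values to select the draft content model. -->
<!ENTITY % draft "IGNORE">
<!ENTITY % final "INCLUDE">

<!-- Skipped while draft is IGNORE. -->
<![%draft;[
  <!ELEMENT book (title, author, notes)>
]]>

<!-- Processed while final is INCLUDE. -->
<![%final;[
  <!ELEMENT book (title, author)>
]]>

<!ELEMENT title  (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT notes  (#PCDATA)>
```

Only one of the two switches should be set to INCLUDE at a time; otherwise the book element would be declared twice.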
TIP As you probably know, DTD validation is no longer the only option for defining document models. XML Schema offers a flexible solution to the preceding scenarios.
Declarations
DTDs consist of declarations that provide rules for your document model. Each declaration defines an element, set of attributes, entity, or notation. These four declaration types make up the bulk of any DTD:
element declarations Identify the names of elements and the nature of their content. DTDs do not allow for complex content model definitions. Rather, DTDs allow authors to provide information about element hierarchy. The only datatype you can define for element content is parsed character data (PCDATA).
attribute declarations Identify which elements may have attributes, what attributes they may have, what values the attributes may hold, and what the default value is.
entity declarations Allow you to associate a name with some other fragment of content. That construct can be a chunk of regular text, a chunk of the document type declaration, or a reference to an external file containing either text or binary data.
notation declarations Identify specific types of external binary data. This information is passed to the processing application.
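To make the four declaration types concrete, here is a minimal sketch that uses each of them once. It is an illustration under stated assumptions rather than a listing from this chapter: the isbn and format attributes, the cover element, the series and xhtml-cover entities, the file path, and the placeholder ISBN value are all hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE publications [
  <!-- Element declarations: names and content models. -->
  <!ELEMENT publications (book+)>
  <!ELEMENT book (title, author, cover?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT cover EMPTY>

  <!-- Attribute declarations: allowed attributes, their types, and defaults. -->
  <!ATTLIST book
            isbn   CDATA           #REQUIRED
            format (print | ebook) "print">
  <!ATTLIST cover image ENTITY #REQUIRED>

  <!-- Notation declaration: names a type of non-XML data (hypothetical). -->
  <!NOTATION jpeg SYSTEM "image/jpeg">

  <!-- Entity declarations: a text entity and an unparsed external entity. -->
  <!ENTITY series "Mastering">
  <!ENTITY xhtml-cover SYSTEM "covers/xhtml.jpg" NDATA jpeg>
]>
<publications>
  <!-- The isbn value is a placeholder, not a real ISBN. -->
  <book isbn="0000000000">
    <title>&series; XHTML</title>
    <author>Ed Tittel</author>
    <cover image="xhtml-cover"/>
  </book>
</publications>
```

Because the format attribute has a default value, a validating parser reports format="print" for this book even though the attribute is not written in the document.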
When defining DTD declarations, you have to follow a few rules governing the order of their occurrence. If multiple declarations exist for the same entity or attribute, the first one defined takes precedence (the later, redundant declarations are ignored). Element types and notations, by contrast, may be declared only once in a valid document. You also have to be careful when defining entities. Parameter entities (entities defined and used within the DTD) must be declared before they can be referenced.
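The ordering rules can be sketched with a short fragment. The example assumes it lives in an external DTD subset, and the names text.only and imprint, along with the replacement text, are hypothetical.

```xml
<!-- Sketch of an external DTD subset; all entity names are hypothetical. -->

<!-- The parameter entity is declared before any declaration references it. -->
<!ENTITY % text.only "(#PCDATA)">
<!ELEMENT title  %text.only;>
<!ELEMENT author %text.only;>

<!-- Two declarations of the same general entity: the first one is binding. -->
<!ENTITY imprint "Example Press">
<!ENTITY imprint "Ignored Press">
```

A document that references &imprint; against this subset receives the text Example Press, because the second declaration is ignored.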
The syntax used to create declarations allows for white space anywhere within the declarations, but there are a few delimiters that have to be written accurately (such as the exclamation point in !ELEMENT). The following declarations are all correct:
<!ELEMENT book (title,author)>
<!ELEMENT book (title,author)>
<!ELEMENT book ( title, author)>
Declarations can reside inside the XML document or can be defined as a stand-alone document. If defined as a part of an XML document, the collection of declarations is referred to as the internal subset. If the declarations are defined externally in a separate file, that file is referred to as an external subset. Many times, you'll find that you need to use both internal and external subsets. The collection of all subsets is known as the DTD. Listing 2.1 provides an example of a small collection of DTD declarations defined as a part of the internal DTD subset.
Listing 2.1 An XML Document Containing an Internal DTD Subset
<?xml version="1.0"?>
<!DOCTYPE publications [
  <!ELEMENT publications (book+)>
  <!ELEMENT book (title,author)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<publications>
  <book>
    <title>Mastering XHTML</title>
    <author>Ed Tittel</author>
  </book>
  <book>
    <title>Java Developer's Guide to E-Commerce with XML and JSP</title>
    <author>William Brogden</author>
  </book>
</publications>
This example defines only element type declarations. In most cases, your document model would be more complex, also allowing for attributes, notations, and entities. For each element, there's a corresponding content model defined. For example, the book element is allowed to contain only a title element followed by an author element.
Internal Subset
Internal subsets are handy if you plan to import declarations from external DTD subsets. This is because you can override externally defined entity and attribute declarations by defining a new declaration in the internal subset. The declaration found first takes precedence, and the XML parser reads the internal subset before the external subset.
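The override behavior might look like the following sketch. The external file publications.dtd and its contents (shown in the leading comment), the imprint entity, and the format attribute are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Assumed content of the hypothetical external subset publications.dtd:
       <!ELEMENT publications (book+)>
       <!ELEMENT book (title, author)>
       <!ELEMENT title (#PCDATA)>
       <!ELEMENT author (#PCDATA)>
       <!ATTLIST book format (print | ebook) "print">
       <!ENTITY imprint "External Press">
-->
<!DOCTYPE publications SYSTEM "publications.dtd" [
  <!-- Read before publications.dtd, so these declarations are binding. -->
  <!ENTITY imprint "Internal Press">
  <!ATTLIST book format (print | ebook) "ebook">
]>
<publications>
  <book>
    <title>Mastering XHTML (&imprint;)</title>
    <author>Ed Tittel</author>
  </book>
</publications>
```

With these assumptions, &imprint; expands to Internal Press and the default value of format becomes ebook; the matching declarations in the external subset are read later and ignored.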
There are a couple of restrictions placed on internal subsets:
- You cannot use conditional sections to mark the inclusion or exclusion of DTD declarations. Conditional sections make it easier to combine DTD subsets, therefore allowing you to modularize your DTD.
- Your parameter entity usage is limited. According to the XML 1.0 Specification, you cannot reference a parameter entity inside another declaration in the internal subset; a parameter entity reference may appear only where a complete declaration could appear.
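The parameter entity restriction can be sketched as follows. The entity names book.decl and text are hypothetical; the commented-out lines show the kind of usage that is rejected in an internal subset but accepted in an external one.

```xml
<?xml version="1.0"?>
<!DOCTYPE publications [
  <!ELEMENT publications (book+)>

  <!-- Allowed: the reference stands where a whole declaration could appear. -->
  <!ENTITY % book.decl "<!ELEMENT book (title,author)>">
  %book.decl;

  <!-- Not allowed in the internal subset: a reference inside a declaration.
       <!ENTITY % text "(#PCDATA)">
       <!ELEMENT title %text;>
  -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>
<publications>
  <book>
    <title>Mastering XHTML</title>
    <author>Ed Tittel</author>
  </book>
</publications>
```

Moving the commented-out declarations into an external subset file would make them legal, which is one reason larger DTDs tend to live outside the document.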