“XML Schema 1.1 has gone from strong data typing to positively stalwart—so powerful it can enforce database level constraints and business rules, so your data transfer code won’t have to. This book covers the 1.1 changes—and more—in its 500 revisions to Priscilla Walmsley’s 10-year best-selling classic. It’s the guide you need to navigate XML Schema’s complexity—and master its power!”
—Charles F. Goldfarb
For Ten Years the World’s Favorite Guide to XML Schema—Now Extensively Revised for Version 1.1 and Today’s Best Practices!
To leverage XML’s full power, organizations need shared vocabularies based on XML Schema. For a full decade, Definitive XML Schema has been the most practical, accessible, and usable guide to working with XML Schema. Now, author Priscilla Walmsley has thoroughly updated her classic to fully reflect XML Schema 1.1, and to present new best practices for designing successful schemas.
Priscilla helped create XML Schema as a member of the W3C XML Schema Working Group, so she is well qualified to explain the W3C recommendation with insight and clarity. Her book teaches practical techniques for writing schemas to support any application, including many new use cases. You’ll discover how XML Schema 1.1 provides a rigorous, complete specification for modeling XML document structure, content, and datatypes; and walk through the many aspects of designing and applying schemas, including composition, instance validation, documentation, and namespaces. Then, building on the fundamentals, Priscilla introduces powerful advanced techniques ranging from type derivation to identity constraints. This edition’s extensive new coverage includes
- Many new design hints, tips, and tricks – plus a full chapter on creating an enterprise strategy for schema development and maintenance
- Design considerations in creating schemas for relational and object-oriented models, narrative content, and Web services
- An all-new chapter on assertions
- Coverage of new 1.1 features, including overrides, conditional type assignment, open content and more
- Modernized rules for naming and design
- Substantially updated coverage of extensibility, reuse, and versioning
- And much more
If you’re an XML developer, architect, or content specialist, with this Second Edition you can join the tens of thousands who rely on Definitive XML Schema for practical insights, deeper understanding, and solutions that work.


Product Details
| Detail | Value |
|---|---|
| ISBN-13 | 9780132886758 |
| Publisher | Pearson Education |
| Publication date | 09/04/2012 |
| Series | Charles F. Goldfarb Definitive XML Series |
| Sold by | Barnes & Noble |
| Format | eBook |
| Pages | 768 |
| File size | 60 MB |
| Note | This product may take a few minutes to download. |
| Age Range | 18 Years |
About the Author
PRISCILLA WALMSLEY serves as Managing Director of Datypic, a consultancy specializing in XML architecture and design, SOA and Web services implementation, and content management.
Read an Excerpt
Chapter 9: Simple types
Both element and attribute declarations can use simple types to describe the data content of the components. This chapter introduces simple types, and explains how to define your own atomic simple types for use in your schemas.
9.1 Simple type varieties
There are three varieties of simple type: atomic types, list types, and union types.
- Atomic types have values that are indivisible, such as 10 and large.
- List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.
- Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or "value space," for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18, or one of the string values small, medium, or large.
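As a rough sketch of how the three varieties might look in a schema document, the fragment below defines two atomic types, a union of them, and a list of that union. The type names (NumericSizeType, NamedSizeType, SizeUnionType, AvailableSizesType) are hypothetical and chosen only for illustration, as is the choice of xsd:token as the base for the string values; the availableSizes element name and the value ranges come from the text above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: a single integer from 2 through 18 -->
  <xsd:simpleType name="NumericSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Atomic type: one of three string values -->
  <xsd:simpleType name="NamedSizeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Union type: its value space is the union of the two types above -->
  <xsd:simpleType name="SizeUnionType">
    <xsd:union memberTypes="NumericSizeType NamedSizeType"/>
  </xsd:simpleType>

  <!-- List type: whitespace-separated items, each a SizeUnionType value -->
  <xsd:simpleType name="AvailableSizesType">
    <xsd:list itemType="SizeUnionType"/>
  </xsd:simpleType>

  <xsd:element name="availableSizes" type="AvailableSizesType"/>

</xsd:schema>
```

Under these assumed definitions, <availableSizes>10 large 2</availableSizes> should validate, while an item such as 20 or huge should be rejected because it falls outside both member types.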
9.1.1 Design hint: How much should I break down my data values?
Data values should be broken down to the most atomic level possible. This allows them to be processed in a variety of ways for different uses, such as display, mathematical operations, and validation. It is much easier to concatenate two data values back together than it is to split them apart. In addition, more granular data is much easier to validate.
It is a fairly common practice to put a data value and its units in the same element, for example <length>3cm</length>. However, the preferred approach is to have a separate data value, preferably an attribute, for the units, for example <length units="cm">3</length>.
Using a single concatenated value is limiting because:
- It is extremely cumbersome to validate. You have to apply a complicated pattern that would need to change every time a unit type is added.
- You cannot perform comparisons, conversions, or mathematical operations on the data without splitting it apart.
- If you want to display the data item differently (for example, as "3 centimeters" or "3 cm" or just "3"), you have to split it apart. This complicates the stylesheets and applications that process the instance document.
It is possible to go too far, though. For example, you may break a date down as follows:
<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>
This is probably overkill unless you have a special need to process these items separately.
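To make the trade-off around units more concrete, here is a minimal sketch that contrasts the two ways of representing a length. It is an illustration under assumptions, not a definition taken from the book: the type names (ConcatenatedLengthType, LengthUnitsType, LengthType) and the set of units (cm, in) are hypothetical, and the separated form relies on a complex type with simple content, which this chapter excerpt does not cover.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Concatenated form, for values such as 3cm:
       the pattern has to be edited whenever a unit is added -->
  <xsd:simpleType name="ConcatenatedLengthType">
    <xsd:restriction base="xsd:token">
      <xsd:pattern value="[0-9]+(cm|in)"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Separated form: the units get their own enumerated type -->
  <xsd:simpleType name="LengthUnitsType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="cm"/>
      <xsd:enumeration value="in"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Decimal content plus a required units attribute -->
  <xsd:complexType name="LengthType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="units" type="LengthUnitsType" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:element name="length" type="LengthType"/>

</xsd:schema>
```

With the separated form, the numeric content is validated as a decimal and can be compared or converted directly, and adding a unit means adding one enumeration value rather than rewriting a pattern.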
9.2 Simple type definitions
9.2.1 Named simple types
Simple types can be either named or anonymous. Named simple types are always defined globally (i.e., their parent is always schema or redefine) and are required to have a name that is unique among the data types (both simple and complex) in the schema. The XSDL syntax for a named simple type definition is shown in Table 9–1.
The name of a simple type must be an XML non-colonized name, which means that it must start with a letter or underscore, and may only contain letters, digits, underscores, hyphens, and periods. You cannot include a namespace prefix when defining the type; it takes its namespace from the target namespace of the schema document. All of the examples of named types in this book have the word "Type" at the end of their names, to clearly distinguish them from element-type names and attribute names. However, this is not a requirement; you may in fact have a data type definition and an element declaration using the same name.
Example 9–1 shows the definition of a named simple type DressSizeType, along with an element declaration that references it. Named types can be used in multiple element and attribute declarations.
Example 9–1. Defining and referencing a named simple type
<xsd:simpleType name="DressSizeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="2"/>
    <xsd:maxInclusive value="18"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="size" type="DressSizeType"/>
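The point that a named type can serve several declarations might be sketched as follows. The DressSizeType definition and the size element repeat Example 9–1; the alternateSize element and the defaultSize attribute are hypothetical names added only to show reuse.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Named type, defined once at the top level of the schema -->
  <xsd:simpleType name="DressSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Two element declarations and one attribute declaration
       all reference the same type by name -->
  <xsd:element name="size" type="DressSizeType"/>
  <xsd:element name="alternateSize" type="DressSizeType"/>
  <xsd:attribute name="defaultSize" type="DressSizeType"/>

</xsd:schema>
```

A change to the range in DressSizeType would then apply to all three declarations at once.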
9.2.2 Anonymous simple types
Anonymous types, on the other hand, must not have names. They are always defined entirely within an element or attribute declaration, and may only be used once, by that declaration. Defining a type anonymously prevents it from ever being restricted, used in a list or union, or redefined. The XSDL syntax to define an anonymous simple type is shown in Table 9–2.
Example 9–2 shows the definition of an anonymous simple type within an element declaration.
Example 9–2. Defining an anonymous simple type
<xsd:element name="size">
  <xsd:simpleType>
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
9.2.3 Design hint: Should I use named or anonymous types?
The advantage of named types is that they may be defined once and used many times. For example, you may define a type named ProductCodeType that lists all of the valid product codes in your organization. This type can then be used in many element and attribute declarations in many schemas. This has the advantages of:
- encouraging consistency throughout the organization,
- reducing the possibility of error,
- requiring less time to define new schemas,
- simplifying maintenance, because new product codes need only be added in one place.
An anonymous type, on the other hand, can be used only in the element or attribute declaration that contains it. It can never be redefined, have types derived from it, or be used in a list or union type. This can seriously limit its reusability, extensibility, and ability to change over time.
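As an illustration of what a name makes possible, the sketch below derives a narrower type from a named type and also uses the named type as the item type of a list. Neither step could reference the type if it were anonymous. DressSizeType follows the earlier example; SmallDressSizeType, DressSizeListType, and the upper bound of 6 are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Named base type: integers from 2 through 18 -->
  <xsd:simpleType name="DressSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Derived by restriction: narrows the range to 2 through 6 -->
  <xsd:simpleType name="SmallDressSizeType">
    <xsd:restriction base="DressSizeType">
      <xsd:maxInclusive value="6"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- List type whose items are DressSizeType values -->
  <xsd:simpleType name="DressSizeListType">
    <xsd:list itemType="DressSizeType"/>
  </xsd:simpleType>

</xsd:schema>
```

Both derived definitions refer to DressSizeType by name in their base and itemType attributes, which is exactly the hook an anonymous type lacks.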
However, there are cases where anonymous types are preferable to named types. If the type is unlikely to ever be reused, the advantages listed above no longer apply. Also, there is such a thing as too much reuse. For example, if an element can contain the values 1 through 10, it does not make sense to try to define a data type named OneToTenType that is reused by other unrelated element declarations with the same value space. If the value space for one of the element declarations that uses the named data type changes, but the other element declarations do not change, it actually makes maintenance more difficult, because a new data type needs to be defined at that time.
In addition, anonymous types can be more readable when they are relatively simple. It is sometimes desirable to have the definition of the data type right there with the element or attribute declaration....
Table of Contents
Foreword xxxi
Acknowledgments xxxiii
How to use this book xxxv
Chapter 1 Schemas: An introduction 2
1.1 What is a schema? 3
1.2 The purpose of schemas 5
1.2.1 Data validation 5
1.2.2 A contract with trading partners 5
1.2.3 System documentation 6
1.2.4 Providing information to processors 6
1.2.5 Augmentation of data 6
1.2.6 Application information 6
1.3 Schema design 7
1.3.1 Accuracy and precision 7
1.3.2 Clarity 8
1.3.3 Broad applicability 8
1.4 Schema languages 9
1.4.1 Document Type Definition (DTD) 9
1.4.2 Schema requirements expand 10
1.4.3 W3C XML Schema 11
1.4.4 Other schema languages 12
1.4.4.1 RELAX NG 12
1.4.4.2 Schematron 13
Chapter 2 A quick tour of XML Schema 16
2.1 An example schema 17
2.2 The components of XML Schema 18
2.2.1 Declarations vs. definitions 18
2.2.2 Global vs. local components 19
2.3 Elements and attributes 20
2.3.1 The tag/type distinction 20
2.4 Types 21
2.4.1 Simple vs. complex types 21
2.4.2 Named vs. anonymous types 22
2.4.3 The type definition hierarchy 22
2.5 Simple types 23
2.5.1 Built-in simple types 23
2.5.2 Restricting simple types 24
2.5.3 List and union types 24
2.6 Complex types 25
2.6.1 Content types 25
2.6.2 Content models 26
2.6.3 Deriving complex types 27
2.7 Namespaces and XML Schema 28
2.8 Schema composition 29
2.9 Instances and schemas 30
2.10 Annotations 31
2.11 Advanced features 32
2.11.1 Named groups 32
2.11.2 Identity constraints 32
2.11.3 Substitution groups 32
2.11.4 Redefinition and overriding 33
2.11.5 Assertions 33
Chapter 3 Namespaces 34
3.1 Namespaces in XML 35
3.1.1 Namespace names 36
3.1.2 Namespace declarations and prefixes 37
3.1.3 Default namespace declarations 39
3.1.4 Name terminology 40
3.1.5 Scope of namespace declarations 41
3.1.6 Overriding namespace declarations 42
3.1.7 Undeclaring namespaces 43
3.1.8 Attributes and namespaces 44
3.1.9 A summary example 46
3.2 The relationship between namespaces and schemas 48
3.3 Using namespaces in schemas 48
3.3.1 Target namespaces 48
3.3.2 The XML Schema Namespace 50
3.3.3 The XML Schema Instance Namespace 51
3.3.4 The Version Control Namespace 51
3.3.5 Namespace declarations in schema documents 52
3.3.5.1 Map a prefix to the XML Schema Namespace 52
3.3.5.2 Map a prefix to the target namespace 53
3.3.5.3 Map prefixes to all namespaces 54
Chapter 4 Schema composition 56
4.1 Modularizing schema documents 57
4.2 Defining schema documents 58
4.3 Combining multiple schema documents 61
4.3.1 include 62
4.3.1.1 The syntax of includes 63
4.3.1.2 Chameleon includes 65
4.3.2 import 66
4.3.2.1 The syntax of imports 67
4.3.2.2 Multiple levels of imports 70
4.3.2.3 Multiple imports of the same namespace 72
4.4 Schema assembly considerations 75
4.4.1 Uniqueness of qualified names 75
4.4.2 Missing components 76
4.4.3 Schema document defaults 77
Chapter 5 Instances and schemas 78
5.1 Using the instance attributes 79
5.2 Schema processing 81
5.2.1 Validation 81
5.2.2 Augmenting the instance 82
5.3 Relating instances to schemas 83
5.3.1 Using hints in the instance 84
5.3.1.1 The xsi:schemaLocation attribute 84
5.3.1.2 The xsi:noNamespaceSchemaLocation attribute 86
5.4 The root element 87
Chapter 6 Element declarations 88
6.1 Global and local element declarations 89
6.1.1 Global element declarations 89
6.1.2 Local element declarations 93
6.1.3 Design hint: Should I use global or local element
declarations? 95
6.2 Declaring the types of elements 96
6.3 Qualified vs. unqualified forms 98
6.3.1 Qualified local names 98
6.3.2 Unqualified local names 98
6.3.3 Using elementFormDefault 99
6.3.4 Using form 100
6.3.5 Default namespaces and unqualified names 101
6.4 Default and fixed values 101
6.4.1 Default values 102
6.4.2 Fixed values 103
6.5 Nils and nillability 105
6.5.1 Using xsi:nil in an instance 108
6.5.2 Making elements nillable 109
Chapter 7 Attribute declarations 112
7.1 Attributes vs. elements 113
7.2 Global and local attribute declarations 115
7.2.1 Global attribute declarations 115
7.2.2 Local attribute declarations 117
7.2.3 Design hint: Should I use global or local attributedeclarations? 119
7.3 Declaring the types of attributes 120
7.4 Qualified vs. unqualified forms 122
7.5 Default and fixed values 123
7.5.1 Default values 124
7.5.2 Fixed values 125
7.6 Inherited attributes 126
Chapter 8 Simple types 128
8.1 Simple type varieties 129
8.1.1 Design hint: How much should I break down my datavalues? 130
8.2 Simple type definitions 131
8.2.1 Named simple types 131
8.2.2 Anonymous simple types 132
8.2.3 Design hint: Should I use named or anonymous types? 133
8.3 Simple type restrictions 135
8.3.1 Defining a restriction 136
8.3.2 Overview of the facets 137
8.3.3 Inheriting and restricting facets 139
8.3.4 Fixed facets 140
8.3.4.1 Design hint: When should I fix a facet? 141
8.4 Facets 142
8.4.1 Bounds facets 142
8.4.2 Length facets 143
8.4.2.1 Design hint: What if I want to allow empty values? 143
8.4.2.2 Design hint: What if I want to restrict the length of an integer? 144
8.4.3 totalDigits and fractionDigits 145
8.4.4 Enumeration 145
8.4.5 Pattern 148
8.4.6 Assertion 150
8.4.7 Explicit Time Zone 150
8.4.8 Whitespace 151
8.5 Preventing simple type derivation 152
8.6 Implementation-defined types and facets 154
8.6.1 Implementation-defined types 154
8.6.2 Implementation-defined facets 155
Chapter 9 Regular expressions 158
9.1 The structure of a regular expression 159
9.2 Atoms 161
9.2.1 Normal characters 162
9.2.2 The wildcard escape character 164
9.2.3 Character class escapes 164
9.2.3.1 Single-character escapes 165
9.2.3.2 Multicharacter escapes 166
9.2.3.3 Category escapes 167
9.2.3.4 Block escapes 170
9.2.4 Character class expressions 171
9.2.4.1 Listing individual characters 171
9.2.4.2 Specifying a range 172
9.2.4.3 Combining individual characters and ranges 173
9.2.4.4 Negating a character class expression 173
9.2.4.5 Subtracting from a character class expression 174
9.2.4.6 Escaping rules for character class expressions 175
9.2.5 Parenthesized regular expressions 175
9.3 Quantifiers 176
9.4 Branches 177
Chapter 10 Union and list types 180
10.1 Varieties and derivation types 181
10.2 Union types 183
10.2.1 Defining union types 183
10.2.2 Restricting union types 185
10.2.3 Unions of unions 186
10.2.4 Specifying the member type in the instance 187
10.3 List types 188
10.3.1 Defining list types 188
10.3.2 Design hint: When should I use lists? 189
10.3.3 Restricting list types 190
10.3.3.1 Length facets 192
10.3.3.2 Enumeration facet 192
10.3.3.3 Pattern facet 194
10.3.4 Lists and strings 195
10.3.5 Lists of unions 196
10.3.6 Lists of lists 196
10.3.7 Restricting the item type 198
Chapter 11 Built-in simple types 200
11.1 The XML Schema type system 201
11.1.1 The type hierarchy 202
11.1.2 Value spaces and lexical spaces 204
11.1.3 Facets and built-in types 204
11.2 String-based types 205
11.2.1 string, normalizedString, and token 205
11.2.1.1 Design hint: Should I use string, normalizedString, or token? 207
11.2.2 Name 208
11.2.3 NCName 210
11.2.4 language 211
11.3 Numeric types 213
11.3.1 float and double 213
11.3.2 decimal 215
11.3.3 Integer types 217
11.3.3.1 Design hint: Is it an integer or a string? 220
11.4 Date and time types 221
11.4.1 date 221
11.4.2 time 222
11.4.3 dateTime 223
11.4.4 dateTimeStamp 224
11.4.5 gYear 225
11.4.6 gYearMonth 226
11.4.7 gMonth 227
11.4.8 gMonthDay 227
11.4.9 gDay 228
11.4.10 duration 229
11.4.11 yearMonthDuration 231
11.4.12 dayTimeDuration 232
11.4.13 Representing time zones 233
11.4.14 Facets 234
11.4.15 Date and time ordering 235
11.5 Legacy types 236
11.5.1 ID 236
11.5.2 IDREF 237
11.5.3 IDREFS 239
11.5.4 ENTITY 240
11.5.5 ENTITIES 242
11.5.6 NMTOKEN 243
11.5.7 NMTOKENS 244
11.5.8 NOTATION 245
11.6 Other types 246
11.6.1 QName 246
11.6.2 boolean 247
11.6.3 The binary types 248
11.6.4 anyURI 250
11.7 Comparing typed values 253
Chapter 12 Complex types 256
12.1 What are complex types? 257
12.2 Defining complex types 258
12.2.1 Named complex types 258
12.2.2 Anonymous complex types 260
12.2.3 Complex type alternatives 261
12.3 Content types 262
12.3.1 Simple content 262
12.3.2 Element-only content 264
12.3.3 Mixed content 264
12.3.4 Empty content 265
12.4 Using element declarations 266
12.4.1 Local element declarations 266
12.4.2 Element references 267
12.4.3 Duplication of element names 268
12.5 Using model groups 270
12.5.1 sequence groups 270
12.5.1.1 Design hint: Should I care about the order of elements? 272
12.5.2 choice groups 273
12.5.3 Nesting of sequence and choice groups 275
12.5.4 all groups 276
12.5.5 Named model group references 278
12.5.6 Deterministic content models 279
12.6 Using attribute declarations 281
12.6.1 Local attribute declarations 281
12.6.2 Attribute references 282
12.6.3 Attribute group references 284
12.6.4 Default attributes 284
12.7 Using wildcards 284
12.7.1 Element wildcards 285
12.7.1.1 Controlling the namespace of replacement elements 287
12.7.1.2 Controlling the strictness of validation 287
12.7.1.3 Negative wildcards 289
12.7.2 Open content models 292
12.7.2.1 Open content in a complex type 292
12.7.2.2 Default open content 295
12.7.3 Attribute wildcards 298
Chapter 13 Deriving complex types 300
13.1 Why derive types? 301
13.2 Restriction and extension 302
13.3 Simple content and complex content 303
13.3.1 simpleContent elements 303
13.3.2 complexContent elements 304
13.4 Complex type extensions 305
13.4.1 Simple content extensions 306
13.4.2 Complex content extensions 307
13.4.2.1 Extending choice groups 309
13.4.2.2 Extending all groups 310
13.4.2.3 Extending open content 311
13.4.3 Mixed content extensions 312
13.4.4 Empty content extensions 313
13.4.5 Attribute extensions 314
13.4.6 Attribute wildcard extensions 315
13.5 Complex type restrictions 316
13.5.1 Simple content restrictions 317
13.5.2 Complex content restrictions 318
13.5.2.1 Eliminating meaningless groups 320
13.5.2.2 Restricting element declarations 321
13.5.2.3 Restricting wildcards 322
13.5.2.4 Restricting groups 324
13.5.2.5 Restricting open content 329
13.5.3 Mixed content restrictions 331
13.5.4 Empty content restrictions 332
13.5.5 Attribute restrictions 333
13.5.6 Attribute wildcard restrictions 335
13.5.7 Restricting types from another namespace 337
13.5.7.1 Using targetNamespace on element and attribute declarations 339
13.6 Type substitution 341
13.7 Controlling type derivation and substitution 343
13.7.1 final: Preventing complex type derivation 343
13.7.2 block: Blocking substitution of derived types 344
13.7.3 Blocking type substitution in element declarations 346
13.7.4 abstract: Forcing derivation 346
Chapter 14 Assertions 350
14.1 Assertions 351
14.1.1 Assertions for simple types 353
14.1.1.1 Using XPath 2.0 operators 355
14.1.1.2 Using XPath 2.0 functions 357
14.1.1.3 Types and assertions 359
14.1.1.4 Inheriting simple type assertions 362
14.1.1.5 Assertions on list types 363
14.1.2 Assertions for complex types 365
14.1.2.1 Path expressions 367
14.1.2.2 Conditional expressions 369
14.1.2.3 Assertions in derived complex types 370
14.1.3 Assertions and namespaces 372
14.1.3.1 Using xpathDefaultNamespace 373
14.2 Conditional type assignment 375
14.2.1 The alternative element 376
14.2.2 Specifying conditional type assignment 377
14.2.3 Using XPath in the test attribute 378
14.2.4 The error type 380
14.2.5 Conditional type assignment and namespaces 381
14.2.6 Using inherited attributes in conditional type Assignment 382
Chapter 15 Named groups 384
15.1 Why named groups? 385
15.2 Named model groups 386
15.2.1 Defining named model groups 386
15.2.2 Referencing named model groups 388
15.2.2.1 Group references 388
15.2.2.2 Referencing a named model group in a complex type 389
15.2.2.3 Using all in named model groups 391
15.2.2.4 Named model groups referencing named model groups 392
15.3 Attribute groups 392
15.3.1 Defining attribute groups 393
15.3.2 Referencing attribute groups 395
15.3.2.1 Attribute group references 395
15.3.2.2 Referencing attribute groups in complex types 396
15.3.2.3 Duplicate attribute names 397
15.3.2.4 Duplicate attribute wildcard handling 398
15.3.2.5 Attribute groups referencing attribute groups 398
15.3.3 The default attribute group 399
15.4 Named groups and namespaces 401
15.5 Design hint: Named groups or complex type derivations? 403
Chapter 16 Substitution groups 406
16.1 Why substitution groups? 407
16.2 The substitution group hierarchy 408
16.3 Declaring a substitution group 409
16.4 Type constraints for substitution groups 412
16.5 Members in multiple groups 413
16.6 Alternatives to substitution groups 414
16.6.1 Reusable choice groups 414
16.6.2 Substituting a derived type in the instance 415
16.7 Controlling substitution groups 418
16.7.1 final: Preventing substitution group declarations 418
16.7.2 block: Blocking substitution in instances 419
16.7.3 abstract: Forcing substitution 420
Chapter 17 Identity constraints 422
17.1 Identity constraint categories 423
17.2 Design hint: Should I use ID/IDREF or key/keyref? 424
17.3 Structure of an identity constraint 424
17.4 Uniqueness constraints 426
17.5 Key constraints 428
17.6 Key references 430
17.6.1 Key references and scope 432
17.6.2 Key references and type equality 432
17.7 Selectors and fields 433
17.7.1 Selectors 433
17.7.2 Fields 434
17.8 XPath subset for identity constraints 435
17.9 Identity constraints and namespaces 439
17.9.1 Using xpathDefaultNamespace 441
17.10 Referencing identity constraints 442
Chapter 18 Redefining and overriding schema components 446
18.1 Redefinition 448
18.1.1 Redefinition basics 448
18.1.1.1 Include plus redefine 450
18.1.1.2 Redefine and namespaces 450
18.1.1.3 Pervasive impact 450
18.1.2 The mechanics of redefinition 451
18.1.3 Redefining simple types 452
18.1.4 Redefining complex types 453
18.1.5 Redefining named model groups 454
18.1.5.1 Defining a subset 454
18.1.5.2 Defining a superset 455
18.1.6 Redefining attribute groups 456
18.1.6.1 Defining a subset 457
18.1.6.2 Defining a superset 458
18.2 Overrides 459
18.2.1 Override basics 459
18.2.1.1 Include plus override 461
18.2.1.2 Override and namespaces 461
18.2.1.3 Pervasive impact 462
18.2.2 The mechanics of overriding components 462
18.2.3 Overriding simple types 464
18.2.4 Overriding complex types 465
18.2.5 Overriding element and attribute declarations 466
18.2.6 Overriding named groups 467
18.3 Risks of redefines and overrides 468
18.3.1 Risks of redefining or overriding types 468
18.3.2 Risks of redefining or overriding named groups 470
Chapter 19 Topics for DTD users 472
19.1 Element declarations 473
19.1.1 Simple types 474
19.1.2 Complex types with simple content 475
19.1.3 Complex types with complex content 476
19.1.4 Mixed content 478
19.1.5 Empty content 479
19.1.6 Any content 480
19.2 Attribute declarations 480
19.2.1 Attribute types 480
19.2.2 Enumerated attribute types 481
19.2.3 Notation attributes 482
19.2.4 Default values 482
19.3 Parameter entities for reuse 483
19.3.1 Reusing content models 484
19.3.2 Reusing attributes 485
19.4 Parameter entities for extensibility 486
19.4.1 Extensions for sequence groups 486
19.4.2 Extensions for choice groups 489
19.4.3 Attribute extensions 490
19.5 External parameter entities 492
19.6 General entities 493
19.6.1 Character and other parsed entities 493
19.6.2 Unparsed entities 493
19.7 Notations 493
19.7.1 Declaring a notation 494
19.7.2 Declaring a notation attribute 495
19.7.3 Notations and unparsed entities 496
19.8 Comments 497
19.9 Using DTDs and schemas together 499
Chapter 20 XML information modeling 500
20.1 Data modeling paradigms 502
20.2 Relational models 503
20.2.1 Entities and attributes 504
20.2.2 Relationships 507
20.2.2.1 One-to-one and one-to-many relationships 507
20.2.2.2 Many-to-many relationships 507
20.2.2.2.1 Approach #1: Use containment with repetition 508
20.2.2.2.2 Approach #2: Use containment with references 510
20.2.2.2.3 Approach #3: Use relationship elements 512
20.3 Modeling object-oriented concepts 514
20.3.1 Inheritance 514
20.3.2 Composition 519
20.4 Modeling web services 522
20.5 Considerations for narrative content 524
20.5.1 Semantics vs. style 524
20.5.1.1 Benefits of excluding styling 524
20.5.1.2 Rendition elements: “block” and “inline” 525
20.5.2 Considerations for schema design 526
20.5.2.1 Flexibility 526
20.5.2.2 Reusing existing vocabularies 526
20.5.2.3 Attributes are for metadata 526
20.5.2.4 Humans write the documents 527
20.6 Considerations for a hierarchical model 527
20.6.1 Intermediate elements 527
20.6.2 Wrapper lists 531
20.6.3 Level of granularity 532
20.6.4 Generic vs. specific elements 533
Chapter 21 Schema design and documentation
21.1 The importance of schema design
21.2 Uses for schemas
21.3 Schema design goals
21.3.1 Flexibility and extensibility
21.3.2 Reusability
21.3.3 Clarity and simplicity
21.3.3.1 Naming and documentation
21.3.3.2 Clarity of structure
21.3.3.3 Simplicity
21.3.4 Support for graceful versioning
21.3.5 Interoperability and tool compatibility
21.4 Developing a schema design strategy
21.5 Schema organization considerations
21.5.1 Global vs. local components
21.5.1.1 Russian Doll
21.5.1.2 Salami Slice
21.5.1.3 Venetian Blind
21.5.1.4 Garden of Eden
21.5.2 Modularizing schema documents
21.6 Naming considerations
21.6.1 Rules for valid XML names
21.6.2 Separators
21.6.3 Name length
21.6.4 Standard terms and abbreviations
21.6.5 Use of object terms
21.7 Namespace considerations
21.7.1 Whether to use namespaces
21.7.2 Organizing namespaces
21.7.2.1 Same namespace
21.7.2.2 Different namespaces
21.7.2.3 Chameleon namespaces
21.7.3 Qualified vs. unqualified forms
21.7.3.1 Qualified local names
21.7.3.2 Unqualified local names
21.7.3.3 Using form in schemas
21.7.3.4 Form and global element declarations
21.7.3.5 Default namespaces and unqualified names
21.7.3.6 Qualified vs. unqualified element names
21.7.3.7 Qualified vs. unqualified attribute names
21.8 Schema documentation
21.8.1 Annotations
21.8.2 User documentation
21.8.2.1 Documentation syntax
21.8.2.2 Data element definitions
21.8.2.3 Code documentation
21.8.2.4 Section comments
21.8.3 Application information
21.8.4 Non-native attributes
21.8.4.1 Design hint: Should I use annotations or non-native attributes?
21.8.5 Documenting namespaces
Chapter 22 Extensibility and reuse
22.1 Reuse
22.1.1 Reusing schema components
22.1.2 Creating schemas that are highly reusable
22.1.3 Developing a common components library
22.2 Extending schemas
22.2.1 Wildcards
22.2.2 Open content
22.2.3 Type substitution
22.2.4 Substitution groups
22.2.5 Type redefinition
22.2.6 Named group redefinition
22.2.7 Overrides
Chapter 23 Versioning
23.1 Schema compatibility
23.1.1 Backward compatibility
23.1.2 Forward compatibility
23.2 Using version numbers
23.2.1 Major and minor versions
23.2.2 Placement of version numbers
23.2.2.1 Version numbers in schema documents
23.2.2.2 Versions in schema locations
23.2.2.3 Versions in instances
23.2.2.4 Versions in namespace names
23.2.2.5 A combination strategy
23.3 Application compatibility
23.4 Lessening the impact of versioning
23.4.1 Define a versioning strategy
23.4.2 Make only necessary changes
23.4.3 Document all changes
23.4.4 Deprecate components before deleting them
23.4.5 Provide a conversion capability
23.5 Versions of the XML Schema language
23.5.1 New features in version 1.1
23.5.2 Forward compatibility of XML Schema 1.1
23.5.3 Portability of implementation-defined types and facets
23.5.3.1 Using typeAvailable and typeUnavailable
23.5.3.2 Using facetAvailable and facetUnavailable
Appendix A XSD keywords
A.1 Elements
A.2 Attributes
Appendix B Built-in simple types
B.1 Built-in simple types
B.2 Applicability of facets to built-in simple types
Index
Preface
Chapter 1 Schemas: An Introduction
This chapter provides a brief explanation of schemas and why they are important. It also discusses the basic schema design goals, and describes the various existing schema languages.
1.1 What is an XML schema?
The word schema means a diagram, plan, or framework. In XML, it refers to a document that describes an XML document. Suppose you have the XML instance shown in Example 1-1. It consists of a product element that has two children (number and size) and an attribute (effDate).
Example 1-2 shows a schema that describes the instance. It contains element and attribute declarations that assign data types and element-type names to elements and attributes.
Example 1-1. Product instance
<product effDate="2001-04-02">
  <number>557</number>
  <size>10</size>
</product>
Example 1-2. Product schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product" type="ProductType"/>
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <xsd:element name="size" type="SizeType"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:simpleType name="SizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
1.2 The purpose of schemas
1.2.1 Data validation
One of the most common uses for schemas is to verify that an XML document is valid according to a defined set of rules. A schema can be used to validate:
- The structure of elements and attributes. For example, a product must have a number and a size, and may optionally have an effDate (effective date).
- The order of elements. For example, number must appear before size.
- The data values of attributes and elements, based on ranges, enumerations, and pattern matching. For example, size must be an integer between 2 and 18, and effDate must be a valid date.
- The uniqueness of values in an instance. For example, all product numbers in an instance must be unique.
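The uniqueness rule in the last bullet can be expressed with an identity constraint. The following is a minimal sketch rather than a definitive design: it assumes a hypothetical catalog wrapper element (Example 1-2 declares only a single product), and the constraint name uniqueProductNumber is invented for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical wrapper element; Example 1-2 declares only product -->
  <xsd:element name="catalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="product" type="ProductType"
                     maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <!-- No two product children of one catalog may share a number value -->
    <xsd:unique name="uniqueProductNumber">
      <xsd:selector xpath="product"/>
      <xsd:field xpath="number"/>
    </xsd:unique>
  </xsd:element>

  <!-- ProductType and SizeType are repeated from Example 1-2 -->
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <xsd:element name="size" type="SizeType"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:simpleType name="SizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With this sketch, a catalog containing two products whose number is 557 would be reported as invalid, while the structure, order, and value rules from the earlier bullets continue to be enforced by ProductType and SizeType.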
1.2.2 A contract with trading partners
Often, XML instances are passed between organizations. A schema may act as a contract with your trading partners. It clearly lays out the rules for document structure and what is required. Since an instance can be validated against a schema, the "contract" can be enforced using available tools.
1.2.3 System documentation
Schemas can provide documentation about the data in an XML instance. Anyone who needs to understand the data can refer to the schema for information about names, structures, and data types of the items. To include further documentation, you can add annotations to any schema component.
1.2.4 Augmentation of data
Schema processing can also add to the instance. It inserts default and fixed values for elements and attributes, and normalizes whitespace according to the data type.
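As a rough illustration of how default and fixed values are declared, the sketch below varies the product declaration from Example 1-2. The currency and version attributes, and the default of 10 for size, are assumptions made up for this example and do not appear in the original schema.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="number" type="xsd:integer"/>
        <!-- An empty size element is treated as if it contained 10 -->
        <xsd:element name="size" type="xsd:integer" default="10"/>
      </xsd:sequence>
      <xsd:attribute name="effDate" type="xsd:date"/>
      <!-- Hypothetical: if currency is absent, the value USD is supplied -->
      <xsd:attribute name="currency" type="xsd:string" default="USD"/>
      <!-- Hypothetical: version must be 1.0 if present; 1.0 is supplied if absent -->
      <xsd:attribute name="version" type="xsd:string" fixed="1.0"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note the asymmetry: an attribute default is applied when the attribute is missing, whereas an element default is applied only when the element is present but empty.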
1.2.5 Application information
Schemas provide a way for additional information about the data to be supplied to the application when processing a particular type of document. For example, you could include information on how to map the product element instances to a database table, and have the application use this information to automatically update a particular table with the data.
In addition to being available at processing time, this information in schemas can be used to generate code such as:
- User interfaces for editing the information. For example, if you know that size is between 2 and 18, you can generate an interface that has a slider bar with these values as the limits.
- Stylesheets to transform the instance data into a reader-friendly representation such as XHTML. For example, if you know that the human-readable name for the content of a number element is "Product Number" you can use this as a column header.
- Code to insert or extract the data from a database. For example, if you know that the product number maps to the PROD_NUM column on the PRODUCTS table, you can generate an efficient routine to insert it into that column.
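One way such information could be carried is in an annotation on the declaration. In the sketch below, the xsd:annotation, xsd:documentation, and xsd:appinfo elements are standard, but the db namespace, the db:column element, and its table and name attributes are hypothetical; a real application would define its own mapping vocabulary.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:db="http://example.org/db-mapping">
  <xsd:element name="number" type="xsd:integer">
    <xsd:annotation>
      <!-- Human-readable name, usable as a column header or form label -->
      <xsd:documentation>Product Number</xsd:documentation>
      <!-- Machine-readable mapping in a hypothetical vocabulary -->
      <xsd:appinfo>
        <db:column table="PRODUCTS" name="PROD_NUM"/>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```

A schema processor does not interpret the content of xsd:appinfo; it is up to the consuming application or code generator to read and act on it.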
Tools have only just begun to take advantage of the possibilities of schemas. In the coming years, we will see schemas used in many creative new ways.
1.3 Schema design
XML Schema is packed with features, and there are often several ways to accurately describe the same thing. The decisions made during schema design can affect its usability, accuracy, and applicability. Therefore, it is important to keep in mind your design objectives when creating a schema. These objectives may vary depending on how you are using XML, but some are common to all use cases.
1.3.1 Accuracy and precision
Obviously, a schema should accurately describe an XML instance and allow it to be validated. Schemas should also be precise in describing data. Precision can result in more complete validation as well as better documentation. Precision can be achieved by defining restrictive data types that truly represent valid values.
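To make the difference concrete, the sketch below contrasts an imprecise declaration with a precise one. The dept and looseDept elements and the three-letter code format are assumptions chosen purely for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Imprecise: accurate, but any string at all is accepted -->
  <xsd:element name="looseDept" type="xsd:string"/>

  <!-- Precise: only three uppercase letters, such as WMN, are accepted -->
  <xsd:element name="dept" type="DeptCodeType"/>
  <xsd:simpleType name="DeptCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{3}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The restricted type both rejects more bad data and tells a reader of the schema what a valid department code looks like.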
1.3.2 Clarity
Schemas should be very clear, allowing a reader to instantly understand the structure and characteristics of the instance being described. Clarity can be achieved by:
- appropriate choice of names,
- consistency in naming,
- consistency in structure,
- good documentation,
- avoiding unnecessary complexity.
1.3.3 Broad applicability
There is a temptation to create schemas that are useful only for a specific application purpose. In some cases, this may be appropriate. However, it is usually better to create a schema that has broader applicability. For example, a business unit that handles only domestic accounts may not use a country element declaration as part of an address. It should nevertheless consider adding one as an optional element for the sake of consistency and future usability.
There are two components to a schema's broad applicability: reusability and extensibility. Reusable schema components are modular and well documented, encouraging schema authors to reuse them in other schemas. Extensible components are flexible and open, allowing other schema authors to build on them for future uses. Since reusability and extensibility are important, all of Chapter 22, "Extensibility and reuse," is devoted to them.
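The country example might look like the following sketch. The AddressType name and the street, city, and postalCode children are assumptions; only the optional country element comes from the discussion above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string"/>
      <!-- Optional: domestic-only users can omit it, others can supply it -->
      <xsd:element name="country" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="address" type="AddressType"/>
</xsd:schema>
```

Because country has minOccurs="0", existing domestic instances remain valid, and the named type can be reused by other schemas that need an address.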
1.4 Schema languages
1.4.1 Document Type Definitions (DTDs)
Document Type Definitions (DTDs) are a commonly used method of describing XML documents. They allow you to define the basic structure of an XML instance, including:
- the structure and order of child elements in an element type,
- the attributes of an element type,
- basic data typing for attributes,
- default and fixed values for attributes,
- notations to represent other data formats.
Example 1-3 shows a DTD that is roughly equivalent to our schema in Example 1-2.
Example 1-3. Product DTD
<!ELEMENT product (number, size?)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ATTLIST product effDate CDATA #IMPLIED>
DTDs have many advantages. They are relatively simple, have a compact syntax, and are widely understood by XML implementers. When designed well, they can be extremely modular, flexible, and extensible.
However, DTDs also have some shortcomings. They have their own non-XML syntax, do not support namespaces easily, and provide very limited data typing, for attributes only.
1.4.2 Enter schemas
As XML became increasingly popular for data applications such as e-commerce and enterprise application integration (EAI), a more robust schema language was needed. Specifically, XML developers wanted:
- The ability to constrain data based on common data types such as integer and date.
- The ability to define their own data types in order to further constrain data.
- Support for namespaces.
- The ability to specify multiple declarations for the same element-type name in different contexts.
- Object-oriented features such as type derivation. The ability to express types as extensions or restrictions of other types allows them to be processed similarly and substituted for each other.
- A schema language that uses XML syntax. This is advantageous because it is extensible, can represent more advanced models and can be processed by many available tools.
- The ability to add structured documentation and application information that is passed to the application during processing.
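Type derivation, one of the features in the list above, can be sketched as follows. ProductType is a simplified version of the type in Example 1-2; ShirtType and its color child are hypothetical names introduced only to show the extension mechanism.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type: properties shared by every product -->
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>

  <!-- Hypothetical derived type: everything in ProductType, plus size and color -->
  <xsd:complexType name="ShirtType">
    <xsd:complexContent>
      <xsd:extension base="ProductType">
        <xsd:sequence>
          <xsd:element name="size" type="xsd:integer"/>
          <xsd:element name="color" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="product" type="ProductType"/>
</xsd:schema>
```

Because ShirtType is derived from ProductType, an instance can substitute it by writing xsi:type="ShirtType" on a product element, and an application that understands ProductType can still process the inherited number and effDate.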
DTDs are not likely to disappear now that schemas have arrived on the scene. They are supported in many tools, are widely understood, and are currently in use in many applications. In addition, they continue to be useful as a lightweight alternative to schemas.
1.4.3 W3C XML Schema
Four schema languages were developed before work began on XML Schema: XDR (XML Data Reduced), DCD, SOX, and DDML. These four languages were considered together as a starting point for XML Schema, and many of their originators were involved in the creation of XML Schema.
The World Wide Web Consortium (W3C) began work on XML Schema in 1998. The first version, upon which this book is based, became an official Recommendation on May 2, 2001. The formal Recommendation is in three parts:
- XML Schema Part 0: Primer is a non-normative introduction to XML Schema that provides a lot of examples and explanations. It can be found at http://www.w3.org/TR/xmlschema-0/
- XML Schema Part 1: Structures describes most of the components of XML Schema. It can be found at http://www.w3.org/TR/xmlschema-1/
- XML Schema Part 2: Datatypes covers simple data types. It explains the built-in data types and the facets that may be used to restrict them. It is a separate document so that other specifications may use it, without including all of XML Schema. It can be found at http://www.w3.org/TR/xmlschema-2/
1.4.4 Notes on terminology
1.4.4.1 Schema
"XML Schema" is the official name of the Recommendation and is also sometimes used to refer to conforming schema documents. In order to clearly distinguish between the two, this book uses the term "XML Schema" only to mean the Recommendation itself.
A "schema definition" is the formal expression of a schema.
The initialism "XSDL" (XML Schema Definition Language) is used to refer to the language that is used to create schema definitions in XML. In other words, XSDL is the markup language that uses elements such as schema and complexType.
The term "schema document" is used to refer to an XML document that is written in XSDL, with a schema element as its root. The extension "xsd" is used in the file identifiers of such documents. A schema definition may consist of one or more schema documents, as described in Chapter 4, "Schema composition."
As it is unlikely to cause confusion in this book, for simplicity the word "schema" will be used to refer to both a schema as a concept, and an actual schema definition that conforms to the XML Schema definition language.
1.4.4.2 Type
According to the XML Recommendation, every XML element has an element type. In fact, it is the name of the element type that occurs in the start- and end-tags, as individual elements do not have names (although they may have IDs).
XML Schema, however, uses the word "type" exclusively as a shorthand to refer to simple types and complex types. Perhaps to avoid confusion with this usage, the Recommendation does not use the phrase "element type" in conjunction with schemas. This book follows that same practice and generally doesn't speak of element types per se, although it does refer to "element-type names" where appropriate.
1.4.5 Additional schema languages
XML Schema is not the only schema language currently in use. While it is very robust, it is not the most appropriate schema language in every case. This section describes two other schema languages.
1.4.5.1 RELAX NG
RELAX NG covers some of the same ground as XML Schema. As of this writing, it is being developed by an OASIS technical committee. RELAX NG is intended only for validation; the processor does not pass documentation or application information from the schema to the application. RELAX NG does not have built-in data types; it is designed to use other data type libraries (such as that of XML Schema).
RELAX NG has some handy features that are not currently part of XML Schema:
- It includes attributes in the elements' content models. For example, you can specify that a product element must either have an effectiveDate attribute or a startDate attribute. XML Schema does not currently provide a way to do this.
- It allows a content model to depend on the value of an attribute. For example, if the value of the type attribute of a product element is shirt, this product element can contain a size child. If it is umbrella, it cannot. XML Schema provides a similar mechanism through type substitution, but it is less flexible.
- It allows you to specify a content model such as "one number, one size, and up to three color elements, in any order." This is quite cumbersome to express in XML Schema if you do not want to enforce a particular order.
- It does not require content models to be deterministic. This is explained in Section 13.5.6, "Deterministic content models."
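The first feature in the list, attributes as part of the content model, might be written as in the sketch below, which uses the RELAX NG XML syntax together with the XML Schema datatype library. The number and size children are carried over from Example 1-1; treating them as integers here is an assumption.

```xml
<element name="product"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Exactly one of the two date attributes must be present -->
  <choice>
    <attribute name="effectiveDate"><data type="date"/></attribute>
    <attribute name="startDate"><data type="date"/></attribute>
  </choice>
  <element name="number"><data type="integer"/></element>
  <element name="size"><data type="integer"/></element>
</element>
```

The choice pattern is the same one that would be used to choose between child elements, which is what allows attributes and elements to be constrained together.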
However, RELAX NG also has some limitations compared to XML Schema:
- It has no inheritance capabilities. XML Schema's restriction and extension mechanisms allow type substitution and many other benefits, described in Section 14.1, "Why derive types?"
- Because it is only intended for validation, it does not provide application information to the processor. In fact, the RELAX NG processor passes the exact same information that is available from a DTD to the application. This is not a disadvantage if your only objective is validation, but it does not allow you to use the schema to help you understand how to process the instance.
For more information on RELAX NG, see http://www.oasis-open.org/committees/relax-ng/
1.4.5.2 Schematron
Schematron takes a different approach from XML Schema and RELAX NG. XML Schema and RELAX NG are both grammar-based schema languages. They specify what must appear in an instance, and in what order.
By contrast, Schematron is rule-based. It allows you to define a series of rules to which the document must conform. These rules are expressed using XPath. In contrast to grammar-based languages, Schematron considers anything that does not violate a rule to be valid. There is no need to declare every element type or attribute that may appear in the instance.
Like RELAX NG, Schematron is intended only for validation of instances. It has a number of advantages:
- It is very easy to learn and use. It uses XPath, which is familiar to many people already using XML.
- The use of XPath allows it to very flexibly and succinctly express relationships between elements in a way that is not possible with other schema languages.
- The values in an instance can be involved in validation. For example, in XSDL it is not possible to express "If the value of newCustomer is false, then customerID must appear." Schematron allows such co-occurrence constraints.
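The co-occurrence constraint in the last bullet could be sketched as a Schematron rule like the one below. It assumes that newCustomer and customerID are children of a hypothetical order element, and it uses the ISO Schematron namespace; earlier Schematron versions use a different namespace URI.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- "order" is a hypothetical parent of newCustomer and customerID -->
    <sch:rule context="order">
      <!-- Passes unless newCustomer is false and customerID is missing -->
      <sch:assert test="not(newCustomer = 'false') or customerID">
        If newCustomer is false, customerID must appear.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Nothing else about order is declared here, so any other content is accepted. That reflects the rule-based approach: only the stated assertion is checked.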
The limitations of Schematron compared to XML Schema are:
- It does not provide a model of the instance data. A person cannot gain an understanding of what instance data is expected by looking at the schema.
- It is intended only for validation, and it cannot be used to pass any information about the instance, such as data types or default values, to an application.
- Anything is valid unless it is specifically prohibited. This places the burden of anticipating all possible errors on the schema author.
Because Schematron and XML Schema complement each other, it makes sense to combine the two. An example of embedding a Schematron schema in XSDL is provided in Section 6.3.2, "Schematron for co-occurrence constraints." For more information on Schematron, see http://www.ascc.net/xml/resource/schematron/schematron.html