Definitive XML Schema

“XML Schema 1.1 has gone from strong data typing to positively stalwart—so powerful it can enforce database level constraints and business rules, so your data transfer code won’t have to. This book covers the 1.1 changes—and more—in its 500 revisions to Priscilla Walmsley’s 10-year best-selling classic. It’s the guide you need to navigate XML Schema’s complexity—and master its power!”

—Charles F. Goldfarb

For Ten Years the World’s Favorite Guide to XML Schema—Now Extensively Revised for Version 1.1 and Today’s Best Practices!

To leverage XML’s full power, organizations need shared vocabularies based on XML Schema. For a full decade, Definitive XML Schema has been the most practical, accessible, and usable guide to working with XML Schema. Now, author Priscilla Walmsley has thoroughly updated her classic to fully reflect XML Schema 1.1, and to present new best practices for designing successful schemas.

Priscilla helped create XML Schema as a member of the W3C XML Schema Working Group, so she is well qualified to explain the W3C recommendation with insight and clarity. Her book teaches practical techniques for writing schemas to support any application, including many new use cases. You’ll discover how XML Schema 1.1 provides a rigorous, complete specification for modeling XML document structure, content, and datatypes; and walk through the many aspects of designing and applying schemas, including composition, instance validation, documentation, and namespaces. Then, building on the fundamentals, Priscilla introduces powerful advanced techniques ranging from type derivation to identity constraints. This edition’s extensive new coverage includes

Many new design hints, tips, and tricks – plus a full chapter on creating an enterprise strategy for schema development and maintenance
Design considerations in creating schemas for relational and object-oriented models, narrative content, and Web services
An all-new chapter on assertions
Coverage of new 1.1 features, including overrides, conditional type assignment, open content and more
Modernized rules for naming and design
Substantially updated coverage of extensibility, reuse, and versioning
And much more

If you’re an XML developer, architect, or content specialist, with this Second Edition you can join the tens of thousands who rely on Definitive XML Schema for practical insights, deeper understanding, and solutions that work.

Product Details

ISBN-13:	9780132886758
Publisher:	Pearson Education
Publication date:	09/04/2012
Series:	Charles F. Goldfarb Definitive XML Series
Sold by:	Barnes & Noble
Format:	eBook
Pages:	768
File size:	60 MB
Note:	This product may take a few minutes to download.
Age Range:	18 Years

About the Author

PRISCILLA WALMSLEY serves as Managing Director of Datypic, a consultancy specializing in XML architecture and design, SOA and Web services implementation, and content management.

Read an Excerpt

Chapter 9: Simple types

Both element and attribute declarations can use simple types to describe the data content of the components. This chapter introduces simple types, and explains how to define your own atomic simple types for use in your schemas.

9.1 Simple type varieties

There are three varieties of simple type: atomic types, list types, and union types.

Atomic types have values that are indivisible, such as 10 and large.
List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.
Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or "value space," for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18, or one of the string values small, medium, or large.

List and union types are covered in Chapter 11, "Union and list types."
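As a rough illustration of the three varieties, the following sketch defines one type of each kind. The type names NumericSizeType, NamedSizeType, SizeUnionType, and AvailableSizesType are assumptions made for this example; only the value ranges follow the dress-size description above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: an integer from 2 through 18 -->
  <xsd:simpleType name="NumericSizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Atomic type: one of three string values -->
  <xsd:simpleType name="NamedSizeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Union type: the value space is the union of the two types above -->
  <xsd:simpleType name="SizeUnionType">
    <xsd:union memberTypes="NumericSizeType NamedSizeType"/>
  </xsd:simpleType>

  <!-- List type: whitespace-separated values, each drawn from the union -->
  <xsd:simpleType name="AvailableSizesType">
    <xsd:list itemType="SizeUnionType"/>
  </xsd:simpleType>

  <xsd:element name="availableSizes" type="AvailableSizesType"/>

</xsd:schema>
```

Under these definitions, an instance such as <availableSizes>10 large 2</availableSizes> should be accepted, because each item is valid against one of the member types of the union.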

9.1.1 Design hint: How much should I break down my data values?

Data values should be broken down to the most atomic level possible. This allows them to be processed in a variety of ways for different uses, such as display, mathematical operations, and validation. It is much easier to concatenate two data values back together than it is to split them apart. In addition, more granular data is much easier to validate. It is a fairly common practice to put a data value and its units in the same element, for example <length>3cm</length>. However, the preferred approach is to have a separate data value, preferably an attribute, for the units, for example <length units="cm">3</length>.

Using a single concatenated value is limiting because:

It is extremely cumbersome to validate. You have to apply a complicated pattern that would need to change every time a unit type is added.
You cannot perform comparisons, conversions, or mathematical operations on the data without splitting it apart.
If you want to display the data item differently (for example, as "3 centimeters" or "3 cm" or just "3"), you have to split it apart. This complicates the stylesheets and applications that process the instance document.
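To make the preferred approach concrete, here is a minimal sketch of how <length units="cm">3</length> might be declared. The names LengthType and LengthUnitType, the choice of xsd:decimal for the number, and the list of units are assumptions for illustration only.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The allowed units live in one enumeration; adding a unit means adding one line -->
  <xsd:simpleType name="LengthUnitType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="cm"/>
      <xsd:enumeration value="in"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The element content is a plain number; the units are carried by an attribute -->
  <xsd:complexType name="LengthType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="units" type="LengthUnitType" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:element name="length" type="LengthType"/>

</xsd:schema>
```

With this split, the numeric value can be checked by the built-in decimal type and the units by the enumeration. Validating a concatenated value such as 3cm would instead need something like a pattern facet along the lines of \d+(cm|in), which has to be edited whenever a unit is added.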

It is possible to go too far, though. For example, you may break a date down as follows:

<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>

This is probably overkill unless you have a special need to process these items separately.
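One simpler alternative, sketched here on the assumption that the built-in date type meets your needs, is to keep the date as a single value:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A single value of the built-in date type, e.g. <orderDate>2001-06-15</orderDate> -->
  <xsd:element name="orderDate" type="xsd:date"/>

</xsd:schema>
```

The built-in type already checks that the year, month, and day form a valid calendar date, so the separate child elements are needed only if they are processed individually.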

9.2 Simple type definitions

9.2.1 Named simple types

Simple types can be either named or anonymous. Named simple types are always defined globally (i.e., their parent is always schema or redefine) and are required to have a name that is unique among the data types (both simple and complex) in the schema. The XSDL syntax for a named simple type definition is shown in Table 9–1.

The name of a simple type must be an XML non-colonized name, which means that it must start with a letter or underscore, and may only contain letters, digits, underscores, hyphens, and periods. You cannot include a namespace prefix when defining the type; it takes its namespace from the target namespace of the schema document. All of the examples of named types in this book have the word "Type" at the end of their names, to clearly distinguish them from element-type names and attribute names. However, this is not a requirement; you may in fact have a data type definition and an element declaration using the same name.

Example 9–1 shows the definition of a named simple type DressSizeType, along with an element declaration that references it. Named types can be used in multiple element and attribute declarations.

Example 9–1. Defining and referencing a named simple type

<xsd:simpleType name="DressSizeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="2"/>
    <xsd:maxInclusive value="18"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="size" type="DressSizeType"/>

9.2.2 Anonymous simple types

Anonymous types, on the other hand, must not have names. They are always defined entirely within an element or attribute declaration, and may only be used once, by that declaration. Defining a type anonymously prevents it from ever being restricted, used in a list or union, or redefined. The XSDL syntax to define an anonymous simple type is shown in Table 9–2.

Example 9–2 shows the definition of an anonymous simple type within an element declaration.

Example 9–2. Defining an anonymous simple type

<xsd:element name="size">
  <xsd:simpleType>
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

9.2.3 Design hint: Should I use named or anonymous types?

The advantage of named types is that they may be defined once and used many times. For example, you may define a type named ProductCodeType that lists all of the valid product codes in your organization. This type can then be used in many element and attribute declarations in many schemas. This has the advantages of:

encouraging consistency throughout the organization,
reducing the possibility of error,
requiring less time to define new schemas,
simplifying maintenance, because new product codes need only be added in one place.
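A sketch of this kind of reuse might look as follows. The product codes, the element name productCode, and the relatedProduct element with its code attribute are all hypothetical, chosen only to show one named type serving both an element and an attribute declaration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Defined once; a new product code is added here and nowhere else -->
  <xsd:simpleType name="ProductCodeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="SHIRT"/>
      <xsd:enumeration value="HAT"/>
      <xsd:enumeration value="UMBRELLA"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Used by an element declaration -->
  <xsd:element name="productCode" type="ProductCodeType"/>

  <!-- Used again by an attribute declaration -->
  <xsd:element name="relatedProduct">
    <xsd:complexType>
      <xsd:attribute name="code" type="ProductCodeType" use="required"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```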

Named types can also make the schema more readable, when the type definitions are complex.

An anonymous type, on the other hand, can be used only in the element or attribute declaration that contains it. It can never be redefined, have types derived from it, or be used in a list or union type. This can seriously limit its reusability, extensibility, and ability to change over time.

However, there are cases where anonymous types are preferable to named types. If the type is unlikely to ever be reused, the advantages listed above no longer apply. Also, there is such a thing as too much reuse. For example, if an element can contain the values 1 through 10, it does not make sense to try to define a data type named OneToTenType that is reused by other unrelated element declarations with the same value space. If the value space for one of the element declarations that uses the named data type changes, but the other element declarations do not change, it actually makes maintenance more difficult, because a new data type needs to be defined at that time.
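The following sketch shows the anonymous alternative for that situation. The element names rating and priority are assumptions for illustration; they stand for two unrelated declarations that merely happen to share the range 1 through 10 today.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Each declaration carries its own anonymous type -->
  <xsd:element name="rating">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="10"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

  <!-- Changing this range later does not touch the rating element -->
  <xsd:element name="priority">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="10"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

</xsd:schema>
```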

In addition, anonymous types can be more readable when they are relatively simple. It is sometimes desirable to have the definition of the data type right there with the element or attribute declaration....

Table of Contents

Foreword
Acknowledgments
How to use this book

Chapter 1 Schemas: An introduction
1.1 What is a schema?
1.2 The purpose of schemas
1.2.1 Data validation
1.2.2 A contract with trading partners
1.2.3 System documentation
1.2.4 Providing information to processors
1.2.5 Augmentation of data
1.2.6 Application information
1.3 Schema design
1.3.1 Accuracy and precision
1.3.2 Clarity
1.3.3 Broad applicability
1.4 Schema languages
1.4.1 Document Type Definition (DTD)
1.4.2 Schema requirements expand
1.4.3 W3C XML Schema
1.4.4 Other schema languages
1.4.4.1 RELAX NG
1.4.4.2 Schematron

Chapter 2 A quick tour of XML Schema
2.1 An example schema
2.2 The components of XML Schema
2.2.1 Declarations vs. definitions
2.2.2 Global vs. local components
2.3 Elements and attributes
2.3.1 The tag/type distinction
2.4 Types
2.4.1 Simple vs. complex types
2.4.2 Named vs. anonymous types
2.4.3 The type definition hierarchy
2.5 Simple types
2.5.1 Built-in simple types
2.5.2 Restricting simple types
2.5.3 List and union types
2.6 Complex types
2.6.1 Content types
2.6.2 Content models
2.6.3 Deriving complex types
2.7 Namespaces and XML Schema
2.8 Schema composition
2.9 Instances and schemas
2.10 Annotations
2.11 Advanced features
2.11.1 Named groups
2.11.2 Identity constraints
2.11.3 Substitution groups
2.11.4 Redefinition and overriding
2.11.5 Assertions

Chapter 3 Namespaces
3.1 Namespaces in XML
3.1.1 Namespace names
3.1.2 Namespace declarations and prefixes
3.1.3 Default namespace declarations
3.1.4 Name terminology
3.1.5 Scope of namespace declarations
3.1.6 Overriding namespace declarations
3.1.7 Undeclaring namespaces
3.1.8 Attributes and namespaces
3.1.9 A summary example
3.2 The relationship between namespaces and schemas
3.3 Using namespaces in schemas
3.3.1 Target namespaces
3.3.2 The XML Schema Namespace
3.3.3 The XML Schema Instance Namespace
3.3.4 The Version Control Namespace
3.3.5 Namespace declarations in schema documents
3.3.5.1 Map a prefix to the XML Schema Namespace
3.3.5.2 Map a prefix to the target namespace
3.3.5.3 Map prefixes to all namespaces

Chapter 4 Schema composition
4.1 Modularizing schema documents
4.2 Defining schema documents
4.3 Combining multiple schema documents
4.3.1 include
4.3.1.1 The syntax of includes
4.3.1.2 Chameleon includes
4.3.2 import
4.3.2.1 The syntax of imports
4.3.2.2 Multiple levels of imports
4.3.2.3 Multiple imports of the same namespace
4.4 Schema assembly considerations
4.4.1 Uniqueness of qualified names
4.4.2 Missing components
4.4.3 Schema document defaults

Chapter 5 Instances and schemas
5.1 Using the instance attributes
5.2 Schema processing
5.2.1 Validation
5.2.2 Augmenting the instance
5.3 Relating instances to schemas
5.3.1 Using hints in the instance
5.3.1.1 The xsi:schemaLocation attribute
5.3.1.2 The xsi:noNamespaceSchemaLocation attribute
5.4 The root element

Chapter 6 Element declarations
6.1 Global and local element declarations
6.1.1 Global element declarations
6.1.2 Local element declarations
6.1.3 Design hint: Should I use global or local element declarations?
6.2 Declaring the types of elements
6.3 Qualified vs. unqualified forms
6.3.1 Qualified local names
6.3.2 Unqualified local names
6.3.3 Using elementFormDefault
6.3.4 Using form
6.3.5 Default namespaces and unqualified names
6.4 Default and fixed values
6.4.1 Default values
6.4.2 Fixed values
6.5 Nils and nillability
6.5.1 Using xsi:nil in an instance
6.5.2 Making elements nillable

Chapter 7 Attribute declarations
7.1 Attributes vs. elements
7.2 Global and local attribute declarations
7.2.1 Global attribute declarations
7.2.2 Local attribute declarations
7.2.3 Design hint: Should I use global or local attribute declarations?
7.3 Declaring the types of attributes
7.4 Qualified vs. unqualified forms
7.5 Default and fixed values
7.5.1 Default values
7.5.2 Fixed values
7.6 Inherited attributes

Chapter 8 Simple types
8.1 Simple type varieties
8.1.1 Design hint: How much should I break down my data values?
8.2 Simple type definitions
8.2.1 Named simple types
8.2.2 Anonymous simple types
8.2.3 Design hint: Should I use named or anonymous types?
8.3 Simple type restrictions
8.3.1 Defining a restriction
8.3.2 Overview of the facets
8.3.3 Inheriting and restricting facets
8.3.4 Fixed facets
8.3.4.1 Design hint: When should I fix a facet?
8.4 Facets
8.4.1 Bounds facets
8.4.2 Length facets
8.4.2.1 Design hint: What if I want to allow empty values?
8.4.2.2 Design hint: What if I want to restrict the length of an integer?
8.4.3 totalDigits and fractionDigits
8.4.4 Enumeration
8.4.5 Pattern
8.4.6 Assertion
8.4.7 Explicit Time Zone
8.4.8 Whitespace
8.5 Preventing simple type derivation
8.6 Implementation-defined types and facets
8.6.1 Implementation-defined types
8.6.2 Implementation-defined facets

Chapter 9 Regular expressions
9.1 The structure of a regular expression
9.2 Atoms
9.2.1 Normal characters
9.2.2 The wildcard escape character
9.2.3 Character class escapes
9.2.3.1 Single-character escapes
9.2.3.2 Multicharacter escapes
9.2.3.3 Category escapes
9.2.3.4 Block escapes
9.2.4 Character class expressions
9.2.4.1 Listing individual characters
9.2.4.2 Specifying a range
9.2.4.3 Combining individual characters and ranges
9.2.4.4 Negating a character class expression
9.2.4.5 Subtracting from a character class expression
9.2.4.6 Escaping rules for character class expressions
9.2.5 Parenthesized regular expressions
9.3 Quantifiers
9.4 Branches

Chapter 10 Union and list types
10.1 Varieties and derivation types
10.2 Union types
10.2.1 Defining union types
10.2.2 Restricting union types
10.2.3 Unions of unions
10.2.4 Specifying the member type in the instance
10.3 List types
10.3.1 Defining list types
10.3.2 Design hint: When should I use lists?
10.3.3 Restricting list types
10.3.3.1 Length facets
10.3.3.2 Enumeration facet
10.3.3.3 Pattern facet
10.3.4 Lists and strings
10.3.5 Lists of unions
10.3.6 Lists of lists
10.3.7 Restricting the item type

Chapter 11 Built-in simple types
11.1 The XML Schema type system
11.1.1 The type hierarchy
11.1.2 Value spaces and lexical spaces
11.1.3 Facets and built-in types
11.2 String-based types
11.2.1 string, normalizedString, and token
11.2.1.1 Design hint: Should I use string, normalizedString, or token?
11.2.2 Name
11.2.3 NCName
11.2.4 language
11.3 Numeric types
11.3.1 float and double
11.3.2 decimal
11.3.3 Integer types
11.3.3.1 Design hint: Is it an integer or a string?
11.4 Date and time types
11.4.1 date
11.4.2 time
11.4.3 dateTime
11.4.4 dateTimeStamp
11.4.5 gYear
11.4.6 gYearMonth
11.4.7 gMonth
11.4.8 gMonthDay
11.4.9 gDay
11.4.10 duration
11.4.11 yearMonthDuration
11.4.12 dayTimeDuration
11.4.13 Representing time zones
11.4.14 Facets
11.4.15 Date and time ordering
11.5 Legacy types
11.5.1 ID
11.5.2 IDREF
11.5.3 IDREFS
11.5.4 ENTITY
11.5.5 ENTITIES
11.5.6 NMTOKEN
11.5.7 NMTOKENS
11.5.8 NOTATION
11.6 Other types
11.6.1 QName
11.6.2 boolean
11.6.3 The binary types
11.6.4 anyURI
11.7 Comparing typed values

Chapter 12 Complex types
12.1 What are complex types?
12.2 Defining complex types
12.2.1 Named complex types
12.2.2 Anonymous complex types
12.2.3 Complex type alternatives
12.3 Content types
12.3.1 Simple content
12.3.2 Element-only content
12.3.3 Mixed content
12.3.4 Empty content
12.4 Using element declarations
12.4.1 Local element declarations
12.4.2 Element references
12.4.3 Duplication of element names
12.5 Using model groups
12.5.1 sequence groups
12.5.1.1 Design hint: Should I care about the order of elements?
12.5.2 choice groups
12.5.3 Nesting of sequence and choice groups
12.5.4 all groups
12.5.5 Named model group references
12.5.6 Deterministic content models
12.6 Using attribute declarations
12.6.1 Local attribute declarations
12.6.2 Attribute references
12.6.3 Attribute group references
12.6.4 Default attributes
12.7 Using wildcards
12.7.1 Element wildcards
12.7.1.1 Controlling the namespace of replacement elements
12.7.1.2 Controlling the strictness of validation
12.7.1.3 Negative wildcards
12.7.2 Open content models
12.7.2.1 Open content in a complex type
12.7.2.2 Default open content
12.7.3 Attribute wildcards

Chapter 13 Deriving complex types
13.1 Why derive types?
13.2 Restriction and extension
13.3 Simple content and complex content
13.3.1 simpleContent elements
13.3.2 complexContent elements
13.4 Complex type extensions
13.4.1 Simple content extensions
13.4.2 Complex content extensions
13.4.2.1 Extending choice groups
13.4.2.2 Extending all groups
13.4.2.3 Extending open content
13.4.3 Mixed content extensions
13.4.4 Empty content extensions
13.4.5 Attribute extensions
13.4.6 Attribute wildcard extensions
13.5 Complex type restrictions
13.5.1 Simple content restrictions
13.5.2 Complex content restrictions
13.5.2.1 Eliminating meaningless groups
13.5.2.2 Restricting element declarations
13.5.2.3 Restricting wildcards
13.5.2.4 Restricting groups
13.5.2.5 Restricting open content
13.5.3 Mixed content restrictions
13.5.4 Empty content restrictions
13.5.5 Attribute restrictions
13.5.6 Attribute wildcard restrictions
13.5.7 Restricting types from another namespace
13.5.7.1 Using targetNamespace on element and attribute declarations
13.6 Type substitution
13.7 Controlling type derivation and substitution
13.7.1 final: Preventing complex type derivation
13.7.2 block: Blocking substitution of derived types
13.7.3 Blocking type substitution in element declarations
13.7.4 abstract: Forcing derivation

Chapter 14 Assertions
14.1 Assertions
14.1.1 Assertions for simple types
14.1.1.1 Using XPath 2.0 operators
14.1.1.2 Using XPath 2.0 functions
14.1.1.3 Types and assertions
14.1.1.4 Inheriting simple type assertions
14.1.1.5 Assertions on list types
14.1.2 Assertions for complex types
14.1.2.1 Path expressions
14.1.2.2 Conditional expressions
14.1.2.3 Assertions in derived complex types
14.1.3 Assertions and namespaces
14.1.3.1 Using xpathDefaultNamespace
14.2 Conditional type assignment
14.2.1 The alternative element
14.2.2 Specifying conditional type assignment
14.2.3 Using XPath in the test attribute
14.2.4 The error type
14.2.5 Conditional type assignment and namespaces
14.2.6 Using inherited attributes in conditional type assignment

Chapter 15 Named groups
15.1 Why named groups?
15.2 Named model groups
15.2.1 Defining named model groups
15.2.2 Referencing named model groups
15.2.2.1 Group references
15.2.2.2 Referencing a named model group in a complex type
15.2.2.3 Using all in named model groups
15.2.2.4 Named model groups referencing named model groups
15.3 Attribute groups
15.3.1 Defining attribute groups
15.3.2 Referencing attribute groups
15.3.2.1 Attribute group references
15.3.2.2 Referencing attribute groups in complex types
15.3.2.3 Duplicate attribute names
15.3.2.4 Duplicate attribute wildcard handling
15.3.2.5 Attribute groups referencing attribute groups
15.3.3 The default attribute group
15.4 Named groups and namespaces
15.5 Design hint: Named groups or complex type derivations?

Chapter 16 Substitution groups
16.1 Why substitution groups?
16.2 The substitution group hierarchy
16.3 Declaring a substitution group
16.4 Type constraints for substitution groups
16.5 Members in multiple groups
16.6 Alternatives to substitution groups
16.6.1 Reusable choice groups
16.6.2 Substituting a derived type in the instance
16.7 Controlling substitution groups
16.7.1 final: Preventing substitution group declarations
16.7.2 block: Blocking substitution in instances
16.7.3 abstract: Forcing substitution

Chapter 17 Identity constraints
17.1 Identity constraint categories
17.2 Design hint: Should I use ID/IDREF or key/keyref?
17.3 Structure of an identity constraint
17.4 Uniqueness constraints
17.5 Key constraints
17.6 Key references
17.6.1 Key references and scope
17.6.2 Key references and type equality
17.7 Selectors and fields
17.7.1 Selectors
17.7.2 Fields
17.8 XPath subset for identity constraints
17.9 Identity constraints and namespaces
17.9.1 Using xpathDefaultNamespace
17.10 Referencing identity constraints

Chapter 18 Redefining and overriding schema components
18.1 Redefinition
18.1.1 Redefinition basics
18.1.1.1 Include plus redefine
18.1.1.2 Redefine and namespaces
18.1.1.3 Pervasive impact
18.1.2 The mechanics of redefinition
18.1.3 Redefining simple types
18.1.4 Redefining complex types
18.1.5 Redefining named model groups
18.1.5.1 Defining a subset
18.1.5.2 Defining a superset
18.1.6 Redefining attribute groups
18.1.6.1 Defining a subset
18.1.6.2 Defining a superset
18.2 Overrides
18.2.1 Override basics
18.2.1.1 Include plus override
18.2.1.2 Override and namespaces
18.2.1.3 Pervasive impact
18.2.2 The mechanics of overriding components
18.2.3 Overriding simple types
18.2.4 Overriding complex types
18.2.5 Overriding element and attribute declarations
18.2.6 Overriding named groups
18.3 Risks of redefines and overrides
18.3.1 Risks of redefining or overriding types
18.3.2 Risks of redefining or overriding named groups

Chapter 19 Topics for DTD users
19.1 Element declarations
19.1.1 Simple types
19.1.2 Complex types with simple content
19.1.3 Complex types with complex content
19.1.4 Mixed content
19.1.5 Empty content
19.1.6 Any content
19.2 Attribute declarations
19.2.1 Attribute types
19.2.2 Enumerated attribute types
19.2.3 Notation attributes
19.2.4 Default values
19.3 Parameter entities for reuse
19.3.1 Reusing content models
19.3.2 Reusing attributes
19.4 Parameter entities for extensibility
19.4.1 Extensions for sequence groups
19.4.2 Extensions for choice groups
19.4.3 Attribute extensions
19.5 External parameter entities
19.6 General entities
19.6.1 Character and other parsed entities
19.6.2 Unparsed entities
19.7 Notations
19.7.1 Declaring a notation
19.7.2 Declaring a notation attribute
19.7.3 Notations and unparsed entities
19.8 Comments
19.9 Using DTDs and schemas together

Chapter 20 XML information modeling
20.1 Data modeling paradigms
20.2 Relational models
20.2.1 Entities and attributes
20.2.2 Relationships
20.2.2.1 One-to-one and one-to-many relationships
20.2.2.2 Many-to-many relationships
20.2.2.2.1 Approach #1: Use containment with repetition
20.2.2.2.2 Approach #2: Use containment with references
20.2.2.2.3 Approach #3: Use relationship elements
20.3 Modeling object-oriented concepts
20.3.1 Inheritance
20.3.2 Composition
20.4 Modeling web services
20.5 Considerations for narrative content
20.5.1 Semantics vs. style
20.5.1.1 Benefits of excluding styling
20.5.1.2 Rendition elements: “block” and “inline”
20.5.2 Considerations for schema design
20.5.2.1 Flexibility
20.5.2.2 Reusing existing vocabularies
20.5.2.3 Attributes are for metadata
20.5.2.4 Humans write the documents
20.6 Considerations for a hierarchical model
20.6.1 Intermediate elements
20.6.2 Wrapper lists
20.6.3 Level of granularity
20.6.4 Generic vs. specific elements

Chapter 21 Schema design and documentation
21.1 The importance of schema design
21.2 Uses for schemas
21.3 Schema design goals
21.3.1 Flexibility and extensibility
21.3.2 Reusability
21.3.3 Clarity and simplicity
21.3.3.1 Naming and documentation
21.3.3.2 Clarity of structure
21.3.3.3 Simplicity
21.3.4 Support for graceful versioning
21.3.5 Interoperability and tool compatibility
21.4 Developing a schema design strategy
21.5 Schema organization considerations
21.5.1 Global vs. local components
21.5.1.1 Russian Doll
21.5.1.2 Salami Slice
21.5.1.3 Venetian Blind
21.5.1.4 Garden of Eden
21.5.2 Modularizing schema documents
21.6 Naming considerations
21.6.1 Rules for valid XML names
21.6.2 Separators
21.6.3 Name length
21.6.4 Standard terms and abbreviations
21.6.5 Use of object terms
21.7 Namespace considerations
21.7.1 Whether to use namespaces
21.7.2 Organizing namespaces
21.7.2.1 Same namespace
21.7.2.2 Different namespaces
21.7.2.3 Chameleon namespaces
21.7.3 Qualified vs. unqualified forms
21.7.3.1 Qualified local names
21.7.3.2 Unqualified local names
21.7.3.3 Using form in schemas
21.7.3.4 Form and global element declarations
21.7.3.5 Default namespaces and unqualified names
21.7.3.6 Qualified vs. unqualified element names
21.7.3.7 Qualified vs. unqualified attribute names
21.8 Schema documentation
21.8.1 Annotations
21.8.2 User documentation
21.8.2.1 Documentation syntax
21.8.2.2 Data element definitions
21.8.2.3 Code documentation
21.8.2.4 Section comments
21.8.3 Application information
21.8.4 Non-native attributes
21.8.4.1 Design hint: Should I use annotations or non-native attributes?
21.8.5 Documenting namespaces

Chapter 22 Extensibility and reuse
22.1 Reuse
22.1.1 Reusing schema components
22.1.2 Creating schemas that are highly reusable
22.1.3 Developing a common components library
22.2 Extending schemas
22.2.1 Wildcards
22.2.2 Open content
22.2.3 Type substitution
22.2.4 Substitution groups
22.2.5 Type redefinition
22.2.6 Named group redefinition
22.2.7 Overrides

Chapter 23 Versioning
23.1 Schema compatibility
23.1.1 Backward compatibility
23.1.2 Forward compatibility
23.2 Using version numbers
23.2.1 Major and minor versions
23.2.2 Placement of version numbers
23.2.2.1 Version numbers in schema documents
23.2.2.2 Versions in schema locations
23.2.2.3 Versions in instances
23.2.2.4 Versions in namespace names
23.2.2.5 A combination strategy
23.3 Application compatibility
23.4 Lessening the impact of versioning
23.4.1 Define a versioning strategy
23.4.2 Make only necessary changes
23.4.3 Document all changes
23.4.4 Deprecate components before deleting them
23.4.5 Provide a conversion capability
23.5 Versions of the XML Schema language
23.5.1 New features in version 1.1
23.5.2 Forward compatibility of XML Schema 1.1
23.5.3 Portability of implementation-defined types and facets
23.5.3.1 Using typeAvailable and typeUnavailable
23.5.3.2 Using facetAvailable and facetUnavailable

Appendix A XSD keywords
A.1 Elements
A.2 Attributes

Appendix B Built-in simple types
B.1 Built-in simple types
B.2 Applicability of facets to built-in simple types

Index

Preface

Chapter 1: Schemas: An Introduction

This chapter provides a brief explanation of schemas and why they are important. It also discusses the basic schema design goals, and describes the various existing schema languages.

1.1 What is an XML schema?

The word schema means a diagram, plan, or framework. In XML, it refers to a document that describes an XML document. Suppose you have the XML instance shown in Example 1-1. It consists of a product element that has two children (number and size) and an attribute (effDate).

Example 1-2 shows a schema that describes the instance. It contains element and attribute declarations that assign data types and element-type names to elements and attributes.

Example 1-1. Product instance

<product effDate="2001-04-02">
  <number>557</number>
  <size>10</size>
</product>

Example 1-2. Product schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product" type="ProductType"/>
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <xsd:element name="size" type="SizeType"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:simpleType name="SizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

1.2 The purpose of schemas

1.2.1 Data validation

One of the most common uses for schemas is to verify that an XML document is valid according to a defined set of rules. A schema can be used to validate:

The structure of elements and attributes. For example, a product must have a number and a size, and may optionally have an effDate (effective date).
The order of elements. For example, number must appear before size.
The data values of attributes and elements, based on ranges, enumerations, and pattern matching. For example, size must be an integer between 2 and 18, and effDate must be a valid date.
The uniqueness of values in an instance. For example, all product numbers in an instance must be unique.
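
The uniqueness rule in the last item could be expressed with an identity constraint. The following is a minimal sketch, not taken from the book's examples: it assumes a hypothetical catalog wrapper element (Example 1-2 declares only a single product) and a hypothetical constraint name, uniqueProductNumber.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical wrapper element so that one instance can hold several products -->
  <xsd:element name="catalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="product" type="ProductType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Within one catalog, no two product children may share a number value -->
    <xsd:unique name="uniqueProductNumber">
      <xsd:selector xpath="product"/>
      <xsd:field xpath="number"/>
    </xsd:unique>
  </xsd:element>

  <!-- ProductType and SizeType repeated from Example 1-2 -->
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <xsd:element name="size" type="SizeType"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:simpleType name="SizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

With this sketch, a catalog containing two product elements whose number is 557 would be reported as invalid, while the structure, order, and value checks of Example 1-2 continue to apply to each product.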

1.2.2 A contract with trading partners

Often, XML instances are passed between organizations. A schema may act as a contract with your trading partners. It clearly lays out the rules for document structure and what is required. Since an instance can be validated against a schema, the "contract" can be enforced using available tools.

1.2.3 System documentation

Schemas can provide documentation about the data in an XML instance. Anyone who needs to understand the data can refer to the schema for information about names, structures, and data types of the items. To include further documentation, you can add annotations to any schema component.
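
As an illustration of such an annotation, the sketch below attaches human-readable documentation to an element declaration. The wording of the documentation text is an assumption made for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="size" type="xsd:integer">
    <xsd:annotation>
      <!-- Free-text description intended for human readers of the schema -->
      <xsd:documentation xml:lang="en">
        The garment size of the product, an integer between 2 and 18.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```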

1.2.4 Augmentation of data

Schema processing can also add to the instance. It inserts default and fixed values for elements and attributes, and normalizes whitespace according to the data type.
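
A minimal sketch of how default and fixed values might be declared follows. The sizeSystem and currency attributes and the default of 10 for size are hypothetical and do not appear in Example 1-2.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product" type="ProductType"/>
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <!-- Hypothetical default: an empty size element is treated as if it contained 10 -->
      <xsd:element name="size" type="xsd:integer" default="10"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
    <!-- Hypothetical: if sizeSystem is absent, the processor supplies the value US -->
    <xsd:attribute name="sizeSystem" type="xsd:token" default="US"/>
    <!-- Hypothetical: if present, currency must be USD; if absent, USD is supplied -->
    <xsd:attribute name="currency" type="xsd:token" fixed="USD"/>
  </xsd:complexType>
</xsd:schema>
```

Note the difference in this sketch: an attribute default is applied when the attribute is missing, whereas an element default is applied only when the element is present but empty.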

1.2.5 Application information

Schemas provide a way for additional information about the data to be supplied to the application when processing a particular type of document. For example, you could include information on how to map the product element instances to a database table, and have the application use this information to automatically update a particular table with the data.

In addition to being available at processing time, this information in schemas can be used to generate code such as:

User interfaces for editing the information. For example, if you know that size is between 2 and 18, you can generate an interface that has a slider bar with these values as the limits.
Stylesheets to transform the instance data into a reader-friendly representation such as XHTML. For example, if you know that the human-readable name for the content of a number element is "Product Number" you can use this as a column header.
Code to insert or extract the data from a database. For example, if you know that the product number maps to the PROD_NUM column on the PRODUCTS table, you can generate an efficient routine to insert it into that column.
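
One way such application information could be carried in a schema is through an appinfo annotation. The sketch below is only illustrative: the db prefix, its namespace http://example.org/db-mapping, and the db:column element are hypothetical and would have to be agreed upon with whatever application reads them.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:db="http://example.org/db-mapping">
  <xsd:element name="number" type="xsd:integer">
    <xsd:annotation>
      <!-- Human-readable label, usable for example as a column header -->
      <xsd:documentation>Product Number</xsd:documentation>
      <!-- Machine-readable mapping in a hypothetical vocabulary -->
      <xsd:appinfo>
        <db:column table="PRODUCTS" name="PROD_NUM"/>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```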

Tools have only just begun to take advantage of the possibilities of schemas. In the coming years, we will see schemas used in many creative new ways.

1.3 Schema design

XML Schema is packed with features, and there are often several ways to accurately describe the same thing. The decisions made during schema design can affect its usability, accuracy, and applicability. Therefore, it is important to keep in mind your design objectives when creating a schema. These objectives may vary depending on how you are using XML, but some are common to all use cases.

1.3.1 Accuracy and precision

Obviously, a schema should accurately describe an XML instance and allow it to be validated. Schemas should also be precise in describing data. Precision can result in more complete validation as well as better documentation. Precision can be achieved by defining restrictive data types that truly represent valid values.
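
To illustrate the difference between an accurate and a precise description, the sketch below replaces the xsd:integer type of number with a more restrictive type. The assumption that product numbers are positive and have at most three digits is invented for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Accurate but imprecise: type="xsd:integer" would also accept -5 or 123456789 -->
  <xsd:element name="number" type="ProductNumberType"/>

  <!-- More precise, under the assumed rule: a positive integer of at most three digits -->
  <xsd:simpleType name="ProductNumberType">
    <xsd:restriction base="xsd:positiveInteger">
      <xsd:totalDigits value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```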

1.3.2 Clarity

Schemas should be very clear, allowing a reader to instantly understand the structure and characteristics of the instance being described. Clarity can be achieved by:

appropriate choice of names,
consistency in naming,
consistency in structure,
good documentation,
avoiding unnecessary complexity.

1.3.3 Broad applicability

There is a temptation to create schemas that are useful only for a specific application purpose. In some cases, this may be appropriate. However, it is better to create a schema that has broader applicability. For example, a business unit that handles only domestic accounts may not use a country element declaration as part of an address. They should consider adding it in as an optional element for the purposes of consistency and future usability.
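
The optional country element might be declared as in the following sketch. The AddressType name and the street, city, and postalCode elements are hypothetical; only the country element comes from the discussion above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string"/>
      <!-- Optional, so existing domestic-only instances remain valid -->
      <xsd:element name="country" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```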

There are two components to a schema's broad applicability: reusability and extensibility. Reusable schema components are modular and well documented, encouraging schema authors to reuse them in other schemas. Extensible components are flexible and open, allowing other schema authors to build on them for future uses. Since reusability and extensibility are important, all of Chapter 22, "Extensibility and reuse," is devoted to them.

1.4 Schema languages

1.4.1 Document Type Definitions (DTDs)

Document Type Definitions (DTDs) are a commonly used method of describing XML documents. They allow you to define the basic structure of an XML instance, including:

the structure and order of child elements in an element type,
the attributes of an element type,
basic data typing for attributes,
default and fixed values for attributes,
notations to represent other data formats.

Example 1-3 shows a DTD that is roughly equivalent to our schema in Example 1-2.

Example 1-3. Product DTD

<!ELEMENT product (number, size?)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ATTLIST product effDate CDATA #IMPLIED>

DTDs have many advantages. They are relatively simple, have a compact syntax, and are widely understood by XML implementers. When designed well, they can be extremely modular, flexible, and extensible.

However, DTDs also have some shortcomings. They have their own non-XML syntax, do not support namespaces easily, and provide very limited data typing, for attributes only.

1.4.2 Enter schemas

As XML became increasingly popular for data applications such as e-commerce and enterprise application integration (EAI), a more robust schema language was needed. Specifically, XML developers wanted:

The ability to constrain data based on common data types such as integer and date.
The ability to define their own data types in order to further constrain data.
Support for namespaces.
The ability to specify multiple declarations for the same element-type name in different contexts.
Object-oriented features such as type derivation. The ability to express types as extensions or restrictions of other types allows them to be processed similarly and substituted for each other.
A schema language that uses XML syntax. This is advantageous because it is extensible, can represent more advanced models and can be processed by many available tools.
The ability to add structured documentation and application information that is passed to the application during processing.
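
As an illustration of the type derivation mentioned in the list above, the sketch below extends ProductType from Example 1-2. The ShirtType name and its color element are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product" type="ProductType"/>

  <!-- Base type and SizeType, repeated from Example 1-2 -->
  <xsd:complexType name="ProductType">
    <xsd:sequence>
      <xsd:element name="number" type="xsd:integer"/>
      <xsd:element name="size" type="SizeType"/>
    </xsd:sequence>
    <xsd:attribute name="effDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:simpleType name="SizeType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Hypothetical derived type: everything in ProductType, followed by a color -->
  <xsd:complexType name="ShirtType">
    <xsd:complexContent>
      <xsd:extension base="ProductType">
        <xsd:sequence>
          <xsd:element name="color" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```

Because ShirtType is derived from ProductType, a product element in an instance could carry xsi:type="ShirtType" (with the xsi prefix bound to http://www.w3.org/2001/XMLSchema-instance) and then include the additional color child.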

DTDs are not likely to disappear now that schemas have arrived on the scene. They are supported in many tools, are widely understood, and are currently in use in many applications. In addition, they continue to be useful as a lightweight alternative to schemas.

1.4.3 W3C XML Schema

Four schema languages were developed before work began on XML Schema: XDR (XML Data Reduced), DCD, SOX, and DDML. These four languages were considered together as a starting point for XML Schema, and many of their originators were involved in the creation of XML Schema.

The World Wide Web Consortium (W3C) began work on XML Schema in 1998. The first version, upon which this book is based, became an official Recommendation on May 2, 2001. The formal Recommendation is in three parts:

XML Schema Part 0: Primer is a non-normative introduction to XML Schema that provides a lot of examples and explanations. It can be found at http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: Structures describes most of the components of XML Schema. It can be found at http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: Datatypes covers simple data types. It explains the built-in data types and the facets that may be used to restrict them. It is a separate document so that other specifications may use it, without including all of XML Schema. It can be found at http://www.w3.org/TR/xmlschema-2/

1.4.4 Notes on terminology

1.4.4.1 Schema

"XML Schema" is the official name of the Recommendation and is also sometimes used to refer to conforming schema documents. In order to clearly distinguish between the two, this book uses the term "XML Schema" only to mean the Recommendation itself.

A "schema definition" is the formal expression of a schema.

The initialism "XSDL" (XML Schema Definition Language) is used to refer to the language that is used to create schema definitions in XML. In other words, XSDL is the markup language that uses elements such as schema and complexType.

The term "schema document" is used to refer to an XML document that is written in XSDL, with a schema element as its root. The extension "xsd" is used in the file identifiers of such documents. A schema definition may consist of one or more schema documents, as described in Chapter 4, "Schema composition."

As it is unlikely to cause confusion in this book, for simplicity the word "schema" will be used to refer to both a schema as a concept, and an actual schema definition that conforms to the XML Schema definition language.

1.4.4.2 Type

According to the XML Recommendation, every XML element has an element type. In fact, it is the name of the element type that occurs in the start- and end-tags, as individual elements do not have names (although they may have IDs).

XML Schema, however, uses the word "type" exclusively as a shorthand to refer to simple types and complex types. Perhaps to avoid confusion with this usage, the Recommendation does not use the phrase "element type" in conjunction with schemas. This book follows that same practice and generally doesn't speak of element types per se, although it does refer to "element-type names" where appropriate.

1.4.5 Additional schema languages

XML Schema is not the only schema language that is currently in use. While it is very robust, it is not always the most appropriate schema language for all cases. This section describes two other schema languages.

1.4.5.1 RELAX NG

RELAX NG covers some of the same ground as XML Schema. As of this writing, it is being developed by an OASIS technical committee. RELAX NG is intended only for validation; the processor does not pass documentation or application information from the schema to the application. RELAX NG does not have built-in data types; it is designed to use other data type libraries (such as that of XML Schema).

RELAX NG has some handy features that are not currently part of XML Schema:

It includes attributes in the elements' content models. For example, you can specify that a product element must have either an effectiveDate attribute or a startDate attribute. XML Schema does not currently provide a way to do this.
It allows a content model to depend on the value of an attribute. For example, if the value of the type attribute of a product element is shirt, this product element can contain a size child. If it is umbrella, it cannot. XML Schema provides a similar mechanism through type substitution, but it is less flexible.
It allows you to specify a content model such as "one number, one size, and up to three color elements, in any order." This is quite cumbersome to express in XML Schema if you do not want to enforce a particular order.
It does not require content models to be deterministic. This is explained in Section 13.5.6, "Deterministic content models."
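
The first and third features above could look roughly like the following RELAX NG pattern, written in its XML syntax. This is a sketch under the assumption that all values are plain text; no data type library is used.

```xml
<element name="product" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Exactly one of the two attributes must be present -->
  <choice>
    <attribute name="effectiveDate"/>
    <attribute name="startDate"/>
  </choice>
  <!-- One number, one size, and up to three color elements, in any order -->
  <interleave>
    <element name="number"><text/></element>
    <element name="size"><text/></element>
    <optional>
      <element name="color"><text/></element>
      <optional>
        <element name="color"><text/></element>
        <optional>
          <element name="color"><text/></element>
        </optional>
      </optional>
    </optional>
  </interleave>
</element>
```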

However, RELAX NG also has some limitations compared to XML Schema:

It has no inheritance capabilities. XML Schema's restriction and extension mechanisms allow type substitution and many other benefits, described in Section 14.1, "Why derive types?"
Because it is only intended for validation, it does not provide application information to the processor. In fact, the RELAX NG processor passes the exact same information that is available from a DTD to the application. This is not a disadvantage if your only objective is validation, but it does not allow you to use the schema to help you understand how to process the instance.

For more information on RELAX NG, see http://www.oasis-open.org/committees/relax-ng/

1.4.5.2 Schematron

Schematron takes a different approach from XML Schema and RELAX NG. XML Schema and RELAX NG are both grammar-based schema languages. They specify what must appear in an instance, and in what order.

By contrast, Schematron is rule-based. It allows you to define a series of rules to which the document must conform. These rules are expressed using XPath. In contrast to grammar-based languages, Schematron considers anything that does not violate a rule to be valid. There is no need to declare every element type or attribute that may appear in the instance.

Like RELAX NG, Schematron is intended only for validation of instances. It has a number of advantages:

It is very easy to learn and use. It uses XPath, which is familiar to many people already using XML.
The use of XPath allows it to very flexibly and succinctly express relationships between elements in a way that is not possible with other schema languages.
The values in an instance can be involved in validation. For example, in XSDL it is not possible to express "If the value of newCustomer is false, then customerID must appear." Schematron allows such co-occurrence constraints.
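
The co-occurrence constraint quoted above might be written as the Schematron rule sketched below. The order context element is hypothetical, and the namespace shown is that of the later ISO version of Schematron, which is an assumption; schemas written for the earlier version use a different namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- "order" is a hypothetical parent of newCustomer and customerID -->
    <sch:rule context="order">
      <!-- Passes unless newCustomer is false and customerID is missing -->
      <sch:assert test="not(newCustomer = 'false') or customerID">
        If the value of newCustomer is false, then customerID must appear.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Any element or attribute that this rule does not mention is left unconstrained, which reflects the rule-based approach described above.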

The limitations of Schematron compared to XML Schema are:

It does not provide a model of the instance data. A person cannot gain an understanding of what instance data is expected by looking at the schema.
It is intended only for validation, and it cannot be used to pass any information about the instance, such as data types or default values, to an application.
Anything is valid unless it is specifically prohibited. This puts the burden of anticipating all possible errors on the schema author.

Because Schematron and XML Schema complement each other, it makes sense to combine the two. An example of embedding a Schematron schema in XSDL is provided in Section 6.3.2, "Schematron for co-occurrence constraints." For more information on Schematron, see http://www.ascc.net/xml/resource/schematron/schematron.html
