Professional XML Schemas

by Jon Duckett, Kevin Williams, Kurt Cagle, Stephen Mohr, Francis Norton
 
This book is for all professional XML programmers who need to use XML Schemas to define data and need a practical guide to this new standard.

Product Details

ISBN-13: 9781861005472
Publisher: Wrox Press, Inc.
Publication date: 07/01/2001
Series: Professional Ser.
Pages: 800
Product dimensions: 7.28(w) x 9.04(h) x 1.56(d)

Read an Excerpt

Chapter 2: Datatype Basics

The basic benefit of XML - the ability to describe one's own vocabulary - is greatly enhanced by the use of XML Schema datatypes. They can ensure, for example, that numeric data is really numeric and that strings follow a specific format, or otherwise validate the format and/or value of an element or attribute. Pre-defined XML Schema datatypes make provisions for various forms of commonly used values such as dates, times, and URI references, as well as providing the basis for more complex and user-defined data structures.

Strong data typing and the ability to create modern object-oriented (OO) structures are imperative for most of the newer uses of XML (such as SOAP or ebXML). These new applications can now use most of the datatypes used in traditional programming languages, plus the conceptual and maintenance benefits of OO inheritance of datatype structures.

The use of strong data typing has advantages beyond the description and validation of documents and web pages. Once web sites serve pages in XML, rather than HTML, web spiders will be able to extract much more meaningful information from these sites. For example:

  • Numeric datatypes allow price comparison services that can calculate currency conversions, taxes, and/or multi-item costs.

  • Users searching for date-sensitive items (like newspaper articles or a specific event) can use standardized dates, and search for specific dates or ranges of dates.

  • Type-specific searching can also apply to other specific datatypes such as URIs and user-derived datatypes such as ISBNs, UPCs, and part numbers.

Existing free-text searches can't differentiate the May Company, May Day, the merry month of May, a place called May, or a person's name. Nor can these searches ignore the many appearances of the permissive verb "may", which is rarely the target of a search and is often included in the "stop words" list (terms ignored when searching). The use of XML Schema datatypes will permit much more focused searching, reducing the huge lists of online search engine results. Type-specific searching is an awesome benefit of XML Schema's strong data typing.
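The contrast between free-text and type-specific searching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ARTICLES` records and the helper names `published_between` and `free_text_matches` are hypothetical, and the dates are assumed to be stored as ISO yyyy-mm-dd strings.

```python
from datetime import date

# Hypothetical records: (title, publication date as an ISO yyyy-mm-dd string).
ARTICLES = [
    ("May Company opens new store", "2001-03-14"),
    ("Voters may decide next week", "2001-05-02"),
    ("May Day parade draws crowds", "2001-05-01"),
    ("Summer festival announced", "2001-06-20"),
]


def published_between(articles, start, end):
    """Return titles whose typed date falls in the inclusive range [start, end]."""
    return [title for title, iso in articles
            if start <= date.fromisoformat(iso) <= end]


def free_text_matches(articles, word):
    """Return titles containing the word, ignoring case (a naive free-text search)."""
    return [title for title, _ in articles if word.lower() in title.lower()]


# Typed search: only items actually dated in May 2001.
in_may = published_between(ARTICLES, date(2001, 5, 1), date(2001, 5, 31))

# Free-text search: every title that happens to contain "may".
mentions_may = free_text_matches(ARTICLES, "may")
```

The typed search selects by the date value, so the March article about the May Company is excluded, while the free-text search returns it along with every other title containing the word.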

First, we will look at the basic principles of schema datatypes, and then we will look at the two dozen or so built-in datatypes provided as part of XML Schema.

Datatypes in XML - An Overview

XML 1.0 and its DTDs provided a few simple datatypes, but none were numeric types, and the validation mechanisms were quite limited. There have been proposals to add some additional type checking to DTDs (such as DT4DTD), but these are beyond the scope of this book. Early schema proposals such as SOX and XMLData provided various sets of pre-defined types, which informed the development of the W3C Schema Recommendation. The lack of strong data typing was one of the principal reasons for the development of XML Schema. Indeed, datatypes are so significant that they comprise half of the XML Schema specification, and they may be used independently from the rest of the XML Schema specification.

XML Schema datatypes are defined in XML Schema Part 2: Datatypes, which became a W3C Recommendation in May 2001. It is available at http://www.w3.org/TR/xmlschema-2.

These datatypes are based upon those in XML 1.0 DTDs, Java, SQL, the ISO 11404 standard on language-independent datatypes, existing Internet standards, and earlier schema proposals.

It would be useful to have a link to an online version of the ISO 11404 standard, but like most ISO documents, it is only available as expensive paper. You can find ordering information for this at http://www.iso.ch/cate/d19346.html.

In the last chapter, we saw how we could use the XML Schema built-in datatypes, such as string and integer, in our element declarations, for example:

<element name = "FirstName" type = "string" />

We also saw how we could create our own types, rather than using those from XML Schema, with the complexType element, like this:

<element name = "Customer">
	<complexType>
		<sequence>
			<element name = "FirstName" type = "string" />
			<element name = "MiddleInitial" type = "string" />
			<element name = "LastName" type = "string" />
		</sequence>
	</complexType>
</element>

Complex types and simple types are defined in Part 1 of the XML Schema specification (http://www.w3.org/TR/xmlschema-1). These concepts are about defining structures in your schemas. Here is a quick reminder of the difference between simple and complex types:

  • simple types - a simple string that doesn't contain any child elements, but might be constrained to be numeric or otherwise specially formatted (attribute values are always simple types)

  • complex types - element values that contain other elements or have attributes, and can be constrained in a similar fashion to simple types

The second part of the specification independently defines the set of built-in datatypes. These are all simple types. In the next chapter we'll move on to see how we create our own complex content models using complexType and other schema constructs, but for this chapter, we'll be focusing on the set of simple datatypes provided for us by XML Schema.

Before we get stuck into the details of the different datatypes, let's spend a bit of time reviewing the basic ideas behind XML Schema datatypes in general.
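As a rough sketch of the simple/complex distinction recalled in this section, the following Python snippet classifies the elements of a small instance document. It is an illustration only: in XML Schema the type is declared in the schema rather than inferred from an instance, and the sample data, the `id` attribute, and the helper name `content_kind` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the Customer element; the id attribute is an
# assumption added to show that attributes also make an element complex.
DOC = """
<Customer id="c1">
  <FirstName>Ada</FirstName>
  <MiddleInitial>K</MiddleInitial>
  <LastName>Lovelace</LastName>
</Customer>
"""


def content_kind(elem):
    """Return 'complex' if the element has child elements or attributes, else 'simple'."""
    return "complex" if len(elem) > 0 or elem.attrib else "simple"


root = ET.fromstring(DOC)

# Map each element name to its classification.
kinds = {elem.tag: content_kind(elem) for elem in root.iter()}
```

Here Customer comes out as complex because it contains child elements (and an attribute), while FirstName, MiddleInitial, and LastName hold only text and so are simple.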

    Properties of XML Schema Datatypes

    All datatypes are composed of three parts:
    • A value space - the set of distinct and valid values, each corresponding to one or more string representations (for example, the number 42 is a single value)

    • A lexical space - the set of lexical representations, that is, the string literals representing values (for example, any of the strings "42" or "forty-two" or "0.42E2" or even "0.42 × 10²" could represent the value of 42)

    • A set of facets - the properties of the value space, individual values, and/or lexical items

    To illustrate the difference between lexical and value spaces, we'll look at a snippet of XML data where the first child element (Name) is declared to be a string datatype, the second (Population) uses the decimal datatype, and the third is a date datatype (DateAdmission).

    	
    <State>
        <Name>Wyoming</Name>
        <Population>469557</Population>
        <DateAdmission>1890-07-10</DateAdmission>
    </State>
    

    In the Name element, the value and lexical spaces are identical - the value of a string is the same as its lexical representation.

    On the other hand, the Population element is represented in XML as a string, but its value is the mathematical concept of "four hundred and sixty nine thousand, five hundred and fifty seven". The string 469557 in the above example is just one possible lexical representation. We could also have used 469557.0 or +469557 to represent the same value. (A form with an exponent, such as 4695.57e2, denotes the same number but is a legal lexical form only for the float and double datatypes, not for decimal.)

    The DateAdmission element is also represented as a string, like all elements in XML. This one conforms to an international (ISO) standard, and represents a value of July 10th, 1890. ISO dates are similar to the common data processing or Japanese format preference (yyyy-mm-dd). We will look at this and other built-in derived datatypes in the next chapter.

    All comparisons, calculations, ordering, and the like are generally applied to the value of the datatype. There may be several alternative lexical representations for a given value.
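As a rough analogy in Python (not XML Schema itself), the sketch below maps several lexical forms onto values using the standard library's Decimal and date types. The helper names `decimal_value` and `date_value` are hypothetical, and Python's parsers are not identical to the lexical rules of the XML Schema datatypes, so treat this as an illustration of the idea rather than a validator.

```python
from datetime import date
from decimal import Decimal


def decimal_value(lexical: str) -> Decimal:
    """Map a lexical form (a string) to the decimal value it represents."""
    return Decimal(lexical.strip())


def date_value(lexical: str) -> date:
    """Map an ISO yyyy-mm-dd string to the date value it represents."""
    return date.fromisoformat(lexical.strip())


# Three different strings in the lexical space, one value in the value space.
population_forms = ["469557", "469557.0", "+469557.00"]
population_values = {decimal_value(s) for s in population_forms}

# Comparison and ordering are applied to the value, not to the string.
admitted = date_value("1890-07-10")
is_before_1900 = admitted < date(1900, 1, 1)
```

The three population strings differ as text but collapse to a single element in the set of values, which is why comparisons are defined on the value space rather than on the lexical forms.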

    Value Spaces

    Each datatype has a range of possible values. These value spaces are implicit for many datatypes. For example, a floating-point number can range from negative to positive infinity. A string can contain any finite-length sequence of legal XML characters. An integer allows a value of zero, or any positive or negative whole number, but wouldn't allow fractional values....

Meet the Author

    Jon Duckett has been working with XML since editing and co-authoring Wrox's first XML title in 1998. Having worked for Wrox's Birmingham UK offices for over 3 years, Jon recently moved to Sydney to get a different view from his window.

    Nikola Ozu is a systems and information architect. Recent work has included the use of XML for both production and publishing of text and bibliographic databases, an architectural vocabulary and a new production and delivery system for hypermedia. He designed and developed an early hypertext database, a monthly CD-ROM product called Health Reference Center in 1990, followed by advanced versions of the similar InfoTrac.

    Kevin Williams' career has been focused on Windows development - first client-server, then on to Internet work. He's done a little bit of everything, from VB to Powerbuilder to Delphi to C/C++ to MASM to ISAPI, CGI, ASP, HTML, XML, and any other acronym you might care to name, but these days, he's focusing on XML work. Kevin is a Senior System Architect for Equient, an information management company located in Northern Virginia. He may be reached for comment at kevin@realworldxml.com.

    Stephen Mohr is a software systems architect with Omicron Consulting, Philadelphia, USA. He has more than ten years' experience working with a variety of platforms and component technologies. His research interests include distributed computing and artificial intelligence. Stephen holds BS and MS degrees in computer science from Rensselaer Polytechnic Institute.

    Kurt Cagle is president of Cagle Communications, a consulting company in Olympia, Washington, specializing in Internet, XML-based document management, and multimedia technologies. He has authored ten books and more than one hundred articles on topics relating to XML/XSLT and web services on Windows, Java and Linux platforms. Additional XML resources and information are also available from his website at http://www.kurtcagle.net.

    Oliver Griffin decided to combine his interest in technology and publishing by forming Griffin Brown Digital Publishing Ltd with Alex Brown. Based in Cambridge, England, the company has become a world leader in the application of XML to document management, particularly within the academic and STM (Scientific, Technical and Medical) sectors. Oliver is responsible for managing the company and leading the consulting team in a variety of work including DTD and schema development, transformation and workflow design. He also runs training courses in XML and XSLT.

    Francis Norton works at iE (http://www.ie.com) as a senior consultant where he has a special interest in the application of XML technologies to the many challenges of cross-platform applications.

    Ian Stokes-Rees is the Engineering Manager for DecisionSoft Ltd., an Oxford UK based XML company and creators of XML Script. Ian has been working with XSDL since the first working draft and has been involved in the modeling and production of schemas for various applications. He has also been heavily involved in the integration of XML into the business process of many DecisionSoft clients and as such has been working on X-Meta, an XML meta-data repository, which facilitates information modeling and integration of business rules with data definitions. Ian can be reached at ijstokes@ieee.org and is happy to hear from readers.

    Jeni Tennison is a freelance consultant in XML, XSLT and XML Schemas. She is a regular contributor to XSL-List, was an invited speaker on XSLT design patterns at XSLT UK '01, and is one of the people behind the EXSLT initiative.
