Professional Xml

by Keith Visco, Zoran Zaev, Mark Birbeck

This book is for the experienced developer, who already has some knowledge of XML, to learn how to build effective applications using this exciting but simple technology. Web site developers can learn techniques, using XSLT stylesheets and other technologies, to take their sites to the next level of sophistication. Other developers can learn where and how XML fits into their existing systems and how they can use it to solve their application integration problems.

Product Details

Professional Ser.

Read an Excerpt

Chapter 15: XML Data Binding


Application data, whether stored as a plain text file, in an RDBMS, or in a custom binary format, typically needs to be converted to native data formats before being manipulated by the application. Storing or representing data in XML is no exception.

With the growing use of XML in many applications today, application developers often need to access and manipulate the content of XML documents. There are standard ways for a programmer to access this content, such as the W3C DOM API and the de facto standard SAX API from David Megginson, but these APIs are used for lower-level XML manipulation. They deal with the structure of the XML document. For most applications that need to manipulate XML data, these APIs will be cumbersome, forcing us to deal with the structural components of the XML document to access the data, and they offer no way to depict the meaning of the data. We can write custom APIs or utilities to access the data, but this would be a laborious task. What is needed is a way to access the data without knowing how it is represented, in a form more natural for our programming language, and in a format that depicts its intended meaning.

One solution is data binding. In the next few sections we will explain what data binding is and how it can be used to simplify programming applications that need to interact with XML data.

Note: The Professional XML 2nd Edition code download comes with not only the full code for all the examples, but also Castor, the Jakarta Regex library, Xerces, and xslp, to save you some time with your setup for this chapter.

What is Data Binding?

Data binding is the process of mapping the components of a given data format, such as SQL tables or an XML schema, into a specific representation for a given programming language (such as objects) that depicts the intended meaning of the data format. Data binding allows programmers to work naturally in the native code of the programming language, while at the same time preserving the meaning of the original data. It allows us to model the logical structure of the data without being tied to the specific structure imposed by the format in which the data is stored.

Let's look at an XML example. In most cases the content of an XML document, though stored in XML as simply character data, represents a number of different data types such as strings, integers, real numbers, dates, and encoded binary data to name a few. These different data types are usually grouped together in some logical hierarchy to represent some special meaning for the domain in which the XML data is intended. Ideally, interacting with the XML content as a data model represented as objects, data structures, and primitive types native to the programming language we are using, and in a manner which closely reflects the meaning of that data, would make programming with XML more natural, much less tedious, and improve code readability and maintainability.

What does all this mean? It simply means that instead of dealing with such things as parse trees, event models, or record sets, we interact with objects, integers, floats, arrays, and other programming data types and structures. To summarize, data binding gives us a way to:

  • Represent data and structure in the natural format of the programming language we decide to program in.
  • Represent data in a way that depicts the intended meaning.
  • Be agnostic about how the data is actually stored.

XML Data Binding

Now that we have an understanding of what data binding is, let's continue with our look at how it works with XML. XML data binding simply refers to the mapping of structural XML components, such as elements and attributes, into a programmatic data model that preserves the logical hierarchy of the components, exposes the actual meaning of the data, and represents the components in the native format of the programming language. This chapter will focus on XML data binding specifically for the Java programming language. In Java, our data model would be represented as an Object Model.

An object model in Java is simply a set of classes and primitive types that are typically grouped into a logical hierarchy to model or represent real-world or conceptual objects. An object model could be as simple as consisting of only one class, or very complex consisting of hundreds of classes....

...Most XML-based applications written today do some form of data binding, perhaps without the programmer even being aware of it. Unless our application was designed specifically to handle generic XML instances, it's hard to imagine interacting with XML data without first needing to convert it to a more manageable format. Each time that we convert the value of an attribute to an integer, or create an object to represent an element structure, we are performing data binding.
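To make that last point concrete, here is a minimal sketch of the smallest possible act of data binding: reading an attribute through the DOM and turning it into an int. The element name item and the attribute name quantity are assumptions chosen for illustration, and the JAXP parser shipped with the JDK stands in for whichever parser an application actually uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeBindingSketch {

    /** Parses a document whose root is a (hypothetical) item element and binds its quantity attribute to an int. */
    static int readQuantity(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element item = doc.getDocumentElement();
            // The DOM only hands back character data; the conversion to int is our job.
            return Integer.parseInt(item.getAttribute("quantity").trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read quantity", e);
        }
    }

    public static void main(String[] args) {
        int quantity = readQuantity("<item quantity=\"3\"/>");
        // From here on the application works with a plain int, not with XML.
        System.out.println("Ordered twice as many: " + (quantity * 2));
    }
}
```

Even in this tiny case the structural code (parsing, locating the attribute) outweighs the one line that does the conversion, which is the imbalance a data-binding framework is meant to remove.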

Simple Data Binding Concepts

At this point we already have a good understanding of what data binding is, and hopefully you're convinced that data binding is more practical and more natural to use than generic structure based APIs for typical application development. Structure based APIs are very useful for what they were designed for, interacting with data in the generic format in which it is stored, but if our intent is to interact with the data in a form that closely models the meaning of the data, then data binding is clearly the better choice. This is true for XML data binding, RDBMS data binding, and almost any kind of data binding you can think of. It's always more natural to manipulate data in the native formats of the programming language.

While data binding may be the clear choice for interacting with data in many XML applications, it may not always be the best choice. If there is no need to interact with the data in a form that models the meaning of the data, or if we only want to grab small bits and pieces of the data, then data binding will probably be more trouble than it's worth.

How do we bind our XML instances to a Java object model? We have two options: we can write our own data binding utilities, or we can use a data-binding framework (the best solution in most cases). A data-binding framework is an application (or set of applications) that allows data binding solutions to be developed more easily. These frameworks typically come with a number of features, such as source code generation and automatic data binding. We'll see examples of using such features later in the chapter.

XML data binding consists of three primary concepts – the specification of XML to object model bindings, marshalling, and unmarshalling. Different binding frameworks will specify bindings differently and typically a data binding framework will have more than one way to specify such bindings.

When we convert an object model instance into an XML instance we call it marshalling, and when we go in the other direction, from an XML instance to an object model we call it unmarshalling. Many people often get these two terms mixed up. The best way to remember it is that we always look at it from the point of view of writing the program. Our desired format to interact with the data is when it's in the programming language's own natural format. So we start with our Java object model. When we want to store the data we need to marshal it into the proper format. When we want to access the data again we need to unmarshal it back into the object model....
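The two directions can be sketched by hand for a single data object. This is only an illustration of the terminology under stated assumptions, not how a framework such as Castor does it: the class names BoundItem and ItemBindingSketch and the XML layout (a quantity attribute and a name child element) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical data object holding one line of an order. */
class BoundItem {
    private String name;
    private int quantity;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getQuantity() { return quantity; }
    public void setQuantity(int quantity) { this.quantity = quantity; }
}

class ItemBindingSketch {

    /** Marshalling: object model instance to XML instance. */
    static String marshal(BoundItem item) {
        return "<item quantity=\"" + item.getQuantity() + "\">"
                + "<name>" + escape(item.getName()) + "</name></item>";
    }

    /** Unmarshalling: XML instance back to an object model instance. */
    static BoundItem unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            BoundItem item = new BoundItem();
            item.setQuantity(Integer.parseInt(root.getAttribute("quantity")));
            item.setName(root.getElementsByTagName("name").item(0).getTextContent());
            return item;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid item document", e);
        }
    }

    /** Escapes the characters that are not allowed in element content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        BoundItem item = new BoundItem();
        item.setName("Nuts & bolts");
        item.setQuantity(4);
        String xml = marshal(item);          // store the data
        BoundItem copy = unmarshal(xml);     // access it again
        System.out.println(xml);
        System.out.println(copy.getName() + " x " + copy.getQuantity());
    }
}
```

Note that the program always starts and ends with the Java object; XML only appears in between, which matches the point-of-view rule described above.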

Data Objects

In order to interact with our XML instances and manipulate the data we'd like to convert the XML instances into a more manageable model that is native to the programming language that we are using. This allows for a more natural way of working with the data and makes our code more readable and maintainable. In our case we'd like to convert the XML model into a Java object model for manipulation and then back into the XML representation when we are finished. In most cases, however, the objects in our model simply do nothing more than hold the data that was once stored as XML, and preserve the logical structure of the XML document. Since these objects have no complex programming or business logic, and just contain typed data, we call them Data Objects.

Typically our data objects will also be Java Beans (or COM objects if we were working in an MS environment). A Java Bean is simply a Java class that adheres to the Java Bean design pattern. This simply means that we follow some basic guidelines when writing our classes so that information can be obtained about our data objects. In most cases, simply following the method naming conventions of the design pattern for Java Beans is sufficient for tools to obtain information about the fields (also called properties) of our classes by examining the method signatures of a given class. This examination process is called Introspection (as defined by the Java Beans Specification). The main guideline for the design pattern is that all publicly accessible fields have proper getter and setter access methods. For a given field, a getter method is quite simply a method that returns the value of that field, while a setter method is one that allows us to set the value of the field.

The Java Beans design pattern indicates that getter methods should begin with "get", followed by the field name, with the first letter of the field name capitalized. The setter methods follow the same pattern, but begin with "set". For example, if we had a field called Name, the getter method should be called getName and the setter method setName.

Let's look at a simple Java Bean. If we had an Invoice class that contained a shipping address, a billing address and a collection of items, our Java Bean would look something like the following:

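import java.util.Vector;

public class Invoice {
    // Backing fields for the three properties
    private BillingAddress billingAddress;
    private ShippingAddress shippingAddress;
    private Vector items = new Vector();

    public Invoice() { /* ... */ }

    public BillingAddress getBillingAddress() { return billingAddress; }
    public void setBillingAddress(BillingAddress address) { billingAddress = address; }
    public ShippingAddress getShippingAddress() { return shippingAddress; }
    public void setShippingAddress(ShippingAddress address) { shippingAddress = address; }
    public Vector getItem() { return items; }
    public void setItem(Vector items) { this.items = items; }
}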

For indexed properties such as our Vector of items it is often useful to provide indexed setter and getter methods, such as:

public Item getItem(int index);
public void setItem(int index, Item item);

The indexed setter and getter methods apply to arrays, vectors, and ordered lists, but do not make sense for other types of collections, such as sets or hash tables, which do not preserve the order of the collection.

It's not required that our data objects be Java Bean compliant, but it is good practice. The main reason is that a data-binding framework will most likely need to determine certain information about a given object such as the field names and java types by examining the class type of the object, using introspection. This will be discussed in more detail later in the chapter.
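As a rough illustration of what a framework can learn from the naming conventions alone, the following sketch uses the JDK's java.beans.Introspector to list the properties of a bean. The Customer class and its two properties are hypothetical; only the Introspector, BeanInfo, and PropertyDescriptor calls are standard JDK APIs.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class IntrospectionSketch {

    /** Hypothetical data object that follows the Java Beans naming conventions. */
    public static class Customer {
        private String name;
        private int loyaltyPoints;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getLoyaltyPoints() { return loyaltyPoints; }
        public void setLoyaltyPoints(int points) { this.loyaltyPoints = points; }
    }

    /** Returns property name to Java type, discovered purely from method signatures. */
    static Map<String, Class<?>> describe(Class<?> beanClass) {
        try {
            // Stop at Object so the inherited getClass() is not reported as a property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Map<String, Class<?>> properties = new TreeMap<>();
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                properties.put(descriptor.getName(), descriptor.getPropertyType());
            }
            return properties;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Class<?>> entry : describe(Customer.class).entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue().getName());
        }
    }
}
```

The property names come back with a lowercase first letter (getLoyaltyPoints yields loyaltyPoints), which is the form a binding tool would typically match against element and attribute names.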

For more information on Java Beans, see the Java Beans specification, version 1.01.

What's Wrong with APIs such as DOM and SAX?

Most readers should be familiar by now with the DOM and SAX APIs discussed in earlier chapters (11 and 12, respectively). The W3C DOM is tree-based while SAX is event-based, but both APIs are structure-centric in that they deal purely with the structural representation of an XML document. The basic format of an XML document is a hierarchical structure comprised mainly of elements, attributes, and character data. There is nothing actually wrong with these APIs if they are used for their intended purpose: manipulating XML documents in a way that represents the generic structure of the XML document.

Most applications that need to access the contents of an XML document, however, will not need to know whether data was represented as an attribute or an element, but simply that the data is structured and has a certain meaning. APIs such as the DOM and SAX, while providing meaningful representations of the data, can only provide extremely generic representations that make accessing our data long-winded, which adds unnecessary complexity to applications doing high-level XML manipulation. Programmers must essentially walk the structure of the XML in order to access the data. This data also often needs to be converted into its proper format before it can be manipulated, which can be a very tedious process. This is necessary because the data returned using these APIs is simply a set of characters, or a string. If the contents of an element should represent some sort of date or time instance, then the programmer must take the character data and convert it to a date representation in the programming language. In Java this would most likely be an instance of java.util.Date (the standard Java date class). If the element itself represents a set of other elements, we would need to create the appropriate object representation and continue converting sub-elements.
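A short sketch shows both chores at once: walking the child nodes of an element, then converting the character data to a java.util.Date. The element name shipDate and the yyyy-MM-dd date pattern are assumptions made for this example only.

```java
import java.io.StringReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomDateSketch {

    /** Walks the children of the root element and converts a (hypothetical) shipDate element to a Date. */
    static Date readShipDate(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and any element we are not looking for.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "shipDate".equals(child.getNodeName())) {
                    // Assumed date pattern; the DOM itself knows nothing about dates.
                    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
                    format.setLenient(false);
                    return format.parse(child.getTextContent().trim());
                }
            }
            return null; // no shipDate element present
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read ship date", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoice>\n  <shipDate>2001-05-14</shipDate>\n</invoice>";
        System.out.println(readShipDate(xml));
    }
}
```

Every additional typed field would need its own branch in the loop and its own conversion, which is why this style becomes tedious for anything larger than a handful of values.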

There are specifications, such as the W3C XPath Recommendation, which improve our ability to easily access the data within an XML document without the need for walking the complete structure. XPath is great for extracting small pieces of information from an XML document. Unfortunately the data still needs to be converted to the proper format if it is to be manipulated. It also tends to add complexity and extra overhead to our applications if our goal is to access data throughout the entire document.
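The sketch below illustrates the point using the javax.xml.xpath API found in present-day JDKs, which is an assumption about tooling rather than something this chapter uses. The invoice layout, the sku attribute, and the price element are likewise hypothetical. XPath finds the value in one expression, but what comes back is still a string that we have to convert ourselves.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathPriceSketch {

    /** Looks up the price of one item by SKU; the SKU is assumed to contain no quote characters. */
    static BigDecimal priceOf(String xml, String sku) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/invoice/item[@sku='" + sku + "']/price";
        try {
            // No tree walking is needed, but the result is plain character data.
            String text = xpath.evaluate(expression, new InputSource(new StringReader(xml)));
            // The type conversion remains our responsibility; it fails if nothing matched.
            return new BigDecimal(text.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoice>"
                + "<item sku=\"A-100\"><price>19.95</price></item>"
                + "<item sku=\"B-200\"><price>5.00</price></item>"
                + "</invoice>";
        System.out.println(priceOf(xml, "A-100"));
    }
}
```

Each call here re-parses and re-searches the document, which hints at the overhead mentioned above when many values from the same document are needed.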

Let's look at an example to demonstrate how one would use SAX to handle data binding. Recall our Invoice class from the previous section. For simplicity let's assume that the invoice only contains a set of items, where each item in our invoice has a unique product identifier called an SKU number, a product name, a quantity, a unit price, and a description. We will also assume the price is in U.S. dollars....
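The book's own listing is not part of this excerpt. As an independent sketch of the general approach, and not the authors' code, a SAX handler that binds items by hand might look like the following. The XML layout (sku as an attribute, the other fields as child elements) and every class name here are assumptions.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxInvoiceSketch {

    /** Hypothetical data object for one invoice item. */
    public static class LineItem {
        private String sku, name, description;
        private int quantity;
        private BigDecimal unitPrice;

        public String getSku() { return sku; }
        public void setSku(String sku) { this.sku = sku; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
        public BigDecimal getUnitPrice() { return unitPrice; }
        public void setUnitPrice(BigDecimal unitPrice) { this.unitPrice = unitPrice; }
    }

    /** Collects character data per element and converts it when the element closes. */
    static class ItemHandler extends DefaultHandler {
        final List<LineItem> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private LineItem current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start buffering this element's character data
            if ("item".equals(qName)) {
                current = new LineItem();
                current.setSku(atts.getValue("sku"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any item element
            }
            String value = text.toString().trim();
            switch (qName) {
                case "name": current.setName(value); break;
                case "quantity": current.setQuantity(Integer.parseInt(value)); break;
                case "price": current.setUnitPrice(new BigDecimal(value)); break;
                case "description": current.setDescription(value); break;
                case "item": items.add(current); current = null; break;
                default: break; // ignore elements we do not bind
            }
        }
    }

    /** Unmarshals every item element found in the document. */
    static List<LineItem> parseItems(String xml) {
        try {
            ItemHandler handler = new ItemHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse invoice", e);
        }
    }
}
```

Most of the handler is bookkeeping (tracking the current element, buffering text, dispatching on element names) rather than anything specific to invoices, and that bookkeeping is what a data-binding framework would take over.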

Meet the Author

Andrew Watt is an independent consultant who enjoys few things more than exploring the technologies others have yet to sample. Since he wrote his first programs in 6502 Assembler and BBC Basic in the mid-1980s he has sampled Pascal, Prolog and C++ among others. More recently he has focussed on the power of Web-relevant technologies including Lotus Domino, Java and HTML. His current interest is in the various applications of the Extensible Markup Meta Language, XMML, sometimes imprecisely and misleadingly called XML. The present glimpse he has of the future of SVG, XSL-FO, XSLT, CSS, XLink, XPointer, etc. when they actually work properly together is an exciting, if daunting, prospect. He has just begun to dabble with XQuery. Such serial dabbling, so he is told, is called "life-long learning".

Oli Gauti Gudmunsson works for SALT, acting as one of the two Chief System Architects of the SALT systems, and as Development Director in New York. He is currently working on incorporating XML and XSL into SALT's web authoring and content management systems. He has also acted as an instructor in the Computer Science I Java course at the University of Iceland. As a 'hobby' he is finishing his BS degree in Computer Engineering.

Jonathan Pinnock started programming in Pal III assembler on his school's PDP 8/e, with a massive 4K of memory, back in the days before Moore's Law reached the statute books. These days he spends most of his time developing and extending the increasingly successful PlatformOne product set that his company, JPA, markets to the financial services community. He seems to spend the rest of his time writing for Wrox, although he occasionally surfaces to say hello to his long-suffering wife and two children.

Zoran Zaev is a Sr. Web Solutions Architect with Hitachi Innovative Solutions, Corp. in the Washington DC area. He has worked in technology since the now distant 1980s, when 1 MHz CPUs and 48Kb were considered 'significant power'. In the mid 1990s, Zoran became involved in web applications development. Since then, he has helped large and small clients alike to leverage the power of web applications. His more recent emphasis has been web applications and web services with XML, SOAP, and other related technologies.
