XML in Data Management: Understanding and Applying Them Together

by Peter Aiken, M. David Allen


Product Details

ISBN-13: 9780120455997
Publisher: Elsevier Science
Publication date: 06/21/2004
Series: Morgan Kaufmann Series in Data Management Systems
Pages: 398
Product dimensions: 0.85(w) x 7.50(h) x 9.25(d)

About the Author

Dr. Peter H. Aiken is an award-winning, internationally recognized thought leader in the area of organizational data architecture and engineering. As a practicing data manager, consultant, author and researcher, he has been actively studying these and related areas for more than twenty-five years. He has held leadership positions with the US Department of Defense and consulted with more than 50 organizations in 14 different countries. His achievements have resulted in recognition as one of 2000 Outstanding Intellectuals of the 21st Century and bibliographic entries in Who's Who in Science and Engineering, Who's Who in American Education and other recognitions. He is a professor at Virginia Commonwealth University, Founding Director of Data Blueprint, and part-time Visiting Scientist at the Software Engineering Institute at Carnegie Mellon University. He was awarded the 2001 DAMA International Achievement Award and the Defense Information Systems Agency Career Recognition. He has published five books, including Building Corporate Portals Using XML with Clive Finkelstein (1999, McGraw-Hill) and the best-seller Data Reverse Engineering (1996, McGraw-Hill).

M. David Allen has an MSc degree from Virginia Commonwealth University and is currently the Chief Operating Officer for Data Blueprint in Richmond, Virginia. His background is as a programmer and analyst, and he holds dual degrees in Computer Science and Psychology. He teaches classes on XML-related topics, including XSL, XSLT, and XML basics.

Read an Excerpt

XML for Data Management

By Peter Aiken, M. David Allen


Copyright © 2004 Elsevier Inc.
All rights reserved.

ISBN: 978-0-08-052144-2

Chapter One

XML and DM Basics


XML equips organizations with the tools to develop programmatic solutions to manage their data interchange environments using the same economies of scale associated with their DM environments. XML complements existing DM efforts nicely and enables the development of new and innovative DM techniques. Rather than looking at data management as a set of many problems, each consisting of a method of transporting, transforming, or otherwise evolving data from one point to alternate forms, practitioners can now look at the big picture of DM—the complete set of systems working together, and how data moves amongst them. This is jumping up one conceptual level—moving from thinking of the challenge in terms of its individual instances to thinking of it in terms of a class of similar challenges. It is an architectural pattern that is seen frequently as systems evolve. First, a solution is created for a specific problem. Next, a slightly different problem arises that requires another specific solution. As more and more conceptually similar problems arise that differ only in the details, eventually it makes sense from an architectural standpoint to develop a general solution that deals with an entire class of similar problems by focusing on their commonalities rather than their differences.

This first chapter begins by describing a present-day DM challenge. We then present the definitions of DATA and METADATA used throughout the book. The next section presents a brief overview of DM. This is followed by a justification for investing in metadata/DM. We acknowledge the challenge of XML's hype, but we will provide a brief introduction and two short examples. These lead to an overview of the intersection of DM and XML. The chapter closes with a few XML caveats. We are presenting this information in order to help you make the case that investing in metadata is not only a good idea, but also necessary in order to realize the full benefit of XML and related IT investments.

The DM Challenge

An organization that we worked with once presented us with our most vexing DM challenge. When working with them to resolve a structural data quality situation, our team discovered it would be helpful to have access to a certain master list that was maintained on another machine located physically elsewhere in the organization. Better access to the master list would have sped up validation efforts we were performing as part of the corrective data quality engineering. In order to obtain a more timely copy—ours was a 1-year-old extract refreshed annually—we requested access through the proper channels, crossing many desks without action. As the situation was elevated, one individual decided to address the problem by informally creating a means of accessing the master list data. Volunteering to take on the effort, the individual neglected to mention the effort to our team, and consequently never understood the team's access requirements. Ten months after the initial request, the individual approached us with a solution created over two weeks. The solution allowed us to retrieve up to twelve records of master data at a time with a web browser, and access incorporated a substantial lag time.

After realizing that this solution was inadequate, our team managed to get the attention of developers who worked with the master list directly. They in turn offered their own solution—a utility that they described as capable of handling large volumes of data. This solution also proved inadequate. After one year, our requirements had not changed. We needed approximately four million records weekly to ensure that we were working with the most current data. These extracts were necessary since transactional access to live data was not available.

This was a reasonable and remarkably unchallenging technical request. The sticking point that made both of the offered solutions inappropriate was the way they were built. Because of the way the system was constructed, queries on particular items had to be done one at a time, and could not be done in bulk. This meant that when tools were built for us to access "large volumes of data," those tools simply automated the process of issuing tens of thousands of individual requests. Not surprisingly, extracting the volumes of data that we needed would have put an untenable burden on the system.

While this organization had many technically brilliant individuals, they only used the tools that they knew. As a result, we were unable to gain access to the data that we required. Many aspects of this situation might have been helped by the judicious application of XML to their DM practices.

Over the coming pages, we will describe many of the lessons we have learned with respect to how XML can help data managers. Organizations have resources in the knowledge that resides in the heads of their workers as well as in their systems. The way systems are built can either take advantage of that knowledge, or render it impotent.

Whether you are a database manager, administrator, or an application developer, XML will profoundly impact the way in which you think about and practice data management. Understanding how to correctly manage and apply XML-based metadata will be the key to advancements in the field. XML represents a large and varied body of metadata-related technologies, each of which has individual applications and strengths. Understanding the XML conceptual architecture is central to understanding the applications and strengths of XML, which are presented later in this chapter.

Why are you reading this book? Chances are that you opened the cover because XML is impacting your organization already, or will be very shortly. If you are interested in this material, our experience shows us that one or more of the following is probably true:

* You are leading a group that is working with XML.

* Your new application will benefit from the ability to speak a common language with other platforms.

* You are a technical analyst and need higher-level information on the XML component architecture.

* You are in a business group tasked with ensuring return on existing or potential technology investment.

* You are in IT planning and you need to understand how XML will impact various technology investment decisions.

* You are a CIO and you want to ensure that the organization is able to take advantage of modern XML-based DM technologies.

As a group, you are busy professionals who share a concern for effective architecture-driven development practices. You are as likely to be a manager tasked with leading XML development as you are to be a technical XML specialist. If you are the latter, you already know that the best and most up-to-date XML documentation is on the web, precisely because of the web's superior ability to publish and republish information from a central location. Because excellent documentation on the technical aspects of XML is already so widely available, we will focus on the strategic implications for a variety of different roles within the organization.


This section presents the definitions that we will use when referring to data and metadata.

Data and Information

Before defining DM, it is necessary to formally define data and metadata. These definitions may be understood with the assistance of Figure 1.1.

1. Each fact combines with one or more meanings. For example, the fact "$5.99" combines with the meaning "price."

2. Each specific fact and meaning combination is referred to as a datum. For example, the fact "$5.99" is combined with the meaning "price" to form an understandable sentence: The item has a price of USD $5.99.

3. Information is one or more pieces of data that are returned in response to a specific request such as, How much does that used CD cost? The answer, or returned data, is USD $5.99. This request could be a simple acquisition of the data. But whether the data is pulled from a system or pushed to a user, a dataset is used to answer a specific question. This request or acquisition of the data is essentially the question being asked.
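The three definitions above can be sketched as a small data structure. This is an illustration only, not something taken from the book: the names `Datum`, `Information`, and `answer`, and the sample store (including the shipping charge), are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A fact paired with one meaning (definition 2)."""
    fact: str
    meaning: str


@dataclass(frozen=True)
class Information:
    """Data returned in response to a specific request (definition 3)."""
    request: str
    data: tuple


def answer(request: str, meaning: str, store: list) -> Information:
    """Return every datum in `store` whose meaning matches what was asked for."""
    matches = tuple(d for d in store if d.meaning == meaning)
    return Information(request=request, data=matches)


# Hypothetical store: the same kind of fact can be paired with different meanings.
store = [
    Datum(fact="$5.99", meaning="price"),
    Datum(fact="$2.00", meaning="shipping"),
]

info = answer("How much does that used CD cost?", "price", store)
print(info.data[0].fact)  # prints $5.99
```

Keeping the request alongside the returned data mirrors the point of definition 3: the dataset is only an answer because we know which question was asked.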

There are numerous other possible definitions and organizations of these terms relative to one another, usually specific to certain information disciplines, like applications development. The reason for using this set of definitions is that it acknowledges the need to pair context with an information system's raw response. The fact "123 Pine Street" means nothing without the paired meaning "address." A particular dataset is only useful within the context of a request. For example, what looks like a random assortment of house-for-sale listings becomes meaningful when we know that it was the result of a query requesting all houses within a particular price range in a particular area of town. This contextual model of data and information is important when we consider that one of the most common problems in information systems is that organizations have huge quantities of data that cannot be accessed properly or put into context. Facts without meanings are of limited value. The way facts are paired with meanings touches on the next term we will define—metadata.
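As a rough illustration of pairing facts with meanings, the sketch below uses XML element names to carry the meaning of each fact and reads them back with Python's standard `xml.etree.ElementTree` module. The `listing` document, its element names, and the price value are hypothetical and were invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical house listing: each element name supplies the meaning
# for the fact it wraps.
LISTING_XML = """
<listing>
  <address>123 Pine Street</address>
  <price currency="USD">189000</price>
</listing>
"""


def facts_with_meanings(xml_text: str) -> dict:
    """Map each meaning (element name) to its fact (element text)."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


pairs = facts_with_meanings(LISTING_XML)
print(pairs["address"])  # prints 123 Pine Street
```

Without the `address` tag, the string "123 Pine Street" would be a bare fact; with it, a program can tell what the fact is for.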


The concept of metadata is central to understanding the usefulness of XML. If an organization is to invest in XML and metadata, it is important to understand what it is, so that those investments can be exploited to their fullest potential.

Many people incorrectly believe that metadata is a special type or class of data that they already have. While organizations have it whether they have looked at it or not, metadata actually represents a use of existing facts rather than a type of data itself. What does it mean to say that it is a use of facts? Metadata has to do with the structure and meaning of facts. In other words, metadata describes the use of those facts. Looking back at Figure 1.1, metadata acts as the meaning paired with the facts used to create data.

Take for example a standard hammer. This object can be looked at in two ways. On one hand, it is just an ordinary object, like an automobile, or even a highway bridge. While it is possible to look at it as an object just like any other, what makes the hammer powerful is the understanding of its use. A hammer is something that drives nails—that is how we understand the hammer! Metadata is the same. Rather than understanding it just as data (as some might claim that the hammer is just an object), we understand it in terms of its use—metadata is a categorization and explanation of facts. The implications of the actual meaning of metadata must be understood. In the coming discussion, we will outline why understanding of metadata is critical.

When metadata is treated like data, the benefits are no more or less than just having more data in an existing system. This is because looking at it simply as data causes it to be treated like any other data. When metadata is viewed as a use of data, it gains additional properties. For example, data sources can be conceptually "indexed" using the metadata, allowing anyone to determine which data is being captured by which system. Wouldn't it be useful to know all of the places throughout a system where integer data was used within the context of uniquely identifying a customer? The benefits of data transparency throughout the organization are numerous—the application of metadata to create this transparency is one of the topics that we will address.
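The customer-identifier question above can be sketched as a query over a small metadata catalog. This is a minimal sketch under stated assumptions: the `ColumnMetadata` record, the `find_columns` function, and every system, table, and column name in `CATALOG` are hypothetical, chosen only to show how metadata can act as a conceptual index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Describes where a kind of fact lives and what it is used for."""
    system: str
    table: str
    column: str
    data_type: str
    use: str


# Hypothetical catalog spanning three systems.
CATALOG = [
    ColumnMetadata("billing", "invoices", "cust_id", "integer", "customer identifier"),
    ColumnMetadata("billing", "invoices", "total", "decimal", "invoice amount"),
    ColumnMetadata("crm", "contacts", "customer_no", "integer", "customer identifier"),
    ColumnMetadata("crm", "contacts", "phone", "string", "contact phone number"),
    ColumnMetadata("shipping", "orders", "cust_ref", "string", "customer identifier"),
]


def find_columns(catalog: list, data_type: str, use: str) -> list:
    """List every column with the given data type and the given use."""
    return [
        f"{c.system}.{c.table}.{c.column}"
        for c in catalog
        if c.data_type == data_type and c.use == use
    ]


hits = find_columns(CATALOG, "integer", "customer identifier")
print(hits)  # prints ['billing.invoices.cust_id', 'crm.contacts.customer_no']
```

The shipping system also identifies customers, but with a string, so it is excluded from the result. That kind of inconsistency is only visible once the use of each column has been recorded.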


Excerpted from XML for Data Management by Peter Aiken, M. David Allen. Copyright © 2004 by Elsevier Inc. Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Table of Contents

Preface • Foreword • Chapter 1 - XML and DM Basics • Chapter 2 - XML From the Builder's Perspective: Using XML Technologies to Support DM • Chapter 3 - XML Component Architecture (as it relates to DM) • Chapter 4 - XML and Data Engineering • Chapter 5 - Making & Using XML: the technologist's perspective • Chapter 6 - XML Frameworks • Chapter 7 - XML-Based Portal Technologies and Data Management Strategies • Chapter 8 - XML & DM focused on Enterprise Application Integration (EAI) • Chapter 9 - XML, DM & Reengineering • Chapter 10 - Networks of Networks, Metadata, and the Future • Chapter 11 - Expanded Data Management Scope • References
