XML for Data Management
By Peter Aiken and David Allen
MORGAN KAUFMANN PUBLISHERS
Copyright © 2004 Elsevier Inc.
All rights reserved.
ISBN: 978-0-08-052144-2
Chapter One
XML and DM Basics
Introduction
XML equips organizations with the tools to develop programmatic solutions to manage their data interchange environments using the same economies of scale associated with their DM environments. XML complements existing DM efforts nicely and enables the development of new and innovative DM techniques. Rather than looking at data management as a set of many problems, each consisting of a method of transporting, transforming, or otherwise evolving data from one point to alternate forms, practitioners can now look at the big picture of DM—the complete set of systems working together, and how data moves amongst them. This is jumping up one conceptual level—moving from thinking of the challenge in terms of its individual instances to thinking of it in terms of a class of similar challenges. It is an architectural pattern that is seen frequently as systems evolve. First, a solution is created for a specific problem. Next, a slightly different problem arises that requires another specific solution. As more and more conceptually similar problems arise that differ only in the details, eventually it makes sense from an architectural standpoint to develop a general solution that deals with an entire class of similar problems by focusing on their commonalities rather than their differences.
This first chapter begins by describing a present-day DM challenge. We then present the definitions of data and metadata used throughout the book. The next section presents a brief overview of DM, followed by a justification for investing in metadata and DM. We acknowledge the hype surrounding XML, then provide a brief introduction and two short examples. These lead to an overview of the intersection of DM and XML. The chapter closes with a few XML caveats. We present this information to help you make the case that investing in metadata is not only a good idea but also necessary to realize the full benefit of XML and related IT investments.
The DM Challenge
An organization that we worked with once presented us with our most vexing DM challenge. When working with them to resolve a structural data quality situation, our team discovered it would be helpful to have access to a certain master list that was maintained on another machine located physically elsewhere in the organization. Better access to the master list would have sped up validation efforts we were performing as part of the corrective data quality engineering. In order to obtain a more timely copy—ours was a 1-year-old extract refreshed annually—we requested access through the proper channels, crossing many desks without action. As the situation was elevated, one individual decided to address the problem by informally creating a means of accessing the master list data. Volunteering to take on the effort, the individual neglected to mention the effort to our team, and consequently never understood the team's access requirements. Ten months after the initial request, the individual approached us with a solution created over two weeks. The solution allowed us to retrieve up to twelve records of master data at a time with a web browser, and access incorporated a substantial lag time.
After realizing that this solution was inadequate, our team managed to get the attention of developers who worked with the master list directly. They in turn offered their own solution—a utility that they described as capable of handling large volumes of data. This solution also proved inadequate. After one year, our requirements had not changed. We needed approximately four million records weekly to ensure that we were working with the most current data. These extracts were necessary since transactional access to live data was not available.
This was a reasonable and remarkably unchallenging technical request. The sticking point that made both of the offered solutions inappropriate was the way they were built. Because of the way the system was constructed, queries on particular items had to be done one at a time, and could not be done in bulk. This meant that when tools were built for us to access "large volumes of data," those tools simply automated the process of issuing tens of thousands of individual requests. Not surprisingly, extracting the volumes of data that we needed would have put an untenable burden on the system.
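The difference between automating individual requests and performing a true bulk extract can be sketched in a few lines of Python. This is only an illustration: the `MasterList` class, its `get_one` and `get_all` methods, and the record count are hypothetical stand-ins, since the interface of the real system is not described here.

```python
class MasterList:
    """Hypothetical stand-in for the master list system; counts requests served."""

    def __init__(self, records):
        self._records = dict(records)
        self.requests = 0

    def get_one(self, key):
        """Single-item query: one request per record."""
        self.requests += 1
        return self._records[key]

    def get_all(self):
        """Bulk extract: one request regardless of size (assumed, not in the real system)."""
        self.requests += 1
        return dict(self._records)


def extract_one_at_a_time(master, keys):
    """Automate many individual requests, as the offered tools did."""
    return {key: master.get_one(key) for key in keys}


# Invented sample data; the real extract was far larger.
records = {n: f"record-{n}" for n in range(10_000)}

looped = MasterList(records)
copy_a = extract_one_at_a_time(looped, records.keys())

bulk = MasterList(records)
copy_b = bulk.get_all()

# Same data either way, but very different load on the source system.
print(looped.requests, bulk.requests)
```

Both approaches return identical data. The looped version simply pushes the cost onto the source system, one request per record.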
While this organization had many technically brilliant individuals, they only used the tools that they knew. As a result, we were unable to gain access to the data that we required. Many aspects of this situation might have been helped by the judicious application of XML to their DM practices.
Over the coming pages, we will describe many of the lessons we have learned with respect to how XML can help data managers. Organizations have resources in the knowledge that resides in the heads of their workers as well as in their systems. The way systems are built can either take advantage of that knowledge, or render it impotent.
Whether you are a database manager, an administrator, or an application developer, XML will profoundly impact the way in which you think about and practice data management. Understanding how to correctly manage and apply XML-based metadata will be the key to advancements in the field. XML represents a large and varied body of metadata-related technologies, each of which has individual applications and strengths. Understanding the XML conceptual architecture is central to understanding the applications and strengths of XML, which are presented later in this chapter.
Why are you reading this book? Chances are that you opened the cover because XML is impacting your organization already, or will be very shortly. If you are interested in this material, our experience shows us that one or more of the following is probably true:
* You are leading a group that is working with XML.
* Your new application will benefit from the ability to speak a common language with other platforms.
* You are a technical analyst and need higher-level information on the XML component architecture.
* You are in a business group tasked with ensuring return on existing or potential technology investment.
* You are in IT planning and you need to understand how XML will impact various technology investment decisions.
* You are a CIO and you want to ensure that the organization is able to take advantage of modern XML-based DM technologies.
As a group, you are busy professionals who share a concern for effective architecture-driven development practices. You are as likely to be a manager tasked with leading XML development as you are to be a technical XML specialist. If you are the latter, you already know that the best and most up-to-date XML documentation is on the web, precisely because of the web's superior ability to publish and republish information from a central location. Because excellent documentation on the technical aspects of XML is so widely available, we will focus on the strategic implications for a variety of different roles within the organization.
Definitions
This section presents the definitions that we will use when referring to data and metadata.
Data and Information
Before defining DM, it is necessary to formally define data and metadata. These definitions may be understood with the assistance of Figure 1.1.
1. Each fact combines with one or more meanings. For example, the fact "$5.99" combines with the meaning "price."
2. Each specific fact and meaning combination is referred to as a datum. For example, the fact "$5.99" is combined with the meaning "price" to form an understandable sentence: The item has a price of USD $5.99.
3. Information is one or more pieces of data that are returned in response to a specific request such as, How much does that used CD cost? The answer, or returned data, is USD $5.99. This request could be a simple acquisition of the data. But whether the data is pulled from a system or pushed to a user, a dataset is used to answer a specific question. This request or acquisition of the data is essentially the question being asked.
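The three definitions above can be sketched as a small Python model. This is a minimal illustration rather than a prescribed implementation: the `Datum` type, the `answer` function, and the way a request is reduced to a single meaning are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A fact paired with one meaning (hypothetical type for illustration)."""
    fact: str
    meaning: str


def answer(request_meaning, data):
    """Return the data that respond to a request; the returned set is the information."""
    return [d for d in data if d.meaning == request_meaning]


# A fact on its own carries no context; pairing it with a meaning makes a datum.
store = [
    Datum(fact="$5.99", meaning="price"),
    Datum(fact="123 Pine Street", meaning="address"),
]

# "How much does that used CD cost?" is modeled here as a request for "price".
information = answer("price", store)
for datum in information:
    print(f"The item has a {datum.meaning} of {datum.fact}.")
```

The same facts answer nothing until a request supplies the context. Asking for a meaning that no datum carries returns an empty result.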
There are numerous other possible definitions and organizations of these terms relative to one another, usually specific to certain information disciplines, like applications development. The reason for using this set of definitions is that it acknowledges the need to pair context with an information system's raw response. The fact "123 Pine Street" means nothing without the paired meaning "address." A particular dataset is only useful within the context of a request. For example, what looks like a random assortment of house-for-sale listings becomes meaningful when we know that it was the result of a query requesting all houses within a particular price range in a particular area of town. This contextual model of data and information is important when we consider that one of the most common problems in information systems is that organizations have huge quantities of data that cannot be accessed properly or put into context. Facts without meanings are of limited value. The way facts are paired with meanings touches on the next term we will define—metadata.
Metadata
The concept of metadata is central to understanding the usefulness of XML. If an organization is to invest in XML and metadata, it is important to understand what metadata is, so that those investments can be exploited to their fullest potential.
Many people incorrectly believe that metadata is a special type or class of data that they already have. While organizations have it whether they have looked at it or not, metadata actually represents a use of existing facts rather than a type of data itself. What does it mean to say that it is a use of facts? Metadata has to do with the structure and meaning of facts. In other words, metadata describes the use of those facts. Looking back at Figure 1.1, metadata acts as the meaning paired with the facts used to create data.
Take for example a standard hammer. This object can be looked at in two ways. On one hand, it is just an ordinary object, like an automobile, or even a highway bridge. While it is possible to look at it as an object just like any other, what makes the hammer powerful is the understanding of its use. A hammer is something that drives nails—that is how we understand the hammer! Metadata is the same. Rather than understanding it just as data (as some might claim that the hammer is just an object), we understand it in terms of its use—metadata is a categorization and explanation of facts. The implications of the actual meaning of metadata must be understood. In the coming discussion, we will outline why understanding of metadata is critical.
When metadata is treated like data, the benefits are no more or less than just having more data in an existing system. This is because looking at it simply as data causes it to be treated like any other data. When metadata is viewed as a use of data, it gains additional properties. For example, data sources can be conceptually "indexed" using the metadata, allowing anyone to determine which data is being captured by which system. Wouldn't it be useful to know all of the places throughout a system where integer data was used within the context of uniquely identifying a customer? The benefits of data transparency throughout the organization are numerous—the application of metadata to create this transparency is one of the topics that we will address.
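As a rough sketch of what conceptually "indexing" data sources by their metadata might look like, the Python below answers the question posed above about integer customer identifiers. The catalog, the system names, the field names, and the `find_uses` function are all invented for illustration and do not describe any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Describes how one stored fact is used (all names are hypothetical)."""
    system: str
    field: str
    data_type: str
    use: str


# A tiny, invented metadata catalog spanning three made-up systems.
CATALOG = [
    FieldMetadata("billing", "cust_id", "integer", "customer identifier"),
    FieldMetadata("billing", "amount_due", "decimal", "price"),
    FieldMetadata("shipping", "customer_no", "integer", "customer identifier"),
    FieldMetadata("shipping", "street", "string", "address"),
    FieldMetadata("support", "ticket_count", "integer", "ticket tally"),
]


def find_uses(catalog, data_type, use):
    """Return (system, field) pairs whose metadata matches the type and use."""
    return [
        (m.system, m.field)
        for m in catalog
        if m.data_type == data_type and m.use == use
    ]


# Where is integer data used to uniquely identify a customer?
matches = find_uses(CATALOG, "integer", "customer identifier")
print(matches)
```

The query never touches the customer records themselves. It works entirely from descriptions of how facts are used, which is the sense in which metadata is a use of data rather than more data.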
(Continues...)
Excerpted from XML for Data Management by Peter Aiken and David Allen. Copyright © 2004 by Elsevier Inc. Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.