- Shopping Bag ( 0 items )
Welcome to the first book that thoroughly describes the data modeling techniques
used in constructing a multipurpose, stable, and sustainable data warehouse
used to support business intelligence (BI). This chapter introduces the
data warehouse by describing the objectives of BI and the data warehouse and
by explaining how these fit into the overall Corporate Information Factory
(CIF) architecture. It discusses the iterative nature of the data warehouse construction
and demonstrates the importance of the data warehouse data model
and the justification for the type of data model format suggested in this book.
We discuss why the format of the model should be based on relational design
techniques, illustrating the need to maximize nonredundancy, stability, and
maintainability. Another section of the chapter outlines the characteristics of a
maintainable data warehouse environment. The chapter ends with a discussion
of the impact of this modeling approach on the ultimate delivery of the
data marts. This chapter setsup the reader to understand the rationale behind
the ensuing chapters, which describe in detail how to create the data warehouse
Overview of Business Intelligence
BI, in the context of the data warehouse, is the ability of an enterprise to study
past behaviors and actions in order to understand where the organization has
been, determine its current situation, and predict or change what will happen
in the future. BI has been maturing for more than 20 years. Let's briefly go over
the past decade of this fascinating and innovative history.
You're probably familiar with the technology adoption curve. The first companies
to adopt the new technology are called innovators. The next category is
known as the early adopters, then there are members of the early majority,
members of the late majority, and finally the laggards. The curve is a traditional
bell curve, with exponential growth in the beginning and a slowdown in
market growth occurring during the late majority period. When new technology
is introduced, it is usually hard to get, expensive, and imperfect. Over
time, its availability, cost, and features improve to the point where just about
anyone can benefit from ownership. Cell phones are a good example of this.
Once, only the innovators (doctors and lawyers?) carried them. The phones
were big, heavy, and expensive. The service was spotty at best, and you got
"dropped" a lot. Now, there are deals where you can obtain a cell phone for
about $60, the service providers throw in $25 of airtime, and there are no
monthly fees, and service is quite reliable.
Data warehousing is another good example of the adoption curve. In fact, if
you haven't started your first data warehouse project, there has never been a
better time. Executives today expect, and often get, most of the good, timely
information they need to make informed decisions to lead their companies
into the next decade. But this wasn't always the case.
Just a decade ago, these same executives sanctioned the development of executive
information systems (EIS) to meet their needs. The concept behind EIS
initiatives was sound-to provide executives with easily accessible key performance
information in a timely manner. However, many of these systems
fell short of their objectives, largely because the underlying architecture could
not respond fast enough to the enterprise's changing environment. Another
significant shortcoming of the early EIS days was the enormous effort required
to provide the executives with the data they desired. Data acquisition or the
extract, transform, and load (ETL) process is a complex set of activities whose
sole purpose is to attain the most accurate and integrated data possible and
make it accessible to the enterprise through the data warehouse or operational
data store (ODS).
The entire process began as a manually intensive set of activities. Hard-coded
"data suckers" were the only means of getting data out of the operational systems
for access by business analysts. This is similar to the early days of telephony,
when operators on skates had to connect your phone with the one you
were calling by racing back and forth and manually plugging in the appropriate
Fortunately, we have come a long way from those days, and the data warehouse
industry has developed a plethora of tools and technologies to support
the data acquisition process. Now, progress has allowed most of this process to
be automated, as it has in today's telephony world. Also, similar to telephony
advances, this process remains a difficult, if not temperamental and complicated,
one. No two companies will ever have the same data acquisition activities
or even the same set of problems. Today, most major corporations with
significant data warehousing efforts rely heavily on their ETL tools for design,
construction, and maintenance of their BI environments.
Another major change during the last decade is the introduction of tools and
modeling techniques that bring the phrase "easy to use" to life. The dimensional
modeling concepts developed by Dr. Ralph Kimball and others are
largely responsible for the widespread use of multidimensional data marts to
support online analytical processing.
In addition to multidimensional analyses, other sophisticated technologies
have evolved to support data mining, statistical analysis, and exploration
needs. Now mature BI environments require much more than star schemas-flat
files, statistical subsets of unbiased data, normalized data structures, in
addition to star schemas, are all significant data requirements that must be
supported by your data warehouse.
Of course, we shouldn't underestimate the impact of the Internet on data
warehousing. The Internet helped remove the mystique of the computer. Executives
use the Internet in their daily lives and are no longer wary of touching
the keyboard. The end-user tool vendors recognized the impact of the Internet,
and most of them seized upon that realization: to design their interface such
that it replicated some of the look-and-feel features of the popular Internet
browsers and search engines. The sophistication-and simplicity-of these
tools has led to a widespread use of BI by business analysts and executives.
Another important event taking place in the last few years is the transformation
from technology chasing the business to the business demanding technology. In
the early days of BI, the information technology (IT) group recognized its value
and tried to sell its merits to the business community. In some unfortunate cases,
the IT folks set out to build a data warehouse with the hope that the business
community would use it. Today, the value of a sophisticated decision support
environment is widely recognized throughout the business. As an example, an
effective customer relationship management program could not exist without
strategic (data warehouse with associated marts) and a tactical (operational data
store and oper mart) decision-making capabilities. (See Figure 1.1)
One of the most significant developments during the last 10 years has been the
introduction of a widely accepted architecture to support all BI technological
demands. This architecture recognized that the EIS approach had several
major flaws, the most significant of which was that the EIS data structures
were often fed directly from source systems, resulting in a very complex data
acquisition environment that required significant human and computer
resources to maintain. The Corporate Information Factory (CIF) (see Figure
1.2), the architecture used in most decision support environments today,
addressed that deficiency by segregating data into five major databases (operational
systems, data warehouse, operational data store, data marts, and oper
marts) and incorporating processes to effectively and efficiently move data
from the source systems to the business users.
These components were further separated into two major groupings of components
* Getting data in consists of the processes and databases involved in acquiring
data from the operational systems, integrating it, cleaning it up, and
putting it into a database for easy usage. The components of the CIF that
are found in this function:
* The operational system databases (source systems) contain the data
used to run the day-to-day business of the company. These are still the
major source of data for the decision support environment.
* The data warehouse is a collection or repository of integrated, detailed,
historical data to support strategic decision-making.
* The operational data store is a collection of integrated, detailed, current
data to support tactical decision making.
* Data acquisition is a set of processes and programs that extracts data
for the data warehouse and operational data store from the operational
systems. The data acquisition programs perform the cleansing as well
as the integration of the data and transformation into an enterprise format.
This enterprise format reflects an integrated set of enterprise business
rules that usually causes the data acquisition layer to be the most
complex component in the CIF. In addition to programs that transform
and clean up data, the data acquisition layer also includes audit and
control processes and programs to ensure the integrity of the data as it
enters the data warehouse or operational data store.
* Getting information out consists of the processes and databases involved in
delivering BI to the ultimate business consumer or analyst. The components
of the CIF that are found in this function:
* The data marts are derivatives from the data warehouse used to provide
the business community with access to various types of strategic
* The oper marts are derivatives of the ODS used to provide the business
community with dimensional access to current operational data.
* Data delivery is the process that moves data from the data warehouse
into data and oper marts. Like the data acquisition layer, it manipulates
the data as it moves it. In the case of data delivery, however, the
origin is the data warehouse or ODS, which already contains high-quality,
integrated data that conforms to the enterprise business rules.
The CIF didn't just happen. In the beginning, it consisted of the data warehouse
and sets of lightly summarized and highly summarized data-initially
a collection of the historical data needed to support strategic decisions. Over
time, it spawned the operational data store with a focus on the tactical decision
support requirements as well. The lightly and highly summarized sets of data
evolved into what we now know are data marts.
Let's look at the CIF in action. Customer Relationship Management (CRM) is a
highly popular initiative that needs the components for tactical information
(operational systems, operational data store, and oper marts) and for strategic
information (data warehouse and various types of data marts). Certainly this
technology is necessary for CRM, but CRM requires more than just the technology-it
also requires alignment of the business strategy, corporate culture and
organization, and customer information in addition to technology to provide
long-term value to both the customer and the organization. An architecture
such as that provided by the CIF fits very well within the CRM environment,
and each component has a specific design and function within this architecture.
We describe each component in more detail later in this chapter.
CRM is a popular application of the data warehouse and operational data
store but there are many other applications. For example, the enterprise
resource planning (ERP) vendors such as SAP, Oracle, and PeopleSoft have
embraced data warehousing and augmented their tool suites to provide the
needed capabilities. Many software vendors are now offering various plug-ins
containing generic analytical applications such as profitability or key performance
indicator (KPI) analyses. We will cover the components of the CIF in far
greater detail in the following sections of this chapter.
The evolution of data warehousing has been critical in helping companies better
serve their customers and improve their profitability. It took a combination
of technological changes and a sustainable architecture. The tools for building
this environment have certainly come a long way. They are quite sophisticated
and offer great benefit in the design, implementation, maintenance, and access
to critical corporate data. The CIF architecture capitalizes on these technology
and tool innovations. It creates an environment that segregates data into five
distinct stores, each of which has a key role in providing the business community
with the right information at the right time, in the right place, and in the
right form. So, if you're a data warehousing late majority or even a laggard,
take heart. It was worth the wait.
What Is a Data Warehouse?
Before we get started with the actual description of the modeling techniques,
we need to make sure that all of us are on the same page in terms of what we
mean by a data warehouse, its role and purpose in BI, and the architectural
components that support its construction and usage.
Role and Purpose of the Data Warehouse
As we see in the first section of this chapter, the overall BI architecture has
evolved considerably over the past decade. From simple reporting and EIS
systems to multidimensional analyses to statistical and data mining requirements
to exploration capabilities, and now the introduction of customizable
analytical applications, these technologies are part of a robust and mature BI
environment. See Figure 1.3 for the general timeframe for each of these technological
Given these important but significantly different technologies and data format
requirements, it should be obvious that a repository of quality, trusted data in
a flexible, reusable format must be the starting point to support and maintain
any BI environment. The data warehouse has been a part of the BI architecture
from the very beginning. Different methodologies and data warehouse gurus
have given this component various names such as:
A staging area. A variation on the data warehouse is the "back office" staging
area where data from the operational systems is first brought together.
It is an informally designed and maintained grouping of data whose only
purpose is to feed multidimensional data marts.
The information warehouse. This was an early name for the data warehouse
used by IBM and other vendors. It was not as clearly defined as the
staging area and, in many cases, encompassed not only the repository of
historical data but also the various data marts in its definition.
Excerpted from Mastering Data Warehouse Design
by Claudia Imhoff Nicholas Galemmo Jonathan G. Geiger
Copyright © 2003 by Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger.
Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
|About the Authors|
|Ch. 2||Fundamental Relational Concepts||29|
|Pt. 2||Model Development||55|
|Ch. 3||Understanding the Business Model||57|
|Ch. 4||Developing the Model||97|
|Ch. 5||Creating and Maintaining Keys||135|
|Ch. 6||Modeling the Calendar||157|
|Ch. 7||Modeling Hierarchies||197|
|Ch. 8||Modeling Transactions||249|
|Ch. 9||Data Warehouse Optimization||285|
|Pt. 3||Operation and Management||319|
|Ch. 10||Accommodating Business Change||321|
|Ch. 11||Maintaining the Models||339|
|Ch. 12||Deploying the Relational Solution||359|
|Ch. 13||Comparison of Data Warehouse Methodologies||383|
Posted February 7, 2004
An excellent book for all Corporate Information Factory and Data Warehouse Designers and Devlopers. It provides detailed, practical information that provides insight and guidance or solving many problems or issues encountered in designing or building Data Warehouses. Blends the two different Data Warehouses methods well and shows how both can be best used in building the Corporate Information or Data Warehouse. A must read for all individuals involved in designing or building Data Warehouses. Highly recommended for the 'hands on' techniques to resolve issues that one will encounter in designing and building Data WarehousesWas this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.
Posted October 29, 2003
'Mastering Data Warehouse Design' is an excellent book to help readers understand how to take maximum advantage of the strengths of diverse approaches associated with Bill Inmon and Ralph Kimball. The main reason I bought a copy of this book, even before it arrived in bookstores, was that I was leading a team to figure out how to merge Inmon and Kimball views for data modelling standards. We had already developed a DW architecture using Inmon's approach, with its associated relational/ERD method, but believed that it lacked rigour in the area of data marts. We also reviewed Kimball's books, and acknowledged the strengths of his dimensional modelling approaches, but were concerned that it lacked rigour for the diversity of analytical requirements in the manufacturing environment, e.g. data exploration/mining on a massive scale. We were struggling to figure out to combine the best of both - and then we discovered the imminent release of 'Mastering Data Warehouse Design'. After checking the Table of Contents on the publisher's web site, we had the book couriered directly from the publishers warehouse because it would not be available in local bookstores fast enough to meet our work schedule. Chapter 1 has an impressive 'sound bite' version of Inmon's DW architecture thinking, but extended to include broader Business Intelligence concepts. Chapter 2 does a commendable job of explaining a tiered approach to data models, e.g. subject area model, business model, Operational system model, DW model. At first, this chapter was confusing because we had just finished a rigourous definition of data modelling standards, using more conventional terminology, e.g. logical/entity model, physical/table model. So the book's terminology didn't seem to fit in with our thinking. But after re-reading it, we realized that it added value in forcing us to look at the whole issue of modelling from a deliverables or outcomes perspective, rather than a modelling process perspective. Chapter 4 discusses how to develop a DW data model. The content outlines the sequence or steps involved in developing a DW data model, and it's rare that I've been able to find as good coverage of the topic as I found in this chapter. Chapters 5 - 11 cover topics like keys, modelling time/hierarchies/transactions, with some solid content on how to model for on-going business change and how to maintain the tiered models. However, I'm not fully conversant with some of these topics, so am not in a good position to evaluate their content. Chapter 12 has a very good discussion on how to deal with a proliferation of legacy data marts, and strategies for migrating to a central DW that feeds a variety of data marts. It also introduces Chapter 13 which has a classic discussion on comparing the relational and dimensional modelling approaches - including the best discussion I've ever seen on the strengths and weaknesses of each approach. While our team didn't buy into all this chapter's points, the clear logical explanation of strengths and weaknesses helped facilitate a consensus agreement among two groups aligned with the Inmon/relational and Kimball/dimensional approaches. The consensus solution, mostly based on Chapter 13's content, would have been difficult to achieve without this book, i.e. chapter 13's content alone was worth much more than the price of the book. So if you're struggling with the merits of the Inmon and Kimball architecture/modelling approaches, this book is a valuable resource to help take advantage of the best of both.Was this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.
Posted October 23, 2003
A refreshing power-packed package. Shows full spectrum of (infrastructure concepts) what you need to do right for (DW/BI) efficiency, with integrated discussion of how to do it in the real world (Expectations setting, pitfalls, organizational considerations, financial sponsorship).Was this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.
Posted October 17, 2003
Spans the entire spectrum of technical and management considerations in DW design. It is a brilliant blueprint for to do's and what not to-do. A practical guide, text book, and reference.Was this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.
Posted August 22, 2003
This book provides a lot of practical and useful advice for modeling and implementing a complex data warehouse environment. Use of realistic case studies helps in understanding the reason why things are done the way they are.Was this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.