- Shopping Bag ( 0 items )
You don't need a forklift to work with a data warehouse, but you do need a hefty load of know-how to make wise decisions when setting one up. Data is probably your company's most important asset, so your data warehouse should serve your needs. Here's how to understand, develop, implement, and use data warehouses, plus a sneak peek into their...
You don't need a forklift to work with a data warehouse, but you do need a hefty load of know-how to make wise decisions when setting one up. Data is probably your company's most important asset, so your data warehouse should serve your needs. Here's how to understand, develop, implement, and use data warehouses, plus a sneak peek into their future.
Know your stuff — understand what a data warehouse is, what should be housed there, and what data assets are
Get a handle on technology — learn about column-wise databases, hardware assisted databases, middleware, and master data management
The intelligent view — see how business intelligence and data warehousing work together
Ask the right questions — explore data mining and learn to find what you need
Do the groundwork — choose your project team and apply best development practices to your data warehousing projects
Keep the user in mind — involve your users in defining business needs through testing, and learn how to get valuable feedback
Fix or replace? — learn how to review and upgrade existing data storage to make it serve your needs
Buyer beware — be prepared when dealing with data warehousing product vendors
Open the book and find:
What to expect from your data warehouse
The difference between data warehouses and data marts
All about specialty database technologies
What to look for in a consultant
How your data warehouse feeds dashboards and scorecards
Secrets for managing a successful data warehouse project
How to effectively capture business needs and requirements
Ten signs your project is in trouble
Part I: The Data Warehouse: Home for Your Data Assets.
Chapter 1: What’s in a Data Warehouse?
Chapter 2: What Should You Expect from Your Data Warehouse?
Chapter 3: Have It Your Way: The Structure of a Data Warehouse.
Chapter 4: Data Marts: Your Retail Data Outlet.
Part II: Data Warehousing Technology.
Chapter 5: Relational Databases and Data Warehousing.
Chapter 6: Specialty Databases and Data Warehousing.
Chapter 7: Stuck in the Middle with You: Data Warehousing Middleware.
Part III: Business Intelligence and Data Warehousing.
Chapter 8: An Intelligent Look at Business Intelligence.
Chapter 9: Simple Database Querying and Reporting.
Chapter 10: Business Analysis (OLAP).
Chapter 11: Data Mining: Hi-Ho, Hi-Ho, It’s Off to Mine We Go.
Chapter 12: Dashboards and Scorecards.
Part IV: Data Warehousing Projects: How to Do Them Right.
Chapter 13: Data Warehousing and Other IT Projects: The Same but Different.
Chapter 14: Building a Winning Data Warehousing Project Team.
Chapter 15: You Need What? When? — Capturing Requirements.
Chapter 16: Analyzing Data Sources.
Chapter 17: Delivering the Goods.
Chapter 18: User Testing, Feedback, and Acceptance.
Part V: Data Warehousing: The Big Picture.
Chapter 19: The Information Value Chain: Connecting Internal and External Data.
Chapter 20: Data Warehousing Driving Quality and Integration.
Chapter 21: The View from the Executive Boardroom.
Chapter 22: Existing Sort-of Data Warehouses: Upgrade or Replace?
Chapter 23: Surviving in the Computer Industry (and Handling Vendors).
Chapter 24: Working with Data Warehousing Consultants.
Part VI: Data Warehousing in the Not-Too-Distant Future.
Chapter 25: Expanding Your Data Warehouse with Unstructured Data.
Chapter 26: Agreeing to Disagree about Semantics.
Chapter 27: Collaborative Business Intelligence.
Part VII: The Part of Tens.
Chapter 28: Ten Questions to Consider When You’re Selecting User Tools.
Chapter 29: Ten Secrets to Managing Your Project Successfully.
Chapter 30: Ten Sources of Up-to-Date Information about Data Warehousing.
Chapter 31: Ten Mandatory Skills for a Data Warehousing Consultant.
Chapter 32: Ten Signs of a Data Warehousing Project in Trouble.
Chapter 33: Ten Signs of a Successful Data Warehousing Project.
Chapter 34: Ten Subject Areas to Cover with Product Vendors.
In This Chapter
About two weeks before writing this chapter, I was on a business trip to Los Angeles and driving down the Ventura freeway on a nice summer day. (I always call it the Ventura highway, after the America song from the 1970s: "Ventura highway, in the sunshine; where the days are longer...." Oops, I got a little carried away there.) An ad was on the radio for a regional (I think) hardware chain based on this premise: "Shop at our stores because we have fewer products than the big warehouse-like competition. It's much easier to get in and out of here more quickly with what you need."
Interestingly, another hardware chain on the east coast ran ads a few years ago with almost the identical theme. This chain used to make fun of the warehouse-size competition by featuring radio ads with helicopter search parties looking for shoppers lost in a distant department and references to shuttle buses having to take shoppers between departments in the warehouse-size stores. The premise was the same: "We have less merchandise than the other guy, so shop with us because it's easier."
That's the idea of the data mart.
Forget all the hype. The idea of a data mart is hardly revolutionary, despite what you may read in the computer trade press and hear at conferences and seminars.
A data mart is simply a scaled-down data warehouse, that's all.
No official, standard definition exists for a data mart, just as no official, standard definition exists for a data warehouse or operational data store or data mining or virtually any other concept in this realm of the technology landscape. Vendors do their best to define data marts in the context of their products; consultants and analysts usually define data marts in a way that's advantageous to their particular offerings and specialties. That's the way this business goes, and there's nothing wrong with it; be prepared, however, to ask the tough questions.
You can take one of three main approaches to creating a data mart:
Many data warehousing experts would argue (and I'm one of them, in this case) that a true data mart is a "retail outlet" with its contents provided from a data warehouse, as shown in Figure 4-1.
In an environment like the one shown in Figure 4-1, look at the relationship between the data sources, the data warehouse, the data mart, and the user in this way:
Before moving on to the next approach, though, I want to show you a variation of the sourced-from-the-warehouse model. Sometimes the data warehouse that serves as the source for the data mart does not have all the information the data mart's users need. You can solve this problem in one of two ways:
If your data mart needs data that's not in the data warehouse, which of these two approaches should you choose? If your data mart is the only one within your company that needs that additional data (be sure to ask around), leave the warehouse alone and bring the supplemental data directly into your data mart. If other data marts or other projects served by the data warehouse can use the additional information, you should add it to the data warehouse first and then send it, along with the other contents you need, to your data mart.
Sometimes you just don't have a data warehouse from which to get data for your data mart, so you have to do it yourself. In many (probably most) of these situations, you create a tactical, quick-strike data mart that is, in effect, a miniature data warehouse. You follow the same methodology and complete the same processes of data extraction, transformation, quality assurance, and loading. The difference is that you're doing it on a smaller scale than with a full-blown data warehouse.
What does a smaller scale mean? As shown in Figure 4-4, data brought into a tactical data mart is often necessary to answer a specific set of business questions within relatively narrow confines. Some examples are a specific region or territory within a company, a subset of a company's overall product line, or some other subsetting model.
So the question must be asked: If you have to start from scratch and don't have a data warehouse to provide data to your data mart, why not build a full-scale data warehouse instead? Here are three reasons to go the data mart route:
What happens, though, if pressing business needs steer you toward a tactical data mart and you have a longer-term vision of its contents being integrated with other data? Have you created an architectural dead end? Will you have to throw away your data mart at some point and start over with a "real" data warehousing effort? Will Batman arrive in time to save Robin from the Riddler? (Sorry about that -- the reruns are getting to me.)
Theoretically, you can design data marts so that they're eventually integrated in a bottom-up manner, by building a data warehousing environment (in contrast to a single, monolithic data warehouse).
Pay close attention to the word theoretically. Bottom-up integration of data marts isn't for the fainthearted. You can do it, but it's more difficult than creating a tactical data mart that will always remain stand-alone. Can this approach be successful? The answer is a definite maybe.
If a data mart is a smaller-scale version of a data warehouse, the question comes up again: What does "smaller scale" mean in reference to the contents of a data mart?
This section describes some ways you can select subsets of information for a data mart and the circumstances under which you may want to try each approach.
A data mart may contain only the information relevant to a certain geographical area, such as a region or territory within your company. Figure 4-5 illustrates an example of geography-bounded data.
Although the use of a geography-bounded data mart is technically feasible and relatively straightforward, it's often not such a good idea. Why? Because a cross-geography comparison (for example, "How are our Arizona stores doing versus our Pennsylvania stores?") is a natural use of any data warehouse environment. When you create separate data marts for various geographical reasons, you make it much more difficult to make these types of comparisons.
Another approach to deciding what should be in your data mart is to base decisions on what information a specific organization needs when it's the sole (or at least primary) user of the data mart. As shown in Figure 4-6, a bank may create one data mart for consumer checking-account analysis and another data mart for commercial checking accounts.
This approach works well when the overwhelming majority of inquiries and reports are organization-oriented. That is, the commercial checking group has no need whatsoever to analyze consumer checking accounts and vice versa. It pays to dig into the business needs during the scope phase of a data warehousing or data mart project. Outsiders, for example, may think, "Okay, put all checking-account information, both consumer and commercial, into the same environment because, this way, reports can be run comparing average balances and other information for the entire checking-account portfolio at the bank." After additional analysis, though, you may notice that the bank doesn't do this type of comparison, so why not keep the two areas separate and avoid unnecessary complexity?
Using another approach, one that crosses organizational boundaries, you establish a data mart's contents based on a specific function (or set of related functions) within the company. A multinational chemical company, for example, may create a data mart exclusively for the sales and marketing functions across all organizations and across all product lines, as shown in Figure 4-7.
A company may occasionally be so focused on a specific competitor that it may make sense to create a data mart oriented toward that particular competitor. As shown in Figure 4-8, this type of environment may include competitive sales, all available public information about the competitor (particularly if the information is available over the Internet), and industry analysts' reports, for example.
To truly provide the business intelligence that's necessary in a competitor-driven situation, you should construct the data mart to include multimedia information in addition to the traditional data types typically found in a data warehouse. (Chapter 24 describes multimedia data and data warehousing.)
An organization's operations occasionally are driven by the answers to a selected number (often a handful) of business questions. Based on the answers, a company may speed up or slow down production lines, start up extra shifts to increase production or initiate layoffs, or, possibly, choose to acquire or not acquire other companies.
Business questions with this degree of weighty importance have traditionally caused nightmares for the in-house employees chartered with digging out data and reports, consolidating and checking the information, and reporting the results to executive management. Sounds like a job for a data warehouse, you say? Before constructing a full-scale data warehouse that can answer these (and many other) business questions, however, it's probably worth considering that the answer may be a small-scale data mart designed specifically to answer those high-impact, high-value "How are we doing?" type of questions.
Later, this type of environment may grow into a larger-scale data warehouse. It often makes more sense, however, to concentrate your efforts on supporting a data mart with known business value instead of on supplementing it with volumes of additional data that may provide business value (but can also slow response time or significantly complicate the end-to-end architecture). Again, the job you do up-front, in the early phases of your project, make a big difference in the direction you take and your level of success.
Any set of criteria you can dream up can determine a data mart's contents. Some make sense; others don't. Some take you into an architectural dead end because you get only limited value and have to start all over to expand your capabilities.
Data mart or data warehouse?
Now that you (presumably) have read an entire chapter about the concept of data marts, I want to make one important point. If you start a project from the outset with either of the following premises, you already have two strikes against you:
By labeling your project as one or the other of these terms, you already have made some preconceived notions about the work you will do, before you have even begun to dig into the business problem. Until you understand the following three issues, you have no foundation on which to classify your impending project as either a data mart or a data warehouse:
As mentioned at the beginning of this chapter, what's more important is that no formal definitions distinguish a data mart from a data warehouse. If you're extracting and rehosting a subset of data from an existing application into another environment, you can accurately deem what you're building as a data mart.
If you're starting from scratch, however, and extracting data from one or more source systems, handling the quality assurance and transformation, and copying that data into a separate environment, what determines whether you're building a data warehouse or a data mart? Although some guidelines exist, such as number of subject areas and volumes of data, it all comes down to this statement: As soon as you start labeling your environment as one or the other, you're adding preconceived notions and beliefs about its characteristics that may not fit your business needs.
Here's the answer: Forget about the terms data warehouse and data mart. Concentrate instead on your business problem and its possible solution: What data do you need in order to perform certain informational and analytical functions; where is that data now and in what form; and what do you have to do to make it available to your users?
Leave the terminology wars to the vendors and analysts. Don't get caught up in the hype.
No matter how you decide to divide the universe of possible contents into some subset for your data mart, it's important to remember that in order to obtain maximum business value from your data mart, you must implement it quickly.
Here are the three keys to speedy implementation: