Read an Excerpt
Data Marts: Your Retail Data Outlet
In This Chapter
- Revolutionizing the data warehouse world with data marts
- Getting past the data mart hype and focusing on business value
- Looking at different architectures for data marts
- Determining what should be in your data mart
- Developing a data mart
About two weeks before writing this chapter, I was on a business trip to Los Angeles and driving down the Ventura freeway on a nice summer day. (I always call it the Ventura highway, after the America song from the 1970s: "Ventura highway, in the sunshine; where the days are longer...." Oops, I got a little carried away there.) An ad was on the radio for a regional (I think) hardware chain based on this premise: "Shop at our stores because we have fewer products than the big warehouse-like competition. It's much easier to get in and out of here more quickly with what you need."
Interestingly, another hardware chain on the east coast ran ads a few years ago with almost the identical theme. This chain used to make fun of the warehouse-size competition by featuring radio ads with helicopter search parties looking for shoppers lost in a distant department and references to shuttle buses having to take shoppers between departments in the warehouse-size stores. The premise was the same: "We have less merchandise than the other guy, so shop with us because it's easier."
That's the idea of the data mart.
The Idea behind Data Marts
Forget all the hype. The idea of a data mart is hardly revolutionary, despite what you may read in the computer trade press and hear at conferences and seminars.
A data mart is simply a scaled-down data warehouse, that's all.
No official, standard definition exists for a data mart, just as no official, standard definition exists for a data warehouse or operational data store or data mining or virtually any other concept in this realm of the technology landscape. Vendors do their best to define data marts in the context of their products; consultants and analysts usually define data marts in a way that's advantageous to their particular offerings and specialties. That's the way this business goes, and there's nothing wrong with it; be prepared, however, to ask the tough questions.
Architectural Approaches to Data Marts
You can take one of three main approaches to creating a data mart:
- Sourced by a data warehouse (most or all of the data mart's contents come from a data warehouse)
- Tactical, quickly developed, and created from scratch
- Developed from scratch with an eye toward eventual integration
Data marts sourced by a data warehouse
Many data warehousing experts would argue (and I'm one of them, in this case) that a true data mart is a "retail outlet" with its contents provided from a data warehouse, as shown in Figure 4-1.
In an environment like the one shown in Figure 4-1, look at the relationship between the data sources, the data warehouse, the data mart, and the user in this way:
- The data sources, acting as suppliers of raw materials, send data into the data warehouse.
- The data warehouse then serves as a consolidation and distribution center, collecting the raw materials in much the same way as in any data warehouse.
- Instead of the user (the consumer) going straight to the data warehouse, though, the data warehouse serves as a wholesaler with the premise of "we sell only to retailers, not directly to the public." In this case, the retailers are the data marts.
- The data marts order data from the warehouse and, after stocking the newly acquired information, make it available to consumers (users).
Before moving on to the next approach, though, I want to show you a variation of the sourced-from-the-warehouse model. Sometimes the data warehouse that serves as the source for the data mart does not have all the information the data mart's users need. You can solve this problem in one of two ways:
- Supplement the missing information directly into the data warehouse before sending the selected contents to the data mart, as shown in Figure 4-2.
- Don't touch the data warehouse; instead, add the supplemental information to the data mart in addition to what it receives from the data warehouse, as shown in Figure 4-3.
If your data mart needs data that's not in the data warehouse, which of these two approaches should you choose? If your data mart is the only one within your company that needs that additional data (be sure to ask around), leave the warehouse alone and bring the supplemental data directly into your data mart. If other data marts or other projects served by the data warehouse can use the additional information, you should add it to the data warehouse first and then send it, along with the other contents you need, to your data mart.
Tactical, quick-strike data marts
Sometimes you just don't have a data warehouse from which to get data for your data mart, so you have to do it yourself. In many (probably most) of these situations, you create a tactical, quick-strike data mart that is, in effect, a miniature data warehouse. You follow the same methodology and complete the same processes of data extraction, transformation, quality assurance, and loading. The difference is that you're doing it on a smaller scale than with a full-blown data warehouse.
What does a smaller scale mean? As shown in Figure 4-4, data brought into a tactical data mart is often necessary to answer a specific set of business questions within relatively narrow confines. Some examples are a specific region or territory within a company, a subset of a company's overall product line, or some other subsetting model.
So the question must be asked: If you have to start from scratch and don't have a data warehouse to provide data to your data mart, why not build a full-scale data warehouse instead? Here are three reasons to go the data mart route:
- Speed: A tactical data mart is typically completed in 90 to 120 days rather than the much longer time required for a full-scale data warehouse.
- Cost: Doing the job faster means that you spend less money; it's that simple.
- Complexity and risk: When you work with less data and fewer sources over a shorter period, your environment is likely to be significantly less complex -- and have fewer associated risks.
Bottom-up, integration-oriented data marts
What happens, though, if pressing business needs steer you toward a tactical data mart and you have a longer-term vision of its contents being integrated with other data? Have you created an architectural dead end? Will you have to throw away your data mart at some point and start over with a "real" data warehousing effort? Will Batman arrive in time to save Robin from the Riddler? (Sorry about that -- the reruns are getting to me.)
Theoretically, you can design data marts so that they're eventually integrated in a bottom-up manner, by building a data warehousing environment (in contrast to a single, monolithic data warehouse).
Pay close attention to the word theoretically. Bottom-up integration of data marts isn't for the fainthearted. You can do it, but it's more difficult than creating a tactical data mart that will always remain stand-alone. Can this approach be successful? The answer is a definite maybe.
What Should Be in a Data Mart
If a data mart is a smaller-scale version of a data warehouse, the question comes up again: What does "smaller scale" mean in reference to the contents of a data mart?
This section describes some ways you can select subsets of information for a data mart and the circumstances under which you may want to try each approach.
A data mart may contain only the information relevant to a certain geographical area, such as a region or territory within your company. Figure 4-5 illustrates an example of geography-bounded data.
Although the use of a geography-bounded data mart is technically feasible and relatively straightforward, it's often not such a good idea. Why? Because a cross-geography comparison (for example, "How are our Arizona stores doing versus our Pennsylvania stores?") is a natural use of any data warehouse environment. When you create separate data marts for various geographical reasons, you make it much more difficult to make these types of comparisons.
Another approach to deciding what should be in your data mart is to base decisions on what information a specific organization needs when it's the sole (or at least primary) user of the data mart. As shown in Figure 4-6, a bank may create one data mart for consumer checking-account analysis and another data mart for commercial checking accounts.
This approach works well when the overwhelming majority of inquiries and reports are organization-oriented. That is, the commercial checking group has no need whatsoever to analyze consumer checking accounts and vice versa. It pays to dig into the business needs during the scope phase of a data warehousing or data mart project. Outsiders, for example, may think, "Okay, put all checking-account information, both consumer and commercial, into the same environment because, this way, reports can be run comparing average balances and other information for the entire checking-account portfolio at the bank." After additional analysis, though, you may notice that the bank doesn't do this type of comparison, so why not keep the two areas separate and avoid unnecessary complexity?
Using another approach, one that crosses organizational boundaries, you establish a data mart's contents based on a specific function (or set of related functions) within the company. A multinational chemical company, for example, may create a data mart exclusively for the sales and marketing functions across all organizations and across all product lines, as shown in Figure 4-7.
A company may occasionally be so focused on a specific competitor that it may make sense to create a data mart oriented toward that particular competitor. As shown in Figure 4-8, this type of environment may include competitive sales, all available public information about the competitor (particularly if the information is available over the Internet), and industry analysts' reports, for example.
To truly provide the business intelligence that's necessary in a competitor-driven situation, you should construct the data mart to include multimedia information in addition to the traditional data types typically found in a data warehouse. (Chapter 24 describes multimedia data and data warehousing.)
Answers to specific business questions
An organization's operations occasionally are driven by the answers to a selected number (often a handful) of business questions. Based on the answers, a company may speed up or slow down production lines, start up extra shifts to increase production or initiate layoffs, or, possibly, choose to acquire or not acquire other companies.
Business questions with this degree of weighty importance have traditionally caused nightmares for the in-house employees chartered with digging out data and reports, consolidating and checking the information, and reporting the results to executive management. Sounds like a job for a data warehouse, you say? Before constructing a full-scale data warehouse that can answer these (and many other) business questions, however, it's probably worth considering that the answer may be a small-scale data mart designed specifically to answer those high-impact, high-value "How are we doing?" type of questions.
Later, this type of environment may grow into a larger-scale data warehouse. It often makes more sense, however, to concentrate your efforts on supporting a data mart with known business value instead of on supplementing it with volumes of additional data that may provide business value (but can also slow response time or significantly complicate the end-to-end architecture). Again, the job you do up-front, in the early phases of your project, make a big difference in the direction you take and your level of success.
Any set of criteria you can dream up can determine a data mart's contents. Some make sense; others don't. Some take you into an architectural dead end because you get only limited value and have to start all over to expand your capabilities.
Data mart or data warehouse?
Now that you (presumably) have read an entire chapter about the concept of data marts, I want to make one important point. If you start a project from the outset with either of the following premises, you already have two strikes against you:
By labeling your project as one or the other of these terms, you already have made some preconceived notions about the work you will do, before you have even begun to dig into the business problem. Until you understand the following three issues, you have no foundation on which to classify your impending project as either a data mart or a data warehouse:
As mentioned at the beginning of this chapter, what's more important is that no formal definitions distinguish a data mart from a data warehouse. If you're extracting and rehosting a subset of data from an existing application into another environment, you can accurately deem what you're building as a data mart.
If you're starting from scratch, however, and extracting data from one or more source systems, handling the quality assurance and transformation, and copying that data into a separate environment, what determines whether you're building a data warehouse or a data mart? Although some guidelines exist, such as number of subject areas and volumes of data, it all comes down to this statement: As soon as you start labeling your environment as one or the other, you're adding preconceived notions and beliefs about its characteristics that may not fit your business needs.
Here's the answer: Forget about the terms data warehouse and data mart. Concentrate instead on your business problem and its possible solution: What data do you need in order to perform certain informational and analytical functions; where is that data now and in what form; and what do you have to do to make it available to your users?
Leave the terminology wars to the vendors and analysts. Don't get caught up in the hype.
Implementing a Data Mart -- Quickly
No matter how you decide to divide the universe of possible contents into some subset for your data mart, it's important to remember that in order to obtain maximum business value from your data mart, you must implement it quickly.
Here are the three keys to speedy implementation:
- Follow a phased methodology. As described in Chapter 13, you spend the majority of your up-front time on the project focusing on the specific business value you want.
- Hold to a fixed time for each phase. If you set aside two weeks for your scope, for example, stick to that window. Don't extend any phase (especially the early ones) unless the project is doomed to failure.
- Avoid scope creep at all costs. Though costly and dangerous in any project -- data warehousing or otherwise -- scope creep (when additional feature requests keep creeping in long past the cutoff point) is devastating in a data mart effort. You probably will add complexity with only marginal incremental business value (if any) and do little other than put your project at risk.