Microsoft SQL Server 2000 Resource Kit


Overview

Microsoft SQL Server 2000 is the high-end, mission-critical relational database management system for rapidly building the next generation of scalable e-commerce, line-of-business, and data-warehousing solutions. "Microsoft SQL Server 2000 Resource Kit" gives database administrators and other IT professionals the definitive technical information and tools they need to deploy, manage, and maintain SQL Server 2000. Delivered direct from the Microsoft SQL Server 2000 product team and Microsoft Consulting Services, it includes special information about core functionality and new features of the product, plus a CD-ROM packed with unique tools and utilities to simplify SQL Server management. It also covers vital topics such as security, clustering and scalability, and Extensible Markup Language (XML). This powerhouse reference is an essential resource for every database administrator who seeks maximum performance from SQL Server 2000.



Editorial Reviews

From Barnes & Noble
The Barnes & Noble Review
If you've invested in Microsoft SQL Server 2000 (or are seriously considering it), you'll find the Microsoft SQL Server 2000 Resource Kit to be an absolutely essential companion.

First, there's complete lifecycle coverage for getting SQL Server 2000 deployed efficiently and keeping it running smoothly. That includes critical planning information, guidance on Microsoft's latest DBA tools, backup/recovery techniques, and Microsoft's best wisdom on optimization, troubleshooting, and security.

You'll also find advanced availability and scalability techniques, from clustering to data center best practices, as well as coverage of SQL Server 2000's improved (and thankfully simplified) replication capabilities. There's even coverage of using VB to create merge-replication custom conflict resolvers that apply your specific data or business decision rules.

You'll find extensive guidance on building data warehouses with SQL Server 2000: designing, partitioning, data extraction, transformation, loading, and Microsoft's souped-up Analysis Services. You'll also learn how to customize internal web portals with Microsoft's latest Digital Dashboard framework.

There's a CD-ROM full of tools and resources (including a complete eBook and excerpts in Pocket PC format), plus another disk containing a 120-day SQL Server 2000 trial version. No other SQL Server 2000 resource compares. (Bill Camarda)

Bill Camarda is a consultant and writer with nearly 20 years' experience in helping technology companies deploy and market advanced software, computing, and networking products and services. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks For Dummies®, Second Edition.

Booknews
This book/CD-ROM kit offers tools, samples, and best practices for using this software. It includes information about core functionality and new features of SQL Server 2000, covering data warehousing, security, failover clustering, scalability, XML, and digital dashboards. There are sections on planning, database administration, availability, replication, analysis services, and performance tuning and security. Included on the two CD-ROMs is an evaluation copy of SQL Server 2000, a searchable copy of the text, material from SQL Server Books Online in eBook format, and sample code. Annotation c. Book News, Inc., Portland, OR (booknews.com)

Product Details

  • ISBN-13: 9780735612662
  • Publisher: Microsoft Press
  • Publication date: 4/28/2001
  • Series: Resource Kit Series
  • Edition description: BK&CD-ROM
  • Pages: 1164
  • Product dimensions: 7.38 (w) x 9.14 (h) x 2.08 (d)

Meet the Author

Founded in 1975, Microsoft Corporation (Nasdaq 'MSFT') is the worldwide leader in software for personal and business computing. The company offers a wide range of products and services designed to empower people through great software—any time, any place, and on any device.

Read an Excerpt

Chapter 17 Data Warehouse Design Considerations

Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining. Although data warehouses are built on relational database technology, the design of a data warehouse database differs substantially from the design of an online transaction processing (OLTP) database.

The topics in this chapter address approaches and choices to be considered when designing and implementing a data warehouse. The chapter begins by contrasting data warehouse databases with OLTP databases and introducing OLAP and data mining, and then adds information about design issues to be considered when developing a data warehouse with Microsoft® SQL Server™ 2000.

Data Warehouses, OLTP, OLAP, and Data Mining

A relational database is designed for a specific purpose. Because the purpose of a data warehouse differs from that of an OLTP system, the design characteristics of a relational database that supports a data warehouse differ from the design characteristics of an OLTP database.
Data warehouse database:
  • Designed for analysis of business measures by categories and attributes
  • Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
  • Loaded with consistent, valid data; requires no real-time validation
  • Supports few concurrent users relative to OLTP

OLTP database:
  • Designed for real-time business operations
  • Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
  • Optimized for validation of incoming data during transactions; uses validation data tables
  • Supports thousands of concurrent users

A Data Warehouse Supports OLTP

A data warehouse supports an OLTP system by providing a place for the OLTP database to offload data as it accumulates, and by providing services that would complicate and degrade OLTP operations if they were performed in the OLTP database.

Without a data warehouse to hold historical information, data is archived to static media such as magnetic tape, or allowed to accumulate in the OLTP database.

If data is simply archived for preservation, it is not available or organized for use by analysts and decision makers. If data is allowed to accumulate in the OLTP so it can be used for analysis, the OLTP database continues to grow in size and requires more indexes to service analytical and report queries. These queries access and process large portions of the continually growing historical data and add a substantial load to the database. The large indexes needed to support these queries also tax the OLTP transactions with additional index maintenance. These queries can also be complicated to develop due to the typically complex OLTP database schema.

A data warehouse offloads the historical data from the OLTP, allowing the OLTP to operate at peak transaction efficiency. High volume analytical and reporting queries are handled by the data warehouse and do not load the OLTP, which does not need additional indexes for their support. As data is moved to the data warehouse, it is also reorganized and consolidated so that analytical queries are simpler and more efficient.
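To make the offloading step concrete, here is a minimal Transact-SQL sketch that summarizes closed historical orders from a hypothetical OLTP schema into a hypothetical warehouse fact table and then removes them from the OLTP tables. The database, table, and column names are illustrative assumptions, not objects defined in this chapter.

  -- Hypothetical example: consolidate closed historical orders into a warehouse fact table.
  -- All database, table, and column names are illustrative only.
  BEGIN TRANSACTION

  INSERT INTO Warehouse.dbo.SalesFact (DateKey, ProductKey, CustomerKey, SalesAmount, QuantitySold)
  SELECT
      CONVERT(char(8), o.OrderDate, 112),     -- grain: one row per day, product, and customer
      d.ProductID,
      o.CustomerID,
      SUM(d.UnitPrice * d.Quantity),
      SUM(d.Quantity)
  FROM Sales.dbo.Orders AS o
  JOIN Sales.dbo.OrderDetails AS d ON d.OrderID = o.OrderID
  WHERE o.Status = 'Closed'
    AND o.OrderDate < DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)  -- before the current month
  GROUP BY CONVERT(char(8), o.OrderDate, 112), d.ProductID, o.CustomerID

  -- Remove the offloaded detail and header rows so the OLTP database stays small and fast.
  DELETE d
  FROM Sales.dbo.OrderDetails AS d
  JOIN Sales.dbo.Orders AS o ON o.OrderID = d.OrderID
  WHERE o.Status = 'Closed'
    AND o.OrderDate < DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)

  DELETE FROM Sales.dbo.Orders
  WHERE Status = 'Closed'
    AND OrderDate < DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0)

  COMMIT TRANSACTION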

OLAP Is a Data Warehouse Tool

Online analytical processing (OLAP) is a technology designed to provide superior performance for ad hoc business intelligence queries. OLAP is designed to operate efficiently with data organized in accordance with the common dimensional model used in data warehouses.

A data warehouse provides a multidimensional view of data in an intuitive model designed to match the types of queries posed by analysts and decision makers. OLAP organizes data warehouse data into multidimensional cubes based on this dimensional model, and then preprocesses these cubes to provide maximum performance for queries that summarize data in various ways. For example, a query that requests the total sales income and quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.
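In relational terms, the question just described could be expressed roughly as the following Transact-SQL sketch against a hypothetical star schema (the SalesFact, Product, Customer, and TimeDim tables and their columns are assumptions); an OLAP cube answers the same question from preprocessed aggregations rather than by scanning the fact rows at query time.

  -- Hypothetical star-schema query: total sales income and quantity sold
  -- for a range of products, in one region, over one quarter.
  SELECT
      p.ProductCategory,
      SUM(f.SalesAmount)  AS TotalSalesIncome,
      SUM(f.QuantitySold) AS TotalQuantitySold
  FROM SalesFact AS f
  JOIN Product  AS p ON p.ProductKey  = f.ProductKey
  JOIN Customer AS c ON c.CustomerKey = f.CustomerKey
  JOIN TimeDim  AS t ON t.DateKey     = f.DateKey
  WHERE p.ProductCategory IN ('Bikes', 'Accessories')
    AND c.Region = 'Pacific Northwest'
    AND t.CalendarYear = 2000
    AND t.CalendarQuarter = 3
  GROUP BY p.ProductCategory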

OLAP is not designed to store large volumes of text or binary data, nor is it designed to support high volume update transactions. The inherent stability and consistency of historical data in a data warehouse enables OLAP to provide its remarkable performance in rapidly summarizing information for analytical queries.

In SQL Server 2000, Analysis Services provides tools for developing OLAP applications and a server specifically designed to service OLAP queries.

Data Mining is a Data Warehouse Tool

Data mining is a technology that applies sophisticated and complex algorithms to analyze data and expose interesting information for analysis by decision makers. Whereas OLAP organizes data in a model suited for exploration by analysts, data mining performs analysis on data and provides the results to decision makers. Thus, OLAP supports model-driven analysis and data mining supports data-driven analysis.

Data mining has traditionally operated only on raw data in the data warehouse database or, more commonly, text files of data extracted from the data warehouse database. In SQL Server 2000, Analysis Services provides data mining technology that can analyze data in OLAP cubes, as well as data in the relational data warehouse database. In addition, data mining results can be incorporated into OLAP cubes to further enhance model-driven analysis by providing an additional dimensional viewpoint into the OLAP model. For example, data mining can be used to analyze sales data against customer attributes and create a new cube dimension to assist the analyst in the discovery of the information embedded in the cube data.

For more information and details about data mining in SQL Server 2000, see Chapter 24, "Effective Strategies for Data Mining."

Designing a Data Warehouse: Prerequisites

Before embarking on the design of a data warehouse, it is imperative that the architectural goals of the data warehouse be clear and well understood. Because the purpose of a data warehouse is to serve users, it is also critical to understand the various types of users, their needs, and the characteristics of their interactions with the data warehouse.

Data Warehouse Architecture Goals

A data warehouse exists to serve its users — analysts and decision makers. A data warehouse must be designed to satisfy the following requirements:
  • Deliver a great user experience — user acceptance is the measure of success
  • Function without interfering with OLTP systems
  • Provide a central repository of consistent data
  • Answer complex queries quickly
  • Provide a variety of powerful analytical tools such as OLAP and data mining

Most successful data warehouses that meet these requirements have these common characteristics:

  • Are based on a dimensional model
  • Contain historical data
  • Include both detailed and summarized data
  • Consolidate disparate data from multiple sources while retaining consistency
  • Focus on a single subject such as sales, inventory, or finance

Data warehouses are often quite large. However, size is not an architectural goal — it is a characteristic driven by the amount of data needed to serve the users.

Data Warehouse Users

The success of a data warehouse is measured solely by its acceptance by users. Without users, historical data might as well be archived to magnetic tape and stored in the basement. Successful data warehouse design starts with understanding the users and their needs.

Data warehouse users can be divided into four categories: Statisticians, Knowledge Workers, Information Consumers, and Executives. Each type makes up a portion of the user population as illustrated in this diagram....


Statisticians: There are typically only a handful of statisticians and operations research types in any organization. Their work can contribute to closed loop systems that deeply influence the operations and profitability of the company.

Knowledge Workers: A relatively small number of analysts perform the bulk of new queries and analyses against the data warehouse. These are the users who get the Designer or Analyst versions of user access tools. They will figure out how to quantify a subject area. After a few iterations, their queries and reports typically get published for the benefit of the Information Consumers. Knowledge Workers are often deeply engaged with the data warehouse design and place the greatest demands on the ongoing data warehouse operations team for training and support.

Information Consumers: Most users of the data warehouse are Information Consumers; they will probably never compose a true ad hoc query. They use static or simple interactive reports that others have developed. They usually interact with the data warehouse only through the work product of others. This group includes a large number of people, and published reports are highly visible. Set up a great communication infrastructure for distributing information widely, and gather feedback from these users to improve the information sites over time.

Executives: Executives are a special case of the Information Consumers group.

How Users Query the Data Warehouse

Information for users can be extracted from the data warehouse relational database or from the output of analytical services such as OLAP or data mining. Direct queries to the data warehouse relational database should be limited to those that cannot be accomplished through existing tools, which are often more efficient than direct queries and impose less load on the relational database.

Reporting tools and custom applications often access the database directly. Statisticians frequently extract data for use by special analytical tools. Analysts may write complex queries to extract and compile specific information not readily accessible through existing tools. Information consumers do not interact directly with the relational database but may receive e-mail reports or access web pages that expose data from the relational database. Executives use standard reports or ask others to create specialized reports for them.
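For instance, a statistician's extract for an external analysis tool might be produced with a query like the minimal Transact-SQL sketch below, which flattens a hypothetical fact table and its customer dimension into a single denormalized result set (all object names are assumptions).

  -- Hypothetical extract: one denormalized row per fact row, with customer attributes,
  -- suitable for loading into an external statistical or data mining tool.
  SELECT
      c.CustomerKey,
      c.Region,
      c.AgeBand,
      c.IncomeBand,
      f.DateKey,
      f.ProductKey,
      f.SalesAmount,
      f.QuantitySold
  FROM SalesFact AS f
  JOIN Customer AS c ON c.CustomerKey = f.CustomerKey
  WHERE f.DateKey BETWEEN '20000101' AND '20001231'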

When using the Analysis Services tools in SQL Server 2000, Statisticians will often perform data mining, Analysts will write MDX queries against OLAP cubes and use data mining, and Information Consumers will use interactive reports designed by others....


Table of Contents



PART 1 INTRODUCING SQL SERVER 2000 AND THIS RESOURCE KIT Page 1
CHAPTER 1 Introducing the SQL Server 2000 Resource Kit Page 3
 Inside the Resource Kit Page 3
 Additional Sources of Information Page 10
  SQL Server 2000 Product Documentation Page 10
  SQL Server 2000 Internet Sites Page 11
 Conventions Used in This Resource Kit Page 11
 Resource Kit Support Policy Page 11
CHAPTER 2 New Features in SQL Server 2000 Page 13
 Relational Database Enhancements Page 13
 XML Integration of Relational Data Page 18
 Graphical Administration Enhancements Page 19
 Replication Enhancements Page 20
 Data Transformation Services Enhancements Page 24
 Analysis Services Enhancements Page 25
  Cube Enhancements Page 25
  Dimension Enhancements Page 28
  Data Mining Enhancements Page 29
  Security Enhancements Page 31
  Client Connectivity Enhancements in PivotTable Service Page 32
  Other Enhancements Page 32
 Meta Data Services Enhancements Page 34
  Meta Data Browser Enhancement Page 34
  XML Encoding Enhancements Page 34
  Repository Engine Programming Enhancements Page 35
  Repository Engine Modeling Enhancements Page 37
 English Query Enhancements Page 40
 Documentation Enhancements Page 42
PART 2 PLANNING Page 45
CHAPTER 3 Choosing an Edition of SQL Server 2000 Page 47
 Introduction Page 47
 SQL Server 2000 Server Editions Explained Page 48
  SQL Server 2000 Enterprise Edition Page 48
   Scalability Requirements Page 49
   Availability/Uptime Page 49
   Performance Page 49
   Advanced Analysis Page 50
  SQL Server 2000 Standard Edition Page 50
 SQL Server 2000 Editions for Special Uses Page 51
  SQL Server 2000 Personal Edition Page 51
  SQL Server 2000 Developer Edition Page 51
  SQL Server 2000 Evaluation Edition Page 52
  SQL Server 2000 Windows CE Edition Page 52
  SQL Server 2000 Desktop Engine Page 53
 Obtaining SQL Server 2000 Page 54
 Conclusion Page 55
CHAPTER 4 Choosing How to License SQL Server Page 57
 Licensing Model Changes Page 57
 What is a Processor License? Page 58
 Upgrades Page 58
 Choosing a Licensing Model Page 59
  Mixed License Environments Page 60
  Licensing for a Failover Cluster Configuration Page 60
  Licensing for a Multi-Instance Configuration Page 60
   Licensing in Multi-Tier Environments (Including Multiplexing or Pooling) Page 61
  SQL Server 2000 Personal Edition Licensing Page 61
  SQL Server 2000 Desktop Engine Licensing Page 61
 Switching Licenses Page 62
CHAPTER 5 Migrating Access 2000 Databases to SQL Server 2000 Page 63
 Migration Options Page 64
 Before You Migrate Page 64
 Migration Tools Page 65
  Upsizing Wizard Page 65
  SQL Server Tools Used in Migrations Page 66
   SQL Server Enterprise Manager Page 66
   Data Transformation Services (DTS) Page 66
   SQL Query Analyzer Page 67
   SQL Profiler Page 67
 Moving Data Page 67
 Migrating Access Queries Page 68
  Limitations in Upsizing Queries Page 69
  Migrating Access Queries into User-Defined Functions Page 71
  Migrating Access Queries into Stored Procedures and Views Page 71
   Converting Make-Table and Crosstab Queries Page 72
  Migrating Access Queries into Transact-SQL Scripts Page 73
 Additional Design Considerations for Queries Page 73
 Verifying SQL Server–Compliant Syntax Page 75
  Access and SQL Server Syntax Page 76
  Visual Basic Functions Page 78
  Access and SQL Server Data Types Page 79
 Migrating Your Applications Page 80
  Creating a Client/Server Application Page 80
   Converting Code Page 80
   Forms Page 81
  Optimizing the Application for the Client/Server Environment Page 81
  Optimizing Data Structure Page 82
CHAPTER 6 Migrating Sybase Databases to SQL Server 2000 Page 83
 Why Migrate to SQL Server 2000? Page 83
 Understanding the Migration Process Page 86
 Reviewing Architectural Differences Page 87
 Migrating Tables and Data Page 90
 Reviewing the Differences Between Sybase T-SQL and Transact-SQL Page 91
  Transaction Management Page 91
   ROLLBACK Triggers Page 91
   Chained Transactions Page 91
   Transaction Isolation Levels Page 92
   Cursors Page 93
   Cursor Error Checking Page 93
   Index Optimizer Hints Page 94
   Optimizer Hints for Locking Page 94
   Server Roles Page 94
   Raising Errors Page 96
   PRINT Page 96
   Partitioned Tables vs. Row Locking Page 96
  Join Syntax Page 98
  Subquery Behavior Page 98
  Grouping Results Page 99
  System Stored Procedures Page 99
   DUMP/LOAD Page 100
 Understanding Database Administration Differences Page 101
 Migration Checklist Page 103
CHAPTER 7 Migrating Oracle Databases to SQL Server 2000 Page 105
  Target Audience Page 105
 Overview Page 105
  SQL Language Extensions Page 106
  ODBC Page 106
  OLE DB Page 107
  Organization of This Chapter Page 107
 Architecture and Terminology Page 108
  Definition of Database Page 108
  Database System Catalogs Page 109
  Physical and Logical Storage Structures Page 110
  Striping Data Page 110
  Transaction Logs and Automatic Recovery Page 111
  Backing Up and Restoring Data Page 112
  Networks Page 113
  Database Security and Roles Page 114
   Database File Encryption Page 114
   Network Security Page 114
   Login Accounts Page 114
   Groups, Roles, and Permissions Page 115
   Database Users and the guest Account Page 115
   sysadmin Role Page 116
   db_owner Role Page 117
 Defining Database Objects Page 117
  Database Object Identifiers Page 119
  Qualifying Table Names Page 119
  Creating Tables Page 121
  Table and Index Storage Parameters Page 122
  Creating Tables With SELECT Statements Page 122
  Views Page 123
  Indexes Page 125
   Clustered Indexes Page 125
   Nonclustered Indexes Page 127
   Index Syntax and Naming Page 127
   Index Data Storage Parameters Page 128
   Ignoring Duplicate Keys Page 129
   Indexes on Computed Columns Page 129
  Using Temporary Tables Page 129
  Data Types Page 130
   Using Unicode Data Page 131
   User-Defined Data Types Page 132
   SQL Server timestamp Columns Page 132
  Object-Level Permissions Page 133
 Enforcing Data Integrity and Business Rules Page 134
  Entity Integrity Page 135
   Naming Constraints Page 135
   Primary Keys and Unique Columns Page 135
   Adding and Removing Constraints Page 136
   Generating Unique Values Page 138
  Domain Integrity Page 139
   DEFAULT and CHECK Constraints Page 139
   Nullability Page 140
  Referential Integrity Page 141
   Foreign Keys Page 142
  User-Defined Integrity Page 143
   Stored Procedures Page 143
   Delaying the Execution of a Stored Procedure Page 145
   Specifying Parameters in a Stored Procedure Page 146
   Triggers Page 146
 Transactions, Locking, and Concurrency Page 149
  Transactions Page 149
  Locking and Transaction Isolation Page 151
  Dynamic Locking Page 152
  Changing Default Locking Behavior Page 152
  SELECT…FOR UPDATE Page 154
  Explicitly Requesting Table-Level Locks Page 154
  Handling Deadlocks Page 155
  Remote Transactions Page 156
  Distributed Transactions Page 156
  Two-Phase Commit Processing Page 157
 SQL Language Support Page 157
  SELECT and Data Manipulation Statements Page 157
   SELECT Statements Page 158
   INSERT Statements Page 159
   UPDATE Statements Page 160
   DELETE Statements Page 162
   TRUNCATE TABLE Statement Page 163
   Manipulating Data in Identity and timestamp Columns Page 163
   Locking Requested Rows Page 164
   Row Aggregates and the Compute Clause Page 164
   Join Clauses Page 164
   Using SELECT Statements as Table Names Page 166
   Reading and Modifying BLOBs Page 166
  Functions Page 167
   Number/Mathematical Functions Page 167
   Character Functions Page 168
   Date Functions Page 169
   Conversion Functions Page 170
   Other Row-Level Functions Page 170
   Aggregate Functions Page 171
   Conditional Tests Page 171
   Converting Values to Different Data Types Page 172
   User-Defined Functions Page 174
  Comparison Operators Page 175
   Pattern Matches Page 176
   Using NULL in Comparisons Page 177
   String Concatenation Page 177
  Control-of-Flow Language Page 177
   Keywords Page 178
   Declaring Variables Page 179
   Assigning Variables Page 179
   Statement Blocks Page 180
   Conditional Processing Page 181
   Repeated Statement Execution (Looping) Page 181
   GOTO Statement Page 182
   PRINT Statement Page 182
   Returning from Stored Procedures Page 182
   Raising Program Errors Page 183
 Implementing Cursors Page 184
  Cursor Syntax Page 184
  Declaring a Cursor Page 185
  Opening a Cursor Page 186
  Fetching Data Page 186
  CURRENT OF Clause Page 187
  Closing a Cursor Page 187
  Cursor Example Page 187
 Tuning Transact-SQL Statements Page 188
 Using XML Page 190
 Using ODBC Page 190
  Recommended Conversion Strategy Page 191
  ODBC Architecture Page 191
  Forward-Only Cursors Page 192
  Server Cursors Page 193
  Scrollable Cursors Page 194
  Strategies for Using SQL Server Default Result Sets and Server Cursors Page 195
  Multiple Active Statements (hstmt) per Connection Page 196
  Data Type Mappings Page 196
  ODBC Extended SQL Page 198
  Outer Joins Page 198
  Date, Time, and Timestamp Values Page 199
  Calling Stored Procedures Page 199
  Native SQL Translation Page 200
  Manual Commit Mode Page 200
 Developing and Administering Database Replication Page 201
  ODBC, OLE/DB, and Replication Page 202
 Migrating Your Data and Applications Page 203
  Data Migration Using DTS Page 203
  Oracle Call Interface (OCI) Page 204
  Embedded SQL Page 205
  Developer 2000 and Third-Party Applications Page 208
  Internet Applications Page 209
PART 3 DATABASE ADMINISTRATION Page 211
CHAPTER 8 Managing Database Change Page 213
 Preparing for a Changing Environment Page 213
  Conflicting Goals Page 214
  Managing the Development Environment Page 215
   Development Database Process Page 215
   Control: Helping or Hindering? Page 216
   Duplication of the Production Database Page 219
   Security Page 219
   Using Command Line Scripts for Implementation Page 220
   Expecting the Unexpected During Implementation Page 224
  Managing the QA Environment Page 225
   Implementing in QA Page 225
   QA Administration Page 226
  Managing Production Implementations Page 227
   Owning the Change: Production vs. DBA Page 228
   When a Good Plan Comes Together Page 229
  Conclusion Page 231
   Further Reading Page 231
CHAPTER 9 Storage Engine Enhancements Page 233
  Storage Engine Enhancements Page 234
  Interacting with Data Page 237
   Reading Data More Effectively Page 238
   Concurrency Page 239
  Tables and Indexes Page 241
   In-Row Text Page 241
   New Data Types Page 242
   Indexes Page 242
  Logging and Recovery Page 244
   Recovery Models Page 246
  Administrative Improvements Page 249
   Dynamic Tuning Page 251
  Data Storage Components Page 252
   Files, Filegroups, and Disks Page 253
  Innovation and Evolution Page 254
CHAPTER 10 Implementing Security Page 255
 Introduction Page 255
 New Security Features Page 255
  Secure Setup Page 255
  C2 Security Evaluation Completed Page 256
  Kerberos and Delegation in Windows 2000 Environments Page 256
  Security Auditing Page 257
  Elimination of the SQLAgentCmdExec Proxy Account Page 258
  Server Role Enhancements Page 259
  Encryption Page 259
   Network Encryption Using SSL/TLS Page 259
   Encrypted File System Support on Windows 2000 Page 260
   Server-Based Encryption Enhanced Page 260
   DTS Package Encryption Page 261
  Password Protection Page 261
   Backups and Backup Media Sets Page 261
   SQL Server Enterprise Manager Page 261
   Service Account Changes Using SQL Server Enterprise Manager Page 261
  SUID Column Page 261
 Security Model Page 262
  Authentication Modes Page 263
  Using SIDs Internally Page 263
  Roles Page 264
   Public Role Page 264
   Predefined Roles Page 264
   User-Defined Roles Page 266
   Application Roles Page 266
  Securing Access to the Server Page 269
  Securing Access to the Database Page 273
   User-Defined Database Roles Page 274
   Permissions System Page 276
   Granting and Denying Permissions to Users and Roles Page 276
   Ownership Chains Page 279
 Implementation of Server-Level Security Page 280
   Use of SIDs Page 280
   Elimination of SUIDs Page 280
   Generation of GUIDs for Non-Trusted Users Page 281
   Renaming Windows User or Group Accounts Page 281
   sysxlogins System Table Page 281
 Implementation of Object-Level Security Page 284
  How Permissions Are Checked Page 284
   Cost of Changing Permissions Page 285
   Changes to Windows User or Group Account Names Page 285
   sysprocedures System Table Removed Page 286
   WITH GRANT OPTION Page 286
   sysusers System Table Page 286
   sysmembers System Table Page 287
   syspermissions System Table Page 287
   sysprotects System Table Page 288
  Named Pipes and Multiprotocol Permissions Page 288
 Upgrading from SQL Server 7.0 Page 289
 Upgrading from SQL Server 6.5 Page 289
  Upgrade Process Page 289
   Analyzing the Upgrade Output Page 290
   Preparing the SQL Server 6.5 Security Environment Page 291
 Setting Up a Secure SQL Server 2000 Installation Page 292
  Service Accounts Page 293
  File System Page 295
  Registry Page 296
  Auditing Page 296
  Profiling for Auditing Page 297
  Backup and Restore Page 298
   Security of Backup Files and Media Page 298
   Restoring to Another Server Page 298
   Attaching and Detaching Database Files Page 300
  General Windows Security Configurations Page 300
   Additional Resources Page 301
CHAPTER 11 Using BLOBs Page 303
 Designing BLOBs Page 304
  BLOB Storage in SQL Server Page 304
  Learning from the TerraServer Design and Implementation Page 312
  BLOBs in Special Operations Page 315
 Implementing BLOBs Page 316
  BLOBs on the Server Page 318
  BLOBs on the Client Page 325
 Working with BLOBs in SQL Server Page 336
PART 4 AVAILABILITY Page 337
CHAPTER 12 Failover Clustering Page 339
 Enhancements to Failover Clustering Page 339
 Windows Clustering Page 340
  Microsoft Cluster Service Components Page 341
   Hardware Page 341
   Operating System Page 342
   Virtual Server Page 343
   SQL Server 2000 Page 343
   Components Page 343
   Instances of SQL Server Page 344
   How SQL Server 2000 Failover Clustering Works Page 346
 Configuring SQL Server 2000 Failover Cluster Servers Page 347
  Software Requirements Page 347
   Memory Page 348
   Networking Page 351
   Location Page 352
   Hardware Compatibility List Page 352
  Configuration Worksheets Page 352
 Implementing SQL Server 2000 Failover Clustering Page 353
  Prerequisites Page 354
  Installation Order Page 355
  Creating the MS DTC Resources (Windows NT 4.0, Enterprise Edition Only) Page 356
  Best Practices Page 357
   Using More IP Addresses Page 357
   Configuring Node Failover Preferences Page 358
   Memory Configuration Page 359
   Using More Than Two Nodes Page 364
   Failover/Failback Strategies Page 366
 Maintaining a SQL Server 2000 Failover Cluster Page 367
  Backing Up and Restoring Page 367
   Backing Up to Disk Page 368
   Backing Up to Tape Page 368
   Snapshot Backups Page 368
   Backing Up an Entire Clustered System Page 368
  Ensuring a Virtual Server Will Not Fail Due to Other Service Failures Page 369
  Adding, Changing, or Updating a TCP/IP Address Page 369
  Adding or Removing a Cluster Node from the Virtual Server Definition Page 370
 Troubleshooting SQL Server 2000 Failover Clusters Page 371
 Finding More Information Page 372
CHAPTER 13 Log Shipping Page 373
 How Log Shipping Works Page 373
  Components Page 373
   Database Tables Page 374
   Stored Procedures Page 375
   log_shipping_monitor_probe User Page 376
  Log Shipping Process Page 376
   Bringing a Secondary Server Online as a Primary Page 378
 Configuring Log Shipping Page 378
  Keeping the Data in Sync Page 378
  Servers Page 379
   Location Page 379
   Connectivity Page 380
  Keeping Old Transaction Log Files Page 380
  Thresholds Page 380
  Installation Considerations Page 381
  Preparation Worksheet Page 382
 Log Shipping Tips and Best Practices Page 384
  Secondary Server Capacity Page 384
  Generating Database Backups from the Secondary Page 385
  Keeping Logins in Sync Page 385
  Monitoring Log Shipping Page 385
  Modifying or Removing Log Shipping Page 385
  Log Shipping Interoperability Between SQL Server 7.0 and SQL Server 2000 Page 386
  Using the Log Shipped Database to Check the Health of the Production Database Page 386
  Using the Log Shipped Database for Reporting Page 387
  Combining Log Shipping and Snapshot Backups Page 387
  Terminating User Connections in the Secondary Database Page 387
  Warm Standby Role Change Page 388
  Failback to Primary Page 388
   Network Load Balancing and Log Shipping Page 389
  Log Shipping and Replication Page 389
  Log Shipping and Application Code Page 390
  Log Shipping and Failover Clustering Page 390
  Monitor Server Page 390
  Using Full-Text Search with a Log Shipped Database Page 390
 Troubleshooting Page 391
CHAPTER 14 Data Center Availability: Facilities, Staffing, and Operations Page 393
 Data Centers Page 393
 Facility and Equipment Requirements Page 394
  The Data Center Facility Page 394
  Data Center Hardware Page 396
  Data Communication Within the Data Center Page 397
 Staffing Recommendations Page 397
 Operational Guidelines Page 402
  General Operations Page 402
   Quality Assurance Page 402
   Change Control Page 402
   Emergency Preparedness Page 403
  SQL Server Operations Page 404
   Security Page 404
   Monitoring Page 405
   Backup and Recovery Page 408
   Maintenance Page 408
 Application Service Providers Page 409
 Summary Page 410
CHAPTER 15 High Availability Options Page 413
 The Importance of People, Policies, and Processes Page 413
  Are There Any 100 Percent Solutions? Page 414
  Meeting High Uptime Page 414
  Uptime Solutions and Risk Management Page 414
  People: The Best Solution Page 415
   Roles of DBAs? Page 415
  The Essentials of an Operations Plan Page 416
  Planning Redundancy Page 416
  Segmenting Your Solutions Page 417
  Manual Procedures Page 417
  Increased Corporate Awareness: The Importance of Communication Page 417
  High Availability and Mobile and Disconnected Devices Page 418
 The Technical Side of High Availability Page 418
  Hardware Alternatives Page 418
   Disk Drives Page 419
   RAID Page 419
   SANS Page 420
   Disk Configuration Page 420
   RAID Solutions Page 420
  Software Alternatives Page 422
   Windows Clustering and SQL Server 2000 Failover Clustering Page 422
   Cluster Option 1 – Shared Disk Backup Page 425
   Cluster Option 2 – Snapshot Backup Page 425
   Option 3 – Failover Clustering Page 426
   Detail Configuration Showing Database Placement Page 427
   Network Load Balancing Page 428
  SQL Server Alternatives Page 429
   Database Maintenance and Availability Page 430
   Backup and Restore Page 430
   Two-Phase Commit Page 431
   Replication Page 432
   Replication: Immediate Updating with Queued Updating as a Failover Page 434
   Log Shipping Page 435
   Message Queuing Page 436
  Combining SQL Server Solutions Page 437
   Server Clusters, Hardware Mirroring, and Replication Page 438
   Log Shipping with Network Load Balancing Page 438
 Conclusion Page 441
CHAPTER 16 Five Nines: The Ultimate in High Availability Page 443
 Determine Your Desired Level of Nines Page 443
 Achieving High Availability with SQL Server 2000 Page 444
  Application Design Page 444
  Underlying Hardware and Software Page 446
   Choosing the Right High Availability Technology for Your Environment Page 446
   Designing Hardware for High Availability Page 451
 Creating a Disaster Recovery Plan Page 457
  Preparing Your Environment Page 457
  The Failover Plan Page 460
  The Failback Plan Page 460
  Personnel Page 461
  Creating a Run Book Page 461
  Testing the Plan Page 463
 Diagnosing a Failure Page 463
 High Availability Scenarios Page 464
  Corporate Web Site with Dynamic Content, No E-Commerce Page 464
  E-Commerce Web Site Page 466
  Partitioned Database Page 469
  Small Company Page 470
 Conclusion Page 471
PART 5 DATA WAREHOUSING Page 473
CHAPTER 17 Data Warehouse Design Considerations Page 475
 Data Warehouses, OLTP, OLAP, and Data Mining Page 475
   A Data Warehouse Supports OLTP Page 476
   OLAP Is a Data Warehouse Tool Page 476
   Data Mining is a Data Warehouse Tool Page 477
 Designing a Data Warehouse: Prerequisites Page 477
   Data Warehouse Architecture Goals Page 477
   Data Warehouse Users Page 478
   How Users Query the Data Warehouse Page 479
 Developing a Data Warehouse: Details Page 479
  Identify and Gather Requirements Page 480
  Design the Dimensional Model Page 480
   Dimensional Model Schemas Page 482
   Dimension Tables Page 485
   Fact Tables Page 494
  Develop the Architecture Page 497
  Design the Relational Database and OLAP Cubes Page 498
  Develop the Operational Data Store Page 500
  Develop the Data Maintenance Applications Page 500
  Develop Analysis Applications Page 501
  Test and Deploy the System Page 501
 Conclusion Page 501
CHAPTER 18 Using Partitions in a SQL Server 2000 Data Warehouse Page 503
 Using Partitions in a SQL Server 2000 Relational Data Warehouse Page 504
  Advantages of Partitions Page 504
   Data Pruning Page 504
   Load Speed Page 505
   Maintainability Page 505
   Query Speed Page 505
  Disadvantages of Partitions Page 505
   Complexity Page 505
   Query Design Constraints Page 505
  Design Considerations Page 506
   Overview of Partition Design Page 506
   Sample Syntax Page 508
   Apply Conditions Directly to the Fact Table Page 509
   Choice of Partition Key(s) Page 510
   Naming Conventions Page 511
   Partitioning for Downstream Cubes Page 511
  Managing the Partitioned Fact Table Page 511
   Meta Data Page 512
   Creating New Partitions Page 512
   Populating the Partitions Page 513
   Defining the UNION ALL View Page 513
   Merging Partitions Page 513
 Using Partitions in SQL Server 2000 Analysis Services Page 514
  Advantages of Partitions Page 514
   Query Performance Page 514
   Pruning Old Data Page 515
   Maintenance Page 515
   Load Performance Page 515
  Disadvantages of Partitions Page 516
   Complexity Page 516
   Meta Data Operations Page 516
  Design Considerations Page 516
   Overview of Partitions Page 516
   Slices and Filters Page 517
   Advanced Slices and Filters Page 518
   Aligning Partitions Page 519
   Storage Modes and Aggregation Plans Page 519
  Managing the Partitioned Cube Page 519
   Create New Partitions Page 520
   Data Integrity Page 521
   Processing Partitions Page 521
   Merging Partitions Page 522
   Rolling Off Old Partitions Page 523
 Conclusions Page 523
 For More Information Page 524
 VBScript Code Example for Cloning a Partition Page 524
CHAPTER 19 Data Extraction, Transformation, and Loading Techniques Page 529
 Introduction Page 529
 ETL Functional Elements Page 530
  Extraction Page 530
  Transformation Page 531
  Loading Page 532
  Meta Data Page 532
 ETL Design Considerations Page 533
 ETL Architectures Page 534
  Homogenous Architecture Page 534
  Heterogeneous Architecture Page 535
 ETL Development Page 535
  Identify and Map Data Page 536
   Identify Source Data Page 536
   Identify Target Data Page 536
   Map Source Data to Target Data Page 536
  Develop Functional Elements Page 537
   Extraction Page 537
   Transformation Page 537
   Loading Page 538
   Meta Data Logging Page 538
   Common Tasks Page 538
 SQL Server 2000 ETL Components Page 539
 The ETL Staging Database Page 539
  Server Configuration Page 541
   RAID Page 541
   Server Configuration Options (sp_configure) Page 541
  Database Configuration Page 541
   Data File Growth Page 541
   Database Configuration Options Page 542
 Managing Surrogate Keys Page 543
 ETL Code Examples Page 543
  Tables for Code Examples Page 544
   Define Example Tables Page 545
   Populate Example Tables Page 546
  Inserting New Dimension Records Page 547
  Managing Slowly Changing Dimensions Page 548
   Type 1: Overwrite the Dimension Record Page 549
   Type 2: Add a New Dimension Record Page 550
  Managing the Fact Table Page 551
  Advanced Techniques Page 558
  Meta Data Logging Page 562
   Job Audit Page 562
   Step Audit Page 564
   Error Tracking Page 566
   Code Sample: Job Audit Page 566
   Code Sample: Step Audit Page 569
   Code Sample: Error Tracking Page 571
 Conclusion Page 573
CHAPTER 20 RDBMS Performance Tuning Guide for Data Warehousing Page 575
 Introduction Page 575
 Basic Principles of Performance Tuning Page 576
  Managing Performance Page 576
  Take Advantage of SQL Server Performance Tools Page 577
  Configuration Options That Impact Performance Page 577
   max async IO Page 577
   Database Recovery Models Page 578
   Multi-Instance Considerations Page 579
   Extended Memory Support Page 580
   Windows 2000 Usage Considerations Page 580
   SQL Server 2000 Usage Considerations Page 581
 Optimizing Disk I/O Performance Page 584
  Optimizing Transfer Rates Page 584
  RAID Page 586
 Partitioning for Performance Page 595
  Objects For Partitioning Consideration Page 597
  Parallel Data Retrieval Page 600
  Optimizing Data Loads Page 602
   Choosing an Appropriate Database Recovery Model Page 602
   Using bcp, BULK INSERT, or the Bulk Copy APIs Page 603
   Controlling the Locking Behavior Page 604
   Loading Data in Parallel Page 604
   Loading Pre-Sorted Data Page 607
   Impact of FILLFACTOR and PAD_INDEX on Data Loads Page 607
   General Guidelines for Initial Data Loads Page 607
   General Guidelines for Incremental Data Loads Page 608
  Indexes and Index Maintenance Page 608
   Types of Indexes in SQL Server Page 608
   How Indexes Work Page 609
   Index Intersection Page 610
   Index Architecture In SQL Server Page 611
   Clustered Indexes Page 611
   Nonclustered Indexes Page 613
   Unique Indexes Page 615
   Indexes on Computed Columns Page 615
   Indexed Views Page 617
   Covering Indexes Page 619
   Index Selection Page 619
   Index Creation and Parallel Operations Page 620
   Index Maintenance Page 621
  SQL Server Tools for Analysis and Tuning Page 625
   Sample Data and Workload Page 625
   SQL Profiler Page 626
   SQL Query Analyzer Page 630
  System Monitoring Page 636
   Key Performance Counters to Watch Page 639
  Understanding SQL Server Internals Page 644
   Worker Threads Page 644
   Lazy Writer Page 645
   Checkpoint Page 645
   Log Manager Page 646
   Read-Ahead Management Page 647
  Miscellaneous Performance Topics Page 648
   Database Design Using Star and Snowflake Schemas Page 648
   Use Equality Operators in Transact-SQL Queries Page 648
   Reduce Rowset Size and Communications Overhead Page 649
   Reusing Execution Plans Page 650
   Maintaining Statistics on Columns Page 652
 Finding More Information Page 653
CHAPTER 21 Monitoring the DTS Multiphase Data Pump in Visual Basic Page 655
  Exposing the Multiphase Data Pump Page 655
   Programming Interfaces Page 655
   Package Execution Context Page 656
   Troubleshooting the Data Pump Page 656
  Multiphase Data Pump Review Page 656
   Basic Multiphase Data Pump Process Page 657
   Transformation Status Page 657
   Multiphase Data Pump Phases Page 657
   Properties that Impact Phases Page 661
  Sample Monitoring Solution Page 661
   Solution Architecture Page 662
   COM+ Event Class: MonitorDTSEvents.DLL Page 662
   Publisher Application: MonitorDTS.DLL Page 664
   Subscriber Application: MonitorDTSWatch.EXE Page 666
   DTS Package: MonitorDTS Sample.DTS Page 668
   Executing the Solution Page 672
PART 6 ANALYSIS SERVICES Page 673
CHAPTER 22 Cubes in the Real World Page 675
 Design Fundamentals Page 675
   Data Explosion Page 675
   Sparsity Page 676
 Designing Dimensions Page 678
  Initial Design Questions Page 679
   Star Schema or Snowflake Schema? Page 679
   Shared or Private? Page 681
  Dimension Varieties Page 683
   Changing Dimensions Page 683
   Virtual Dimensions Page 684
   Parent-Child Dimensions Page 685
  Dimension Characteristics Page 686
   Dimension Hierarchies Page 686
   Levels and Members Page 689
   Member Properties Page 690
   Real-time OLAP Page 691
   Dimension Security Page 692
  Dimension Storage and Processing Page 692
   Dimension Storage Page 692
   Dimension Processing Page 693
 Designing Cubes Page 694
  Cube Varieties Page 695
   Regular Cubes Page 695
   Virtual Cubes Page 696
   Linked Cubes Page 696
   Distributed Partitioned Cubes Page 697
   Real-Time Cubes Page 697
   Offline Cubes Page 698
   Caching and Cubes Page 698
  Cube Characteristics Page 699
   Partitions Page 699
   Aggregations Page 701
   Measures Page 702
   Calculated Cells Page 704
   Actions Page 705
   Named Sets Page 706
   Cell Security Page 706
  Cube Storage and Processing Page 707
   Cube Storage Page 707
   Cube Processing Page 707
CHAPTER 23 Business Case Solutions Using MDX Page 711
 General Questions Page 712
  How Can I Retrieve Results from Different Cubes? Page 712
  How Can I Perform Basic Basket Analysis? Page 713
  How Can I Perform Complex String Comparisons? Page 715
  How Can I Show Percentages as Measures? Page 716
  How Can I Show Cumulative Sums as Measures? Page 717
  How Can I Implement a Logical AND or OR Condition in a WHERE Clause? Page 719
  How Can I Use Custom Member Properties in MDX? Page 721
 Navigation Questions Page 722
  How Can I Drill Down More Than One Level Deep, or Skip Levels When Drilling Down? Page 723
  How Do I Get the Topmost Members of a Level Broken Out by an Ancestor Level? Page 724
 Manipulation Questions Page 727
  How Can I Rank or Reorder Members? Page 727
  How Can I Use Different Calculations for Different Levels in a Dimension? Page 728
  How Can I Use Different Calculations for Different Dimensions? Page 731
 Date and Time Questions Page 733
  How Can I Use Date Ranges in MDX? Page 733
  How Can I Use Rolling Date Ranges in MDX? Page 734
  How Can I Use Different Calculations for Different Time Periods? Page 736
  How Can I Compare Time Periods in MDX? Page 738
CHAPTER 24 Effective Strategies for Data Mining Page 741
 Introduction Page 741
 The Data Mining Process Page 746
  Data Selection Page 747
  Data Cleaning Page 750
  Data Enrichment Page 751
  Data Transformation Page 752
  Training Case Set Preparation Page 752
  Data Mining Model Construction Page 754
   Model-Driven and Data-Driven Data Mining Page 755
   Data Mining Algorithm Provider Selection Page 757
   Creating Data Mining Models Page 761
   Training Data Mining Models Page 762
  Data Mining Model Evaluation Page 764
   Visualizing Data Mining Models Page 768
  Data Mining Model Feedback Page 770
   Predicting with Data Mining Models Page 770
CHAPTER 25 Getting Data to the Client Page 777
 Developing Analysis Services Client Applications Page 777
  Working with Data Page 778
   Data and PivotTable Service Page 778
   Data and ActiveX Data Objects Page 781
   Data and ActiveX Data Objects (Multidimensional) Page 783
  Working with Meta Data Page 785
   Meta Data and Decision Support Objects Page 785
   Meta Data and PivotTable Service Page 789
   Meta Data and OLE DB Page 789
   Meta Data and ActiveX Data Objects Page 790
   Meta Data and ActiveX Data Objects (Multidimensional) Page 792
 Using the Internet with Analysis Services Page 794
CHAPTER 26 Performance Tuning Analysis Services Page 797
 Introduction Page 797
  Why Use OLAP? Page 797
 Architecture Page 799
  Overview Page 799
  Memory Management Page 800
   Server Memory Management Page 800
   Client Cache Management Page 805
  Thread Management Page 806
   Server Thread Management Page 806
   Client Thread Management Page 809
  Processing Interaction Page 810
  Querying Interaction Page 812
 Improving Overall Performance Page 813
  Hardware Configuration Page 813
   Processors Page 813
   Memory Page 813
   Disk Storage Page 814
  Dimension and Cube Design Page 815
  Storage Mode Selection Page 815
  Aggregation Design Page 817
  Schema Optimization Page 818
  Partition Strategy Page 818
 Improving Processing Performance Page 820
  Processing Options Page 820
  Memory Requirements Page 821
  Storage Requirements Page 822
 Improving Querying Performance Page 822
  Memory Requirements Page 822
  Usage Analysis and Aggregation Design Page 823
 Evaluating Performance Page 825
  Analysis Services Performance Counters Page 825
   Analysis Server:Agg Cache Page 826
   Analysis Server:Connection Page 827
   Analysis Server:Last Query Page 827
   Analysis Server:Locks Page 828
   Analysis Server:Proc Page 829
   Analysis Server:Proc Aggs Page 830
   Analysis Server:Proc Indexes Page 831
   Analysis Server:Query Page 831
   Analysis Server:Query Dims Page 832
   Analysis Server:Startup Page 833
  System Performance Counters Page 833
   Memory Page 833
   Network Interface Page 834
   PhysicalDisk Page 834
   Process Page 835
   Processor Page 836
   System Page 836
PART 7 DIGITAL DASHBOARDS Page 837
CHAPTER 27 Creating an Interactive Digital Dashboard Page 839
 Introduction Page 839
  About the Code Samples Page 840
 Required Software Page 841
  SQL Server 2000 Page 841
  Windows 2000 Page 842
  Internet Explorer 5.X Page 842
  Digital Dashboard Resource Kit (DDRK) Page 842
  Downloading and Installing the DDRK and SQL Server Sample Digital Dashboard Page 842
 Setting Up Page 843
  Download the Code Samples Page 843
  Create Physical and Virtual Directories for Your HTM and HTC Files Page 843
  Create Physical and Virtual Directories for Your XML and XSL Files Page 844
  Copy and Edit the HTM and HTC Files Page 844
 Building the Dashboard Page 845
  Defining the Dashboard Page 845
  Defining the Customer List Web Part Page 846
  Defining the Order Chart Web Part Page 846
  Testing the Dashboard Page 847
  Reviewing the Code Samples Page 847
   Customerlist.htm Page 848
   Customerlist.xml Page 849
   Customerlist.xsl Page 849
   Customerlist.htc Page 850
   Orderchart.htm Page 850
   Orderchart.xsl Page 851
CHAPTER 28 A Digital Dashboard Browser for Analysis Services Meta Data Page 853
 Introduction Page 853
 Requirements Page 854
  Windows 2000 Server Page 854
  SQL Server 2000 with Analysis Services Page 854
  Digital Dashboard Resource Kit (DDRK) 2.01 Page 854
  Internet Explorer 5.5 Page 855
   DDSC Versions Page 855
 Setup Page 855
  Copy Files Page 855
  Set Up an IIS Virtual Directory Page 856
  Grant Permissions Page 856
 Creating the Digital Dashboard Page 857
  Set Up the Dashboard Page 857
  Create the ServerConnect Web Part Page 857
  Create the DBSelect Web Part Page 858
  Create the CollSelect Web Part Page 858
  Create the MemberSelect Web Part Page 859
  Create the MetaData Web Part Page 859
  Test the Dashboard Page 860
 Using the Dashboard Page 860
 Sample Files Page 860
  Text Files (Embedded Content) Page 861
  ASP Files Page 862
   Serverconnect.asp Page 862
   Dbselect.asp Page 863
   Collselect.asp Page 863
   Memberselect.asp Page 863
   Metadata.asp Page 863
 Known Issues Page 864
  Unable to Connect to the Registry Page 864
  Sizing of Web Parts Page 865
PART 8 REPLICATION Page 867
CHAPTER 29 Common Questions in Replication Page 869
 Types of Replication and Replication Options Page 870
  What Type of Replication Should I Use? Page 870
  What Is the Difference Between Merge Replication and Updatable Subscriptions? Page 871
  Should I Use SQL Server Queues or Microsoft Message Queuing Services When Using Transactional Replication and Queued Updating? Page 871
 Implementing Replication Page 872
  What Is the Difference Between a Local Distributor and a Remote Distributor? Page 872
  What Type of Subscription Should I Use: Push or Pull? Page 873
  If I am Using Pull Subscriptions, When Should I Specify Them as Anonymous? Page 873
  What are the Advantages of Scripting Replication? Page 874
  Should I Apply the Snapshot Manually or Apply It Automatically? Page 874
  Can I Replicate Data Between SQL Server and Heterogeneous Databases? Page 875
  If I Am Using SQL Server 6.5 or SQL Server 7.0 Subscribers, Can I Use the New Features in SQL Server 2000? Page 876
  Can Microsoft SQL Server Desktop Engine Participate in Replication? Page 876
  When Upgrading to SQL Server 2000, Do I Need to Upgrade All Servers in Replication at the Same Time? Page 877
 Replication and Warm Standby Server Recovery Options Page 877
  Should I Use Replication, Log Shipping, or Clustering as a Failover Solution? Page 878
  Does Replication Work on a Cluster? Page 878
CHAPTER 30 Creating Merge Replication Custom Conflict Resolvers Using Visual Basic Page 879
 Using the Microsoft SQL Replication Conflict Resolver Library Page 880
  Adding the Microsoft SQL Replication Conflict Resolver Library to Visual Basic Page 881
  IVBCustomResolver Interface Page 881
   GetHandledStates Method Page 881
   Reconcile Method Page 882
  IReplRowChange and IConnectionInfo Interfaces Page 883
   IReplRowChange Interface and Methods Page 883
   IConnectionInfo Interface and Methods Page 894
  Constants Page 903
 Registering a Custom Conflict Resolver Page 913
 Merge Replication Custom Conflict Resolver Samples Page 914
PART 9 WEB PROGRAMMING Page 919
CHAPTER 31 Exposing SQL Server Data to the Web with XML Page 921
 Generating XML with the SELECT Statement Page 921
 Generating XML over the Internet Page 922
  Retrieving XML Formatted Data from SQL Server Page 923
   XML Templates Page 927
CHAPTER 32 English Query Best Practices Page 935
 An Overview of English Query Page 935
  A Simple Example Page 936
 Before You Begin Page 937
 Starting a Basic Model Page 937
  Edit Entity Properties Page 939
  Formulate and Test Typical Questions Page 939
  Use the Suggestion Wizard Page 940
  Add Help Text Page 941
 Expanding a Model Page 941
  Create Good Entity Relationships Page 942
  First Create Broad Relationships, and Then Work on Specific Questions Page 943
  Retest Questions Page 943
  For "Free-Form" Text Searches, Enable Full-Text Search Page 944
  For Data Analysis Questions, Create an OLAP Model Page 944
 Deploying an English Query Solution Page 945
  Use the Sample Applications Page 945
  Provide Sample Questions for Users Page 946
  Provide Question Builder Page 946
 Maintaining and Improving a Model Page 948
  Keep the Model Up-To-Date Page 948
  Use Logs to Improve Results Page 948
 Troubleshooting Page 949
PART 10 DESIGNING FOR PERFORMANCE AND SCALABILITY Page 951
CHAPTER 33 The Data Tier: An Approach to Database Optimization Page 953
 A New Approach Page 953
 Optimization Cycle Page 954
 Evaluating the Situation Page 956
  Performance Monitoring Tools Page 957
   SQL Profiler Page 957
   System Stored Procedures Page 958
   System Monitor Page 959
  Staging a Test Page 960
 Monitoring and Optimizing Page 961
  Monitoring a System Page 962
   Address Operating System and SQL Server Errors Page 962
   Monitor to Identify Areas for Improvement Page 963
   Monitoring SQL Server in General Page 965
  Analyzing the Results: Database and Code Level Page 966
   Blocking Based on Database Design Page 967
   Slowness Due to Indexing Schema Page 969
   Data Storage Component Issues Page 971
   Other Issues for Optimization Page 973
   Managing the Changes Page 974
  Optimizing the Data Components Page 974
  Optimizing the Code Components Page 975
  Optimizing the Storage Components Page 977
   Database File Placement Page 977
   Log File Placement Page 978
   tempdb File Placement Page 979
   Other File Placement Page 979
  Optimizing the Server Configuration Page 979
 Exploiting the Hardware Page 980
  Maximizing Performance Page 980
  Capacity Planning Page 981
   General Hardware Recommendations Page 982
   Memory Planning Page 982
   Disk Planning Page 983
  Working with Existing Hardware Page 983
   Storage Subsystem Design Page 984
  Sample Server Configurations Page 988
   Small Entry-Level System Layout Page 988
   OLTP System Server Layout Page 990
   DSS System Server Layout Page 991
   Multi-Instance N+1 Failover Cluster Configuration: SQL Server 2000 on Windows 2000 Datacenter Server Page 992
 Conclusion Page 993
CHAPTER 34 Identifying Common Administrative Issues Page 995
 Installing the Stored Procedures Page 996
 Check Server Configuration (sp_rk_audit_configure) Page 999
  Running sp_rk_audit_configure Page 1000
  How sp_rk_audit_configure Works Page 1001
  Modifying sp_rk_audit_configure Page 1002
 Check Database Configuration (sp_rk_audit_dboptions) Page 1005
  Running sp_rk_audit_dboptions Page 1006
  How sp_rk_audit_dboptions Works Page 1008
  Modifying sp_rk_audit_dboptions_check_1_db to Look at Different Values Page 1012
 Application Troubleshooting Page 1013
  Running sp_rk_blocker_blockee Page 1014
  How sp_rk_blocker_blockee Works Page 1016
  Modifying sp_rk_blocker_blockee Page 1016
CHAPTER 35 Using Visual Basic to Remotely Manage SQL Server 2000 Page 1017
 Inside the SQL Junior Administrator Application Page 1018
  User Interface Page 1019
  Visual Basic Code Page 1024
 Summary Page 1047
CHAPTER 36 Using Views with a View on Performance Page 1049
 What Is an Indexed View? Page 1049
  Performance Gains from Indexed Views Page 1050
 Getting the Most from Indexed Views Page 1051
  How the Query Optimizer Uses Indexed Views Page 1052
   Optimizer Considerations Page 1052
 Designing Indexed Views Page 1053
  Guidelines for Designing Indexed Views Page 1055
  Using the Index Tuning Wizard Page 1056
  Maintaining Indexed Views Page 1056
   Maintenance Cost Considerations Page 1057
 Creating Indexed Views Page 1057
  Using SET Options to Obtain Consistent Results Page 1058
  Using Deterministic Functions Page 1059
  Additional Requirements for Indexed Views Page 1060
 Indexed View Examples Page 1062
CHAPTER 37 Extending Triggers with INSTEAD OF Page 1069
 What Are INSTEAD OF Triggers? Page 1069
 Customizing Error Messages with INSTEAD OF Triggers Page 1070
 Creating Updatable Views with INSTEAD OF Triggers Page 1073
  Handling NOT NULL Values and Computed Columns in Updatable Views with INSTEAD OF Triggers Page 1075
 INSTEAD OF Triggers on Partitioned Views Page 1078
 Guidelines for Designing INSTEAD OF Triggers Page 1078
 Performance Guidelines for INSTEAD OF Triggers Page 1080
CHAPTER 38 Scaling Out on SQL Server Page 1083
 Readiness Checklists Page 1084
  Are You Ready to Scale Out on SQL Server? Page 1084
  Design Considerations Page 1085
  Understanding the Federation Page 1087
 Data Partitioning Components Page 1088
  How Partitioned Views Work Page 1091
  Creating Partitioned Views Page 1092
  Partitioned Query Plans Page 1093
  Data-Dependent Routing Page 1095
  Other Options Page 1096
   Replication Page 1096
   Adding a Unique Column Page 1097
   INSTEAD OF Trigger Page 1097
 Administration Considerations Page 1097
  Partition Maintenance Page 1097
  Disaster Recovery and Partitioning Page 1098
   Backing Up and Restoring Partitioned Databases Page 1098
  High Availability Page 1099
PART 11 CD-ROM CONTENT Page 1101
CHAPTER 39 Tools, Samples, eBooks, and More Page 1103
 Electronic Version of the Resource Kit Book Page 1103
 eBooks Page 1103
 System Table Map Page 1104
 Tools and Samples Page 1106
About the Authors Page 1109
INDEX Page 1111

First Chapter

Chapter 17. Data Warehouse Design Considerations
  • Data Warehouses, OLTP, OLAP, and Data Mining
    • OLAP Is a Data Warehouse Tool
    • Data Mining Is a Data Warehouse Tool
  • Designing a Data Warehouse: Prerequisites
    • Data Warehouse Architecture Goals
    • Data Warehouse Users
    • How Users Query the Data Warehouse
  • Developing a Data Warehouse: Details
    • Identify and Gather Requirements
    • Design the Dimensional Model
    • Dimensional Model Schemas
    • Dimension Tables
    • Fact Tables
    • Develop the Architecture
    • Design the Relational Database and OLAP Cubes
    • Develop the Operational Data Store
    • Develop the Data Maintenance Applications
    • Develop Analysis Applications
    • Test and Deploy the System
  • Conclusion

Chapter 17 Data Warehouse Design Considerations

Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining. Although data warehouses are built on relational database technology, the design of a data warehouse database differs substantially from the design of an online transaction processing system (OLTP) database.

The topics in this chapter address approaches and choices to be considered when designing and implementing a data warehouse. The chapter begins by contrasting data warehouse databases with OLTP databases and introducing OLAP and data mining, and then adds information about design issues to be considered when developing a data warehouse with Microsoft® SQL Server™ 2000.

Data Warehouses, OLTP, OLAP, and Data Mining

A relational database is designed for a specific purpose. Because the purpose of a data warehouse differs from that of an OLTP, the design characteristics of a relational database that supports a data warehouse differ from the design characteristics of an OLTP database.

Data warehouse database:
  • Designed for analysis of business measures by categories and attributes
  • Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table
  • Loaded with consistent, valid data; requires no real-time validation
  • Supports few concurrent users relative to OLTP

OLTP database:
  • Designed for real-time business operations
  • Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
  • Optimized for validation of incoming data during transactions; uses validation data tables
  • Supports thousands of concurrent users

A Data Warehouse Supports OLTP

A data warehouse supports an OLTP system by providing a place for the OLTP database to offload data as it accumulates, and by providing services that would complicate and degrade OLTP operations if they were performed in the OLTP database.

Without a data warehouse to hold historical information, data is archived to static media such as magnetic tape, or allowed to accumulate in the OLTP database.

If data is simply archived for preservation, it is not available or organized for use by analysts and decision makers. If data is allowed to accumulate in the OLTP so it can be used for analysis, the OLTP database continues to grow in size and requires more indexes to service analytical and report queries. These queries access and process large portions of the continually growing historical data and add a substantial load to the database. The large indexes needed to support these queries also tax the OLTP transactions with additional index maintenance. These queries can also be complicated to develop due to the typically complex OLTP database schema.

A data warehouse offloads the historical data from the OLTP, allowing the OLTP to operate at peak transaction efficiency. High volume analytical and reporting queries are handled by the data warehouse and do not load the OLTP, which does not need additional indexes for their support. As data is moved to the data warehouse, it is also reorganized and consolidated so that analytical queries are simpler and more efficient.

OLAP Is a Data Warehouse Tool

Online analytical processing (OLAP) is a technology designed to provide superior performance for ad hoc business intelligence queries. OLAP is designed to operate efficiently with data organized in accordance with the common dimensional model used in data warehouses.

A data warehouse provides a multidimensional view of data in an intuitive model designed to match the types of queries posed by analysts and decision makers. OLAP organizes data warehouse data into multidimensional cubes based on this dimensional model, and then preprocesses these cubes to provide maximum performance for queries that summarize data in various ways. For example, a query that requests the total sales income and quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.

OLAP is not designed to store large volumes of text or binary data, nor is it designed to support high volume update transactions. The inherent stability and consistency of historical data in a data warehouse enables OLAP to provide its remarkable performance in rapidly summarizing information for analytical queries.

In SQL Server 2000, Analysis Services provides tools for developing OLAP applications and a server specifically designed to service OLAP queries.

Data Mining Is a Data Warehouse Tool

Data mining is a technology that applies sophisticated and complex algorithms to analyze data and expose interesting information for analysis by decision makers. Whereas OLAP organizes data in a model suited for exploration by analysts, data mining performs analysis on data and provides the results to decision makers. Thus, OLAP supports model-driven analysis and data mining supports data-driven analysis.

Data mining has traditionally operated only on raw data in the data warehouse database or, more commonly, text files of data extracted from the data warehouse database. In SQL Server 2000, Analysis Services provides data mining technology that can analyze data in OLAP cubes, as well as data in the relational data warehouse database. In addition, data mining results can be incorporated into OLAP cubes to further enhance model-driven analysis by providing an additional dimensional viewpoint into the OLAP model. For example, data mining can be used to analyze sales data against customer attributes and create a new cube dimension to assist the analyst in the discovery of the information embedded in the cube data.

For more information and details about data mining in SQL Server 2000, see Chapter 24, "Effective Strategies for Data Mining."

Designing a Data Warehouse: Prerequisites

Before embarking on the design of a data warehouse, it is imperative that the architectural goals of the data warehouse be clear and well understood. Because the purpose of a data warehouse is to serve users, it is also critical to understand the various types of users, their needs, and the characteristics of their interactions with the data warehouse.

Data Warehouse Architecture Goals

A data warehouse exists to serve its users — analysts and decision makers. A data warehouse must be designed to satisfy the following requirements:

  • Deliver a great user experience — user acceptance is the measure of success
  • Function without interfering with OLTP systems
  • Provide a central repository of consistent data
  • Answer complex queries quickly
  • Provide a variety of powerful analytical tools such as OLAP and data mining

Most successful data warehouses that meet these requirements have these common characteristics:

  • Are based on a dimensional model
  • Contain historical data
  • Include both detailed and summarized data
  • Consolidate disparate data from multiple sources while retaining consistency
  • Focus on a single subject such as sales, inventory, or finance

Data warehouses are often quite large. However, size is not an architectural goal — it is a characteristic driven by the amount of data needed to serve the users.

Data Warehouse Users

The success of a data warehouse is measured solely by its acceptance by users. Without users, historical data might as well be archived to magnetic tape and stored in the basement. Successful data warehouse design starts with understanding the users and their needs.

Data warehouse users can be divided into four categories: Statisticians, Knowledge Workers, Information Consumers, and Executives. Each type makes up a portion of the user population as illustrated in this diagram.

(Image Unavailable)

Statisticians: There are typically only a handful of statisticians and operations research types in any organization. Their work can contribute to closed loop systems that deeply influence the operations and profitability of the company.

Knowledge Workers: A relatively small number of analysts perform the bulk of new queries and analyses against the data warehouse. These are the users who get the Designer or Analyst versions of user access tools. They will figure out how to quantify a subject area. After a few iterations, their queries and reports typically get published for the benefit of the Information Consumers. Knowledge Workers are often deeply engaged with the data warehouse design and place the greatest demands on the ongoing data warehouse operations team for training and support.

Information Consumers: Most users of the data warehouse are Information Consumers; they will probably never compose a true ad hoc query. They use static or simple interactive reports that others have developed. They usually interact with the data warehouse only through the work product of others. This group includes a large number of people, and published reports are highly visible. Set up a great communication infrastructure for distributing information widely, and gather feedback from these users to improve the information sites over time.

Executives: Executives are a special case of the Information Consumers group.

How Users Query the Data Warehouse

Information for users can be extracted from the data warehouse relational database or from the output of analytical services such as OLAP or data mining. Direct queries to the data warehouse relational database should be limited to those that cannot be accomplished through existing tools, which are often more efficient than direct queries and impose less load on the relational database.

Reporting tools and custom applications often access the database directly. Statisticians frequently extract data for use by special analytical tools. Analysts may write complex queries to extract and compile specific information not readily accessible through existing tools. Information consumers do not interact directly with the relational database but may receive e-mail reports or access web pages that expose data from the relational database. Executives use standard reports or ask others to create specialized reports for them.

When using the Analysis Services tools in SQL Server 2000, Statisticians will often perform data mining, Analysts will write MDX queries against OLAP cubes and use data mining, and Information Consumers will use interactive reports designed by others.

Developing a Data Warehouse: Details

The phases of a data warehouse project listed below are similar to those of most database projects, starting with identifying requirements and ending with deploying the system:

  • Identify and gather requirements
  • Design the dimensional model
  • Develop the architecture, including the Operational Data Store (ODS)
  • Design the relational database and OLAP cubes
  • Develop the data maintenance applications
  • Develop analysis applications
  • Test and deploy the system

Identify and Gather Requirements

Identify sponsors. A successful data warehouse project needs a sponsor in the business organization and usually a second sponsor in the Information Technology group. Sponsors must understand and support the business value of the project.

Understand the business before entering into discussions with users. Then interview and work with the users, not the data – learn the needs of the users and turn these needs into project requirements. Find out what information they need to be more successful at their jobs, not what data they think should be in the data warehouse; it is the data warehouse designer’s job to determine what data is necessary to provide the information. Topics for discussion are the users’ objectives and challenges and how they go about making business decisions. Business users should be closely tied to the design team during the logical design process; they are the people who understand the meaning of existing data. Many successful projects include several business users on the design team to act as data experts and sounding boards for design concepts. Whatever the structure of the team, it is important that business users feel ownership for the resulting system.

Interview data experts after interviewing several users. Find out from the experts what data exists and where it resides, but only after you understand the basic business needs of the end users. Information about available data is needed early in the process, before you complete the analysis of the business needs, but the physical design of existing data should not be allowed to have much influence on discussions about business needs.

Communicate with users often and thoroughly – continue discussions as requirements continue to solidify so that everyone participates in the progress of the requirements definition.

Design the Dimensional Model

User requirements and data realities drive the design of the dimensional model, which must address business needs, grain of detail, and what dimensions and facts to include.

The dimensional model must suit the requirements of the users and support ease of use for direct access. The model must also be designed so that it is easy to maintain and can adapt to future changes. The model design must result in a relational database that supports OLAP cubes to provide instantaneous query results for analysts.

An OLTP system requires a normalized structure to minimize redundancy, provide validation of input data, and support a high volume of fast transactions. A transaction usually involves a single business event, such as placing an order or posting an invoice payment. An OLTP model often looks like a spider web of hundreds or even thousands of related tables.

In contrast, a typical dimensional model uses a star or snowflake design that is easy to understand and relate to business needs, supports simplified business queries, and provides superior query performance by minimizing table joins.

For example, contrast the very simplified OLTP data model in the first diagram below with the data warehouse dimensional model in the second diagram. Which one better supports the ease of developing reports and simple, efficient summarization queries?

(Image Unavailable)

(Image Unavailable)

Dimensional Model Schemas

The principal characteristic of a dimensional model is a set of detailed business facts surrounded by multiple dimensions that describe those facts. When realized in a database, the schema for a dimensional model contains a central fact table and multiple dimension tables. A dimensional model may produce a star schema or a snowflake schema.

Star Schemas

A schema is called a star schema if all dimension tables can be joined directly to the fact table. The following diagram shows a classic star schema.

(Image Unavailable)

The following diagram shows a clickstream star schema.

(Image Unavailable)
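
For illustration, a minimal Transact-SQL sketch of such a star might look like the following. The table and column names are illustrative assumptions only; they do not correspond to the schemas shown in the diagrams.

  -- A small, hypothetical sales star: one fact table joined directly to
  -- three dimension tables through surrogate keys.
  CREATE TABLE dbo.dim_product (
      product_key    int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
      product_code   varchar(20) NOT NULL,                    -- original source system key
      product_name   varchar(50) NOT NULL,
      category_name  varchar(30) NOT NULL
  )

  CREATE TABLE dbo.dim_customer (
      customer_key   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
      customer_id    varchar(20) NOT NULL,
      customer_name  varchar(60) NOT NULL,
      city           varchar(40) NOT NULL
  )

  CREATE TABLE dbo.dim_date (
      date_key       int NOT NULL PRIMARY KEY,                -- smart key of the form yyyymmdd
      day_date       smalldatetime NOT NULL
  )

  CREATE TABLE dbo.fact_sales (
      date_key       int NOT NULL REFERENCES dbo.dim_date (date_key),
      product_key    int NOT NULL REFERENCES dbo.dim_product (product_key),
      customer_key   int NOT NULL REFERENCES dbo.dim_customer (customer_key),
      quantity       int NOT NULL,
      sale_amount    money NOT NULL,
      CONSTRAINT pk_fact_sales PRIMARY KEY (date_key, product_key, customer_key)
  )

Each dimension table joins to the fact table through a single surrogate key, which is what gives the star its simple join structure.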

Snowflake Schemas

A schema is called a snowflake schema if one or more dimension tables do not join directly to the fact table but must join through other dimension tables. For example, a dimension that describes products may be separated into three tables (snowflaked) as illustrated in the following diagram.

(Image Unavailable)

A snowflake schema with multiple heavily snowflaked dimensions is illustrated in the following diagram.

(Image Unavailable)

Star or Snowflake

Both star and snowflake schemas are dimensional models; the difference is in their physical implementations. Snowflake schemas support ease of dimension maintenance because they are more normalized. Star schemas are easier for direct user access and often support simpler and more efficient queries. The decision to model a dimension as a star or snowflake depends on the nature of the dimension itself, such as how frequently it changes and which of its elements change, and often involves evaluating tradeoffs between ease of use and ease of maintenance.

It is often easiest to maintain a complex dimension by snowflaking the dimension. By pulling hierarchical levels into separate tables, referential integrity between the levels of the hierarchy is guaranteed. Analysis Services reads from a snowflaked dimension as well as, or better than, from a star dimension. However, it is important to present a simple and appealing user interface to business users who are developing ad hoc queries on the dimensional database. It may be better to create a star version of the snowflaked dimension for presentation to the users. Often, this is best accomplished by creating an indexed view across the snowflaked dimension, collapsing it to a virtual star.
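
For example, a snowflaked product dimension and the indexed view that collapses it back to a virtual star might be sketched as follows. The table, view, and column names are assumptions for illustration; the SCHEMABINDING and SET option requirements for indexed views are covered in Chapter 36, "Using Views with a View on Performance."

  -- A hypothetical product dimension snowflaked into three tables.
  CREATE TABLE dbo.dim_product_category (
      category_key      int NOT NULL PRIMARY KEY,
      category_name     varchar(30) NOT NULL
  )

  CREATE TABLE dbo.dim_product_subcategory (
      subcategory_key   int NOT NULL PRIMARY KEY,
      category_key      int NOT NULL REFERENCES dbo.dim_product_category (category_key),
      subcategory_name  varchar(30) NOT NULL
  )

  CREATE TABLE dbo.dim_product_snowflake (
      product_key       int NOT NULL PRIMARY KEY,
      subcategory_key   int NOT NULL REFERENCES dbo.dim_product_subcategory (subcategory_key),
      product_name      varchar(50) NOT NULL
  )
  GO

  -- An indexed view that presents the snowflake to users as a single virtual
  -- star dimension. The unique clustered index materializes the joined rows.
  CREATE VIEW dbo.v_dim_product
  WITH SCHEMABINDING
  AS
  SELECT p.product_key, p.product_name, s.subcategory_name, c.category_name
  FROM dbo.dim_product_snowflake AS p
       JOIN dbo.dim_product_subcategory AS s ON p.subcategory_key = s.subcategory_key
       JOIN dbo.dim_product_category AS c ON s.category_key = c.category_key
  GO

  CREATE UNIQUE CLUSTERED INDEX ix_v_dim_product ON dbo.v_dim_product (product_key)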

Dimension Tables

Dimension tables encapsulate the attributes associated with facts and separate these attributes into logically distinct groupings, such as time, geography, products, customers, and so forth.

A dimension table may be used in multiple places if the data warehouse contains multiple fact tables or contributes data to data marts. For example, a product dimension may be used with a sales fact table and an inventory fact table in the data warehouse, and also in one or more departmental data marts. A dimension such as customer, time, or product that is used in multiple schemas is called a conforming dimension if all copies of the dimension are the same. Summarization data and reports will not correspond if different schemas use different versions of a dimension table. Using conforming dimensions is critical to successful data warehouse design.

User input and evaluation of existing business reports help define the dimensions to include in the data warehouse. A user who wants to see data by region and by product has just identified two dimensions (geography and product). Business reports that group sales by salesperson or sales by customer identify two more dimensions (salesforce and customer). Almost every data warehouse includes a time dimension.

In contrast to a fact table, dimension tables are usually small and change relatively slowly. Dimension tables are seldom keyed to date.

The records in a dimension table establish one-to-many relationships with the fact table. For example, there may be a number of sales to a single customer, or a number of sales of a single product. The dimension table contains attributes associated with the dimension entry; these attributes are rich and user-oriented textual details, such as product name or customer name and address. Attributes serve as report labels and query constraints. Attributes that are coded in an OLTP database should be decoded into descriptions. For example, product category may exist as a simple integer in the OLTP database, but the dimension table should contain the actual text for the category. The code may also be carried in the dimension table if needed for maintenance. This denormalization simplifies and improves the efficiency of queries and simplifies user query tools. However, if a dimension attribute changes frequently, maintenance may be easier if the attribute is assigned to its own table to create a snowflake dimension.

It is often useful to have a pre-established no such member or unknown member record in each dimension to which orphan fact records can be tied during the update process. Business needs and the reliability of consistent source data will drive the decision as to whether such placeholder dimension records are required.
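
For example, a placeholder member might be added to a customer dimension like the one sketched earlier as follows; the key value -1 and the column values are assumed conventions, not requirements.

  -- Hypothetical "unknown member" row to which orphan fact records can be tied.
  SET IDENTITY_INSERT dbo.dim_customer ON

  INSERT INTO dbo.dim_customer (customer_key, customer_id, customer_name, city)
  VALUES (-1, 'N/A', 'Unknown customer', 'Unknown')

  SET IDENTITY_INSERT dbo.dim_customer OFF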

Hierarchies

The data in a dimension is usually hierarchical in nature. Hierarchies are determined by the business need to group and summarize data into usable information. For example, a time dimension often contains the hierarchy elements: (all time), Year, Quarter, Month, Day, or (all time), Year, Quarter, Week, Day. A dimension may contain multiple hierarchies – a time dimension often contains both calendar and fiscal year hierarchies. Geography is seldom a dimension of its own; it is usually a hierarchy that imposes a structure on sales points, customers, or other geographically distributed dimensions. An example geography hierarchy for sales points is: (all), Country, Region, State or Province, City, Store.

Note that each hierarchy example has an (all) entry such as (all time), (all stores), (all customers), and so forth. This top-level entry is an artificial category used for grouping the first-level categories of a dimension and permits summarization of fact data to a single number for a dimension. For example, if the first level of a product hierarchy includes product line categories for hardware, software, peripherals, and services, the question "What was the total amount for sales of all products last year?" is equivalent to "What was the total amount for the combined sales of hardware, software, peripherals, and services last year?" The concept of an (all) node at the top of each hierarchy helps reflect the way users want to phrase their questions. OLAP tools depend on hierarchies to categorize data – Analysis Services will create by default an (all) entry for a hierarchy used in a cube if none is specified.

A hierarchy may be balanced, unbalanced, ragged, or composed of parent-child relationships such as an organizational structure. For more information about hierarchies in OLAP cubes, see SQL Server Books Online.

Surrogate Keys

A critical part of data warehouse design is the creation and use of surrogate keys in dimension tables. A surrogate key is the primary key for a dimension table and is independent of any keys provided by source data systems. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records; automatically increasing integers make good surrogate keys. The original key for each record is carried in the dimension table but is not used as the primary key. Surrogate keys provide the means to maintain data warehouse information when dimensions change. Special keys are used for date and time dimensions, but these keys differ from surrogate keys used for other dimension tables.

GUID and IDENTITY Keys

Avoid using GUIDs (globally unique identifiers) as keys in the data warehouse database. GUIDs may be used in data from distributed source systems, but they are difficult to use as table keys. GUIDs use a significant amount of storage (16 bytes each), cannot be efficiently sorted, and are difficult for humans to read. Indexes on GUID columns may be relatively slower than indexes on integer keys because GUIDs are four times larger. The Transact–SQL NEWID function can be used to create GUIDs for a column of uniqueidentifier data type, and the ROWGUIDCOL property can be set for such a column to indicate that the GUID values in the column uniquely identify rows in the table, but uniqueness is not enforced.

Because a uniqueidentifier data type cannot be sorted, the GUID cannot be used in a GROUP BY statement, nor can the occurrences of the uniqueidentifier GUID be distinctly counted – both GROUP BY and COUNT DISTINCT operations are very common in data warehouses. The uniqueidentifier GUID cannot be used as a measure in an Analysis Services cube.

The IDENTITY property and IDENTITY function can be used to create identity columns in tables and to manage series of generated numeric keys. IDENTITY functionality is more useful in surrogate key management than uniqueidentifier GUIDs.
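
The contrast can be sketched as follows; both tables and all column names are hypothetical.

  -- Surrogate key generated with the IDENTITY property (the preferred approach).
  CREATE TABLE dbo.dim_store (
      store_key   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
      store_id    varchar(20) NOT NULL,                    -- key from the source system
      store_name  varchar(60) NOT NULL
  )

  -- A uniqueidentifier column populated by NEWID(), shown only for contrast;
  -- avoid columns like this as data warehouse keys for the reasons given above.
  CREATE TABLE dbo.stage_source_rows (
      row_guid    uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
      payload     varchar(100) NULL
  )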

Date and Time Dimensions

Each event in a data warehouse occurs at a specific date and time, and data is often summarized by a specified time period for analysis. Although the date and time of a business fact is usually recorded in the source data, special date and time dimensions provide more effective and efficient mechanisms for time-oriented analysis than the raw event time stamp. Date and time dimensions are designed to meet the needs of the data warehouse users and are created within the data warehouse.

A date dimension often contains two hierarchies, one for calendar year and another for fiscal year.

Time Granularity

A date dimension with one record per day will suffice if users do not need time granularity finer than a single day. A date by day dimension table will contain 365 records per year (366 in leap years).

A separate time dimension table should be constructed if a fine time granularity, such as minute or second, is needed. A time dimension table of one-minute granularity will contain 1,440 rows for a day, and a table of seconds will contain 86,400 rows for a day. If exact event time is needed, it should be stored in the fact table.

When a separate time dimension is used, the fact table contains one foreign key for the date dimension and another for the time dimension. Separate date and time dimensions simplify many filtering operations. For example, summarizing data for a range of days requires joining only the date dimension table to the fact table. Analyzing cyclical data by time period within a day requires joining just the time dimension table. The date and time dimension tables can both be joined to the fact table when a specific time range is needed.

For hourly time granularity, the hour breakdown can be incorporated into the date dimension or placed in a separate dimension. Business needs influence this design decision. If the main use is to extract contiguous chunks of time that cross day boundaries (for example 11/24/2000 10 p.m. to 11/25/2000 6 a.m.), then it is easier if the hour and day are in the same dimension. However, it is easier to analyze cyclical and recurring daily events if they are in separate dimensions. Unless there is a clear reason to combine date and hour in a single dimension, it is generally better to keep them in separate dimensions.

Date and Time Dimension Attributes

It is often useful to maintain attribute columns in a date dimension to provide additional convenience or business information that supports analysis. For example, one or more columns in the time-by-hour dimension table can indicate peak periods in a daily cycle, such as meal times for a restaurant chain or heavy usage hours for an Internet service provider. Peak period columns may be Boolean, but it is better to decode the Boolean yes/no into a brief description, such as peak/offpeak. In a report, the decoded values will be easier for business users to read than multiple columns of yes and no.

These are some possible attribute columns that may be used in a date table. Fiscal year versions are the same, although values such as quarter numbers may differ.

Column name         Data type        Format/Example     Comment
date_key            int              yyyymmdd
day_date            smalldatetime
day_of_week         char             Monday
week_begin_date     smalldatetime
week_num            tinyint          1 to 52 or 53      Week 1 defined by business rules
month_num           tinyint          1 to 12
month_name          char             January
month_short_name    char             Jan
month_end_date      smalldatetime                       Useful for days in the month
days_in_month       tinyint                             Alternative for, or in addition to, month_end_date
yearmo              int              yyyymm
quarter_num         tinyint          1 to 4
quarter_name        char             1Q2000
year                smallint
weekend_ind         bit                                 Indicates weekend
workday_ind         bit                                 Indicates work day
weekend_weekday     char             weekend            Alternative for weekend_ind and workday_ind; can be used to make reports more readable
holiday_ind         bit                                 Indicates holiday
holiday_name        char             Thanksgiving
peak_period_ind     bit                                 Meaning defined by business rules

Date and Time Dimension Keys

In contrast to surrogate keys used in other dimension tables, date and time dimension keys should be smart. A suggested key for a date dimension is of the form yyyymmdd. This format is easy for users to remember and incorporate into queries. It is also a recommended surrogate key format for fact tables that are partitioned into multiple tables by date.
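
The following loop is a minimal sketch of generating such keys while populating a date dimension; it assumes a dim_date table that, in addition to date_key and day_date, carries the day_of_week, month_num, and year attribute columns listed earlier.

  -- Load one year of dates, keyed yyyymmdd, into a hypothetical date dimension.
  DECLARE @d smalldatetime
  SET @d = '20000101'

  WHILE @d < '20010101'
  BEGIN
      INSERT INTO dbo.dim_date (date_key, day_date, day_of_week, month_num, year)
      VALUES (
          CONVERT(int, CONVERT(char(8), @d, 112)),   -- style 112 produces yyyymmdd
          @d,
          DATENAME(weekday, @d),
          DATEPART(month, @d),
          DATEPART(year, @d)
      )
      SET @d = DATEADD(day, 1, @d)
  END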

Slowly Changing Dimensions

A characteristic of dimensions is that dimension data is relatively stable – data may be added as new products are released or customers are acquired, but data, such as the names of existing products and customers, changes infrequently. However, business events do occur that cause dimension attributes to change, and the effects of these changes on the data warehouse must be managed (in particular, the potential effect of a change to a dimension attribute on how historical data is tracked and summarized). Slowly changing dimensions is the customary term used for discussions of issues associated with the impact of changes to dimension attributes. Design approaches to dealing with the issues of slowly changing dimensions are commonly categorized into the following three change types:

  • Type 1: Overwrite the dimension record.
  • Type 2: Add a new dimension record.
  • Type 3: Create new fields in the dimension record.

Type 1

Type 1 changes cause history to be rewritten, which may affect analysis results if an attribute is changed that is used to group data for summarization. Changes to a dimension attribute that is never used for analysis can be managed by simply changing the data to the new value. For example, if customer addresses are stored in the customer dimension table, a change to a customer’s apartment number is unlikely to affect any summarized information, but a customer’s move to a new city or state would affect summarization of data by customer location.

A Type 1 change is the easiest kind of slowly changing dimension to manage in the relational data warehouse. The maintenance procedure is simply to update an attribute column in the dimension table. However, the Type 1 slowly changing dimension presents complex management problems for aggregate tables and OLAP cubes. This is especially true if the updated attribute is a member of a hierarchy on which aggregates are pre-computed, either in the relational database or the OLAP store.

For business users, a Type 1 change can hide valuable information. By updating the attribute in the dimension table, the history of the attribute's prior values is lost. Consider the example where a customer has upgraded from the Silver to the Gold level of service. If the dimension table simply updates the attribute value, business users will not easily be able to explore differences in behavior before, during, and after the change of service level. In many cases, these questions are of tremendous importance to the business.

Type 2

Type 2 changes cause history to be partitioned at the event that triggered the change. Data prior to the event continues to be summarized and analyzed as before; new data is summarized and analyzed in accordance with the new value of the data.

Consider this example: In a sales organization, salespeople receive commissions on their sales. These commissions influence the commissions of sales managers and executives, and are summarized by sales group in standard reports. When a salesperson transfers from one group in the organization to another group, the historical information about commission amounts must remain applicable to the original group and new commissions must apply to the salesperson’s new group. In addition, the total lifetime commission history for the employee must remain available regardless of the number of groups in which the person worked. A type 1 change is not appropriate because it would move all of the salesperson’s commission history to the new group.

The type 2 solution is to retain the existing salesperson’s dimension record and add a new record for the salesperson that contains the new reporting information. The original record still associates historical commission data with the previous sales group and the new record associates new commission data with the new group. It is customary to include fields in the dimension table to document the change. The following are examples of some common fields that can be used to document the change event:

  • Row Current, a Boolean field that identifies which record represents the current status
  • Row Start, a date field that identifies the date the record was added
  • Row Stop, a date field that identifies the date the record ceased to be current

Surrogate keys on the dimension table are required for type 2 solutions. The salesperson’s employee number is most likely used as the record key in OLTP systems. Even if some other key is used, it is unlikely that OLTP systems will need to create a new record for this individual. A second record for this individual cannot be created in the dimension table unless a different value is used for its primary key – surrogate keys avoid this restriction. In addition, because the salesperson’s employee number is carried in the dimension table as an attribute, a summarization of the entire employee’s commission history is possible, regardless of the number of sales groups to which the person has belonged.

Some queries or reports may be affected by type 2 changes. In the salesperson example, existing reports that summarize by dimension records will now show two entries for the same salesperson. This may not be what is desired, and the report query will have to be modified to summarize by employee number instead of by the surrogate key.
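
A Type 2 change can be applied with a pair of statements like the following sketch, which uses the tracking fields described above (here named row_current, row_start, and row_stop). The dim_salesperson table, its columns, and the sample values are assumptions; its surrogate key is an IDENTITY column that the INSERT generates automatically.

  -- Expire the salesperson's current record and add a new one for the new group.
  DECLARE @empno int, @newgroup varchar(30), @today smalldatetime
  SET @empno    = 1234
  SET @newgroup = 'Western Region'
  SET @today    = CONVERT(char(8), GETDATE(), 112)   -- today's date, no time portion

  BEGIN TRANSACTION

  UPDATE dbo.dim_salesperson
  SET row_current = 0,
      row_stop    = @today
  WHERE employee_num = @empno
    AND row_current  = 1

  INSERT INTO dbo.dim_salesperson
      (employee_num, salesperson_name, sales_group, row_current, row_start, row_stop)
  SELECT employee_num, salesperson_name, @newgroup, 1, @today, NULL
  FROM dbo.dim_salesperson
  WHERE employee_num = @empno
    AND row_stop     = @today          -- the record that was just expired

  COMMIT TRANSACTION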

Type 3

Type 3 solutions attempt to track changes horizontally in the dimension table by adding fields to contain the old data. Often only the original and current values are retained and intermediate values are discarded. The advantage of type 3 solutions is the avoidance of multiple dimension records for a single entity. However, the disadvantages are history perturbation and complexity of queries that need to access the additional fields. Type 2 solutions can address all situations where type 3 solutions can be used, and many more as well. If the data warehouse is designed to manage slowly changing dimensions using type 1 and 2 solutions, there is no need to add the maintenance and user complexity inherent in type 3 solutions.

Rapidly Changing Dimensions, or Large Slowly Changing Dimensions

A dimension is considered to be a rapidly changing dimension if one or more of its attributes changes frequently in many rows. For a rapidly changing dimension, the dimension table can grow very large from the application of numerous type 2 changes. The terms rapid and large are relative, of course. For example, a customer table with 50,000 rows and an average of 10 changes per customer per year will grow to about five million rows in 10 years, assuming the number of customers does not grow. This may be an acceptable growth rate. On the other hand, only one or two changes per customer per year for a ten million row customer table will cause it to grow to hundreds of millions of rows in ten years.

Tracking bands can be used to reduce the rate of change of many attributes that have continuously variable values such as age, size, weight, or income. For example, income can be categorized into ranges such as [0-14,999], [15,000-24,999], [25,000-39,999], and so on, which reduce the frequency of change to the attribute. Although type 2 change records should not be needed to track age, age bands are often used for other purposes, such as analytical grouping. Birth date can be used to calculate exact age when needed. Business needs will determine which continuously variable attributes are suitable for converting to bands.

Often, the correct solution for a dimension with rapidly changing attributes is to break the offending attributes out of the dimension and create one or more new dimensions. Consider the following example.

An important attribute for customers might be their account status (good, late, very late, in arrears, suspended), and the history of their account status. Over time many customers will move from one of these states to another. If this attribute is kept in the customer dimension table and a type 2 change is made each time a customer’s status changes, an entire row is added only to track this one attribute. The solution is to create a separate account_status dimension with five members to represent the account states.

A foreign key in the customer table points to the record in the account_status dimension table that represents the current account status of that customer. A type 1 change is made to the customer record when the customer’s account status changes. The fact table also contains a foreign key for the account_status dimension. When a new fact record is loaded into the fact table, the customer id in the incoming fact record is used to look up the current account_status key in the customer record and populate it into the fact record. This captures a customer’s account history in the fact table.

In addition to the benefit of removing the rapidly changing item from the customer dimension, the separate account status dimension enables easy pivot analysis of customers by current account status in OLAP cubes. However, to see the entire account history for a customer, the fact table must be joined to the customer table and the account_status table and then filtered on customer id, which is not very efficient for frequent queries for a customer’s account history.
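
The lookup described above can be sketched as part of the fact load; the staging table and all column names are hypothetical.

  -- Stamp each incoming fact row with the customer's current account status key.
  INSERT INTO dbo.fact_orders (date_key, customer_key, account_status_key, order_amount)
  SELECT s.date_key,
         c.customer_key,
         c.account_status_key,          -- current status carried on the customer record
         s.order_amount
  FROM dbo.stage_orders AS s
       JOIN dbo.dim_customer AS c ON c.customer_id = s.customer_id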

This scenario works reasonably well for a single rapidly changing attribute. What if there are ten or more rapidly changing attributes? Should there be a separate dimension for each attribute? Maybe, but the number of dimensions can rapidly get out of hand and the fact table can end up with a large number of foreign keys. One approach is to combine several of these mini-dimensions into a single physical dimension. This is the same technique used to create what is often called a junk dimension that contains unrelated attributes and flags to get them out of the fact table. However, it is still difficult to query these customer attributes well because the fact table must be involved to relate customers to their attributes. Unfortunately, business users are often very interested in this kind of historical information, such as the movement of a customer through the various account status values.

If business users frequently need to query a dimension that has been broken apart like this, the best solution is to create a factless schema that focuses on attribute changes. For example, consider a primary data warehouse schema that keeps track of customers’ purchases. The Customer dimension has been developed as a Type 2 slowly changing dimension, and account status has been pulled out into a separate dimension. Create a new fact table, CustomerChanges, that tracks only the changes to the customer and account status. A sample schema is illustrated in the following figure.

(Image Unavailable)

The fact table, CustomerChanges, receives a new row only when a change is made to the Customer table that includes information about the customer’s current account status. The fact table has no numeric measure or fact; an entry in the table signifies that an interesting change has occurred to the customer. Optionally, the CustomerChanges schema can track the reason for the change in the CustomerChangeReason and AccountChangeReason dimension tables. Sample values for the account_change_reason might include Customer terminated account, Account closed for non-payment, and Outstanding balance paid in full.

Attribute history tables like this are neither dimension tables nor fact tables in the usual sense. The information in this kind of table is something like Quantity on Hand in an inventory fact table, which cannot be summarized by adding. However, unlike Quantity on Hand in an inventory table, these attributes do not change on a fixed periodic basis, so they cannot be numerically quantified and meaningfully averaged unless the average is weighted by the time between events.

Multi-Use Dimensions

Sometimes data warehouse design can be simplified by combining a number of small, unrelated dimensions into a single physical dimension, often called a junk dimension. This can greatly reduce the size of the fact table by reducing the number of foreign keys in fact table records. Often the combined dimension will be prepopulated with the Cartesian product of all dimension values. If the number of discrete values creates a very large table of all possible value combinations, the table can be populated with value combinations as they are encountered during the load or update process.

A common example of a multi-use dimension is a dimension that contains customer demographics selected for reporting standardization. Another multi-use dimension might contain useful textual comments that occur infrequently in the source data records; collecting these comments in a single dimension removes a sparse text field from the fact table and replaces it with a compact foreign key.
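
As a sketch of the Cartesian-product approach described above, a small junk dimension might be created and prepopulated as follows; the table, its flag columns, and the sample values are assumptions for illustration.

  -- A hypothetical junk dimension combining three unrelated order flags.
  CREATE TABLE dbo.dim_order_flags (
      order_flags_key  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
      payment_type     varchar(10) NOT NULL,
      ship_mode        varchar(10) NOT NULL,
      gift_wrap_ind    bit NOT NULL
  )

  -- Prepopulate the dimension with the Cartesian product of all flag values.
  INSERT INTO dbo.dim_order_flags (payment_type, ship_mode, gift_wrap_ind)
  SELECT p.payment_type, s.ship_mode, g.gift_wrap_ind
  FROM (SELECT 'Cash' UNION SELECT 'Credit' UNION SELECT 'Check') AS p (payment_type)
       CROSS JOIN (SELECT 'Ground' UNION SELECT 'Air') AS s (ship_mode)
       CROSS JOIN (SELECT 0 UNION SELECT 1) AS g (gift_wrap_ind)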

Fact Tables

A fact table must address the business problem, business process, and needs of the users. Without this information, a fact table design may overlook a critical piece of data or incorporate unused data that unnecessarily adds to complexity, storage space, and processing requirements.

Fact tables contain business event details for summarization. Fact tables are often very large, containing hundreds of millions of rows and consuming hundreds of gigabytes or multiple terabytes of storage. Because dimension tables contain records that describe facts, the fact table can be reduced to columns for dimension foreign keys and numeric fact values. Text, blobs, and denormalized data are typically not stored in the fact table.

Multiple Fact Tables

Multiple fact tables are used in data warehouses that address multiple business functions, such as sales, inventory, and finance. Each business function should have its own fact table and will probably have some unique dimension tables. Any dimensions that are common across the business functions must represent the dimension information in the same way, as discussed earlier in "Dimension Tables." Each business function will typically have its own schema that contains a fact table, several conforming dimension tables, and some dimension tables unique to the specific business function. Such business-specific schemas may be part of the central data warehouse or implemented as data marts.

Very large fact tables may be physically partitioned for implementation and maintenance design considerations. The partition divisions are almost always along a single dimension, and the time dimension is the most common one to use because of the historical nature of most data warehouse data. If fact tables are partitioned, OLAP cubes are usually partitioned to match the partitioned fact table segments for ease of maintenance. Partitioned fact tables can be viewed as one table with an SQL UNION query as long as the number of tables involved does not exceed the limit for a single query. For more information about partitioning fact tables and OLAP cubes, see Chapter 18, "Using Partitions in a SQL Server 2000 Data Warehouse."
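
A simple sketch of such a view over two hypothetical yearly partitions follows; the partition tables are assumed to share the structure of the fact table, and the full partitioned view technique is covered in Chapter 18.

  -- Present two date-partitioned fact tables as a single table for queries.
  CREATE VIEW dbo.v_fact_sales_all
  AS
  SELECT date_key, product_key, customer_key, quantity, sale_amount
  FROM dbo.fact_sales_1999
  UNION ALL
  SELECT date_key, product_key, customer_key, quantity, sale_amount
  FROM dbo.fact_sales_2000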

Additive and Non-additive Measures

The values that quantify facts are usually numeric, and are often referred to as measures. Measures are typically additive along all dimensions, such as Quantity in a sales fact table. A sum of Quantity by customer, product, time, or any combination of these dimensions results in a meaningful value.

Some measures are not additive along one or more dimensions, such as Quantity-on-Hand in an inventory system or Price in a sales system. Some measures can be added along dimensions other than the time dimension; such measures are sometimes referred to as semiadditive. For example, Quantity-on-Hand can be added along the Warehouse dimension to achieve a meaningful total of the quantity of items on hand in all warehouses at a specific point in time. Along the time dimension, however, an aggregate function, such as Average, must be applied to provide meaningful information about the quantity of items on hand. Measures that cannot be added along any dimension are truly nonadditive. Queries, reports, and applications must evaluate measures properly according to their summarization constraints.

Nonadditive measures can often be combined with additive measures to create new additive measures. For example, Quantity times Price produces Extended Price or Sale Amount, an additive value.

Calculated Measures

A calculated measure is a measure that results from applying a function to one or more measures, for example, the computed Extended Price value resulting from multiplying Quantity times Price. Other calculated measures may be more complex, such as profit, contribution to margin, allocation of sales tax, and so forth.

Calculated measures may be precomputed during the load process and stored in the fact table, or they may be computed on the fly as they are used. Determination of which measures should be precomputed is a design consideration. There are other considerations in addition to the usual tradeoff between storage space and computational time. The ability of SQL to state complex relationships and expressions is not as powerful as that of MDX, so complex calculated measures are more likely to be candidates for precomputing if they are accessed using SQL than if they are accessed through Analysis Services using MDX.
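
For example, a calculated measure such as Extended Price can be computed at query time from stored measures, as in the following sketch; the fact table and its quantity and price columns are assumptions.

  -- Compute the additive Extended Price measure on the fly.
  SELECT date_key,
         product_key,
         SUM(quantity * price) AS extended_price
  FROM dbo.fact_order_lines
  GROUP BY date_key, product_key

Precomputing the same value during the load would instead store the quantity times price result in its own fact column.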

Fact Table Keys

A fact table contains a foreign key column for the primary keys of each dimension. The combination of these foreign keys defines the primary key for the fact table. Physical design considerations, such as fact table partitioning, load performance, and query performance, may indicate a different structure for the fact table primary key than the composite key that is in the logical model. These considerations are discussed below in the section "Design the Relational Database and OLAP Cubes." For more information about partitioning fact tables, see Chapter 18, "Using Partitions in a SQL Server 2000 Data Warehouse."

The logical model for a fact table resolves many-to-many relationships between dimensions because dimension tables join through the fact table. For examples of fact tables, see the illustrations earlier in this chapter in "Dimensional Model Schemas."

Granularity

The grain of the fact table is determined after the fact content columns have been identified. Granularity is a measure of the level of detail addressed by an individual entry in the fact table. Examples of grain include "at the Transaction level", "at the Line Item level", and "Sales to each customer, by product, by month". As you can see, the grain for a fact table is closely related to the dimensions to which it links. Including only summarized records of individual facts will reduce the grain and size of a fact table, but the resulting level of detail must remain sufficient for the business needs.

Business needs, rather than physical implementation considerations, must determine the minimum granularity of the fact table. However, it is better to keep the data as granular as possible, even if current business needs do not require it – the additional detail might be critical for tomorrow’s business analysis. Analysis Services is designed to rapidly and efficiently summarize detailed facts into OLAP cubes so highly granular fact tables impose no performance burden on user response time. Alternatively, the OLAP cubes can be designed to include a higher level of aggregation than the relational database. Fine grained data allows the data mining functionality of Analysis Services to discover more interesting nuggets of information.

Do not mix granularities in the fact table. Do not add summary records to the fact table that include detail facts already in the fact table. Aggregation summary records, if used, must be stored in separate tables, one table for each level of granularity. Aggregation tables are automatically created by Analysis Services for OLAP cubes so there is no need to design, create, and manage them manually.

Care must also be taken to properly handle records from source systems that may contain summarized data, such as records for product orders, bills of lading, or invoices. An order typically contains totals of line items for products, shipping charges, taxes, and discounts. Line items are facts. Order totals are summarized facts – do not include both in the fact table. The order number should be carried as a field in the line item fact records to allow summarization by order, but a separate record for the order totals is not only unnecessary, including it will make the fact table almost unusable.

The most successful way to handle summary data like taxes and shipping charges is to get business users to define rules for allocating those amounts down to the detailed level. For taxes, the rule already exists and is easy to implement. By contrast, shipping charges may be allocated by weight, product value, or some more arcane formula. It is common for business users to resist providing an allocation scheme that they may view as arbitrary. It is important for the data warehouse designers to push on this point, as the resulting schema is generally much more useful and usable.

Develop the Architecture

The data warehouse architecture reflects the dimensional model developed to meet the business requirements. Dimension design largely determines dimension table design, and fact definitions determine fact table design.

Whether to create a star or snowflake schema depends more on implementation and maintenance considerations than on business needs. Information can be presented to the user in the same way regardless of whether a dimension is snowflaked. Data warehouse schemas are quite simple and straightforward, in contrast to OLTP database schemas with their hundreds or thousands of tables and relationships. However, the quantity of data in data warehouses requires attention to performance and efficiency in their design.

Design for Update and Expansion

Data warehouse architectures must be designed to accommodate ongoing data updates, and to allow for future expansion with minimum impact on existing design. Fortunately, the dimensional model and its straightforward schemas simplify these activities. Records are added to the fact table in periodic batches, often with little effect on most dimensions. For example, a sale of an existing product to an existing customer at an existing store will not affect the product, customer, or store dimensions at all. If the customer is new, a new record is added to the customer dimension table when the fact record is added to the fact table. The historical nature of data warehouses means that records almost never have to be deleted from tables except to correct errors. Errors in source data are often detected in the extraction and transformation processes in the staging area and are corrected before the data is loaded into the data warehouse database.

The date and time dimensions are created and maintained in the data warehouse independent of the other dimension tables or fact tables – updating date and time dimensions may involve only a simple annual task to mechanically add the records for the next year.

The dimensional model also lends itself to easy expansion. New dimension attributes and new dimensions can be added, usually without affecting existing schemas other than by extension. Existing historical data should remain unchanged. Data warehouse maintenance applications will need to be extended, but well-designed user applications should still function although some may need to be updated to make use of the new information.

An entirely new schema can be added to a data warehouse without affecting existing functionality. A new business subject area can be added by designing and creating a fact table and any dimensions specific to the subject area. Existing dimensions can be reused without modification to maintain conformity throughout the entire warehouse. If a different, more aggregated, grain is used in a new subject area, dimensions may be reduced in size by eliminating fine-grained members, but the resulting dimension must still conform to the master dimension and must be maintained in conformance with it.

Analysis Services OLAP cubes can be extended to accommodate new dimensions by extending their schemas and reprocessing, or by creating new virtual cubes that contain the new dimensions and incorporate existing cubes without modification to them.

Design the Relational Database and OLAP Cubes

In this phase, the star or snowflake schema is created in the relational database, surrogate keys are defined, and primary and foreign key relationships are established. Views, indexes, and fact table partitions are also defined. OLAP cubes are designed that support the needs of the users.

Keys and Relationships

Tables are implemented in the relational database after surrogate keys for dimension tables have been defined and primary and foreign keys and their relationships have been identified. Primary/foreign key relationships should be established in the database schema. For an illustration of these relationships, see the sample star schema, Classic Sales, in "Dimension Model Schema," earlier in this chapter.
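
As a concrete illustration, the following Transact-SQL sketch defines two small dimension tables with surrogate primary keys and a fact table whose foreign keys reference them. The table and column names are hypothetical and are not taken from the Classic Sales sample.

    -- Illustrative names only; these tables are not part of the Classic Sales sample schema.
    CREATE TABLE Customer_Dim (
        customer_key  int NOT NULL PRIMARY KEY,   -- surrogate key assigned in the warehouse
        customer_id   varchar(20) NOT NULL,       -- key carried from the source system
        customer_name varchar(50) NOT NULL
    )

    CREATE TABLE Product_Dim (
        product_key   int NOT NULL PRIMARY KEY,
        product_code  varchar(20) NOT NULL,
        product_name  varchar(50) NOT NULL
    )

    -- The fact table references each dimension through its surrogate key. A Date_Dim
    -- table like the one sketched earlier in this chapter is assumed to exist.
    CREATE TABLE Sales_Fact (
        date_key      int NOT NULL REFERENCES Date_Dim (date_key),
        customer_key  int NOT NULL REFERENCES Customer_Dim (customer_key),
        product_key   int NOT NULL REFERENCES Product_Dim (product_key),
        sales_units   int NOT NULL,
        sales_amount  money NOT NULL
    )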

The composite primary key in the fact table is an expensive key to maintain:

  • The index alone is almost as large as the fact table.
  • The index on the primary key is often created as a clustered index. In many scenarios a clustered primary key provides excellent query performance. However, all other indexes on the fact table use the large clustered index key. All indexes on the fact table will be large, the system will require significant additional storage space, and query performance may degrade.

As a result, many star schemas are defined with an integer surrogate primary key, or no primary key at all. We recommend that the fact table be defined using the composite primary key. Also create an IDENTITY column in the fact table that can serve as the key of a unique clustered index, should the database administrator determine that this structure would provide better performance.
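
Continuing the hypothetical Sales_Fact table from the earlier sketch, the recommendation might be applied as follows; the constraint and column names are illustrative only.

    -- Add an IDENTITY column that a unique clustered index can later be built on,
    -- and define the composite primary key over the dimension keys.
    ALTER TABLE Sales_Fact ADD fact_row_id int IDENTITY(1,1) NOT NULL
    ALTER TABLE Sales_Fact ADD CONSTRAINT PK_Sales_Fact
        PRIMARY KEY (date_key, customer_key, product_key)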

Indexes

Dimension tables must be indexed on their primary keys, which are the surrogate keys created for the data warehouse tables. The fact table must have a unique index on the primary key. There are scenarios where the primary key index should be clustered, and other scenarios where it should not. The larger the number of dimensions in the schema, the less beneficial it is to cluster the primary key index. With a large number of dimensions, it is usually more effective to create a unique clustered index on a meaningless IDENTITY column.
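
A minimal sketch of the layout described above for schemas with many dimensions follows: the composite primary key is declared NONCLUSTERED and the unique clustered index is placed on a meaningless IDENTITY column. All names are hypothetical.

    -- Illustrative only. The PRIMARY KEY constraints on the dimension tables already
    -- provide the required indexes on their surrogate keys.
    CREATE TABLE Sales_Fact_Alt (
        date_key      int NOT NULL,
        customer_key  int NOT NULL,
        product_key   int NOT NULL,
        fact_row_id   int IDENTITY(1,1) NOT NULL,
        sales_units   int NOT NULL,
        sales_amount  money NOT NULL,
        CONSTRAINT PK_Sales_Fact_Alt
            PRIMARY KEY NONCLUSTERED (date_key, customer_key, product_key)
    )

    CREATE UNIQUE CLUSTERED INDEX IX_Sales_Fact_Alt_Row
        ON Sales_Fact_Alt (fact_row_id)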

Elaborate initial design and development of index plans for end-user queries is not necessary with SQL Server 2000, which provides sophisticated indexing techniques and the easy-to-use Index Tuning Wizard for tuning indexes to the query workload.

The SQL Server 2000 Index Tuning Wizard allows you to select and create an optimal set of indexes and statistics for a database without requiring an expert understanding of the structure of the database, the workload, or the internals of SQL Server. The wizard analyzes a query workload captured in a SQL Profiler trace or provided in a SQL script, and recommends an index configuration to improve the performance of the database.

The Index Tuning Wizard provides the following features and functionality:

  • It can use the query optimizer to analyze the queries in the provided workload and recommend the best combination of indexes to support the query mix in the workload.
  • It analyzes the effects of the proposed changes, including index usage, distribution of queries among tables, and performance of queries in the workload.
  • It can recommend ways to tune the database for a small set of problem queries.
  • It allows you to customize its recommendations by specifying advanced options, such as disk space constraints.

A recommendation from the wizard consists of SQL statements that can be executed to create new, more effective indexes and, if desired, drop existing indexes that are ineffective. Indexed views are recommended on platforms that support their use. After the Index Tuning Wizard has suggested a recommendation, the recommendation can be implemented immediately, scheduled as a SQL Server job, or executed manually at a later time.
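
For illustration only, a recommendation script typically contains statements of the following form; the index, statistics, and table names below are hypothetical, not actual wizard output.

    -- Hypothetical contents of a recommendation script (not actual wizard output).
    CREATE NONCLUSTERED INDEX IX_Sales_Fact_Product
        ON Sales_Fact (product_key)

    CREATE STATISTICS ST_Sales_Fact_Amount
        ON Sales_Fact (sales_amount)

    -- An existing index judged ineffective for the workload can also be dropped.
    DROP INDEX Sales_Fact.IX_Sales_Fact_Unused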

The empirical tuning approach provided by the Index Tuning Wizard can be used frequently when the data warehouse is first implemented to develop the initial index set, and then employed periodically during ongoing operation to maintain indexes in tune with the user query workload.

SQL Server Books Online provides detailed discussions of indexes and the Index Tuning Wizard, and procedures for using the wizard to tune database indexes.

Views

Views should be created for users who need direct access to data in the data warehouse relational database. Users can be granted access to views without having access to the underlying data. Indexed views can be used to improve performance of user queries that access data through views. Indexed views are discussed in depth in SQL Server Books Online.

View definitions should use column and table names that make sense to business users. If Analysis Services will be the primary query engine for the data warehouse, it is easier to create clear and consistent cubes from views with readable column names.
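
A minimal sketch follows, reusing the hypothetical Sales_Fact and Product_Dim tables from the earlier examples; the view exposes business-friendly column names to end users and to cube-building tools.

    -- Illustrative only; table and column names are hypothetical.
    CREATE VIEW Sales_By_Product
    AS
    SELECT p.product_name  AS [Product Name],
           f.sales_units   AS [Units Sold],
           f.sales_amount  AS [Sales Amount]
    FROM   Sales_Fact f
    JOIN   Product_Dim p ON p.product_key = f.product_key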

Design OLAP Cubes

OLAP cube design requirements will be a natural outcome of the dimensional model if the data warehouse is designed to support the way users want to query data. Effective cube design is addressed in depth in "Getting the Most Out of Analysis Services" in Chapter 22, "Cubes in the Real World."

Develop the Operational Data Store

Some business problems are best addressed by creating a database designed to support tactical decision-making. The Operational Data Store (ODS) is an operational construct that has elements of both a data warehouse and a transaction system. Like a data warehouse, the ODS typically contains data consolidated from multiple systems and grouped by subject area. Like a transaction system, the ODS may be updated by business users, and contains relatively little historical data.

A classic business case for an operational data store is to support the Customer Call Center. Call center operators have little need for broad analytical queries that reveal trends in customer behavior. Rather, their needs are more immediate: the operator should have up-to-date information about all transactions involving the complaining customer. This data may come from multiple source systems, but should be presented to the call center operator in a simplified and consolidated way.

Implementations of the ODS vary widely depending on business requirements. There are no strict rules for how the ODS must be implemented. A successful ODS for one business problem may be a replicated mirror of the transaction system; for another business problem a star schema will be most effective. Most effective operational data stores fall between those two extremes, and include some level of transformation and integration of data. It is possible to architect the ODS so that it serves its primary operational need, and also functions as the proximate source for the data warehouse staging process.

A detailed discussion of operational data store design and its implications for data warehouse staging is beyond the scope of this chapter.

Develop the Data Maintenance Applications

The data maintenance applications, including extraction, transformation, and loading processes, must be automated, often with specialized custom applications. Data Transformation Services (DTS) in SQL Server 2000 is a powerful tool for defining many transformations. Other tools include Transact-SQL and applications developed with Microsoft Visual Basic® Scripting Edition (VBScript) or languages such as Visual Basic.
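
As one small Transact-SQL illustration of a load step, the sketch below assumes a hypothetical staging table, Sales_Stage, that still carries source-system keys; the warehouse surrogate keys are looked up from the dimension tables defined in the earlier examples before the rows are inserted into the fact table.

    -- Illustrative only; Sales_Stage and its columns are hypothetical.
    INSERT INTO Sales_Fact (date_key, customer_key, product_key, sales_units, sales_amount)
    SELECT d.date_key,
           c.customer_key,
           p.product_key,
           s.sales_units,
           s.sales_amount
    FROM   Sales_Stage s
    JOIN   Date_Dim d     ON d.calendar_date = s.sale_date
    JOIN   Customer_Dim c ON c.customer_id   = s.customer_id
    JOIN   Product_Dim p  ON p.product_code  = s.product_code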

An extensive discussion of the extraction, transformation, and loading (ETL) processes is provided in Chapter 19, "Data Extraction, Transformation, and Loading Techniques."

Develop Analysis Applications

The applications that support data analysis by the data warehouse users are constructed in this phase of data warehouse development.

OLAP cubes and data mining models are constructed using Analysis Services tools, and client access to analysis data is supported by the Analysis Server. Techniques for cube design, MDX, data mining, and client data access to Analysis Services data are covered in depth in the section "Getting the Most Out of Analysis Services."

Other analysis applications, such as Excel PivotTables®, predefined reports, Web sites, and digital dashboards, are also developed in this phase, as are natural language applications using English Query. Specialized third-party analysis tools are also acquired and implemented or installed. Details of these specialized applications are determined directly by user needs. Digital dashboards are discussed in Chapter 27, "Creating an Interactive Digital Dashboard," and Chapter 28, "A Digital Dashboard Browser for Analysis Services Meta Data."

Test and Deploy the System

It is important to involve users in the testing phase. After initial testing by development and test groups, users should load the system with queries and use it the way they intend to after the system is brought online. Substantial user involvement in testing provides significant benefits. Among them:

  • Discrepancies can be found and corrected.
  • Users become familiar with the system.
  • Index tuning can be performed.

It is important that users exercise the system during the test phase with the kinds of queries they will be using in production. This can enable a considerable amount of empirical index tuning to take place before the system comes online. Additional tuning needs to take place after deployment, but starting with satisfactory performance is a key to success. Users who have participated in the testing and have seen performance continually improve as the system is exercised will be inclined to be supportive during the initial deployment phase as early issues are discovered and addressed.

Conclusion

Businesses have collected operational data for years, and continue to accumulate ever-larger amounts of data at ever-increasing rates as transaction databases become more powerful, communication networks grow, and the flow of commerce expands. Data warehouses collect, consolidate, organize, and summarize this data so it can be used for business decisions.

Data warehouses have been used for years to support business decision makers. Data warehousing approaches and techniques are well established, widely adopted, successful, and not controversial. Dimensional modeling, the foundation of data warehouse design, is not an arcane art or science; it is a mature methodology that organizes data in a straightforward, simple, and intuitive representation of the way business decision makers want to view and analyze their data.

The key to data warehousing is data design. The business users know what data they need and how they want to use it. Focus on the users, determine what data is needed, locate sources for the data, and organize the data in a dimensional model that represents the business needs. The remaining tasks flow naturally from a well-designed model – extracting, transforming, and loading the data into the data warehouse, creating the OLAP and data mining analytical applications, developing or acquiring end-user tools, deploying the system, and tuning the system design as users gain experience.

Microsoft SQL Server 2000 provides a wide range of powerful, easy-to-use tools you can use to create a data warehouse and analyze the data it contains. The ability to design and create data warehouses is no longer restricted to experts working with primitive implements.
