- Shopping Bag ( 0 items )
What is this book about?
(DTS) Data Transformation Services helps you create a transactional system to load and transform your data from and to a wide variety of sources. SQL Server DTS was introduced in the release of SQL Server 7.0; however, SQL Server 2000 sees the functionality and power of this outstanding and incredibly useful tool greatly increased. DTS can be used to maximum effect in data-driven applications, offering rapid and effective solutions to a wide range of ...
What is this book about?
(DTS) Data Transformation Services helps you create a transactional system to load and transform your data from and to a wide variety of sources. SQL Server DTS was introduced in the release of SQL Server 7.0; however, SQL Server 2000 sees the functionality and power of this outstanding and incredibly useful tool greatly increased. DTS can be used to maximum effect in data-driven applications, offering rapid and effective solutions to a wide range of common problems.
Professional SQL Server 2000 DTS provides a complete introduction to DTS fundamentals and architecture before exploring the more complex data transformations involved in moving data between different servers, applications, and providers. The book then focuses on DTS programming via the DTS object model, enabling developers to incorporate custom transformations and reporting capabilities into their applications. Advanced topics are explained including error handling, dynamic data loading, and data warehouses. With code and case studies, this book gives the reader a complete picture of how to use DTS to its fullest potential.
What does this book cover?
Here are just a few of the things you'll find covered in this book:
Who is this book for?
This book is principally aimed at database programmers and administrators who have a working knowledge of SQL Server, and who wish to take DTS beyond its most basic level and tailor it to their needs. It will also appeal to managers and project managers who want to gain an understanding of DTS and how it could benefit their businesses.
Chapter 1: Welcome to DTS.
Chapter 2: DTS Fundamentals.
Chapter 3: Advanced Tasks.
Chapter 4: Advanced Settings in DTS.
Chapter 5: Querying Heterogeneous Data.
Chapter 6: Converting Heterogeneous Data.
Chapter 7: ActiveX Script Task Fundamentals.
Chapter 8: Dynamic Configuration of Package Objects.
Chapter 9: Advanced ActiveX Scripting.
Chapter 10: Security and DTS.
Chapter 11: Error Handling.
Chapter 12: Writing Your Own Custom Tasks.
Chapter 13: Data Warehouse Loading.
Chapter 14: Using DTS from Other Applications.
Chapter 15: Warehouse Data Load.
Appendix A: Microsoft Data Transformation Services Package Object Library Reference.
Appendix B: ADO Object Summary.
Appendix C: Visual Basic Functions and Keywords.
Appendix D: Variable Naming and Other Code Conventions.
Appendix E: Microsoft Scripting Run Time Reference.
A company's data is its life. Since the evolution of the mainframe and relational databases, companies have been evolving and increasing their productivity through database products like SQL Server and Oracle. The problem with evolution, however, is that some database systems are left behind. They are just too expensive to convert to the newer systems.
In the past, as a company upgraded its database system, a programmer would have to reprogram the communication layer through complex code. To remedy the time-to-market problem this created, Microsoft invented different ways to communicate with legacy and modern relational databases via an open standard layer called OLE DB. This breakthrough allowed programmers to communicate with IBM's DB2 database using code similar to that used to communicate to a Microsoft's SQL Server database. Suddenly portability became much easier.
This solved the problem of a program's portability, but what happens when you have a DB2 database that you need to convert to a SQL Server database? For example, your company purchases another company that still uses a legacy VSAM (mainframe driven) system. Your company, however, has its corporate database infrastructure in SQL Server. Converting the VSAM system may be too expensive and time consuming. In the meantime, you need the data that a sales representative is entering into a dumb terminal (DT) from the company you bought to flow to your SQL Server system in the shipping department. In the past, this process would have taken quite a while to program and employee turnover would be so bad from the merger that by the time you finished the development of a workflow system, you would have no one to support it.
This scenario is being repeated throughout the data processing industry daily. As companies merge, they need a way to rapidly develop an application to transition data from any source to any destination. Doing this on the mainframe is expensive and the human resources to do this are dwindling.
Developers also need to be able to transform data. This means that you can have the data look one way on the source and pass the data through a cleansing process to appear a different way on the destination. We can, for example, perform calculations to generate new values for loading, combine multiple columns into a single column, or conversely, break a single column into multiple columns. Imagine that the company you purchased calls sales territories by one set of names, and you'd like to call them by another; this could cause problems when the data was queried, as the values would appear to belong to different territories. However, you can allow them to keep their current system and just scrub the data (that is, make it consistent) as it enters your tracking systems. Another common example is seen when the fields FirstName and LastName need to become a single field named Name. In either case, programmers would have had to spend months staring at their computer screen turning over line after line to perform this simple translation.
DTS to the rescue! Data Transformation Services (DTS) was first introduced with the release of SQL Server 7.0. It was immediately hailed as a revolution in data transformation, since it was, after all, built into SQL Server.
Data source here refers to any source in an OLE DB, ODBC, or text file format.
In this chapter we will:
Provide an introduction to DTS Explain the different types of connection methods to SQL Server using OLE DB and ODBC Dive into the DTS Architecture Explore the new DTS features in SQL Server 2000 Examine the tools we can use to create packages, concentrating on the wizards to introduce the basics of DTS
Let's start by looking at exactly what we can do with DTS.
The Next Generation in Database Solutions
In SQL Server 6.5, data movement was done through clunky utilities such as Transfer Manager. Transfer Manager allowed us to transfer data with a single step from one SQL Server to another SQL Server. However, we had no access to any other data source, such as Oracle or Microsoft Access.
DBAs also had access to Bulk Copy Program (BCP). BCP is a non-logged event that inserts data into SQL Server in bulk fashion without any type of transformation. A BCP file's columns can be de-limited by a character such as a comma or a tab. BCP files can also use fixed column width to assign where each column begins and ends. The problem with BCPing files was that the schema positioning on the server and in the BCP file had to exactly match. For example, Column1 in the flat file had to be Column1 in the SQL Server.
Products such as Data Junction, produced by Data Junction Corporation, were released to fill the void. Data Junction allowed you to transfer data from just about any Database Management System (DBMS) or flat file, to any other DBMS. It was an extremely powerful tool, but expensive.
Today, developers and DBAs have access to the same power that existed in Data Junction, but through a tool that ships with all editions of SQL Server 7.0 and 2000 (including MSDE). DTS can act independently of SQL Server using a set of programs from a command line, or integrate tightly with SQL Server. This means that Oracle and Access users for example, can use DTS for their own data movement needs without having to buy a SQL Server license. However, if you do have a SQL license, you get the benefit of using the DTS Designer, which we'll look at more later on in this chapter. DTS Designer is an integral part of Enterprise Manager and no standalone program is available. If you don't have the DTS Designer, then you will have to develop DTS workflows on a system that has a SQL Server license and then port them to the other system, or program the workflows using the DTS object model, which we will discuss later in the book, and a COM compliant programming language. Keep in mind that if you connect to SQL Server in your workflow, you'll still need to have a SQL Server Client Access License (CAL). We will go into further detail about how to make DTS standalone in Chapter 2.
DTS allows users to convert data from any OLE DB compliant data source to any data source. This is done through a workflow system where you can create a process that sequentially moves data. It also allows you to expand the functionality by adding your own COM components. These components can be in any programming language that supports COM, such as Visual C++ or Visual Basic, for example.
What is OLE DB?
OLE DB is an API that allows COM applications, such as DTS, to communicate with almost any data storage system, including non-relational databases, e-mail, directories, and DTS packages. With OLE DB, services are made available via a provider, which is a dynamic link library (DLL) that allows you to connect to a data source. The OLE DB provider actually contains the API that is used by OLE DB.
OLE DB is also extensible. This means that you can install new providers to connect to a countless number of data sources. For example, you can install the OLE DB provider for OLAP to access Microsoft Analysis Services cubes. OLE DB is the only connectivity tool that will allow you to connect to such systems. OLE DB has truly opened the door for open communication between database systems and allowed applications to become much more portable. For example, a web application could simply change its connection string in one file (GLOBAL.ASA) to move from connecting to Access to SQL Server.
A different type of common communication layer is Open Database Connectivity (ODBC). ODBC was Microsoft's first attempt at an open standard of data access. It was widely accepted around the industry as the data access standard years ago, and even grew acceptance in the UNIX world. An application that uses ODBC generally connects using Remote Data Objects (RDO). Most new development at Microsoft surrounds OLE DB because it offers more compatibility with other systems.
One of the OLE DB providers is the OLE DB Provider for ODBC. This substantially expands the list of database systems you can access. The OLE DB Provider for ODBC is considerably slower than the straight OLE DB one, however.
OLE DB is able to communicate with almost any system because it exposes the database's communication layer to the programmer through open COM APIs. A programmer can easily communicate to the provider through Active Data Objects (ADO). Because OLE DB uses these lower level COM API calls, you can communicate to almost any consumer.
In the diagram below, you can see the application using ADO to communicate to the database through OLE DB. Optionally, you can use the OLE DB Provider for ODBC to communicate to systems that may not have converted to OLE DB yet.
Additionally, you can install programs such as Microsoft Host Integration Services (formally SNA Server) to expand the available providers. For example, after installing Host Integration Services, you can communicate to DB2 and legacy systems. Some providers do take a little more work to configure. By default, Oracle providers are installed when you install SQL Server 2000, however, you must install Oracle tools (SQL Net) to establish connectivity.
The DTS Package
The core component of DTS is the package. The package is what you create and execute when you are using DTS. Every other object in DTS is a child to a package. A package is used to accomplish a particular goal using the various children that it may have. It contains all the connection objects and items that DTS will need to connect and transform the data. A package may not contain a connection to a database at all, instead executing a program or an ActiveX script. All of these objects are defined in a simple GUI called the DTS Designer. You can also design a package in any COM compliant programming language.
DTS packages can be saved into the local SQL Server, the Microsoft Repository, Visual Basic Files (SQL Server 2000 only) or COM-Structured files. You can also execute packages without saving them. It is only compulsory to save packages if you plan to schedule them for later execution.
A package has four major components. The four components, which are shown in the following figure, don't all have to exist for a package to be executable. For example, you can have a package with one task that executes an external custom program or process. In this case, you don't have any of the other three components.
Let's take a look at each of these components in turn.
DTS allows you to connect to any OLE DB compliant data source. Natively, DTS can facilitate OLE DB providers such as:
SQL Server Microsoft Access Microsoft Excel Visual FoxPro Paradox dBase Text Files HTML Files Oracle
DTS also supports Microsoft Data Link files. Data link files are physical files that can be written to connect to a database at run time (SQL 7.0 compiles these at design time). Because they're files, they can be transported from system to system. The list of OLE DB providers can also be expanded. For example, on the Microsoft SNA CD-ROM (or HIS CD-ROM) there is an installation for OLE DB providers for IBM's DB2 database. Sybase providers are also available through the vendor's website.
More information about OLE DB providers can be found at microsoft.com/data.
We'll examine establishing connections in more detail in Chapter 2.
The package holds a number of instructions called tasks. Without tasks, a package would be a car without an engine. There were 8 built-in tasks for SQL Server 7.0, which could do a number of things, such as:
Transform data Bulk insert data Execute a SQL script Execute a program Execute an ActiveX script Move a database
There are 17 built-in tasks for SQL Server 2000, which have additional functionality including the ability to:
FTP a file Execute another package Send or receive MSMQ (Microsoft Message Queue) messages from another package
We'll cover all of these tasks in Chapter 2 and 3. If you find that the built-in tasks do not fit your needs, you can expand the capabilities by registering your own custom tasks. A custom task can be written in any language that uses COM. Creating custom tasks will also be covered in a later chapter.
A step gives a package its logic. The step object allows you to connect tasks in a sequential order. Each task has one step associated with it that can either execute in sequential order or in parallel order, depending on how you have the package configured. The key difference between a task and a step is that the task object holds the information about the individual function that you're doing. For example, a task would contain what file you're executing. A step would tell the task when to execute.
You can also set constraints on tasks, which are called precedence constraints. Precedence constraints allow you to dictate if a task will be executed in the event of a failure, success, or completion of another task. For example, as shown below, step 1, which creates the table will have to execute and succeed for step 2, which transfers data from Excel to SQL Server, to execute.
Global variables can extend the dynamic abilities of DTS. Global variables allow you to set a variable in a single area in your package, and use the variable over and over throughout the package, whether in an ActiveX script or a data transformation. They allow you to communicate with other DTS tasks and pass messages between them. Rowsets can also be stored into a global variable for later use by a different task.
An ideal case for using these variables would be to set up a dynamic database load. For example, say you receive an extract each day from a mainframe. The extract file name changes daily based on the client the extract is for and the date: for example, the filename CLIENT1-010198.txt would mean client 1 run on January 1 1998. By using an ActiveX script, you can read the file name, and change the global variable for the client number and run date based on the name you read. You can later read these two global variables to determine where to insert the data and what security to allow. We'll see a complete example of how to do this is in Chapter 8 (Dynamic Configuration of Package Objects).
A Simple Package
Much of your package creation, as we will discuss later in the book, will be done in DTS Designer. An example of this is shown below:
In this package, data is transformed from SQL Server 1 to SQL Server 2. The execution of this task is considered a step.
Excerpted from Professional SQL Server 2000 DTS (Data Transformation Services) by Mark Chaffin Brian Knight Todd Robinson Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.