—Omri Gazitt, Product Manager of ADO.NET and Product Unit Manager, XML, Microsoft
"Bob Beauchemin packs a career's worth of knowledge into ten well-organized chapters, each one dense with information and insights. If you work with databases on Windows, buy this book."
—David Chappell, Principal of Chappell & Associates
Microsoft's ADO.NET enables Windows-based applications to access many types of databases from a variety of different vendors. This technology offers the advantage of cross-database flexibility that is essential for Internet and distributed-computing success.
Essential ADO.NET presents a thorough explanation of ADO.NET, exploring in depth the capabilities of classes, interfaces, properties, and methods. This book also examines the structure of data and demonstrates the ways in which ADO.NET can solve data access challenges. In particular, the discussion focuses on how ADO.NET effectively balances the need for generic functionality with efficiency, and how it is specifically designed to address today's need for scalability, concurrency, and robustness. A convenient reference for programmers moving from other data access APIs, including OLE DB, ADO, ODBC, and JDBC, Essential ADO.NET correlates classes and functions from these other APIs to those of ADO.NET.
You will find practical informationon:
Essential ADO.NET is the most complete source of information for this crucial piece of the new Microsoft platform. Whether you are new to this application or have used it before, this book will help you to understand and get the most out of ADO.NET.
Table of ContentsForeword by Tim Ewald.
Foreword by Richard Grimes.
1. Data: Models, Applications, and APIs.
Database and API Fundamentals.
The Relational Model.
Benefits of the Relational Model.
Relational Model Support in ADO.NET.
Hierarchies, Objects, and Hybrids.
Nonrelational Data and ADO.NET.
XML and the Infoset.
XML, Databases, and Universal Data Representation.
Data-centric Application Models.
Evolution of Data Access APIs.
Where Are We?
2. ADO.NET Basics.
Two Patterns of Data Access.
Writing Generic Data Access Code with ADO.NET Data Providers.
Cursors in the Data Provider Model.
Disconnected Mode and the DataSet.
The XML APIs in .NET.
The XmlDocument, XPath, and XPathNavigators.
Mixing XML and Data Providers.
Layout of the Managed Data Classes.
Where Are We?
3. The Connected Model: Streaming Data Access.
Command Preparation, Cancellation, Timeouts, and Cleanup.
Streaming Data through a DataReader.
Reading Column Values through IDataRecord.
How Connection Pooling Works.
How Declarative Transactions Work.
Where Are We?
4. The DataSet Class: Sets of Relational Data.
What Can You Do with a DataSet?
The DataSet Object Model.
DataTable and Its Uses.
Keys, Relations, and Constraints.
Navigating through Relationships: Select and Find.
Adding, Retrieving, Changing, and Deleting Rows.
DataRow States and Versions.
Rules and Relationships.
DataSets and Nonrelational Types.
Defining an Information Schema.
Where Are We?
5. DataAdapters: Synchronizing Databases and Datasets.
Populating a DataSet from a Managed Provider.
Error Handling during DataAdapter.Fill.
How DataAdapter.Fill Works.
ADO Integration in OleDbDataAdapter.
Updating a Database through DataAdapter.
Coding Update Logic Directly.
How Update Works.
The DataSet Event Model.
Writing General Customized Commands.
Building a Batch Update Command.
DataSets and Nonrelational Data Types Revisited.
Should You Use DataSet or DataReader?
Where Are We?
6. Data Binding: ADO.NET and Graphical User Interfaces.
Patterns of Data Presentation.
Using Databound Controls.
Anatomy of Databound Control Types.
Binding to a DataReader.
Data Binding with DataSets.
DataViews and Common Transformations.
Table and Column Mappings.
Editable List Controls: DataList and DataGrid.
Nonrelational Data and DataViews.
Integrating Visual Studio.
Controls and Data Forms.
Where Are We?
7. XML and Data Access Integration.
XML and ADO.NET.
Refining DataSet's XML Schema.
Reading XML into DataSet.
Writing XML Schemas from DataSet.
Writing XML Data from DataSet.
Serialization, Marshaling, and DataSet.
The XmlDataDocument Class.
XmlDataDocument and DataDocumentXPathNavigator.
Why Databases and XML?
Document Composition and Decomposition.
SQL Server, XML, and Managed Data Access.
The SQLOLEDB Provider.
The SqlXml Managed Classes.
The SQLXML Web Application.
FOR XML in the SQLXMLOLEDB Provider.
Using SQLXML and .NET.
Where Are We?
8. Providers: ADO.NET and Data Providers.
Staying with OLE DB: A Summary of OLE DB Concepts.
Staying with OLE DB: Interaction with the OleDb Data Provider.
Executing Commands, Returning Results, and Using OpenRowset.
Command Results Format and Behaviors.
Updating from a Rowset.
Supported and Unsupported Providers.
Writing a Data Provider.
Implementing the Connection Class.
Implementing the Command Class.
Implementing the DataReader Class.
Implementing the DataAdapter Class.
Adding Enhanced Functionality.
Implementing XML Infoset Providers.
Implementation Alternatives: Conclusions.
Is a Single Universal Data Access Model Possible?
Where Are We?
9. Consumers: ADO.NET Migration for Consumers.
ADO.NET for OLE DB Programmers.
Data Provider Transparency.
Using Provider-Specific Functionality.
A Brief Overview of ADO.
ADO.NET for ADO Programmers.
ADO Connections, Commands, and Transactions.
Commands and Command Behaviors.
ADO.NET Versus ADO Disconnected Model.
What Happened to GetRows in ADO.NET?
Update Statement Creation.
ADO.NET DataSet Extensions.
Column and Table Naming.
Sorting and Filtering.
ADO DB Interoperability.
ADO.NET for ODBC Programmers.
Metadata and Schema Information.
ADO.NET for JDBC Programmers.
Nonrelational Data Types.
Other SQL-99 Extensions.
Commands and Behaviors.
Executing Queries and Returning Results.
ADO.NET JDBC Programmers and the Disconnected Model.
SQL/J Part 0 and Typed DataSets.
Where Are We?
10. ADO.NET and Various Types of Data.
ADO.NET with Server- and File-Based RDBMS.
ADO.NET with Homogeneous Hierarchical Data and ORDBMS.
ADO.NET and Network Data: Object Graphs, OODBMS, and Network DBMS.
ADO.NET and Structured Files, Multidimensional Data,and ORDBMS.
ADO.NET Flat Files and Semistructured Files.
Where Are We Going?
Appendix A. Data Types and Type Mappings.
Appendix B. Expression Syntax.
Appendix C. Schema Inference Rules.
Although business problems fall into certain repeating patterns, there are no hard-and-fast rules that encompass a cookbook solution, especially with respect to data access, presentation, and exchange. The lightning-fast solution you're programming today in native assembly language for the ultimate in speed may turnout to be a maintenance nightmare tomorrow. Patterns that work perfectly with a single user may fail with a group of users. Patterns for a small workgroup may flounder when applied to a global enterprise. Although this book contains solutionsfor certain generic problems, it doesn't give you cut-and-paste solutions.
Generic Versus Specific Programming Models
Until recently, I'd always programmed database access using the most vendor-specific, down-to-the-metal programming interface available, on the (correct)premise that it was always fastest. However, throughout my career I've oftenmade my living doing application conversions when the organization decided to change hardware type or database. In a given long-lived company, hardware and databases seem to change as often as clothingfashions.
I was first convinced of the usefulness of an abstract, provider-style interface when I was asked to convert a company's Web site from a vendor-specific database API to Open Database Connectivity (ODBC), a more abstract API. As with many organizations, the actual database choice depended on the direction of other parts of the company and was outside our project's control. The wisdom of my project lead's choice of ODBC over native API was soon confirmed. I put the finishing touches on our new application and generated the database.
The project was on schedule, and the project lead trundled off to a Friday afternoon meeting. Then, on Monday morning, I was informed that the choice of data-base had changed. Luckily, because we had used ODBC (and because bothdatabases had ODBC drivers) we needed only to generate the new database,make small changes to the programs because of subtle nuances in StructuredQuery Language (SQL) dialect, retest the programs, and the application was converted. We were lucky.
A provider-based data access API must always balance the problems of abstracting generic functionality with the need to allow vendors enough room so thatthe API doesn't become an excuse for lowest common denominator coding. ADO.NET seems to do a good job in the latter area: The SQL Server-specific data provider (known as SqlClient) seems to be proof of the room afforded to vendor-specific extensions. The real proof will come with future data providers.
Cyclical Technologies and Innovation
I've been doing computerized information processing professionally for a long time. When I went through the interview at my current job (at DevelopMentor) I was asked, "As you get older, does it get more difficult to acquire new skills in the constantly changing world of programming languages, design paradigms, and application programming interfaces?" I laughed and responded that it actually gets easier. After being exposed to multiple generations of "new ideas" in computing,I've realized that the problem domain being abstracted doesn't change, and differences in the abstractions themselves are usually subtle. In addition, the abstractionsare sometimes cyclical; today's new idea may be a throwback.
For example, I've always been intrigued by the problems of data access and structure of database management system products. Having lived through usingfiles as databases (or at least data storage), hierarchical databases, CODASYL databases (standardized by the Conference on Data Systems Languages), relational databases, object databases, multidimensional databases, and all their variations, I was intrigued when I was introduced to the use of XML as a data model. I was impressed when I investigated its data abstraction model, its in-memory object model, its rich data manipulation tools, and its query languages, but I also had the feeling I'd been here (or somewhere close) before. Then, thumbing through an old book that compared relational and hierarchical databases, I was struck by the similarity of the hierarchical navigation functions in, IMS (IBM's flagship database until the mid-1980s) to those used in the XML Document Object Model (DOM) APIs. Later, I was able to correlate the (old) idea of letting the underlying SQL engine do the navigation with the possibility of using the same concept in the XPath query language. Speed improvements will be based on future query engine improvements, without the need to rewrite the navigation-based program.
As another example, one of the latest technologies seems to be centered on the use of business-to-business data exchange between unlike systems using a known data format (XML). This pattern appears to be similar to a system that I worked on to import and export banking transactions using the well-known automatic clearinghouse (ACH) format. This approach used a known data format and telecommunication protocol to exchange data between unlike systems.
It's always helpful to be able to grab on to something familiar in a new technology. I found out, however, that I'd irritated co-workers by overuse of the phrase "It's just like..." when speaking about a new technology. They were irritated because I was implying that there was nothing new in the technology, no improvements. On the contrary, there are usually improvements based on new problems to solve and novel ways of looking at an existing problem. What "It's just like..." meant to me was that I had a bridge from the old world into the new world, and I could go from there to consider the improvements. I noticed that when I mentioned the improvements in new technologies, folks were a lot less irritated.
One of my major assignments at DevelopMentor was to develop and teach a course on OLE DB. In my travels, my students would ask, "What makes this different from ODBC?" or "Why should I switch to this from JDBC?" In other words, what do I gain, what do I lose, and what's the same? After having to answer these questions many times, and having considered changes in many different technologies, I find myself in a good position to assess the repercussions of the new ADO.NET interface.(By the way, Sun Microsystems, the inventor of JDBC, claims that JDBC is not an abbreviation for anything. One would think it stands for Java Database Connectivity, but Sun says this is not so, even though some of the company's own documentation uses those words.)
Data Access Has Changed
When giving talks on data access and presentation strategies, I used to joke that a Web browser was "just a multicolor terminal that can sing and dance." Well, maybe sing. But, to borrow a cliche, accessing applications through a Web browser has drastically changed the way we design our data access strategies. Here's how.
First, user volumes can vary wildly and can change without much prior notice. In the days of terminals, user volume was fairly predictable or at least quantifiable, and growth was constrained by the ability to add terminals and users to the private network. The users also had to fill out forms to gain access to corporate applications, something that made it possible to control growth. In the age of Internet commerce, however, the number of users of your application is con-strained only by its popularity. This is one of the major selling points of the Internet as a vehicle of commerce. Businesses want to be able to start with a small server and scale up or out to a large user base on limited notice. It changes the way we design and write applications.
Second, users do not "sign off " of Internet applications, and sometimes you cannot physically identify them. Hyperlinks in Internet applications let users branch immediately from the application to checking on their favorite stock or sports team. If that other Web site contains something new and interesting, they may never return to the application. As a result, it's impossible to incorporate signoff logic into Web applications and guarantee its use. Also, users cannot be identified by terminal ID because of the presence of dynamic address assignment protocols such as Dynamic Host Configuration Protocol (DHCP) and proxy server software that obfuscate the end user's address. Combined with the fact that users do not sign off applications, this makes it impossible to set aside a chunk of storage on a per-user basis, based on the location of a computing device. That's another big change.
Third, users do a lot of their own data entry. In older systems, pessimistic concurrency and record locking were based on the premise that data entry was done by an array of terminal operators. If two operators were accessing data tied to your account at the same time and if your data was "busy," one operator could put your slip of paper at the bottom of the pile and key it in later. Web applications, on the other hand, make everyone a data entry clerk. Because there probably won't be "two of you" updating your personal and financial information at the same time, optimistic concurrency is more viable. It also means that data editing techniques and business rules must be more robust because nonprofessional operators key in the darnedest things. The ADO.NET APIs use data access patterns tuned to the way data access and input are done today. The techniques of connection pooling, disconnected updates, and optimistic concurrency play a large part in shaping functionality to today's problems even when you're using some of yesterday's technologies.
The Structure of This Book
The thesis of Chapter 1, Data: Models, Applications, and APIs, is that a data access API must strike a balance between, on the one hand, being skewed toward a single data storage style or database and, on the other hand, trying to be so all-encompassing that it renders individual classes and methods meaningless. This chapter enumerates the novel functionality in ADO.NET and describes the data storage styles and application types with which it is most useful.
Chapter 2, ADO.NET Basics, divides the landscape into three parts: dataproviders, the ADO.NET DataSet object model, and the XML data access model. A simple program illustrates each part. I include the XML API portions, along with a discussion of XML and ADO.NET integration in Chapter 7, because I consider the XML stack to be an important part of the overall data access stack in the .NET framework.
Chapter 3, The Connected Model, explains that most data access APIs are centered on a provider model that abstracts away the differences between slightly different concrete implementations. This chapter investigates ADO.NET's variation on the familiar connection, command, and results model that is used to access relational data stores. The chapter concludes with an examination of some application-specific features, including connection pooling and automatic distributed transactions.
In Chapter 4, The DataSet, you'll learn that most data access APIs include the concept of an in-memory data representation that you can use to populate controls on a form and can traverse and manipulate. In ADO.NET, this model is based on the DataSet class and its supporting collection classes, which modela relational database. This chapter introduces the DataSet by examining its intricacies as a stand-alone object model.
Although the DataSet is a convenient way to represent data in memory, it is most useful as a client-side model of data in a database. Chapter 5, Data-Adapters,is about getting data from a data provider into a DataSet and persisting offline changes made to the DataSet back to the database. Because adatabase is shared among many users who work on the same data, this chapter explores the problems that ensue when more than one user changes the samedata in offline mode, known as the optimistic concurrency conflict resolutionproblem. In addition, there has been a lot of debate about when to use a direct-from-the-database stream (the ADO.NET DataReader) and when to use an offline cache (the ADO.NET DataSet) with common application access patterns.This chapter presents guidelines.
If you want to present data to a human via a graphical program or Webpage, you must map the bits in the database or data cache to graphical controls such as text boxes, list boxes, and data grids. Chapter 6, Data Binding, looks atADO.NET's integration with the user interface items. This chapter reprises theDataReader versus DataSet debate, this time when using the ASP.NET WebForms presentation style.Because a large part of the world's data is stored in relational databases, Chapter 7, XML and Data Access Integration, examines the various methods contained in ADO.NET for integrating relational data storage and XML representation and manipulation. Improved XML integration is one of the main advances in ADO.NET. The chapter concludes with information on XML integration withSQL Server, Microsoft's flagship database. Data access (and other models with multiple similar implementations that differ only in details) is abstracted through a provider model. Chapter 8, Providers, looks at the mapping of the OLE DB provider model to the ADO.NET data provider model. It also explains the concept of XML Infoset providers, an abstraction of the provider model over the XML Infoset model.Chapter 9, Consumers, provides a useful reference for data consumers(such as programmers writing data access code) who may be moving fromother data access APIs, such as ADO, ODBC, and JDBC. It's designed to make members of each group feel at home with ADO.NET by correlating their class X or function Y to ADO.NET's classes and methods.
The book starts in Chapter 1 by describing the various types of data storageand applications that use this data. The book concludes in Chapter 10 by revisiting the landscape presented in Chapter 1 in light of the knowledge gained in the rest of the book, examining whether the data stacks (ADO.NET and XML) really contain a solution for everyone. Then we take a look at future directions in dataaccess.
How to Read This Book
This book falls into three parts. It is part manifesto and analysis, part API referenceby example, and part pure reference material for the use of programmers who are involved in data conversions. Ideally, you should read it sequentially, butyou need not understand it completely upon first reading. Whenever you find yourself wondering, "Why should I care about this?" or "I have no point of referenceto this," read on. The meaning will become clear later.
Chapters 1 and 10 present a manifesto and analysis of database types, database APIs, and application styles in broad brushstrokes. You may not completely understand or agree with every point. Chapters 2 through 7 explain inminute detail how the ADO.NET classes, interfaces, and methods work. You can read these chapters in their entirety or browse through them piecemeal, but it's best to read them in order. Chapters 8 and 9 are designed for provider writers and consumer writers who are using other database libraries and are looking for a way in to ADO.NET. These chapters are reference-oriented; some idiosyncrasies and analogies in Chapter 8 may be a bit detailed for those who have not written OLE DB code. This chapter is for my friends and students with whom I've learned and written OLE DB providers in the past few years. They always complain that no one ever writes books for them.
What you won't find here are hundreds of pages of class, interface, method, and property information arranged by class name. For this information, I suggest you consult the .NET framework software development kit (SDK) documentation.
Having spent many hours browsing the Windows platform documentation, my impression is that it seems to mature with the Windows platform itself. The framework SDK documentation is probably the best yet, although the OLE DBprogrammer's reference docs were also impressive.
Observation Versus Documentation
There are a few reasons this book was not released earlier. One is that I knew the .NET product would change during the beta process, perhaps drastically. I did not want to produce a "programming with the beta" book as my one contribution to this space, even though these books have their place. Such a book would be obsolete or misleading mere months after it had been published.
In addition, I've tried to verify everything in the documentation through direct experimentation with code. I've also tried to describe some internal behaviors ofthe providers and details of their implementation. These changed (sometimessignificantly) during the beta process, and I was forced to rewrite sections of the book and rethink solutions that I'd come up with. I waited until the final product was released so that I could run the code fragments and experiments one moretime.
Deduction of internals through experimentation and observation is risky because the implementation of these hidden details will change over time. It will change because software vendors are constantly working to update and improve their products. Where there exists the most risk of change, I've attempted to document my experimentation so that you can repeat the experiment as the software or underlying database changes.
Additional code, errata (in the original or due to product changes), and additional information will be posted on this book's Web site http://staff.develop.com/bobb/adonetbook.