Read an Excerpt
Chapter 2: Database Consistency Mechanisms
Once you understand the underlying architecture of DB2 Universal Database, you should become familiar with the mechanisms DB2 uses to provide and maintain data consistency. This chapter is designed to introduce you to the concepts of data consistency and to the three mechanisms DB2 uses to enforce consistency: transactions, locking, and transaction logging. The first part of this chapter defines database consistency and examines some of the requirements a database management system must meet to provide and maintain consistency. This part is followed by a close look at the heart of all data manipulation: the transaction. Next, DB2's locking mechanism is described and how that mechanism is used by multiple transactions working concurrently to maintain data integrity is discussed. Finally, this chapter concludes with a discussion of transaction logging and the data recovery process used by DB2 to restore data consistency if application or system failure occurs.
What Is Data Consistency?
The best way to define data consistency is by example. Suppose your company owns a chain of restaurants, and you have a database designed to keep track of supplies stored in each of those restaurants. To facilitate the supplies purchasing process, your database contains an inventory table for each restaurant in the chain. Whenever supplies are received or used by a restaurant, the inventory table for that restaurant is updated. Now, suppose some bottles of ketchup are physically moved from one restaurant to another. The ketchup bottle count value in the donating restaurant's inventory table needs to be lowered, and the ketchup bottlecount value in the receiving restaurant's inventory table needs to be raised to accurately represent this inventory move. If a user lowers the ketchup bottle count from the donating restaurant's inventory table but fails to raise the ketchup bottle count in the receiving restaurant's inventory table, the data has become inconsistent. Now, the total ketchup bottle, count for the entire chain of restaurants is incorrect.
Data can become inconsistent if a user fails to make all necessary changes (as in the previous example), if the system crashes while the user is in the middle of making changes, or if an application accessing data stops prematurely for some reason. Inconsistency can also occur when several users are accessing the same data at the same time. For example, one user might read another user's changes before the data has been properly updated and take some inappropriate action-or make an incorrect change based on the premature data values read.
To properly maintain data consistency, solutions must be provided for the following questions:
- How can you maintain generic consistency of data if you do not know what each individual data owner or user wants?
- How can you keep a single application from accidentally destroying data consistency?
- How can you ensure that multiple applications accessing the same data at the same time will not destroy data consistency?
- If the system fails while a database is in use, how can the database be returned to a consistent state?
DB2 provides solutions to these questions with its transaction support, locking, an logging mechanisms.
A transaction, or a unit of work, is a recoverable sequence of one or more SQL operations grouped together as a single unit within an application process. The initiation and termination of a transaction define the points of data consistency within an application process. Either all SQL operations within a transaction are applied the data source, or the effects of all SQL operations within a transaction are completely "undone."
Transactions and commitment control are relational database concepts that have been around for quite some time. They provide the capability to commit or recover from pending changes made to a database in order to enforce data consistency and integrity. With embedded SQL applications, transactions are automatically initiated when the application process is started. With Open Database Connectivity (ODBC) and Call-Level Interface (CLI), transactions are implicitly started whenever the application begins working with a data source.
Regardless of how transactions are initiated, they are terminated when they are either committed or rolled back. When a transaction is committed, all changes made to the data source since the transaction was initiated are made permanent. When a transaction is rolled back, all changes made to the data source since the transaction was initiated are removed, and the data in the data source is returned to its previous state (before the transaction began). In either case, the data source is guaranteed to be in a consistent state at the completion of each transaction.
A commit or roll back operation only affects the data changes made within the transaction they end. As long as data changes remain uncommitted, other application processes are usually unable to see them, and they can be removed with the roll back operation. However, once data changes are committed, they become accessible to other application processes and can no longer be removed by a roll back operation.
A database application program can do all of its work in a single transaction or spread its work over several sequential transactions. Data used within a transaction is protected from being changed or seen by other transactions through various isolation levels.
Transactions provide generic database consistency by ensuring that changes become permanent only when you issue a COMMIT SQL statement or via API calls defined within a Transaction Manager. Your responsibility, however, is to ensure that the sequence of SQL operations in each transaction results in a consistent database. DB2 then ensures that each transaction is either completed (committed) or removed (rolled back) as a single unit of work. If a failure occurs before the transaction is complete, DB2 will back out all uncommitted changes to restore the database consistency that DB2 assumes existed when the transaction was initiated. Figure 2-1 shows the effects of both a successful transaction and a transaction that failed.
Concurrency and Transaction Isolation Levels
So far, we have only looked at transactions from a single-user data source point-of-view. With single-user data sources, each transaction occurs serially and does not have to contend with interference from other transactions. With multi-user data sources, however, transactions can occur simultaneously, and each transaction has the potential to interfere with another transaction. Transactions that have the potential of interfering with one another are said to be interleaved, or parallel, transactions. Transactions that run isolated from each other are said to be serializable, which means that the results of running them simultaneously are the same as the results of running them one right after another (serially). Ideally, all transactions should be serializable.
So why should transactions be serializable? Consider the following problem. Suppose a salesman is entering orders on a database system at the same time a clerk is sending out bills. Now, suppose the salesman enters an order from Company X but does not commit the order (the salesman is still talking to the representative from Company X), While the salesman is on the phone, the clerk queries the database for a list of all outstanding orders, sees the order for Company X, and sends Company X a bill. Now, suppose the representative from Company X decides to cancel the order. The salesman rolls back the transaction, because the representative changed his mind and the order information was never committed. A week later, Company X receives a bill for a part it never ordered. If the salesman's transaction and the clerk's transaction had been isolated from each other (serialized), this problem would never have occurred. Either the salesman's transaction would have finished before the clerk's transaction started, or the clerk's transaction would have finished before the salesman's transaction started. In either case, Company X would not have received a bill.
When transactions are not isolated from each other in multi-user environments, the following three types of events (or phenomena) can occur as a result:
- Dirty reads-This event occurs when a transaction reads data that has not yet been committed. For example: Transaction 1 changes a row of data, and Transaction 2 reads the changed row before Transaction 1 commits the change. If Transaction 1 rolls back the change, then Transaction 2 will have read data that is considered never to have existed.
- Nonrepeatable readsThis event occurs when a transaction reads the same row of data twice but receives different data values each time. For example: Transaction 1 reads a row of data, and Transaction 2 changes or deletes that row and commits the change. If Transaction 1 attempts to reread the row, Transaction 1 retrieves different data values (if the row was updated) or discovers that the row no longer exists (if the row was deleted).
- Phantoms-This event occurs when a row of data matches a search criteria but initially is not seen. For example: Transaction 1 reads a set of rows that satisfy some search criteria, and Transaction 2 inserts a new row matching Transaction 1's search criteria. If Transaction 1 re-executes the query statement that produced the original set of rows, a different set of rows will be retrieved.
Maintaining database consistency and data integrity while enabling more than one application to access the same data at the same time is known as concurrency. DB2 enforces concurrency by using four different transaction isolation levels. An isolation level determines how data is locked or isolated from other processes while the data is being accessed. DB2 supports the following isolation levels:
- Repeatable read
- Read stability
- Cursor stability
- Uncommitted read
The repeatable read isolation level locks all the rows an application retrieves within a single transaction. If you use the repeatable read isolation level, SELECT SQL statements issued multiple times within the same transaction will yield the same result. A transaction running under the repeatable read isolation level can retrieve and operate on the same rows as many times as needed until the transaction completes. However, no other transactions can update, delete, or insert a row (which would affect the result table being accessed) until the isolating transaction terminates. Transactions running under the repeatable read isolation level cannot see uncommitted changes of other transactions. The repeatable read isolation level does not allow phantom rows to be seen....