[Figures are not included in this sample chapter]
Indexes can make or break (or even brake) the performance of SQL Server if implemented without much thought. Most programmers know to put an index on the primary key field(s) of a table and even the foreign key field(s) of related tables. But what type of index should you use? Do you need to put indexes on other fields within a particular table? When should you use a composite index?
Locks and transaction isolation levels are another necessary part of optimizing performance and selecting the types of queries to use. For example, you might eventually need to know whether using read uncommitted isolation levels is better than using read committed isolation levels. A better understanding of locks and isolation levels will help you, the programmer, make better choices about indexes and table structure.
We will answer these and other questions in the following section. But to understand when to use indexes, you must understand index structures themselves.
SQL Server provides for two types of indexes: clustered and non-clustered. Although the role of any index is to speed data retrieval and updates, each type of index is tuned to do certain jobs more effectively than the other. In some cases either type of index will provide similarly quick data retrieval, but the performance from those same indexes may differ greatly when there's a minor change to the query.
Most people are familiar with the concept of non-clustered indexes, although they might not be aware of it. This type of index is similar to an index of a book or labels on a file cabinet's drawers. You use the index to find out where an item is located, and then you use the information presented in the index to go to that item (be it a page or a drawer--or a row of data).
Examine Figure 3.1 to see a better picture of a non-clustered index.
This figure is abridged for readability. It shows the index root with its first two entries: Alvin and Marti. The index root is a page of data that contains evenly dispersed entries from the entire set of data from the indexed column. The number of entries that will fit in this initial page will depend upon the size of the column being indexed.
Next, each entry points to another page in the index node level. The first page of data will contain evenly dispersed entries from the first item in the index root to the second item in the index root. Depending on how many rows are in the table being indexed, the number of levels in the index nodes will vary. This figure shows just one level, but larger tables with larger indexed columns could increase the number of levels of index nodes.
Figure 3.1. The structure of a non-clustered index.
Imagine you were trying to find an order number that ranged from 1 to 1048576. If each index page could only hold four entries, the index root would contain 1, 262144, 524288, and 786432. The first page in the first index node level would contain 1, 65536, 131072, and 196608. The first page in the next index node level would contain 1, 16384, 32768, and 49152. The first page in the next index node level would contain 1, 4096, 8192, and 12288. This would continue until the tenth level, known as the leaf level, where the first page would contain 1, 2, 3 and 4; the second page would contain 5, 6, 7 and 8; and so on.
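The arithmetic generalizes: with f entries per index page, you need enough levels for f raised to the number of levels to cover all the values. The following sketch (Python, purely for illustration; a real SQL Server index's fanout depends on key and page size) counts the levels for the chapter's example:

```python
def index_levels(num_entries, entries_per_page):
    """Number of index levels (root through leaf) needed when each
    page holds entries_per_page pointers to the level below."""
    levels = 1
    coverage = entries_per_page  # values one root page can reach so far
    while coverage < num_entries:
        coverage *= entries_per_page
        levels += 1
    return levels

# The chapter's example: 1,048,576 order numbers, 4 entries per page.
# 4 to the 10th power is 1,048,576, so the leaf is the tenth level.
print(index_levels(1048576, 4))  # 10
```

Real indexed columns hold far more than four entries per page, which is why production indexes rarely need more than a handful of levels.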
This leaf level points to the actual data (in the data level) being sought, and is not necessarily in any particular order. Hence the crossing of the lines in Figure 3.1. The name "Marti" was entered before the name "Alvin," and thus comes before that entry in the table's data pages. Information in a book (such as this one) is not presented in alphabetical order, but rather by topic. The last names in a customer table are not necessarily in alphabetical order, but rather ordered based on when they were entered into the table.
Clustered indexes are a little different than their non-clustered counterparts. Whereas a textbook uses the equivalent of a non-clustered index, a dictionary or encyclopedia uses a mechanism like that of a clustered index. Encyclopedias are often separated into volumes based on the letter of the topic you want to investigate (index nodes). Then, the topics are listed alphabetically and the pages are marked to reveal the topic that starts on that page and the one that ends on that page (leaf level). However, the data itself is already in order. There is no need to find the topic in the index and then look it up based on some page number reference.
This is how a clustered index functions. The data itself is actually reorganized to match with the index nodes and, therefore, is combined with the leaf level. Examine Figure 3.2 for a better view of this phenomenon.
Figure 3.2. The structure of a clustered index.
This structure has the potential to both increase and decrease performance when manipulating data. You will read more about this in the coming sections.
Because the data and leaf levels are now one, clustered indexes require less space than an equivalent non-clustered index. And because the data is reorganized with the clustered index, you can only create one clustered index per table, as opposed to non-clustered indexes, of which there may be up to 254.
So when should you use a clustered index and when should you use a non-clustered index? Let's look at some examples and determine which index would be a better choice.

ORDER BY and GROUP BY Clauses

If you are often selecting data and using the ORDER BY and/or the GROUP BY clause, either type of index will help with SELECT performance. If you typically select customers and sort by last and first name, either index will provide a quick means of retrieving that data. Some of the following factors may sway you to use one type or the other, however.

Returning a Range of Values

For example, if you are returning all names between 'Smith' and 'Talbert', or orders for dates between '11/1/97' and '11/30/97', and you do this sort of thing often, you are better off using a clustered index on the particular column on which the range is based. Because a clustered index already contains the data in sorted order, it is more efficient for retrieving data that is in a particular range. It only needs to find the start and end of the data to retrieve all of it, unlike a non-clustered index, which needs to look up each entry from the leaf level in the data level.
Some columns in your table will contain few, if any, unique values. An example is a status column that contains only the values 'Inactive', 'Active', or 'Terminated'. In such a case, it is not wise to use any type of index on that column. The justification is simple: If you have a table with 15,000 rows, approximately 1/3 (or 5,000) of the rows will contain 'Active' in the status column. It is just as efficient, if not more so, to scan the entire table than to look up each entry in the index pages (the index root and node) and then find the actual data page on which each row with an 'Active' status resides. Listing 3.1 is an example script that will create a simple table with few unique values and an index on the column containing those highly duplicated values. This script can be found in the file SCR0301.SQL from Sams' Web site for this book. Don't be alarmed when this script takes a few minutes to run (you will be inserting 15,000 rows).
CREATE TABLE FewUniques (
    Id int IDENTITY(1,1) NOT NULL,
    Status char(10) NULL
)
GO
SET IDENTITY_INSERT FewUniques ON
DECLARE @intCounter int
BEGIN TRAN
SELECT @intCounter = 1
WHILE @intCounter <= 15000
BEGIN
    INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Active')
    SELECT @intCounter = @intCounter + 3
END
SELECT @intCounter = 2
WHILE @intCounter <= 15000
BEGIN
    INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Inactive')
    SELECT @intCounter = @intCounter + 3
END
SELECT @intCounter = 3
WHILE @intCounter <= 15000
BEGIN
    INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Terminated')
    SELECT @intCounter = @intCounter + 3
END
COMMIT TRAN
SET IDENTITY_INSERT FewUniques OFF
GO
DUMP TRANSACTION pubs WITH NO_LOG
GO
CREATE INDEX inFewUniquesStatus ON FewUniques (Status)
GO
Next, you can run the two SELECT statements shown in Listing 3.2. Be sure to turn on the Query options "Show Query Plan" and "Show Stats I/O." The results will amaze you.
--Force the Query Optimizer to use a table scan
SELECT *
FROM FewUniques (index=0)
WHERE Status = 'Inactive'

--Force a particular index to be used by the Query Optimizer
SELECT *
FROM FewUniques (index=inFewUniquesStatus)
WHERE Status = 'Inactive'
You may notice the extra code in parentheses after the table name in each query. This feature (known as an optimizer hint) will be discussed in the "Using Index Optimizer Hints" section later in this chapter. The Stats I/O information shows two very different results (see Listing 3.3). The first SELECT statement forced a table scan and only needed to perform 157 reads from memory (all the data was in memory because it was just inserted, so no disk or physical reads needed to be done). The second SELECT statement required 5,053 reads.
--Stats I/O from table scan access (comment was added)
Table: FewUniques scan count 1, logical reads: 157, physical reads: 0, read ahead reads: 0

--Stats I/O from indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 5053, physical reads: 0, read ahead reads: 0
The statistics page, the index root, the index nodes, and the data pages all must be read when selecting data via an index. This normally will decrease the number of reads required, but when the number of records being returned is high, more reads are required.
What if the number of unique values increases? And what if the table is larger? As the number of rows in the table and the number of unique values in a column grow, the index becomes more beneficial. Examine and run the code in Listing 3.4.
DROP TABLE FewUniques
GO
CREATE TABLE FewUniques (
    Id int IDENTITY(1,1) NOT NULL,
    Status char(10) NULL,
    Col3 char(20) NOT NULL,
    Col4 char(50) NOT NULL
)
GO
DECLARE @intNum int
SELECT @intNum = 0
BEGIN TRAN
WHILE @intNum <= 1300
BEGIN
    INSERT FewUniques VALUES (CHAR(@intNum % 26 + 65), 'test3', 'test4')
    SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE INDEX inFewUniquesStatus ON FewUniques (Status)
GO
This script will drop and re-create the FewUniques table (with some modifications to increase the row size). It then inserts data so that the status column will contain 'A', 'B', ..., 'Z', 'A', and so on. Next, it creates an index (once again) on the status column. When the two SELECT statements in Listing 3.5 are executed, the results are quite different (see Listing 3.6).
--Force the Query Optimizer to use a table scan
SELECT *
FROM FewUniques (index=0)
WHERE Status = 'A'

--Force a particular index to be used by the Query Optimizer
SELECT *
FROM FewUniques (index=inFewUniquesStatus)
WHERE Status = 'A'
--Stats I/O from table scan access (comment was added)
Table: FewUniques scan count 1, logical reads: 55, physical reads: 0, read ahead reads: 0

--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 53, physical reads: 0, read ahead reads: 0
The end results show that by using the index, two fewer pages needed to be read. The index has now become efficient. As the table grows in size (either number of rows or size of rows), this difference will increase and make the index the preferable method of accessing the data. A clustered index will do even more for the performance of this query, requiring still fewer reads than an equivalent non-clustered index.
Listing 3.7 shows another version of our table FewUniques where a clustered index is being created instead of a non-clustered one. This will drastically increase performance when you're selecting data from the table based on the status column.
DROP TABLE FewUniques
GO
CREATE TABLE FewUniques (
    Id int IDENTITY(1,1) NOT NULL,
    Status char(10) NULL,
    Col3 char(20) NOT NULL,
    Col4 char(50) NOT NULL
)
GO
DECLARE @intNum int
SELECT @intNum = 0
BEGIN TRAN
WHILE @intNum <= 1300
BEGIN
    INSERT FewUniques VALUES (CHAR(@intNum % 26 + 65), 'test3', 'test4')
    SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesStatus ON FewUniques (Status)
GO
Listing 3.8 shows the results of querying this table. The query is the same as Listing 3.5, with one change--the index hint names icFewUniquesStatus instead of inFewUniquesStatus. Note that the clustered index was much more efficient at retrieving the data. Because the data resides on the leaf level of the index, no additional jump to the data pages was necessary to read the data. Thus, fewer page reads needed to be performed.
--Stats I/O from table scan access (comment was added)
Table: FewUniques scan count 1, logical reads: 78, physical reads: 0, read ahead reads: 0

--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 5, physical reads: 0, read ahead reads: 0
As you have seen, clustered indexes help when there are some unique values in the column being examined, but a non-clustered index is the preferred method of data access when the number of unique values approaches the number of rows in the table. Although the number of pages read will not vary much between non-clustered and clustered index access, the write performance of the table is now in question.
Whenever you make a change to the column that is indexed, SQL Server has to make modifications to the index using that column. When using a clustered index, this requires possible changes to the index root, changes to the index nodes, and the insertion of the entire row in the data/leaf level, potentially requiring a shift of rows.
When you modify a column that has a non-clustered index, the index root may change, index nodes will change, and the leaf level will change. But the data level (separate from the leaf level) will not need to be shifted around because rows will be appended if needed. Because the data pages are handled separately from the index pages, less movement of entire rows of data will be required. This allows for quicker modifications to the data.
TIP Remember to update the statistics (UPDATE STATISTICS) and rebuild your indexes (DBCC DBREINDEX) on a regular basis. Over time, indexes become, well, fragmented, in a similar fashion to a hard drive. Rebuilding an index is really a kind of index defragmentation. Updating statistical information helps the Query Optimizer make better decisions about how it should process the query. Statistics are always updated when you rebuild an index.
If you are instead using the indexed column to find the row that needs to be updated, a clustered index will get to the data faster. If you never update the column(s) on which the clustered index is built, you will not hurt performance when updating data because the index's data isn't involved. We will see more about this in the "Using Indexes for Retrieving and Updating Data" section later in this chapter.
The winner--most of the time--is the clustered index. The exception to the rule appears in a later section. But for most cases, the clustered index will do a much better job of retrieving a range of data. Listings 3.9 through 3.11 show the table and index creation, the two SELECT statements (using non-clustered and clustered indexes), and the Stats I/O results from both.
DROP TABLE FewUniques
GO
CREATE TABLE FewUniques (
    Id int IDENTITY(1,1) NOT NULL,
    Status char(20) NOT NULL
)
GO
DECLARE @intNum int
SELECT @intNum = 0
BEGIN TRAN
WHILE @intNum <= 5000
BEGIN
    INSERT FewUniques VALUES ('test' + CONVERT(char(6), @intNum))
    SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesId ON FewUniques (Id)
GO
CREATE INDEX inFewUniquesId ON FewUniques (Id)
GO
SELECT Status
FROM FewUniques (index=0)
WHERE Id BETWEEN 1000 AND 1500

SELECT Status
FROM FewUniques (index=inFewUniquesId)
WHERE Id BETWEEN 1000 AND 1500

SELECT Status
FROM FewUniques (index=icFewUniquesId)
WHERE Id BETWEEN 1000 AND 1500
--Stats I/O from table scan access (comment was added)
Table: FewUniques scan count 1, logical reads: 71, physical reads: 8, read ahead reads: 54

--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 506, physical reads: 0, read ahead reads: 0

--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 9, physical reads: 0, read ahead reads: 0
Even the table scan was more efficient than the non-clustered index. As I stated earlier, if more than a few rows are being returned, using a non-clustered index is usually a poor choice. The clustered index requires the least number of reads to get the job done. SQL Server searches for only the first and last values in the index node pages and finds the first and last page on which the data resides. Then the data/leaf pages are read sequentially to get the requested information.
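A rough accounting shows where these numbers come from. With a non-clustered index, every qualifying row costs an extra jump to a data page; with a clustered index, the rows come straight off consecutive leaf pages. The model below is a back-of-envelope sketch (Python for illustration; the page capacities and level counts are assumptions chosen to approximate the Stats I/O figures above, not values measured from SQL Server):

```python
import math

ROWS_RETURNED = 501           # Id BETWEEN 1000 AND 1500, inclusive
LEVELS_ABOVE_LEAF = 2         # assumed: root plus one node level
NC_LEAF_KEYS_PER_PAGE = 167   # assumed fit for a small key-only leaf entry
CL_ROWS_PER_LEAF_PAGE = 83    # assumed fit for the full row on a 2KB page

# Non-clustered: traverse the tree, read the leaf range, then visit a
# data page once for every single row returned.
nc_reads = (LEVELS_ABOVE_LEAF
            + math.ceil(ROWS_RETURNED / NC_LEAF_KEYS_PER_PAGE)
            + ROWS_RETURNED)

# Clustered: traverse the tree once, then read the rows sequentially
# from the leaf pages; there is no separate data level to visit.
cl_reads = LEVELS_ABOVE_LEAF + math.ceil(ROWS_RETURNED / CL_ROWS_PER_LEAF_PAGE)

print(nc_reads)  # 506
print(cl_reads)  # 9
```

The per-row data-page jump is the term that dominates the non-clustered cost, which is exactly why it loses to a plain table scan once enough rows qualify.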
The decision to create a clustered index on a primary key really depends on whether another index would benefit more from being clustered. It isn't necessary to make the primary key a clustered index. Another index--one with frequent range retrievals, for example--might be a better candidate for the clustered index. If this situation exists, use your good judgment and make the primary key a non-clustered index.
As for foreign keys, the same rule applies. Keep in mind, however, that foreign keys often contain repeated values (being the "many" side of a one-to-many relationship) and thus often fit the criteria for becoming a clustered index. Again, you are not obliged to make a foreign key a clustered index, but if it is the best candidate, there should be no question in your mind. If the table contains both a primary and foreign key, the decision becomes a little more involved.
For example, the titles table in the pubs database could have been designed to have a clustered index on the pub_id field and a non-clustered index on the title_id (primary key) field. The pub_id of titles will not have a unique set of data (most publishers produce more than one book). A clustered index on the pub_id column would be wiser than one on the title_id column because the latter has no duplicates. Thus, you would see a better overall performance when retrieving data from titles and publishers. If the publishers table is rarely joined to the titles table, or only a few titles are ever selected when joining to the publishers table, the present scenario of a clustered index on the title_id field and no index on the pub_id field is better.
The key is to know what types of queries will be performed. The more you know about what the users will be retrieving from the database, the easier the decisions about what types of indexes to use will be.
The following table contains a summary of when to use clustered and non-clustered indexes.
| Scenario | Use Clustered | Use Non-Clustered |
|---|---|---|
| Columns often grouped/ordered by | Yes | Yes |
| Return a range of values | Yes | No |
| Low number of unique values | Yes | No |
| High number of unique values | No | Yes |
| One or a few unique values | No | No |
| Low number of returned rows | No | Yes |
| Frequently updated columns | No | Yes |
| Foreign key columns | Yes | Yes |
| Primary key columns | Yes | Yes |
| Frequent indexed column modifications | No | Yes |
Of course, on a regular basis you will encounter situations that really are a combination of these conditions. Test out each type of index and see which will best do the job for you.
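That testing advice can start from a checklist. The helper below is a toy restatement of the summary table (the function name, parameters, and thresholds are mine, invented for illustration; treat it as a starting point for your own tests, not a rule):

```python
def suggest_index(unique_ratio, range_queries, frequent_updates_to_key):
    """Toy heuristic mirroring the summary table.

    unique_ratio: fraction of distinct values in the column (0.0 to 1.0)
    range_queries: True if the column is often filtered with BETWEEN-style ranges
    frequent_updates_to_key: True if the indexed column itself changes often
    """
    if unique_ratio < 0.001:
        return "no index"        # one or a few unique values: a scan wins
    if frequent_updates_to_key:
        return "non-clustered"   # cheaper maintenance when the key changes
    if range_queries or unique_ratio < 0.5:
        return "clustered"       # ranges and duplicate-heavy columns
    return "non-clustered"       # highly unique columns, point lookups

print(suggest_index(0.3, True, False))    # clustered
print(suggest_index(0.99, False, False))  # non-clustered
```

When two conditions pull in opposite directions, as the text says, only measuring both index types against your real queries settles it.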
When creating composite (multi-column) indexes, it is important to keep a few rules in mind. The order of the columns plays a very important role in the efficiency of the index and even the Query Optimizer's decision to use the index. Too many columns can also dramatically increase the size of the index, taking up more space and requiring more time to find the information being sought.
TIP The following SELECT statement (replace column and table information as needed) will find duplicate items within Col1 and Col2 (and their count) in Table1.
SELECT Col1, Col2, COUNT(*) AS Total
FROM Table1
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
This is known as a covered query. If the index contains all the data that needs to be retrieved, there is no need to go to the data pages to get the data because the index already can supply all the requested information. Covered queries should only be used if the same few columns are very frequently requested. In addition, do not forget about performance when modifying data. If some of these fields are updated frequently, the index will need more maintenance from you and SQL Server. If the speeding up of the SELECT statement will not make up for the slowing down of these updates, a covered query is not recommended.
For example, let's say you create an index on the authors table that contains the following columns: au_lname, city, state, and phone. You are requesting the information (for a directory listing) from these four columns on a regular basis. However, you find that the authors tend to move a lot (we're funny that way), requiring changes to the city, state, and phone columns as frequently as you look up the information. The index is causing the updates to move much more slowly because three of the four columns get updated often, and the retrieval increase does not make up for that lost time. Thus, it's time to lose the index.
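The break-even reasoning in that example can be made explicit. The sketch below is a toy cost model (all per-operation figures are invented for illustration; they are not measurements from SQL Server):

```python
def covered_index_worthwhile(selects_per_day, reads_saved_per_select,
                             updates_per_day, extra_writes_per_update):
    """Net page I/O saved per day by keeping a covering index.

    Positive means the faster SELECTs outweigh the extra index
    maintenance on UPDATE; negative means it's time to lose the index.
    """
    savings = selects_per_day * reads_saved_per_select
    cost = updates_per_day * extra_writes_per_update
    return savings - cost

# Directory lookups and author moves happening about equally often,
# with each move touching three of the four indexed columns:
print(covered_index_worthwhile(100, 4, 100, 6))  # -200: drop the index
print(covered_index_worthwhile(1000, 4, 10, 6))  # 3940: keep it
```

The same index becomes a clear win once reads dominate writes, which is why the read/write mix of the workload, not the schema alone, should drive the decision.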
Listing 3.12 creates some more sample data in order to test the covered query scenario. Listing 3.13 selects data using both a clustered and non-clustered index. Listing 3.14 shows that the non-clustered index requires fewer reads to get the same data, even though a range of values is being retrieved.
DROP TABLE FewUniques
GO
CREATE TABLE FewUniques (
    Id int IDENTITY(1,1) NOT NULL,
    Status char(20) NOT NULL
)
GO
DECLARE @intNum int
SELECT @intNum = 0
BEGIN TRAN
WHILE @intNum <= 5000
BEGIN
    INSERT FewUniques VALUES ('test' + CONVERT(char(6), @intNum))
    SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesId ON FewUniques (Id)
GO
CREATE INDEX inFewUniquesId ON FewUniques (Id)
GO
SELECT Id
FROM FewUniques (index=inFewUniquesId)
WHERE Id BETWEEN 1000 AND 1500

SELECT Id
FROM FewUniques (index=icFewUniquesId)
WHERE Id BETWEEN 1000 AND 1500
--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 5, physical reads: 0, read ahead reads: 0

--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques scan count 1, logical reads: 9, physical reads: 0, read ahead reads: 0
The results show that no data pages were required to be read by the non-clustered index because it contained all the data being retrieved. It even beat the clustered index on range retrieval (because the data/leaf level pages take up more room). If the table had a larger row size, the non-clustered index would be that much more efficient than its clustered counterpart.
Yes, Virginia, you can have too many indexes. Indexes can certainly help with data retrieval performance, but too many indexes can lead to inefficiency. With each index that you add to a table, that much more work must be done to maintain the set of indexes. Standard maintenance tasks, like updating table statistics and rebuilding indexes, will take longer with each new index you create. If an index is based on a column that gets updated, that is also additional work for SQL Server to perform every time you update that column.
If you are creating a data warehouse and few updates will be performed on the data, have at it. Create all the indexes you need to help speed up data retrieval. If, on the other hand, you are creating an online transaction processing (OLTP) database, you probably should keep the number of indexes to a minimum so that the data modifications will be faster.
Part of the standard database testing should include trying various combinations of clustered and non-clustered indexes. Some scenarios would reveal that several indexes fit in both the clustered and non-clustered index categories. The only way to find out which is best is to give each a try. Test out queries with each type and see which is more beneficial. Then let each test run for a few days or so and check with the users to see if they notice any difference in performance.
TIP Try varying the order of clustered and non-clustered indexes when you're asking the users about performance. You may find that they always see an increase on the second go-around, for example. If you vary the order of your index type when testing with users and see a pattern like this, you know they are being primed to see better (or worse) performance the second time around. You must then rely on your own tests to see what works best.
When is an index good for both data retrieval and modification? The answer: When the field of the index is used to locate data for retrieval and modification but is not modified itself and when data is more frequently updated than inserted or deleted from the table. For example, the authors table in the pubs database has a key column au_id, which is used to find the author for retrieval and updates but is never modified itself.
All indexes cause a decrease in performance when inserting data. Because there is new information being introduced to the table, the index must react accordingly. If a new author is inserted into the authors table, a new entry in all indexes must also be added, adding time to the process.
The performance can increase when deleting data from a table if the field of the index is used to locate the row(s) to be deleted. Although some maintenance by SQL Server is required when removing a row from a table, the increased speed in finding the record to be deleted will normally outweigh the decreased speed of that maintenance. If you delete an author with an au_id of '123-45-6789' by using an index that has a key of only au_id, this index speeds the search for said author. Although an entry must be removed from the index (because the row is no longer in the table), the amount of time it takes to do this is much less than the amount of time saved by using the index to find the author in the first place.
Updating data always results in a performance increase if the field in the index is used to locate the row and if that field is not changed by any of the updates. If you update the field within an index, you cause the same maintenance by SQL Server that results in a performance decrease. If you were to update the author whose au_id was '111-22-3333' and an index using just au_id was present, using it would decrease the amount of time needed to find the record and make the change. Without it, SQL Server would have to search the entire table to make sure it found all records that needed updating.
Retrieving data will usually result in a speed boost if the index's field is used to locate the data. Keep in mind the information already presented in this chapter. If you select all rows of a table, it's best to use no index, because the additional page reads of the index will just be added onto the number of data pages in the table, resulting in more work for SQL Server.
This section aims to fill in a few gaps before going on to the discussion of locking and query optimization. The topic of transaction isolation levels introduces the four basic locking schemes in SQL Server.
SQL Server syntactically provides for the four transaction isolation levels as prescribed in the ANSI-SQL standard. The four transaction isolation levels are

- Read uncommitted
- Read committed
- Repeatable read
- Serializable
Two of these, repeatable read and serializable, are functionally the same in SQL Server and are provided for compatibility with ANSI SQL. (So it's actually only three isolation levels.) Read committed is the default isolation level for SQL Server connections. Transaction isolation levels can be set as follows:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
This setting is on a per-connection basis and lasts until the connection is dropped or the transaction isolation level is changed. You can temporarily override the setting by influencing the query optimizer (see "Using Locking Optimizer Hints" later in this chapter).
Potentially, read uncommitted will give you the greatest increase in performance because it will read data regardless of its current locking state. Locking concurrency does not exist as far as this level is concerned. But this also provides for the least data integrity, allowing you to read data that has not yet been committed (mid-transaction). Imagine if a user was running a long update on some book prices, and at the same time another user was retrieving title prices with read uncommitted in effect. If the first update was rolled back, the second user's data would be incorrect, reflecting whatever values were in the table at the time it was read.
Read committed is the default setting for SQL Server. When data is being modified, a request to read that data will wait until the modifier's transaction has finished. This prevents data from being read until it has been committed, which provides better data integrity but slows down performance.
When SQL Server reads data, normally the lock is released as soon as SQL Server is done. However, repeatable read and serializable cause locks to be held on data being read until the transaction in which the read exists has completed. This prevents what is known as the phantom phenomenon--reading different values from the same data within a single transaction. Although these two transaction isolation levels provide for the best data integrity, they also have the biggest potential for performance decrease because they hold locks on data that is being read. Unless it is vital to have consistent reads within the same transaction, these two should be avoided.
NOTE SQL Server treats both repeatable read and serializable the same, even though standard ANSI SQL differentiates between the two (although it's a subtle differentiation). Because they work the same in SQL Server, you can use either one when coding.
Now to the heart of the matter. We have only mentioned locking in passing up until this point. We will now examine SQL Server's locking mechanism and how indexes and transaction isolation levels affect it. We will first examine the types of locks available in SQL Server. Next, we will investigate how you can use indexes to decrease locking concurrency. Finally, we will take a look at transaction scopes and the locking mechanism.
There are two basic categories of locks in SQL Server: table and page. Table locks affect the entire table on which the data resides. If a table is locked for a read, no data modifications may occur anywhere in the table until that read is finished. If only a page is locked for a read within the table, other pages may be modified at will.
Based on this information, page locks are generally preferable to table locks. But what exactly are the available locks, and how do they differ? Table 3.2 shows the available page locks in SQL Server and their compatibility with each other.
An Update lock is a special type of shared lock that is not compatible with itself. It is used to indicate that an update will be performed next, and its incompatibility with itself prevents more than one process from escalating to an exclusive lock.
Denoted by asterisks in Table 3.2, Insert_Page and Link_Page are the two locks used in Insert Row-Level Locking, which allows more than one row to be inserted at the same time on the same (last) page of a table. Insert_Page allows for compatibility with itself and is the lock that allows for multiple inserts on the same page of data. Link_Page is used to go from a full page to the next page of data. An Insert_Page lock automatically escalates to a Link_Page lock when the data page is full.
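Insert Row-Level Locking is not on by default; it is enabled per table with the sp_tableoption system stored procedure. As a sketch (using the authors table from the pubs sample database):

```sql
-- Enable Insert Row-Level Locking (IRL) on the authors table
EXECUTE sp_tableoption 'authors', 'insert row lock', 'true'
```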
Table locks are all-encompassing, as if you were locking all pages at the same time. Examine Table 3.3 for a list of table locks and their compatibility with each other.
| ||Exclusive||Shared||Exclusive Intent||Shared Intent|
|Exclusive||No||No||No||No|
|Shared||No||Yes||No||Yes|
|Exclusive Intent||No||No||Yes||Yes|
|Shared Intent||No||Yes||Yes||Yes|
Intent locks are actually a reflection of the page locks that are occurring in the table. Consider the following scenario: A user runs a query that causes a Shared page lock, and another user runs a query that needs to place a Shared lock on the entire table. This is acceptable because a Shared Intent table lock allows for a Shared table lock. If another user runs a query that needs to place an Exclusive table lock while a page has a Shared lock, the request has to wait until the page lock is released (provided no other Shared page locks are acquired in the interim).
If more than four Shared lock requests occur while an Exclusive lock is waiting, the Exclusive lock will be next in line and additional Shared locks will wait until the previous Shared and Exclusive locks have completed. This is known as a demand lock and is automatically performed by SQL Server.
A third category of locks, extent locks, is automatically allocated by SQL Server when needed. An extent (8 pages) can be locked by the CREATE and DROP statements and by the INSERT and UPDATE statements if new pages for data or indexes are being added to the table.
So why all the talk about locking? Well, it just so happens that if you can lock as few pages as possible when manipulating data, you will potentially increase the performance of SQL Server by not making processes wait for locked data to be freed. If 200 users are modifying different customers' data, it is more beneficial to lock only the page on which the individual customer's information resides than to lock the entire table. Although several customers may reside on the same page, thus making some requests wait until each has completed, this is preferable to making all requests wait in a queue while each modification locks the entire table.
So how do you request page locks instead of table locks? By using indexes, of course. As mentioned earlier, an index on a field that locates the data to be modified will increase performance by going right to the page where the data resides, rather than wading through all the data pages to find the particular row. Well, if you only go to the row's page in the table, you only need to lock that page, rather than the entire table, when updating the data.
This helps increase performance yet again. Not only are fewer pages read when making the change to the row, but fewer pages are locked, potentially preventing others from having to wait until your modification has finished.
Certain locking rules apply within a transaction, depending on what type of transaction isolation level is currently in effect. Table 3.4 helps describe those differences.
Table 3.4. Lock Type and Length
|Isolation Level||Shared Locks||Exclusive Locks|
|Read uncommitted||As needed||Length of transaction|
|Read committed||As needed||Length of transaction|
|Repeatable read||Length of transaction||Length of transaction|
As you can see, Shared locks are normally held only while data is actually being retrieved. The Shared lock is released once that process is completed, even if the transaction has not yet completed. This is why the phantom phenomenon can happen. If the transaction isolation level is set to repeatable read (or serializable), the Shared lock is held for the length of the transaction. Exclusive locks are always held for the length of the transaction, regardless of the isolation level.
What does this mean? It means your transactions should be only as long as needed to maintain data integrity. Lengthy transactions that modify data (that use Exclusive locks, in other words) can slow down SQL Server by making other processes wait for those transactions to complete. The longer a transaction that contains data modification statements takes to execute, the greater the potential for locking concurrency.
Listing 3.15 shows a sample script that will hold onto a Shared lock in the authors table. It also has a delay that lasts two minutes to give you time to check the current server activity.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN
SELECT * FROM authors
WHERE au_lname = 'Green'
WAITFOR DELAY '00:02:00'
ROLLBACK TRAN
GO
Once you execute this script, you can go to the Server menu and select Current Activity in Enterprise Manager to see what is happening in the server. Your results should look something like Figure 3.3.
Figure 3.3. Current Activity window while shared page locks are in effect.
The Activity Legend reveals that shared page locks and intent table locks are being taken out by SQL Server when retrieving the data. The Current Activity window shows that the Shared locks are held until the transaction is completed (in other words, until the WAITFOR and ROLLBACK TRAN statements are completed).
Now we get to the good stuff: Telling SQL Server how to perform its queries. SQL Server has excellent decision-making capabilities when it comes to figuring out which index, if any at all, would get the job done most efficiently. Sometimes, however, complicated queries with many joins and elaborate WHERE clauses can be interpreted incorrectly. SQL Server provides a mechanism to override the Query Optimizer's decision and specify exactly what indexes, locking, and table order should be used to get or change the data.
You have already seen this feature in action in previous sections of this chapter. The following code listing demonstrates two examples of data retrieval with index optimizer hints from the authors table.
SELECT au_lname, au_fname
FROM authors (index = 0)

SELECT au_lname, au_fname
FROM authors (index = 1)
WHERE au_id = '213-46-8915'

SELECT au_lname, au_fname
FROM authors (index = aunmind)
WHERE au_lname = 'Green'
The first SELECT statement uses the hint (index = 0) to force a table scan (no index) when retrieving the data. The second SELECT statement uses the hint (index = 1) to force the use of the clustered index. The third SELECT statement forces the use of the non-clustered index aunmind with the hint (index = aunmind). The index number and name are interchangeable (except for table scans, where 0 must be used). Therefore, if you preferred, the third SELECT statement could just as well have been written as shown in Listing 3.17.
SELECT au_lname, au_fname
FROM authors (index = 2)
WHERE au_lname = 'Green'
Because the aunmind index in the authors table has an ID of 2, either value can be used by the query optimizer.
TIP The index ID can be found in the sysindexes table of the database in which the index resides. An ID of 0 means no index, an ID of 1 always represents the clustered index, and IDs 2 through 255 represent non-clustered indexes.
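For example, you can list the index IDs for the authors table with a query like the following sketch (run from the pubs database):

```sql
-- List index names and IDs for the authors table
SELECT name, indid
FROM sysindexes
WHERE id = OBJECT_ID('authors')
ORDER BY indid
```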
Listing 3.18 is a variation on Listing 3.15, and shows how a table lock can be used instead of page locks by forcing a table scan.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN
SELECT * FROM authors (index=0)
WHERE au_lname = 'Green'
WAITFOR DELAY '00:02:00'
ROLLBACK TRAN
GO
Once again, you need to go to the Current Activity window to see the results shown in Figure 3.4. If the Current Activity window is already open, you will need to refresh the displayed data by clicking the Refresh button.
Figure 3.4. The Current Activity window while a table lock is in effect.
Let's take the example in Listing 3.18 one step further. By changing the optimizer hint, we can make SQL Server do an Exclusive table lock on the authors table (as shown in Listing 3.19 and Figure 3.5).
BEGIN TRAN
SELECT * FROM authors (tablockx)
WHERE au_lname = 'Green'
WAITFOR DELAY '00:02:00'
ROLLBACK TRAN
GO
Figure 3.5. The Current Activity window while an Exclusive table lock is in effect.
This script will give activity results like those shown in Figure 3.5. The SELECT statement uses the optimizer hint of tablockx to keep exclusive table locks in place until the transaction has completed.
The available locking optimizer hints are shown in Table 3.5.
|Hint||Description|
|holdlock||Hold the lock until the end of the transaction.|
|nolock||Do not use locks when retrieving data.|
|paglock||Use page locks.|
|tablock||Use a table lock.|
|tablockx||Use an exclusive table lock.|
|updlock||Use an update lock.|
holdlock can be used to hold Shared locks for the duration of the transaction. This should already sound familiar to you in the form of the serializable and repeatable read transaction isolation levels. If you only occasionally need to hold Shared locks, you are better off keeping the default transaction isolation level of read committed and using the holdlock optimizer hint as needed.
nolock has the same functionality as the read uncommitted transaction isolation level. It forces the reading of uncommitted data by not requiring any locks while reading the data (thus bypassing the blocking by any exclusive locks).
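As a sketch (again using the pubs sample tables), the following statements contrast the two hints:

```sql
-- Hold the shared lock until the transaction ends,
-- without changing the connection's isolation level
BEGIN TRAN
SELECT * FROM authors (holdlock)
WHERE au_lname = 'Green'
COMMIT TRAN

-- Read without taking shared locks (dirty reads possible),
-- equivalent in effect to read uncommitted
SELECT title_id, price FROM titles (nolock)
```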
One final note regarding index and locking optimizer hints: You can combine both types, but it is important to list the index hint last. The code in Listing 3.20 shows an example of a valid set of optimizer hints.
SELECT * FROM authors (paglock holdlock index=aunmind)
There is no optimizer hint to place in the SELECT statement that will force tables to be selected in the order in which they appear. Rather, SQL Server will decide which table would be the better choice to select from first, based on what data is being requested and what is in the WHERE clause. So when tables are joined, SQL Server decides which one it uses first.
Well, not exactly. You see, there is a way to force the optimizer to use the tables in the order they are presented. It's just not part of the SELECT statement itself. Rather, you use the SET FORCEPLAN statement to tell SQL Server to use the table order as it appears or to decide on its own. Examine Listing 3.21 and its query plan in Listing 3.22.
SELECT au_lname, title
FROM titles t
JOIN titleauthor ta ON ta.title_id = t.title_id
JOIN authors a ON a.au_id = ta.au_id
WHERE au_lname = 'Green'
STEP 1
The type of query is SELECT
FROM TABLE
    authors a
Nested iteration
Index : aunmind
FROM TABLE
    titleauthor ta
Nested iteration
Table Scan
FROM TABLE
    titles t
Nested iteration
Using Clustered Index
Although the titles table is listed first, then titleauthor, and finally authors in the FROM clause, authors is used first by the query optimizer because its data is being preselected. We can force SQL Server to use the tables as they appear, as shown in Listings 3.23 and 3.24.
SET FORCEPLAN ON
SELECT au_lname, title
FROM titles t
JOIN titleauthor ta ON ta.title_id = t.title_id
JOIN authors a ON a.au_id = ta.au_id
WHERE au_lname = 'Green'
SET FORCEPLAN OFF
STEP 1
The type of query is SELECT
FROM TABLE
    titles t
Nested iteration
Table Scan
FROM TABLE
    titleauthor ta
Nested iteration
Table Scan
FROM TABLE
    authors a
Nested iteration
Table Scan
Now SQL Server is using titles first, then titleauthor, and finally authors, just as they are listed in the FROM clause. In this case, however, examination of the Stats I/O would reveal that the latter is much more inefficient than the former. SQL Server decided to use authors first because it would cut down on the number of lookups in the other tables by limiting the authors table to a single or few rows. In most cases, SQL Server will make the best decision. But if you find a complex join query that's running more slowly than you think it should, you may want to play around with table order and the SET FORCEPLAN statement.
In all honesty, the information provided here covers only some of the available SQL Server optimization techniques. Network performance, NT Server performance, application design, and so on all affect how well your SQL Server does its job.
Great queries and indexes mean nothing if there is not enough memory on the NT Server on which SQL Server resides. What may appear to the users to be poor performance could be alleviated not by playing with indexes but by installing faster network cards or using switches on the network. Applications that issue client-based transaction statements and hold them open can slow down a server to a crawl.
Be sure to check all possibilities when you need to increase performance. Work with the network administrators and application programmers. Make sure that everyone is doing their part to make SQL Server perform to the best of its ability.
For more information on other optimization techniques, see Chapter 14, "Writing Effective Code," or see the book Microsoft SQL Server 6.5 Unleashed.