Read an Excerpt
Chapter 1: Introduction to SQL
The acronym "SQL" is short for Structured Query Language. Unfortunately, SQL is not structured, SQL is not only for queries, and SQL is not a language, per se, because SQL is embedded within other languages such as C and COBOL. Regardless of the mistaken name, SQL has emerged as the dominant access method for relational databases.
This chapter will introduce the nature of Oracle SQL and lay the foundation for techniques that we will be using throughout this book. In this chapter, we will cover the following topics.
- The basic nature of SQL This section will compare SQL to navigational database query languages.
- The beginning of SQL This section will show how SQL has evolved as the de facto standard for database access.
- The SQL optimizer This will be a brief introduction to the process of SQL optimization.
- The goals of SQL tuning This will cover the overall goals of SQL tuning.
- SQL tuning as a phase of Oracle tuning This section will explore how SQL tuning fits into the overall tuning model.
- The barriers to SQL tuning This section discusses the problems encountered when attempting to tune Oracle SQL.
- The process of SQL tuning This section will explore the general goals for tuning an individual SQL statement.
- Our SQL tuning toolkit This section will introduce the toolkit that we will be using throughout the book for examining SQL statements for tuning.
The Basic Nature of SQL
The SQL standard proposal was originally created as an alternative to the cumbersome navigational languages of existing databases. In the 1960s, the IBM IMS database was the only large-scale commercial database management system. Unlike databases on the relational model, IMS is a hierarchical database with an internal pointer structure used for navigating between database records.
The navigational database access tools required the programmer to navigate through the data structures by means of pointer chasing. Here is an actual example of a query from the popular IDMS database, an early CODASYL network database...
...Here we see that the query navigates between data records, accessing the record, finding a pointer, and moving between pointers according to the pointer values (as shown in Figure 1-1). The point is that this type of database query requires knowledge of the internal structures of the database in order to extract data.
The equivalent statement in SQL is quite different in syntax and function. Unlike a navigational database access language, SQL is designed to require only a specification of the columns you want to display, the tables that contain the data, and the join criteria for the tables....
We will take a closer look at the basic structure of SQL later in this chapter. While SQL is generally associated with relational databases, it is important to note that SQL is also popular in nonrelational databases. The IDMS network database developers renamed their product to IDMS/R after they created a SQL engine, and several object-oriented databases now offer SQL front ends that make their databases appear to be relational.
The Beginnings of SQL
In 1970, Dr. Edgar Codd of IBM and Chris Date developed a relational model for data storage. In the model, data would be stored in simple linear structures called "relations" or "tables." One of the best improvements of the relational model over its predecessors was its simplicity. Rather than requiring users to know dozens of navigational data manipulation Language (DML) commands, the relational model introduced a declarative language called SQL to simplify data access and manipulation.
In Codd and Date's model, the tables are represented as two-dimensional arrays of "rows" and "columns." Rows were called "tuples" (rhymes with "couples"), and columns were called "attributes." A table will always have a field or several fields that make a "primary key" for a table. In their relational database model, the tables
are independent, unlike in hierarchical and network models, where they are pointerconnected.
The relational database model offered the following improvements over the existing hierarchical and network databases:
- Simplicity The concept of tables with rows and columns is extremely simple and easy to understand. End users have a simple data model. Complex network diagrams used with the hierarchical and network databases are not used with a relational database.
- Data independence Data independence is the ability to modify data structures (in this case, tables) without affecting existing programs. Much of this ability comes because tables are not hard-linked to one another. Columns can be added to tables, tables can be added to the database, and new data relationships can be added with little or no restructuring of the tables. A relational database provides a much higher degree of data independence than do hierarchical and network databases.
- Declarative Data Access The SQL user specifies what data is wanted, and then the embedded SQL, a procedural language, determines how to get the data. In relational database access, the user tells the system the conditions for the retrieval of data. The system then gets the data that meets the selection conditions in the SQL statements. The database navigation is hidden from the end user or programmer, unlike in a CODASYL DML language, where the programmer had to know the details of the access path.
In the marketplace, the declarative data access capability was far more interesting than the internal storage components of the relational database, and SQL became synonymous with the relational model.
A Model for SQL
The first model of SQL can be thought of as having three categories of function: Define, Manipulate, and Authorize.
- Define refers to the data definition language (DDL) that performs object create, drop, and alter functions.
- Manipulate refers to the data manipulation language (DML) that performs select, insert, update, and delete functions.
- Authorize refers to the control that performs grant and revoke functions.
Within the Manipulate functions, we see three dimensions to SQL,
select,
project, and
join. These three simple metrics define the whole functionality of SQL.
Select Operation
A
select operation reduces the length of a table by filtering out unwanted rows. By specifying conditions in the
where clause, the user can filter unwanted rows out of the result set, as shown in Figure 1-2. In sum, the
select operation reduces the results vertically.
Project Operation
Just as the
select operation reduces the number of rows, the
project operation reduces the number of columns. The column names specified in the SQL select determine those columns that are displayed, as shown in Figure 1-3. In sum, the project operation reduces the size of the result set horizontally.
Join Operation
A
join operation such as is shown in Figure 1-4 is used to relate two or more independent tables that share a common column. In a join, two or more independent tables are merged according to a common column value....