Read an Excerpt
SQL
Practical Guide for Developers
By Michael J. Donahoo Gregory D. Speegle
MORGAN KAUFMANN PUBLISHERS
Copyright © 2005 Elsevier Inc.
All right reserved.
ISBN: 978-0-08-048974-2
Chapter One
Databasics
A database is a repository designed for organizing and accessing information. For simple data, management is easy. For example, a grocery list may be written on scratch paper, and contact information may be kept in an address book. Both the scratch paper and address book are examples of databases.
Basic lists may work for very simple databases. However, the limitations of this approach can make even simple tasks difficult. Let's look at our address book example again. One problem is that it has space for information we don't know/need (e.g., home address), whereas it does not have space for information we need to know (e.g., pager number). Searching presents another problem. An address book is typically organized alphabetically by name. What if we want to search using some other criteria? Finding all of our friends who live in a particular city requires an exhaustive search. Data entry and maintenance can also be problematic. Paper-based address books are fine for a small set of people; however, it would be very time-consuming to add all of the people in your company. Keeping up with the changes quickly becomes unmanageable. Another problem is relating our paper-based address book with other databases. If we wanted to call all of the individuals on our softball team, we would need to combine the information from our team roster with our address book. Clearly, additional complexity makes it harder and harder to effectively represent the various data relationships. Likewise, as the volume of data increases, the complexity of managing and querying information grows.
Lucky for us, computers are especially adept at managing and quickly accessing information. Software designed to store, manipulate, and retrieve data in a database is called a Database management system (DBMS). Here, we focus exclusively on a specific (and by far the most common) type of DBMS, the relational DBMS. There are many relational DBMSs (see the book Web site for pointers to some examples). Fortunately, relational DBMSs speak a common language called SQL. Using SQL, we can define, manipulate, and query our data. SQL is correctly pronounced as S-Q-L; however, many people also pronounce it see-kwel.
This text is based on the ANSI 2003 SQL standard. NOTE: Each DBMS does things differently, and no major DBMS follows the specification exactly. The specifics of a particular DBMS can be found in its documentation. There are many reasons a DBMS may vary from the standard, including performance, legacy, or marketing. Fortunately, the basic syntax of SQL is the same for all DBMS. See the book Web site for information on DBMS-specific syntax.
SQL is divided into three major parts. Data manipulation language (DML) is used to store and retrieve data from the database. The majority of this book is on SQL DML, and that will be the part of SQL used by the most people. Data description language (DDL) is used to define the structure of the data. Chapters 9 and 10 cover the basics in SQL DDL. Data control language (DCL) is used to restrict access to data by certain users. We introduce DCL in Chapter 12.
We assume that you have a DBMS already installed and are able to enter SQL commands. Consult your DBMS documentation for instructions on how to do this. If you do not have a DBMS, consider downloading one from the Internet. There are some free relational databases, and several commercial DBMSs provide a free version of their product for experimentation. See the book Web site for suggestions of DBMSs.
The best way to understand how a relational database works is to consider a specific application. Imagine that you are the proud owner of a restaurant named Garden Variety Salads. You need to manage information about a variety of real-world entities to run your establishment. For example, you need to maintain a list of food items so your customers will have something to purchase. Each food item is made from one or more different ingredients, such as lettuce, tomatoes, and so on. You also want to track how much each ingredient costs so you can determine how much to charge and how much profit you're making. If you don't want to grow the ingredients yourself, you should keep a list of vendors.
Let's look at representing this information in a relational database called the Restaurant Database. Our design is contrived solely for the purpose of teaching SQL, not database design. Proper database design is a difficult problem, well beyond the scope of this text. Most introductory database texts address design.
1.1 Tables
In the relational model, a database contains a set of tables. A table is made up of rows and columns. Each table has a name, which is unique within the database. Each column has a name and a data type. We discuss data types in the next section. The name of a column need only be unique within a table so other tables in the same database can have columns of the same name. Each row constitutes one record in the table. A table may contain zero or more rows. A row is subdivided into fields, one per column. Tables may be used to model real-world objects and relationships.
Let's look at an example table. In our Restaurant Database, we record information about the suppliers of our ingredients in a table named vendors.
Each row in the vendors table records information about a particular vendor. For example, the row with a vendorid of VGRUS is a record representing the vendor named Veggies_R_Us. The attributes that we wish to record for each vendor are represented by the five columns of the vendors table: a unique vendor identifier (vendorid), the name of the vendor (companyname), the first and last name of the vendor representative (repfname and replname), and an identifier for the vendor (referredby) who recommended this vendor.
Not everybody uses the same terminology. A table may also be called a relation. Technically, a table and a relation are not exactly the same thing. Unlike a table, a relation cannot contain duplicate rows. A row is sometimes called a tuple. Finally, a column may also be called an attribute of the relation. A table may be represented by its name followed by a comma-delimited list of columns:
vendors (vendorid, companyname, repfname, replname, referredby)
We refer to individual columns of a table using the tablename.columnname notation (e.g., vendors.vendorid).
1.2 Data Types
Every column has a declared data type, which specifies what kind (e.g., characters, numbers, time, etc.) of information may be contained in the column. Your DBMS may not support all of these types, and it may have some additional types as well. Consult your DBMS documentation for details. We divide the SQL data types into five categories: character string, numeric, temporal, binary, and boolean.
1.2.1 Character String
Attributes such as names and addresses are typically represented by strings of characters (e.g., 'Bob Smith'). There are many database types for character strings. The most common are as follows:
CHARACTER[(L)] specifies a fixed-length character string containing exactly L characters. If the length is not specified, SQL uses a length of 1. If the string contains fewer than L characters, the remaining characters contain padding characters (usually spaces). CHARACTER may be abbreviated as CHAR.
CHARACTER VARYING(L) specifies a variable-length character string that may hold up to L characters. Only the specified characters are stored so there is no padding. CHARACTER VARYING may be abbreviated as CHAR VARYING or, most often, VARCHAR.
String literals are enclosed in single quotes (e.g., 'Bob'). A single quote within a literal is indicated by two single quotes (e.g., 'Bob"s Car').
The inclusion of trailing spaces is the primary difference between CHAR and VARCHAR. For example, vendorid is defined as CHAR(5) and repfname is defined as VARCHAR(20). Assume we add a row such as the following:
BOB Bob's Bakery Bob Bobson VGRUS
to vendors. Because vendorid is CHAR(5), two extra spaces are stored. However, repfname is VARCHAR(20), so no extra spaces are added. This also makes a difference when we retrieve data from the database. The vendorid 'BOB ' will match 'BOB' or 'BOB', but the repfname 'Bob' will not match 'Bob'.
The default character set for your DBMS may be limited in the kinds of characters it can represent. For example, if your DBMS uses the ASCII (American Standard Code for Information Interchange) character encoding, it can only represent English characters. Unfortunately, this doesn't work well for internationalization. The national character string type can represent internationalized character sets such as Unicode. The fixed-length national character string type is NATIONAL CHARACTER (abbreviated NATIONAL CHAR or NCHAR). The variable-length national character string type is NATIONAL CHARACTER VARYING (abbreviated NATIONAL CHAR VARYING or NCHAR VARYING). String literals for NATIONAL CHARACTER types are prefixed with an N (e.g., N'Bob').
Most DBMSs place an upper limit on the size of a character string. To store large strings, SQL provides the CHARACTER LARGE OBJECT data type (abbreviated as CHAR LARGE OBJECT or CLOB). SQL allows the use of the national character string encoding with NATIONAL CHARACTER LARGE OBJECT (abbreviated NCHAR LARGE OBJECT or NCLOB).
1.2.2 Numeric
SQL includes several different types to store numeric information, such as age and salary. In specifying a numeric type, we need to consider three questions: 1) Are our data limited to whole numbers, 2) What range of values do we wish to support, and 3) How much control do we need over precision?
INTEGER, SMALLINT, and BIGINT—INTEGER, SMALLINT, and BIGINT store signed whole numbers. The range of possible values is DBMS dependent; however, the range of values for SMALLINT is less than or equal to the range of value for INTEGER, which is less than or equal to the range of values for BIGINT. INTEGER may be abbreviated as INT.
NUMERIC[(P [, S])] and DECIMAL[(P [, S])]—NUMERIC(P, S) specifies a signed, fixed-point number where P (precision) specifies the total number (to the left and right of the decimal) of digits in the number and S (scale) specifies the number of digits to the right of the decimal place. For example, NUMERIC(5, 2) specifies a type ranging from -999.99 to 999.99. DECIMAL(P, S) is the same as NUMERIC(P, S) except that the actual precision may exceed the specification. For both NUMERIC and DECIMAL, if P or S are not specified, default values will be used. DECIMAL may be abbreviated as DEC.
REAL and DOUBLE PRECISION—REAL specifies a signed, single-precision, floating-point number. The range is DBMS specific. DOUBLE PRECISION is the same as REAL except it supports a greater range of values.
FLOAT[(P)]—FLOAT(P) specifies a signed, floating-point number with a precision of at least P. Here P specifies the number of binary digits.
Numeric literals look like you would expect. For REAL, DOUBLE, and FLOAT, literals may be written in exponential notation as nEp representing n × 10p where n is a signed, floatingpoint number and p is a signed, whole number (e.g., 7.4E-3).
If you try to use a number with an absolute value that is too large for the data type, the database should generate an exception. For example, an exception would be raised if we tried to put -1000 into a NUMERIC(5,2) field. If you use a number with too many digits to the right of the decimal point, the DBMS may either truncate or round the value. So, placing 0.0001 into a NUMERIC(5,2) field results in a value of 0.00. A number with fewer digits than the limit of the data type can be placed into a column without exception or change. For example, 9.9 fits in a NUMERIC(5,2) field.
1.2.3 Temporal
SQL provides several data types specific to storing information about temporal information. Representing and utilizing data and time information can be complicated with considerations of time zone, daylight savings time, and so on. As a result, temporal data tend to be more DBMS specific than other types. If the syntax presented here does not work on your database, check your DBMS documentation. SQL divides its temporal types into two categories: datetime and interval. We begin by looking at the date and time types in SQL.
(Continues...)
Excerpted from SQL by Michael J. Donahoo Gregory D. Speegle Copyright © 2005 by Elsevier Inc.. Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.