- Shopping Bag ( 0 items )
"...covers proper structure of SQL queries, how to produce a desired result & minimize burden on SQL command processors & how to modify data & database ...
"...covers proper structure of SQL queries, how to produce a desired result & minimize burden on SQL command processors & how to modify data & database structure for ease of use & increased efficiency.
"This book functions as a very good tutorial, taking you from the basics through expert topics in a very straightforward manner."--Bill Arledge, Consulting DB2 Product Manager
"SQL Clearly Explained is one of those rare books that delivers 100 percent what its title promises. If you need to use SQL on the job, and who doesn't, then this book should be on your desk."--Craig S. Mullins, DB2 Consultant and IBM Information Champion, Mullins Consulting, Inc.
"This book is of great value to SQL practitioners. It also makes an excellent college-level text."--Rudy Limeback, SQL Consultant, r937.com
You don't need to be a database designer to use SQL successfully. However, you do need to know a bit about how relational databases are structured and how to manipulate those structures. This chapter therefore will acquaint you with the basic elements of the relational data model and its terminology. We'll finish by looking at the design of the sample database used throughout this book.
Schemas and entities
A database is a place where we store data, but there is more to it than that: We also store information about the relationships between pieces of data. The organization of a database is a logical concept rather than a physical one. Yes, there are files that store the data in a database, but the physical structure of those files usually isn't a concern for those who use the data.
The software that organizes, stores, retrieves, and analyzes database data is known as a database management system (DBMS). It isolates the user from the physical data storage mechanisms and structures and lets the user work with data in terms of the logical structure of the data.
The overall logical plan of a database is known as a schema. A schema has two types of elements:
* Entities: An entity is something about which we store data, such as a customer or a product or an order for a product. Entities are described by pieces of data known as attributes. When we have a collection of data for all the attributes of an entity, we say we have an occurrence of the entity. Databases actually store occurrences of entities. Schemas show us what entities will be in the database and what attributes are used to represent those entities. * Relationships: Relationships define how entities interact. For example, a customer entity is typically related to many order entities. There are three types of relationships, all of which we will discuss shortly.
The most important thing to keep in mind is that a schema shows the logical plan for a database, what entities and relationships could possibly be stored. However, inside the real-world database, we have many occurrences of many entities, each represented by descriptive data. We may not have occurrences of every entity in the schema or we may have thousands (even hundreds of thousands) of occurrences of entities.
Relations and Tables
A relational database takes its name from the structure used to represent an entity: a two-dimensional table with special characteristics taken from mathematical set theory, where such a structure is known as a relation. To begin, let's look at the simple relation in Figure 1-1. At first glance, the relation looks like any table, but unlike other tables you may have encountered (for example, rectangular areas of spreadsheets), it has some very specific characteristics.
Columns and Rows
A relation is a two-dimensional table with no repeating groups. That means that if you look at the intersection of a column and a row, there will be only one value. What you see in Figure 1-2 is certainly a table, but it isn't a relation. Why? Because there are multiple values in some of the rows in the Children column. In contrast, Figure 1-1 is a legal relation.
Note: Although the official name of the two-dimensional "thing" we have been discussing is "relation," most people consider the word "table" to be synonymous and we will use both terms interchangeably throughout this book.
A relation has a name that is unique within its schema. Each column (or attribute) in a relation also has a name, but in this case, the name needs to be unique only within the table. In fact, as you will see shortly, there are times when you actually want to have columns with the same names in multiple tables.
In a well-designed relational database, each table represents an entity. We often document entities (and, as you will see, the relationships among them) in a diagram known as an entity-relationship diagram (ERD). There are many ways to draw ERDs, each of which can convey just about the same information. The particular style we'll be using in this book is known as the information engineering (IE) style. An entity is represented as a rectangle with its name in the top section and its attributes in the bottom, as you see in Figure 1-3.
A relation is both column-order independent and row-order independent. This mean that we can view the columns in any order and the rows in any order without losing the meaning of the data. The assumption is, however, that all the data in one row remain in that row.
Each column in a relation has a domain, an expression of the legal values for that column. In some cases, a domain is very specific. For example, if you are working with a column that stores the sizes of T-shirts, the entire domain might consist of the values S, M, L, XL, and XXL. Domains are more commonly, however, general data types, such as integer or date.
Once you assign a domain to a column, the DBMS will enforce that domain, rejecting any command that attempts to enter a value into the column that isn't from the domain. This is an example of a constraint on a relation, a rule to which the relation must adhere.
Each row in a relation must have a unique value that identifies the row. This primary key is made up of the values in one or more columns (the smallest number of columns needed to enforce uniqueness). A table that stores information about an order, for example, would probably use the order number as its primary key.
People are particularly difficult to identify uniquely, so we often assign each person in a table an arbitrary number. If you look back at Figure 1-3, you will see that there is a customer_numb attribute, representing a number that will be simply given to each customer when a row for a new customer is entered into the table. The IE diagramming method places an asterisk in front of the column or columns that make up a primary key, just as is done in Figure 1-3.
Sometimes there is no single column that will uniquely identify each row in a table. As an example, consider the table in Figure 1-4 (dependents), which lists employees' dependent children. We can't use the employee number as the primary key because customer numbers repeat for each child an employee has, and many employees have more than one child. By the same token, the children's names and birthdates aren't unique. The solution is to consider the values in two columns as the primary key. In this case, the employee number and the child's name make the best primary key. Taken as a unit, the two values are unique in every row. A primary key made up of more than one column is known as a concatenated key.
Why are unique primary keys so important? Because they ensure that you can retrieve every piece of data that you put into a database. If primary keys aren't unique, a query will retrieve one or more rows with a value you specify, but you can't be certain which is the exact row you want unless you know something that identifies just that one row. In fact, you should be able to retrieve any single data value knowing three things: the name of the table, the name of the column, and the primary key of the row.
As you will see later in this book, you specify a table's primary key when you define the table to the DBMS. The DBMS will then enforce a constraint that requires unique primary key values.
Note: It is actually possible to create a table that has no primary key, but some DBMSs won't let you put any data in it.
Sometimes you don't put data in some columns of some rows because you don't know the appropriate data values. The empty columns don't contain a zero or a blank. Instead, they contain a special indicator known as null, which means "unknown."
There are two important implications of the presence of nulls in a table. First, we can't allow nulls as all or part of a primary key. If there is only one row with null for a primary key, then the property of unique primary key values is preserved. The minute we introduce a second row with a null primary key, however, the primary keys are no longer unique. A DBMS will therefore ensure that all primary keys have values, a constraint known as entity integrity.
Secondly, nulls can affect the result of queries. Assume, for example, that you want to retrieve the names of all employees who have a salary of more than $100,000. For all employees that have a value in the salary column, the answer to "Is the salary more than $100,000" will be either "yes" or "no." But if the salary column contains null, the DBMS doesn't know the answer to the question; the result is "maybe."
We say that a DBMS operates using three-valued logic: yes, no, or maybe. The question that remains is what a DBMS should do when the answer to the question it is asking is "maybe." Should it retrieve rows with null or leave them out? The relational data model doesn't specify exactly what a DBMS should do, but does require the DBMS to act consistently—either always retrieve rows with nulls or always leave them out—and that the user be aware of what is happening. We'll deal with effect of nulls at various places throughout this book.
Base versus Virtual Tables
There are two primary types of tables with which you will be working when you use SQL. The tables that contain data that are stored in the database are known as base tables. However, the DBMS also uses several types of temporary tables that only exist in main memory. These are virtual tables and by definition they are not stored in the database. Most modern DBMS use several types of virtual tables, including views, temporary tables, and query result tables. If you want to keep the data in a virtual table, then those data must be inserted into a base table.
Along with data describing entities, a database must somehow represent relationships between entities. Prior to the relational data model, databases used data structures embedded in the data to show relationships. However, the relational data model relies on it data to show relationships.
Types of Relationships
There are three types of relationships between entities that we encounter in our database environments: one-to-one, one-to-many, and many-to-many.
A one-to-one relationship exists between two entities when an occurrence of entity A is related to zero or one occurrences of entity B and an occurrence of entity B is related to zero or one occurrences of entity A. Although the specific occurrences involved in the relationship may change over time, there is never more than one related occurrence at any given time. For example, a car and its engine have unique serial numbers. At any one time, an engine is installed in only one car; at the same time, a car has only one engine. The engine may be in no car or it can be moved from one car to another, but it can't be in more than one place at a time. By the same token, a car can have no engine or one engine. The specific engine may change over time but there is never more than one.
We include a relationship in an ERD by drawing a line between the rectangles for two related entities. The line ends identify the type of the relationship. In Figure 1-5 you can see the way in which we would diagram the one-to-one relationship between a car and its engine. The |0 at the end of the line means "zero or one."
If the relationship is required (mandatory), then the |0 at the end of the line changes to || (one and only one). We use mandatory relationships when we don't want an occurrence of an entity to be store in the database unless it is related to an occurrence of the entity at the other end of the relationship. For example, if we didn't want an engine in the database unless that engine was in a car, the end of the line next to the car entity would be ||.