Uncover the secrets of SQL and start building better relational databases today!
This fun and friendly guide will help you demystify database management systems so you can create more powerful databases and access information with ease. Updated for the latest SQL functionality, SQL For Dummies, 8th Edition covers the core SQL language and shows you how to use SQL to structure a DBMS, implement a database design, secure your data, and retrieve information when you need it.
Don't be daunted by database development anymore - get SQL For Dummies, 8th Edition, and you'll be on your way to SQL stardom.
* * *
In This Chapter
* Organizing information
* Defining database
* Defining DBMS
* Comparing database models
* Defining relational database
* Considering the challenges of database design
* * *
SQL (short for structured query language) is an industry-standard language
specifically designed to enable people to create databases, add new data
to databases, maintain the data, and retrieve selected parts of the data.
Various kinds of databases exist, each adhering to a different conceptual
model. SQL was originally developed to operate on data in databases that
follow the relational model. Recently, the international SQL standard has
incorporated part of the object model, resulting in hybrid structures called
object-relational databases. In this chapter, I discuss data storage, devote a
section to how the relational model compares with other major models, and
provide a look at the important features of relational databases.
Before I talk about SQL, however, first things first: I need to nail down what I
mean by the term database. Its meaning has changed as computers have
changed the way people record and maintain information.
Keeping Track of Things
Today, people use computers to perform many tasks formerly done with
other tools. Computers have replaced typewriters for creating and modifying
documents. They've surpassed electromechanical calculators as the best
way to do math. They've also replaced millions of pieces of paper, file folders,
and file cabinets as the principal storage medium for important information.
Compared to those old tools, of course, computers do much more, much
faster - and with greater accuracy. These increased benefits do come at a
cost, however. Computer users no longer have direct physical access to their data.
When computers occasionally fail, office workers may wonder whether computerization
really improved anything at all. In the old days, a manila file
folder only "crashed" if you dropped it - then you merely knelt down, picked
up the papers, and put them back in the folder. Barring earthquakes or other
major disasters, file cabinets never "went down," and they never gave you an
error message. A hard drive crash is another matter entirely: You can't "pick
up" lost bits and bytes. Mechanical, electrical, and human failures can make
your data go away into the Great Beyond, never to return.
Taking the necessary precautions to protect yourself from accidental data
loss allows you to start cashing in on the greater speed and accuracy that computers provide.
If you're storing important data, you have four main concerns:
State-of-the-art computer databases satisfy these four criteria. If you store
more than a dozen or so data items, you probably want to store those items
in a database.
What Is a Database?
The term database has fallen into loose use lately, losing much of its original
meaning. To some people, a database is any collection of data items (phone
books, laundry lists, parchment scrolls ... whatever). Other people define
the term more strictly.
In this book, I define a database as a self-describing collection of integrated
records. And yes, that does imply computer technology, complete with languages
such as SQL.
A record is a representation of some physical or conceptual object. Say, for
example, that you want to keep track of a business's customers. You assign a
record for each customer. Each record has multiple attributes, such as name,
address, and telephone number. Individual names, addresses, and so on are the data.
A database consists of both data and metadata. Metadata is the data that
describes the data's structure within a database. If you know how your data
is arranged, then you can retrieve it. Because the database contains a description
of its own structure, it's self-describing. The database is integrated because
it includes not only data items but also the relationships among data items.
The database stores metadata in an area called the data dictionary, which
describes the tables, columns, indexes, constraints, and other items that
make up the database.
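To see what "self-describing" looks like in practice, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is my choice for illustration only; the chapter doesn't prescribe any particular DBMS, and the customer table and the describe_database function are hypothetical. SQLite happens to keep its metadata in a catalog table named sqlite_master; other products expose their data dictionaries in other ways.

```python
import sqlite3

def describe_database(conn):
    """Return {table_name: [column names]} read from SQLite's own catalog."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    description = {}
    for (table_name,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
        description[table_name] = [col[1] for col in columns]
    return description

# Hypothetical example table, created in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, address TEXT, phone TEXT)")
catalog = describe_database(conn)
print(catalog)  # {'customer': ['name', 'address', 'phone']}
```

Notice that the program never had to be told what columns the table holds. It asked the database, and the database answered from its own metadata.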
Because a flat file system (described later in this chapter) has no metadata,
applications written to work with flat files must contain the equivalent of the
metadata as part of the application program.
Database Size and Complexity
Databases come in all sizes, from simple collections of a few records to mammoth
systems holding millions of records.
A personal database is designed for use by a single person on a single computer.
Such a database usually has a rather simple structure and a relatively
small size. A departmental or workgroup database is used by the members of a
single department or workgroup within an organization. This type of database
is generally larger than a personal database and is necessarily more complex;
such a database must handle multiple users trying to access the same data at
the same time. An enterprise database can be huge. Enterprise databases may
model the critical information flow of entire large organizations.
What Is a Database Management System?
Glad you asked. A database management system (DBMS) is a set of programs
used to define, administer, and process databases and their associated applications.
The database being "managed" is, in essence, a structure that you
build to hold valuable data. A DBMS is the tool you use to build that structure
and operate on the data contained within the database.
Many DBMS programs are on the market today. Some run only on mainframe
computers, some only on minicomputers, and some only on personal computers.
A strong trend, however, is for such products to work on multiple
platforms or on networks that contain all three classes of machines.
A DBMS that runs on platforms of multiple classes, large and small, is called scalable.
Whatever the size of the computer that hosts the database - and regardless
of whether the machine is connected to a network - the flow of information
between database and user is the same. Figure 1-1 shows that the user communicates
with the database through the DBMS. The DBMS masks the physical
details of the database storage so that the application need only concern
itself with the logical characteristics of the data, not how the data is stored.
Where structured data is concerned, the flat file is as simple as it gets. No, a
flat file isn't a folder that's been squashed under a stack of books. Flat files
are so called because they have minimal structure. If they were buildings,
they'd barely stick up from the ground. A flat file is simply a collection of one
data record after another in a specified format - the data, the whole data,
and nothing but the data - in effect, a list. In computer terms, a flat file is
simple. Because the file doesn't store structural information (metadata), its
overhead (stuff in the file that is not data) is minimal.
Say that you want to keep track of the names and addresses of your company's
customers in a flat file system. The system may have a structure something like this:
Harold Percival26262 S. Howards Mill Rd Westminster CA92683
Jerry Appel    32323 S. River Lane Rd   Santa Ana   CA92705
Adrian Hansen  232 Glenwood Court       Anaheim     CA92640
John Baker     2222 Lafayette St        Garden GroveCA92643
Michael Pens   77730 S. New Era Rd      Irvine      CA92715
Bob Michimoto  25252 S. Kelmsley Dr     Stanton     CA92610
Linda Smith    444 S. E. Seventh St     Costa Mesa  CA92635
Robert Funnell 2424 Sheri Court         Anaheim     CA92640
Bill Checkal   9595 Curry Dr            Stanton     CA92610
Jed Style      3535 Randall St          Santa Ana   CA92705
As you can see, the file contains nothing but data. Each field has a fixed
length (the Name field, for example, is always exactly 15 characters long),
and no structure separates one field from another. The person who created
the database assigned field positions and lengths. Any program using this file
must "know" how each field was assigned, because that information is not
contained in the database itself.
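Here is a rough sketch, in Python, of the kind of knowledge such a program has to carry around. Only the 15-character Name field comes from the text; the other field widths (25 for the address, 12 for the city, 2 for the state, 5 for the ZIP code) are my assumptions for illustration, and the names FIELDS and parse_record are hypothetical.

```python
# Hypothetical field layout: (field name, start offset, length).
# Only the 15-character name width is stated in the text; the rest are assumed.
FIELDS = [
    ("name", 0, 15),
    ("address", 15, 25),
    ("city", 40, 12),
    ("state", 52, 2),
    ("zip", 54, 5),
]

def parse_record(line):
    """Slice one fixed-width record into a dict, trimming the padding."""
    return {
        name: line[start:start + length].strip()
        for name, start, length in FIELDS
    }

# Build a sample record by padding each value to its assumed field width.
sample = (
    "John Baker".ljust(15)
    + "2222 Lafayette St".ljust(25)
    + "Garden Grove".ljust(12)
    + "CA"
    + "92643"
)
record = parse_record(sample)
print(record["city"], record["state"], record["zip"])  # Garden Grove CA 92643
```

If someone later widens the address field, every program with its own copy of this layout has to change too. That's the price of keeping the structure outside the file.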
Such low overhead means that operating on flat files can be very fast. On the
minus side, however, application programs must include logic that manipulates
the file's data at a very low level of complexity. The application must
know exactly where and how the file stores its data. Thus, for small systems,
flat files work fine. The larger a system is, however, the more cumbersome a
flat file system becomes. Using a database instead of a flat file system eliminates
duplication of effort. Although database files themselves may have
more overhead, the applications can be more portable across various hardware
platforms and operating systems. A database also makes writing application
programs easier because the programmer doesn't need to know the
physical details of where and how the files store their data.
Databases eliminate duplication of effort, because the DBMS handles the
data-manipulation details. Applications written to operate on flat files must
include those details in the application code. If multiple applications all
access the same flat file data, these applications must all (redundantly)
include that data manipulation code. By using a DBMS, you don't need to
include such code in the applications at all.
Clearly, if a flat file-based application includes data-manipulation code that
only runs on a particular hardware platform, then migrating the application
to a new platform is a headache waiting to happen. You have to change all
the hardware-specific code - and that's just for openers. Migrating a similar
DBMS-based application to another platform is much simpler - fewer complicated
steps, fewer aspirin consumed.
Different as databases may be in size, they are generally structured
according to one of three database models: relational, hierarchical, or network.
The first databases to see wide use were large organizational databases that
today would be called enterprise databases, built according to either the
hierarchical or the network model. Systems built according to the relational
model followed several years later. SQL is a strictly modern language; it
applies only to the relational model and its descendant, the object-relational
model. So here's where this book says, "So long, it's been good to know ya,"
to the hierarchical and network models.
New database management systems that are not based on the relational
model probably conform to the newer object model or the hybrid object-relational model.
Dr. E. F. Codd of IBM first formulated the relational database model in 1970,
and this model started appearing in products about a decade later. Ironically,
IBM did not deliver the first relational DBMS. That distinction went to a small
start-up company, which named its product Oracle.
Relational databases have replaced databases built according to earlier
models because the relational type has valuable attributes that distinguish
relational databases from those other database types. Probably the most
important of these attributes is that, in a relational database, you can change
the database structure without requiring changes to applications that were
based on the old structures. Suppose, for example, that you add one or more
new columns to a database table. You don't need to change any previously
written applications that will continue to process that table, unless you alter
one or more of the columns used by those applications.
Of course, if you remove a column that an existing application references,
you experience problems no matter what database model you follow. One of
the best ways to make a database application crash is to ask it to retrieve a
kind of data that your database doesn't contain.
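The following sketch shows both situations using Python's standard sqlite3 module. Treat it as one possible illustration rather than the way any particular product behaves; the customer table, its columns, and the list_customers function are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('John Baker', 'Garden Grove')")

def list_customers(conn):
    """The 'previously written application': it names only the columns it uses."""
    return conn.execute("SELECT name, city FROM customer").fetchall()

before = list_customers(conn)

# Change the structure by adding a column the old code knows nothing about.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
after = list_customers(conn)  # the old query still runs and returns the same rows

# Asking for a column the table doesn't contain is another story.
try:
    conn.execute("SELECT fax FROM customer")
    missing_column_error = None
except sqlite3.OperationalError as exc:
    missing_column_error = str(exc)

print(before == after, missing_column_error is not None)  # True True
```

The old query survives the new column because it asks for its columns by name. An application that relies on column positions (for example, by using SELECT *) gives up some of that protection.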
Why relational is better
In applications written with DBMSs that follow the hierarchical or network
model, database structure is hard-coded into the application - that is, the
application is dependent on the specific physical implementation of the database.
If you add a new attribute to the database, you must change your application
to accommodate the change, whether or not the application uses the new attribute.
Relational databases offer structural flexibility; applications written for those
databases are easier to maintain than similar applications written for hierarchical
or network databases. That same structural flexibility enables you to
retrieve combinations of data that you may not have anticipated needing at
the time of the database's design.
Components of a relational database
Relational databases gain their flexibility because their data resides in tables
that are largely independent of each other. You can add, delete, or change
data in a table without affecting the data in the other tables, provided that
the affected table is not a parent of any of the other tables. (Parent-child table
relationships are explained in Chapter 5, and no, it doesn't mean discussing
allowances over dinner.) In this section, I show what these tables consist of
and how they relate to the other parts of a relational database.
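As a small preview of that parent-table proviso, here is a hedged sketch using Python's standard sqlite3 module. The customer and invoice tables are hypothetical, and the behavior shown is SQLite's (which enforces foreign keys only after you switch enforcement on); other DBMS products have their own rules and options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Linda Smith')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 99.50)")

# Changing data in the child table leaves the parent table untouched.
conn.execute("UPDATE invoice SET total = 120.00 WHERE id = 10")
customer_rows = conn.execute("SELECT id, name FROM customer").fetchall()

# Deleting a parent row that a child row still references is rejected.
try:
    conn.execute("DELETE FROM customer WHERE id = 1")
    parent_delete_blocked = False
except sqlite3.IntegrityError:
    parent_delete_blocked = True

print(customer_rows, parent_delete_blocked)  # [(1, 'Linda Smith')] True
```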
Excerpted from SQL For Dummies
by Allen G. Taylor
Copyright © 2003 by Allen G. Taylor.
Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Ch. 1: Relational database fundamentals
Ch. 2: SQL fundamentals
Ch. 3: The components of SQL
Ch. 4: Building and maintaining a simple database structure
Ch. 5: Building a multitable relational database
Ch. 6: Manipulating database data
Ch. 7: Specifying values
Ch. 8: Using advanced SQL value expressions
Ch. 9: Zeroing in on the data you want
Ch. 10: Using relational operators
Ch. 11: Delving deep with nested queries
Ch. 12: Recursive queries
Ch. 13: Providing database security
Ch. 14: Protecting data
Ch. 15: Using SQL within applications
Ch. 16: Accessing data with ODBC and JDBC
Ch. 17: Operating on XML data with SQL
Ch. 18: Stepping through a dataset with cursors
Ch. 19: Adding procedural capabilities with persistent stored modules
Ch. 20: Handling errors
Ch. 21: Ten common mistakes
Ch. 22: Ten retrieval tips
In This Chapter
The Internet, and particularly that portion of it known as the World Wide Web, has mushroomed in importance in the last couple of years. Just about every aspect of computing seems to be viewed in light of how it relates to the Web. Database is no exception. The World Wide Web lives up to its name. It provides a web of connectivity that envelops the globe. Anyone anywhere who has an Internet connection can access data residing on a Web server on the other side of town or, just as easily, on the other side of the world.
The ability to make your data available to anyone anywhere in the world opens up a whole new kind of database usage. This new usage, database publishing, is more akin to book publishing or radio broadcasting than it is to the point-to-point communication characteristic of operations on a local area network. The information you make available on the Web can be accessed and used by thousands or even millions of people that you will never meet. The most popular sites on the Web today receive more than a million visits, or hits, a day. You can make a substantial impact with your Web-based database, even if you are nowhere near that league.
SQL was originally created by IBM to facilitate communication between large databases residing on mainframe computers and users on client machines that were connected to those mainframes by a local area network (LAN). SQL gradually became a de facto standard means, and then an official ANSI and ISO standard means, of communicating between users and databases. Companies producing relational databases designed to operate across local area networks embraced the SQL standard and made it the communications medium of choice on systems in which the user was located on a different machine from the database, with a LAN running between them.
SQL, coupled with ODBC, enabled an application running on a user's machine to simultaneously access data located on two or even more server machines. This combination proved to be a great boon to organizations whose information processing infrastructure had grown up over time without the benefit of centralized planning. Different machines, running different operating systems and different applications, could share information. Marvelous as this kind of flexibility is, it pales in comparison with what is possible over the Internet.
A local area network (LAN) is a collection of computers that are all in physical proximity (that's where the local comes from). The computers, forming nodes on the network, are interconnected by wired or wireless communication links. Many local area networks are small, having anywhere from 10 to 50 nodes. Large organizations may be served by LANs that have more than a thousand nodes. In either case, you can exercise some centralized control over the network. This makes specifying a proprietary database interface possible, and you can expect all the users to be using access tools that are compatible with it.
The Internet is an entirely different story. It has millions of nodes, and they are not in physical proximity. No one has centralized control over what goes on. In this environment, the owner of a database server cannot make any assumptions about what kind of access tools the user has. The user has a Web browser, possibly supplemented with a plug-in that hosts the client end of a client-server database system. Because the most popular Web browsers run on all the popular client platforms, the client software does not have to be specifically tailored to run on a specific back-end database.
Note: The ordinary Web browser, such as Netscape Navigator or Microsoft Internet Explorer, comes close to being that Holy Grail of database access, the Universal Front End. If it existed, the Universal Front End would interface seamlessly with any database server that you want. It would allow the user to create tables easily, manipulate data, and operate database applications regardless of what kind of server the database is on or what kind of DBMS is controlling it. By itself, a browser cannot do this, of course. But by downloading the appropriate Netscape plug-in or ActiveX component (see Chapter 15 for more about these) before attempting to deal with the database, the browser can come very close. When a connection is established, state-of-the-art database publishers check the client machine for the appropriate plug-in. If they find it, they download the client part of their application and proceed. If they do not find the appropriate plug-in, they download the plug-in, followed by the client part of their application. This whole sequence can be relatively transparent to the user.
Two areas where operation on the Internet may differ significantly from operation on a LAN are network protocol and security. If you are considering allowing remote access to your database from over the Internet, you should carefully consider the impact of these two aspects of operation.
In order for the nodes on a network to communicate with each other, they must all speak the same "language." When one node sends a message, it must be formatted in such a way that the intended receiving node can understand it and take appropriate action. The people who first hooked personal computers together to form local area networks were not concerned with making their systems compatible with the Internet. At that time, the Internet was running only on large mainframe computers that ran the UNIX operating system and that were located at government organizations and research universities. The personal computer world seemed far removed from that of the mainframes used by "big science." Consequently, the "languages," or protocols, that were developed for PC LANs were different from what the Internet used.
Today, many PC LANs still operate with protocols that have evolved from those early PC protocols. The IPX/SPX protocol and the NetBEUI protocol are probably the most common of these. In contrast, the Internet uses a protocol named TCP/IP (Transmission Control Protocol/Internet Protocol). Anyone who wants to engage in database operations over the Web must do so using TCP/IP. Generally, this doesn't require any kind of a hardware change, but it can require a software reconfiguration.
Security is a much bigger issue on the Internet than it is on any organizational LAN. On a LAN, you can be reasonably sure that no one is going to purposely try to sabotage your system. On the Internet, that would be a very foolish and dangerous assumption to make. All kinds of people are out there on the Internet, and some of them may want to hurt you -- just for the sheer, twisted fun of it. Competitors or even enemies may have stronger reasons to give you trouble. When you are exposing your database server to the Internet, you must take significant extra precautions, beyond what would be normal for a LAN.
The principal defense against attacks by hackers or other malefactors on the Internet is to install a firewall between your organizational network and the Internet. A firewall is a software system, or combination of hardware and software, that insulates your network from the Internet. All traffic, both in and out, must pass through the firewall. The firewall authenticates the packets passing through it according to standards that you set up. It passes packets that meet your criteria and throws away those that don't. It also allows you to monitor traffic for suspicious activity and to trace attempts at breaching your security.
When you make the decision to take the big step of putting your server on the Internet, be sure to provide adequate protection to sensitive information that you do not want inquisitive outsiders to know or malicious outsiders to damage.
Most database systems found on LANs are structured according to client/server architecture. Data is stored on one or more servers whose specific task is providing access to that data. Smaller, client machines are spread throughout the organization. They host the user interface of the applications that access the database. Users, interacting with the client part of the application, access the data on the server by communicating over the LAN.
Compelling reasons exist to make database data available over the Internet. A commercial enterprise may want certain of its operational data to be available to vendors or customers with which it works closely. Such an enterprise may want to make detailed information about its products available to the general public, in hopes that some of them will become customers. Entities that are in the information dissemination business, such as libraries, may want to make their information available to a wider audience than those who are able to make a physical visit. For these and other reasons, many groups have decided to establish a presence on the Internet.
Beyond putting up a simple Web page, many organizations are engaging in database publishing, making selected internal information available to those who access their Web site. Some such information is freely available to anyone who logs in to the Web site. Using passwords, publishers can restrict access to authorized users, enabling them to access proprietary databases on the site or databases for which a fee is being charged.
The client/server architecture provides many of the key ingredients of a successful Web database publishing installation. Clients on the Web have similar equipment and operating environments to what is typical for clients on a corporate LAN. The database server of a Web-based system is no different from what serves that purpose on a LAN. Yes, you must address protocol and security issues, but good solutions exist for both. Investigating how client/server architecture may be applied to Web database publishing makes sense.
The original implementation of client/server computing on PC LANs used a two-tier architecture. This architecture had two main elements -- the database client and the database server -- connected by the LAN. You can implement a two-tier client/server system in several ways. One way, the so-called fat client architecture, places most of the computational burden on the client machine and relatively little on the server. A second major architecture is the thin client (also called fat server) model. Here, most of the computation is done by the server, and the client provides little more than the user interface. Figure 16-1 is a schematic representation of a two-tier client/server system.
Regardless of how a two-tier client/server system is implemented, all the necessary functions are performed by either the database client software on the client machine or by the database server software on the server machine.
Three-tier client/server architecture is a relatively new development that is rapidly replacing the older two-tier model. It adds another functional block or level to the server side of the system. This new functional block, often called middleware, assumes some of the responsibilities normally handled by both the database client and the database server, allowing both of them to be thinner. Thinning the client is good, because potentially so many of them exist, and the less capable the client machines need to be, the cheaper overall the system will be. Thinning the database server is also good because, when freed of computational tasks, the server can concentrate on moving data into and out of the database, speeding up operations. The higher level of modularization in a three-tier system also makes maintenance and troubleshooting easier. Figure 16-2 is a schematic representation of a three-tier client/server system.
The traditional architecture of the World Wide Web can also be viewed as a two-tier structure. A Web server hosts HTML (HyperText Markup Language) pages, which are accessible over the Internet to Web browsers running on client machines. This architecture is similar to a two-tier client/server system in that the Web browser on an Internet client performs the same function as the user interface running on a client/server database client. The Web server performs a similar job to that of the client/server database server -- dispensing information. The main differences are that a Web browser is thinner than even the thinnest database client in a thin-client client/server system, and a Web server is incapable of the database manipulation required of even a thin-server implementation of a client/server system. This state of affairs is fine as long as you are not trying to perform database operations over the Web. If all you are doing is putting HTML pages up for people to read, you don't need to do anything more. Figure 16-3 shows the structure of a two-tier Web system.
To effectively perform database operations over the Web, you must combine elements from a two-tier client/server system with elements from a two-tier Web system to produce a composite three-tier solution. On the client side, the Web browser, perhaps enhanced by a Netscape plug-in or ActiveX component, provides the database application user interface. On the server side, the database server interfaces directly with the data source, just as it does in a classic client/server system.
The three-tier Web database architecture differs from the three-tier client/server database architecture in the middleware. The third tier (middleware) of a three-tier Web database system incorporates the Web server of a two-tier system and adds to it a server extension program. The signals and protocols handled by the Web server grew up in the Web environment and are accepted as standards in that realm. The signals and protocols that the database server is accustomed to seeing grew up in the client/server environment and are accepted as standards in that realm. The server extension program translates between these two incompatible standards. When requests are traveling from the client out on the Web to the data source behind the database server, the server extension program translates HTML to a form that the database server can understand, such as ODBC-compliant SQL. When result sets are traveling in the opposite direction, the server extension program translates them back into HTML for transmission over the Web. Figure 16-4 schematically shows the structure of a three-tier Web database system.
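The translation job of a server extension program can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular product works: the handle_request function, the customer table, and the city request parameter are all hypothetical, and SQLite (through the standard sqlite3 module) stands in for the database server.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

def handle_request(conn, query_string):
    """Translate a Web request into SQL, then the result set back into HTML."""
    params = parse_qs(query_string)
    city = params.get("city", [""])[0]
    # A parameterized query keeps request text from being interpreted as SQL.
    rows = conn.execute(
        "SELECT name, city FROM customer WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    cells = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(town)}</td></tr>"
        for name, town in rows
    )
    return f"<table>{cells}</table>"

# Hypothetical data source sitting behind the database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("Adrian Hansen", "Anaheim"),
        ("Jed Style", "Santa Ana"),
        ("Robert Funnell", "Anaheim"),
    ],
)
page = handle_request(conn, "city=Anaheim")
print(page)
```

The browser sees only HTML and the database sees only SQL. Everything in between is the extension program's problem, which is exactly the point of the middle tier.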
SQL was originally developed as a means for a remote client to communicate with a database. Local area networks (or wide area networks) passed the SQL from client to server, encoding it according to a network protocol on the source end and decoding it at the destination end. A Web-based system adds an additional level of complication. The Web browser on the client end transforms a user request into packets in TCP/IP format for transmission over the Web. At the server end, the Web server passes these packets on to the server extension program, which translates them back into SQL that the database server can understand and respond to. So, whether you are accessing a database over the Web or on a LAN, SQL is the means by which communication is conducted.
Whereas SQL is a standard language for communicating with a database, database vendors comply with that standard (commonly called SQL-92) to a greater or lesser extent. An application using SQL for database access is by no means guaranteed to successfully communicate with a DBMS that claims to be SQL-92 compliant. You have two ways to address this problem. One is to write native drivers for all the popular database servers. A native driver is specifically written to communicate with a particular database server, and no other. For example, Netscape provides native drivers for Informix, Oracle, and Sybase databases and is working on a native driver for IBM's DB2 database. Microsoft provides native driver support for its own SQL Server database.
Native drivers are fast and efficient because they are specifically written for the database client and database server that they are connecting. The disadvantage is that a different native driver must be written for each database that you want to access -- for each database client to which you want that access to be provided. The magnitude of the task of providing all those drivers for all those combinations of clients and servers motivated Microsoft to develop, and the industry to adopt, ODBC as a standard method of conveying SQL statements from clients to servers. If the SQL on the client end is always ODBC compliant, only one driver must be written for each type of server, and far fewer different server types exist than do clients. ODBC-compliant drivers are now available for the overwhelming majority of servers that anyone would want to connect to.
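A little arithmetic shows why the native-driver approach gets out of hand. The sketch below is only an illustration; the function names and the counts of client and server types are made up.

```python
def native_driver_count(client_types, server_types):
    """Native approach: one driver for every (client, server) pairing."""
    return client_types * server_types

def odbc_driver_count(server_types):
    """ODBC approach: clients emit ODBC-compliant SQL, so one driver per server type."""
    return server_types

clients, servers = 20, 5  # made-up numbers, purely for illustration
print(native_driver_count(clients, servers), odbc_driver_count(servers))  # 100 5
```

Add one more client type and the native approach needs five more drivers; the ODBC approach needs none.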
Java is a language developed by Sun Microsystems specifically for use on the World Wide Web. It is similar in many respects to C++, but simpler to learn and use. People maintaining Web sites create applications written in Java, called applets, that reside on their Web server. When a user connects to a database server, the server downloads an applet to the user's browser, where it serves as a client-side extension to the browser. This system allows the user to access much more of the functionality available on the server than what's possible with just a "plain vanilla" browser.
SQL is a data sublanguage. It was never meant to be a complete language in itself, but was designed to be embedded in programs written in some other "host" language. Java can serve the function of being that host language just as well as can C++, Basic, or any other commonly used programming language. Sun has published a specification for JDBC (Java DataBase Connectivity) that performs the same function that ODBC performs in making a client-generated SQL statement understandable to a wide array of possible database servers. The JDBC standard provides writers of Java applets with the ground rules they need to produce applets that will work with multiple, different database servers.