Readings in Database Systems / Edition 2 available in Paperback
- Elsevier Science & Technology Books
Readings in Database Systems, 3rd Edition is the most up-to-date compilation of papers exploring DBMS applications, a collection first published as the now classic "Red Book" in 1988. Dr. Stonebraker and Dr. Hellerstein have selected a spectrum of papers on the roots of the field, ranging from classic papers of the '70s on the relational model to timely discourses on future directions. This new streamlined edition includes 46 papers that cover much of the significant research and development in the database field, organized by area of technology.
Expert introductory analysis of each section topic of the book is provided by leaders of the DBMS field along with a discussion of each reading.
- Third edition is completely revised and streamlined to include the most significant new and classic papers along with introductory materials
- Coverage spans the entire field of database, including relational implementation, transaction management, distributed database, parallel database, objects and databases, data analysis, and benchmarking
- Offers a new section on objects and databases, including selections on object-oriented databases as well as object-relational databases
- Lecture notes available on the Morgan Kaufmann Web site, updated by the authors to include each paper
- The definitive book on DBMS applications
Read an Excerpt
Chapter 5: Parallel Database Systems
The success of parallel database systems is due to a failed idea called the database machine. In the eighties, there were innumerable proposals to provide specialized hardware support to make databases run faster. None of these turned out to be economical; the lesson learned was that special-purpose hardware is too expensive relative to commodity hardware, which has economy of scale in its production. Put differently, it is cheaper to buy an overpowered general-purpose machine off the shelf than it is to build a special-purpose, lean-and-mean database machine. A postmortem of database machine research appears in [BORA83].
Database machine research was not a total wash, however, because some of the key ideas could be implemented in software rather than hardware. Chief among these was the use of parallelism, which is the focus of this chapter.
There are three basic architectural options for multiprocessor parallelism, namely, shared memory, shared disk, and shared nothing. In a shared memory configuration, a collection of processors is attached to the memory bus and each can access a common shared memory. This architecture is quite popular in the UNIX and NT server marketplace, with hardware offerings from essentially all UNIX vendors as well as most PC vendors, and RDBMSs from essentially all the big DBMS vendors. A second option uses processors with private memory that access a shared disk system. This shared disk architecture was popular in the failed "massively parallel" systems of the early nineties from Thinking Machines, Intel, and N-cube. The VAXcluster is a more conventional shared disk architecture; IBM's Parallel Sysplex is another. The third option is to connect a collection of processors with private memory and disks on a local area network or specialized interconnect and then run a shared nothing parallel database system on the configuration. IBM's SP-2 is a machine with a shared nothing architecture, but shared nothing database systems are also run on clusters of standard UNIX or NT workstations connected by a high-speed LAN. Shared nothing RDBMSs include NCR's Teradata (a pioneer in the area), IBM's DB2/PE, and Informix Dynamic Server with Extended Parallel Option.
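As a rough illustration of the shared nothing option, the sketch below models each node as an object that owns a private slice of a table and answers only for its own rows; a coordinator then combines the per-node partial results. The names `Node`, `partition`, and `parallel_sum` are hypothetical, and hash partitioning on a key is an assumption rather than something the text prescribes. The nodes here run one after another in a single process, so this shows only the data flow, not real parallel execution.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One shared nothing node: private memory and disk, modeled as a local list."""
    rows: list = field(default_factory=list)

    def local_sum(self, column):
        # Each node scans only the rows it owns.
        return sum(row[column] for row in self.rows)


def partition(rows, nodes, key):
    # Assumption: rows are spread across nodes by hashing a key column.
    for row in rows:
        nodes[hash(row[key]) % len(nodes)].rows.append(row)


def parallel_sum(nodes, column):
    # The coordinator combines small per-node partial sums;
    # the base rows never leave the node that stores them.
    return sum(node.local_sum(column) for node in nodes)


nodes = [Node() for _ in range(4)]
orders = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
partition(orders, nodes, key="order_id")
total = parallel_sum(nodes, "amount")
```

Adding capacity in this model means appending another `Node` and repartitioning, which is the property the shared nothing discussion relies on.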
In all three architectures, little or no custom hardware is oriented toward supporting the DBMS. Rather, a conventional hardware multiprocessor is utilized by the DBMS software, hence the term software database machine. The relative merits of shared nothing, shared memory, and shared disk architectures have been hotly debated by the research community [GAWL87, CARE94]. What seems to be emerging as conventional wisdom is that a hybrid of shared memory and shared nothing techniques is what makes most sense, given current technical and commercial constraints.
Shared memory is the easiest platform to code for. You can run a conventional single-site DBMS on the architecture and depend on the operating system to multiplex DBMS threads or processes onto the available processors. Most commercial DBMSs have been adapted easily to this architecture, and linear speedup with the number of processors has been widely observed. Since shared memory hardware is now a commodity product in both the UNIX and NT market, DBMSs will routinely exploit the parallelism that is naturally available in this configuration. As long as a user requires only a few processors' worth of power, a shared memory system is the simplest, cheapest solution.
Shared nothing is the architecture of choice for maximum scalability. The hardware infrastructure is inexpensive and scales arbitrarily: unlike shared memory systems that are constrained by the number of processors that can share a memory bus, a shared nothing system can accommodate arbitrary numbers of processors. All of the very large databases used for decision support are shared nothing; no other architecture can accommodate terabytes of data and thousands of complex queries. A hidden advantage of shared nothing is that a system can grow as you use it. You can buy 20 machines in year one, and if that is not enough you can buy 20 more in year two. This is not merely a convenience: the machines you buy in year two will have better price/performance than the ones from year one, so the ability to postpone scaling the system is a big economic gain. Contrast this with the shared memory approach, in which you can either buy a 20-way processor in year one and throw it away in favor of a 40-way processor in year two, or buy a 40-way processor in year one but utilize it to capacity only in year two, by which time it is year-old technology.
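The purchasing argument can be made concrete with a small calculation. The sketch below compares the three buying strategies under loudly simplified assumptions that are not from the text: every processor costs the same `unit_price` regardless of architecture, cost is linear in processor count, and the price of equivalent capacity falls by a fraction `decline` between year one and year two. The function names and the numbers are hypothetical.

```python
def incremental_cost(unit_price, decline, n=20):
    """Shared nothing: buy n machines in year one and n more in year two."""
    return n * unit_price + n * unit_price * (1 - decline)


def replace_cost(unit_price, decline, n=20):
    """Shared memory: buy an n-way machine, then discard it for a 2n-way machine."""
    return n * unit_price + 2 * n * unit_price * (1 - decline)


def upfront_cost(unit_price, n=20):
    """Shared memory: buy the 2n-way machine in year one and leave half of it idle."""
    return 2 * n * unit_price


# Hypothetical figures: 10,000 per processor, prices fall 25% by year two.
unit_price, decline = 10_000, 0.25
incremental = incremental_cost(unit_price, decline)
replace = replace_cost(unit_price, decline)
upfront = upfront_cost(unit_price)
```

Under these assumptions the incremental purchase is the cheapest of the three for any positive `decline`, because it is the only strategy that buys the second half of the capacity at year-two prices without discarding the first half.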
Shared disk systems offer no particularly persuasive arguments for their support. They are not easy to program (as shared memory systems are), and they do not scale the way that shared nothing systems do. They present numerous technical challenges. For example, each processor must be able to set locks in a common lock table, but there is no shared memory in which to physically store the table. This requires that the table be partitioned and fancy algorithms run to guarantee reasonable locking costs. Similar issues apply to the buffer pool; a detailed discussion of this area is contained in [CARE91, WANG91]. In addition, crash recovery is difficult in this environment, especially when you try to recover one processor while allowing the N-1 processors that never crashed to continue normal operation. . . .
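To give a feel for what partitioning the lock table involves, here is a minimal sketch in which each processor keeps one slice of the table in its private memory and every resource has a single "home" slice chosen by a stable hash. This is not the algorithm of any particular system: the class names are hypothetical, only exclusive locks are modeled, and the remote message a real system would send to the home processor is replaced by a direct method call.

```python
import zlib


class LockTablePartition:
    """The slice of the lock table held in one processor's private memory."""

    def __init__(self):
        self.holders = {}  # resource id -> transaction holding an exclusive lock

    def try_lock(self, resource, txn):
        # Grant the lock if it is free or already held by the same transaction.
        return self.holders.setdefault(resource, txn) == txn

    def unlock(self, resource, txn):
        if self.holders.get(resource) == txn:
            del self.holders[resource]


class PartitionedLockTable:
    def __init__(self, num_processors):
        self.partitions = [LockTablePartition() for _ in range(num_processors)]

    def home_of(self, resource):
        # A stable hash, so every processor agrees on which slice owns a resource.
        return zlib.crc32(resource.encode()) % len(self.partitions)

    def try_lock(self, resource, txn):
        # In a real system this is a message to the home processor unless it is local.
        return self.partitions[self.home_of(resource)].try_lock(resource, txn)

    def unlock(self, resource, txn):
        self.partitions[self.home_of(resource)].unlock(resource, txn)


table = PartitionedLockTable(num_processors=4)
first = table.try_lock("page:17", "T1")   # lock is free
second = table.try_lock("page:17", "T2")  # conflicts with T1
table.unlock("page:17", "T1")
third = table.try_lock("page:17", "T2")   # free again after the release
```

Even this toy version hints at the cost: most lock requests land on a slice owned by another processor, and if that processor crashes its slice of the lock state is lost, which is part of why recovery is hard in this setting.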
Table of Contents
The Roots
Relational DBMS Implementation
Objects and Databases
Data Analysis and Decision Support
Vision and Politics