Elsevier Science & Technology Books
Readings in Database Systems / Edition 2

by Michael Stonebraker


Product Details

ISBN-13: 9781558602526
Publisher: Elsevier Science & Technology Books
Publication date: 01/28/1994
Series: Morgan Kaufmann Series in Data Management Systems
Edition description: Older Edition
Pages: 970
Product dimensions: 8.66(w) x 11.42(h) x (d)

Read an Excerpt

Chapter 5: Parallel Database Systems

The success of parallel database systems is due to a failed idea called the database machine. In the eighties, there were innumerable proposals to provide specialized hardware support to make databases run faster. None of these turned out to be economical; the lesson learned was that special-purpose hardware is too expensive relative to commodity hardware, which has economy of scale in its production. Put differently, it is cheaper to buy an overpowered general-purpose machine off the shelf than it is to build a special-purpose, lean-and-mean database machine. A postmortem of database machine research appears in [BORA83].

Database machine research was not a total wash, however, because some of the key ideas could be implemented in software rather than hardware. Chief among these was the use of parallelism, which is the focus of this chapter.

There are three basic architectural options for multiprocessor parallelism, namely, shared memory, shared disk, and shared nothing. In a shared memory configuration, a collection of processors is attached to the memory bus and each can access a common shared memory. This architecture is quite popular in the UNIX and NT server marketplace, with hardware offerings from essentially all UNIX vendors as well as most PC vendors, and RDBMSs from essentially all the big DBMS vendors. A second option uses processors with private memory that access a shared disk system. This shared disk architecture was popular in the failed "massively parallel" systems of the early nineties from Thinking Machines, Intel, and N-cube. The VAXcluster is a more conventional shared disk architecture; IBM's Parallel Sysplex is another. The third option is to connect a collection of processors with private memory and disks on a local area network or specialized interconnect and then run a shared nothing parallel database system on the configuration. IBM's SP-2 is a machine with a shared nothing architecture, but shared nothing database systems are also run on clusters of standard UNIX or NT workstations connected by a high-speed LAN. Shared nothing RDBMSs include NCR's Teradata (a pioneer in the area), IBM's DB2/PE, and Informix Dynamic Server with Extended Parallel Option.
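To make the shared nothing option concrete, here is a minimal single-process sketch in Python. It assumes rows are hash-partitioned across nodes, a common technique that the excerpt does not spell out. The names `Node`, `partition`, and `parallel_count_by` are hypothetical, and the loop over nodes stands in for work that would run concurrently on separate machines.

```python
import zlib
from collections import Counter


class Node:
    """One processor with private memory and a private disk (modeled as a list)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # only this node ever reads or writes these rows

    def local_count_by(self, column):
        # Each node aggregates its own partition without touching any other node.
        return Counter(row[column] for row in self.rows)


def partition(rows, nodes, key):
    """Assumed placement rule: hash the partitioning key to pick the owning node."""
    for row in rows:
        target = zlib.crc32(str(row[key]).encode("utf-8")) % len(nodes)
        nodes[target].rows.append(row)


def parallel_count_by(nodes, column):
    """Merge per-node partial results; only these small partials cross the network."""
    total = Counter()
    for node in nodes:  # sequential here; concurrent on a real cluster
        total.update(node.local_count_by(column))
    return total
```

The point of the sketch is that no data structure is shared between nodes: each one scans only its own rows, and the only traffic is the partial counts sent to whoever merges them.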

In all three architectures, little or no custom hardware is oriented toward supporting the DBMS. Rather, a conventional hardware multiprocessor is utilized by the DBMS software; hence the term software database machine. The relative merits of shared nothing, shared memory, and shared disk architectures have been hotly debated by the research community [GAWL87, CARE94]. What seems to be emerging as conventional wisdom is that a hybrid of shared memory and shared nothing techniques is what makes most sense, given current technical and commercial constraints.

Shared memory is the easiest platform to code for. You can run a conventional single-site DBMS on the architecture and depend on the operating system to multiplex DBMS threads or processes onto the available processors. Most commercial DBMSs have been adapted easily to this architecture, and linear speedup with the number of processors has been widely observed. Since shared memory hardware is now a commodity product in both the UNIX and NT markets, DBMSs will routinely exploit the parallelism that is naturally available in this configuration. As long as a user requires only a few processors' worth of power, a shared memory system is the simplest, cheapest solution.

Shared nothing is the architecture of choice for maximum scalability. The hardware infrastructure is inexpensive and scales arbitrarily: unlike shared memory systems that are constrained by the number of processors that can share a memory bus, a shared nothing system can accommodate arbitrary numbers of processors. All of the very large databases used for decision support are shared nothing; no other architecture can accommodate terabytes of data and thousands of complex queries. A hidden advantage of shared nothing is that a system can grow as you use it. You can buy 20 machines in year one, and if that is not enough you can buy 20 more in year two. This is not merely a convenience: the machines you buy in year two will have better price/performance than the ones from year one, so the ability to postpone scaling the system is a big economic gain. Contrast this with the shared memory approach, in which you can either buy a 20-way processor in year one and throw it away in favor of a 40-way processor in year two, or buy a 40-way processor in year one but utilize it to capacity only in year two, by which time it is year-old technology.
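The economic argument can be checked with a little arithmetic. The Python sketch below compares buying capacity as it is needed against buying all of it in year one. The unit price and the annual price decline are illustrative assumptions, not figures from the text, and the function names are hypothetical. It also simplifies by treating shared memory capacity as priced per processor.

```python
def incremental_cost(units_per_year, unit_price_year_one, annual_price_decline):
    """Cost of buying each year's machines at that year's (falling) price."""
    total = 0.0
    for year, units in enumerate(units_per_year):
        # Assumed model: price for equal capacity drops by a fixed fraction each year.
        price = unit_price_year_one * (1 - annual_price_decline) ** year
        total += units * price
    return total


def upfront_cost(units_per_year, unit_price_year_one):
    """Cost of buying all eventual capacity in year one at year-one prices."""
    return sum(units_per_year) * unit_price_year_one


# Illustrative inputs only: 20 machines per year for two years, a
# hypothetical $10,000 unit price, and a hypothetical 30% yearly decline.
plan = [20, 20]
grow_as_you_go = incremental_cost(plan, 10_000, 0.30)
buy_everything_now = upfront_cost(plan, 10_000)
```

Under these assumed numbers the second batch of 20 machines costs 30% less than the first, so deferring the purchase is cheaper than paying year-one prices for all 40. The size of the gap depends entirely on the assumed rate of price decline.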

Shared disk systems offer no particularly persuasive arguments for their support. They are not easy to program (as shared memory systems are), and they do not scale the way that shared nothing systems do. They present numerous technical challenges. For example, each processor must be able to set locks in a common lock table, but there is no shared memory in which to physically store the table. This requires that the table be partitioned and fancy algorithms run to guarantee reasonable locking costs. Similar issues apply to the buffer pool; a detailed discussion of this area is contained in [CARE91, WANG91]. In addition, crash recovery is difficult in this environment, especially when you try to recover one processor while allowing the N-1 that never crashed to continue normal operation. . . .
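The lock table problem can be illustrated with a minimal Python sketch. It assumes the table is partitioned by hashing the resource name to an owning processor, and it models exclusive locks only. This is one simple scheme chosen for illustration, not the algorithms of [CARE91, WANG91], and every class and method name here is hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class LockPartition:
    """The slice of the lock table kept in one processor's private memory."""
    owner: int
    held: dict = field(default_factory=dict)  # resource name -> transaction id

    def acquire(self, resource, txn):
        holder = self.held.get(resource)
        if holder is None or holder == txn:
            self.held[resource] = txn  # grant (or re-grant) an exclusive lock
            return True
        return False  # conflict: another transaction holds the lock

    def release(self, resource, txn):
        if self.held.get(resource) == txn:
            del self.held[resource]


class PartitionedLockTable:
    def __init__(self, num_processors):
        self.partitions = [LockPartition(owner=i) for i in range(num_processors)]

    def owner_of(self, resource):
        # Deterministic hash, so every processor computes the same owner.
        return zlib.crc32(resource.encode("utf-8")) % len(self.partitions)

    def acquire(self, requester, resource, txn):
        owner = self.owner_of(resource)
        # A remote request would cost a network round trip in a real system.
        is_remote = owner != requester
        granted = self.partitions[owner].acquire(resource, txn)
        return granted, is_remote

    def release(self, resource, txn):
        self.partitions[self.owner_of(resource)].release(resource, txn)
```

With N processors and a uniform hash, roughly (N-1)/N of lock requests land on a remote partition, which is why keeping locking costs reasonable takes more sophisticated algorithms than this sketch.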

Table of Contents

The Roots
Relational DBMS Implementation
Transaction Management
Distributed Databases
Parallel Databases
Objects and Databases
Object-Relational DBs
Data Analysis and Decision Support
Vision and Politics
