Advanced Computer Architecture and Parallel Processing / Edition 1

Hardcover (Print)
Buy New
Buy New from BN.com
$128.77
Used and New from Other Sellers
Used and New from Other Sellers
from $121.05
Usually ships in 1-2 business days
(Save 23%)
Other sellers (Hardcover)
  • All (9) from $121.05   
  • New (7) from $121.05   
  • Used (2) from $128.76   

Overview

Computer architecture deals with the physical configuration, logical structure, formats, protocols, and operational sequences for processing data, controlling the configuration, and controlling the operations over a computer. It also encompasses word lengths, instruction codes, and the interrelationships among the main parts of a computer or group of computers. This two-volume set offers a comprehensive coverage of the field of computer organization and architecture.

Read More Show Less

Editorial Reviews

From the Publisher
"This outstanding treatise...allows students and professionals to become familiar with the inner workings of an inherently complex architecture." (CHOICE, July 2005)
Read More Show Less

Product Details

Meet the Author

HESHAM EL-REWINI, PHD, PE, is a full professor and chairman of the Department of Computer Sciences and Engineering at Southern Methodist University (SMU). He has co-authored several books, published numerous research papers in journals and conference proceedings, and chaired many international conferences.

MOSTAFA ABD-EL-BARR, PHD, PEnG, is a professor and chairman of the Department of Information Science at Kuwait University. He has co-authored two other books, published more than 120 research papers in journals and conference proceedings, and served as chair for a number of international conferences and symposia.

Read More Show Less

Read an Excerpt

Advanced Computer Architecture and Parallel Processing


By Hesham El-Rewini M. Abd-El-Barr

John Wiley & Sons

Copyright © 2005 John Wiley & Sons, Inc.
All right reserved.

ISBN: 0-471-46740-5


Chapter One

Introduction to Advanced Computer Architecture and Parallel Processing

Computer architects have always strived to increase the performance of their computer architectures. High performance may come from fast dense circuitry, packaging technology, and parallelism. Single-processor supercomputers have achieved unheard of speeds and have been pushing hardware technology to the physical limit of chip manufacturing. However, this trend will soon come to an end, because there are physical and architectural bounds that limit the computational power that can be achieved with a single-processor system. In this book we will study advanced computer architectures that utilize parallelism via multiple processing units.

Parallel processors are computer systems consisting of multiple processing units connected via some interconnection network plus the software needed to make the processing units work together. There are two major factors used to categorize such systems: the processing units themselves, and the interconnection network that ties them together. The processing units can communicate and interact with each other using either shared memory or message passing methods. The interconnection network for shared memory systems can be classified as bus-based versus switch-based. In message passing systems, the interconnection network is divided into static and dynamic. Static connections have a fixed topology that does not change while programs are running. Dynamic connections create links on the fly as the program executes.

The main argument for using multiprocessors is to create powerful computers by simply connecting multiple processors. A multiprocessor is expected to reach faster speed than the fastest single-processor system. In addition, a multiprocessor consisting of a number of single processors is expected to be more cost-effective than building a high-performance single processor. Another advantage of a multiprocessor is fault tolerance. If a processor fails, the remaining processors should be able to provide continued service, albeit with degraded performance.

1.1 FOUR DECADES OF COMPUTING

Most computer scientists agree that there have been four distinct paradigms or eras of computing. These are: batch, time-sharing, desktop, and network. Table 1.1 is modified from a table proposed by Lawrence Tesler. In this table, major characteristics of the different computing paradigms are associated with each decade of computing, starting from 1960.

1.1.1 Batch Era

By 1965 the IBM System/360 mainframe dominated the corporate computer centers. It was the typical batch processing machine with punched card readers, tapes and disk drives, but no connection beyond the computer room. This single mainframe established large centralized computers as the standard form of computing for decades. The IBM System/360 had an operating system, multiple programming languages, and 10 megabytes of disk storage. The System/360 filled a room with metal boxes and people to run them. Its transistor circuits were reasonably fast. Power users could order magnetic core memories with up to one megabyte of 32-bit words. This machine was large enough to support many programs in memory at the same time, even though the central processing unit had to switch from one program to another.

1.1.2 Time-Sharing Era

The mainframes of the batch era were firmly established by the late 1960s when advances in semiconductor technology made the solid-state memory and integrated circuit feasible. These advances in hardware technology spawned the minicomputer era. They were small, fast, and inexpensive enough to be spread throughout the company at the divisional level. However, they were still too expensive and difficult to use to hand over to end-users. Minicomputers made by DEC, Prime, and Data General led the way in defining a new kind of computing: time-sharing. By the 1970s it was clear that there existed two kinds of commercial or business computing: (1) centralized data processing mainframes, and (2) time-sharing minicomputers. In parallel with small-scale machines, supercomputers were coming into play. The first such supercomputer, the CDC 6600, was introduced in 1961 by Control Data Corporation. Cray Research Corporation introduced the best cost/performance supercomputer, the Cray-1, in 1976.

1.1.3 Desktop Era

Personal computers (PCs), which were introduced in 1977 by Altair, Processor Technology, North Star, Tandy, Commodore, Apple, and many others, enhanced the productivity of end-users in numerous departments. Personal computers from Compaq, Apple, IBM, Dell, and many others soon became pervasive, and changed the face of computing.

Local area networks (LAN) of powerful personal computers and workstations began to replace mainframes and minis by 1990. The power of the most capable big machine could be had in a desktop model for one-tenth of the cost. However, these individual desktop computers were soon to be connected into larger complexes of computing by wide area networks (WAN).

1.1.4 Network Era

The fourth era, or network paradigm of computing, is in full swing because of rapid advances in network technology. Network technology outstripped processor technology throughout most of the 1990s. This explains the rise of the network paradigm listed in Table 1.1. The surge of network capacity tipped the balance from a processor-centric view of computing to a network-centric view.

The 1980s and 1990s witnessed the introduction of many commercial parallel computers with multiple processors. They can generally be classified into two main categories: (1) shared memory, and (2) distributed memory systems. The number of processors in a single machine ranged from several in a shared memory computer to hundreds of thousands in a massively parallel system. Examples of parallel computers during this era include Sequent Symmetry, Intel iPSC, nCUBE, Intel Paragon, Thinking Machines (CM-2, CM-5), MsPar (MP), Fujitsu (VPP500), and others.

1.1.5 Current Trends

One of the clear trends in computing is the substitution of expensive and specialized parallel machines by the more cost-effective clusters of workstations. A cluster is a collection of stand-alone computers connected using some interconnection network. Additionally, the pervasiveness of the Internet created interest in network computing and more recently in grid computing. Grids are geographically distributed platforms of computation. They should provide dependable, consistent, pervasive, and inexpensive access to high-end computational facilities.

1.2 FLYNN'S TAXONOMY OF COMPUTER ARCHITECTURE

The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn's classification scheme is based on the notion of a stream of information. Two types of information flow into a processor: instructions and data. The instruction stream is defined as the sequence of instructions performed by the processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit. According to Flynn's classification, either of the instruction or data streams can be single or multiple. Computer architecture can be classified into the following four distinct categories:

single-instruction single-data streams (SISD);

single-instruction multiple-data streams (SIMD);

multiple-instruction single-data streams (MISD); and

multiple-instruction multiple-data streams (MIMD).

Conventional single-processor von Neumann computers are classified as SISD systems. Parallel computers are either SIMD or MIMD. When there is only one control unit and all processors execute the same instruction in a synchronized fashion, the parallel machine is classified as SIMD. In a MIMD machine, each processor has its own control unit and can execute different instructions on different data. In the MISD category, the same stream of data flows through a linear array of processors executing different instruction streams. In practice, there is no viable MISD machine; however, some authors have considered pipelined machines (and perhaps systolic-array computers) as examples for MISD. Figures 1.1, 1.2, and 1.3 depict the block diagrams of SISD, SIMD, and MIMD, respectively.

An extension of Flynn's taxonomy was introduced by D. J. Kuck in 1978. In his classification, Kuck extended the instruction stream further to single (scalar and array) and multiple (scalar and array) streams. The data stream in Kuck's classification is called the execution stream and is also extended to include single (scalar and array) and multiple (scalar and array) streams. The combination of these streams results in a total of 16 categories of architectures.

1.3 SIMD ARCHITECTURE

The SIMD model of parallel computing consists of two parts: a front-end computer of the usual von Neumann style, and a processor array as shown in Figure 1.4. The processor array is a set of identical synchronized processing elements capable of simultaneously performing the same operation on different data. Each processor in the array has a small amount of local memory where the distributed data resides while it is being processed in parallel. The processor array is connected to the memory bus of the front end so that the front end can randomly access the local processor memories as if it were another memory. Thus, the front end can issue special commands that cause parts of the memory to be operated on simultaneously or cause data to move around in the memory. A program can be developed and executed on the front end using a traditional serial programming language. The application program is executed by the front end in the usual serial way, but issues commands to the processor array to carry out SIMD operations in parallel. The similarity between serial and data parallel programming is one of the strong points of data parallelism. Synchronization is made irrelevant by the lock-step synchronization of the processors. Processors either do nothing or exactly the same operations at the same time. In SIMD architecture, parallelism is exploited by applying simultaneous operations across large sets of data. This paradigm is most useful for solving problems that have lots of data that need to be updated on a wholesale basis. It is especially powerful in many regular numerical calculations.

There are two main configurations that have been used in SIMD machines (see Fig. 1.5). In the first scheme, each processor has its own local memory. Processors can communicate with each other through the interconnection network. If the interconnection network does not provide direct connection between a given pair of processors, then this pair can exchange data via an intermediate processor. The ILLIAC IV used such an interconnection scheme. The interconnection network in the ILLIAC IV allowed each processor to communicate directly with four neighboring processors in an 8 x 8 matrix pattern such that the i th processor can communicate directly with the (i - 1)th, (i + 1)th, (i - 8)th, and (i + 8)th processors. In the second SIMD scheme, processors and memory modules communicate with each other via the interconnection network. Two processors can transfer data between each other via intermediate memory module(s) or possibly via intermediate processor(s). The BSP (Burroughs' Scientific Processor) used the second SIMD scheme.

1.4 MIMD ARCHITECTURE

Multiple-instruction multiple-data streams (MIMD) parallel architectures are made of multiple processors and multiple memory modules connected together via some interconnection network. They fall into two broad categories: shared memory or message passing. Figure 1.6 illustrates the general architecture of these two categories. Processors exchange information through their central shared memory in shared memory systems, and exchange information through their interconnection network in message passing systems.

A shared memory system typically accomplishes interprocessor coordination through a global memory shared by all processors. These are typically server systems that communicate through a bus and cache memory controller. The bus/ cache architecture alleviates the need for expensive multiported memories and interface circuitry as well as the need to adopt a message-passing paradigm when developing application software. Because access to shared memory is balanced, these systems are also called SMP (symmetric multiprocessor) systems. Each processor has equal opportunity to read/write to memory, including equal access speed. Commercial examples of SMPs are Sequent Computer's Balance and Symmetry, Sun Microsystems multiprocessor servers, and Silicon Graphics Inc. multiprocessor servers.

A message passing system (also referred to as distributed memory) typically combines the local memory and processor at each node of the interconnection network. There is no global memory, so it is necessary to move data from one local memory to another by means of message passing. This is typically done by a Send/Receive pair of commands, which must be written into the application software by a programmer. Thus, programmers must learn the message-passing paradigm, which involves data copying and dealing with consistency issues. Commercial examples of message passing architectures c. 1990 were the nCUBE, iPSC/2, and various Transputer-based systems. These systems eventually gave way to Internet connected systems whereby the processor/memory nodes were either Internet servers or clients on individuals' desktop.

It was also apparent that distributed memory is the only way efficiently to increase the number of processors managed by a parallel and distributed system. If scalability to larger and larger systems (as measured by the number of processors) was to continue, systems had to use distributed memory techniques. These two forces created a conflict: programming in the shared memory model was easier, and designing systems in the message passing model provided scalability. The distributed-shared memory (DSM) architecture began to appear in systems like the SGI Origin2000, and others. In such systems, memory is physically distributed; for example, the hardware architecture follows the message passing school of design, but the programming model follows the shared memory school of thought. In effect, software covers up the hardware. As far as a programmer is concerned, the architecture looks and behaves like a shared memory machine, but a message passing architecture lives underneath the software. Thus, the DSM machine is a hybrid that takes advantage of both design schools.

1.4.1 Shared Memory Organization

A shared memory model is one in which processors communicate by reading and writing locations in a shared memory that is equally accessible by all processors. Each processor may have registers, buffers, caches, and local memory banks as additional memory resources. A number of basic issues in the design of shared memory systems have to be taken into consideration. These include access control, synchronization, protection, and security. Access control determines which process accesses are possible to which resources. Access control models make the required check for every access request issued by the processors to the shared memory, against the contents of the access control table. The latter contains flags that determine the legality of each access attempt. If there are access attempts to resources, then until the desired access is completed, all disallowed access attempts and illegal processes are blocked. Requests from sharing processes may change the contents of the access control table during execution. The flags of the access control with the synchronization rules determine the system's functionality. Synchronization constraints limit the time of accesses from sharing processes to shared resources. Appropriate synchronization ensures that the information flows properly and ensures system functionality. Protection is a system feature that prevents processes from making arbitrary access to resources belonging to other processes. Sharing and protection are incompatible; sharing allows access, whereas protection restricts it.

(Continues...)



Excerpted from Advanced Computer Architecture and Parallel Processing by Hesham El-Rewini M. Abd-El-Barr Copyright © 2005 by John Wiley & Sons, Inc.. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Read More Show Less

Table of Contents

1. Introduction to Advanced Computer Architecture and Parallel Processing.

1.1 Four Decades of Computing.

1.2 Flynn’s Taxonomy of Computer Architecture.

1.3 SIMD Architecture.

1.4 MIMD Architecture.

1.5 Interconnection Networks.

1.6 Chapter Summary.

Problems.

References.

2. Multiprocessors Interconnection Networks.

2.1 Interconnection Networks Taxonomy.

2.2 Bus-Based Dynamic Interconnection Networks.

2.3 Switch-Based Interconnection Networks.

2.4 Static Interconnection Networks.

2.5 Analysis and Performance Metrics.

2.6 Chapter Summary.

Problems.

References.

3. Performance Analysis of Multiprocessor Architecture.

3.1 Computational Models.

3.2 An Argument for Parallel Architectures.

3.3 Interconnection Networks Performance Issues.

3.4 Scalability of Parallel Architectures.

3.5 Benchmark Performance.

3.6 Chapter Summary.

Problems.

References.

4. Shared Memory Architecture.

4.1 Classification of Shared Memory Systems.

4.2 Bus-Based Symmetric Multiprocessors.

4.3 Basic Cache Coherency Methods.

4.4 Snooping Protocols.

4.5 Directory Based Protocols.

4.6 Shared Memory Programming.

4.7 Chapter Summary.

Problems.

References.

5. Message Passing Architecture.

5.1 Introduction to Message Passing.

5.2 Routing in Message Passing Networks.

5.3 Switching Mechanisms in Message Passing.

5.4 Message Passing Programming Models.

5.5 Processor Support for Message Passing.

5.6 Example Message Passing Architectures.

5.7 Message Passing Versus Shared Memory Architectures.

5.8 Chapter Summary.

Problems.

References.

6. Abstract Models.

6.1 The PRAM Model and Its Variations.

6.2 Simulating Multiple Accesses on an EREW PRAM.

6.3 Analysis of Parallel Algorithms.

6.4 Computing Sum and All Sums.

6.5 Matrix Multiplication.

6.6 Sorting.

6.7 Message Passing Model.

6.8 Leader Election Problem.

6.9 Leader Election in Synchronous Rings.

6.10 Chapter Summary.

Problems.

References.

7. Network Computing.

7.1 Computer Networks Basics.

7.2 Client/Server Systems.

7.3 Clusters.

7.4 Interconnection Networks.

7.5 Cluster Examples.

7.6 Grid Computing.

7.7 Chapter Summary.

Problems.

References.

8. Parallel Programming in the Parallel Virtual Machine.

8.1 PVM Environment and Application Structure.

8.2 Task Creation.

8.3 Task Groups.

8.4 Communication Among Tasks.

8.5 Task Synchronization.

8.6 Reduction Operations.

8.7 Work Assignment.

8.8 Chapter Summary.

Problems.

References.

9. Message Passing Interface (MPI).

9.1 Communicators.

9.2 Virtual Topologies.

9.3 Task Communication.

9.4 Synchronization.

9.5 Collective Operations.

9.6 Task Creation.

9.7 One-Sided Communication.

9.8 Chapter Summary.

Problems.

References.

10 Scheduling and Task Allocation.

10.1 The Scheduling Problem.

10.2 Scheduling DAGs without Considering Communication.

10.3 Communication Models.

10.4 Scheduling DAGs with Communication.

10.5 The NP-Completeness of the Scheduling Problem.

10.6 Heuristic Algorithms.

10.7 Task Allocation.

10.8 Scheduling in Heterogeneous Environments.

Problems.

References.

Read More Show Less

Customer Reviews

Be the first to write a review
( 0 )
Rating Distribution

5 Star

(0)

4 Star

(0)

3 Star

(0)

2 Star

(0)

1 Star

(0)

Your Rating:

Your Name: Create a Pen Name or

Barnes & Noble.com Review Rules

Our reader reviews allow you to share your comments on titles you liked, or didn't, with others. By submitting an online review, you are representing to Barnes & Noble.com that all information contained in your review is original and accurate in all respects, and that the submission of such content by you and the posting of such content by Barnes & Noble.com does not and will not violate the rights of any third party. Please follow the rules below to help ensure that your review can be posted.

Reviews by Our Customers Under the Age of 13

We highly value and respect everyone's opinion concerning the titles we offer. However, we cannot allow persons under the age of 13 to have accounts at BN.com or to post customer reviews. Please see our Terms of Use for more details.

What to exclude from your review:

Please do not write about reviews, commentary, or information posted on the product page. If you see any errors in the information on the product page, please send us an email.

Reviews should not contain any of the following:

  • - HTML tags, profanity, obscenities, vulgarities, or comments that defame anyone
  • - Time-sensitive information such as tour dates, signings, lectures, etc.
  • - Single-word reviews. Other people will read your review to discover why you liked or didn't like the title. Be descriptive.
  • - Comments focusing on the author or that may ruin the ending for others
  • - Phone numbers, addresses, URLs
  • - Pricing and availability information or alternative ordering information
  • - Advertisements or commercial solicitation

Reminder:

  • - By submitting a review, you grant to Barnes & Noble.com and its sublicensees the royalty-free, perpetual, irrevocable right and license to use the review in accordance with the Barnes & Noble.com Terms of Use.
  • - Barnes & Noble.com reserves the right not to post any review -- particularly those that do not follow the terms and conditions of these Rules. Barnes & Noble.com also reserves the right to remove any review at any time without notice.
  • - See Terms of Use for other conditions and disclaimers.
Search for Products You'd Like to Recommend

Recommend other products that relate to your review. Just search for them below and share!

Create a Pen Name

Your Pen Name is your unique identity on BN.com. It will appear on the reviews you write and other website activities. Your Pen Name cannot be edited, changed or deleted once submitted.

 
Your Pen Name can be any combination of alphanumeric characters (plus - and _), and must be at least two characters long.

Continue Anonymously

    If you find inappropriate content, please report it to Barnes & Noble
    Why is this product inappropriate?
    Comments (optional)