- Shopping Bag ( 0 items )
Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from ...
Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues.
The computer industry is undergoing, if not another revolution, certainly a vigorous shaking-up. The major chip manufacturers have, for the time being at least, given up trying to make processors run faster. Moore's Law has not been repealed: each year, more and more transistors fit into the same space, but their clock speed cannot be increased without overheating. Instead, manufacturers are turning to "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches. Multiprocessor chips make computing more effective by exploiting parallelism: harnessing multiple processors to work on a single task.
The spread of multiprocessor architectures will have a pervasive effect on how we develop software. Until recently, advances in technology meant advances in clock speed, so software would effectively "speed up" by itself over time. Now, however, this free ride is over. Advances in technology will mean increased parallelism and not increased clock speed, and exploiting such parallelism is one of the outstanding challenges of modern Computer Science.
This book focuses on how to program multiprocessors that communicate via a shared memory. Such systems are often called shared-memory multiprocessors or, more recently, multicores. Programming challenges arise at all scales of multiprocessor systems—at a very small scale, processors within a single chip need to coordinate access to a shared memory location, and on a large scale, processors in a supercomputer need to coordinate the routing of data. Multiprocessor programming is challenging because modern computer systems are inherently asynchronous: activities can be halted or delayed without warning by interrupts, preemption, cache misses, failures, and other events. These delays are inherently unpredictable, and can vary enormously in scale: a cache miss might delay a processor for fewer than ten instructions, a page fault for a few million instructions, and operating system preemption for hundreds of millions of instructions.
We approach multiprocessor programming from two complementary directions: principles and practice. In the principles part of this book, we focus on computability: figuring out what can be computed in an asynchronous concurrent environment. We use an idealized model of computation in which multiple concurrent threads manipulate a set of shared objects. The sequence of the thread operations on the objects is called the concurrent program or concurrent algorithm. This model is essentially the model presented by the Java™, C#, or C++ thread packages.
Surprisingly, there are easy-to-specify shared objects that cannot be implemented by any concurrent algorithm. It is therefore important to understand what not to try, before proceeding to write multiprocessor programs. Many of the issues that will land multiprocessor programmers in trouble are consequences of fundamental limitations of the computational model, so we view the acquisition of a basic understanding of concurrent shared-memory computability as a necessary step. The chapters dealing with principles take the reader through a quick tour of asynchronous computability, attempting to expose various computability issues, and how they are addressed through the use of hardware and software mechanisms.
An important step in the understanding of computability is the specification and verification of what a given program actually does. This is perhaps best described as program correctness. The correctness of multiprocessor programs, by their very nature, is more complex than that of their sequential counterparts, and requires a different set of tools, even for the purpose of "informal reasoning" (which, of course, is what most programmers actually do). Sequential correctness is mostly concerned with safety properties. A safety property states that some "bad thing" never happens. For example, a traffic light never displays green in all directions, even if the power fails. Naturally, concurrent correctness is also concerned with safety, but the problem is much, much harder, because safety must be ensured despite the vast number of ways that the steps of concurrent threads can be interleaved. Equally important, concurrent correctness encompasses a variety of liveness properties that have no counterparts in the sequential world. A liveness property states that a particular good thing will happen. For example, a red traffic light will eventually turn green. A final goal of the part of the book dealing with principles is to introduce a variety of metrologies and approaches for reasoning about concurrent programs, which will later serve us when discussing the correctness of real-world objects and programs.
The second part of the book deals with the practice of multiprocessor programming, and focuses on performance. Analyzing the performance of multiprocessor algorithms is also different in flavor from analyzing the performance of sequential programs. Sequential programming is based on a collection of well-established and well-understood abstractions. When we write a sequential program, we usually do not need to be aware that underneath it all, pages are being swapped from disk to memory, and smaller units of memory are being moved in and out of a hierarchy of processor caches. This complex memory hierarchy is essentially invisible, hiding behind a simple programming abstraction.
In the multiprocessor context, this abstraction breaks down, at least from a performance perspective. To achieve adequate performance, the programmer must sometimes "outwit" the underlying memory system, writing programs that would seem bizarre to someone unfamiliar with multiprocessor architectures. Someday perhaps, concurrent architectures will provide the same degree of efficient abstraction now provided by sequential architectures, but in the meantime, programmers should beware.
The principles part of the book presents a progressive collection of shared objects and programming tools. Every object and tool is interesting in its own right, and we use each one to expose the reader to higher-level issues: spin-locks illustrate contention, linked lists illustrate the role of locking in data structure design, and so on. Each of these issues has important consequences for program performance. The hope is that the reader will understand the issue in a way that will later allow him or her to apply the lessons learned to specific multiprocessor systems. We culminate with a discussion of state-of-the-art technologies such as transactional memory.
We would like to include a few words about style. The book uses the Java programming language. There are, of course, other suitable languages which readers would have found equally appealing. We have a long list of reasons for our specific choice, but perhaps it is more suitable to discuss them over a cup of coffee! In the appendix we explain how the concepts expressed in Java are expressed in other popular languages or libraries. We also provide a primer on multiprocessor hardware. Throughout the book, we avoid presenting specific performance numbers for programs and algorithms, and stick to general trends. There is a good reason for this: multiprocessors vary greatly, and unfortunate though it may be, at this point in time, what works well on one machine may be significantly less impressive on another. Sticking to general trends is our way of guaranteeing that the validity of our assertions will be sustained over time.
We provide references at the end of each chapter. The reader will find a bibliographical survey of the material covered, with suggestions for further reading. Each chapter also includes a collection of exercises which readers can use to gauge their comprehension or entertain themselves on Sunday mornings.
1.1 Shared Objects and Synchronization
On the first day of your new job, your boss asks you to find all primes between 1 and 1010 (never mind why), using a parallel machine that supports ten concurrent threads. This machine is rented by the minute, so the longer your program takes, the more it costs. You want to make a good impression. What do you do?
As a first attempt, you might consider giving each thread an equal share of the input domain. Each thread might check 109 numbers, as shown in Fig. 1.1. This approach fails, for an elementary, but important reason. Equal ranges of inputs do not necessarily produce equal amounts of work. Primes do not occur uniformly: there are more primes between 1 and 109 than between 9 · 109 and 1010. To make matters worse, the computation time per prime is not the same in all ranges: it usually takes longer to test whether a large number is prime than a small number. In short, there is no reason to believe that the work will be divided equally among the threads, and it is not clear even which threads will have the most work.
A more promising way to split the work among the threads is to assign each thread one integer at a time (Fig. 1.2). When a thread is finished with testing an integer, it asks for another. To this end, we introduce a shared counter, an object that encapsulates an integer value, and that provides a getAndIncrement() method that increments its value, and returns the counter's prior value to the caller.
Fig. 1.3 shows a naïve implementation of Counter in Java. This counter implementation works well when used by a single thread, but it fails when shared by multiple threads. The problem is that the expression
is actually an abbreviation of the following, more complex code:
long temp = value; value = temp + 1; return temp;
In this code fragment, value is a field of the Counter object, and is shared among all the threads. Each thread, however, has its own local copy of temp, which is a local variable to each thread.
Now imagine that threads A and B both call the counter's getAndIncrement() method at about the same time. They might simultaneously read 1 from value, set their local temp variables to 1, value to 2, and both return 1. This behavior is not what we intended: concurrent calls to the counter's getAndIncrement() return the same value, but we expect them to return distinct values. In fact, it could get even worse. One thread might read 1 from value, but before it sets value to 2, another thread would go through the increment loop several times, reading 1 and setting to 2, reading 2 and setting to 3. When the first thread finally completes its operation and sets value to 2, it will actually be setting the counter back from 3 to 2.
The heart of the problem is that incrementing the counter's value requires two distinct operations on the shared variable: reading the value field into a temporary variable and writing it back to the Counter object.
Something similar happens when you try to pass someone approaching you head-on in a corridor. You may find yourself veering right, then left several times to avoid the other person doing exactly the same thing. Sometimes you manage to avoid bumping into them and sometimes you do not, and in fact, as we see in the later chapters, such collisions are provably unavoidable. On an intuitive level, what is going on is that each of you is performing two distinct steps: looking at ("reading") the other's current position, and moving ("writing") to one side or the other. The problem is, when you read the other's position, you have no way of knowing whether they have decided to stay or move. In the same way that you and the annoying stranger must decide who passes on the left and who on the right, threads accessing a shared Counter must decide who goes first and who goes second.
As we will see in Chapter 5, modern multiprocessor hardware provides special read-modify-write instructions that allow threads to read, modify, and write a value to memory in one atomic (i.e., indivisible) hardware step. For the Counter object, we can use such hardware to increment the counter atomically.
Excerpted from The Art of Multiprocessor Programming by Maurice Herlihy Nir Shavit Copyright © 2012 by Elsevier, Inc.. Excerpted by permission of MORGAN KAUFMANN. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
2. Mutual Exclusion
3. Concurrent Objects and Linearization
4. Foundations of Shared Memory
5. The Relative Power of Synchronization Methods
6. The Universality of Consensus
7. Spin Locks and Contention
8. Monitors and Blocking Synchronization
9. Linked Lists: the Role of Locking
10. Concurrent Queues and the ABA Problem
11. Concurrent Stacks and Elimination
12. Counting, Sorting and Distributed Coordination
13. Concurrent Hashing and Natural Parallelism
14. Skiplists and Balanced Search
15. Priority Queues
16. Futures, Scheduling and Work Distribution
18. Transactional Memory
Posted October 21, 2012
Are you a senior-level undergraduate and/or practitioner who programs multiprocessors? If you are, then this book is for you! Authors Maurice Herlihy and Nir Shavit, have done an outstanding job of writing a book that focuses on how to program multiprocessors that communicate via a shared memory. Herlihy and Shavit, begin by covering classical mutual exclusion algorithms that work by reading and writing shared memory. In addition, the authors examine various ways of specifying correctness and progress. They then describe the foundations of concurrent shared memory computing. Next, the authors show you a simple technique for proving statements of the form, where there is no wait-free implementation of X by Y. The authors continue by describing how to use consensus objects to build a universal construction that implements any concurrent object. In addition, they then try to make you understand how architecture affects performance, and how to exploit this knowledge to write efficient concurrent programs. The authors then show you how monitors are a structured way of combining synchronization and data. Next, the authors introduce several useful techniques that go beyond course-grained locking to allow multiple threads to access a single object at the same time. They continue by considering a kind of pool that provides first-in-first-out fairness. In addition, the authors show you how to implement concurrent stacks. They then show you how some important problems that seem inherently sequential can be made highly parallel by spreading out coordination tasks among multiple parties. Next, the authors look at concurrent hashing – a problem that seems to be naturally parallelizable or, using a more technical term: disjoint-access-parallel; meaning that concurrent method calls are likely to access disjoint locations, implying that there is little need for synchronization. They continue by looking at concurrent search structures with logarithmic depth. In addition, the authors describe how a priority queue typically provides an add() method to add an item to the set, and a removeMin() method to remove and return the item of minimal score. The authors then show you how to decompose certain kinds of problems into components that can be executed in parallel. Next, they discuss how a barrier is a way of forcing asynchronous threads to act almost as if they were synchronous. Finally, the authors review and analyze the strengths and weaknesses of the standard synchronization primitives, and describe some emerging alternatives that are likely to extend, and perhaps even to displace many of today’s standard primitives. This most excellent book focuses on computability: figuring out what can be computed in an asynchronous concurrent environment. Perhaps more importantly, this book deals with the practice of multiprocessor programming, and focuses on performance.Was this review helpful? Yes NoThank you for your feedback. Report this reviewThank you, this review has been flagged.