Parallel Programming in OpenMP / Edition 1 (Paperback)
Overview
"This book has an important role to play in the HPC community-both for introducing practicing professionals to OpenMP and for educating students and professionals about parallel programming. I'm happy to see that the authors have put together such a complete OpenMP presentation."
- Mary E. Zozel, Lawrence Livermore National Laboratory
The rapid and widespread acceptance of shared-memory multiprocessor architectures has created a pressing demand for an efficient way to program these systems. At the same time, developers of technical and scientific applications in industry and in government laboratories find they need to parallelize huge volumes of code in a portable fashion. OpenMP, developed jointly by several parallel computing vendors to address these issues, is an industry-wide standard for programming shared-memory and distributed shared-memory multiprocessors. It consists of a set of compiler directives and library routines that extend Fortran, C, and C++ programs to express shared-memory parallelism.
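To give a flavor of the model, here is a minimal sketch (not taken from the book) of a C loop parallelized with an OpenMP compiler directive plus one runtime library call; the array size and the sum computation are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <omp.h>   /* OpenMP runtime library routines */

#define N 1000     /* hypothetical array size, chosen for illustration */

int main(void) {
    double a[N];
    double sum = 0.0;

    /* A compiler directive (pragma) marks the loop for parallel
       execution; the reduction clause gives each thread a private
       copy of sum and combines the copies when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    /* A runtime library routine queries the available thread count. */
    printf("sum = %g, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```

Compiled with an OpenMP-aware compiler (for example, with a flag such as -fopenmp), the loop iterations are divided among threads, while the same source still builds and runs correctly as a serial program when the directives are ignored.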
Parallel Programming in OpenMP is the first book to teach both novice and expert parallel programmers how to program using this new standard. The authors, who helped design and implement OpenMP while at SGI, bring to the book their depth and breadth of experience as compiler writers, application developers, and performance engineers.
Features:
- Designed so that expert parallel programmers can skip the opening chapters, which introduce parallel programming to novices, and jump right into the essentials of OpenMP.
- Presents all the basic OpenMP constructs in Fortran, C, and C++.
- Emphasizes practical concepts to address the concerns of real application developers.
- Includes high-quality example programs that illustrate concepts of parallel programming as well as all the constructs of OpenMP.
- Serves as both an effective teaching text and a compact reference.
- Includes end-of-chapter programming exercises.
Rohit Chandra is currently a Chief Scientist at NARUS, Inc., a provider of internet business infrastructure solutions. He previously was a Principal Engineer in the Compiler Group of Silicon Graphics, where he helped design and implement OpenMP.
Leonardo Dagum currently works for Silicon Graphics in the Linux Server Platform Group where he is responsible for the I/O infrastructure in SGI's scalable Linux server systems. He helped define the OpenMP Fortran API. His research interests include parallel algorithms and performance modeling for parallel systems.
Dave Kohr is currently a Member of the Technical Staff at NARUS, Inc. He previously was a Member of the Technical Staff in the Compiler Group at Silicon Graphics, where he helped define and implement OpenMP.
Jeffrey McDonald currently owns SolidFX, a private software development company. In the capacity of Engineering Department Manager at Silicon Graphics, he proposed the OpenMP API effort and helped develop it into the industry standard it is today.
Dror Maydan is currently Director of Software at Tensilica, Inc., the provider of application-specific processor technology. He previously was an Engineering Department Manager in the Compiler Group of Silicon Graphics where he helped design and implement OpenMP.
Ramesh Menon is a Staff Engineer at NARUS, Inc. Prior to NARUS, Ramesh was a Staff Engineer at SGI representing SGI in the OpenMP forum. He was the founding Chairman of the OpenMP Architecture Review Board (ARB) and supervised the writing of the first OpenMP specifications.
Product Details
| ISBN-13: | 9781558606715 |
|---|---|
| Publication date: | 10/02/2000 |
| Edition description: | New Edition |
| Pages: | 240 |
| Product dimensions: | 6.00(w) x 1.25(h) x 9.00(d) |
Table of Contents
Foreword | | vii
Preface | | xiii
Chapter 1 | Introduction | 1 |
1.1 | Performance with OpenMP | 2 |
1.2 | A First Glimpse of OpenMP | 6 |
1.3 | The OpenMP Parallel Computer | 8 |
1.4 | Why OpenMP? | 9 |
1.5 | History of OpenMP | 13 |
1.6 | Navigating the Rest of the Book | 14 |
Chapter 2 | Getting Started with OpenMP | 15 |
2.1 | Introduction | 15 |
2.2 | OpenMP from 10,000 Meters | 16 |
2.2.1 | OpenMP Compiler Directives or Pragmas | 17 |
2.2.2 | Parallel Control Structures | 20 |
2.2.3 | Communication and Data Environment | 20 |
2.2.4 | Synchronization | 22 |
2.3 | Parallelizing a Simple Loop | 23 |
2.3.1 | Runtime Execution Model of an OpenMP Program | 24 |
2.3.2 | Communication and Data Scoping | 25 |
2.3.3 | Synchronization in the Simple Loop Example | 27 |
2.3.4 | Final Words on the Simple Loop Example | 28 |
2.4 | A More Complicated Loop | 29 |
2.5 | Explicit Synchronization | 32 |
2.6 | The reduction Clause | 35 |
2.7 | Expressing Parallelism with Parallel Regions | 36 |
2.8 | Concluding Remarks | 39 |
2.9 | Exercises | 40 |
Chapter 3 | Exploiting Loop-Level Parallelism | 41 |
3.1 | Introduction | 41 |
3.2 | Form and Usage of the parallel do Directive | 42 |
3.2.1 | Clauses | 43 |
3.2.2 | Restrictions on Parallel Loops | 44 |
3.3 | Meaning of the parallel do Directive | 46 |
3.3.1 | Loop Nests and Parallelism | 46 |
3.4 | Controlling Data Sharing | 47 |
3.4.1 | General Properties of Data Scope Clauses | 49 |
3.4.2 | The shared Clause | 50 |
3.4.3 | The private Clause | 51 |
3.4.4 | Default Variable Scopes | 53 |
3.4.5 | Changing Default Scoping Rules | 56 |
3.4.6 | Parallelizing Reduction Operations | 59 |
3.4.7 | Private Variable Initialization and Finalization | 63 |
3.5 | Removing Data Dependences | 65 |
3.5.1 | Why Data Dependences Are a Problem | 66 |
3.5.2 | The First Step: Detection | 67 |
3.5.3 | The Second Step: Classification | 71 |
3.5.4 | The Third Step: Removal | 73 |
3.5.5 | Summary | 81 |
3.6 | Enhancing Performance | 82 |
3.6.1 | Ensuring Sufficient Work | 82 |
3.6.2 | Scheduling Loops to Balance the Load | 85 |
3.6.3 | Static and Dynamic Scheduling | 86 |
3.6.4 | Scheduling Options | 86 |
3.6.5 | Comparison of Runtime Scheduling Behavior | 88 |
3.7 | Concluding Remarks | 90 |
3.8 | Exercises | 90 |
Chapter 4 | Beyond Loop-Level Parallelism: Parallel Regions | 93 |
4.1 | Introduction | 93 |
4.2 | Form and Usage of the parallel Directive | 94 |
4.2.1 | Clauses on the parallel Directive | 95 |
4.2.2 | Restrictions on the parallel Directive | 96 |
4.3 | Meaning of the parallel Directive | 97 |
4.3.1 | Parallel Regions and SPMD-Style Parallelism | 100 |
4.4 | threadprivate Variables and the copyin Clause | 100 |
4.4.1 | The threadprivate Directive | 103 |
4.4.2 | The copyin Clause | 106 |
4.5 | Work-Sharing in Parallel Regions | 108 |
4.5.1 | A Parallel Task Queue | 108 |
4.5.2 | Dividing Work Based on Thread Number | 109 |
4.5.3 | Work-Sharing Constructs in OpenMP | 111 |
4.6 | Restrictions on Work-Sharing Constructs | 119 |
4.6.1 | Block Structure | 119 |
4.6.2 | Entry and Exit | 120 |
4.6.3 | Nesting of Work-Sharing Constructs | 122 |
4.7 | Orphaning of Work-Sharing Constructs | 123 |
4.7.1 | Data Scoping of Orphaned Constructs | 125 |
4.7.2 | Writing Code with Orphaned Work-Sharing Constructs | 126 |
4.8 | Nested Parallel Regions | 126 |
4.8.1 | Directive Nesting and Binding | 129 |
4.9 | Controlling Parallelism in an OpenMP Program | 130 |
4.9.1 | Dynamically Disabling the parallel Directives | 130 |
4.9.2 | Controlling the Number of Threads | 131 |
4.9.3 | Dynamic Threads | 133 |
4.9.4 | Runtime Library Calls and Environment Variables | 135 |
4.10 | Concluding Remarks | 137 |
4.11 | Exercises | 138 |
Chapter 5 | Synchronization | 141 |
5.1 | Introduction | 141 |
5.2 | Data Conflicts and the Need for Synchronization | 142 |
5.2.1 | Getting Rid of Data Races | 143 |
5.2.2 | Examples of Acceptable Data Races | 144 |
5.2.3 | Synchronization Mechanisms in OpenMP | 146 |
5.3 | Mutual Exclusion Synchronization | 147 |
5.3.1 | The Critical Section Directive | 147 |
5.3.2 | The atomic Directive | 152 |
5.3.3 | Runtime Library Lock Routines | 155 |
5.4 | Event Synchronization | 157 |
5.4.1 | Barriers | 157 |
5.4.2 | Ordered Sections | 159 |
5.4.3 | The master Directive | 161 |
5.5 | Custom Synchronization: Rolling Your Own | 162 |
5.5.1 | The flush Directive | 163 |
5.6 | Some Practical Considerations | 165 |
5.7 | Concluding Remarks | 168 |
5.8 | Exercises | 168 |
Chapter 6 | Performance | 171 |
6.1 | Introduction | 171 |
6.2 | Key Factors That Impact Performance | 173 |
6.2.1 | Coverage and Granularity | 173 |
6.2.2 | Load Balance | 175 |
6.2.3 | Locality | 179 |
6.2.4 | Synchronization | 192 |
6.3 | Performance-Tuning Methodology | 198 |
6.4 | Dynamic Threads | 201 |
6.5 | Bus-Based and NUMA Machines | 204 |
6.6 | Concluding Remarks | 207 |
6.7 | Exercises | 207 |
Appendix A | A Quick Reference to OpenMP | 211 |
References | | 217
Index | | 221