Introduction to High Performance Computing for Scientists and Engineers

by Georg Hager and Gerhard Wellein

Paperback (Older Edition)

$105.00 

Overview

Written by high performance computing (HPC) experts, Introduction to High Performance Computing for Scientists and Engineers provides a solid introduction to current mainstream computer architecture, dominant parallel programming models, and useful optimization strategies for scientific HPC. From working in a scientific computing center, the authors gained a unique perspective on the requirements and attitudes of users as well as manufacturers of parallel computers.

The text first introduces the architecture of modern cache-based microprocessors and discusses their inherent performance limitations, before describing general optimization strategies for serial code on cache-based architectures. It next covers shared- and distributed-memory parallel computer architectures and the most relevant network topologies. After discussing parallel computing on a theoretical level, the authors show how to avoid or ameliorate typical performance problems connected with OpenMP. They then present cache-coherent nonuniform memory access (ccNUMA) optimization techniques, examine distributed-memory parallel programming with the Message Passing Interface (MPI), and explain how to write efficient MPI code. The final chapter focuses on hybrid programming with MPI and OpenMP.
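
To give a flavor of the programming models covered, the following minimal hybrid MPI+OpenMP sketch (illustrative only, not taken from the book) lets each MPI rank compute a partial sum with an OpenMP worksharing loop and then combines the partial results with an MPI reduction:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;

        /* Request thread support, since OpenMP threads run inside each rank. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local_sum = 0.0;

        /* Shared-memory parallelism within a rank: OpenMP worksharing loop
         * over this rank's round-robin share of the iterations. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = rank; i < 10000000; i += size)
            local_sum += 1.0 / (1.0 + (double)i);

        double global_sum = 0.0;

        /* Distributed-memory parallelism across ranks: MPI reduction. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Result computed by %d ranks: %.6f\n", size, global_sum);

        MPI_Finalize();
        return 0;
    }

Compiled with an OpenMP-enabled MPI compiler wrapper (e.g., mpicc -fopenmp) and launched with mpirun, the same pattern scales from a single multicore node to a large cluster.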

Users of high performance computers often have no idea what factors limit time to solution and whether it makes sense to think about optimization at all. This book facilitates an intuitive understanding of performance limitations without relying on heavy computer science knowledge. It also prepares readers for studying more advanced literature.
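
As a taste of that kind of reasoning, the simple "lightspeed" estimate below bounds the performance of a STREAM-like vector triad by memory bandwidth alone (the 50 GB/s figure is an assumption for illustration, not a number from the book):

    #include <stddef.h>

    /* Vector triad: 2 flops per iteration and 4 doubles (32 bytes) of memory
     * traffic per iteration (3 loads + 1 store, neglecting write-allocate).
     * With an assumed memory bandwidth of 50 GB/s, the loop cannot exceed
     * roughly (50e9 / 32) * 2 = 3.1 GFlop/s, no matter how high the
     * processor's peak floating-point rate is. */
    void triad(double *a, const double *b, const double *c,
               const double *d, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + c[i] * d[i];
    }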

Read about the authors’ recent honor: Informatics Europe Curriculum Best Practices Award for Parallelism and Concurrency


Product Details

ISBN-13: 9781439811924
Publisher: Taylor & Francis
Publication date: 07/02/2010
Series: Chapman & Hall/CRC Computational Science, #7
Edition description: Older Edition
Pages: 356
Product dimensions: 6.00(w) x 9.20(h) x 0.80(d) inches

About the Author

Georg Hager is a senior research scientist in the high performance computing group of the Erlangen Regional Computing Center at the University of Erlangen-Nuremberg in Germany. Gerhard Wellein leads the high performance computing group of the Erlangen Regional Computing Center and is a professor in the Department of Computer Science at the University of Erlangen-Nuremberg in Germany.

Table of Contents

Foreword xiii

Preface xv

About the authors xxi

List of acronyms and abbreviations xxiii

1 Modern processors 1

1.1 Stored-program computer architecture 1

1.2 General-purpose cache-based microprocessor architecture 2

1.2.1 Performance metrics and benchmarks 3

1.2.2 Transistors galore: Moore's Law 7

1.2.3 Pipelining 9

1.2.4 Superscalarity 13

1.2.5 SIMD 14

1.3 Memory hierarchies 15

1.3.1 Cache 15

1.3.2 Cache mapping 18

1.3.3 Prefetch 20

1.4 Multicore processors 23

1.5 Multithreaded processors 26

1.6 Vector processors 28

1.6.1 Design principles 29

1.6.2 Maximum performance estimates 31

1.6.3 Programming for vector architectures 32

2 Basic optimization techniques for serial code 37

2.1 Scalar profiling 37

2.1.1 Function- and line-based runtime profiling 38

2.1.2 Hardware performance counters 41

2.1.3 Manual instrumentation 45

2.2 Common sense optimizations 45

2.2.1 Do less work! 45

2.2.2 Avoid expensive operations! 46

2.2.3 Shrink the working set! 47

2.3 Simple measures, large impact 47

2.3.1 Elimination of common subexpressions 47

2.3.2 Avoiding branches 48

2.3.3 Using SIMD instruction sets 49

2.4 The role of compilers 51

2.4.1 General optimization options 52

2.4.2 Inlining 52

2.4.3 Aliasing 53

2.4.4 Computational accuracy 54

2.4.5 Register optimizations 55

2.4.6 Using compiler logs 55

2.5 C++ optimizations 56

2.5.1 Temporaries 56

2.5.2 Dynamic memory management 59

2.5.3 Loop kernels and iterators 60

3 Data access optimization 63

3.1 Balance analysis and lightspeed estimates 63

3.1.1 Bandwidth-based performance modeling 63

3.1.2 The STREAM benchmarks 67

3.2 Storage order 69

3.3 Case study: The Jacobi algorithm 71

3.4 Case study: Dense matrix transpose 74

3.5 Algorithm classification and access optimizations 79

3.5.1 O(N)/O(N) 79

3.5.2 O(N²)/O(N²) 79

3.5.3 O(N³)/O(N²) 84

3.6 Case study: Sparse matrix-vector multiply 86

3.6.1 Sparse matrix storage schemes 86

3.6.2 Optimizing JDS sparse MVM 89

4 Parallel computers 95

4.1 Taxonomy of parallel computing paradigms 96

4.2 Shared-memory computers 97

4.2.1 Cache coherence 97

4.2.2 UMA 99

4.2.3 ccNUMA 100

4.3 Distributed-memory computers 102

4.4 Hierarchical (hybrid) systems 103

4.5 Networks 104

4.5.1 Basic performance characteristics of networks 104

4.5.2 Buses 109

4.5.3 Switched and fat-tree networks 110

4.5.4 Mesh networks 112

4.5.5 Hybrids 113

5 Basics of parallelization 115

5.1 Why parallelize? 115

5.2 Parallelism 116

5.2.1 Data parallelism 116

5.2.2 Functional parallelism 119

5.3 Parallel scalability 120

5.3.1 Factors that limit parallel execution 120

5.3.2 Scalability metrics 122

5.3.3 Simple scalability laws 123

5.3.4 Parallel efficiency 125

5.3.5 Serial performance versus strong scalability 126

5.3.6 Refined performance models 128

5.3.7 Choosing the right scaling baseline 130

5.3.8 Case study: Can slower processors compute faster? 131

5.3.9 Load imbalance 137

6 Shared-memory parallel programming with OpenMP 143

6.1 Short introduction to OpenMP 143

6.1.1 Parallel execution 144

6.1.2 Data scoping 146

6.1.3 OpenMP worksharing for loops 147

6.1.4 Synchronization 149

6.1.5 Reductions 150

6.1.6 Loop scheduling 151

6.1.7 Tasking 153

6.1.8 Miscellaneous 154

6.2 Case study: OpenMP-parallel Jacobi algorithm 156

6.3 Advanced OpenMP: Wavefront parallelization 158

7 Efficient OpenMP programming 165

7.1 Profiling OpenMP programs 165

7.2 Performance pitfalls 166

7.2.1 Ameliorating the impact of OpenMP worksharing constructs 168

7.2.2 Determining OpenMP overhead for short loops 175

7.2.3 Serialization 177

7.2.4 False sharing 179

7.3 Case study: Parallel sparse matrix-vector multiply 181

8 Locality optimizations on ccNUMA architectures 185

8.1 Locality of access on ccNUMA 185

8.1.1 Page placement by first touch 186

8.1.2 Access locality by other means 190

8.2 Case study: ccNUMA optimization of sparse MVM 190

8.3 Placement pitfalls 192

8.3.1 NUMA-unfriendly OpenMP scheduling 192

8.3.2 File system cache 194

8.4 ccNUMA issues with C++ 197

8.4.1 Arrays of objects 197

8.4.2 Standard Template Library 199

9 Distributed-memory parallel programming with MPI 203

9.1 Message passing 203

9.2 A short introduction to MPI 205

9.2.1 A simple example 205

9.2.2 Messages and point-to-point communication 207

9.2.3 Collective communication 213

9.2.4 Nonblocking point-to-point communication 216

9.2.5 Virtual topologies 220

9.3 Example: MPI parallelization of a Jacobi solver 224

9.3.1 MPI implementation 224

9.3.2 Performance properties 230

10 Efficient MPI programming 235

10.1 MPI performance tools 235

10.2 Communication parameters 239

10.3 Synchronization, serialization, contention 240

10.3.1 Implicit serialization and synchronization 240

10.3.2 Contention 243

10.4 Reducing communication overhead 244

10.4.1 Optimal domain decomposition 244

10.4.2 Aggregating messages 248

10.4.3 Nonblocking vs. asynchronous communication 250

10.4.4 Collective communication 253

10.5 Understanding intranode point-to-point communication 253

11 Hybrid parallelization with MPI and OpenMP 263

11.1 Basic MPI/OpenMP programming models 264

11.1.1 Vector mode implementation 264

11.1.2 Task mode implementation 265

11.1.3 Case study: Hybrid Jacobi solver 267

11.2 MPI taxonomy of thread interoperability 268

11.3 Hybrid decomposition and mapping 270

11.4 Potential benefits and drawbacks of hybrid programming 273

A Topology and affinity in multicore environments 277

A.1 Topology 279

A.2 Thread and process placement 280

A.2.1 External affinity control 280

A.2.2 Affinity under program control 283

A.3 Page placement beyond first touch 284

B Solutions to the problems 287

Bibliography 309

Index 323

What People are Saying About This

From the Publisher

Georg Hager and Gerhard Wellein have developed a very approachable introduction to high performance computing for scientists and engineers. Their style and description is easy to read and follow. … This book presents a balanced treatment of the theory, technology, architecture, and software for modern high performance computers and the use of high performance computing systems. The focus on scientific and engineering problems makes this both educational and unique. I highly recommend this timely book for scientists and engineers. I believe this book will benefit many readers and provide a fine reference.
—From the Foreword by Jack Dongarra, University of Tennessee, Knoxville, USA
