Table of Contents
Preface v
Chapter 1 Introduction 1
1.1 Definition of Parallel Computing 1
1.2 Evolution of Computers 4
1.3 An Enabling Technology 8
1.4 Cost Effectiveness 9
Chapter 2 Performance Metrics and Models 13
2.1 Parallel Activity Trace 13
2.2 Speedup 14
2.3 Parallel Efficiency 15
2.4 Load Imbalance 15
2.5 Granularity 16
2.6 Overhead 17
2.7 Scalability 18
2.8 Amdahl's Law 18
Chapter 3 Hardware Systems 19
3.1 Node Architectures 19
3.2 Network Interconnections 21
3.3 Instruction and Data Streams 28
3.4 Processor-Memory Connectivity 29
3.5 IO Subsystems 29
3.6 System Convergence 31
3.7 Design Considerations 31
Chapter 4 Software Systems 35
4.1 Node Software 35
4.2 Programming Models 37
4.3 Parallel Debuggers 43
4.4 Parallel Profilers 43
Chapter 5 Design of Algorithms 45
5.1 Algorithm Models 46
5.2 Examples of Collective Operations 54
5.3 Mapping Tasks to Processors 56
Chapter 6 Linear Algebra 65
6.1 Problem Decomposition 65
6.2 Matrix Operations 68
6.3 Solution of Linear Systems 81
Chapter 7 Differential Equations 89
7.1 Integration and Differentiation 89
7.2 Partial Differential Equations 92
Chapter 8 Fourier Transforms 105
8.1 Fourier Transforms 105
8.2 Discrete Fourier Transforms 106
8.3 Fast Fourier Transforms 107
8.4 Simple Parallelization 111
8.5 The Transpose Method 112
8.6 Complexity Analysis for FFT 113
Chapter 9 Optimization 115
9.1 Monte Carlo Methods 116
9.2 Parallelization 119
Chapter 10 Applications 123
10.1 Newton's Equation and Molecular Dynamics 124
10.2 Schrodinger's Equations and Quantum Mechanics 133
10.3 Partition Function, DFT and Material Science 134
10.4 Maxwell's Equations and Electrical Engineering 135
10.5 Diffusion Equation and Mechanical Engineering 135
10.6 Navier-Stokes Equation and CFD 136
10.7 Other Applications 136
Appendix A MPI 139
A.1 An MPI Primer 139
A.2 Examples of Using MPI 159
A.3 MPI Tools 161
A.4 Complete List of MPI Functions 167
Appendix B OpenMP 171
B.1 Introduction to OpenMP 171
B.2 Memory Model of OpenMP 172
B.3 OpenMP Directives 172
B.4 Synchronization 174
B.5 Runtime Library Routines 175
B.6 Examples of Using OpenMP 178
B.7 The Future 180
Appendix C Projects 181
Project C.1 Watts and Flops of Supercomputers 181
Project C.2 Review of Supercomputers 181
Project C.3 Top500 and BlueGene Supercomputers 181
Project C.4 Say Hello in Order 182
Project C.5 Broadcast on Torus 183
Project C.6 Competing with MPI on Broadcast, Scatter, etc 183
Project C.7 Simple Matrix Multiplication 183
Project C.8 Matrix Multiplication on 4D Torus 183
Project C.9 Matrix Multiplication and PAT 184
Project C.10 Matrix Inversion 184
Project C.11 Simple Analysis of an iBT Network 185
Project C.12 Compute Eigenvalues of Adjacency Matrices of Networks 185
Project C.13 Mapping Wave Equation to Torus 185
Project C.14 Load Balance in 3D Mesh 186
Project C.15 Wave Equation and PAT 186
Project C.16 Computing Coulomb's Forces 187
Project C.17 Timing Model for MD 187
Project C.18 Minimizing Lennard-Jones Potential 188
Project C.19 Install and Profile CP2K 188
Project C.20 Install and Profile CPMD 189
Project C.21 Install and Profile NAMD 190
Project C.22 FFT on Beowulf 190
Project C.23 FFT on BlueGene/Q 191
Project C.24 Word Analysis 191
Project C.25 Cost Estimate of a 0.1 Pflops System 191
Project C.26 Design of a Pflops System 191
Appendix D Program Examples 193
D.1 Matrix-Vector Multiplication 193
D.2 Long Range N-body Force 195
D.3 Integration 201
References 203
Index 205