The second half of the 1970s was marked by impressive advances in array/vector architectures and in vectorization techniques and compilers. This progress continued, with a particular focus on vector machines, until the middle of the 1980s. The majority of supercomputers during this period were register-to-register (Cray 1) or memory-to-memory (CDC Cyber 205) vector (pipelined) machines. However, the increasing demand for higher computational rates led naturally to parallel computers and software. By replicating autonomous processors in a coordinated system, one can overcome performance barriers imposed by technology limitations. In principle, parallelism offers unlimited performance potential; in practice, that potential is very difficult to realize. So far, we have seen only the tip of the iceberg called "parallel machines and parallel programming". Parallel programming in particular is a rapidly evolving art and, at present, highly empirical.

In this book we discuss several aspects of parallel programming and parallelizing compilers. Rather than developing parallel programming methodologies and paradigms, we often focus on more advanced topics, assuming that the reader has an adequate background in parallel processing. The book is organized in three main parts. The first part (Chapters 1 and 2) sets the stage and focuses on program transformations and parallelizing compilers. The second part (Chapters 3 and 4) discusses scheduling for parallel machines from a practical point of view (macro- and microtasking and supporting environments). Finally, the last part (Chapters 5 through 8) covers run-time overhead, static program partitioning, static task scheduling, and speedup bounds for parallel programs.
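To make concrete the kind of program parallelism and compiler restructuring the book deals with, here is a minimal illustrative sketch (not taken from the book; the array names, sizes, and C notation are assumptions made for illustration). It contrasts a loop whose iterations are independent, and can therefore be vectorized or spread across processors, with a loop whose iterations are linked by a recurrence and cannot run in parallel without restructuring.

```c
/* Minimal sketch (illustrative only, not from the book): the two kinds of
 * loops a vectorizing/parallelizing compiler must distinguish. */
#include <stdio.h>

#define N 8   /* hypothetical problem size */

int main(void) {
    double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {       /* initialize the operands */
        a[i] = (double)i;
        b[i] = 2.0 * i;
    }

    /* Independent iterations: no value produced by one iteration is read
     * by another, so the loop can be vectorized or executed as a parallel
     * (DOALL-style) loop. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* Loop-carried dependence: iteration i reads a[i-1], which iteration
     * i-1 writes, so the iterations form a recurrence and cannot simply
     * run concurrently; the compiler must restructure or serialize it. */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    printf("c[%d] = %g, a[%d] = %g\n", N - 1, c[N - 1], N - 1, a[N - 1]);
    return 0;
}
```

Loosely speaking, the restructuring transformations discussed in Chapter 2 aim to convert loops of the second kind into loops of the first, or at least to expose whatever partial parallelism they contain.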
Series: The Springer International Series in Engineering and Computer Science, #59
Edition description: Softcover reprint of the original 1st ed. 1988
Product dimensions: 5.98(w) x 9.02(h) x 0.02(d)
Table of Contents

1 Parallel Architectures and Compilers
  1.1 Introduction
  1.2 Book Overview
  1.3 Vector and Parallel Machines
  1.4 Parallelism in Programs
    1.4.1 Sources and Types of Parallelism
    1.4.2 Expressing Parallelism in Programs
    1.4.3 Utilization of Parallelism
    1.4.4 The Role of the Compiler
  1.5 Basic Concepts and Definitions
2 Program Restructuring for Parallel Execution
  2.1 Data Dependences
  2.2 Common Optimizations
    2.2.1 Induction Variable Substitution
    2.2.2 Index Recurrences
    2.2.3 Loop Unrolling
    2.2.4 Constant Propagation and Expression Evaluation
  2.3 Transformations for Vector/Parallel Loops
    2.3.1 Loop Vectorization and Loop Distribution
    2.3.2 Loop Interchange
    2.3.3 Node Splitting and Statement Reordering
    2.3.4 Code Reordering for Minimizing Communication
    2.3.5 Loop Blocking
  2.4 Cycle Shrinking
    2.4.1 Simple Loops
    2.4.2 Complex Loops
    2.4.3 Computing Dependence Distances
    2.4.4 Cycle Shrinking vs. Partition
    2.4.5 The Cost of Barriers in Cycle Shrinking
  2.5 Loop Spreading
    2.5.1 The Transformation for Chains of Loops
    2.5.2 Generalizations
  2.6 Loop Coalescing
    2.6.1 Processor Assignment and Subscript Calculation
    2.6.2 Hybrid Loops
    2.6.3 Non-Perfectly Nested Loops, One Way Nesting
    2.6.4 Multiway Nested Loops
  2.7 Run-Time Dependence Testing
    2.7.1 The RDC Transformation for Nonsubscripted Subscripts
  2.8 Subscript Blocking
    2.8.1 The Problem of Subscripted Subscripts and its Application
    2.8.2 The Transformation
    2.8.3 Recurrences with Subscripted Subscripts
    2.8.4 Multiply Nested Loops
    2.8.5 Expected Speedup
  2.9 Future Directions
3 A Comprehensive Environment for Automatic Packaging and Scheduling of Parallelism
  3.1 Introduction
    3.1.1 Previous Approaches & Current Parallel Machines
    3.1.2 Modes of Operation and Importance of Scheduling
    3.1.3 Sophisticated vs Simple & Static vs Dynamic
    3.1.4 Scheduling Goals: Overhead & Load Balancing
    3.1.5 Granularity and Partitioning
    3.1.6 Synchronization and Precedence Relations
  3.2 A Comprehensive Approach to Scheduling
    3.2.1 Partitioning
    3.2.2 Pre-Scheduling
    3.2.3 Run-Time Task Scheduling
    3.2.4 Loop Scheduling
    3.2.5 Beyond Loops: Scalar, Vector, and VLIW Processors
  3.3 Auto-Scheduling Compilers
    3.3.1 Possible Advantages & Disadvantages
    3.3.2 A General Framework
    3.3.3 Enforcing Task Execution Order
    3.3.4 Optimizing the Drive Code
4 Static and Dynamic Loop Scheduling
  4.1 Introduction
    4.1.1 Parallel Loops
    4.1.2 Self-Scheduling Through Implicit Coalescing
  4.2 The Guided Self-Scheduling (GSS(k)) Algorithm
    4.2.1 Implicit Loop Coalescing and Interchange
    4.2.2 The Scheduling Algorithm
    4.2.3 Further Reduction of Synchronization Operations
  4.3 Simulation Results
    4.3.1 The Simulator
    4.3.2 Experiments
  4.4 Static Loop Scheduling
    4.4.1 Definitions and Basic Concepts
    4.4.2 Optimal Processor Assignment to Parallel Loops
    4.4.3 Experiments
5 Run-Time Overhead
  5.1 Introduction
  5.2 Bounds for Dynamic Loop Scheduling
  5.3 Overhead of Parallel Tasks
  5.4 Two Run-Time Overhead Models
    5.4.1 Run-Time Overhead is O(p)
    5.4.2 Run-Time Overhead is O(log p)
    5.4.3 Measurements
  5.5 Deciding the Minimum Unit of Allocation
6 Static Program Partitioning
  6.1 Introduction
    6.1.1 Communication and Parallelism Trade-offs
    6.1.2 More on Communication and Partitioning
  6.2 Methods for Program Partitioning
    6.2.1 A Model for Quantifying Communication
  6.3 Optimal Task Composition for Chains
  6.4 Details of Interprocessor Communication
7 Static Task Scheduling
  7.1 Introduction
  7.2 Optimal Allocations for High Level Spreading
  7.3 Scheduling Independent Serial Tasks
  7.4 High Level Spreading for Complete Task Graphs
    7.4.1 Processor Allocation for p-Wide Task Graphs
    7.4.2 List Scheduling Heuristics
    7.4.3 The Weighted Priority Heuristic Algorithm
  7.5 Bounds for Static Scheduling
8 Speedup Bounds for Parallel Programs
  8.1 Introduction
  8.2 General Bounds on Speedup
  8.3 Speedup Measures for Task Graphs
  8.4 Speedup Measures for Doacross Loops
  8.5 Multiprocessors vs. Vector/Array Machines
References