ISBN-10:
0805327304
ISBN-13:
9780805327304
Pub. Date:
06/20/1995
Publisher:
Pearson
High Performance Compilers for Parallel Computing / Edition 1

High Performance Compilers for Parallel Computing / Edition 1

Paperback

View All Available Formats & Editions
Current price is , Original price is $131.4. You
Select a Purchase Option (New Edition)
  • purchase options
    $114.97 $131.40 Save 13% Current price is $114.97, Original price is $131.4. You Save 13%.
  • purchase options
    $98.18 $131.40 Save 25% Current price is $98.18, Original price is $131.4. You Save 25%.
    icon-error
    Note: Access code and/or supplemental material are not guaranteed to be included with textbook rental or used textbook.
  • purchase options

Product Details

ISBN-13: 9780805327304
Publisher: Pearson
Publication date: 06/20/1995
Edition description: New Edition
Pages: 500
Product dimensions: 7.00(w) x 9.10(h) x 1.00(d)

About the Author

As co-founder in 1979 of Kuck and Associates, Inc., Michael Wolfe helped develop KAP restructuring, parallelizing compiler software. In 1988, Wolfe joined the Oregon Graduate Institute of Science and Technology faculty, directing research on language and compiler issues for high performance computer systems. His current research includes development and implementation of program restructuring transformations to optimize programs for execution on parallel computers, refinement and application of recent results in analysis techniques to low level compiler optimizations, and analysis of data dependence decision algorithms.



0805327304AB04062001

Read an Excerpt

Techniques for constructing compilers for parallel computers have been in academic and commercial environments over the past 30 years. Some of these techniques are not quite mature and in common use, but there is still much active research in this area. Most of the reference material is scattered over many conference proceedings and journals. This book is intended to serve as coherent presentation of the important basic ideas used in modern compilers for parallel computer systems. It can be used as a reference or as a text for second or third course on compilers at the senior undergraduate or graduate level.

This book differs from previous collections in that its focus is not the automatic discovery of parallelism from sequential programs, though that is also included. Instead, its focus is techniques to generate optimized parallel code, given the source program and target architecture. The optimizer in high performance compilers is organized as a deep analysis phase followed by a code generation, or synthesis, phase. This book follows that organization.

The first chapter introduces the material and takes one example program through several stages of optimization for a variety of target machines. Each target architecture is presented at a high level, and is modeled after current commercial machines.

Chapter 2 discusses programming languages issues. Of particular interest are the parallel language extensions proposed for various languages, such as array assignments in Fortran 90, the forall statement in High Performance Fortran, and other parallel loop constructs.

Chapter 3 and 4 introduce basic analysis algorithms that are in common use in compilers. Chapter 3 focuses onalgorithms for graphs, which are used in compilers to represent control flow, interprocedural calls, and dependence. Chapter 4 discusses various aspects of linear algebra, which is becoming more important in compilers, including subjects such as solving linear and integer systems of equations and inequalities.

Chapter 5 through 8 cover aspects of the analysis phase of the optimizer. Chapter 5 presents the basic ideas behind data dependence relations, as used in compilers. In order to allow the most freedom in reordering and optimizing a program, compilers find dependence relations between program statements that cannot be violated without changing the meaning of the program.

Chapter 6 discussed various aspects of scalar analysis that are important for parallel computing, such as constant propagation and precise data dependence analysis for scalars. In particular, induction variable detection is important for array analysis.

Chapter 7 shows how to use the linear algebra techniques from Chapter 4 to find data dependence relations between array references. Because the linear algebra appears separately, this chapter is somewhat shorter than such a chapter might be in other books on the subject.

Chapter 8 discusses other problems related to dependence analysis, such as summarizing array accesses across procedural boundaries, solving data dependence analysis problems in the presence of pointers and I/O, and so forth.

Chapter 9 details the techniques used in the restructuring phase of creating the optimizer, focusing on loop structuring techniques. A catalogue of loop optimizations is presented, along with examples to show effects on performance.

Chapter 10 through 14 show how to tailor the code generation for various target architectures. Chapter 10 discusses a sequential target machine in which the compiler restructures the program to take advantage of a memory hierarchy (typically one or more levels of processor cache memory).

Chapter 11 presents methods to generate code and optimize for shared-memory parallel computers, which are now becoming common even at the workstation level. Automatic discovery of parallelism is also discussed.

Chapter 12 shows how to apply similar techniques for vector instruction sets, including automatic vectorization and supervector code generation.

Chapter 13 presents code generation methods for massively parallel message-passing computer systems, of both the SIMD and MIMD variety.

Chapter 14 shows techniques for massively parallel shared-memory systems. Different methods are used for the three varieties of machines in this category, depending on whether processor cache memories are kept consistent using global information or local information, or are absent altogether.

Each chapter includes a section titled "In the Pit," which includes hints and other anecdotal material that may be of some use when applying the information covered in the chapter. A "Further Reading" section contains citations to the original material in the reference list, and the exercises can be used as assignments or to test comprehension of the material.

Additional material that has proved useful for teaching is available via anonymous FTP form the machine bc.aw.com in the directory bc/wolfe/highperform. This material includes Postscript copies of the figures, programs implementing many of the algorithms, and the Tiny loop restructuring tool. A more complete bibliography, in Bibtex format, along with citations by chapter, can also be found. There is a README file containing useful information about the rest of the directory. Acknowledgments

This book grew out of a series of short courses that I offered over the past three years. Little material in this book is original or invented by the author; I owe a debt of gratitude to the many developers of the techniques used here. My introduction to this topic was working with the Parafrase group at the University of Illinois from 1976 to 1980. Many ideas now crucial in modern restructuring compilers were developed during the time. During my tenure at Kuck and Associates, Inc., I learned the important distinction between science and engineering , and the good engineering in a compiler is critically important. After joining the Oregon Graduate Institute, I have had more contacts with compiler researchers and developers around the globe; I have learned more during this period than ever before.

I especially thank the reviewers, who helped maintain consistency and made numerous important and helpful suggestions: Ron K. Cytron of Washington University, James Larus of the University of Wisconsin, Carl Offner of Digital Equipment Corp., J. (Ram) Ramanujam of Louisiana State University, David Stotts of the University of North Carolina, and Alan Sussman of the University of Maryland. Any errors contained herein are, of course, entirely my own fault. Several students at OGI reviewed selected chapters: Tito Autrey, Michael Gerlek, Priyadarshan Kolte, and Eric Stoltz. The editors and staff at the Benjamin/Cummings Publishing Co. were very encouraging, and to them I owe a great deal of gratitude.

Michael Wolfe

Table of Contents



1. High Performance Systems.

An Example Program: Matrix Multiplication.

Structure of a Compiler.



2. Programming Language Features.

Languages for High Performance.

Sequential and Parallel Loops.

Roundoff Error.



3. Basic Graph Concepts.

Sets, Tuples, Logic.

Graphs.

Control Dependence.



4. Review of Linear Algebra.

Real Vectors and Matrices.

Integer Matrices and Lattices.

Linear System of Equations.

System of Integer Equations.

Systems of Linear Inequalities.

Systems of Integer Linear Inequalities.

Extreme Values of Affine Functions.



5. Data Dependence.

Data Dependence in Loops.

Data Dependence in Conditionals.

Data Dependence in Parallel Loops.

Program Dependence Graph.



6. Scalar Analysis with Factored Use-Def Chains.

Constructing Factored Use-Def Chains.

FUD Chains for Arrays.

Finding All Reaching Definitions.

Implicit References in FUD Chains.

InductionVariables Using FUD Chains.

Constant Propagation with FUD Chains.

Data Dependence for Scalars.



7. Data Dependence Analysis for Arrays.

Building the Dependence System.

Dependence System Solvers.

General Solver.

Summary of Solvers.

Complications.

Run-time Dependence Testing.



8. Other Dependence Problems.

Array Region Analysis.

Pointer Analysis.

I/O Dependence.

Procedure Calls.

Interprocedural Analysis.



9. Loop Restructuring.

Simpile Transformations.

Loop Fusion.

Loop Fission.

Loop Reversal.

Loop Interchanging.

Loop Skewing.

Linear Loop Transformations.

Strip-Mining.

Loop Tiling.

Other Loop Transformations.

Interprocedural Transformations.



10. Optimizing for Locality.

Single Reference to Each Array.

Multiple References.

General Tiling.

Fission and Fusion for Locality.



11. Concurrency Analysis.

Code for Concurrent Loops.

Concurrency from Sequential Loops.

Concurrency from Parallel Loops.

Nested Loops.

Roundoff Error.

Exceptions and Debuggers.



12. Vector Analysis.

Vector Code.

Vector Code from Sequential Loops.

Vector Code from Forall Loops.

Nested Loops.

Roundoff Error, Exceptions, and Debuggers.

Multivector Computers.



13. Message-Passing Machines.

SIMD Machines.

MIMD Machines.

Data Layout.

Parallel Code for Array Assignment.

Remote Data Access.

Automatic Data Layout.

Multiple Array Assignments.

Other Topics.



14. Scalable Shared-Memory Machines.

Global Cache Coherence.

Local Cache Coherence.

Latency Tolerant Machines.



Glossary.


References.


Author Index.


Index. 0805327304T04062001

Preface

Techniques for constructing compilers for parallel computers have been in academic and commercial environments over the past 30 years. Some of these techniques are not quite mature and in common use, but there is still much active research in this area. Most of the reference material is scattered over many conference proceedings and journals. This book is intended to serve as coherent presentation of the important basic ideas used in modern compilers for parallel computer systems. It can be used as a reference or as a text for second or third course on compilers at the senior undergraduate or graduate level.

This book differs from previous collections in that its focus is not the automatic discovery of parallelism from sequential programs, though that is also included. Instead, its focus is techniques to generate optimized parallel code, given the source program and target architecture. The optimizer in high performance compilers is organized as a deep analysis phase followed by a code generation, or synthesis, phase. This book follows that organization.

The first chapter introduces the material and takes one example program through several stages of optimization for a variety of target machines. Each target architecture is presented at a high level, and is modeled after current commercial machines.

Chapter 2 discusses programming languages issues. Of particular interest are the parallel language extensions proposed for various languages, such as array assignments in Fortran 90, the forall statement in High Performance Fortran, and other parallel loop constructs.

Chapter 3 and 4 introduce basic analysis algorithms that are in common use in compilers. Chapter 3 focuses on algorithms for graphs, which are used in compilers to represent control flow, interprocedural calls, and dependence. Chapter 4 discusses various aspects of linear algebra, which is becoming more important in compilers, including subjects such as solving linear and integer systems of equations and inequalities.

Chapter 5 through 8 cover aspects of the analysis phase of the optimizer. Chapter 5 presents the basic ideas behind data dependence relations, as used in compilers. In order to allow the most freedom in reordering and optimizing a program, compilers find dependence relations between program statements that cannot be violated without changing the meaning of the program.

Chapter 6 discussed various aspects of scalar analysis that are important for parallel computing, such as constant propagation and precise data dependence analysis for scalars. In particular, induction variable detection is important for array analysis.

Chapter 7 shows how to use the linear algebra techniques from Chapter 4 to find data dependence relations between array references. Because the linear algebra appears separately, this chapter is somewhat shorter than such a chapter might be in other books on the subject.

Chapter 8 discusses other problems related to dependence analysis, such as summarizing array accesses across procedural boundaries, solving data dependence analysis problems in the presence of pointers and I/O, and so forth.

Chapter 9 details the techniques used in the restructuring phase of creating the optimizer, focusing on loop structuring techniques. A catalogue of loop optimizations is presented, along with examples to show effects on performance.

Chapter 10 through 14 show how to tailor the code generation for various target architectures. Chapter 10 discusses a sequential target machine in which the compiler restructures the program to take advantage of a memory hierarchy (typically one or more levels of processor cache memory).

Chapter 11 presents methods to generate code and optimize for shared-memory parallel computers, which are now becoming common even at the workstation level. Automatic discovery of parallelism is also discussed.

Chapter 12 shows how to apply similar techniques for vector instruction sets, including automatic vectorization and supervector code generation.

Chapter 13 presents code generation methods for massively parallel message-passing computer systems, of both the SIMD and MIMD variety.

Chapter 14 shows techniques for massively parallel shared-memory systems. Different methods are used for the three varieties of machines in this category, depending on whether processor cache memories are kept consistent using global information or local information, or are absent altogether.

Each chapter includes a section titled "In the Pit," which includes hints and other anecdotal material that may be of some use when applying the information covered in the chapter. A "Further Reading" section contains citations to the original material in the reference list, and the exercises can be used as assignments or to test comprehension of the material.

Additional material that has proved useful for teaching is available via anonymous FTP form the machine bc.aw.com in the directory bc/wolfe/highperform. This material includes Postscript copies of the figures, programs implementing many of the algorithms, and the Tiny loop restructuring tool. A more complete bibliography, in Bibtex format, along with citations by chapter, can also be found. There is a README file containing useful information about the rest of the directory.

Acknowledgments

This book grew out of a series of short courses that I offered over the past three years. Little material in this book is original or invented by the author; I owe a debt of gratitude to the many developers of the techniques used here. My introduction to this topic was working with the Parafrase group at the University of Illinois from 1976 to 1980. Many ideas now crucial in modern restructuring compilers were developed during the time. During my tenure at Kuck and Associates, Inc., I learned the important distinction between science and engineering , and the good engineering in a compiler is critically important. After joining the Oregon Graduate Institute, I have had more contacts with compiler researchers and developers around the globe; I have learned more during this period than ever before.

I especially thank the reviewers, who helped maintain consistency and made numerous important and helpful suggestions: Ron K. Cytron of Washington University, James Larus of the University of Wisconsin, Carl Offner of Digital Equipment Corp., J. (Ram) Ramanujam of Louisiana State University, David Stotts of the University of North Carolina, and Alan Sussman of the University of Maryland. Any errors contained herein are, of course, entirely my own fault. Several students at OGI reviewed selected chapters: Tito Autrey, Michael Gerlek, Priyadarshan Kolte, and Eric Stoltz. The editors and staff at the Benjamin/Cummings Publishing Co. were very encouraging, and to them I owe a great deal of gratitude.

Michael Wolfe



0805327304P04062001

Customer Reviews

Most Helpful Customer Reviews

See All Customer Reviews