This book contains papers and working group summaries from discussions on software tools for parallel computer systems that explore the current situation, outline research issues, and propose technology transition remedies. Developers of both debugging and performance analysis tools, application developers, and vendors discuss the technical and sociological problems facing the field. The goal of this book is to maximize the return from shared development so that the reader can learn from others' needs and frustrations in building and using tools on parallel systems. It covers three major research themes: tools for task and data parallel languages, techniques for real-time adaptive system control, and optimization of heterogeneous metacomputing applications.
Daniel A. Reed University of Illinois Urbana, Illinois 61801
Jeffrey S. Brown, Ann H. Hayes, and Margaret L. Simmons Los Alamos National Laboratory Los Alamos, New Mexico 87545
"Here they are, boys; get your tools ready." ... As they ran, they pulled weapons from under their coats: hatchets, knuckle-dusters, hammers, and bars of iron. F. D. Sharpe (1938)
For several years, it has been economically and technically feasible to build parallel systems that scale from tens to hundreds of processors. Recognition of this feasibility has fueled the national High-Performance Computing and Communications (HPCC) program and the fierce competition among new and old high-performance computing companies for a share of the massively parallel market.
Though vendor competition has led to rapid architectural innovation and higher peak hardware performance, it has stretched academic, laboratory, and vendor software tool groups to the limit, forcing them to continually create tools for changing programming models and new hardware environments. By necessity, tools embody knowledge of the execution environment, identifying performance bottlenecks or logical program errors in terms of application code constructs and their interaction with the execution environment.
Because the root causes of poor performance or unexpected program behavior may lie with run-time libraries, compilers, the operating system, or the hardware, tools must gather and correlate information from many sources. This correlation not only requires interfaces for information access; knowing which information to gather is most often learned only from experience. Experience comes with time, as tool developers come to understand the common programming idioms, the interactions of application code with the underlying hardware and software, and the user interfaces best suited to relating these interactions in intuitive ways. Simply put, developing good tools takes time, experience, and substantial effort; small profit margins and short product life cycles, both leading to small installed product bases, have made it difficult for tool developers to create and support effective tools.
The goal of this workshop (and of its predecessors) was to bring together vendor tool developers, academic and government laboratory tool researchers, and application scientists to discuss the current situation and outline research issues and technology transition remedies. New architectures pose important unresolved research problems for tool developers. Moreover, transferring previous research results to products clearly requires new mechanisms if tool developers are ever to catch the speeding train of architectural change.
In this light, the remainder of this introductory chapter is organized as follows. In Sec. 1.2, we describe the motivations for the latest workshop and the broadening of its scope to include both performance and debugging tools. Based on this context, in Sec. 1.3, we describe the research issues raised by the participants and the implications of these issues for tool research. Because tools lie at the nexus between system software and applications, tool developers must exploit the features of system software to support a user community; this necessitates investment of time and effort in many activities not normally associated with research and, as described in Sec. 1.4, poses a host of difficult problems. Finally, Sec. 1.5 concludes with a summary of recommendations for research and technology transition.
1.2 Workshop Motivations and Experiences
This book contains the papers and working group summaries from discussions during the fifth in a series of workshops on software tools for parallel computer systems. For the first time in the workshop series, developers of both debugging and performance analysis tools met together with application developers and vendors to discuss the technical and sociological problems facing the field. The goal of this combined workshop was the integration of both performance analysis and debugging tools, with the intent of maximizing the return from shared development.
As with the previous workshops in the series, sessions consisted of technical presentations, panels, and working group discussions. Each session of technical presentations included three different perspectives: academic software tool developer, application developer, and computer vendor. Our goal was a dialogue involving all three communities so that all might learn the others' needs and frustrations in building and using tools on parallel systems.
One lesson drawn from the workshop was that tool users and developers speak different languages, often failing to understand the needs or problems faced by the other group. As we will discuss in Sec. 1.4, the attendees concluded, as they did at the 1993 Performance Workshop, that a project combining vendors, academics, and users must be undertaken to develop techniques for testing and evaluating tools and encouraging the support and commercialization of effective tools.
A second lesson is that there are new and exciting problems to be solved. The World Wide Web (WWW), distributed metacomputing, and new programming models all pose unsolved problems for debugging and performance analysis tools. A subset of these issues is summarized in Sec. 1.3 below, and papers by the workshop participants elaborate on these issues in subsequent chapters.
1.3 Research Issues
As with all workshops, the attendees discussed a wide variety of topics, both during the formal sessions and during the extended working groups. Despite the diversity, three major research themes emerged:
tools for task and data parallel languages,
techniques for real-time adaptive system control, and
optimization of heterogeneous metacomputing applications.
All three issues are united by the need for greater access to system internals and data and by the need for standard interfaces, both internally and for users.
Historically, performance and debugging tools have been developed independently of system software and compilers. This separation reflects both the integration of software from disparate sources (i.e., third party compilers and operating system kernels) and limited support provided by software systems for tool development. For example, operating systems typically provide debugger developers little more than a ptrace system call for controlling process execution and performance tool developers only a mechanism for accessing a coarse-resolution system clock. Similarly, compilers provide simple symbol table information in object files. For single processor systems, these features are sufficient to build breakpoint debuggers and profilers, though they are far from optimal. For parallel systems and workstation clusters, they are woefully inadequate.
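The minimal interfaces described above are nevertheless enough to build a simple single-process profiler. As an illustrative sketch (in Python rather than the C of period tools, with all names invented here), a deterministic profiler needs only a call/return hook and access to a system clock:

```python
import sys
import time
from collections import defaultdict

def make_profiler():
    """Accumulate inclusive time per function using only a clock and a
    call/return hook -- the kind of minimal interface the text describes."""
    totals = defaultdict(float)
    stack = []

    def hook(frame, event, arg):
        now = time.perf_counter()
        if event == "call":
            stack.append((frame.f_code.co_name, now))
        elif event == "return" and stack:
            name, start = stack.pop()
            totals[name] += now - start

    return hook, totals

def busy():
    # Stand-in workload whose time the profiler should attribute.
    return sum(i * i for i in range(50_000))

hook, totals = make_profiler()
sys.setprofile(hook)
busy()
sys.setprofile(None)
print(totals["busy"] > 0.0)
```

Even this sketch shows the limitation the text raises: with only a clock and a hook, the tool sees function names, not the run-time library, operating system, or hardware causes behind the measured time.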
At present, few compilers provide access to program transformation data, though debugging and tuning the performance of programs written in task and data parallel languages (e.g., High-Performance Fortran (HPF)) requires both compile-time and run-time data. Relating the dynamic behavior of compiler-synthesized code to the user's source code requires knowledge of the program transformations applied by the compiler and the code generation model. For example, if an HPF compiler generates message passing code for a distributed memory system, understanding how messages relate to array locality is a key to improving performance. Moving beyond tools for explicit parallelism (e.g., via message passing) will require far tighter integration of tools with compilers.
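A sketch of the kind of compiler cooperation this implies (the file names, line numbers, and map format are purely hypothetical, not any real compiler's output): a compiler-emitted transformation map lets a tool attribute run-time events in generated message-passing code back to the source construct they implement:

```python
# Hypothetical compiler-emitted transformation map: each entry ties a
# line of the generated message-passing code back to the HPF source
# construct it implements. All names and numbers are illustrative.
xform_map = {
    ("gen.f", 118): ("src.hpf", 12, "A(:, :) = B(:, :) + C(:, :)"),
    ("gen.f", 131): ("src.hpf", 12, "boundary exchange for A"),
}

def attribute(event_file, event_line):
    """Map a run-time event in generated code to its source construct,
    falling back to the raw location when no mapping exists."""
    return xform_map.get((event_file, event_line),
                         (event_file, event_line, "<unmapped>"))

src_file, src_line, construct = attribute("gen.f", 131)
print(src_file, src_line, construct)
```

Without such a map, the tool can only report the second, unmapped form, which is precisely the shortfall discussed below for data parallel languages.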
Likewise, few operating systems or run-time libraries provide mechanisms for selecting resource management policies or for configuring those policies based on knowledge of application resource demands or dynamic performance data. However, many experiments have shown that tuning policies to application behavior is key to improving performance for irregular applications with complex behavioral dynamics. For example, Schwan et al. have shown that allowing users to steer application load distribution and to automatically adjust thread locking policies based on expected synchronization delay can substantially improve performance for shared memory codes.
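The lock-policy adaptation attributed to Schwan et al. can be sketched as follows; the window size, threshold, and policy names are invented for illustration and are not taken from their system:

```python
from collections import deque

class AdaptiveLockPolicy:
    """Pick 'spin' vs. 'block' from a sliding window of observed
    synchronization delays (microseconds). Threshold is illustrative:
    short expected waits favor spinning, long ones favor blocking."""
    def __init__(self, threshold_us=50.0, window=8):
        self.waits = deque(maxlen=window)
        self.threshold_us = threshold_us

    def record_wait(self, wait_us):
        self.waits.append(wait_us)

    def policy(self):
        if not self.waits:
            return "spin"  # optimistic default before any observations
        avg = sum(self.waits) / len(self.waits)
        return "spin" if avg < self.threshold_us else "block"

p = AdaptiveLockPolicy()
for w in [5, 8, 6]:
    p.record_wait(w)
print(p.policy())  # short waits -> spin
for w in [400, 500, 600, 700, 800, 900, 1000, 1100]:
    p.record_wait(w)
print(p.policy())  # long waits fill the window -> block
```

The point of the sketch is the feedback path: the run-time system consumes its own performance data to reconfigure a policy, rather than leaving that data for a human to inspect after the run.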
The explosion of WWW use, together with rapidly expanding interests in distributed data mining and heterogeneous parallel computing, poses a different, though equally thorny, set of tool research problems. New types of debugging and performance tuning tools will be needed to create metacomputing applications that can exploit distributed computation resources (e.g., by distributing a computation across multiple, geographically dispersed parallel systems) and that can mine large data archives in response to complex queries. Measuring network latencies and bandwidths and adapting to changes in network loads will necessitate integration of "standard" tools for parallel systems with distributed network management mechanisms (e.g., the Simple Network Management Protocol (SNMP)).
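The measurement side of this can be sketched minimally as follows, using a local socket pair as a stand-in for a wide-area link (no SNMP access is assumed; a real metacomputing tool would probe the actual network):

```python
import socket
import time

def measure_roundtrip(payload=b"x" * 1024, trips=100):
    """Measure mean round-trip time for a payload over a local socket
    pair. An adaptive scheduler could use such probes to track changing
    network latency and load."""
    a, b = socket.socketpair()
    start = time.perf_counter()
    for _ in range(trips):
        a.sendall(payload)
        received = b""
        while len(received) < len(payload):
            received += b.recv(4096)
        b.sendall(received)          # echo it back
        echoed = b""
        while len(echoed) < len(payload):
            echoed += a.recv(4096)
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return elapsed / trips  # seconds per round trip

rtt = measure_roundtrip()
print(rtt > 0.0)
```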
Support for new programming models, tuning of resource management policies, and management of distributed metacomputations requires interfaces and access to data not readily available via present methods. Below, we briefly describe the data requirements for each of these three domains, with pointers to extended discussions by the workshop participants.
1.3.1 Task and Data Parallel Languages
High-level, data-parallel languages such as HPF have attracted attention because they offer a simple and portable programming model for regular scientific computations. By allowing the programmer to construct a parallel application at a higher semantic level, without recourse to low-level message passing code, HPF is an effective specification language for regular, data parallel algorithms. Similarly, task parallel languages like Fortran M allow application developers to decompose less regular computations into a group of cooperating tasks.
Even after substantial intellectual effort has been invested in developing a task or data parallel program, its execution may yield only a small fraction of peak system performance. And, even if the data parallel code is portable across multiple parallel architectures, it is highly unlikely that it will achieve high performance on all architectures. Even on a single parallel architecture, observed application performance may vary substantially as a function of input parameters. Thus, achieving high performance on a parallel system requires a cycle of experimentation and refinement in which one first identifies the key program components responsible for the bulk of the program's execution time and then modifies the program in the hope of improving its performance.
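The identify-then-modify cycle can be illustrated with an off-the-shelf profiler; the application and its hot spot below are stand-ins for real program components:

```python
import cProfile
import io
import pstats

def hotspot(n):
    # Stand-in for the component responsible for most execution time.
    return sum(i * i for i in range(n))

def application():
    hotspot(200_000)
    return sum(range(100))  # cheap work, should not dominate the profile

prof = cProfile.Profile()
prof.enable()
application()
prof.disable()

# Rank components by cumulative time to find where tuning effort pays off.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("hotspot" in report)
```

Having identified the dominant component, the developer modifies it and re-runs the measurement, repeating until performance is acceptable.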
For this cycle of debugging and performance tuning to be effective and unobtrusive to practicing scientists, performance data must not only be accurate; both the data and the observed program dynamics must be directly related to the source program. Failure to provide accurate data (or to relate it to the corresponding source code) makes the task of performance improvement and debugging both laborious and error-prone.
Data and task parallel compilers greatly heighten the distance between source language constructs and executable code, making it impossible to map dynamic program behavior to specific source code fragments without knowledge of the program transformations applied by the compiler and the run-time task management strategy. This problem is analogous to developing debuggers for use with code generated by optimizing compilers, though much more difficult.
To understand the causes for performance variability in data parallel codes, one needs high-level performance analysis tools and techniques. Unfortunately, most current performance tools are targeted at the collection and presentation of program performance data when the parallelism and interprocessor communication are explicit and the program execution model closely mimics that in the source code (i.e., as is the case for message-passing codes).
For data parallel languages like HPF, such tools can only capture and present dynamic performance data in terms of primitive operations (e.g., communication library calls) in the compiler-generated code; clearly, this falls far short of the ideal. At a minimum, to support source-level performance analysis of programs in data parallel languages, compilers and performance tools must cooperate to integrate information about the program's dynamic behavior with compiler knowledge of the mapping from the low-level, explicitly parallel code to the high-level source.
More generally, a new compact between compiler writers and debugger and performance tool developers is needed. Under this compact, the compiler and the tool are co-equals, each providing functions of use to the other: performance data can guide compile-time program optimization, and compilers can provide information needed to estimate program performance scalability. In short, this compact is the basis for developing a set of tools that can break the modify/compile/execute cycle inherent in current optimization and debugging models.
Three presentations at the workshop addressed techniques for supporting performance tuning of task and data parallel languages. Pase and Williams describe techniques for performance analysis of data parallel and message-passing codes on the CRAY T3D, and Nystrom et al. describe their experiences with performance tools on the CRAY T3D. Malony and Mohr describe tools for use with object-oriented languages that leverage the capabilities provided by the Sage compiler toolkit. Finally, Adve et al. describe techniques for supporting performance analysis of Fortran D and HPF programs by exploiting knowledge of compile-time program transformations.
1.3.2 Real-time Adaptive Control
It is increasingly clear that a large and important class of national challenge applications is irregular, with complex, data-dependent execution behavior, and dynamic, with time-varying resource demands. Because the interactions between application and system software change across applications and during a single application's execution, run-time libraries and resource management policies should, ideally, adapt automatically and unobtrusively to rapidly changing application behavior. For example, recent studies of application input/output behavior have shown that tuning file system policies to exploit knowledge of application access patterns can increase performance by more than an order of magnitude.
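A sketch of policy tuning driven by observed access patterns (the classification threshold and policy names are illustrative, not drawn from the cited studies):

```python
def classify_accesses(offsets, block=4096):
    """Classify a trace of file offsets as sequential or random and
    suggest a (purely illustrative) file system policy for it."""
    if len(offsets) < 2:
        return "sequential", "prefetch-next"
    steps = [b - a for a, b in zip(offsets, offsets[1:])]
    # Count forward steps of at most one block as sequential motion.
    sequential = sum(1 for s in steps if 0 <= s <= block)
    if sequential / len(steps) > 0.8:
        return "sequential", "prefetch-next"
    return "random", "demand-paging"

print(classify_accesses([0, 4096, 8192, 12288]))   # sequential scan
print(classify_accesses([0, 900000, 4096, 500000]))  # scattered reads
```

The file system that recognizes the first trace can prefetch aggressively; applying the same policy to the second would waste bandwidth, which is why per-application tuning matters.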
Distressingly, the space of possible performance optimizations is large and non-convex, and the best match of application and resource management technique is seldom obvious a priori. Current performance instrumentation and analysis tools provide the data necessary to understand the causes for poor performance a posteriori, but alone they are insufficient to adapt to temporally varying application resource demands and system responses. As noted in Sec. 1.3.1, software developers currently must engage in a time-consuming cycle of program development, performance measurement, debugging, and tuning to create non-portable code that adapts to parallel system idiosyncrasies.
One potential solution to the performance optimization conundrum is integration of dynamic performance instrumentation and on-the-fly performance data reduction with configurable, malleable resource management algorithms and a real-time adaptive control mechanism that automatically chooses and configures resource management algorithms based on application request patterns and observed system performance. In principle, an adaptive resource management infrastructure, driven by real-time performance data, would increase portability by allowing application and runtime libraries to adapt to disparate hardware and software platforms and would increase achieved performance by choosing and configuring those resource management algorithms best matched to temporally varying application behavior.
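Such a closed loop might be sketched as follows; the policies, scoring function, and metric trace are all invented for illustration:

```python
def adaptive_controller(metric_stream, policies, evaluate, interval=4):
    """Closed-loop sketch: every `interval` observations, re-evaluate
    which resource management policy scores best on recent metrics.
    `evaluate(policy, window)` returns a score (higher is better)."""
    window, chosen = [], []
    current = policies[0]
    for i, m in enumerate(metric_stream, 1):
        window.append(m)
        if i % interval == 0:
            current = max(policies,
                          key=lambda p: evaluate(p, window[-interval:]))
        chosen.append(current)
    return chosen

# Illustrative scoring: "aggressive-prefetch" wins when observed miss
# rates are high, "lazy" when they are low.
def score(policy, window):
    miss = sum(window) / len(window)
    return miss if policy == "aggressive-prefetch" else 1.0 - miss

trace = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
picks = adaptive_controller(trace, ["lazy", "aggressive-prefetch"], score)
print(picks[3], picks[7])  # switches policy as the miss rate rises
```

The controller never sees source code or programmer intent; it reacts purely to measured behavior, which is what makes the approach portable across platforms with different performance characteristics.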
Excerpted from Debugging and Performance Tuning for Parallel Computing Systems by permission.
1. Performance and Debugging Tools: A Research and Development Checkpoint.
2. Tools: A Research Point of View.
Integrating Compilation and Performance Analysis for Data-Parallel Programs (Vikram S. Adve).
Integrating A Debugger and Performance Tool for Steering (Barton P. Miller).
Visualization, Debugging, and Performance in PVM (G.A. Geist).
Program Analysis and Tuning Tools for a Parallel Object Oriented Language: An Experiment with the TAU System (Dennis Gannon).
Race Detection - Ten Years Later (C.E. McDowell).
Debugging and Performance Analysis Tools (Joan M. Francioni).
3. Tools: A Vendor Point of View.
A Scalable Debugger for Massively Parallel Message-Passing Programs (Rich Title).
A Building Block Approach to Parallel Tool Construction (Don Breazeal).
Visualizing Performance on Parallel Supercomputers (Marty Itzkowitz).
Multiple Views of Parallel Application Execution (Ming C. Hao).
A Performance Tool for The CRAY T3D (Douglas M. Pase).
4. Tools: An Applications Point of View.
Issues of Running Codes on Very Large Parallel Processing Systems (Don Heller).
Opportunities and Tools for Highly Interactive Distributed and Parallel Computing (Karsten Schwan).
Methodologies for Developing Scientific Applications on the CRAY T3D (Nicholas A. Nystrom).
Tuning I/O Performance on the Paragon: Fun with Pablo and Norma (Carl Winstead).
Prospects of Solving Grand Challenge Problems (Rajan Gupta).
Portability and Performance Problems on Massively Parallel Supercomputers (David M. Beazley).
5. Updates and Working-Group Summaries.
Collaborative Efforts to Develop User-Oriented Parallel Tools (Cherri M. Pancake).
High-Performance Fortran Forum Status Report (Mary Zosel).
Summary of Working Group on Integrated Environments vs. Toolkit (Diane T. Rover).
Working Group: Tools for Workstation Clusters (Robert Dilly).