Grid Computing: Making the Global Infrastructure a Reality / Edition 1


Overview

Grid computing applies the resources of many computers in a network to a single problem at the same time.

Grid computing appears to be a promising trend for three reasons:
(1) it makes more cost-effective use of a given amount of computer resources;
(2) it offers a way to solve problems that cannot be approached without an enormous amount of computing power; and
(3) it suggests that the resources of many computers can be cooperatively, and perhaps synergistically, harnessed and managed as a collaboration toward a common objective.

A number of corporations, professional groups, university consortia, and other groups have developed or are developing frameworks and software for managing grid computing projects. The European Union (EU) is sponsoring a project for a grid for high-energy physics, earth observation, and biology applications. In the United States, the National Technology Grid is prototyping a computational grid for infrastructure and an access grid for people. Sun Microsystems offers Grid Engine software. Described as a distributed resource management tool, Grid Engine allows engineers at companies like Sony and Synopsys to pool the computer cycles on up to 80 workstations at a time.
* "the Grid" is a very hot topic generating broad interest from research and industry (e.g. IBM, Platform, Avaki, Entropia, Sun, HP)
* Grid architecture enables very popular e-Science projects like the Genome project which demand global interaction and networking
* In recent surveys over 50% of Chief Information Officers are expected to use Grid technology this year
Grid Computing:
* Features contributions from the major players in the field
* Covers all aspects of grid technology from motivation to applications
* Provides an extensive, state-of-the-art guide to grid computing

This is essential reading for researchers in computing and engineering, physicists, statisticians, engineers, mathematicians, and IT policy makers.


Editorial Reviews

From the Publisher
"…a very good understanding of how the experts are approaching various problems around in the grid realm, and what the solutions are." (Computing Reviews.com, February 14, 2005)

Read an Excerpt

Grid Computing

Making the Global Infrastructure a Reality

John Wiley & Sons

Copyright © 2003 John Wiley & Sons, Ltd
All rights reserved.

ISBN: 0-470-85319-0


Chapter Eleven

Condor and the Grid

Douglas Thain, Todd Tannenbaum, and Miron Livny, University of Wisconsin-Madison, Madison, Wisconsin, United States

11.1 INTRODUCTION

Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of intercomputer communication led to the development of means by which stand-alone processing subsystems can be integrated into multicomputer communities. - Miron Livny, Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems, Ph.D. thesis, July 1983.

Ready access to large amounts of computing power has been a persistent goal of computer scientists for decades. Since the 1960s, visions of computing utilities as pervasive and as simple as the telephone have motivated system designers. It was recognized in the 1970s that such power could be achieved inexpensively with collections of small devices rather than expensive single supercomputers. Interest in schemes for managing distributed processors became so popular that there was even once a minor controversy over the meaning of the word 'distributed'.

As this early work made it clear that distributed computing was feasible, theoretical researchers began to notice that distributed computing would be difficult. When messages may be lost, corrupted, or delayed, precise algorithms must be used in order to build an understandable (if not controllable) system. Such lessons were not lost on the system designers of the early 1980s. Production systems such as Locus and Grapevine recognized the fundamental tension between consistency and availability in the face of failures.

In this environment, the Condor project was born. At the University of Wisconsin, Miron Livny combined his 1983 doctoral thesis on cooperative processing with the powerful Crystal Multicomputer designed by DeWitt, Finkel, and Solomon and the novel Remote UNIX software designed by Litzkow. The result was Condor, a new system for distributed computing. In contrast to the dominant centralized control model of the day, Condor was unique in its insistence that every participant in the system remain free to contribute as much or as little as it cared to.

Modern processing environments that consist of large collections of workstations interconnected by high capacity network raise the following challenging question: can we satisfy the needs of users who need extra capacity without lowering the quality of service experienced by the owners of under utilized workstations? ... The Condor scheduling system is our answer to this question. - Michael Litzkow, Miron Livny, and Matt Mutka, Condor: A Hunter of Idle Workstations, IEEE 8th Intl. Conf. on Dist. Comp. Sys., June 1988.

The Condor system soon became a staple of the production-computing environment at the University of Wisconsin, partially because of its concern for protecting individual interests. A production setting can be both a curse and a blessing: The Condor project learned hard lessons as it gained real users. It was soon discovered that inconvenienced machine owners would quickly withdraw from the community, so it was decreed that owners must maintain control of their machines at any cost. A fixed schema for representing users and machines proved to be in constant flux, and this led to the development of a schema-free resource allocation language called ClassAds. It has been observed that most complex systems struggle through an adolescence of five to seven years. Condor was no exception.
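
To give a concrete flavor of the idea, the sketch below shows what a pair of ClassAd-style descriptions might look like: a machine advertises its properties and its owner's conditions, a job advertises its needs, and the two are matched only when each side's Requirements expression evaluates to true against the other's attributes. The attribute names and values shown are illustrative assumptions, not examples drawn from this chapter.

    A machine's advertisement:
        MyType       = "Machine"
        Name         = "node07.cs.wisc.edu"
        OpSys        = "LINUX"
        Arch         = "INTEL"
        Memory       = 512
        KeyboardIdle = 1043
        LoadAvg      = 0.02
        Requirements = (LoadAvg < 0.3) && (KeyboardIdle > 15 * 60)

    A job's advertisement:
        MyType       = "Job"
        Owner        = "alice"
        Cmd          = "/home/alice/sim"
        Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 64)
        Rank         = Memory

Because neither side is forced into a fixed schema, owners and users are free to invent new attributes, and constraints over them, as their needs evolve.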

The most critical support task is responding to those owners of machines who feel that Condor is in some way interfering with their own use of their machine. Such complaints must be answered both promptly and diplomatically. Workstation owners are not used to the concept of somebody else using their machine while they are away and are in general suspicious of any new software installed on their system. - Michael Litzkow and Miron Livny, Experience With The Condor Distributed Batch System, IEEE Workshop on Experimental Dist. Sys., October 1990.

The 1990s saw tremendous growth in the field of distributed computing. Scientific interests began to recognize that coupled commodity machines were significantly less expensive than supercomputers of equivalent power. A wide variety of powerful batch execution systems such as LoadLeveler (a descendant of Condor), LSF, Maui, NQE, and PBS spread throughout academia and business. Several high-profile distributed computing efforts such as SETI@Home and Napster raised the public consciousness about the power of distributed computing, generating not a little moral and legal controversy along the way. A vision called grid computing began to build the case for resource sharing across organizational boundaries.

Throughout this period, the Condor project immersed itself in the problems of production users. As new programming environments such as PVM, MPI, and Java became popular, the project added system support and contributed to standards development. As scientists grouped themselves into international computing efforts such as the Grid Physics Network and the Particle Physics Data Grid (PPDG), the Condor project took part from initial design to end-user support. As new protocols such as Grid Resource Access and Management (GRAM), Grid Security Infrastructure (GSI), and GridFTP developed, the project applied them to production systems and suggested changes based on the experience. Through the years, the Condor project adapted computing structures to fit changing human communities.

Many previous publications about Condor have described in fine detail the features of the system. In this chapter, we will lay out a broad history of the Condor project and its design philosophy. We will describe how this philosophy has led to an organic growth of computing communities and discuss the planning and the scheduling techniques needed in such an uncontrolled system. Our insistence on dividing responsibility has led to a unique model of cooperative computing called split execution. We will conclude by describing how real users have put Condor to work.

11.2 THE PHILOSOPHY OF FLEXIBILITY

As distributed systems scale to ever-larger sizes, they become more and more difficult to control or even to describe. International distributed systems are heterogeneous in every way: they are composed of many types and brands of hardware, they run various operating systems and applications, they are connected by unreliable networks, they change configuration constantly as old components become obsolete and new components are powered on. Most importantly, they have many owners, each with private policies and requirements that control their participation in the community.

Flexibility is the key to surviving in such a hostile environment. Five admonitions outline our philosophy of flexibility.

Let communities grow naturally: Humanity has a natural desire to work together on common problems. Given tools of sufficient power, people will organize the computing structures that they need. However, human relationships are complex. People invest their time and resources into many communities with varying degrees of commitment. Trust is rarely complete or symmetric. Communities and contracts are never formalized with the same level of precision as computer code. Relationships and requirements change over time. Thus, we aim to build structures that permit but do not require cooperation. Relationships, obligations, and schemata will develop according to user necessity.

Plan without being picky: Progress requires optimism. In a community of sufficient size, there will always be idle resources available to do work. But, there will also always be resources that are slow, misconfigured, disconnected, or broken. An overdependence on the correct operation of any remote device is a recipe for disaster. As we design software, we must spend more time contemplating the consequences of failure than the potential benefits of success. When failures come our way, we must be prepared to retry or reassign work as the situation permits.

Leave the owner in control: To attract the maximum number of participants in a community, the barriers to participation must be low. Users will not donate their property to the common good unless they maintain some control over how it is used. Therefore, we must be careful to provide tools for the owner of a resource to set use policies and even instantly retract it for private use.
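
In Condor, this principle shows up as owner-written policy expressions in the machine's local configuration. The fragment below is a minimal sketch of such a policy, assuming the conventional START/SUSPEND/PREEMPT expressions, attributes such as KeyboardIdle and LoadAvg, and a MINUTE macro; exact attribute names and defaults vary across Condor versions.

    # Accept guest jobs only after 15 minutes of keyboard idleness on a lightly loaded machine.
    START   = (KeyboardIdle > 15 * $(MINUTE)) && (LoadAvg < 0.3)
    # Suspend the guest job the moment the owner returns or the load climbs.
    SUSPEND = (KeyboardIdle < $(MINUTE)) || (LoadAvg > 0.5)
    # Evict a job that has been suspended for more than ten minutes.
    PREEMPT = (Activity == "Suspended") && ((CurrentTime - EnteredCurrentActivity) > 10 * $(MINUTE))

Because these expressions are evaluated continuously on the owner's machine, tightening them takes effect at once, which is exactly the instant retraction the philosophy calls for.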

Lend and borrow: The Condor project has developed a large body of expertise in distributed resource management. Countless other practitioners in the field are experts in related fields such as networking, databases, programming languages, and security. The Condor project aims to give the research community the benefits of our expertise while accepting and integrating knowledge and software from other sources.

Understand previous research: We must always be vigilant to understand and apply previous research in computer science. Our field has developed over many decades and is known by many overlapping names such as operating systems, distributed computing, metacomputing, peer-to-peer computing, and grid computing. Each of these emphasizes a particular aspect of the discipline, but is united by fundamental concepts. If we fail to understand and apply previous research, we will at best rediscover well-charted shores. At worst, we will wreck ourselves on well-charted rocks.

11.3 THE CONDOR PROJECT TODAY

At present, the Condor project consists of over 30 faculty, full-time staff, and graduate and undergraduate students working at the University of Wisconsin-Madison. Together, the group has over a century of experience in distributed computing concepts and practices, systems programming and design, and software engineering.

Condor is a multifaceted project engaged in five primary activities.

Research in distributed computing: Our research focus areas, and the tools we have produced to address them (several of which are explored below), are as follows:

1. Harnessing the power of opportunistic and dedicated resources. (Condor)

2. Job management services for grid applications. (Condor-G, DaPSched)

3. Fabric management services for grid resources. (Condor, Glide-In, NeST)

4. Resource discovery, monitoring, and management. (ClassAds, Hawkeye)

5. Problem-solving environments. (MW, DAGMan)

6. Distributed I/O technology. (Bypass, PFS, Kangaroo, NeST)

Participation in the scientific community: Condor participates in national and international grid research, development, and deployment efforts. The actual development and deployment activities of the Condor project are a critical ingredient toward its success. Condor is actively involved in efforts such as the Grid Physics Network (GriPhyN), the International Virtual Data Grid Laboratory (iVDGL), the Particle Physics Data Grid (PPDG), the NSF Middleware Initiative (NMI), the TeraGrid, and the NASA Information Power Grid (IPG). Further, Condor is a founding member of the National Computational Science Alliance (NCSA) and a close collaborator of the Globus project.

Engineering of complex software: Although a research project, Condor has a significant software production component. Our software is routinely used in mission-critical settings by industry, government, and academia. As a result, a portion of the project resembles a software company. Condor is built every day on multiple platforms, and an automated regression test suite containing over 200 tests stresses the current release candidate each night. The project's code base itself contains nearly a half-million lines, and significant pieces are closely tied to the underlying operating system. Two versions of the software, a stable version and a development version, are simultaneously developed in a multi-platform (Unix and Windows) environment. Within a given stable version, only bug fixes to the code base are permitted - new functionality must first mature and prove itself within the development series. Our release procedure makes use of multiple test beds. Early development releases run on test pools consisting of about a dozen machines; later in the development cycle, release candidates run on the production UW-Madison pool with over 1000 machines and dozens of real users. Final release candidates are installed at collaborator sites and carefully monitored. The goal is that each stable version release of Condor should be proven to operate in the field before being made available to the public.

Maintenance of production environments: The Condor project is also responsible for the Condor installation in the Computer Science Department at the University of Wisconsin-Madison, which consists of over 1000 CPUs. This installation is also a major compute resource for the Alliance Partners for Advanced Computational Servers (PACS). As such, it delivers compute cycles to scientists across the nation who have been granted computational resources by the National Science Foundation. In addition, the project provides consulting and support for other Condor installations at the University and around the world. Best-effort support from the Condor software developers is available at no charge via ticket-tracked e-mail. Institutions using Condor can also opt for contracted support - for a fee, the Condor project will provide priority e-mail and telephone support with guaranteed turnaround times.

Education of students: Last but not least, the Condor project trains students to become computer scientists. Part of this education is immersion in a production system. Students graduate with the rare experience of having nurtured software from the chalkboard all the way to the end user. In addition, students participate in the academic community by designing, performing, writing, and presenting original research. At the time of this writing, the project employs 20 graduate students, including 7 Ph.D. candidates.

11.3.1 The Condor software: Condor and Condor-G

When most people hear the word 'Condor', they do not think of the research group and all of its surrounding activities. Instead, usually what comes to mind is strictly the software produced by the Condor project: the Condor High Throughput Computing System, often referred to simply as Condor.

11.3.1.1 Condor: a system for high-throughput computing

Condor is a specialized job and resource management system (RMS) for compute-intensive jobs.
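
As a concrete illustration, a user typically describes work in a small plain-text submit description file and hands it to the system with condor_submit. The sketch below follows the common submit-file style; the file names, resource requirements, and job are invented for this example.

    universe     = vanilla
    executable   = analyze_events
    arguments    = run042.dat
    output       = run042.out
    error        = run042.err
    log          = analyze.log
    requirements = (OpSys == "LINUX") && (Memory >= 128)
    queue

Submitting such a file (for example, condor_submit analyze.sub) places the job in Condor's queue; Condor then locates a matching machine, runs the job there, and returns the output along with a log of what happened.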

Continues...


Excerpted from Grid Computing Copyright © 2003 by John Wiley & Sons, Ltd. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.


Table of Contents

Overview of the Book: Grid Computing – Making the Global Infrastructure a Reality (F. Berman, et al.).

The Grid: Past, Present, Future (F. Berman, et al.).

The Grid: A New Infrastructure for 21st Century Science (I. Foster).

The Evolution of the Grid (D. De Roure, et al.).

Software Infrastructure for the I-WAY High-performance Distributed Computing Experiment (I. Foster, et al.).

Implementing Production Grids (W. Johnston).

The Anatomy of the Grid (I. Foster, et al.).

Rationale for Choosing the Open Grid Services Architecture (M. Atkinson).

The Physiology of the Grid (I. Foster, et al.).

Grid Web Services and Application Factories (D. Gannon, et al.).

From Legion to Avaki: The Persistence of Vision (A. Grimshaw, et al.).

Condor and the Grid (D. Thain, et al.).

Architecture of a Commercial Enterprise Desktop Grid: The Entropia System (A. Chien).

Autonomic Computing and Grid (P. Pattnaik, et al.).

Databases and the Grid (P. Watson).

The Open Grid Services Architecture, and Data Grids (P. Kunszt & L. Guy).

Virtualization Services for Data Grids (R. Moore & C. Baru).

The Semantic Grid: A Future e-Science Infrastructure (D. De Roure, et al.).

Peer-to-Peer Grids (G. Fox, et al.).

Peer-to-Peer Grid Databases for Web Service Discovery (W. Hoschek).

Overview of Grid Computing Environments (G. Fox, et al.).

Grid Programming Models: Current Tools, Issues and Directions (C. Lee & D. Talia).

NaradaBrokering: An Event-based Infrastructure for Building Scalable Durable Peer-to-Peer Grids (G. Fox & S. Pallickara).

Classifying and Enabling Grid Applications (G. Allen, et al.).

NetSolve: Past, Present, and Future – A Look at a Grid Enabled Server (S. Agrawal, et al.).

Ninf-G: a GridRPC System on the Globus Toolkit (H. Nakada, et al.).

Commodity Grid Kits – Middleware for Building Grid Computing Environments (G. von Laszewski, et al.).

The Grid Portal Development Kit (J. Novotny).

Building Grid Computing Portals: The NPACI Grid Portal Toolkit (M. Thomas & J. Boisseau).

Unicore and the Open Grid Services Architecture (D. Snelling).

Distributed Object-based Grid Computing Environments (T. Haupt & M. Pierce).

DISCOVER: a Computational Collaboratory for Interactive Grid Applications (V. Mann & M. Parashar).

Grid Resource Allocation and Control using Computational Economies (R. Wolski, et al.).

Parameter Sweeps on the Grid with APST (H. Casanova & F. Berman).

Storage Manager and File Transfer Web Services (W. Watson, et al.).

Application Overview for the Book: Grid Computing – Making the Global Infrastructure a Reality (F. Berman, et al.).

The Data Deluge: An e-Science Perspective (T. Hey & A. Trefethen).

Metacomputing (L. Smarr & C. Catlett).

Grids and the Virtual Observatory (R. Williams).

Data-intensive Grids for High-energy Physics (J. Bunn & H. Newman).

The New Biology and the Grid (K. Baldridge & P. Bourne).

eDiamond: a Grid-enabled Federated Database of Annotated Mammograms (M. Brady, et al.).

Combinatorial Chemistry and the Grid (J. Frey, et al.).

Education and the Enterprise with the Grid (G. Fox).

Index.

Views of the Grid.

Indirect Glossary.

List of Grid Projects.
