GPU Computing Gems Emerald Edition [NOOK Book]

NOOK Book (eBook)
$74.95
BN.com price

Overview

"...the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk." -Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010

Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally-intensive experiments across a range of disciplines.

GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including:

  • Black hole simulations with CUDA
  • GPU-accelerated computation and interactive display of molecular orbitals
  • Temporal data mining for neuroscience
  • GPU-based parallelization for fast circuit optimization
  • Fast graph cuts for computer vision
  • Real-time stereo on GPGPU using progressive multi-resolution adaptive windows
  • GPU image demosaicing
  • Tomographic image reconstruction from unordered lines with CUDA
  • Medical image processing using GPU-accelerated ITK image filters
  • 41 more chapters of innovative GPU computing ideas, written to be accessible to researchers from any domain

GPU Computing Gems: Emerald Edition is the first volume in Morgan Kaufmann's Applications of GPU Computing Series, offering the latest insights and research in computer vision, electronic design automation, emerging data-intensive applications, life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video / image processing.



  • Covers the breadth of industry from scientific simulation and electronic design automation to audio / video processing, medical imaging, computer vision, and more
  • Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely-adopted massively parallel programming solution
  • Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use

Editorial Reviews

From the Publisher
Praise for GPU Computing Gems: Emerald Edition:

“GPU computing is becoming an outstanding field in high performance computing. Due to its easiness, the CUDA approach enables programmers to take advantage of GPU-acceleration very quickly… My research in complex science as well as applications in high frequency trading benefited significantly from GPU computing.” -Dr. Tobias Preis, ETH Zurich, Switzerland

“This book is an important reference for everyone working on GPU/CUDA, and contains definitive work in a selection of fields. The patterns of CUDA parallelization it describes can often be adapted to applications in other fields.” -Dr. Ming Ouyang, Assistant Professor – Director Visualization and Intensive Graphics Lab, University of Louisville

“Diving into the world of GPU computing has never been more important these days. GPU Computing Gems: Emerald Edition takes you through the looking glass into this fascinating world.” -Martin Eisemann, Computer Graphics Lab, TU Braunschweig

“…an outstanding collection of vignettes of how to program GPUs for a breathtaking range of applications.” -Dr. Amitabh Varshney, Director, Institute for Advanced Computer Studies, University of Maryland

"The book features a useful index that might help readers mine the gems in search of a solution to a specific algorithmic problem. The index is accompanied by online resources containing source code samples—and further information—for some of the chapters. A second volume with another 30 chapters of GPGPU application reports, somewhat more focused on generic algorithms and programming techniques, is currently in the pipeline and scheduled to appear as the "Jade Edition" sometime this month." -Computing in Science and Engineering

"The book is an excellent selection of important papers describing various applications of GPUs. As such, I believe it would be a valuable addition to the bookshelf of any researcher in modeling and simulation…This is not a substitute for a more detailed text on massively parallel programming...Instead, it is a nice practical addition to that text." -Computing Reviews, August 2012


Product Details

  • ISBN-13: 9780123849892
  • Publisher: Elsevier Science
  • Publication date: 1/13/2011
  • Series: Applications of GPU Computing Series
  • Sold by: Barnes & Noble
  • Format: eBook
  • Pages: 886
  • File size: 21 MB
  • Note: This product may take a few minutes to download.

Meet the Author

Wen-mei Hwu is CTO of MulticoreWare and a professor at the University of Illinois at Urbana-Champaign, specializing in compiler design, computer architecture, computer microarchitecture, and parallel processing. He currently holds the Walter J. ("Jerry") Sanders III-Advanced Micro Devices Endowed Chair in Electrical and Computer Engineering in the Coordinated Science Laboratory. He is a PI for the petascale Blue Waters system, co-director of the Intel- and Microsoft-funded Universal Parallel Computing Research Center (UPCRC), and PI for the world's first NVIDIA CUDA Center of Excellence. At the Illinois Coordinated Science Lab, Dr. Hwu leads the IMPACT Research Group and is director of the OpenIMPACT project, which has delivered new compiler and computer architecture technologies to the computer industry since 1987. He previously edited GPU Computing Gems, a similar work focusing on NVIDIA CUDA.


Read an Excerpt

GPU Computing Gems Emerald Edition


By Wen-mei W. Hwu

Morgan Kaufmann

Copyright © 2011 NVIDIA Corporation and Wen-mei W. Hwu
All rights reserved.

ISBN: 978-0-12-384989-2


Chapter One

GPU-Accelerated Computation and Interactive Display of Molecular Orbitals

John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, Klaus Schulten

In this chapter, we present several graphics processing unit (GPU) algorithms for evaluating molecular orbitals on three-dimensional lattices, as is commonly done for molecular visualization. For each kernel, we describe necessary design trade-offs, applicability to various problem sizes, and performance on different generations of GPU hardware. We then demonstrate the appropriate and effective use of fast on-chip GPU memory subsystems for access to key data structures, show several GPU kernel optimization principles, and explore the application of advanced techniques such as dynamic kernel generation and just-in-time (JIT) kernel compilation.

1.1 INTRODUCTION, PROBLEM STATEMENT, AND CONTEXT

The GPU kernels described here form the basis for the high-performance molecular orbital display algorithms in VMD, a popular molecular visualization and analysis tool. VMD (Visual Molecular Dynamics) is a software system designed for displaying, animating, and analyzing large biomolecular systems. More than 33,000 users have registered and downloaded the most recent VMD software, version 1.8.7. Due to its versatility and user-extensibility, VMD is also capable of displaying other large datasets, such as sequence data, results of quantum chemistry calculations, and volumetric data. While VMD is designed to run on a diverse range of hardware — laptops, desktops, clusters, and supercomputers — it is primarily used as a scientific workstation application for interactive 3-D visualization and analysis. For computations that run too long for interactive use, VMD can also be used in a batch mode to render movies for later use. A motivation for using GPU acceleration in VMD is to make slow batch-mode jobs fast enough for interactive use, thereby drastically improving the productivity of scientific investigations. With CUDA-enabled GPUs widely available in desktop PCs, such acceleration can have a broad impact on the VMD user community. To date, multiple aspects of VMD have been accelerated with the NVIDIA Compute Unified Device Architecture (CUDA), including electrostatic potential calculation, ion placement, molecular orbital calculation and display, and imaging of gas migration pathways in proteins.

Visualization of molecular orbitals (MOs) is a helpful step in analyzing the results of quantum chemistry calculations. The key challenge involved in the display of molecular orbitals is the rapid evaluation of these functions on a three-dimensional lattice; the resulting data can then be used for plotting isocontours or isosurfaces for visualization as shown in Fig. 1.1, and for other types of analyses. Most existing software packages that render MOs perform calculations on the CPU and have not been heavily optimized. Thus, they require runtimes of tens to hundreds of seconds depending on the complexity of the molecular system and spatial resolution of the MO discretization and subsequent surface plots.

With sufficient performance (two orders of magnitude faster than traditional CPU algorithms), a fast real-space lattice computation enables interactive display of even very large electronic structures and makes it possible to smoothly animate trajectories of orbital dynamics. Prior to the use of the GPU, this could be accomplished only through extensive batch-mode precalculation and preloading of time-varying lattice data into memory, making it impractical for everyday interactive visualization tasks. Efficient single-GPU algorithms are capable of evaluating molecular orbital lattices up to 186 times faster than a single CPU core (see Table 1.1), enabling MOs to be rapidly computed and animated on the fly for the first time. A multi-GPU version of our algorithm has been benchmarked at up to 419 times the performance of a single CPU core (see Table 1.2).

1.2 CORE METHOD

Since our target application is visualization focused, we are concerned with achieving interactive rendering performance while maintaining sufficient accuracy. The CUDA programming language enables GPU hardware features — inaccessible in existing programmable shading languages — to be exploited for higher performance, and it enables the use of multiple GPUs to accelerate computation further. Another advantage of using CUDA is that the results can be used for nonvisualization purposes.

Our approach combines several performance enhancement strategies. First, we use the host CPU to carefully organize input data and coefficients, eliminating redundancies and enforcing a sorted ordering that benefits subsequent GPU memory traversal patterns. The evaluation of molecular orbitals on a 3-D lattice is performed on one or more GPUs; the 3-D lattice is decomposed into 2-D planar slices, each of which is assigned to a GPU and computed. The workload is dynamically scheduled across the pool of GPUs to balance load on GPUs of varying capability. Depending on the specific attributes of the problem, one of three hand-coded GPU kernels is algorithmically selected to optimize performance. The three kernels are designed to use different combinations of GPU memory systems to yield peak memory bandwidth and arithmetic throughput depending on whether the input data can fit into constant memory, shared memory, or L1/L2 cache (in the case of recently released NVIDIA "Fermi" GPUs). One useful optimization involves the use of zero-copy memory access techniques based on the CUDA mapped host memory feature to eliminate latency associated with calls to cudaMemcpy(). Another optimization involves dynamically generating a problem-specific GPU kernel "on the fly" using just-in-time (JIT) compilation techniques, thereby eliminating various sources of overhead that exist in the three general precoded kernels.
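The slice decomposition and dynamic scheduling described above can be sketched on the host side as a shared work queue. This is a minimal Python sketch under stated assumptions, not the VMD implementation: worker threads stand in for GPUs, the hypothetical `compute_slice` callable stands in for a kernel launch on one 2-D slice, and the function name `schedule_slices` is invented for illustration.

```python
import queue
import threading


def schedule_slices(num_slices, num_workers, compute_slice):
    """Compute every 2-D slice exactly once, handing slices to workers on demand."""
    work = queue.Queue()
    for z in range(num_slices):
        work.put(z)

    results = [None] * num_slices
    assigned = [[] for _ in range(num_workers)]

    def worker(worker_id):
        # Each worker pulls the next unclaimed slice until none remain,
        # so a faster worker naturally ends up processing more slices.
        while True:
            try:
                z = work.get_nowait()
            except queue.Empty:
                return
            results[z] = compute_slice(z)
            assigned[worker_id].append(z)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, assigned
```

Pulling work on demand, rather than splitting the slices evenly up front, is what lets devices of differing capability stay busy until the lattice is finished.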

1.3 ALGORITHMS, IMPLEMENTATIONS, AND EVALUATIONS

A molecular orbital (MO) represents a statistical state in which an electron can be found in a molecule, where the MO's spatial distribution is correlated with the associated electron's probability density. Visualization of MOs is an important task for understanding the chemistry of molecular systems. MOs appeal to the chemist's intuition, and inspection of the MOs aids in explaining chemical reactivities. Some popular software tools with these capabilities include MacMolPlt, Molden, Molekel, and VMD.

The calculations required for visualizing MOs are computationally demanding, and existing quantum chemistry visualization programs are fast enough to interactively compute MOs only for small molecules on a relatively coarse lattice. At the time of this writing, only VMD and MacMolPlt support multicore CPUs, and only VMD uses GPUs to accelerate MO computations. A great opportunity exists to improve upon the capabilities of existing tools in terms of interactivity, visual display quality, and scalability to larger and more complex molecular systems.

1.3.1 Mathematical Background

In this section we provide a short introduction to MOs, basis sets, and their underlying equations. Interested readers are directed to seek further details from computational chemistry texts and review articles. Quantum chemistry packages solve the electronic Schrödinger equation HΨ = EΨ for a given system. Molecular orbitals are the solutions produced by these packages. MOs are the eigenfunctions Ψv used to express the molecular wavefunction Ψ, with H the Hamiltonian operator and E the system energy. The wavefunction determines molecular properties; for instance, the one-electron density is ρ(r) = |Ψ(r)|². The visualization of the molecular orbitals resulting from quantum chemistry calculations requires evaluating the wavefunction on a 3-D lattice so that isovalue surfaces can be computed and displayed. With minor modifications, the algorithms and approaches we present for evaluating the wavefunction can be adapted to compute other molecular properties such as charge density, the molecular electrostatic potential, or multipole moments.
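Evaluating a wavefunction on a regular 3-D lattice, one 2-D slice at a time, and then squaring it to obtain a density can be sketched as follows. This is an illustrative Python sketch only: the function names, the `values[iz][iy][ix]` layout, and the uniform `spacing` parameter are assumptions, and the wavefunction is taken to be real-valued.

```python
def evaluate_on_lattice(func, origin, spacing, dims):
    """Return values[iz][iy][ix] = func(x, y, z) on a regular 3-D lattice."""
    ox, oy, oz = origin
    nx, ny, nz = dims
    lattice = []
    for iz in range(nz):  # one 2-D planar slice per z index
        z = oz + iz * spacing
        plane = []
        for iy in range(ny):
            y = oy + iy * spacing
            plane.append([func(ox + ix * spacing, y, z) for ix in range(nx)])
        lattice.append(plane)
    return lattice


def density(lattice):
    """Square each lattice value, i.e. |psi|^2 for a real-valued psi."""
    return [[[v * v for v in row] for row in plane] for plane in lattice]
```

The resulting arrays are what an isosurface extraction step would consume; that step is outside the scope of this sketch.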

Each MO Ψv can be expressed as a linear combination over a set of K basis functions Φk,

$$ \Psi_v = \sum_{k=1}^{K} c_{vk} \, \Phi_k \qquad (1.1) $$

where cvk are coefficients contained in the quantum chemistry calculation output files, and used as input for our algorithms. The basis functions used by the vast majority of quantum chemical calculations are atom-centered functions that approximate the solution of the Schrödinger equation for a single hydrogen atom with one electron, so-called atomic orbitals. For increased computational efficiency, Gaussian type orbitals (GTOs) are used to model the basis functions, rather than the exact solutions for the hydrogen atom:

$$ \Phi^{\mathrm{GTO}}_{ijk}(\mathbf{R}, \zeta) = N_{\zeta ijk} \, x^i y^j z^k \, e^{-\zeta R^2} \qquad (1.2) $$

The exponential factor ζ is defined by the basis set; i, j, and k are used to modulate the functional shape; and Nζijk is a normalization factor that follows from the basis set definition. The distance from a basis function's center (nucleus) to a point in space is represented by the vector R = {x, y, z} of length R = |R|.
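A single Gaussian-type primitive of the kind described here, a polynomial x^i y^j z^k times a Gaussian in the distance R, can be evaluated as in the sketch below. The function name `gto` and the `norm` parameter (a stand-in for the normalization factor, defaulting to 1) are assumptions made for illustration.

```python
import math


def gto(rx, ry, rz, zeta, i, j, k, norm=1.0):
    """Cartesian Gaussian primitive: norm * x^i * y^j * z^k * exp(-zeta * R^2).

    (rx, ry, rz) is the vector from the basis function's center to the
    evaluation point; i, j, k are the non-negative integer exponents.
    """
    r2 = rx * rx + ry * ry + rz * rz
    return norm * (rx ** i) * (ry ** j) * (rz ** k) * math.exp(-zeta * r2)
```

With i = j = k = 0 the function is spherically symmetric; a nonzero i, j, or k introduces a nodal plane through the center along the corresponding axis.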

The exponential term in Eq. 1.2 determines the radial decay of the function. Composite basis functions known as contracted GTOs (CGTOs) are composed of a linear combination of P individual GTO primitives in order to accurately describe the radial behavior of atomic orbitals.

$$ \Phi^{\mathrm{CGTO}}_{ijk}(\mathbf{R}, \{c_p\}, \{\zeta_p\}) = \sum_{p=1}^{P} c_p \, \Phi^{\mathrm{GTO}}_{ijk}(\mathbf{R}, \zeta_p) \qquad (1.3) $$

The set of contraction coefficients {cp} and associated exponents {ζp} defining the CGTO are contained in the quantum chemistry simulation output.
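A contracted GTO is then a weighted sum of primitives that share the same i, j, k but differ in exponent. The sketch below is one way to write that sum; it assumes, for simplicity, that any per-primitive normalization has already been folded into the contraction coefficients, and the name `cgto` is hypothetical.

```python
import math


def cgto(rx, ry, rz, i, j, k, coeffs, exponents):
    """Contracted GTO: sum over primitives p of c_p * x^i y^j z^k * exp(-zeta_p * R^2)."""
    if len(coeffs) != len(exponents):
        raise ValueError("need one contraction coefficient per exponent")
    r2 = rx * rx + ry * ry + rz * rz
    # The polynomial factor is the same for every primitive, so only the
    # Gaussian factors need to be summed.
    radial = sum(c * math.exp(-zeta * r2) for c, zeta in zip(coeffs, exponents))
    return (rx ** i) * (ry ** j) * (rz ** k) * radial
```

Mixing a tight (large exponent) and a diffuse (small exponent) primitive is what lets the contraction follow the radial shape of an atomic orbital more closely than any single Gaussian.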

CGTOs are classified into different shells based on the sum l = i + j + k of the exponents of the x, y, and z factors. The shells are designated by letters s, p, d, f, and g for l = 0, 1, 2, 3, 4, respectively, where we explicitly list here the most common shell types but note that higher-numbered shells are occasionally used. The set of indices for a shell is also referred to as the angular momenta of that shell. We establish an alternative indexing of the angular momenta based on the shell number l and a systematic index m over the possible combinations with i + j + k = l, where $M_l = \binom{l+2}{l} = (l+1)(l+2)/2$ counts the number of combinations and m = 0, ..., Ml - 1 references the set {(i, j, k): i + j + k = l}.
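The (l, m) indexing can be made concrete by enumerating every exponent triple with i + j + k = l. In the sketch below the enumeration order is an arbitrary assumption (quantum chemistry packages differ in the order they use), and both function names are hypothetical.

```python
def shell_angular_momenta(l):
    """List all (i, j, k) with i + j + k = l; index m is the position in this list."""
    return [(i, j, l - i - j) for i in range(l, -1, -1) for j in range(l - i, -1, -1)]


def num_combinations(l):
    """Number of non-negative integer triples that sum to l."""
    return (l + 1) * (l + 2) // 2
```

For the s, p, d, f, and g shells this gives 1, 3, 6, 10, and 15 combinations; for a p shell the list is (1, 0, 0), (0, 1, 0), (0, 0, 1).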

The linear combination defining the MO Ψv must also sum contributions from each of the N atoms of the molecule and the Ln shells of each atom n. The entire expression, now described in terms of the data output from a QM package, for an MO wavefunction evaluated at a point r in space then becomes

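$$ \Psi_v(\mathbf{r}) = \sum_{k=1}^{K} c_{vk} \, \Phi_k = \sum_{n=1}^{N} \sum_{l=0}^{L_n - 1} \sum_{m=0}^{M_l - 1} c_{vnlm} \, \Phi^{\mathrm{CGTO}}_{nlm}(\mathbf{R}_n, \{c\}, \{\zeta\}) \qquad (1.4) $$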

where we have replaced cvk by cvnlm, with the vectors Rn = r - rn connecting the position rn of the nucleus of atom n to the desired spatial coordinate r. We have dropped the subscript p from the set of contraction coefficients {c} and exponents {ζ} with the understanding that each CGTO requires an additional summation over the primitives, as expressed in Eq. 1.3.

The normalization factor Nζijk in Eq. 1.2 can be factored into a first part ηζl that depends on both the exponent ζ and shell type l = i + j + k and a second part ηijk (=ηlm in terms of our alternative indexing) that depends only on the angular momentum,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1.5)
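The expression itself is not reproduced in this excerpt. As an illustration of such a factorization, the sketch below uses the standard normalization of Cartesian Gaussians, split into a part that depends on ζ and l and a part that depends only on (i, j, k). Whether this matches the chapter's Eq. 1.5 term for term is an assumption; the function names are hypothetical.

```python
import math


def eta_radial(zeta, l):
    """Factor depending on the exponent zeta and shell type l (standard form, assumed)."""
    return (2.0 * zeta / math.pi) ** 0.75 * math.sqrt((8.0 * zeta) ** l)


def eta_angular(i, j, k):
    """Factor depending only on the angular momentum (i, j, k)."""
    f = math.factorial
    return math.sqrt(f(i) * f(j) * f(k) / (f(2 * i) * f(2 * j) * f(2 * k)))


def gto_norm(zeta, i, j, k):
    """Full normalization factor as the product of the two parts."""
    return eta_radial(zeta, i + j + k) * eta_angular(i, j, k)


def self_overlap(zeta, i, j, k):
    """Analytic integral over all space of the squared, normalized primitive."""
    def axis_integral(n):
        # Integral of x^(2n) * exp(-2 * zeta * x^2) over the real line.
        double_factorial = math.prod(range(1, 2 * n, 2))  # (2n-1)!!, equal to 1 when n = 0
        return double_factorial / (4.0 * zeta) ** n * math.sqrt(math.pi / (2.0 * zeta))
    return gto_norm(zeta, i, j, k) ** 2 * axis_integral(i) * axis_integral(j) * axis_integral(k)
```

`self_overlap` is included only as a sanity check: with a correct normalization it evaluates to 1 for any exponent and angular momentum.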

The separation of the normalization factor in Eq. 1.5 allows us to factor the summation over the primitives from the summation over the array of wavefunction coefficients. Combining Eqs. 1.2–1.4 and rearranging terms gives

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (1.6)
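The practical effect of separating the two summations is that, for each shell, the sum over primitives is computed once and then multiplied by the sum over wavefunction coefficients. The sketch below shows that structure for a single lattice point. It is an assumption-laden illustration, not the chapter's kernel: the dictionary layout, the key names, and the enumeration order of (i, j, k) are invented here, and all normalization factors are assumed to be pre-multiplied into the coefficients.

```python
import math


def shell_angular_momenta(l):
    """All (i, j, k) with i + j + k = l, in an arbitrary but fixed order."""
    return [(i, j, l - i - j) for i in range(l, -1, -1) for j in range(l - i, -1, -1)]


def evaluate_mo(point, atoms):
    """Evaluate one molecular orbital at a single point in space."""
    px, py, pz = point
    value = 0.0
    for atom in atoms:
        ax, ay, az = atom["position"]
        x, y, z = px - ax, py - ay, pz - az  # vector from nucleus to the point
        r2 = x * x + y * y + z * z
        for shell in atom["shells"]:
            # Sum over primitives: computed once per shell.
            radial = sum(
                c * math.exp(-zeta * r2)
                for c, zeta in zip(shell["contraction"], shell["exponents"])
            )
            # Sum over the shell's wavefunction coefficients.
            angular = sum(
                w * (x ** i) * (y ** j) * (z ** k)
                for w, (i, j, k) in zip(shell["mo_coeffs"], shell_angular_momenta(shell["l"]))
            )
            value += radial * angular
    return value
```

A lattice evaluation would call this for every point of every 2-D slice; on a GPU each thread would typically handle one or a few lattice points.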

(Continues...)



Excerpted from GPU Computing Gems Emerald Edition by Wen-mei W. Hwu. Copyright © 2011 by NVIDIA Corporation and Wen-mei W. Hwu. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.


Table of Contents

  1. Scientific Simulation
  2. Life Sciences
  3. Statistical Modeling
  4. Emerging Data-Intensive Applications
  5. Electronic Design Automation
  6. Ray Tracing and Rendering
  7. Computer Vision
  8. Video and Image Processing
  9. Signal and Audio Processing
  10. Medical Imaging