OpenCL Programming Guide

Overview

Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.

Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.

Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes:

  • Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
  • Source code for this book is available at https://code.google.com/p/opencl-book-samples/

Product Details

ISBN-13: 9780132594554
Publisher: Pearson Education
Publication date: 07/07/2011
Series: OpenGL
Sold by: Barnes & Noble
Format: eBook
Pages: 648
File size: 13 MB

About the Author

Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.

Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.

Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.

James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU Computer Vision and Mediated Reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.

Dan Ginsburg currently works at Children’s Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL for accelerating neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and hand-held 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.

Table of Contents

Figures xv

Tables xxi

Listings xxv

Foreword xxix

Preface xxxiii

Acknowledgments xli

About the Authors xliii

 

Part I: The OpenCL 1.1 Language and API 1

 

Chapter 1: An Introduction to OpenCL 3

What Is OpenCL, or . . . Why You Need This Book 3

Our Many-Core Future: Heterogeneous Platforms 4

Software in a Many-Core World 7

Conceptual Foundations of OpenCL 11

OpenCL and Graphics 29

The Contents of OpenCL 30

The Embedded Profile 35

Learning OpenCL 36

 

Chapter 2: HelloWorld: An OpenCL Example 39

Building the Examples 40

HelloWorld Example 45

Checking for Errors in OpenCL 57

 

Chapter 3: Platforms, Contexts, and Devices 63

OpenCL Platforms 63

OpenCL Devices 68

OpenCL Contexts 83

 

Chapter 4: Programming with OpenCL C 97

Writing a Data-Parallel Kernel Using OpenCL C 97

Scalar Data Types 99

Vector Data Types 102

Other Data Types 108

Derived Types 109

Implicit Type Conversions 110

Explicit Casts 116

Explicit Conversions 117

Reinterpreting Data as Another Type 121

Vector Operators 123

Qualifiers 133

Keywords 141

Preprocessor Directives and Macros 141

Restrictions 146

 

Chapter 5: OpenCL C Built-In Functions 149

Work-Item Functions 150

Math Functions 153

Integer Functions 168

Common Functions 172

Geometric Functions 175

Relational Functions 175

Vector Data Load and Store Functions 181

Synchronization Functions 190

Async Copy and Prefetch Functions 191

Atomic Functions 195

Miscellaneous Vector Functions 199

Image Read and Write Functions 201

 

Chapter 6: Programs and Kernels 217

Program and Kernel Object Overview 217

Program Objects 218

Kernel Objects 237

 

Chapter 7: Buffers and Sub-Buffers 247

Memory Objects, Buffers, and Sub-Buffers Overview 247

Creating Buffers and Sub-Buffers 249

Querying Buffers and Sub-Buffers 257

Reading, Writing, and Copying Buffers and Sub-Buffers 259

Mapping Buffers and Sub-Buffers 276

 

Chapter 8: Images and Samplers 281

Image and Sampler Object Overview 281

Creating Image Objects 283

Creating Sampler Objects 292

OpenCL C Functions for Working with Images 295

Transferring Image Objects 299

 

Chapter 9: Events 309

Commands, Queues, and Events Overview 309

Events and Command-Queues 311

Event Objects 317

Generating Events on the Host 321

Events Impacting Execution on the Host 322

Using Events for Profiling 327

Events Inside Kernels 332

Events from Outside OpenCL 333

 

Chapter 10: Interoperability with OpenGL 335

OpenCL/OpenGL Sharing Overview 335

Querying for the OpenGL Sharing Extension 336

Initializing an OpenCL Context for OpenGL Interoperability 338

Creating OpenCL Buffers from OpenGL Buffers 339

Creating OpenCL Image Objects from OpenGL Textures 344

Querying Information about OpenGL Objects 347

Synchronization between OpenGL and OpenCL 348

 

Chapter 11: Interoperability with Direct3D 353

Direct3D/OpenCL Sharing Overview 353

Initializing an OpenCL Context for Direct3D Interoperability 354

Creating OpenCL Memory Objects from Direct3D Buffers and Textures 357

Acquiring and Releasing Direct3D Objects in OpenCL 361

Processing a Direct3D Texture in OpenCL 363

Processing D3D Vertex Data in OpenCL 366

 

Chapter 12: C++ Wrapper API 369

C++ Wrapper API Overview 369

C++ Wrapper API Exceptions 371

Vector Add Example Using the C++ Wrapper API 374

 

Chapter 13: OpenCL Embedded Profile 383

OpenCL Profile Overview 383

64-Bit Integers 385

Images 386

Built-In Atomic Functions 387

Mandated Minimum Single-Precision Floating-Point Capabilities 387

Determining the Profile Supported by a Device in an OpenCL C Program 390

 

Part II: OpenCL 1.1 Case Studies 391

 

Chapter 14: Image Histogram 393

Computing an Image Histogram 393

Parallelizing the Image Histogram 395

Additional Optimizations to the Parallel Image Histogram 400

Computing Histograms with Half-Float or Float Values for Each Channel 403

 

Chapter 15: Sobel Edge Detection Filter 407

What Is a Sobel Edge Detection Filter? 407

Implementing the Sobel Filter as an OpenCL Kernel 407

 

Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm 411

Graph Data Structures 412

Kernels 414

Leveraging Multiple Compute Devices 417

 

Chapter 17: Cloth Simulation in the Bullet Physics SDK 425

An Introduction to Cloth Simulation 425

Simulating the Soft Body 429

Executing the Simulation on the CPU 431

Changes Necessary for Basic GPU Execution 432

Two-Layered Batching 438

Optimizing for SIMD Computation and Local Memory 441

Adding OpenGL Interoperation 446

 

Chapter 18: Simulating the Ocean with Fast Fourier Transform 449

An Overview of the Ocean Application 450

Phillips Spectrum Generation 453

An OpenCL Discrete Fourier Transform 457

A Closer Look at the FFT Kernel 463

A Closer Look at the Transpose Kernel 467

 

Chapter 19: Optical Flow 469

Optical Flow Problem Overview 469

Sub-Pixel Accuracy with Hardware Linear Interpolation 480

Application of the Texture Cache 480

Using Local Memory 481

Early Exit and Hardware Scheduling 483

Efficient Visualization with OpenGL Interop 483

Performance 484

 

Chapter 20: Using OpenCL with PyOpenCL 487

Introducing PyOpenCL 487

Running the PyImageFilter2D Example 488

PyImageFilter2D Code 488

Context and Command-Queue Creation 492

Loading to an Image Object 493

Creating and Building a Program 494

Setting Kernel Arguments and Executing a Kernel 495

Reading the Results 496

 

Chapter 21: Matrix Multiplication with OpenCL 499

The Basic Matrix Multiplication Algorithm 499

A Direct Translation into OpenCL 501

Increasing the Amount of Work per Kernel 506

Optimizing Memory Movement: Local Memory 509

Performance Results and Optimizing the Original CPU Code 511

 

Chapter 22: Sparse Matrix-Vector Multiplication 515

Sparse Matrix-Vector Multiplication (SpMV) Algorithm 515

Description of This Implementation 518

Tiled and Packetized Sparse Matrix Representation 519

Header Structure 522

Tiled and Packetized Sparse Matrix Design Considerations 523

Optional Team Information 524

Tested Hardware Devices and Results 524

Additional Areas of Optimization 538

 

Appendix: Summary of OpenCL 1.1 541

The OpenCL Platform Layer 541

The OpenCL Runtime 543

Buffer Objects 544

Program Objects 546

Kernel and Event Objects 547

Supported Data Types 550

Vector Component Addressing 552

Preprocessor Directives and Macros 555

Specify Type Attributes 555

Math Constants 556

Work-Item Built-In Functions 557

Integer Built-In Functions 557

Common Built-In Functions 559

Math Built-In Functions 560

Geometric Built-In Functions 563

Relational Built-In Functions 564

Vector Data Load/Store Functions 567

Atomic Functions 568

Async Copies and Prefetch Functions 570

Synchronization, Explicit Memory Fence 570

Miscellaneous Vector Built-In Functions 571

Image Read and Write Built-In Functions 572

Image Objects 573

Image Formats 576

Access Qualifiers 576

Sampler Objects 576

Sampler Declaration Fields 577

OpenCL Device Architecture Diagram 577

OpenCL/OpenGL Sharing APIs 577

OpenCL/Direct3D 10 Sharing APIs 579

 

Index 581
