OpenCL Programming Guide / Edition 1




Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.

Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.

Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes

  • Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
  • Source code for this book is available at

Product Details

ISBN-13: 9780321749642
Publisher: Addison-Wesley
Publication date: 07/27/2011
Series: OpenGL Series
Edition description: New Edition
Pages: 648
Product dimensions: 7.00(w) x 9.00(h) x 1.40(d)

About the Author

Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.


Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.


Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.


James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU Computer Vision and Mediated Reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.


Dan Ginsburg currently works at Children’s Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL for accelerating neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and hand-held 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.

Table of Contents

Figures

Tables

Listings

Foreword

Preface

Acknowledgments

About the Authors


Part I: The OpenCL 1.1 Language and API


Chapter 1: An Introduction to OpenCL

What Is OpenCL, or . . . Why You Need This Book

Our Many-Core Future: Heterogeneous Platforms

Software in a Many-Core World

Conceptual Foundations of OpenCL

OpenCL and Graphics

The Contents of OpenCL

The Embedded Profile

Learning OpenCL

Chapter 2: HelloWorld: An OpenCL Example

Building the Examples

HelloWorld Example

Checking for Errors in OpenCL


Chapter 3: Platforms, Contexts, and Devices

OpenCL Platforms

OpenCL Devices

OpenCL Contexts


Chapter 4: Programming with OpenCL C

Writing a Data-Parallel Kernel Using OpenCL C

Scalar Data Types

Vector Data Types

Other Data Types

Derived Types

Implicit Type Conversions

Explicit Casts

Explicit Conversions

Reinterpreting Data as Another Type

Vector Operators

Qualifiers

Keywords

Preprocessor Directives and Macros

Restrictions


Chapter 5: OpenCL C Built-In Functions

Work-Item Functions

Math Functions

Integer Functions

Common Functions

Geometric Functions

Relational Functions

Vector Data Load and Store Functions

Synchronization Functions

Async Copy and Prefetch Functions

Atomic Functions

Miscellaneous Vector Functions

Image Read and Write Functions


Chapter 6: Programs and Kernels

Program and Kernel Object Overview

Program Objects

Kernel Objects

Chapter 7: Buffers and Sub-Buffers

Memory Objects, Buffers, and Sub-Buffers Overview

Creating Buffers and Sub-Buffers

Querying Buffers and Sub-Buffers

Reading, Writing, and Copying Buffers and Sub-Buffers

Mapping Buffers and Sub-Buffers


Chapter 8: Images and Samplers

Image and Sampler Object Overview

Creating Image Objects

Creating Sampler Objects

OpenCL C Functions for Working with Images

Transferring Image Objects


Chapter 9: Events

Commands, Queues, and Events Overview

Events and Command-Queues

Event Objects

Generating Events on the Host

Events Impacting Execution on the Host

Using Events for Profiling

Events Inside Kernels

Events from Outside OpenCL


Chapter 10: Interoperability with OpenGL

OpenCL/OpenGL Sharing Overview

Querying for the OpenGL Sharing Extension

Initializing an OpenCL Context for OpenGL Interoperability

Creating OpenCL Buffers from OpenGL Buffers

Creating OpenCL Image Objects from OpenGL Textures

Querying Information about OpenGL Objects

Synchronization between OpenGL and OpenCL


Chapter 11: Interoperability with Direct3D

Direct3D/OpenCL Sharing Overview

Initializing an OpenCL Context for Direct3D Interoperability

Creating OpenCL Memory Objects from Direct3D Buffers and Textures

Acquiring and Releasing Direct3D Objects in OpenCL

Processing a Direct3D Texture in OpenCL

Processing D3D Vertex Data in OpenCL


Chapter 12: C++ Wrapper API

C++ Wrapper API Overview

C++ Wrapper API Exceptions

Vector Add Example Using the C++ Wrapper API

Chapter 13: OpenCL Embedded Profile

OpenCL Profile Overview

64-Bit Integers

Images

Built-In Atomic Functions

Mandated Minimum Single-Precision Floating-Point Capabilities

Determining the Profile Supported by a Device in an OpenCL C Program


Part II: OpenCL 1.1 Case Studies


Chapter 14: Image Histogram

Computing an Image Histogram

Parallelizing the Image Histogram

Additional Optimizations to the Parallel Image Histogram

Computing Histograms with Half-Float or Float Values for Each Channel


Chapter 15: Sobel Edge Detection Filter

What Is a Sobel Edge Detection Filter?

Implementing the Sobel Filter as an OpenCL Kernel


Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm

Graph Data Structures

Kernels

Leveraging Multiple Compute Devices


Chapter 17: Cloth Simulation in the Bullet Physics SDK

An Introduction to Cloth Simulation

Simulating the Soft Body

Executing the Simulation on the CPU

Changes Necessary for Basic GPU Execution

Two-Layered Batching

Optimizing for SIMD Computation and Local Memory

Adding OpenGL Interoperation


Chapter 18: Simulating the Ocean with Fast Fourier Transform

An Overview of the Ocean Application

Phillips Spectrum Generation

An OpenCL Discrete Fourier Transform

A Closer Look at the FFT Kernel

A Closer Look at the Transpose Kernel


Chapter 19: Optical Flow

Optical Flow Problem Overview

Sub-Pixel Accuracy with Hardware Linear Interpolation

Application of the Texture Cache

Using Local Memory

Early Exit and Hardware Scheduling

Efficient Visualization with OpenGL Interop

Performance


Chapter 20: Using OpenCL with PyOpenCL

Introducing PyOpenCL

Running the PyImageFilter2D Example

PyImageFilter2D Code

Context and Command-Queue Creation

Loading to an Image Object

Creating and Building a Program

Setting Kernel Arguments and Executing a Kernel

Reading the Results

Chapter 21: Matrix Multiplication with OpenCL

The Basic Matrix Multiplication Algorithm

A Direct Translation into OpenCL

Increasing the Amount of Work per Kernel

Optimizing Memory Movement: Local Memory

Performance Results and Optimizing the Original CPU Code


Chapter 22: Sparse Matrix-Vector Multiplication

Sparse Matrix-Vector Multiplication (SpMV) Algorithm

Description of This Implementation

Tiled and Packetized Sparse Matrix Representation

Header Structure

Tiled and Packetized Sparse Matrix Design Considerations

Optional Team Information

Tested Hardware Devices and Results

Additional Areas of Optimization


Appendix: Summary of OpenCL 1.1

The OpenCL Platform Layer

The OpenCL Runtime

Buffer Objects

Program Objects

Kernel and Event Objects

Supported Data Types

Vector Component Addressing

Preprocessor Directives and Macros

Specify Type Attributes

Math Constants

Work-Item Built-In Functions

Integer Built-In Functions

Common Built-In Functions

Math Built-In Functions

Geometric Built-In Functions

Relational Built-In Functions

Vector Data Load/Store Functions

Atomic Functions

Async Copies and Prefetch Functions

Synchronization, Explicit Memory Fence

Miscellaneous Vector Built-In Functions

Image Read and Write Built-In Functions

Image Objects

Image Formats

Access Qualifiers

Sampler Objects

Sampler Declaration Fields

OpenCL Device Architecture Diagram

OpenCL/OpenGL Sharing APIs

OpenCL/Direct3D 10 Sharing APIs


Index

Customer Reviews

Most Helpful Customer Reviews


OpenCL Programming Guide 4.5 out of 5 based on 0 ratings. 2 reviews.
Liad_Weinberger More than 1 year ago
OpenCL Programming Guide is the second book (to my knowledge) to be published on the new and exciting standard from the Khronos Group: OpenCL. The goal of this book is to provide the reader with an extensive walkthrough of the standard, with explanations that complement the standard's specs. The authors of the book deem it "a pragmatic guide for people interested in writing code", and that it is. The book is in its first edition, and it shows. Throughout the book there are typos and what can only be explained as copy-and-paste mistakes. Some of the code samples contain generic errors such as memory leaks or incorrect remarks, and some of the figures simply do not convey the intended concept or are erroneous. The majority of the errata I personally reported dealt with these types of errors, which are arguably acceptable (for a first edition) as they do not concern the focus of the book. However, the book also contains some errata that do touch the actual focus, like an incorrect explanation (e.g. reported issue #14 on pg. 132, and reported issue #4 on pg. 65), or incorrect usage of returned information (e.g. reported issue #8 on pg. 88). On the other hand, the book does provide good insight into a vast portion of the standard. Although it claims to cover the entire spec, the level of this coverage is inconsistent and in some aspects completely lacking (e.g. the explanation of clEnqueueTask() could have been accompanied by a concise example, but instead ended up as a short subsection). On the portions of most interest, i.e., OpenCL's support for data-parallel algorithms, the book does provide extended information and adds to the OpenCL specs by clarifying the concepts. The second part of the book, which was added rather close to the final release of the book (from the eyes of a SafariBooksOnline RoughCuts reader), provides 9 case studies of OpenCL usage. Some of these are purely pedagogic (e.g. chapter 15), but some provide more real-world examples of how OpenCL can be used and optimized (especially for a GPU). These add another dimension to the book and contribute to its relevance. On a closing note, I do think that the book is worthwhile. It is currently the best option, besides reading the specs, for learning the OpenCL APIs and the OpenCL C programming language, and despite the shortcomings I've mentioned, it does manage to provide the gist of OpenCL and add insight into the standard.
Proper disclosure: OpenCL and GPU programming is what I do for a living.
Anonymous More than 1 year ago
Chloe sat up, getting her things together so she could leave.