Table of Contents
Figures xv
 Tables xxi
 Listings xxv
 Foreword xxix
 Preface xxxiii
 Acknowledgments xli
 About the Authors xliii
  
 Part I: The OpenCL 1.1 Language and API 1
  
 Chapter 1: An Introduction to OpenCL 3
 What Is OpenCL, or . . . Why You Need This Book 3
 Our Many-Core Future: Heterogeneous Platforms 4
 Software in a Many-Core World 7
 Conceptual Foundations of OpenCL 11
 OpenCL and Graphics 29
 The Contents of OpenCL 30
 The Embedded Profile 35
 Learning OpenCL 36
  
 Chapter 2: HelloWorld: An OpenCL Example 39
 Building the Examples 40
 HelloWorld Example 45
 Checking for Errors in OpenCL 57
  
 Chapter 3: Platforms, Contexts, and Devices 63
 OpenCL Platforms 63
 OpenCL Devices 68
 OpenCL Contexts 83
  
 Chapter 4: Programming with OpenCL C 97
 Writing a Data-Parallel Kernel Using OpenCL C 97
 Scalar Data Types 99
 Vector Data Types 102
 Other Data Types 108
 Derived Types 109
 Implicit Type Conversions 110
 Explicit Casts 116
 Explicit Conversions 117
 Reinterpreting Data as Another Type 121
 Vector Operators 123
 Qualifiers 133
 Keywords 141
 Preprocessor Directives and Macros 141
 Restrictions 146
  
 Chapter 5: OpenCL C Built-In Functions 149
 Work-Item Functions 150
 Math Functions 153
 Integer Functions 168
 Common Functions 172
 Geometric Functions 175
 Relational Functions 175
 Vector Data Load and Store Functions 181
 Synchronization Functions 190
 Async Copy and Prefetch Functions 191
 Atomic Functions 195
 Miscellaneous Vector Functions 199
 Image Read and Write Functions 201
  
 Chapter 6: Programs and Kernels 217
 Program and Kernel Object Overview 217
 Program Objects 218
 Kernel Objects 237
  
 Chapter 7: Buffers and Sub-Buffers 247
 Memory Objects, Buffers, and Sub-Buffers Overview 247
 Creating Buffers and Sub-Buffers 249
 Querying Buffers and Sub-Buffers 257
 Reading, Writing, and Copying Buffers and Sub-Buffers 259
 Mapping Buffers and Sub-Buffers 276
  
 Chapter 8: Images and Samplers 281
 Image and Sampler Object Overview 281
 Creating Image Objects 283
 Creating Sampler Objects 292
 OpenCL C Functions for Working with Images 295
 Transferring Image Objects 299
  
 Chapter 9: Events 309
 Commands, Queues, and Events Overview 309
 Events and Command-Queues 311
 Event Objects 317
 Generating Events on the Host 321
 Events Impacting Execution on the Host 322
 Using Events for Profiling 327
 Events Inside Kernels 332
 Events from Outside OpenCL 333
  
 Chapter 10: Interoperability with OpenGL 335
 OpenCL/OpenGL Sharing Overview 335
 Querying for the OpenGL Sharing Extension 336
 Initializing an OpenCL Context for OpenGL Interoperability 338
 Creating OpenCL Buffers from OpenGL Buffers 339
 Creating OpenCL Image Objects from OpenGL Textures 344
 Querying Information about OpenGL Objects 347
 Synchronization between OpenGL and OpenCL 348
  
 Chapter 11: Interoperability with Direct3D 353
 Direct3D/OpenCL Sharing Overview 353
 Initializing an OpenCL Context for Direct3D Interoperability 354
 Creating OpenCL Memory Objects from Direct3D Buffers and Textures 357
 Acquiring and Releasing Direct3D Objects in OpenCL 361
 Processing a Direct3D Texture in OpenCL 363
 Processing D3D Vertex Data in OpenCL 366
  
 Chapter 12: C++ Wrapper API 369
 C++ Wrapper API Overview 369
 C++ Wrapper API Exceptions 371
 Vector Add Example Using the C++ Wrapper API 374
  
 Chapter 13: OpenCL Embedded Profile 383
 OpenCL Profile Overview 383
 64-Bit Integers 385
 Images 386
 Built-In Atomic Functions 387
 Mandated Minimum Single-Precision Floating-Point Capabilities 387
 Determining the Profile Supported by a Device in an OpenCL C Program 390
  
 Part II: OpenCL 1.1 Case Studies 391
  
 Chapter 14: Image Histogram 393
 Computing an Image Histogram 393
 Parallelizing the Image Histogram 395
 Additional Optimizations to the Parallel Image Histogram 400
 Computing Histograms with Half-Float or Float Values for Each Channel 403
  
 Chapter 15: Sobel Edge Detection Filter 407
 What Is a Sobel Edge Detection Filter? 407
 Implementing the Sobel Filter as an OpenCL Kernel 407
  
 Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm 411
 Graph Data Structures 412
 Kernels 414
 Leveraging Multiple Compute Devices 417
  
 Chapter 17: Cloth Simulation in the Bullet Physics SDK 425
 An Introduction to Cloth Simulation 425
 Simulating the Soft Body 429
 Executing the Simulation on the CPU 431
 Changes Necessary for Basic GPU Execution 432
 Two-Layered Batching 438
 Optimizing for SIMD Computation and Local Memory 441
 Adding OpenGL Interoperation 446
  
 Chapter 18: Simulating the Ocean with Fast Fourier Transform 449
 An Overview of the Ocean Application 450
 Phillips Spectrum Generation 453
 An OpenCL Discrete Fourier Transform 457
 A Closer Look at the FFT Kernel 463
 A Closer Look at the Transpose Kernel 467
  
 Chapter 19: Optical Flow 469
 Optical Flow Problem Overview 469
 Sub-Pixel Accuracy with Hardware Linear Interpolation 480
 Application of the Texture Cache 480
 Using Local Memory 481
 Early Exit and Hardware Scheduling 483
 Efficient Visualization with OpenGL Interop 483
 Performance 484
  
 Chapter 20: Using OpenCL with PyOpenCL 487
 Introducing PyOpenCL 487
 Running the PyImageFilter2D Example 488
 PyImageFilter2D Code 488
 Context and Command-Queue Creation 492
 Loading to an Image Object 493
 Creating and Building a Program 494
 Setting Kernel Arguments and Executing a Kernel 495
 Reading the Results 496
  
 Chapter 21: Matrix Multiplication with OpenCL 499
 The Basic Matrix Multiplication Algorithm 499
 A Direct Translation into OpenCL 501
 Increasing the Amount of Work per Kernel 506
 Optimizing Memory Movement: Local Memory 509
 Performance Results and Optimizing the Original CPU Code 511
  
 Chapter 22: Sparse Matrix-Vector Multiplication 515
 Sparse Matrix-Vector Multiplication (SpMV) Algorithm 515
 Description of This Implementation 518
 Tiled and Packetized Sparse Matrix Representation 519
 Header Structure 522
 Tiled and Packetized Sparse Matrix Design Considerations 523
 Optional Team Information 524
 Tested Hardware Devices and Results 524
 Additional Areas of Optimization 538
  
 Appendix: Summary of OpenCL 1.1 541
 The OpenCL Platform Layer 541
 The OpenCL Runtime 543
 Buffer Objects 544
 Program Objects 546
 Kernel and Event Objects 547
 Supported Data Types 550
 Vector Component Addressing 552
 Preprocessor Directives and Macros 555
 Specify Type Attributes 555
 Math Constants 556
 Work-Item Built-In Functions 557
 Integer Built-In Functions 557
 Common Built-In Functions 559
 Math Built-In Functions 560
 Geometric Built-In Functions 563
 Relational Built-In Functions 564
 Vector Data Load/Store Functions 567
 Atomic Functions 568
 Async Copies and Prefetch Functions 570
 Synchronization, Explicit Memory Fence 570
 Miscellaneous Vector Built-In Functions 571
 Image Read and Write Built-In Functions 572
 Image Objects 573
 Image Formats 576
 Access Qualifiers 576
 Sampler Objects 576
 Sampler Declaration Fields 577
 OpenCL Device Architecture Diagram 577
 OpenCL/OpenGL Sharing APIs 577
 OpenCL/Direct3D 10 Sharing APIs 579
  
 Index 581