Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4



Expert guidance for those programming today’s dual-core processor PCs

As PC processors grow from one or two cores to eight, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.

  • Teaches programmers professional-level, task-based, parallel programming with C#, .NET 4, and Visual Studio 2010
  • Covers concurrent collections, coordination data structures, PLINQ, thread pools, the asynchronous programming model, Visual Studio 2010 debugging, and parallel testing and tuning
  • Explores vectorization, SIMD instructions, and additional parallel libraries

Master the tools and technology you need to develop thread-safe concurrent applications for multi-core systems, with Professional Parallel Programming with C#.


Product Details

  • ISBN-13: 9780470495995
  • Publisher: Wiley
  • Publication date: 12/28/2010
  • Edition number: 1
  • Pages: 576
  • Sales rank: 865,605
  • Product dimensions: 7.40 (w) x 9.20 (h) x 1.10 (d)

Meet the Author

Gastón C. Hillar is an independent software consultant who has been researching parallel programming, multiprocessor, and multicore since 1997. He has years of experience designing and developing diverse types of complex parallelized solutions that take advantage of multiple processing cores with C# and .NET Framework.


Read an Excerpt

Professional Parallel Programming with C#

Master Parallel Extensions with .NET 4
By Gastón Hillar

John Wiley & Sons

Copyright © 2011 John Wiley & Sons, Ltd
All rights reserved.

ISBN: 978-0-470-49599-5

Chapter One

Task-Based Programming


* Working with shared-memory multicore

* Understanding the differences between shared-memory multicore and distributed-memory systems

* Working with parallel programming and multicore programming in shared-memory architectures

* Understanding hardware threads and software threads

* Understanding Amdahl's Law

* Considering Gustafson's Law

* Working with lightweight concurrency models

* Creating successful task-based designs

* Understanding the differences between interleaved concurrency, concurrency, and parallelism

* Parallelizing tasks and minimizing critical sections

* Understanding rules for parallel programming for multicore architectures

* Preparing for NUMA architectures

This chapter introduces the new task-based programming model, which allows you to introduce parallelism in applications. Parallelism is essential to exploit modern shared-memory multicore architectures. The chapter describes the new lightweight concurrency models and important concepts related to concurrency and parallelism. It includes the background information you need for the next 10 chapters.


In 2005, Herb Sutter published an article in Dr. Dobb's Journal titled "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" ( concurrency-ddj.htm). He discussed the need to design software with concurrency in mind in order to keep exploiting the continuing exponential gains in microprocessor throughput. Microprocessor manufacturers are adding processing cores instead of increasing clock frequency. Software developers can no longer rely on the free-lunch performance gains that increases in clock frequency used to provide.

Most machines today have at least a dual-core microprocessor. However, quad-core and octal-core microprocessors, with four and eight cores, respectively, are quite popular on servers, advanced workstations, and even on high-end mobile computers. More cores in a single microprocessor are right around the corner. Modern microprocessors offer new multicore architectures. Thus, it is very important to prepare the software designs and the code to exploit these architectures. The different kinds of applications generated with Visual C# 2010 and .NET Framework 4 run on one or many central processing units (CPUs), the main microprocessors. Each of these microprocessors can have a different number of cores, capable of executing instructions.

You can think of a multicore microprocessor as many interconnected microprocessors in a single package. All the cores have access to the main memory, as illustrated in Figure 1-1. Thus, this architecture is known as shared-memory multicore. Sharing memory in this way can easily lead to a performance bottleneck.

Multicore microprocessors have many different complex micro-architectures, designed to offer more parallel-execution capabilities, improve overall throughput, and reduce potential bottlenecks. At the same time, multicore microprocessors try to reduce power consumption and generate less heat. Therefore, many modern microprocessors can increase or reduce the frequency of each core according to its workload, and they can even put cores to sleep when they are not in use. Windows 7 and Windows Server 2008 R2 support a new feature called Core Parking. When many cores aren't in use and this feature is active, these operating systems put the unused cores to sleep. When these cores are necessary, the operating systems wake the sleeping cores.

Modern microprocessors work with dynamic frequencies for each of their cores. Because the cores don't work with a fixed frequency, it is difficult to predict the performance for a sequence of instructions. For example, Intel Turbo Boost Technology increases the frequency of the active cores. The process of increasing the frequency for a core is also known as overclocking.

If a single core is under a heavy workload, this technology will allow it to run at higher frequencies when the other cores are idle. If many cores are under heavy workloads, they will run at higher frequencies but not as high as the one achieved by the single core. The microprocessor cannot keep all the cores overclocked for a long time, because it consumes more power and its temperature increases faster. The average clock frequency for all the cores under heavy workloads is going to be lower than the one achieved for the single core. Therefore, under certain situations, some code can run at higher frequencies than other code, which can make measuring real performance gains a challenge.

Differences Between Shared-Memory Multicore and Distributed-Memory Systems

Distributed-memory computer systems are composed of many microprocessors with their own private memory, as illustrated in Figure 1-2. Each microprocessor can be in a different computer, with different types of communication channels between them. Examples of communication channels are wired and wireless networks. If a job running in one of the microprocessors requires remote data, it has to communicate with the corresponding remote microprocessor through the communication channel. One of the most popular communications protocols used to program parallel applications to run on distributed-memory computer systems is Message Passing Interface (MPI). It is possible to use MPI to take advantage of shared-memory multicore with C# and .NET Framework. However, MPI's main focus is to help develop applications that run on clusters. Thus, it adds a large overhead that isn't necessary in shared-memory multicore, where all the cores can access the memory without the need to send messages.

Figure 1-3 shows a distributed-memory computer system with three machines. Each machine has a quad-core microprocessor, and a shared-memory architecture for these cores. This way, the private memory for each microprocessor acts as a shared memory for its four cores.

A distributed-memory system forces you to think about the distribution of the data, because each message to retrieve remote data can introduce an important latency. Because you can add new machines (nodes) to increase the number of microprocessors for the system, distributed-memory systems can offer great scalability.

Parallel Programming and Multicore Programming

Traditional sequential code, where instructions run one after the other, doesn't take advantage of multiple cores because the serial instructions run on only one of the available cores. Sequential code written with Visual C# 2010 won't take advantage of multiple cores if it doesn't use the new features offered by .NET Framework 4 to split the work across many cores. There is no automatic parallelization of existing sequential code.
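To make the contrast concrete, here is a minimal sketch that runs the same work first as a plain sequential loop and then with Parallel.For, one of the .NET Framework 4 features for splitting work across cores. The class name, method names, and array size are illustrative assumptions and do not come from the book.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class; names are assumptions made for illustration.
public static class SquaresDemo
{
    // Sequential version: one software thread walks the whole array.
    public static void SquareSequential(double[] input, double[] output)
    {
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] * input[i];
        }
    }

    // Parallel version: Parallel.For partitions the index range and runs
    // the iterations on multiple worker threads. Each iteration writes to
    // a different array element, so no locking is needed.
    public static void SquareParallel(double[] input, double[] output)
    {
        Parallel.For(0, input.Length, i =>
        {
            output[i] = input[i] * input[i];
        });
    }

    public static void Main()
    {
        var input = new double[1000000];
        for (int i = 0; i < input.Length; i++)
        {
            input[i] = i;
        }

        var sequentialResult = new double[input.Length];
        var parallelResult = new double[input.Length];

        SquareSequential(input, sequentialResult);
        SquareParallel(input, parallelResult);

        // Both versions must produce the same values.
        bool same = true;
        for (int i = 0; i < input.Length; i++)
        {
            if (sequentialResult[i] != parallelResult[i])
            {
                same = false;
                break;
            }
        }
        Console.WriteLine("Results match: " + same);
    }
}
```

The loop body here is deliberately trivial. With so little work per iteration, the parallel version may not be faster than the sequential one, so treat this as a correctness sketch and not as a benchmark.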

Parallel programming is a form of programming in which the code takes advantage of the parallel execution possibilities offered by the underlying hardware. Parallel programming runs many instructions at the same time. As previously explained, there are many different kinds of parallel architectures, and their detailed analysis would require a complete book dedicated to the topic.

Multicore programming is a form of programming in which the code takes advantage of the multiple execution cores to run many instructions in parallel. Multicore and multiprocessor computers offer more than one processing core in a single machine. Hence, the goal is to do more in less time by distributing the work among the available cores.

Modern microprocessors can execute the same instruction on multiple data, something classified by Michael J. Flynn in his proposed Flynn's taxonomy in 1966 as Single Instruction, Multiple Data (SIMD). This way, you can take advantage of these vector processors to reduce the time needed to execute certain algorithms.
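As an illustration of the SIMD idea, the following sketch adds two arrays several elements at a time. It assumes a runtime that provides System.Numerics.Vector<T>, which is not part of .NET Framework 4 itself, so it is not the approach the book uses. The class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical demo class. Assumes System.Numerics.Vector<T> is available
// (newer .NET runtimes); this type is NOT included in .NET Framework 4.
public static class VectorAddDemo
{
    // Adds a and b element by element into result.
    // All three arrays are assumed to have the same length.
    public static void Add(float[] a, float[] b, float[] result)
    {
        int width = Vector<float>.Count; // elements processed per vector operation
        int i = 0;

        // Vectorized part: one addition handles 'width' elements.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail: handles the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }

    public static void Main()
    {
        float[] a = { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f };
        float[] b = { 10f, 20f, 30f, 40f, 50f, 60f, 70f, 80f, 90f, 100f };
        var result = new float[a.Length];

        Add(a, b, result);

        Console.WriteLine("Hardware accelerated: " + Vector.IsHardwareAccelerated);
        Console.WriteLine(string.Join(", ", result));
    }
}
```

Whether the vector operations map to real SIMD instructions depends on the hardware and the JIT compiler, which is why the sketch prints Vector.IsHardwareAccelerated before the result.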

This book covers two areas of parallel programming in great detail: shared-memory multicore programming and the usage of vector-processing capabilities. The overall goal is to reduce the execution time of the algorithms. The additional processing power enables you to add new features to existing software, as well.


A multicore microprocessor has more than one physical core — real independent processing units that make it possible to run instructions at the same time, in parallel. In order to take advantage of multiple physical cores, it is necessary to run many processes or to run more than one thread in a single process, creating multithreaded code.

However, each physical core can offer more than one hardware thread, also known as a logical core or logical processor. Microprocessors with Intel Hyper-Threading Technology (HT or HTT) offer many architectural states per physical core. For example, many microprocessors with four physical cores with HT duplicate the architectural states per physical core and offer eight hardware threads. This technique is known as simultaneous multithreading (SMT) and it uses the additional architectural states to optimize and increase the parallel execution at the microprocessor's instruction level. SMT isn't restricted to just two hardware threads per physical core; for example, you could have four hardware threads per core. This doesn't mean that each hardware thread represents a physical core. SMT can offer performance improvements for multithreaded code under certain scenarios. Subsequent chapters provide several examples of these performance improvements.

Each running program in Windows is a process. Each process creates and runs one or more threads, known as software threads to differentiate them from the previously explained hardware threads. A process has at least one thread, the main thread. An operating system scheduler shares out the available processing resources fairly between all the processes and threads it has to run. Windows scheduler assigns processing time to each software thread. When Windows scheduler runs on a multicore microprocessor, it has to assign time from a hardware thread, supported by a physical core, to each software thread that needs to run instructions. As an analogy, you can think of each hardware thread as a swim lane and a software thread as a swimmer.

Each software thread shares the private memory space of its parent process. However, it has its own stack, registers, and private local storage.
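The difference between memory shared by all threads in a process and storage private to each thread can be sketched as follows. The example uses the [ThreadStatic] attribute for the per-thread storage; the class name, field names, and counts are assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; names are assumptions made for illustration.
public static class ThreadMemoryDemo
{
    // Lives in the process memory space: every thread sees the same variable.
    private static int sharedCounter;

    // Thread-local storage: each thread gets its own copy, starting at 0.
    [ThreadStatic]
    private static int perThreadCounter;

    public static int SharedCounter
    {
        get { return sharedCounter; }
    }

    // Starts threadCount threads. Each one increments both counters and
    // reports the final value of its private counter.
    public static int[] Run(int threadCount, int incrementsPerThread)
    {
        sharedCounter = 0;
        var perThreadResults = new int[threadCount];
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int index = t; // copy the loop variable for the lambda
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                {
                    // Shared data needs an atomic update.
                    Interlocked.Increment(ref sharedCounter);
                    // Private data needs no synchronization.
                    perThreadCounter++;
                }
                perThreadResults[index] = perThreadCounter;
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return perThreadResults;
    }

    public static void Main()
    {
        int[] results = Run(4, 1000);
        Console.WriteLine("Shared counter: " + SharedCounter);
        Console.WriteLine("Per-thread counters: " + string.Join(", ", results));
    }
}
```

With four threads and 1,000 increments each, the shared counter ends at 4,000 while every per-thread counter ends at 1,000, because each thread only ever touched its own copy.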

Windows recognizes each hardware thread as a schedulable logical processor. Each logical processor can run code for a software thread. A process that runs code in multiple software threads can take advantage of hardware threads and physical cores to run instructions in parallel. Figure 1-4 shows software threads running on hardware threads and on physical cores. The Windows scheduler can decide to reassign one software thread to another hardware thread to load-balance the work done by each hardware thread. Because there are usually many other software threads waiting for processing time, load balancing makes it possible for these other threads to run their instructions by organizing the available resources. Figure 1-5 shows Windows Task Manager displaying eight hardware threads (logical cores) and their workloads.

Load balancing refers to the practice of distributing work from software threads among hardware threads so that the workload is fairly shared across all the hardware threads. However, achieving perfect load balance depends on the parallelism within the application, the workload, the number of software threads, the available hardware threads, and the load-balancing policy.

Windows Task Manager and Windows Resource Monitor show CPU usage history graphs for hardware threads. For example, if you have a microprocessor with four physical cores and eight hardware threads, these tools will display eight independent graphs.

Windows runs hundreds of software threads by assigning chunks of processing time to each available hardware thread. You can use Windows Resource Monitor to view the number of software threads for a specific process in the Overview tab. The CPU panel displays the image name for each process and the number of associated software threads in the Threads column, as shown in Figure 1-6 where the vlc.exe process has 32 software threads.
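The same thread count can also be read from code. This minimal sketch uses the System.Diagnostics.Process class to count the software threads of the current process; the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo class; names are assumptions made for illustration.
public static class ThreadCountDemo
{
    // Returns the number of operating system threads that currently
    // belong to this process.
    public static int CountThreadsInCurrentProcess()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.Threads.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(
            "Software threads in this process: " + CountThreadsInCurrentProcess());
    }
}
```

The number usually exceeds one even for a simple console application, because the runtime creates its own helper threads in addition to the main thread.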

Core Parking is a Windows kernel power manager and kernel scheduler technology designed to improve the energy efficiency of multicore systems. It constantly tracks the workload of every hardware thread relative to all the others and can decide to put some of them into sleep mode.

Core Parking dynamically scales the number of hardware threads that are in use based on workload. When the workload for one of the hardware threads is lower than a certain threshold value, the Core Parking algorithm will try to reduce the number of hardware threads that are in use by parking some of the hardware threads in the system. In order to make this algorithm efficient, the kernel scheduler gives preference to unparked hardware threads when it schedules software threads. The kernel scheduler will try to let the parked hardware threads become idle, and this will allow them to transition into a lower-power idle state.

Core Parking tries to intelligently schedule work between threads that are running on multiple hardware threads in the same physical core on systems with microprocessors that include HT. This scheduling decision decreases power consumption.

Windows Server 2008 R2 supports the complete Core Parking technology. However, Windows 7 also uses the Core Parking algorithm and infrastructure to balance processor performance between hardware threads with microprocessors that include HT. Figure 1-7 shows Windows Resource Monitor displaying the activity of eight hardware threads, with four of them parked.

Regardless of the number of parked hardware threads, the number of hardware threads returned by .NET Framework 4 functions will be the total number, not just the unparked ones. Core Parking technology doesn't limit the number of hardware threads available to run software threads in a process.
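One such function is Environment.ProcessorCount, which reports the number of logical processors available to the process. The sketch below reads it and uses it as the maximum degree of parallelism for a parallel loop. The class and method names are assumptions, and capping parallelism at this value is only one possible choice.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class; names are assumptions made for illustration.
public static class LogicalProcessorDemo
{
    // Builds loop options that allow at most one concurrent operation per
    // logical processor (hardware thread) reported by the runtime.
    public static ParallelOptions CreateOptions()
    {
        return new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
    }

    public static void Main()
    {
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);

        ParallelOptions options = CreateOptions();
        Parallel.For(0, 8, options, i =>
        {
            Console.WriteLine("Iteration " + i);
        });
    }
}
```

The iterations may print in any order, because the loop runs them on different software threads.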

A system with eight hardware threads can turn itself into a system with two hardware threads when it is under a light workload, and then spin up the reserve hardware threads as needed. In some cases, Core Parking can introduce additional latency when scheduling many software threads that try to run code in parallel. Therefore, it is very important to consider the resulting latency when measuring parallel performance.
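One common way to reduce the influence of start-up latency on measurements is to run the code a few times before timing it and then to time several runs instead of one. The following sketch shows that idea with Stopwatch. The helper name, the number of runs, and the sample workload are assumptions, and warm-up runs are not guaranteed to wake parked hardware threads.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical timing helper; names and run counts are assumptions.
public static class TimingDemo
{
    // Runs 'work' warmUpRuns times without timing it, then returns the
    // elapsed time of each of the following measuredRuns executions.
    public static TimeSpan[] Measure(Action work, int warmUpRuns, int measuredRuns)
    {
        for (int i = 0; i < warmUpRuns; i++)
        {
            work();
        }

        var results = new TimeSpan[measuredRuns];
        for (int i = 0; i < measuredRuns; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            work();
            stopwatch.Stop();
            results[i] = stopwatch.Elapsed;
        }
        return results;
    }

    public static void Main()
    {
        var data = new double[2000000];

        // Sample parallel workload: fill the array with square roots.
        Action work = () => Parallel.For(0, data.Length, i =>
        {
            data[i] = Math.Sqrt(i);
        });

        TimeSpan[] timings = Measure(work, 2, 5);
        foreach (TimeSpan timing in timings)
        {
            Console.WriteLine(timing.TotalMilliseconds + " ms");
        }
    }
}
```

Looking at the spread of the individual timings, and not only at a single value, makes it easier to spot runs that were affected by scheduling latency or frequency changes.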


Excerpted from Professional Parallel Programming with C# by Gastón Hillar Copyright © 2011 by John Wiley & Sons, Ltd. Excerpted by permission of John Wiley & Sons. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Customer Reviews

  • Posted September 13, 2011

    Not for the Faint of Heart

    This is a great book, but it is not for the faint of heart. It's a high-level programming book geared towards teaching programmers how to best manage parallel programming techniques. I've dabbled a bit in background processes, but that is nothing compared to what's discussed in this book. And just reading the examples is not enough. Putting these concepts into your own code is where the understanding is going to come in and the mythical light bulb is going to suddenly turn on for you. If you have already started working with Parallel Programming, this book will increase your skills and help you master the subject!

  • Posted April 6, 2011

    Great coverage

    I wasn't sure what to think about this book when I got it, but as soon as I started reading it I knew that it was going to be a great reference.

    The author starts by explaining that parallel programming is not going to solve every performance problem. In fact, it won't solve most of them. The book attempts to clearly explain how to determine if/when parallel programming is going to be the right solution. The author provides a lot of data to explain what type of gains you can expect (or not). In fact, the author wanted to make sure this point was so clearly understood that it was almost annoying.

    The book starts by going over the TPL, PLINQ, Exception handling in parallel code and parallel friendly collections. Later on you get coverage of the Visual Studio parallel debugging tools and a look at how thread pooling works in .NET 4.

    Overall, this book does a great job of explaining parallel theories and how the TPL works, and you can get up and running with just the first 4-5 chapters, but you get so much more advanced information later in the book. It's really worth keeping around.
