Parallel Programming with Microsoft .Net : Parallel Aggregation - Design Notes

2/17/2011 9:25:40 AM
If you compare the sequential and parallel versions of the aggregation pattern, you see that the design of the parallel version includes an additional step in the algorithm that merges partial results. Figure 1, which illustrates the Parallel.ForEach and Parallel.For methods, shows this.
Figure 1. Aggregation using Parallel For and ForEach

Figure 1 shows that instead of placing the accumulation in a single, shared result, the parallel loop uses unshared local storage for partial results (these are named subtotals in Figure 1). Each task processes a single partition of the input values. The number of partitions depends on the degree of parallelism that’s needed to efficiently use the computer’s available cores. After a task finishes accumulating the values in its assigned partition, it merges its local result into the final, global result. The final result is shared by all tasks. Locks are required to ensure that updates are consistent.

The reason that this approach is fast is because there are very few locks. In normal cases, the number of elements to be processed is many times larger than the number of tasks and partitions. The cost of serializing locks can be amortized over many individual accumulation operations.

The algorithm used by PLINQ differs slightly from what’s shown in Figure 1. Aggregation in PLINQ does not require the developer to use locks. Instead, the final merge step is expressed as a binary operator that combines any two partial result values (that is, two of the subtotals) into another partial result. Repeating this process on subsequent pairs of partial results eventually converges on a final result. One of the advantages of the PLINQ approach is that it requires less synchronization, so it’s more scalable. The binary combinations do not require the system to lock the final result; they can be performed in parallel. (This is possible because the binary combination operation is associative, so the order of these operations doesn’t affect the result.)

This discussion shows that the parallel aggregation pattern is a good example of why changes to your algorithm are often needed when moving from a sequential to parallel approach.

To make this point clear, here’s an example of what parallel aggregation would look like if you simply added locks to the existing sequential algorithm. You just need to convert sequential for to Parallel.For and add one lock statement.


You can’t just add locks and expect to get good performance. You also need to think about the algorithm.

// Do not copy this code. This version will run much slower
// than the sequential version. It's included here to
// illustrate what not to do.
double[] sequence = ...
object lockObject = new object();
double sum = 0.0d;
// BUG - don't do this
Parallel.For(0, sequence.Length, i =>
// BUG - don't do this
lock (lockObject)
sum += sequence[i];
return sum;

If you forget to add the lock statement, this code fails to calculate the correct sum on a multicore computer. Adding the lock statement makes this code example correct with respect to serialization. If you run this code, it produces the expected sum. However, it fails completely as an optimization. This code is many times slower than the sequential version it attempted to optimize! The reason for the poor performance is the cost of synchronization.

In contrast, the examples of the parallel aggregation pattern that you have seen elsewhere in this article will run much faster on multicore computers than the sequential equivalent, and their performance also improves in approximate proportion to the number of cores.

It might at first seem counterintuitive that adding additional steps can make an algorithm go faster, but it’s true. If you introduce extra work and that work has the effect of preventing data dependencies between parallel tasks, you often benefit in terms of performance.

  •  Parallel Programming with Microsoft .Net : Parallel Aggregation - Variations
  •  Leveraging and Optimizing Search in SharePoint 2010 : Uninstalling FAST Search Server 2010 for SharePoint
  •  Leveraging and Optimizing Search in SharePoint 2010 : Customizing the FAST Search User Interface
  •  Deploying the Client for Microsoft Exchange Server 2010 : Planning Considerations and Best Practices
  •  Deploying the Client for Microsoft Exchange Server 2010 : Understanding Deployment Options
  •  Deploying the Client for Microsoft Exchange Server 2010 : Outlook 2007 Auto Account Setup
  •  Leveraging and Optimizing Search in SharePoint 2010 : Deploying FAST Search Service Applications
  •  Leveraging and Optimizing Search in SharePoint 2010 : Customizing the Search User Interface
  •  Leveraging and Optimizing Search in SharePoint 2010 : Keywords and Best Bets
  •  Leveraging and Optimizing Search in SharePoint 2010 : Federating Search
  •  Leveraging and Optimizing Search in SharePoint 2010 : Search Scopes
  •  Active Directory Domain Services 2008 : View Cached Credentials on a Read-Only Domain Controller
  •  Active Directory Domain Services 2008 : Remove a User, Group, or Computer from the Password Replication Policy
  •  Active Directory Domain Services 2008 : Add a User, Group, or Computer to the Password Replication Policy
  •  Exchange Server 2010 : Backing Up Specific Windows Services
  •  Create Bookmark Create Note or Tag Backing Up Windows Server 2008 and Exchange Server 2010
  •  What to Back Up on Exchange Servers 2010
  •  Leveraging and Optimizing Search in SharePoint 2010 : Define Content Sources
  •  Deploying a Native SharePoint 2010 Search Service Application
  •  Backing Up the Exchange Server 2010 Environment : Roles and Responsibilities & Developing a Backup Strategy
    Most View
    Turn Your Smartphone Into A Safe
    HTC Desire SV Review – A Shining Mid-Range Dual-SIM Smartphone
    Plantronics Marque 2 M165 - Bluetooth Headset
    Sony KDL-40HX853 - Superb DVD Upscaling
    Dell Inspiron 17R Special Edition - Battery Is A Disappointment
    G-360 And G-550 Power Supply Devices Review (Part 3)
    Photo Editors: From Professional RAW Tools To Simple Library Management (Part 3)
    Introduction to Windows 8 Administration : Windows 8 Architecture
    Six Ways To Pump Up The Volume
    iOS 6 Maps - What Went Wrong?
    Top 10
    ASUS Tytan – Tytanfall (Part 2)
    ASUS Tytan – Tytanfall (Part 1)
    Corsair Obsidian Series 350D Case
    Corsair Vengeance K70 Mechanical Gaming Keyboard
    CPU Buyer’s Guide - Hunting For Brains (Part 2)
    CPU Buyer’s Guide - Hunting For Brains (Part 1) : AMD A10-6800K, AMD FX-8350, AMD’s new FX-9590
    Crucial Ballistix Sport 16GB Kit - High-Performance Memory
    Epson WorkForce M1 O5 Printer - No Color Here
    EVGA Z77 Stinger - A New Mini-ITX Motherboard
    Gigabyte GeForce GTX Titan Graphics Card