Parallel Programming with Microsoft .Net : Pipelines - An Example

10/12/2010 9:28:10 AM
The online samples include an application named ImagePipeline. This application takes a directory of JPEG images and generates thumbnail versions, which are also post-processed with several image-enhancing filters. The resulting processed images are displayed as a slideshow. Each image in the slideshow is displayed in alphabetical file name order.
Note: You can’t use a parallel loop for this example because the application has a requirement that images must be processed in sequence. Parallel loops do not guarantee any particular processing order.

1. Sequential Image Processing

Each image is processed in four stages: the large color image is loaded from a file, a small thumbnail with a picture frame is generated from it, noise is added to the image to create a speckling effect, and then the processed image displays as the next picture in the slideshow. Figure 1 illustrates this sequence.

Figure 1. Sequential image processing

Here’s the code for the sequential version.

string sourceDir = ...
IEnumerable<string> fileNames = ...
int count = 0;
ImageInfo info = null;
foreach (var fileName in fileNames)
/// ...
info = LoadImage(fileName, sourceDir, count, ...);
// ...
DisplayImage(info, count + 1, displayFn, ...);
// ...
count += 1;
info = null;

The four steps are performed by the LoadImage, ScaleImage, FilterImage, and DisplayImage methods. This example is slightly abridged for clarity. The code that deals with cancellation, error handing (the disposal of handles to unmanaged objects), and the capture of performance measurements is omitted. You can refer to the online samples to see these details.

2. The Image Pipeline

The sequential loop can process only one image at a time; each image must complete all four stages before work can begin on the next image, and the stages themselves are sequentially linked. In fact, this example seems intractably sequential—the top-level loop has the restriction that images must be displayed in order, and within each step are substeps that require inputs from previous substeps. You can’t display an image until after the filter is applied to it. You can’t apply the filter until after the image is scaled to thumbnail size. You can’t do the scaling until after the original image loads.

However, the pipeline pattern can introduce parallelism into this example. Each image still passes through all four stages in sequence, but the stages can execute in parallel on different images. Figure 2 illustrates the image pipeline.

Figure 2. Image pipeline

The following code shows the parallel version.

IEnumerable<string> fileNames = ...
string sourceDir = ...
Action<ImageInfo> displayFn = ...
int limit = ...
var originalImages = new BlockingCollection<ImageInfo>(limit);
var thumbnailImages = new BlockingCollection<ImageInfo>(limit);
var filteredImages = new BlockingCollection<ImageInfo>(limit);
var f = new TaskFactory(TaskCreationOptions.LongRunning,
// ...

var loadTask = f.StartNew(() =>
LoadPipelinedImages(fileNames, sourceDir,
originalImages, ...));
var scaleTask = f.StartNew(() =>
thumbnailImages, ...));
var filterTask = f.StartNew(() =>
filteredImages, ...));
var displayTask = f.StartNew(() =>
displayFn, ...));
Task.WaitAll(loadTask, scaleTask, filterTask, displayTask);
// ... release handles to unmanaged resources ...

(Some details of error handling, cancellation, and the collection of performance data have been omitted here for clarity. Refer to the online sample for the complete implementation.)

There are three blocking collections that act as buffers between the stages of the pipeline. The four stages are the same as in the sequential version. A call to the task factory StartNew method launches each processing stage as a long-running task.

The code calls Task.WaitAll to defer cleanup until after all stages complete processing all images.

3. Performance Characteristics

To understand the performance characteristics of the sequential and pipelined versions, it’s useful to look at a scheduling diagram such as the one that Figure 3 illustrates.

Figure 3. Image pipeline with stages of equal speed

Figure 3 shows how the tasks in the image pipeline example execute over time. For example, the top row shows that stage 1 processes image 1 starting at time t0 and image 2 starting at time t1. Stage 2 begins processing image 1 at time t1. Assume for a moment that the pipeline is perfectly balanced; that is, each stage of the pipeline takes exactly the same amount of time to do its work. Call that duration T. Therefore, in the figure, t1 occurs after T units of time elapse, t2 after 2 x T units of time elapse, and so on.

If there are enough available cores to allow the pipeline’s tasks to run in parallel, the figure shows that the expected execution time for six images in a pipeline with four stages is approximately 9 x T. In contrast, the sequential version takes approximately 24 x T because each of the 24 steps must be processed one after another.

The average performance improves as more images are processed. The reason for this, as Figure 7-4 illustrates, is that some cores are idle as the pipeline fills during startup and drains during shutdown. With a large number of images, the startup and shutdown time becomes relatively insignificant. The average time per image would approach T.

Note: If there are enough available cores and if all stages of a pipeline take an equal amount of time, the execution time for the pipeline as a whole is the same as the time for just one stage.

However, there’s one catch: the assumption that all the pipeline steps take exactly the same amount of time isn’t always true. Figure 4 shows the scheduling diagram that occurs when the filter stage takes twice as long as the other stages.

Figure 4. Image pipeline with unequal stages

When one of the stages takes 2 x T units of time while the other stages take T units of time, you can see that it’s not possible to keep all of the cores completely busy. On average (with a large number of images), the time to process an images is 2 x T. In other words, when there are enough cores for each pipeline stage, the speed of a pipeline is approximately equal to the speed of its slowest stage.

Note: When the stages of a pipeline don’t take the same amount of time, the speed of a pipeline is approximately equal to the speed of its slowest stage.

If you run the ImagePipeline application in the Microsoft® Visual Studio® development system debugger, you can see this effect for yourself. The ImagePipeline sample has a user interface (UI) feature that reports the average length of time in milliseconds for each of the stages of the pipeline. It also reports the overall average length of time that’s needed to process each image. When you run the sample in sequential mode (by selecting the Sequential radio button), you’ll notice that the steady-state elapsed time per image equals the sum of all the stages. When you run in pipeline mode, the average elapsed time per image converges to approximately the same amount of time as slowest stage. The most efficient pipelines have stages of equal speed. You won’t always achieve this, but it’s a worthy goal.

Top 10
Windows 7 : The Zune PC Software (part 4) - Using Zune - Working with Videos, Organizing Pictures
Windows 7 : The Zune PC Software (part 3) - Using Zune - Rating Content, Working with Playlists
Windows 7 : The Zune PC Software (part 2) - Using Zune - The Zune User Experience, Enjoying Music
Windows 7 : The Zune PC Software (part 1) - Finding and Installing Zune, Configuring the Zune Software
Windows 7 : Microsoft Zune - A Digital Media Alternative - Why Zune?
Microsoft .NET : Design Principles and Patterns - Applying Requirements by Design (part 2) - Security
Microsoft .NET : Design Principles and Patterns - Applying Requirements by Design (part 1) - Testability
Silverlight Recipes : Controls - Applying Custom Templates to a DataGrid Cell
Silverlight Recipes : Controls - Displaying Row Details in a DataGrid
Experience Sennheiser HD700 Headphones
Most View
Buying Guide: CPU Cooling Equipment (Part 5) - Antec KUHLER H2O 620,Arctic Cooling Freezer i30,Cooler Master Hyper 612 PWM
iClone 5 Pro
Sharepoint 2010 : Creating Security Trimmed CRUD Operations on a SQL Database Using Visual Studio 2010
SQL Azure : Building Two OData Consumer Applications (part 2) - Windows Mobile 7 Application
Samsung Series 8 PS51D8000 3D Plasma TV
Summer games
Microsoft ASP.NET 3.5 : The HTTP Request Context - The global.asax File
The best music apps for your iOS Device (Part 1)
Kingston SSDNow V+ 200 - SSDs for The Budget Conscious
Windows 7 : Protecting Your Data from Loss and Theft - Encrypted File System (part 1) - Encrypting Offline Files, Using CIPHER
Tube amplifier for headphone review
Unifying: Greatest Challenge
The Best Apps and Gear of 2012 (Part 3)
Social Networking Tips & Tricks (May 2012)
MobileMe Gallery after iCloud
HP Anti-glare Compaq LCD
Dual-SIM Smartphone Galaxy Ace Duos – Battery Test
Sharepoint 2007: Personal Sites and Personal Details (Available Only in MOSS)
Advanced ASP.NET : Data-Access Components (part 2) - Using the Data-Access Component
Dell S2740L - A Beautifully Crafted 27-inch IPS Monitor