Data parallelism typically begins by unrolling a sequential
loop. For example, the code below defines a loop that runs
sequentially. In this example, the code iterates over the loop body 100
times. The duration of the for loop is the aggregate duration of those 100 operations. Assuming an average of 100 milliseconds per operation, the for loop will take 10 seconds to complete (100 × 100 milliseconds).
for (int count = 0; count < 100; ++count)
{
    Operation();
}
Most importantly, the preceding code does not fully utilize the
available processor cores. Look at a view of the processor cores for a
computer with eight cores.
Note
To display the Windows Task Manager, press Ctrl+Shift+Esc. You can then double-click anywhere in one of the CPU usage graphs to switch to the dedicated CPU usage view.
Notice that the average utilization of the cores is relatively low.
Except for the first processor core, the other cores are not very busy.
A lot of processor power is being wasted!
Here’s the complete code for you to try.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ParallelLoop
{
    class Program
    {
        static void Operation()
        {
            Thread.SpinWait(int.MaxValue);
        }

        static void Main(string[] args)
        {
            for (int count = 0; count < 100; ++count)
            {
                Operation();
            }
        }
    }
}
Fortunately, changing from a sequential loop to a parallel loop
requires minimal changes in most circumstances. Here’s the parallel
version of the serial loop in the previous example.
Parallel.For(0, 100, (count) =>
{
    Operation();
});
That was an easy change! Look at the processor usage graph for the
parallel version. Nirvana! The graph shows 100 percent processor
utilization with the parallel loop. The utilization line is barely
visible only because it runs straight across the top of the graph.
You’ll delve into the syntactical nuances of Parallel.For
shortly. The important point now is that this code successfully
parallelizes the operation as a basket of tasks scheduled across the
available cores. There are enough tasks to keep the processor cores
busy, which is an important goal of parallel programming.
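As a quick way to confirm that the loop really is spread across cores, you can record which managed threads execute the iterations. The following is a minimal sketch (the class name, the spin count, and the exact output will vary by machine):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ThreadSpreadSketch
{
    static void Main()
    {
        var iterationsPerThread = new ConcurrentDictionary<int, int>();

        Parallel.For(0, 100, i =>
        {
            // Count how many iterations each managed thread runs.
            iterationsPerThread.AddOrUpdate(
                Thread.CurrentThread.ManagedThreadId, 1, (_, n) => n + 1);
            Thread.SpinWait(100000);   // simulate a small amount of work
        });

        Console.WriteLine($"Iterations ran on {iterationsPerThread.Count} thread(s); " +
            $"{Environment.ProcessorCount} core(s) available.");
    }
}
```

On a multicore machine you should see the iterations distributed over more than one thread, which is the scheduling behavior the text describes.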
The key to high-performing data parallelism is independent
iterations, which avoid dependencies between the resulting parallel
tasks. Only with independent iterations can you unroll a loop
successfully. You can handle some dependencies, such as reduction,
with minimal effect on performance. Not all dependencies are obvious. Here is a partial list of possible
dependencies:
- Indexes
- Shared memory
- Shared files
- Order restrictions
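The reduction mentioned earlier can be handled efficiently with the overload of Parallel.For that maintains thread-local state, so each task accumulates a private subtotal and the subtotals are combined once per task. Here is a minimal sketch, assuming a sum of squares as the reduction:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ReductionSketch
{
    static void Main()
    {
        long total = 0;

        Parallel.For(0, 100,
            () => 0L,                                      // localInit: per-task subtotal
            (i, state, subtotal) => subtotal + (long)i * i, // body: accumulate privately
            subtotal => Interlocked.Add(ref total, subtotal)); // localFinally: combine once

        Console.WriteLine(total);   // prints 328350, the sum of squares 0..99
    }
}
```

Because each task touches only its own subtotal until the final combine, the iterations remain effectively independent and contention is minimal.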
Can you identify the dependency in the next loop? The following code
is a parallel loop that squares the current index, which is then
written into a file. The program generates unexpected results. This is
because of a not-so-subtle dependency.
StreamWriter sw = new StreamWriter(filename);
Parallel.For(0, 100, (i) => sw.WriteLine(i * i));
sw.Close();
The
problem is the shared file reference. The code could attempt to write
to the same file simultaneously from two or more parallel tasks, and
files are not implicitly thread safe. In addition, the order of the
writes is not assured, because the Parallel.For
method does not guarantee the order in which tasks are executed. The
largest reported value should be 9,801 (99 × 99). This is a partial
listing from the file.
6084
4900
705662415041
7225
6400
9216
73966561
Several of the preceding values exceed 9,801, which should be
impossible! The problem occurs because simultaneous writes to the file
interleave and corrupt the output, which demonstrates the potential
problem that dependencies pose for data parallelism. The shared file
was an obvious dependency, but unfortunately, most dependencies are
more subtle and often go undetected. This underscores the importance
of performing rigorous unit testing, stress testing, and concurrency
testing when converting routines from sequential to parallel.
Converting the syntax is simple; maintaining correctness is the
greater challenge.
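One way to remove the shared-file dependency is to compute the squares in parallel into an array indexed by the loop counter, and then write the file sequentially. Here is a minimal sketch (the file name is hypothetical):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class SafeWriteSketch
{
    static void Main()
    {
        int[] squares = new int[100];

        // Each iteration writes only its own array slot, so the
        // iterations are independent and safe to run in parallel.
        Parallel.For(0, 100, i => squares[i] = i * i);

        // The file itself is written sequentially, which also
        // preserves the original ordering of the results.
        using (StreamWriter sw = new StreamWriter("squares.txt"))
        {
            foreach (int square in squares)
            {
                sw.WriteLine(square);
            }
        }
    }
}
```

The parallel portion now has no shared state, and the largest value written is 9,801, as expected.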
There are techniques and tricks for identifying
dependencies. One technique is to reverse the iteration order of a
sequential loop. In other words, change the iteration from ascending
to descending. If the reversal causes the application to generate
different results or crash, that change indicates the likelihood of a
dependency.
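As a minimal sketch of the reversal test, using a side-effect-free stand-in for Operation (the class and method names here are illustrative):

```csharp
using System;

class ReversalSketch
{
    // A stand-in for Operation that has no hidden dependencies.
    static long Operation(int count) => (long)count * count;

    static void Main()
    {
        long ascending = 0, descending = 0;

        // Original ascending iteration.
        for (int count = 0; count < 100; ++count)
            ascending += Operation(count);

        // Reversed, descending iteration.
        for (int count = 99; count >= 0; --count)
            descending += Operation(count);

        // With truly independent iterations, the results match.
        Console.WriteLine(ascending == descending);   // prints True
    }
}
```

If the two results differed, or the reversed loop crashed, that would suggest a hidden dependency on iteration order.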