An example of dynamic
task parallelism is sorting a list with an algorithm such as
QuickSort. The algorithm first divides an unsorted array of values,
and then it orders and recombines the pieces. Here’s a sequential
implementation.
Note:
Sorting is a typical application that can benefit from dynamic task parallelism.
static void SequentialQuickSort(int[] array, int from, int to)
{
    if (to - from <= Threshold)
    {
        InsertionSort(array, from, to);
    }
    else
    {
        int pivot = from + (to - from) / 2;
        pivot = Partition(array, from, to, pivot);
        SequentialQuickSort(array, from, pivot);
        SequentialQuickSort(array, pivot + 1, to);
    }
}
This method sorts the array in place instead of returning a sorted array. The from and to
arguments identify the segment that will be sorted. The code includes
an optimization: it’s not efficient to use the recursive algorithm on
short segments, so the method calls the non-recursive InsertionSort method on segments that are no longer than Threshold, which is set in a global variable. This optimization applies equally to the sequential and parallel versions.
If the segment is longer than Threshold, the recursive algorithm is used. The element in the middle of the segment (at pivot) is chosen. The Partition method moves all the array elements not greater than the element at pivot into the segment that precedes pivot, and leaves the greater elements in the segment that follows pivot (pivot itself may be moved). Then the method recursively calls QuickSort on both segments.
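The InsertionSort and Partition helpers, and the Threshold value, aren’t shown here. The following is a minimal sketch of what they might look like; the Threshold value of 150, the Swap helper, and the particular swap-based partitioning scheme are illustrative assumptions, not part of the original sample.
static int Threshold = 150;  // assumed cutoff; tune by measurement

static void InsertionSort(int[] array, int from, int to)
{
    // Sorts the segment array[from..to) in place.
    for (int i = from + 1; i < to; i++)
    {
        int value = array[i];
        int j = i - 1;
        while (j >= from && array[j] > value)
        {
            array[j + 1] = array[j];
            j--;
        }
        array[j + 1] = value;
    }
}

static int Partition(int[] array, int from, int to, int pivot)
{
    // Rearranges array[from..to) so that elements not greater than the
    // pivot value precede it and greater elements follow it; returns
    // the pivot value's final index.
    int pivotValue = array[pivot];
    Swap(array, pivot, to - 1);        // move the pivot out of the way
    int store = from;
    for (int i = from; i < to - 1; i++)
    {
        if (array[i] <= pivotValue)
        {
            Swap(array, i, store);
            store++;
        }
    }
    Swap(array, store, to - 1);        // put the pivot in its final place
    return store;
}

static void Swap(int[] array, int i, int j)
{
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}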
The following code shows a parallel implementation.
static void ParallelQuickSort(int[] array, int from,
    int to, int depthRemaining)
{
    if (to - from <= Threshold)
    {
        InsertionSort(array, from, to);
    }
    else
    {
        int pivot = from + (to - from) / 2;
        pivot = Partition(array, from, to, pivot);
        if (depthRemaining > 0)
        {
            Parallel.Invoke(
                () => ParallelQuickSort(array, from, pivot,
                          depthRemaining - 1),
                () => ParallelQuickSort(array, pivot + 1, to,
                          depthRemaining - 1));
        }
        else
        {
            ParallelQuickSort(array, from, pivot, 0);
            ParallelQuickSort(array, pivot + 1, to, 0);
        }
    }
}
This version uses Parallel.Invoke to execute the recursive calls in tasks that can run in parallel. Tasks are created dynamically with each recursive call; if the array is large, many tasks might be created.
The parallel version
includes an additional optimization. It’s generally not useful to
create many more tasks than there are processors to run them. So, the ParallelQuickSort method includes an additional argument to limit task creation. The depthRemaining
argument is decremented on each recursive call, and tasks are created
only when this argument exceeds zero. The following code shows how to
calculate an appropriate depth (the depthRemaining argument) from the number of processors.
public static void ParallelQuickSort(int[] array)
{
    ParallelQuickSort(array, 0, array.Length,
        (int)Math.Log(Environment.ProcessorCount, 2) + 4);
}
One relevant factor in
selecting the number of tasks is how similar the predicted run times of
the tasks will be. In the case of QuickSort, the duration of the tasks
may vary a great deal because the pivot points depend on the unsorted
data. They don’t necessarily result in segments of equal size (in fact,
the sizes could vary widely). To compensate for the uneven sizes of the
tasks, the formula in this example that calculates the depthRemaining
argument produces more tasks than cores. The formula limits the number
of tasks to approximately sixteen times the number of cores. This is
because the number of tasks can be no larger than 2 ^ depthRemaining. If you substitute depthRemaining = log2(NCores) + 4 and simplify the expression, you see that the number of tasks is 16 x NCores. (Recall that for any value a, 2 ^ (a + 4) is the same as 16 times 2 ^ a, and that if a = log2(b), then 2 ^ a = b.)
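For example, on a machine with eight cores, depthRemaining is log2(8) + 4 = 7, so the recursion forks tasks through at most seven levels and creates no more than 2 ^ 7 = 128 parallel tasks, which is 16 x 8.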
Note:
Limiting the number of subtasks by measuring the recursion depth is an extremely important technique.