You can modify the behavior of a PLINQ query with a variety of clauses and methods, which are actually extension methods of ParallelQuery<TSource>. Most of these are the same clauses and methods available in LINQ, and you can use them either independently or together to affect the behavior of a PLINQ query. However, PLINQ also introduces some new constructs, which are described in this section.
You create a PLINQ query to parallelize your code. In most circumstances, the next step is to iterate over the results by using a foreach or for loop. Because of deferred execution, the query is most likely performed at that time, and the results are processed in the iterations of the foreach loop. There is only one problem: the foreach loop is sequential. This is a classic “hurry-up-and-wait” scenario. After executing a PLINQ query in parallel, you might want to extend that parallelism to handle the results in parallel as well.
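As an illustrative sketch (the variable names here are examples only, and the code assumes using directives for System and System.Linq), the following code runs its where clause in parallel but then consumes the merged results one at a time in a sequential foreach loop.
int[] numbers = Enumerable.Range(0, 100).ToArray();

// The where clause executes in parallel...
var evens = from n in numbers.AsParallel()
            where n % 2 == 0
            select n;

// ...but the foreach loop consumes the merged results one at a time
// on the consuming thread.
foreach (int n in evens)
{
    Console.WriteLine(n);
}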
The Parallel.ForEach method is useful for parallelizing the same operation over a collection of values, so it would seem natural to use the same model to process the results of a PLINQ query. However, PLINQ returns a ParallelQuery<TSource>, which represents multiple streams of data, while Parallel.ForEach expects a single stream of data that it then partitions into multiple streams. For this reason, the multistream output of the query must first be merged into a single stream for Parallel.ForEach, which then repartitions it. There is a performance cost for this conversion.
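The following sketch, again with illustrative names and assuming using directives for System, System.Linq, and System.Threading.Tasks, shows this pattern: the partitions produced by the parallel query are merged into a single stream so that Parallel.ForEach can accept them, and Parallel.ForEach then repartitions that stream.
int[] data = Enumerable.Range(0, 1000).ToArray();

var query = from n in data.AsParallel()
            where n % 3 == 0
            select n;

// Parallel.ForEach accepts a single IEnumerable<T> stream, so the
// partitions produced by the query are merged here and then
// repartitioned by Parallel.ForEach.
Parallel.ForEach(query, n => Console.WriteLine(n));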
The solution is the ParallelQuery<TSource>.ForAll method. The ForAll method accepts the multiple streams directly, so it avoids the overhead of the Parallel.ForEach method. Here is the prototype of the ForAll method. The first parameter is the target of the extension method, which is a ParallelQuery<TSource>. The second parameter is an Action delegate; you can provide a named delegate, a lambda expression, or even an anonymous method. Each element of the collection is passed in turn as the parameter to the delegate.
public static void ForAll<TSource>(
    this ParallelQuery<TSource> source,
    Action<TSource> action
)
Here is a short tutorial that demonstrates the ForAll operator. In this example, you will perform a parallel query on a string array and then select and display the strings that are longer than two characters.
Perform a parallel query of a string array
- Create a C# console application in Visual Studio 2010. In the Main method, define a string array.
string[] stringArray = { "A", "AB", "ABC", "ABCD" };
- Perform a PLINQ query on the string array. Select the strings whose length is greater than two.
var results = from value in stringArray.AsParallel()
              where value.Length > 2
              select value;
- Call the ForAll operator on the results. In the lambda expression, display the current item.
results.ForAll((item) => Console.WriteLine(item));
Here is the source code for the entire application.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ForAll
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] stringArray = { "A", "AB", "ABC", "ABCD" };
            var results = from value in stringArray.AsParallel()
                          where value.Length > 2
                          select value;
            results.ForAll((item) => Console.WriteLine(item));
            Console.WriteLine("Press Enter to Continue");
            Console.ReadLine();
        }
    }
}
The application displays ABC and ABCD. Because ForAll executes in parallel, the order in which the two strings appear is not guaranteed.
The AsParallel method converts a LINQ query to a PLINQ query. It is a simple change to a LINQ query that completely alters its semantics.
A PLINQ query is not guaranteed to actually execute in parallel. The overhead of parallel execution, such as thread-related costs, synchronization, and the parallelization code itself, can exceed the performance gain. Determining the relative performance benefit of a PLINQ query is an inexact science that depends on several factors. Here are some of the considerations that might affect the performance of a PLINQ query:
One of the biggest factors is the duration of the parallel operations, such as the Select
clause. Dependencies and the synchronization that results from them
adversely affect the performance of any parallel solution. Furthermore,
shorter operations might not be worth parallelizing, because the
associated overhead might exceed the duration of the operation. For
small operations, you could change the chunking to improve the balance
of execution to overhead.
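One way to change the chunking, shown here only as a hedged sketch with illustrative names, is to wrap the source in a load-balancing partitioner by using Partitioner.Create before calling AsParallel; the sketch assumes using directives for System, System.Linq, and System.Collections.Concurrent.
int[] smallWorkItems = Enumerable.Range(0, 10000).ToArray();

// Wrap the array in a load-balancing partitioner so that elements are
// handed out dynamically in chunks instead of fixed ranges.
var chunked = Partitioner.Create(smallWorkItems, true).AsParallel()
                         .Select(n => n * n)
                         .ToArray();

Console.WriteLine(chunked.Length);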
The number of processor cores might affect the performance of your
parallel application, including PLINQ. However, you should typically
ignore the number of processor cores, because that’s mostly beyond your
control. Maintaining hardware independence in your application is
important for both scalability and portability.
PLINQ does not consider all of the above factors when deciding whether to execute a query in parallel. Based on the shape of the query and the clauses used, PLINQ decides to execute a query either in parallel or sequentially. You can override this default by using the WithExecutionMode clause, which takes a ParallelExecutionMode enumeration value as a parameter. The two options are ParallelExecutionMode.ForceParallelism and ParallelExecutionMode.Default. Use ParallelExecutionMode.ForceParallelism to require parallel execution. ParallelExecutionMode.Default defers to PLINQ for the appropriate decision on the execution mode. Here is an example that forces parallel execution of a PLINQ query.
from item in data.AsParallel().WithExecutionMode(ParallelExecutionMode.ForceParallelism)
select item;
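Because data is not defined in the fragment above, here is a more complete, illustrative sketch (the names are examples only; it assumes using directives for System and System.Linq) that forces parallel execution on a small array and then consumes the results with ForAll.
int[] data = Enumerable.Range(0, 10).ToArray();

// Force parallel execution even though the source is small enough that
// PLINQ might otherwise choose to run the query sequentially.
var forced = from item in data.AsParallel()
                              .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
             select item * item;

forced.ForAll(result => Console.WriteLine(result));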
How the result of your query expression is handled can also affect performance. For example, the following PLINQ query returns a List<T>. Converting the PLINQ results to a list requires that the results be buffered so that an entire list can be returned.
intArray.AsParallel().Where((value) => value > 5).ToList();
As mentioned, the results of the above code are buffered. In other circumstances, PLINQ might buffer results on its own, but that buffering is mostly transparent to your code.
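To make the difference concrete, here is an illustrative sketch (names are examples; it assumes using directives for System, System.Linq, and System.Collections.Generic) that contrasts the fully buffered ToList call with a streaming foreach loop over the same query.
int[] intArray = Enumerable.Range(0, 20).ToArray();

// Fully buffered: ToList cannot return until the entire result set
// has been produced.
List<int> buffered = intArray.AsParallel().Where(v => v > 5).ToList();
Console.WriteLine("Buffered count: " + buffered.Count);

// Streaming: a foreach loop can consume results as PLINQ produces them.
foreach (int result in intArray.AsParallel().Where(v => v > 5))
{
    Console.WriteLine(result);
}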
PLINQ uses multiple threads from the .NET Framework 4 thread pool to execute a query in parallel. The results of these parallel operations are then merged back onto the joining thread. The merge option describes the buffering used when merging results from the various threads.
Here are the merge options as defined in the ParallelMergeOptions enumeration:
- NotBuffered. The results are not buffered. For operations such as the ForAll operation, NotBuffered is the default.
- FullyBuffered. The results are fully buffered, which can delay receipt of the first result.
- AutoBuffered. This option is similar to NotBuffered, except that the results are returned in chunks.
- Default. The default is AutoBuffered.
You can override the default buffer preference with the WithMergeOptions operator.
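For example, the following illustrative sketch (names are examples; it assumes using directives for System and System.Linq) requests unbuffered merging so that results stream back to the consuming foreach loop as soon as they are produced.
int[] data = Enumerable.Range(0, 100).ToArray();

// Request that results not be buffered before being merged back to
// the consuming thread, so they stream into the foreach loop as soon
// as they are produced.
var query = data.AsParallel()
                .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                .Where(n => n % 2 == 0);

foreach (int n in query)
{
    Console.WriteLine(n);
}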