ENTERPRISE

LINQ to Objects : How to Join with Data in Another Sequence (part 5) - One-to-One Join Performance Comparisons

1/14/2014 3:12:01 AM
2.6 One-to-One Join Performance Comparisons

Choosing the most appropriate join technique will yield large performance gains as collection size grows. To have some guidance on which technique makes sense, a simple experiment is necessary (results of such an analysis are summarized in Table 1). As mentioned earlier, a database management system (such as SQL Server, MySQL, and Oracle) should be used when joining large amounts of data, and although you can join in-memory using LINQ to Objects, you should exhaust other opportunities first if high performance on larger data sets is a requirement.

One-to-one join performance is primarily dependent on how many different elements will need to be looked up from the inner sequence and how often that is necessary. The Join operator builds a lookup list of all inner sequence keys ahead of time. If many of these keys are then used in the outer sequence, this step aids performance; if only a few of the inner elements will ever be eventually looked up (because there are only a few outer records), then this step actually harms performance. You will need to examine the likely number of records in both the inner and outer sequences. In the small set of test data used for this profiling example, it was evident that five records or more was the sweet-spot for building this lookup table in advance. Though if the number of records in the inner sequence grows, so will the sweet-spot—you must test given your own sequence sizes, and consider their likely growth over time.

The experimental approach used in this test was to simply run a large number of join iterations over some sample data and determine the total elapsed time. Each of the four different techniques were tested for all records in an outer sequence and a single record in the outer sequence (by adding a where clause for a single element).

To measure the elapsed time, a simple helper method was written that takes an action delegate and measures the elapsed time of calling that delegate a given number of times (iterations):

private long MeasureTime(Action action, int iterations)
{
System.Diagnostics.Stopwatch watch =
new System.Diagnostics.Stopwatch();

watch.Start();

for (int i = 0; i < iterations; i++)
action();

return watch.ElapsedMilliseconds;
}

This helper function is called by specifying an action inline, such as

long t = MeasureTime(delegate { q1.ToList(); }, 1000000)

In this example, q1 is our LINQ to Objects query under test. The method ToList is called on this query to ensure that the full sequence is yielded, meaning that every result element is processed avoiding deferred execution making all queries appear instant.

Each Join technique has a different performance profile for the two most basic scenarios:

  1. The outer sequence has many records, and you want all of them—An example of this scenario is when you are binding the full sequence of results to a grid or other repeater control. For example, the following code demonstrates a one-to-one join where outer collection has many records:

    var q1 = from outer in orders
    join inner in customers on
    outer.CustomerID equals inner.CustomerID
  2. The outer sequence has only a few records, either by count or because of a strict where clause (targeting a single record is the most common)—An example of this scenario is when you are looking up a single record for editing purposes. For example, the following code demonstrates a one-to-one join where a single record is isolated in the outer collection:

    var q1 = from outer in orders
    where outer.OrderNumber == "Order 3"
    join inner in customers on
    outer.CustomerID equals inner.CustomerID

The test data used for this experiment is small—up to 100 orders and 5 customers (for a full listing, see Table 2). This is an unrealistic size of data, but it does provide a stable platform for understanding relative performance characteristics. This experiment is looking at how the relationship between outer and inner sequence collection size impacts query performance. Table 3 shows the results of this experiment.

Table 3. One-to-One Join Techniques and Their Performance Comparisons Using 1M Iterations
Join Technique Iterations = 1MScenario 1 -Many Record Performance Outer n = 100 Inner n = 5Many Record Performance Outer n = 50 Inner n = 5Many Record Performance Outer n = 5 Inner n = 5Scenario 2 - Single Record Performance Outer n = 1 Inner n = 5
Using the join (or Join) operator Listing 221793ms (1 - best)11319ms2432ms1852ms (4 - worst)
Using a subquery in the select projection Listing 475897ms (3)40464ms4375ms1493ms (2)
Using the SingleOrDefault operator in the select projection Listing 556335ms (2)27412ms3758ms1320ms (1 - best)
Using a cross join with a where filter Listing 680829ms (4 - worst)40424ms5142ms1636ms (3)

Note

The actual numbers in seconds are highly subject to hardware and available memory; they are shown here for relative purposes only. This experiment is to differentiate performance of the different join techniques.


Plotting the results for 100, 50, and 1 demonstrates how linear the performance characteristic is against outer loop count. The join syntax always wins until the outer sequence size drops to below five. However, when the outer-sequence count is less than five, all syntax options are instant unless you are running one million iterations.

From this simple experiment, some conclusions can be drawn:

  • Use the Join syntax for all one-to-one scenarios unless the outer sequence has less than five elements

  • Use the SingleOrDefault syntax when the outer-sequence size is less than five elements and the number of iterations is in the hundreds of thousands.

    Figure 1. Different one-to-one Join techniques. Time taken for 1M iterations against an outer sequence of 100, 50, and 1 element count. Join syntax is faster until around an outer-sequence size of five elements.

  • Never use the cross join/where syntax for one-to-one joins. Enumerating every combination of element versus element only to filter out all but a few results impacts performance in a big way.

  • Use a relational database system for large volume join operations where possible.

Other  
  •  Moving into SAP Functional Development : Gaining Control of Change Control - How to Organize and Plan for Change in the Real World
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Control and the SAP Solution Stack
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Control Affects Everything
  •  Exchange Server 2010 : Outlook Integration (part 7) - Document Library Integration
  •  Exchange Server 2010 : Outlook Integration (part 6) - Alert Integration
  •  Exchange Server 2010 : Outlook Integration (part 5) - Task Integration
  •  Exchange Server 2010 : Outlook Integration (part 4) - Contact Integration
  •  Exchange Server 2010 : Outlook Integration (part 3) - Creating a Meeting Workspace
  •  Exchange Server 2010 : Outlook Integration (part 2) - Calendar Integration
  •  Exchange Server 2010 : Outlook Integration (part 1) - Integration Overview
  •  
    Top 10
    3 Tips for Maintaining Your Cell Phone Battery (part 2) - Discharge Smart, Use Smart
    3 Tips for Maintaining Your Cell Phone Battery (part 1) - Charge Smart
    OPEL MERIVA : Making a grand entrance
    FORD MONDEO 2.0 ECOBOOST : Modern Mondeo
    BMW 650i COUPE : Sexy retooling of BMW's 6-series
    BMW 120d; M135i - Finely tuned
    PHP Tutorials : Storing Images in MySQL with PHP (part 2) - Creating the HTML, Inserting the Image into MySQL
    PHP Tutorials : Storing Images in MySQL with PHP (part 1) - Why store binary files in MySQL using PHP?
    Java Tutorials : Nested For Loop (part 2) - Program to create a Two-Dimensional Array
    Java Tutorials : Nested For Loop (part 1)
    REVIEW
    - First look: Apple Watch

    - 3 Tips for Maintaining Your Cell Phone Battery (part 1)

    - 3 Tips for Maintaining Your Cell Phone Battery (part 2)
    VIDEO TUTORIAL
    - How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 1)

    - How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 2)

    - How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 3)
    Popular Tags
    Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Biztalk Exchange Server Microsoft LynC Server Microsoft Dynamic Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Indesign Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe After Effects Adobe Photoshop Adobe Fireworks Adobe Flash Catalyst Corel Painter X CorelDRAW X5 CorelDraw 10 QuarkXPress 8 windows Phone 7 windows Phone 8 BlackBerry Android Ipad Iphone iOS