ENTERPRISE

LINQ to Objects : How to Join with Data in Another Sequence (part 7)

1/14/2014 3:19:24 AM
How join/into Works

The join/into keyword combination in query expression syntax maps to the GroupJoin standard query operator. The GroupJoin operator returns a sequence of inner elements with a matching key value for every outer element in a sequence. Unlike the pure join syntax, every outer-sequence element will have a represented group sequence, but those without matching inner-sequence elements will have an empty group. The GroupJoin extension method signature takes the following form:

public static IEnumerable<TResult>
GroupJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, IEnumerable<TInner>, TResult> resultSelector,
IEqualityComparer<TKey> comparer // optional
)

The pseudo-code for this operator is similar to the Join operator, except the groupings, rather than the individual elements in each group, are returned. The pseudo-code for GroupJoin is

  1. Create a grouping on the inner sequence for all elements that share the same inner-key selector function value (as determined by the innerKeySelector argument). Call this list lookup.

  2. Iterate the outer sequence. Call each current element item.

    1. Create a grouping from the key of the lookup sequence for all elements that share the same outer-key selector function result each item returns (as determined by the outerKeySelector argument, and the comparer equality comparer argument if specified).

    2. Yield the result by passing each item and the grouping to the resultSelector argument.

The returned sequence will be in the order of the outer-sequence elements, although it can be changed by any combination of order by clauses as required. The order of the elements in each group will be the same as the inner sequence, and if you want to change the order of the inner-sequence elements, consider using a subquery in the select projection with specific order by clauses instead of the GroupJoin syntax.

The key selection expression for the outer and inner sequence can be any expression that evaluates to a comparable type.

Using the extension method syntax for joins is almost never required (almost never—if you need a custom equality comparer, you have no choice). To demonstrate how a join/into query expression maps to the equivalent extension method syntax code, the following code is functionally equivalent to that shown in Listing 7:

var q1a = customers
.GroupJoin(
orders,
c => c.CustomerID,
o => o.CustomerID,
(c, o) => new
{
LastName = c.LastName,
Orders = o
}
);

One-to-Many Join Using a Subquery

A subquery in a query’s select projection can achieve a one-to-many join. If targeting a single record (or a small number, say less than five outer records), this technique performs adequately and is easy to read. Listing 8 demonstrates using a subquery for a one-to-many join, and this code produces the Console output shown in Output8.

Listing 8. One-to-Many Join using a subquery in the select projection—see Output 8
var customers = Customer.SampleData();
var orders = Order.SampleData();

var q2 = from c in customers
select new
{
LastName = c.LastName,
Orders = from o in orders
where o.CustomerID == c.CustomerID
select o
};

foreach (var customer in q2)
{
Console.WriteLine("Last name: {0}", customer.LastName);
foreach (var order in customer.Orders)
Console.WriteLine(" - Order number: {0}", order.OrderNumber);
}

Output 8.
Last name: Gottshall
- Order number: Order 1
- Order number: Order 2
Last name: Valdes
Last name: Gauwain
Last name: Deane
- Order number: Order 3
- Order number: Order 4
Last name: Zeeman
- Order number: Order 5

One-to-Many Join Using the ToLookup Operator

The ToLookup standard query operator builds an efficient lookup list based on a key selection function. The result is similar to a Hashtable, however the ToLookup operator returns a sequence for each key value rather than for a single element. Many of Microsoft’s standard query operators use ToLookup internally to perform their tasks, and it is a convenient helper method in all manner of programming problems. It was surprising in testing to find that by using this operator, a major performance gain was achieved over the other one-to-many techniques. An added advantage of creating your inner-sequence grouping first is that the result can be used multiple times across multiple queries. Listing 9 demonstrates how to achieve a one-to-many join using the ToLookup operator; this code produces the identical output to that shown in Output 9.

Listing 9. One-to-Many Join using the ToLookup Operator—see Output 9
var customers = Customer.SampleData();
var orders = Order.SampleData();

// build a lookup list for the inner sequence
var orders_lookup = orders.ToLookup(o => o.CustomerID);

var q3 = from c in customers
select new
{
LastName = c.LastName,
Orders = orders_lookup[c.CustomerID]
};

foreach (var customer in q3)
{
Console.WriteLine("Last name: {0}", customer.LastName);
foreach (var order in customer.Orders)
Console.WriteLine(" - Order number: {0}", order.OrderNumber);
}

One-to-Many Join Performance Comparisons

As seen in one-to-one joins, choosing the correct join technique is critical when working with large collection sizes, and the same holds true for one-to-many joins. The guidance offered here is in not a substitute for running tests on your own queries and data, however the guidance offered in Tables 4 and 5 show some critical join types to avoid. A database system is the best tool for undertaking joins over larger records sets; small data sizes work fine, but if performance is a critical requirement, exhaust all options of joining the data at the source database before using LINQ to Objects.

To derive the performance guidance for one-to-many joins, do a simple experiment of using the three different join techniques over a large number of iterations and measuring the total time. Each join technique has a different performance profile for the two most basic scenarios:

  1. The outer sequence has many records—An example of this scenario is when you are binding the full sequence of results to a grid or other repeater control and summarizing the results of a related set of records. For example, the following code demonstrates a one-to-many join where outer collection has many records:

    var q1 = from c in customers
    join o in orders on
    c.CustomerID equals o.CustomerID
    into cust_orders
  2. The outer sequence has only a few records, either by count or because of a strict where clause (targeting a single record is the most common)—An example of this scenario is when you are looking up a single record for editing purposes. For example, the following code demonstrates a one-to-many join where a single record is isolated in the outer collection:

    var q1 = from c in customers
    where c.CustomerID == "DEA1"
    join o in orders on
    c.CustomerID equals o.CustomerID
    into cust_orders
Table 5. One-to-Many Join Techniques and Their Performance Comparisons Using 1M Iterations
Join Technique Iterations = 1Million timesMany Record Performance Outer n = 100 Inner n = 5Many Record Performance Outer n = 50 Inner n = 5Many Record Performance Outer n = 5 Inner n = 5Single Record Performance Outer n = 1 Inner n = 5
Using the join/into combination (or GroupJoin) operator Listing 721562ms (3 - worst), although improves as inner sequence count grows larger12284ms2410ms1964ms (3 - worst)
Using a subquery in the select projection Listing 820179ms (2)9953ms1222ms712ms (2)
Using the ToLookup operator Listing 917928ms (1 - best)9514ms1218ms694ms (1 - best)


Conclusions can be drawn from these simple experiments:

  • Use the ToLookup syntax for all one-to-many scenarios, unless the inner query is complex and easier to achieve with a subquery.

  • The join/into syntax is slower than the other techniques and should only be used with small inner sequence counts (it gets superior as inner sequence size grows) or when aggregation operators are going to be used over the join result.

  • Use a relational database system for large volume join operations where possible.

Other  
  •  Moving into SAP Functional Development : Gaining Control of Change Control - How to Organize and Plan for Change in the Real World
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Control and the SAP Solution Stack
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Control Affects Everything
  •  Exchange Server 2010 : Outlook Integration (part 7) - Document Library Integration
  •  Exchange Server 2010 : Outlook Integration (part 6) - Alert Integration
  •  Exchange Server 2010 : Outlook Integration (part 5) - Task Integration
  •  Exchange Server 2010 : Outlook Integration (part 4) - Contact Integration
  •  Exchange Server 2010 : Outlook Integration (part 3) - Creating a Meeting Workspace
  •  Exchange Server 2010 : Outlook Integration (part 2) - Calendar Integration
  •  Exchange Server 2010 : Outlook Integration (part 1) - Integration Overview
  •  
    Video tutorials
    - How To Install Windows 8 On VMware Workstation 9

    - How To Install Windows 8

    - How To Install Windows Server 2012

    - How To Disable Windows 8 Metro UI

    - How To Change Account Picture In Windows 8

    - How To Unlock Administrator Account in Windows 8

    - How To Restart, Log Off And Shutdown Windows 8

    - How To Login To Skype Using A Microsoft Account

    - How To Enable Aero Glass Effect In Windows 8

    - How To Disable Windows Update in Windows 8

    - How To Disable Windows 8 Metro UI

    - How To Add Widgets To Windows 8 Lock Screen
    programming4us programming4us
    Top 10
    Free Mobile And Desktop Apps For Accessing Restricted Websites
    MASERATI QUATTROPORTE; DIESEL : Lure of Italian limos
    TOYOTA CAMRY 2; 2.5 : Camry now more comely
    KIA SORENTO 2.2CRDi : Fuel-sipping slugger
    How To Setup, Password Protect & Encrypt Wireless Internet Connection
    Emulate And Run iPad Apps On Windows, Mac OS X & Linux With iPadian
    Backup & Restore Game Progress From Any Game With SaveGameProgress
    Generate A Facebook Timeline Cover Using A Free App
    New App for Women ‘Remix’ Offers Fashion Advice & Style Tips
    SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th