2.2 Simulating Outer Joins—Getting Unmatched Elements
Outer joins
differ from inner joins in the way they handle missing and unmatched
values. An outer join will return a result for outer elements that have
no matching inner elements, whereas these elements would be skipped
when using an inner join (as demonstrated in Listing 2).
To carry out an outer-join style query in LINQ to Objects, the DefaultIfEmpty standard query operator is used in the from
clause of a join, and the null value generated by this operator handled
in the select projection. Although this sounds difficult, Listing 3 demonstrates an outer join using the sample data shown in Table 2.
This query is a little ambiguous at first glance, but in essence it
injects a null element (the default value for an instance of Order type) for any Customer
element that would have been removed by the inner join. This propagates
a null value for order that any subsequent code working with that
result property can process. In this example the ternary operator in
the Console.WriteLine statement is catching and handling the null values by replacing those results with the string "(no orders)".
Listing 3. Structuring the query for an outer join where all outer-sequence
records are returned, even though they have no orders—see Output 3
var customers = Customer.SampleData(); var orders = Order.SampleData();
var q = from c in customers join o in orders on c.CustomerID equals o.CustomerID into j from order in j.DefaultIfEmpty() select new { LastName = c.LastName, Order = order };
foreach (var element in q) Console.WriteLine( "Customer: {0} Order Number: {1}", element.LastName.PadRight(9, ' '), element.Order == null ? "(no orders)" : element.Order.OrderNumber);
|
Output 3.
Customer: Gottshall Order Number: Order 1 Customer: Gottshall Order Number: Order 2 Customer: Valdes Order Number: (no orders) Customer: Gauwain Order Number: (no orders) Customer: Deane Order Number: Order 3 Customer: Deane Order Number: Order 4 Customer: Zeeman Order Number: Order 5
|
The alternative to handling
the null values in all places that will process the query result, is to
handle the null instances in the actual select projection of the query
itself, avoiding duplication of null handling code. T
var q1 = from c in customers
join o in orders on
c.CustomerID equals o.CustomerID into j
from order in j.DefaultIfEmpty()
select new
{
LastName = c.LastName,
OrderNumber = order == null ?
"(no order)" : order.OrderNumber
};
2.3 Joining by Composite Keys (More Than One Value)
Similar in approach to how multiple values can be
used to specify the key value for grouping, composite keys are
supported for joins by using anonymous types . In the case of joins, two
anonymous types must be specified for the inner and outer keys as
defined by the arguments on either side of the equals keyword.
The following example
performs a one-to-one inner join between two in-memory collections. The
key values used for joining are phone number and extension, combined
using an anonymous type.
List<Contact> outer = Contact.SampleData();
List<CallLog> inner = CallLog.SampleData();
var q = from contact in outer
join call in inner on
new {
phone = contact.Phone,
contact.Extension
}
equals
new {
phone = call.Number,
call.Extension
}
select new { call, contact };
The anonymous types must match exactly—in property
names and property order. In the previous example, it was necessary to
specifically name the phone property because that property wasn’t the same name in both collections; Extension was the same, so the property name in the initializer could be omitted. (The compiler inferred the name to be Extension for this field.)
How Join Works
The query expression join syntax (like all the query expression keywords) map to the extension method appropriately named Join. This extension method takes two sequences (one as an extension method source outer, the other as an ordinary argument inner), two key selection functions (one for each source), and a result selection function. The Join extension method signature takes the following form:
public static IEnumerable<TResult>
Join<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector,
IEqualityComparer<TKey> comparer // optional
)
To understand how the Join operator works, the basic pseudo-code it carries out is as follows:
Create
a grouping on the inner sequence for all elements that share the same
inner key selector function value (as determined by the innerKeySelector argument). Call this list of groups lookup.
Iterate the outer sequence. Call each current element item.
Create a grouping on the lookup sequence for all elements that share the same outer-key selector function value as this item (as determined by the outerKeySelector argument). If a comparer argument is supplied, use this comparer to calculate the equality.
If there are any items in this grouping result
Loop through all elements in this group
Yield the result returned by the result selector on each element (using the resultSelector argument)
The returned sequence will be in the order of the
outer sequence elements, one record for each matching inner element,
followed by the next outer element’s records. Outer elements without
matching inner elements will be skipped.
The key selection expression for the outer and inner sequence can be any expression that evaluates to a comparable type.
From the final query select projection, you have
access to both sequences’ matching elements. The range variable can be
used for both the inner and outer records throughout the query and
referenced within the select projection expression to create a new type
or an anonymous type.