The basis of any Relational Database Management System (DBMS)
is the ability to draw together data from multiple sources (tables
generally) as the result of a query; this is called “Joining” data. In
relational databases, the joins are most often between primary key and
foreign key-related columns, but a join can be achieved on any data as
long as both data sources share a common column value. One fundamental
reason for splitting data into multiple sources is to avoid duplicating data—this process is called normalization by database developers.
Although the concept of joining in-memory
collections isn’t a common pattern today (it’s more likely that you
nest a collection inside another object to indicate a parent-child
relationship), LINQ to Objects introduces the concept of joining
multiple data sources through its powerful Join and GroupJoin
operators and their equivalent query expression syntax options. LINQ
also enables join-like functionality by using a subquery expression
within the Select clause.
There are three basic styles of Joins that are
commonly used when drawing together data from multiple sources based on
a shared key value:
Cross Join—
Every element in the first sequence (the outer) is projected with every
element in the second sequence (the inner), often called the Cartesian Product. An example use for this type of join is when you want to systematically access each pixel in a bitmap by x and y locations.
One-to-One Inner Join—
This type of join allows you to access data in both sequences based on
a common column value. The resulting sequence is flat (there is no
subgroup). An example use for this type of join is when you want to
access data from another sequence to extend the data you have on some
element.
One-to-Many Join—
This type of join allows you to use the first sequence record as a key
for a subgroup sequence where a given column value is used as the
referential key. An example of this type of join is having a list of
orders for a set of given customers.
There are numerous ways in LINQ to achieve each of these join types. Each technique and its relative merit are discussed next.
1. Cross Joins
Cross joins project every element in one sequence
against every element in a second sequence. Cross joins in LINQ to
Objects ironically aren’t implemented using either of the Join
operators, but rather a clever use of the SelectMany
operator. The query expression syntax will be familiar (not identical)
to anyone with knowledge of standard SQL joins. In LINQ query
expression syntax, you simply specify multiple from statements, as shown in the following code:
var outer = Enumerable.Range(1, 3);
var inner = Enumerable.Range(1, 3);
var q = from x in outer
from y in inner
select new { x, y };
foreach (var element in q)
Console.WriteLine("{0}, {1}", element.x, element.y);
The result from this query will be the Cartesian Product, which is a fancy term for all values in the outer range, x (with values 1, 2, and 3) against each value in the inner range, y (with values 1, 2, and 3) as shown in the following result:
1, 1
1, 2
1, 3
2, 1
2, 2
2, 3
3, 1
3, 2
3, 3
The following code, which is identical to the
previous query except that it is written in extension method syntax,
shows how the query expression syntax (using multiple from statements) is mapped to the SelectMany operator. The result that it yields is identical as well:
var q1 = outer
.SelectMany(
x => inner, // inner collection selector
(x, y) => new { x, y } // select projection
);
Cross joins allow you to expand the matrix of
values, every combination of values between two or more sequences.
There is no limit to the number of sequences that can participate in
the cross-product operation. Listing 1 demonstrates how a binary sequence can be created by using a cross-join combination of four sequences. Output 1
shows the result, a sequence of every binary value for a 4-bit number.
This might be useful when generating all combinations of a test case.
Listing 1. Any number of sequences can participate in a cross join. Every
element will be projected against every element of included
sequences—see Output 1
var binary = new int[] { 0, 1 };
var q = from b4 in binary from b3 in binary from b2 in binary from b1 in binary select String.Format( "{0}{1}{2}{3}", b4, b3, b2, b1);
foreach (var element in q) Console.WriteLine(element);
|
Output 1.
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
|
Another common cross-join example is accessing every
pixel in a bitmap, from 0 to the bitmap’s width (x) and 0 to the height
(y) index position. The following code demonstrates how to combine a
cross join and the Range operator to achieve access to every pixel in a bitmap:
var q = from x in Enumerable.Range(0, bm.Width)
from y in Enumerable.Range(0, bm.Height)
select bm.GetPixel(x, y);
A
cross join can also be used to achieve a one-to-one join by creating
every permutation between two collections and then filtering all but
those elements that share a key value; although as you will see, this
is generally the worst performing technique for this purpose in LINQ to
Objects.