4. Specifying Your Own Key Comparison Function
The
default behavior of grouping is to equate key equality using the normal
equals comparison for the type being tested. This may not always suit
your needs, and it is possible to override this behavior and specify
your own grouping function. To implement a custom comparer, you build a
class that implements the interface IEqualityComparer<T> and pass this class as an argument into the GroupBy extension method.
The IEqualityComparer<T> interface definition has the following definition:
public interface IEqualityComparer<T>
{
public bool Equals(T x, T y)
public int GetHashCode(T obj)
}
The Equals method is used to indicate that
one instance of an object has the same equality to another instance of
an object. Overriding this method allows specific logic to determine
object equality based on the data within those objects, rather than
instances being the same object instance. (That is, even though two
objects were constructed at different times and are two different
objects as far as the compiler is concerned, we want them to be deemed
equal based on their specific combination of internal values or
algorithm.)
The GetHashCode
method is intended for use in hashing algorithms and data structures
such as a hash table. The implementation of this method returns an
integer value based on at least one value contained in the instance of
an object. The resulting hash code can be used by other data structures
as a unique value (but not guaranteed unique). It is intended as a
quick way of segmenting instances in a collection or as a short-cut
check for value equality (although a further check needs to be carried
out for you to be certain of object value equality). The algorithm that
computes the hash code should return the same integer value when two
instances of an object have the same values (in the data fields being
checked), and it should be fast, given that this method will be called
many times during the grouping evaluation process.
The following example demonstrates one possible use
of a custom equality comparer function in implementing a simple Soundex
comparison routine (see http://en.wikipedia.org/wiki/Soundex for a full definition of the Soundex algorithm) that will group phonetically similar names. The code for the SoundexEqualityComparer is shown in Listing 5.
Soundex is an age-old algorithm that computes a reliable four character
string value based on the phonetics of a given word; an example would
be “Katie” is phonetically identical to “Katy.”
The approach for building the Soundex equality operator is to
Code
the Soundex algorithm to return a four-character string result
representing phonetic sounding given a string input. The form of the
Soundex code is a single character, followed by three numerical digits,
for example A123 or V456.
Implement the GetHashCode
for a given string. This will call the Soundex method and then convert
the Soundex code to an integer value. It builds the integer by using
the ASCII value of the character, multiplying it by 1000 and then
adding the three digit suffix of the Soundex code to this number, for
example A123 would become, (65 × 1000) + 123 = 65123.
Implement the Equals method by calling GetHashCode on both input arguments x and y and then comparing the return integer results. Return true if the hash codes match (GetHashCode can be used in this implementation of overloading the Equals operator because it is known that the Soundex algorithm implementation returns a unique value—this is not the case with other GetHashCode implementations).
Care must be taken for null values and empty strings
in deciding what behavior you want. I decided that I wanted null or
empty string entries to be in one group, but this null handling logic
should be considered for each specific implementation. (Maybe an empty
string should be in a different group than null entries; it really
depends on the specific situation.)
Listing 5. The custom SoundexEqualityComparer allows phonetically similar sounding strings to be easily grouped
public class SoundexEqualityComparer : IEqualityComparer<string> { public bool Equals(string x, string y) { return GetHashCode(x) == GetHashCode(y); }
public int GetHashCode(string obj) { // E.g. convert soundex code A123, // to an integer: 65123 int result = 0;
string s = soundex(obj); if (string.IsNullOrEmpty(s) == false) result = Convert.ToInt32(s[0]) * 1000 + Convert.ToInt32(s.Substring(1, 3));
return result; }
private string soundex(string s) { // Algorithm as listed on // http://en.wikipedia.org/wiki/Soundex. // builds a string code in the format: // [A-Z][0-6][0-6][0-6] // based on the phonetic sound of the input.
if (String.IsNullOrEmpty(s)) return null;
StringBuilder result = new StringBuilder(s.Length);
// As long as there is at least one character, // then we can proceed string source = s.ToUpper().Replace(" ", "");
// add the first character, then loop the // string mapping as we go result.Append(source[0]); char previous = '0';
for (int i = 1; i < source.Length; i++) { // map to the soundex numeral char mappedTo = '0'; char thisChar = source[i];
if ("BFPV".Contains(thisChar)) mappedTo = '1'; else if ("CGJKQSXZ".Contains(thisChar)) mappedTo = '2'; else if ("DT".Contains(thisChar)) mappedTo = '3'; else if ('L' == thisChar) mappedTo = '4'; else if ("MN".Contains(thisChar)) mappedTo = '5'; else if ('R' == thisChar) mappedTo = '6';
// ignore adjacent duplicates and // non-matched characters if (mappedTo != previous && mappedTo != '0') { result.Append(mappedTo); previous = mappedTo; } }
while (result.Length < 4) result.Append("0");
return result.ToString(0, 4); } }
|
Listing 6 demonstrates using a custom equality comparer (in this case SoundexEqualityComparer). The Console output is shown in Output 3.
Listing 6. Calling code of the new SoundexEqualityComparer used for grouping phonetically similar names—see Output 3
string[] names = new string[] { "Janet", "Janette", "Joanne", "Jo-anne", "Johanne", "Katy", "Katie", "Ralph", "Ralphe" };
var q = names.GroupBy(s => s, new SoundexEqualityComparer());
foreach (var group in q) { Console.WriteLine(group.Key); foreach (var name in group) Console.WriteLine(" - {0}", name); }
|
Output 3.
Janet - Janet - Janette Joanne - Joanne - Jo-anne - Johanne Katy - Katy - Katie Ralph - Ralph - Ralphe |