ENTERPRISE

LINQ to Objects : How to Group Elements (part 4) - Specifying Your Own Key Comparison Function

12/30/2013 8:46:01 PM

4. Specifying Your Own Key Comparison Function

The default behavior of grouping is to equate key equality using the normal equals comparison for the type being tested. This may not always suit your needs, and it is possible to override this behavior and specify your own grouping function. To implement a custom comparer, you build a class that implements the interface IEqualityComparer<T> and pass this class as an argument into the GroupBy extension method.

The IEqualityComparer<T> interface definition has the following definition:

public interface IEqualityComparer<T>
{
public bool Equals(T x, T y)
public int GetHashCode(T obj)
}

The Equals method is used to indicate that one instance of an object has the same equality to another instance of an object. Overriding this method allows specific logic to determine object equality based on the data within those objects, rather than instances being the same object instance. (That is, even though two objects were constructed at different times and are two different objects as far as the compiler is concerned, we want them to be deemed equal based on their specific combination of internal values or algorithm.)

The GetHashCode method is intended for use in hashing algorithms and data structures such as a hash table. The implementation of this method returns an integer value based on at least one value contained in the instance of an object. The resulting hash code can be used by other data structures as a unique value (but not guaranteed unique). It is intended as a quick way of segmenting instances in a collection or as a short-cut check for value equality (although a further check needs to be carried out for you to be certain of object value equality). The algorithm that computes the hash code should return the same integer value when two instances of an object have the same values (in the data fields being checked), and it should be fast, given that this method will be called many times during the grouping evaluation process.

The following example demonstrates one possible use of a custom equality comparer function in implementing a simple Soundex comparison routine (see http://en.wikipedia.org/wiki/Soundex for a full definition of the Soundex algorithm) that will group phonetically similar names. The code for the SoundexEqualityComparer is shown in Listing 5. Soundex is an age-old algorithm that computes a reliable four character string value based on the phonetics of a given word; an example would be “Katie” is phonetically identical to “Katy.”

The approach for building the Soundex equality operator is to

  1. Code the Soundex algorithm to return a four-character string result representing phonetic sounding given a string input. The form of the Soundex code is a single character, followed by three numerical digits, for example A123 or V456.

  2. Implement the GetHashCode for a given string. This will call the Soundex method and then convert the Soundex code to an integer value. It builds the integer by using the ASCII value of the character, multiplying it by 1000 and then adding the three digit suffix of the Soundex code to this number, for example A123 would become, (65 × 1000) + 123 = 65123.

  3. Implement the Equals method by calling GetHashCode on both input arguments x and y and then comparing the return integer results. Return true if the hash codes match (GetHashCode can be used in this implementation of overloading the Equals operator because it is known that the Soundex algorithm implementation returns a unique value—this is not the case with other GetHashCode implementations).

Care must be taken for null values and empty strings in deciding what behavior you want. I decided that I wanted null or empty string entries to be in one group, but this null handling logic should be considered for each specific implementation. (Maybe an empty string should be in a different group than null entries; it really depends on the specific situation.)

Listing 5. The custom SoundexEqualityComparer allows phonetically similar sounding strings to be easily grouped
public class SoundexEqualityComparer
: IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return GetHashCode(x) == GetHashCode(y);
}

public int GetHashCode(string obj)
{
// E.g. convert soundex code A123,
// to an integer: 65123
int result = 0;

string s = soundex(obj);
if (string.IsNullOrEmpty(s) == false)
result = Convert.ToInt32(s[0]) * 1000 +
Convert.ToInt32(s.Substring(1, 3));

return result;
}

private string soundex(string s)
{
// Algorithm as listed on
// http://en.wikipedia.org/wiki/Soundex.
// builds a string code in the format:
// [A-Z][0-6][0-6][0-6]
// based on the phonetic sound of the input.

if (String.IsNullOrEmpty(s))
return null;

StringBuilder result =
new StringBuilder(s.Length);

// As long as there is at least one character,
// then we can proceed
string source = s.ToUpper().Replace(" ", "");

// add the first character, then loop the
// string mapping as we go
result.Append(source[0]);
char previous = '0';

for (int i = 1; i < source.Length; i++)
{
// map to the soundex numeral
char mappedTo = '0';
char thisChar = source[i];

if ("BFPV".Contains(thisChar))
mappedTo = '1';
else if ("CGJKQSXZ".Contains(thisChar))
mappedTo = '2';
else if ("DT".Contains(thisChar))
mappedTo = '3';
else if ('L' == thisChar)
mappedTo = '4';
else if ("MN".Contains(thisChar))
mappedTo = '5';
else if ('R' == thisChar)
mappedTo = '6';

// ignore adjacent duplicates and
// non-matched characters
if (mappedTo != previous && mappedTo != '0')
{
result.Append(mappedTo);
previous = mappedTo;
}
}

while (result.Length < 4) result.Append("0");

return result.ToString(0, 4);
}
}


 

Listing 6 demonstrates using a custom equality comparer (in this case SoundexEqualityComparer). The Console output is shown in Output 3.

Listing 6. Calling code of the new SoundexEqualityComparer used for grouping phonetically similar names—see Output 3
string[] names = new string[] { "Janet", "Janette", "Joanne",
"Jo-anne", "Johanne", "Katy", "Katie", "Ralph", "Ralphe" };

var q = names.GroupBy(s => s,
new SoundexEqualityComparer());

foreach (var group in q)
{
Console.WriteLine(group.Key);
foreach (var name in group)
Console.WriteLine(" - {0}", name);
}

Output 3.
Janet
- Janet
- Janette
Joanne
- Joanne
- Jo-anne
- Johanne
Katy
- Katy
- Katie
Ralph
- Ralph
- Ralphe
Other  
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Management Best Practices and Approaches (part 4)
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Management Best Practices and Approaches (part 3)
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Management Best Practices and Approaches (part 2)
  •  Moving into SAP Functional Development : Gaining Control of Change Control - Change Management Best Practices and Approaches (part 1)
  •  Moving into SAP Functional Development : Gaining Control of Change Control - An Overview of Change Management
  •  Performing mySAP.com Component Installations : Addressing General mySAP Post-Installation Tasks
  •  Programming .NET Components : Working with Threads (part 5) - Thread States, Thread Priority and Scheduling
  •  Programming .NET Components : Working with Threads (part 4) - Aborting a Thread
  •  Programming .NET Components : Working with Threads (part 3) - Blocking Threads
  •  Programming .NET Components : Working with Threads (part 2) - Creating Threads
  •  
    Most View
    Microsoft SharePoint 2010 Web Applications : Presentation Layer Overview - Ribbon (part 1)
    The Cyber-athletic Revolution – E-sports’ Era (Part 1)
    Windows Server 2003 : Implementing Software Restriction Policies (part 4) - Implementing Software Restriction Policies - Creating a Path Rule, Designating File Types
    Sql Server 2012 : Hierarchical Data and the Relational Database - Populating the Hierarchy (part 1)
    Two Is Better Than One - WD My Cloud Mirror
    Programming ASP.NET 3.5 : Data Source-Based Data Binding (part 3) - List Controls
    Windows 8 : Configuring networking (part 5) - Managing network settings - Understanding the dual TCP/IP stack in Windows 8, Configuring name resolution
    Nikon Coolpix A – An Appealing Camera For Sharp Images (Part 2)
    Canon PowerShot SX240 HS - A Powerful Perfection
    LG Intuition Review - Skirts The Line Between Smartphone And Tablet (Part 2)
    Popular Tags
    Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Biztalk Exchange Server Microsoft LynC Server Microsoft Dynamic Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Indesign Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe After Effects Adobe Photoshop Adobe Fireworks Adobe Flash Catalyst Corel Painter X CorelDRAW X5 CorelDraw 10 QuarkXPress 8 windows Phone 7 windows Phone 8 BlackBerry Android Ipad Iphone iOS
    Top 10
    Review : Acer Aspire R13
    Review : Microsoft Lumia 535
    Review : Olympus OM-D E-M5 Mark II
    TomTom Runner + MultiSport Cardio
    Timex Ironman Run Trainer 2.0
    Suunto Ambit3 Peak Sapphire HR
    Polar M400
    Garmin Forerunner 920XT
    Sharepoint 2013 : Content Model and Managed Metadata - Publishing, Un-publishing, and Republishing
    Sharepoint 2013 : Content Model and Managed Metadata - Content Type Hubs