Parallel Programming with Microsoft Visual Studio 2010 : Using the MapReduce Pattern (part 2)

11/21/2013 7:43:25 PM

A Word Count Example

There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.

– William Shakespeare

Shakespeare is timeless. He also tends to use many of the same words in his various works. This makes Shakespeare ideal for a word count example. In addition, this section will provide a more complete demonstration of using the MapReduce class.

This example uses four common Shakespearean sonnets, which you can find in many places online. The goal is to count the instances of every word across the four sonnets. Small words, such as a, be, and we, would clutter the results, so they are excluded from the list. An overload of the MapReduce.Map method supports this through its Filter parameter, which is a function delegate. The Filter function receives a key-value pair. If it returns true, the entry is added to the intermediate collection; if it returns false, the entry is omitted.

The source collection consists of the file name and directory of each of the four sonnets. It is used to initialize an instance of the MapReduce class.

Tuple<string, string>[] sonnets = new Tuple<string, string>[] {
    new Tuple<string, string>("Sonnet 1.txt", @"C:\shakespeare"),
    new Tuple<string, string>("Sonnet 2.txt", @"C:\shakespeare"),
    new Tuple<string, string>("Sonnet 3.txt", @"C:\shakespeare"),
    new Tuple<string, string>("Sonnet 4.txt", @"C:\shakespeare") };
MapReduce<string, string> wordCount = new MapReduce<string, string>(sonnets);

The MapReduce.Map method maps each file name to a set of per-word counts. The mapping function performs the following steps:

  1. Read the text from the sonnets.

  2. Define word delimiters.

  3. Create a Dictionary object. For each word, check whether the word is in the dictionary. If it is not, add the word with a count of 1. If it is, increment the existing count. When the process completes, return the values portion of the dictionary as the intermediate collection, which holds the count of each word for each file.

Here is the code for the word count example.

IEnumerable<Tuple<string, int>> wordCollection;
wordCount.Map<string, int>((input) =>
{
    // input.Item1 is the file name; input.Item2 is the directory.
    string data;
    using (StreamReader reader = new StreamReader(Path.Combine(input.Item2, input.Item1)))
    {
        data = reader.ReadToEnd();
    }
    // Split on whitespace and punctuation, discarding empty entries.
    string[] words = data.Split(
        new[] {' ', '.', ',', ';', ':', '=', '+', '-', '*', ')', '(', '!', '#', '$', '\n', '\r'},
        StringSplitOptions.RemoveEmptyEntries);
    Dictionary<string, Tuple<string, int>> rawCount =
        new Dictionary<string, Tuple<string, int>>();
    foreach (var word in words)
    {
        Tuple<string, int> value;
        if (rawCount.TryGetValue(word, out value))
        {
            // Word seen before: store the incremented count.
            rawCount[word] = new Tuple<string, int>(word, value.Item2 + 1);
        }
        else
        {
            // First occurrence of the word.
            rawCount.Add(word, new Tuple<string, int>(word, 1));
        }
    }
    return rawCount.Values;
},

The Filter function follows the mapping function in the same call. Words shorter than three characters are excluded from the final intermediate collection.

(key, value) =>
{
    // Keep only words of three or more characters.
    return key.Length >= 3;
},
out wordCollection);
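
If the MapReduce class is not at hand, the map and filter steps for a single document can be approximated with plain LINQ. The following is only a sketch under that assumption: MapSketch and CountWords are hypothetical names, the text is passed in as a string rather than read from a file, the delimiter list is shortened, and the filter is applied inline instead of through a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapSketch
{
    // Hypothetical helper: counts the words in one document and drops
    // words shorter than minLength, mirroring the map and filter steps.
    public static List<Tuple<string, int>> CountWords(string text, int minLength)
    {
        char[] delimiters = { ' ', '.', ',', ';', ':', '!', '\n', '\r' };
        return text
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => word.Length >= minLength)      // filter step
            .GroupBy(word => word)                        // one group per distinct word
            .Select(g => Tuple.Create(g.Key, g.Count()))  // per-document count
            .ToList();
    }

    public static void Main()
    {
        // Sample text, chosen for illustration only.
        foreach (var pair in CountWords("to be, or not to be: that is the question", 3))
        {
            Console.WriteLine("{0} {1}", pair.Item1, pair.Item2);
        }
    }
}
```

Like the listing above, this sketch is case sensitive, so "The" and "the" are counted as different words.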

The MapReduce.Reduce method is simple. The reduction function receives each word together with its per-file counts and sums them, producing the total count of that word across the four files.

IEnumerable<Tuple<string, int>> reduction = wordCount.Reduce(
    wordCollection,
    (key, values) =>
    {
        // Sum the per-file counts for this word.
        return values.Sum();
    }
);
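
Conceptually, the reduction groups the intermediate pairs by key and then collapses each group to a single value. A rough LINQ equivalent follows. ReduceSketch and SumByKey are hypothetical names, the sample counts are invented, and this is not the implementation of the MapReduce class used in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReduceSketch
{
    // Hypothetical helper: groups (word, count) pairs by word and sums the counts.
    public static Dictionary<string, int> SumByKey(IEnumerable<Tuple<string, int>> pairs)
    {
        return pairs
            .GroupBy(pair => pair.Item1, pair => pair.Item2)
            .ToDictionary(group => group.Key, group => group.Sum());
    }

    public static void Main()
    {
        // Sample per-file counts, invented for illustration only.
        var intermediate = new[]
        {
            Tuple.Create("love", 2),
            Tuple.Create("time", 1),
            Tuple.Create("love", 3)
        };
        foreach (var entry in SumByKey(intermediate))
        {
            Console.WriteLine("{0} {1}", entry.Key, entry.Value);
        }
    }
}
```

Here the two "love" entries, which stand in for counts from two different files, collapse into a single total, which is the same role the values.Sum() lambda plays in the Reduce call.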

Lastly, you can show the results.

foreach (var item in reduction)
{
    Console.WriteLine("{0} {1}", item.Item1, item.Item2);
}

Here is the partial output from the Word Count example.

[Figure: partial output from the Word Count example]