Parallel Programming with Microsoft Visual Studio 2010 : Using the MapReduce Pattern (part 1)

11/21/2013 7:42:48 PM

MapReduce is a well-known pattern introduced in 2004 in the paper “MapReduce: Simplified Data Processing on Large Clusters” by Jeffrey Dean and Sanjay Ghemawat, available at http://labs.google.com/papers/mapreduce-osdi04.pdf. The pattern is designed to process and reduce vast amounts of data distributed across multiple computers. However, it is also applicable on a much smaller scale, such as a single modern multicore computer. The MapReduce pattern combines data parallelism, dependencies, and reduction.

There are three collections in the MapReduce pattern. The first collection is the input for the MapReduce pattern. It is a collection of key and value pairs. You perform some transformation on the input collection to create a second, intermediate collection, which is a non-unique collection of key and value pairs. The third collection is a reduction of the non-unique keys from the intermediate collection.

As an example, suppose you count how often each word occurs in a set of files. If the word “apple” appeared in three of the files, there would be three entries with the key “apple” in the intermediate list. The word would be the key, and the value of each entry would be the number of times that word appears in the corresponding file. For the reduction, you want to reduce the non-unique keys to totals, so that “apple” ends up with a single entry holding its count across all files. The following diagram illustrates this example.

[Figure: Using the MapReduce Pattern]
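The word-count example can be sketched with plain LINQ to make the three collections visible. This is only an illustration of the pattern, not the MapReduce class discussed in this article; the file names, file contents, and the names WordCountSketch, MapFile, and Reduce are assumptions made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordCountSketch
{
    // Map: one (file name, text) pair yields a (word, count in that file)
    // pair for every distinct word in the file.
    public static IEnumerable<Tuple<string, int>> MapFile(Tuple<string, string> file)
    {
        return file.Item2
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .Select(g => Tuple.Create(g.Key, g.Count()));
    }

    // Reduce: collapse the non-unique keys of the intermediate collection into totals.
    public static Dictionary<string, int> Reduce(IEnumerable<Tuple<string, int>> intermediate)
    {
        return intermediate
            .GroupBy(pair => pair.Item1, pair => pair.Item2)
            .ToDictionary(g => g.Key, g => g.Sum());
    }

    static void Main()
    {
        // Input collection: hypothetical file names (keys) and contents (values).
        var files = new[]
        {
            Tuple.Create("a.txt", "apple pear apple"),
            Tuple.Create("b.txt", "apple fig"),
            Tuple.Create("c.txt", "pear apple")
        };

        // Intermediate collection: "apple" occurs as a key three times, once per file.
        List<Tuple<string, int>> intermediate =
            files.SelectMany(f => MapFile(f)).ToList();

        // Reduced collection: one entry per word.
        foreach (var entry in Reduce(intermediate))
        {
            Console.WriteLine("{0} = {1}", entry.Key, entry.Value);
        }
    }
}
```

With this sample input, the intermediate collection holds the entries (apple, 2), (apple, 1), and (apple, 1), which the reduction turns into a single total of 4 for “apple”.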

Note

The MapReduce class resides in the ParallelBook namespace. When you create a MapReduce object, you initialize it with a source collection. The MapReduce class has only two methods. The first, MapReduce.Map, is responsible for transforming the source collection to an intermediate collection. The first parameter is the mapping function, which performs the transformation. The last parameter is an out parameter, which is the intermediate collection. The second method is the MapReduce.Reduce method, which accepts and reduces the intermediate collection, which is its first parameter. The next parameter is the reduction operation. The last parameter is the group operation, which groups the keys of the intermediate collection. This is important because the intermediate collection is reduced along group boundaries. The default is to reduce by matching keys. The Map and Reduce methods are exposed separately in the interface to allow multiple reductions of an intermediate collection.

Here is the MapReduce.Map prototype.

public void Map<KEY2, VALUE2>(
    Func<Tuple<KEY, VALUE>, IEnumerable<Tuple<KEY2, VALUE2>>> mapFunc,
    out IEnumerable<Tuple<KEY2, VALUE2>> TupleCollection
)

And here’s the MapReduce.Reduce prototype.

public IEnumerable<Tuple<KEY2, VALUE2>> Reduce<KEY2, VALUE2>(
    IEnumerable<Tuple<KEY2, VALUE2>> intermediate,
    Func<KEY2, VALUE2[], VALUE2> reduceFunc,
    Func<IEnumerable<Tuple<KEY2, VALUE2>>, Dictionary<KEY2, VALUE2[]>> groupFunc = null
)
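The delegate types in the Reduce prototype can be tried out on their own. The following sketch does not call the MapReduce class; it only builds a function with the shape of groupFunc and applies two functions with the shape of reduceFunc to the same groups, which mirrors the reason Map and Reduce are exposed separately. The names GroupingSketch, GroupIgnoringCase, and ReduceGroups are hypothetical, and grouping keys without regard to case is an assumption chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupingSketch
{
    // Same shape as the groupFunc parameter: collects the values of each key
    // into an array. Keys are upper-cased so "b" and "B" land in one group.
    public static Dictionary<string, int[]> GroupIgnoringCase(
        IEnumerable<Tuple<string, int>> intermediate)
    {
        return intermediate
            .GroupBy(pair => pair.Item1.ToUpperInvariant(), pair => pair.Item2)
            .ToDictionary(g => g.Key, g => g.ToArray());
    }

    // Applies a delegate shaped like reduceFunc to every group.
    public static Dictionary<string, int> ReduceGroups(
        Dictionary<string, int[]> groups,
        Func<string, int[], int> reduceFunc)
    {
        return groups.ToDictionary(g => g.Key, g => reduceFunc(g.Key, g.Value));
    }

    static void Main()
    {
        // A small intermediate collection with non-unique keys.
        var intermediate = new[]
        {
            Tuple.Create("a", 9),
            Tuple.Create("b", 4),
            Tuple.Create("B", 25)
        };

        Dictionary<string, int[]> groups = GroupIgnoringCase(intermediate);

        // Two different reductions over the same intermediate collection.
        Dictionary<string, int> sums = ReduceGroups(groups, (key, values) => values.Sum());
        Dictionary<string, int> maxima = ReduceGroups(groups, (key, values) => values.Max());

        // Prints: sum(B) = 29, max(B) = 25
        Console.WriteLine("sum(B) = {0}, max(B) = {1}", sums["B"], maxima["B"]);
    }
}
```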

This next exercise involves using the MapReduce class. You will create a collection of key and value pairs. The keys are string values, and the values are integers. The mapping operation squares each value of the input collection to produce the intermediate collection. The reduction then sums the values that share the same key.

Create a MapReduce object to square the values of a source collection, and then reduce the collection by summing the values for each key

  1. Create a console project in Visual Studio 2010 for C#. Add a reference to the project for MapReduce.dll. Add using statements to the source code for the System.Threading.Tasks namespace and for the ParallelBook namespace, which contains the MapReduce class.

  2. In the Main method, define and initialize an array of binary tuples for string and integer pairs.

    Tuple<string, int>[] tuples = new Tuple<string, int>[] {
        new Tuple<string, int>("a", 3),
        new Tuple<string, int>("b", 2),
        new Tuple<string, int>("b", 5)
    };

  3. Create an instance of the MapReduce class. In the constructor, initialize the object with the tuples array.

    MapReduce<string, int> letters = new MapReduce<string, int>(tuples);
  4. Now you will transform the source collection. First, define a collection of tuples to hold the intermediate results. Your mapping operation simply squares the value of each tuple and places the results in an out variable.

    IEnumerable<Tuple<string, int>> newmap;
    letters.Map<string, int>((input) =>
    {
        return new Tuple<string, int>[] {
            new Tuple<string, int>(input.Item1, input.Item2 * input.Item2)
        };
    }, out newmap);

  5. Reduce the collection with the MapReduce.Reduce method. Provide the intermediate collection as the input. In the reduction method, sum the totals of each group.

    IEnumerable<Tuple<string, int>> reduction =
        letters.Reduce<string, int>(newmap, (key, values) =>
        {
            int total = 0;
            foreach (var item in values)
            {
                total += item;
            }
            return total;
        });

  6. Display the results, which are returned from the MapReduce.Reduce method. The output should be a = 9 and b = 29, because 3 squared is 9 and the two "b" values contribute 2 squared plus 5 squared, or 4 + 25.

    foreach (var item in reduction)
    {
        Console.WriteLine("{0} = {1}", item.Item1, item.Item2);
    }
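The expected totals from the steps above can be cross-checked without MapReduce.dll. The sketch below performs the same square-then-sum computation sequentially with LINQ; it is a plain equivalent for verification, not how the MapReduce class works internally, and the names CrossCheck and SquareAndSum are hypothetical.

```csharp
using System;
using System.Linq;

static class CrossCheck
{
    // Squares every value, groups the squares by key, and sums each group.
    public static Tuple<string, int>[] SquareAndSum(Tuple<string, int>[] source)
    {
        return source
            .Select(t => Tuple.Create(t.Item1, t.Item2 * t.Item2)) // map
            .GroupBy(t => t.Item1, t => t.Item2)                   // group by key
            .Select(g => Tuple.Create(g.Key, g.Sum()))             // reduce
            .ToArray();
    }

    static void Main()
    {
        var tuples = new[]
        {
            Tuple.Create("a", 3),
            Tuple.Create("b", 2),
            Tuple.Create("b", 5)
        };

        // Prints "a = 9" and then "b = 29".
        foreach (var item in SquareAndSum(tuples))
        {
            Console.WriteLine("{0} = {1}", item.Item1, item.Item2);
        }
    }
}
```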

Here is the entire program.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using ParallelBook; // namespace of the MapReduce class (from MapReduce.dll)

namespace Letters
{
    class Program
    {
        static void Main(string[] args)
        {
            // Source collection: string keys with integer values.
            Tuple<string, int>[] tuples = new Tuple<string, int>[] {
                new Tuple<string, int>("a", 3),
                new Tuple<string, int>("b", 2),
                new Tuple<string, int>("b", 5) };

            MapReduce<string, int> letters = new MapReduce<string, int>(tuples);
            IEnumerable<Tuple<string, int>> newmap;

            // Map: square each value to build the intermediate collection.
            letters.Map<string, int>((input) =>
            {
                return new Tuple<string, int>[] {
                    new Tuple<string, int>(input.Item1, input.Item2 * input.Item2) };
            }, out newmap);

            // Reduce: sum the values that share a key.
            IEnumerable<Tuple<string, int>> reduction =
                letters.Reduce<string, int>(newmap, (key, values) =>
                {
                    int total = 0;
                    foreach (var item in values)
                    {
                        total += item;
                    }
                    return total;
                });

            foreach (var item in reduction)
            {
                Console.WriteLine("{0} = {1}", item.Item1, item.Item2);
            }

            Console.WriteLine("Press enter to <end>.");
            Console.ReadLine();
        }
    }
}
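
If MapReduce.dll is not at hand, the program above can be tried against a stand-in class. The following is a minimal sketch that only mirrors the constructor usage and the two prototypes shown earlier. The body of each method is an assumption: it is not the implementation shipped in MapReduce.dll, and using PLINQ for the map step and GroupBy for the default grouping are choices made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ParallelBook
{
    // Hypothetical stand-in, NOT the class from MapReduce.dll.
    public class MapReduce<KEY, VALUE>
    {
        private readonly IEnumerable<Tuple<KEY, VALUE>> source;

        public MapReduce(IEnumerable<Tuple<KEY, VALUE>> source)
        {
            this.source = source;
        }

        public void Map<KEY2, VALUE2>(
            Func<Tuple<KEY, VALUE>, IEnumerable<Tuple<KEY2, VALUE2>>> mapFunc,
            out IEnumerable<Tuple<KEY2, VALUE2>> TupleCollection)
        {
            // PLINQ applies mapFunc to the source items in parallel.
            // AsOrdered keeps the source order; ToList forces evaluation now.
            TupleCollection = source
                .AsParallel()
                .AsOrdered()
                .SelectMany(mapFunc)
                .ToList();
        }

        public IEnumerable<Tuple<KEY2, VALUE2>> Reduce<KEY2, VALUE2>(
            IEnumerable<Tuple<KEY2, VALUE2>> intermediate,
            Func<KEY2, VALUE2[], VALUE2> reduceFunc,
            Func<IEnumerable<Tuple<KEY2, VALUE2>>, Dictionary<KEY2, VALUE2[]>> groupFunc = null)
        {
            if (groupFunc == null)
            {
                // Default grouping: collect the values of matching keys.
                groupFunc = pairs => pairs
                    .GroupBy(p => p.Item1, p => p.Item2)
                    .ToDictionary(g => g.Key, g => g.ToArray());
            }

            // Reduce each group to a single (key, value) pair.
            return groupFunc(intermediate)
                .Select(g => Tuple.Create(g.Key, reduceFunc(g.Key, g.Value)))
                .ToList();
        }
    }
}
```

To use the sketch, add it to the console project as a separate source file instead of referencing MapReduce.dll. The order in which the reduced pairs are returned is not guaranteed here, because it depends on dictionary enumeration order.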