MapReduce
This article is about MapReduce, a programming model introduced by Google for processing big data in a distributed manner.
MapReduce is a processing technique and a programming model for distributed computing. The MapReduce algorithm consists of two main tasks, Map and Reduce. Map takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). Reduce takes those tuples as input and performs a summary operation on them.
A classic use of MapReduce is counting how often each word occurs in an essay. More generally, it is used to process big data in a distributed fashion.
MapReduce contains two functions, Map and Reduce. The Map function processes records sequentially and independently of one another, while the Reduce function processes the values of records that share a key together.
In the diagram above, the input is “Welcome to Hadoop Class Hadoop is good Hadoop is bad”. The input is split across multiple map tasks so that the whole data set can be processed in parallel, and therefore quickly.
After the input is split across multiple map tasks, the map function converts each part of the data set into key/value pairs.
Here the records (words) are taken as keys and their occurrences as values. Because the Map function processes each record sequentially and independently, every emitted key carries its own value, regardless of any other record.
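The map step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names split_input, map_fn and num_map_tasks are hypothetical, the choice of three map tasks is arbitrary, and each word is assumed to be emitted with an occurrence value of 1.

```python
def split_input(text, num_map_tasks):
    """Split the text into roughly equal chunks of words, one chunk per map task."""
    words = text.split()
    # Ceiling division, with a floor of 1 so the range step is never zero.
    chunk_size = max(1, -(-len(words) // num_map_tasks))
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def map_fn(chunk):
    """Emit one (word, 1) pair per word; no word depends on any other."""
    return [(word, 1) for word in chunk]


text = "Welcome to Hadoop Class Hadoop is good Hadoop is bad"
splits = split_input(text, 3)                  # assumed: three map tasks
mapped = [map_fn(chunk) for chunk in splits]   # each chunk could run on its own worker

for task_id, pairs in enumerate(mapped):
    print(task_id, pairs)
```

Note that the word "Hadoop" appears as several separate ("Hadoop", 1) pairs at this stage; nothing is summed until the reduce step.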
After the map function completes, the reduce function processes and merges all intermediate values associated with each key.
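As a rough sketch of this merge, the intermediate pairs can first be grouped by key and then summed per key. The helper names shuffle and reduce_fn, and the sample list of intermediate pairs, are assumptions made for illustration.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group the intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_fn(key, values):
    """Merge all values that belong to one key into a single total."""
    return key, sum(values)


# Hypothetical intermediate output of the map tasks.
intermediate = [("Hadoop", 1), ("is", 1), ("good", 1),
                ("Hadoop", 1), ("is", 1), ("bad", 1)]

grouped = shuffle(intermediate)
counts = dict(reduce_fn(key, values) for key, values in grouped.items())
print(counts)
```

Grouping before reducing means reduce_fn only ever sees the values of a single key, which is what allows different keys to be handled by different reduce tasks.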
Internally, each key is assigned to exactly one reduce task by partitioning the keys, and the reduce tasks process and merge their intermediate values in parallel.
The partitioning method used here is hash partitioning.
Hash partitioning: a key is assigned to reduce task number hash(key) % (number of reduce tasks). Because the result depends only on the key, all values for the same key reach the same reduce task.
The hash function used here is either MD5 or a random one-to-one hash function.
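A minimal sketch of hash partitioning with MD5, using Python's standard hashlib module. The function name partition and the choice of three reduce tasks are assumptions for illustration, and the linked repository may implement this differently.

```python
import hashlib


def partition(key, num_reduce_tasks):
    """Return the index of the reduce task responsible for this key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Interpret the 128-bit digest as an integer, then take it modulo the task count.
    return int(digest, 16) % num_reduce_tasks


NUM_REDUCE_TASKS = 3  # assumed value
for word in ["Welcome", "to", "Hadoop", "Class", "is", "good", "bad"]:
    print(word, "-> reduce task", partition(word, NUM_REDUCE_TASKS))
```

MD5 is used here instead of Python's built-in hash() because hash() on strings is randomized per process by default, so two workers could disagree about where a key belongs. A digest-based hash gives the same assignment on every machine.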
Finally, the reduce function sums the occurrence values for each key, and the final output is printed.
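Putting the steps together, the whole word count can be sketched in one short, single-process program. It is an illustration rather than a distributed implementation: word_count, partition and the use of two reduce tasks are assumptions, and the reduce tasks run one after another here instead of in parallel.

```python
import hashlib
from collections import defaultdict


def partition(key, num_reduce_tasks):
    """Hash partitioning: choose a reduce task from the MD5 digest of the key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % num_reduce_tasks


def word_count(text, num_reduce_tasks=2):
    # Map: emit one (word, 1) pair per word.
    pairs = [(word, 1) for word in text.split()]

    # Partition: every reduce task keeps its own key -> values table.
    buckets = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        buckets[partition(key, num_reduce_tasks)][key].append(value)

    # Reduce: each task sums the values of its keys; the outputs are then collected.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result


counts = word_count("Welcome to Hadoop Class Hadoop is good Hadoop is bad")
for word in sorted(counts):
    print(word, counts[word])
```

For the example sentence, "Hadoop" is counted 3 times, "is" 2 times, and every other word once. The totals do not depend on the number of reduce tasks, because each key is always routed to exactly one of them.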
GitHub link: https://github.com/tharun143/Map-Reduce