- March 26, 2017 at 4:32 pm #3085
Mapper Job : The Mapper class performs the Map task of the MR job. This is the first stage of the MR program. It receives each input split as key/value pairs (k1, v1) and transforms them, producing output as a list of key/value pairs, list(k2, v2). It works on each record's value, breaking it into tokens based on a delimiter. The mapper's output (also called the intermediate result, stored by default in a temp directory on the local file system) is transformed by the shuffle and sort mechanism into (k2, list(v2)) and handed to the reducer as input.
==> To define the mapper class, we implement the org.apache.hadoop.mapred.Mapper interface in the old (Hadoop 1.x) API, or extend the org.apache.hadoop.mapreduce.Mapper class in the new (Hadoop 2.x) API.
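To make the (k1, v1) -> list(k2, v2) transformation concrete, here is a minimal plain-Java sketch of what a word-count map step does, without the Hadoop classes themselves: k1 is the byte offset of a line, v1 is the line text, and the mapper emits one (word, 1) pair per token. The class and method names are illustrative, not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapperSketch {
    // Plain-Java stand-in for the map phase: takes one input record
    // (k1 = byte offset, v1 = line of text) and emits list(k2, v2)
    // pairs by tokenizing the line on whitespace (the delimiter).
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1)); // (k2 = word, v2 = 1)
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One input record produces four intermediate pairs.
        System.out.println(map(0L, "hadoop map reduce hadoop"));
    }
}
```

In real Hadoop code the emit happens through the framework (OutputCollector in the old API, Context.write in the new one) rather than by returning a list, but the key/value shapes are the same.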
Reducer Job : The Reducer class performs the Reduce task of the MR job. The reducer takes input in the form of key/value pairs (k2, list(v2)) as produced by the map, shuffle and sort operations. It then works on the sorted values of each key to produce the final result, list(k3, v3), which is returned to the job client. Once the reducer tasks complete successfully, the MR job is declared complete.
==> To define the reducer class, we implement the org.apache.hadoop.mapred.Reducer interface in the old (Hadoop 1.x) API, or extend the org.apache.hadoop.mapreduce.Reducer class in the new (Hadoop 2.x) API.
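Continuing the word-count illustration, the shuffle/sort and reduce steps described above can be sketched in plain Java as well: shuffle groups the mapper's list(k2, v2) output into sorted (k2, list(v2)) pairs, and reduce sums each value list to produce the final (k3, v3) result. Again, these names are illustrative only; in Hadoop the framework performs the shuffle and sort between the map and reduce phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerSketch {
    // Plain-Java stand-in for shuffle and sort: group the mapper's
    // list(k2, v2) output into (k2, list(v2)), sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Reduce phase: (k2, list(v2)) -> (k3, v3); here k3 is the word
    // and v3 is its total count across all input splits.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = Arrays.asList(
            new SimpleEntry<>("hadoop", 1),
            new SimpleEntry<>("map", 1),
            new SimpleEntry<>("hadoop", 1));
        System.out.println(reduce(shuffle(mapped)));
    }
}
```

The grouping step is exactly why the reducer sees each key only once, with all of that key's values collected into one iterable.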