How many types of sorting are there?
There are three types of sorting in a MapReduce (MR) job:
A. Partial Sorting
B. Total Sorting
C. Secondary Sorting
Please explain secondary sorting.
Answer ==> Secondary sorting is sorting applied to the mapper's output (list[k2,v2]) to optimize the reduce phase. The sort is based on a composite key, so that the values associated with each key are ordered in ascending or descending order. The sorted list of key/value pairs (list[k3,v3]) is then passed to the reducer as input.
Composite Key = (key + value)
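The idea of the composite key can be sketched in plain Python, outside of Hadoop. This is only an illustration under assumptions: the names CompositeKey, sort_order and secondary_sort are hypothetical, and the two functions stand in for the roles that a sort comparator and a grouping comparator play in a real MR job.

```python
from itertools import groupby
from typing import NamedTuple


class CompositeKey(NamedTuple):
    """Hypothetical composite key: the natural key plus the value to sort on."""
    natural_key: str
    value: int


def sort_order(ck: CompositeKey):
    # Stand-in for a sort comparator: natural key ascending, value descending.
    return (ck.natural_key, -ck.value)


def secondary_sort(pairs):
    """Return {natural_key: [values in descending order]} from (key, value) pairs."""
    composite = [CompositeKey(k, v) for k, v in pairs]
    composite.sort(key=sort_order)
    # Stand-in for a grouping comparator: group on the natural key only,
    # so all values of a key are handed over together, already ordered.
    return {
        key: [ck.value for ck in group]
        for key, group in groupby(composite, key=lambda ck: ck.natural_key)
    }


# Illustrative mapper output, made up for this sketch.
mapper_output = [("b", 4), ("a", 7), ("b", 9), ("a", 2), ("a", 5)]
grouped = secondary_sort(mapper_output)
print(grouped)  # {'a': [7, 5, 2], 'b': [9, 4]}
```

Sorting uses the whole composite key, while grouping uses only the natural key. That split is what lets the values of each key arrive in order.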
This is another solution to the problem of finding the maximum or minimum value for each key: it avoids iterating over all the values of every key in the reduce phase to find the desired value.
For example, suppose we want to find the maximum temperature for each year from input files containing temperatures recorded over several years. With a composite key of (year + temperature) sorted on temperature in descending order, the reducer only needs to pick the first value for each key, because that value is already the maximum.
Hence, the work done in the reducer is now proportional to n*1, or n, instead of n*m, where n is the number of keys and m is the number of values per key.
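The difference in reducer work can be illustrated with a small plain-Python comparison. It is a sketch, not Hadoop code: the records are made up, and max_by_scanning and max_by_secondary_sort are hypothetical names that only imitate the two reducer strategies while counting how many values each one reads.

```python
from itertools import groupby

# Made-up (year, temperature) records, not real measurements.
records = [
    (1950, 22), (1949, 78), (1950, -11), (1949, 111), (1950, 0), (1951, 35),
]


def max_by_scanning(records):
    """Reducer without secondary sort: reads every value of every key."""
    by_year = {}
    for year, temp in records:
        by_year.setdefault(year, []).append(temp)
    result, values_read = {}, 0
    for year, temps in by_year.items():
        best = temps[0]
        for t in temps:
            values_read += 1
            best = max(best, t)
        result[year] = best
    return result, values_read


def max_by_secondary_sort(records):
    """Reducer with secondary sort: reads only the first value of each key."""
    # Imitates the framework's sort: year ascending, temperature descending.
    ordered = sorted(records, key=lambda r: (r[0], -r[1]))
    result, values_read = {}, 0
    for year, group in groupby(ordered, key=lambda r: r[0]):
        _, temp = next(group)  # first value is already the maximum
        values_read += 1
        result[year] = temp
    return result, values_read


scan_result, scan_reads = max_by_scanning(records)
sorted_result, sorted_reads = max_by_secondary_sort(records)
print(scan_result == sorted_result)  # True
print(scan_reads, sorted_reads)      # 6 3
```

Both approaches give the same maximum per year, but the second one reads one value per key. The sort itself is not free: the cost moves into the sort step that runs before the reduce phase.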