1.What do the master class and the output class do?

Ans: The master class is defined to update the master, that is, the JobTracker, and the output class is defined to write data to the output location.

2.What is the input type/format in MapReduce by default?

Ans: By default, the input type in MapReduce is ‘text’.

3.Is it mandatory to set input and output type/format in MapReduce?

Ans: No, it is not mandatory to set the input and output type/format in MapReduce. By default, the cluster takes the input and the output type as ‘text’.

4.What is Streaming?

Ans: Streaming is a feature of the Hadoop framework that lets us write MapReduce programs in any programming language that can read from standard input and write to standard output. It could be Perl, Python or Ruby; it need not be Java. However, customization in MapReduce can only be done using Java and not any other programming language.
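As a rough illustration of the standard input/standard output contract, here is a minimal word-count sketch in Python. It assumes tab-separated key/value lines and assumes the reducer receives its input already sorted by key. The names `map_lines`, `reduce_lines` and `main` are hypothetical helpers made up for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(role, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: read records from stdin, write results to stdout."""
    step = map_lines if role == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")
```

In a real script, `main("map")` or `main("reduce")` would be called at the bottom of the file, so that the same program can act as either the mapper or the reducer executable.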

5.What is a Combiner?

Ans: A ‘Combiner’ is a mini reducer that performs the local reduce task. It receives the input from the mapper on a particular node and sends the output to the reducer. Combiners help in enhancing the efficiency of MapReduce by reducing the quantum of data that is required to be sent to the reducers.
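The saving can be seen in a small simulation. This is only a sketch in plain Python, not Hadoop code: the two "nodes", their input lines and the function names are made up for illustration. It counts how many records would be sent to the reducer with and without a combiner.

```python
from collections import Counter


def mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combiner(pairs):
    """Local reduce: sum the values per key on a single node."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reducer(pairs):
    """Final reduce over everything received from all nodes."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input: the lines that each node's mapper processes.
node_inputs = {"node1": ["a b a a"], "node2": ["b b a"]}


def run(use_combiner):
    """Return (records sent to the reducer, final result)."""
    shuffled = []
    for lines in node_inputs.values():
        local = [pair for line in lines for pair in mapper(line)]
        shuffled.extend(combiner(local) if use_combiner else local)
    return len(shuffled), reducer(shuffled)


sent_plain, result_plain = run(use_combiner=False)
sent_combined, result_combined = run(use_combiner=True)
```

With this input, 7 records are sent without the combiner and 4 with it, while the final counts are identical. This works because summing is associative and commutative, so partial sums can be merged safely.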

6.What is the difference between an HDFS Block and Input Split?

Ans: An HDFS Block is the physical division of the data, and an Input Split is the logical division of the data.
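The difference shows up when a record straddles a block boundary. The sketch below is a simplified model in plain Python, not Hadoop's actual reader: blocks are cut at fixed byte offsets, while each split hands out whole lines only. The sample data, the tiny block size and the function names are assumptions made for illustration.

```python
def physical_blocks(data, block_size):
    """Cut the bytes at fixed offsets, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def logical_split_records(data, start, length):
    """Return the whole lines that belong to the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the first line: the previous split reads past its end to finish it.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # A line that starts at or before `end` belongs to this split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
block_size = 10  # unrealistically small, just to force records across boundaries

blocks = physical_blocks(data, block_size)
splits = [
    logical_split_records(data, offset, block_size)
    for offset in range(0, len(data), block_size)
]
```

Here the first block ends in the middle of "bravo", but the first split still delivers "bravo" as a complete record, and no record is read twice or lost across the splits.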

7.What does the JobConf class do?

Ans: MapReduce needs to logically separate different jobs running on the same cluster. The JobConf class holds the job-level settings, such as declaring a job in a real environment. It is recommended that the job name be descriptive and represent the type of job being executed.

8.What does conf.setMapperClass do?

Ans: conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating key-value pairs out of the mapper.

9.What do sorting and shuffling do?

Ans: Sorting and shuffling are responsible for creating a unique key and a list of values. Bringing similar keys together at one location is known as Sorting, and the process by which the intermediate output of the mapper is sorted and sent across to the reducers is known as Shuffling.
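The effect of the two steps can be sketched in plain Python. This is a simulation only, not Hadoop code: the CRC-based `partition` function, the sample mapper outputs and the reducer count are assumptions chosen for illustration. Each key is routed to one reducer, sorted there, and presented as a unique key with its list of values.

```python
import zlib
from itertools import groupby


def partition(key, num_reducers):
    """Pick a reducer for a key (a stable hash, assumed here for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route every (key, value) pair to its reducer, then sort and group by key."""
    buckets = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)].append((key, value))

    reducer_inputs = []
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])  # sorting: equal keys become adjacent
        grouped = [
            (key, [value for _, value in group])
            for key, group in groupby(bucket, key=lambda kv: kv[0])
        ]
        reducer_inputs.append(grouped)
    return reducer_inputs


# Hypothetical intermediate output from two mappers.
mapper_outputs = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
reducer_inputs = shuffle_and_sort(mapper_outputs, num_reducers=2)
```

Whatever reducer a key lands on, it appears exactly once there, with all of its values collected from every mapper.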

10.What does a split do?

Ans: Before the data is transferred from its hard disk location to the map method, there is a phase or method called the ‘Split Method’. The Split method pulls a block of data from HDFS to the framework. The Split class does not write anything; it reads data from the block and passes it to the mapper. By default, Split is taken care of by the framework. By default, the split size is equal to the block size, and the Split method is used to divide a block into a bunch of splits.