1. What is the difference between a Mapper and a map task?
Answer: Mapper is a class used in a MapReduce program. The Mapper's job is to read records from its input split and generate intermediate key-value pairs.
A map task is the runtime instance that actually executes the Mapper's map() logic over one input split; the framework launches one map task per split.
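For illustration, the map() logic can be sketched in plain Java (a hypothetical word-count example with no Hadoop dependencies; a real Mapper would extend org.apache.hadoop.mapreduce.Mapper):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class WordCountMapSketch {
    // Emulates one map() call: a line of text in, intermediate (word, 1) pairs out.
    static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // intermediate key-value pair
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("Hello Hadoop hello")); // [hello=1, hadoop=1, hello=1]
    }
}
```

In a real job, these intermediate pairs would then be shuffled and sorted before reaching the reducer.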
2. Difference between the old API and the new API?
Answer:
OLD API:
a. The old API uses Mapper and Reducer as interfaces (they still exist in the new API as well).
b. The old API can still be found in the org.apache.hadoop.mapred package.
c. Mappers can control execution flow by writing a MapRunnable, but no equivalent exists for reducers.
NEW API:
a. The new API uses Mapper and Reducer as abstract classes, so a method (with a default implementation) can be added without breaking existing implementations of the class.
b. The new API is in the org.apache.hadoop.mapreduce package.
c. The new API allows both mappers and reducers to control the execution flow by overriding the run() method.
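The class-vs-interface point can be shown with a minimal sketch (simplified stand-ins, not the real Hadoop types): because Mapper is an abstract class in the new API, the framework can ship a default run() loop that subclasses inherit or override.

```java
// Simplified stand-in for the new-API design: Mapper as an abstract class
// whose run() has a default implementation.
abstract class SketchMapper {
    StringBuilder out = new StringBuilder();

    abstract String map(String record);

    // Default execution loop. Because this is a class, a method like this
    // can be added later without breaking existing subclasses.
    void run(Iterable<String> records) {
        for (String r : records) out.append(map(r)).append(';');
    }
}

public class NewApiSketch {
    // A subclass only supplies map(); it inherits run() unchanged.
    static class UpperMapper extends SketchMapper {
        String map(String record) { return record.toUpperCase(); }
    }

    static String demo() {
        UpperMapper m = new UpperMapper();
        m.run(java.util.List.of("a", "b"));
        return m.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // A;B;
    }
}
```

With the old interface-based API, adding run() to the Mapper interface would have broken every existing implementation, which is the motivation described above.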
3. What are shuffling and sorting?
Answer: Shuffling is the process of transferring map output from the mappers to the reducers; it is necessary because without it the reducers would have no input.
Sorting saves time for the reducer by making it easy to tell when a new reduce group should start: a new reduce call begins whenever the next key in the sorted input differs from the previous one.
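A small plain-Java sketch (hypothetical data, no Hadoop dependencies) of why sorted input lets the reducer spot group boundaries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SortGroupSketch {
    // Sorts intermediate pairs by key, then sums each run of equal keys,
    // starting a new group whenever the key changes (as the reducer does).
    static List<String> shuffleSortReduce(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey()); // sort phase: equal keys become adjacent
        List<String> out = new ArrayList<>();
        String prev = null;
        int sum = 0;
        for (Map.Entry<String, Integer> p : sorted) {
            if (prev != null && !p.getKey().equals(prev)) {
                out.add(prev + "=" + sum); // key changed: previous group is done
                sum = 0;
            }
            prev = p.getKey();
            sum += p.getValue();
        }
        if (prev != null) out.add(prev + "=" + sum);
        return out;
    }

    public static void main(String[] args) {
        // Pairs as they might arrive, unsorted, from several mappers.
        System.out.println(shuffleSortReduce(List.of(
                Map.entry("cat", 1), Map.entry("ant", 1),
                Map.entry("cat", 1), Map.entry("bee", 1)))); // [ant=1, bee=1, cat=2]
    }
}
```

Without the sort, the reducer could not finish a group on a key change and would have to buffer every key it has ever seen.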
4. Anatomy of a MapReduce job run?
a. The job is submitted to the Hadoop cluster.
b. The job is initialised.
c. The initialised job is split into map and reduce tasks.
d. The JobTracker assigns tasks to TaskTrackers.
e. Tasks are executed in the distributed environment while the progress and status of the job are tracked.
f. Execution continues until all tasks are completed.
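The flow above can be mimicked with a toy simulation (a plain Java thread pool standing in for the cluster; all names here are hypothetical, not Hadoop APIs):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class JobRunSketch {
    // Runs one "task" per split on a small thread pool (standing in for
    // TaskTrackers) and returns how many tasks completed.
    static int runJob(List<String> splits) {
        AtomicInteger completed = new AtomicInteger();           // progress tracking (step e)
        ExecutorService cluster = Executors.newFixedThreadPool(2);
        for (String split : splits) {                            // step d: assign tasks out
            cluster.submit(() -> completed.incrementAndGet());   // step e: task executes
        }
        cluster.shutdown();
        try {
            cluster.awaitTermination(5, TimeUnit.SECONDS);       // step f: wait until all finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Step c produced three splits; each becomes one task.
        System.out.println(runJob(List.of("split-0", "split-1", "split-2")) + " tasks done");
    }
}
```

The real job run adds the pieces the toy omits: job submission to the JobTracker, input splitting against HDFS, and periodic heartbeat-based status reporting from each TaskTracker.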