This topic contains 5 replies, has 6 voices, and was last updated by  suchitra mohanty 2 years ago.

Viewing 6 posts - 1 through 6 (of 6 total)
  • Author
    Posts
  • #1871 Reply

    somu s
    Member

    1. What is Hadoop MapReduce?
    2. Explain what a combiner is and when you should use a combiner in a MapReduce job.
    3. What happens when a DataNode fails?
    4. Explain what Speculative Execution is.
    5. Explain what the basic parameters of a Mapper are.

    6. Explain what the function of the MapReduce partitioner is.
    7. Explain what the difference is between an Input Split and an HDFS Block.
    8. Explain what happens in TextInputFormat.
    9. Mention the main configuration parameters that a user needs to specify to run a MapReduce job.
    10. Explain what conf.setMapperClass does.

    #1897 Reply

    Paresh sahare
    Participant

    1. What is Hadoop MapReduce?
    Ans: Hadoop has two core modules:
    1) HDFS
    2) MapReduce: Hadoop MapReduce is a framework for easily writing applications which process huge amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.

    2. Explain what a combiner is and when you should use a combiner in a MapReduce job.

    3. What happens when a DataNode fails?
    Ans: The JobTracker and the NameNode detect the failure. All tasks on the failed node are re-scheduled, and the NameNode replicates the affected user data to another DataNode.

    4. Explain what Speculative Execution is.
    Ans:
    Speculative execution addresses the problem of a single task running unusually slowly for some reason, which drags out the whole job's execution time. When speculative execution is enabled, the JobTracker (in Hadoop 1) or the ApplicationMaster (in Hadoop 2) launches another instance of the same task on a different node; whichever instance finishes first wins and the other is killed. Reduce tasks are a common case for turning speculative execution off, since any duplicate reduce task has to fetch the same map outputs as the original, which can significantly increase network traffic on the cluster.
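
    A minimal sketch of how this can be toggled from the driver's Configuration, assuming the Hadoop 2 property names (the Hadoop 1 equivalents are mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution):

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Leave speculative execution on for map tasks...
    conf.setBoolean("mapreduce.map.speculative", true);
    // ...but turn it off for reduce tasks to avoid duplicate shuffle fetches.
    conf.setBoolean("mapreduce.reduce.speculative", false);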

    5. Explain what the basic parameters of a Mapper are.
    Ans: The four type parameters of a Mapper are its input key, input value, output key and output value types, e.g. Mapper<LongWritable, Text, Text, IntWritable>.
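
    For illustration, a minimal word-count style mapper with those four type parameters (the class and field names here are just placeholders):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input key: byte offset of the line; input value: the line itself.
    // Output key: a word; output value: a count of 1.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }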

    6. Explain what the function of the MapReduce partitioner is.
    Ans:
    The partitioner decides which reducer each intermediate key-value pair is sent to. For example, partitioning the intermediate data by geographic region ensures that requests from the same region are sent to the same reducer instance.

    7. Explain what the difference is between an Input Split and an HDFS Block.
    Ans:
    1) A block is the physical representation of the data.
    2) A split is the logical representation of the data present in the blocks; it is the part of the input processed by a single map task, so each split is processed by one mapper.

    For example, if the data is 128 MB and the block size is 64 MB (default):
    Case 1 – Input split size [64 MB] = block size [64 MB], # of map tasks: 2
    Case 2 – Input split size [32 MB], block size [64 MB], # of map tasks: 4
    Case 3 – Input split size [128 MB], block size [64 MB], # of map tasks: 1
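
    As a sketch, the split size can be influenced from the driver, assuming the new-API FileInputFormat (the 32 MB figure is only an example):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    Job job = Job.getInstance(conf, "split size demo");   // conf is an existing Configuration
    // Ask for splits no larger than 32 MB, so a 128 MB input yields about 4 map tasks.
    FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);
    // Equivalently, set mapreduce.input.fileinputformat.split.maxsize / .minsize
    // directly on the Configuration.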

    8. Explain what happens in TextInputFormat.
    Ans:
    In TextInputFormat, each line of the file becomes a record: the key is simply the byte offset of the line within the file, and the value is the content of the line.

    9. Mention the main configuration parameters that a user needs to specify to run a MapReduce job.
    Ans:
    The job’s input location(s) in the distributed file system.
    The job’s output location in the distributed file system.
    The input format.
    The output format.
    The class containing the map function.
    The class containing the reduce function (optional).
    The JAR file containing the mapper, reducer and driver classes.
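
    These parameters map onto a driver roughly like the following sketch, assuming a mapper and reducer such as the TokenMapper/SumReducer examples elsewhere in this thread (all class names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);          // JAR containing mapper/reducer/driver
            job.setMapperClass(TokenMapper.class);             // class containing the map function
            job.setReducerClass(SumReducer.class);             // class containing the reduce function (optional)
            job.setInputFormatClass(TextInputFormat.class);    // input format
            job.setOutputFormatClass(TextOutputFormat.class);  // output format
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input location
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output location
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }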

    10. Explain what conf.setMapperClass does.

    Ans:
    conf.setMapperClass tells the framework which class to use as the mapper for the job. (If no mapper is set, the default mapper simply passes the input key-value pairs directly to its output, which then goes into the shuffle.)

    #1917 Reply

    faizan0607
    Participant

    Answers –
    1. Hadoop MapReduce is one of the components of the Hadoop framework. MapReduce is also called the heart of Hadoop, as it is responsible for processing the job.

    2. When a MapReduce job is run on a large dataset, the Hadoop mapper generates large chunks of intermediate data that are passed on to the Hadoop reducer for further processing, which can lead to massive network congestion. The MapReduce framework offers a function known as a ‘Combiner’ that can play a crucial role in reducing network congestion.

    3. When a DataNode fails, its blocks are re-replicated (from the remaining copies) to other DataNodes that do not already hold a copy of those blocks.

    4. If a particular task (for example one reading from a slow drive) is taking a long time to complete, Hadoop will create a duplicate of that task on another node. The copy that finishes first is retained and the copies that do not finish first are killed. The slow disk will continue to be used for other tasks, as Hadoop does not have the ability to ‘fail’ a disk; it just keeps it from being used for that particular task.

    These ‘speculative executions’ are noted within the HDFS logs, so if a particularly large number of issues are seen with a particular disk (storage volume), that volume should be investigated from the storage perspective for possible issues.
    Reference – https://kb.netapp.com/support/index?page=content&id=3013312&impressions=false

    5. The basic parameters of a Mapper are –
    <LongWritable, Text, Text, IntWritable>

    6. MapReduce Partitioner –
    It partitions the key-value pairs of the intermediate map output using a user-defined condition, which works like a hash function. The total number of partitions is the same as the number of reducer tasks for the job.
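
    A minimal sketch of a custom Partitioner under the new API (the class name and key/value types are just an assumed example):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            // Hash the key so that every record sharing a key
            // lands in the same reducer partition.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

    // In the driver:  job.setPartitionerClass(WordPartitioner.class);
    //                 job.setNumReduceTasks(4);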

    7. Input Split –
    An input split is the chunk of input presented to a single mapper; put another way, each mapper gets exactly one input split. How the input is split depends on the InputFormat. The default is TextInputFormat, whose RecordReader uses line feeds to break a split into records, i.e. each line becomes a separate record.

    HDFS Blocks –
    Hadoop stores data in the form of blocks. A block is replicated across the cluster as per the replication factor, which is 3 by default. The default block size is 64 MB.
    A file is divided into blocks when it is moved into the cluster. A file can be divided into multiple blocks, but one block can contain data from only one file.

    8. TextInputFormat is an InputFormat for plain text files. Files are broken into lines; either a line feed or a carriage return is used to signal the end of a line. Keys are the positions in the file, and values are the lines of text.

    9. The main configuration parameters that a user needs to specify to run a MapReduce job –
    The job’s input location(s) in the distributed file system.
    The job’s output location in the distributed file system.
    The input format.
    The output format.
    The class containing the map function.
    The class containing the reduce function (optional).
    The JAR file containing the mapper, reducer and driver classes.

    10. conf.setMapperClass sets the mapper class for the job; if none is set, the default mapper just passes the input key-value pairs directly to the output.

    #1923 Reply

    1> MapReduce is a framework using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. It is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce.

    2> Combiners are used to increase the efficiency of a MapReduce program. They aggregate the intermediate map output locally on each mapper. Combiners can help you reduce the amount of data that needs to be transferred across the network to the reducers.
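
    As a sketch, a combiner is wired in from the driver; this assumes a reduce function (like the SumReducer example elsewhere in this thread) whose logic is associative and commutative, so it is safe to reuse as the combiner:

    // In the driver:
    job.setCombinerClass(SumReducer.class);   // runs locally on each mapper's output
    job.setReducerClass(SumReducer.class);    // runs on the merged, shuffled output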

    3> When a DataNode fails, the JobTracker and the NameNode detect the failure. All tasks on the failed node are re-scheduled, and the NameNode replicates the affected data to another DataNode.

    4> Hadoop doesn’t try to diagnose and fix slow-running tasks; instead, it tries to detect when a task is running slower than expected and launches another, equivalent task as a backup. This is termed speculative execution of tasks.

    5> The four basic parameters of a mapper are LongWritable, Text, Text and IntWritable. The first two represent the input key/value types and the second two represent the intermediate output key/value types.

    6> A MapReduce partitioner makes sure that all the values of a single key go to the same reducer, thus allowing even distribution of the map output over the reducers. It redirects the mapper output to the reducers by determining which reducer is responsible for a particular key.

    7>HDFS Block is the physical division of the data and Input Split is the logical division of the data.

    8> In TextInputFormat, each line in the text file is a record. The key is the byte offset of the line and the value is the content of the line. For instance, key: LongWritable, value: Text.

    9>

    10> conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating key-value pairs out of the mapper.

    #1934 Reply

    chinni
    Participant

    1a) MapReduce is the heart of Hadoop. MapReduce is divided into two parts: 1. Map and 2. Reduce.
    1. Map: the map task takes the data and converts it into smaller sets of data, where each element is a key/value pair (tuple).
    2. Reduce: the reduce task takes the output from a map as input and combines those tuples into a smaller set of tuples.
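
    For illustration, a minimal word-count style reduce task that combines the (word, 1) tuples from the map into a smaller set of (word, total) tuples (the class name is just a placeholder):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();          // combine all the counts for this key
            }
            context.write(key, new IntWritable(sum));
        }
    }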

    2a) Combiner: a combiner is also called a semi-reducer. It is an optional class that operates by accepting input from the mapper and then passing its output key-value pairs on to the reducer class.
    Purpose: in MapReduce the output from the map task is usually large, so the transfer between the map and reduce tasks is high. Data transfer across the network is expensive, so we want to limit the volume of data transferred between the map and reduce tasks. The combiner summarises the output records that share the same key, and the combiner’s output is then sent over the network to the reduce task as its input.

    3a) A DataNode failure is not a single point of failure in Hadoop (the NameNode is the well-known single point of failure in Hadoop 1). When a DataNode fails, the NameNode detects the failure and re-replicates the failed node’s blocks onto other DataNodes, so the failed DataNode’s work is taken over elsewhere.

    4a) Hadoop doesn’t try to diagnose and fix slow-running tasks; instead, it tries to detect when a task is running slower than expected and launches another, equivalent task as a backup. This is termed speculative execution of tasks.

    5a) The basic parameters of a Mapper are
    LongWritable and Text (input key/value)
    Text and IntWritable (output key/value)

    6a) A partitioner works like a condition on the intermediate dataset. The partition phase takes place after the map phase and before the reduce phase. The number of partitions is equal to the number of reducers, which means the partitioner divides the data according to the number of reducers. It partitions the data using a user-defined condition, which works like a hash function.

    7a) Block: in the HDFS architecture there is a concept of blocks. A typical block size used by HDFS is 64 MB. When we place a large file into HDFS, it is chopped up into 64 MB chunks (based on the default block configuration). Suppose you have a file of 1 GB and you want to place that file in HDFS; there will then be 1 GB / 64 MB = 16 blocks.

    Split: data splitting happens based on file offsets. The goal of splitting the file and storing it in different blocks is parallel processing. If you have not defined an input split size in the MapReduce program, then the default HDFS block size is taken as the input split size.

    8a) In TextInputFormat, each line in the text file is a record.
    The value is the content of the line, while the key is the byte offset of the line.
    For instance, key: LongWritable, value: Text.
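
    A small sketch showing those key/value types as they arrive in a mapper when TextInputFormat (the default) is used; the class name is just a placeholder:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With TextInputFormat, each call to map() receives one line:
    //   key   = byte offset of the line within the file (LongWritable)
    //   value = content of the line (Text)
    public class OffsetEchoMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);   // simply pass the (offset, line) pair through
        }
    }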

    #1976 Reply

    suchitra mohanty
    Participant

    1) MapReduce is the heart of Hadoop. It is a framework, or a programming model, that is used for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
    The MapReduce algorithm contains two important tasks: a) Map and b) Reduce.
    a) Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
    b) Reduce takes the output from a map as an input and combines those data tuples into a smaller set of tuples.

    2) A combiner is also known as a semi-reducer. The Combiner class is used between the Map class and the Reduce class to reduce the volume of data transferred between Map and Reduce, since the output of the map task is large and the amount of data transferred to the reduce task is high.

    3) When a DataNode fails, the NameNode detects the problem and re-replicates the data to another active DataNode that does not already have a copy of it.

    5) The four basic parameters of a mapper are LongWritable, Text, Text and IntWritable.

    6) A MapReduce partitioner makes sure that all the values of a single key go to the same reducer, thus allowing even distribution of the map output over the reducers. It redirects the mapper output to the reducers by determining which reducer is responsible for a particular key.

    8) In TextInputFormat, each line in the text file is a record. The key is the byte offset of the line and the value is the content of the line.

Viewing 6 posts - 1 through 6 (of 6 total)
Reply To: MAPREDUCE Q & A