Forum: Basics & Programming & compiling MapReduce Programs

This topic contains 6 replies, has 4 voices, and was last updated by sivatejakumarreddy 2 years, 5 months ago.

  • #1244 Reply

    svolmin
    Participant

    1. What is a RecordReader in MapReduce?
    Ans: The RecordReader reads each record from the input split and converts it into key-value pairs, which serve as the input to the mapper.
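
    For context, here is a sketch of how the framework drives the RecordReader: the default run() loop of the new-API Mapper effectively does the following (a simplified restatement, not a new implementation).

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch: the framework pulls records from the RecordReader through this loop.
    // run() is overridden here only to show what the default implementation effectively does.
    public class RecordReaderDemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void run(Context context) throws IOException, InterruptedException {
            setup(context);
            // nextKeyValue() asks the underlying RecordReader to parse the next record
            // of the input split; the resulting key-value pair is handed to map().
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        }
    }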

    #1248 Reply

    sivatejakumarreddy
    Participant

    1. What are “slaves” and “masters” in Hadoop?
    Ans: We have 5 daemons in Hadoop.
    HDFS daemons: NameNode, Secondary NameNode, DataNode.
    MapReduce daemons: JobTracker, TaskTracker.
    The master daemons in Hadoop are the NameNode, Secondary NameNode and JobTracker; the slave daemons are the DataNode and TaskTracker.

    2. How many datanodes can run on a single Hadoop cluster?
    Ans: A Hadoop cluster can run any number ('n') of DataNodes, but only one DataNode daemon runs per machine.

    3. What is the job tracker in Hadoop?
    Ans: The JobTracker is a master daemon in Hadoop; it assigns tasks to the respective TaskTrackers.

    4. What is “jps”?
    Ans: jps is the Java Virtual Machine Process Status tool; it lists the running JVM processes, so it is used to check which Hadoop daemons are up.

    5. What is the purpose of the RecordReader in Hadoop?
    Ans: It reads the input one record at a time (one line at a time for text input), converts each record into a key-value pair and passes it to the mapper.

    #1251 Reply

    dmaitra1
    Participant

    1. What is a RecordReader in MapReduce?

    Ans: The RecordReader translates the raw input into key-value pairs.

    2. What are “slaves” and “masters” in Hadoop?
    Ans: Masters are the NameNode and JobTracker daemons; slaves are the DataNode and TaskTracker daemons.

    3. How many datanodes can run on a single Hadoop cluster?
    Ans: We can run multiple DataNodes, but only a single NameNode per cluster.

    4. What is the job tracker in Hadoop?
    Ans: The JobTracker is a master daemon in Hadoop; it assigns tasks to the respective TaskTrackers.

    5. What is “jps”?
    Ans: jps is the Java Virtual Machine Process Status tool; it lists the running JVM processes (the Hadoop daemons).

    6. What is the purpose of the RecordReader in Hadoop?
    Ans: The RecordReader translates the raw input into key-value pairs for the mapper.

    #1253 Reply

    sivatejakumarreddy
    Participant

    1. What is commodity hardware?

    Ans: It means inexpensive, low-cost, off-the-shelf hardware rather than specialized high-end machines.

    2. What is metadata?

    Ans: Metadata means data about data. In HDFS, for example, the NameNode holds metadata such as file names, permissions and block locations.

    3. What is a daemon?

    Ans: A daemon is a background process.

    4. What is the typical block size of an HDFS block?

    Ans: The typical (default) block size of an HDFS block is 64 MB in Hadoop 1.x (128 MB in Hadoop 2.x).

    5. What is NoSQL?

    Ans: A NoSQL database provides a mechanism for storing and retrieving data that is modeled differently from the tabular structure of a relational database.
    It is a popular name for structured storage software designed to deliver increased optimization and high performance when operating on large data sets, and it is easy to use.

    #1254 Reply

    sivatejakumarreddy
    Participant

    1. What is a heartbeat in Hadoop?

    Ans: As we know, once an input file is loaded onto the Hadoop cluster, it is sliced into blocks, and these blocks are distributed among the cluster.

    Now the JobTracker and TaskTrackers come into the picture. To process the data, the JobTracker assigns tasks to the TaskTrackers. Suppose that, while processing is going on, one DataNode in the cluster goes down. The NameNode must know that this DataNode is down, otherwise it cannot continue processing using the replicas. To make the NameNode aware of the status (active/inactive) of the DataNodes, each DataNode sends a “heartbeat” signal periodically, every 3 seconds by default; if no heartbeat arrives for about 10 minutes, the DataNode is treated as dead. This mechanism is called the HEARTBEAT MECHANISM.

    TaskTrackers likewise send heartbeats to the JobTracker, and the JobTracker assigns tasks only to TaskTrackers that are active. If a TaskTracker fails to send a heartbeat within the expiry interval (10 minutes by default), the JobTracker treats it as inactive and looks for an idle TaskTracker to which the task can be reassigned. If there are no idle TaskTrackers, the JobTracker waits until one becomes idle.
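
    The intervals involved are configurable. A minimal sketch using the Configuration API (property names as in Hadoop 2.x, defaults shown in the comments; verify against your version):

    import org.apache.hadoop.conf.Configuration;

    // Sketch: heartbeat-related settings (Hadoop 2.x property names; defaults shown).
    public class HeartbeatConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // DataNode -> NameNode heartbeat interval, in seconds (default 3).
            conf.setLong("dfs.heartbeat.interval", 3);
            // NameNode recheck interval, in milliseconds (default 300000 = 5 minutes);
            // a DataNode is declared dead after roughly
            // 2 * recheck-interval + 10 * heartbeat-interval, i.e. about 10.5 minutes.
            conf.setLong("dfs.namenode.heartbeat.recheck-interval", 300000);
            System.out.println("heartbeat interval = "
                    + conf.getLong("dfs.heartbeat.interval", 3) + "s");
        }
    }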

    2. What are the side data distribution techniques?

    Ans: Side data refers to the extra, small, read-only data required by a MapReduce job to do its work. The main challenge is making the side data available on the node where a map task will be executed. Hadoop provides two side data distribution techniques.

    Using the job configuration

    Arbitrary key-value pairs can be set in the job configuration. This is a useful technique for small amounts of data; the suggested size is a few KBs, because the configuration object is read by the JobTracker, the TaskTrackers and every child JVM, so large values increase the overhead on every front. Apart from this, the side data needs serialization if it has a non-primitive encoding.

    Distributed cache

    Rather than serializing side data in the job configuration, it is preferable to distribute datasets using Hadoop's distributed cache mechanism. This provides a service for copying files and archives to the task nodes in time for the tasks to use them when they run. To save network bandwidth, files are normally copied to any particular node only once per job.
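
    A rough sketch of the distributed cache technique with the newer Job/Context API (the HDFS path and class names are hypothetical; older code would use the DistributedCache class instead):

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SideDataExample {

        // Mapper that loads a small side-data file shipped via the distributed cache.
        public static class LookupMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // Files added with job.addCacheFile() are localized on each task node
                // and listed here; open them and load the lookup data into memory.
                URI[] cached = context.getCacheFiles();
                if (cached != null && cached.length > 0) {
                    // e.g. read cached[0] with FileSystem and build an in-memory map
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "side data demo");
            job.setJarByClass(SideDataExample.class);
            job.setMapperClass(LookupMapper.class);
            // Hypothetical HDFS path to the small side-data file.
            job.addCacheFile(new URI("/user/hadoop/lookup/countries.txt"));
            // ... set input/output paths and formats, then submit:
            // System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }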

    3. What is shuffling in MapReduce?

    Ans: The process of moving map outputs to the reducers is known as shuffling. A different subset of the intermediate key space is assigned to each reduce node; these subsets (known as “partitions”) are the inputs to the reduce tasks. Each map task may emit (key, value) pairs to any partition, but all values for the same key are always reduced together regardless of which mapper produced them. Therefore, the map nodes must all agree on where to send the different pieces of the intermediate data.
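
    To illustrate, the default HashPartitioner sends each key to (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. A custom Partitioner (a made-up sketch, registered in the driver with job.setPartitionerClass()) can override where each key goes:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch of a custom partitioner: it decides which reduce task (partition)
    // receives each intermediate (key, value) pair emitted by the mappers.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            if (numPartitions == 0) {
                return 0;
            }
            // Route keys by their first character (an arbitrary rule for this sketch).
            String s = key.toString();
            char first = s.isEmpty() ? ' ' : Character.toLowerCase(s.charAt(0));
            return first % numPartitions;
        }
    }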

    4. List Hadoop’s three configuration files.

    Ans: core-site.xml
    hdfs-site.xml
    mapred-site.xml
    (hadoop-env.sh is an environment script, not one of the three XML configuration files.)

    5. What is a “map” in Hadoop?

    Ans: The map is responsible for reading data from the input location and, based on the input type, generating key-value pairs that form the intermediate output on the local machine.

    6. What is a “reducer” in Hadoop?

    Ans: The reducer is responsible for processing the intermediate output received from the mappers and generating the final output.

    7. What are the parameters of mappers and reducers?

    Ans: A mapper has 4 basic type parameters (input key, input value, output key, output value); in a typical word-count job (see the sketch below) they are
    1. LongWritable
    2. Text
    3. Text
    4. IntWritable

    The 4 parameters of a reducer are

    1. Text
    2. IntWritable
    3. Text
    4. IntWritable
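
    The classic word-count job shows where these four type parameters appear; a sketch along the lines of the standard Hadoop tutorial example:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountTypes {

        // Mapper<input key, input value, output key, output value>
        //       = <LongWritable, Text, Text, IntWritable>
        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // intermediate (word, 1) pairs
                }
            }
        }

        // Reducer<input key, input value, output key, output value>
        //        = <Text, IntWritable, Text, IntWritable>
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                result.set(sum);
                context.write(key, result);     // final (word, total) pairs
            }
        }
    }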

    8. How can we change the split size if our commodity hardware has less storage space?

    Ans: If our commodity hardware has less storage space, we can change the split size by writing a custom split (a custom InputFormat). Hadoop provides this customization, which can be invoked from the main (driver) method.
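
    As an alternative sketch (instead of the custom-splitter route), the split size can also be bounded from the driver with the standard FileInputFormat helpers; the 16 MB figure below is arbitrary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "small splits");
            // Cap each input split at 16 MB regardless of the HDFS block size.
            FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
            // Optionally also raise the lower bound:
            // FileInputFormat.setMinInputSplitSize(job, 8L * 1024 * 1024);
            // ... set mapper/reducer, input/output paths, then submit as usual.
        }
    }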

    9. Can we rename the output file?

    Ans: Yes, we can rename the output file by using the multiple-outputs format class (MultipleOutputs).
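
    A sketch of how this could look with MultipleOutputs (the named output "stats" and the reducer types are made up for this example):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class RenamedOutputReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private MultipleOutputs<Text, IntWritable> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            // Output files are named stats-r-00000, stats-r-00001, ... instead of part-r-*.
            mos.write("stats", key, new IntWritable(sum));
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }

        // In the driver, register the named output once:
        public static void configure(Job job) {
            MultipleOutputs.addNamedOutput(job, "stats", TextOutputFormat.class,
                    Text.class, IntWritable.class);
        }
    }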

    10. What is Streaming?

    Ans: Streaming is a feature of the Hadoop framework that allows us to write MapReduce programs in any language that can accept standard input and produce standard output; it could be Perl, Python or Ruby, and need not be Java. However, deeper customization of MapReduce can only be done using Java and not any other programming language.

    11.What is a Combiner?

    Ans: A ‘Combiner’ is a mini reducer that performs the local reduce task. It receives the input from the mapper on a particular node and sends the output to the reducer. Combiners help in enhancing the efficiency of MapReduce by reducing the quantum of data that is required to be sent to the reducers.
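
    For example, a word-count driver often reuses the reducer class as the combiner; a sketch assuming the WordCountTypes classes from the earlier snippet are on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountWithCombiner {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count with combiner");
            job.setJarByClass(WordCountWithCombiner.class);
            job.setMapperClass(WordCountTypes.TokenizerMapper.class);
            // The combiner runs a local reduce on each map node's output,
            // shrinking the data shuffled across the network to the reducers.
            job.setCombinerClass(WordCountTypes.IntSumReducer.class);
            job.setReducerClass(WordCountTypes.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }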

    12.What is the difference between an HDFS Block and Input Split?

    Ans: HDFS Block is the physical division of the data and Input Split is the logical division of the data.

    13. What happens in a TextInputFormat?

    Ans: In TextInputFormat, each line of the text file is a record. The key is the byte offset of the line and the value is the content of the line; that is, key: LongWritable, value: Text.

    14. What do you know about KeyValueTextInputFormat?

    Ans: In KeyValueTextInputFormat, each line of the text file is a record. Each line is divided at the first separator character: everything before the separator is the key and everything after it is the value. Here both key and value are of type Text.
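
    A sketch of configuring it in the driver (the separator property name below is the Hadoop 2.x one; older releases used key.value.separator.in.input.line):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class KeyValueInputExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Split each line at the first comma instead of the default tab character.
            conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
            Job job = Job.getInstance(conf, "key-value input");
            // The key and the value handed to the mapper are both Text.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // ... set mapper/reducer, input/output paths, then submit as usual.
        }
    }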

    15. What is MapReduce?

    Ans: It is a framework, or a programming model, for processing large data sets over clusters of computers using a distributed programming approach.

    #1255 Reply

    mamthakulal
    Participant

    1. Can we deploy the job tracker on a node other than the name node?
    Ans: Yes
    2. What are the four modules that make up the Apache Hadoop framework?
    Ans: 1. Hadoop Common
    2. HDFS (Hadoop Distributed File System)
    3. Hadoop YARN
    4. Hadoop MapReduce
    3. Where are Hadoop’s configuration files located?
    Ans: In the conf folder ($HADOOP_HOME/conf)
    4. List Hadoop’s three configuration files.
    Ans: core-site.xml, hdfs-site.xml, mapred-site.xml
    5. What are “slaves” and “masters” in Hadoop?
    Ans: Masters – NameNode and JobTracker.
    Slaves – DataNode and TaskTracker
    6. How many datanodes can run on a single Hadoop cluster?
    Ans: Any number of DataNodes can run in a cluster, but only one DataNode daemon per machine.
    7. What is the job tracker in Hadoop?
    Ans: The JobTracker assigns tasks to the TaskTrackers.
    8. How many job tracker processes can run on a single Hadoop cluster?
    Ans: One
    9. What sorts of actions does the job tracker process perform?
    Ans: Once the JobTracker receives the job file from the client, it interacts with the NameNode to find out which DataNodes hold the data blocks. Once the JobTracker receives this metadata from the NameNode, it assigns the tasks to TaskTrackers residing on the respective DataNodes. If a TaskTracker is processing slowly or goes down, the JobTracker assigns the same task to a different TaskTracker and makes sure the task is completed.

    #1257 Reply

    sivatejakumarreddy
    Participant

    1. What do the master class and the output class do?

    Ans: The master class is defined to update the master (the JobTracker), and the output class is defined to write data onto the output location.

    2. What is the input type/format in MapReduce by default?

    Ans: By default, the input type in MapReduce is ‘text’ (TextInputFormat).

    3.Is it mandatory to set input and output type/format in MapReduce?

    Ans: No, it is not mandatory to set the input and output type/format in MapReduce. By default, the cluster takes the input and the output type as ‘text’.

    4. What is Streaming?

    Ans: Streaming is a feature of the Hadoop framework that allows us to write MapReduce programs in any language that can accept standard input and produce standard output; it could be Perl, Python or Ruby, and need not be Java. However, deeper customization of MapReduce can only be done using Java and not any other programming language.

    5.What is a Combiner?

    Ans: A ‘Combiner’ is a mini reducer that performs the local reduce task. It receives the input from the mapper on a particular node and sends the output to the reducer. Combiners help in enhancing the efficiency of MapReduce by reducing the quantum of data that is required to be sent to the reducers.

    6. What is the difference between an HDFS Block and an Input Split?

    Ans: An HDFS block is the physical division of the data, and an input split is the logical division of the data.

    7. What does the JobConf class do?

    Ans: MapReduce needs to logically separate the different jobs running on the same cluster. The JobConf class handles job-level settings, such as declaring the job name, in the real (cluster) environment. It is recommended that the job name be descriptive and represent the type of job being executed.

    8. What does conf.setMapperClass do?

    Ans: conf.setMapperClass() sets the mapper class and everything related to the map job, such as reading the data and generating key-value pairs out of the mapper (see the driver sketch below).
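
    A minimal sketch of these job-level settings with the newer Job API (the job name is arbitrary and MyMapper/MyReducer are placeholder identity classes; older code does the same through JobConf):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DriverSketch {

        // Placeholder stand-ins for real map/reduce logic (identity behaviour here).
        public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> { }
        public static class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> { }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Job-level settings: a descriptive name makes the job easy to find on the cluster.
            Job job = Job.getInstance(conf, "daily log aggregation");
            job.setJarByClass(DriverSketch.class);

            // setMapperClass/setReducerClass wire the map and reduce logic into the job.
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);

            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }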

    9. What do sorting and shuffling do?

    Ans: Sorting and shuffling are responsible for producing, for each unique key, a list of its values. Bringing all occurrences of the same key together in one place is known as sorting, and the process by which the intermediate output of the mappers is sorted and sent across to the reducers is known as shuffling.

    10. What does a split do?

    Ans: Before the data is transferred from its location on disk to the map method, there is a phase called the split. The split pulls a block of data from HDFS into the framework; the split class does not write anything, it only reads data from the block and passes it to the mapper. By default, splitting is taken care of by the framework, the split size is equal to the block size, and it is used to divide the input into a set of splits.
