Assignment of HDFS and MapReduce
    Questions –

    1. How is a Hadoop transaction different from an Oracle transaction, and how is the read/write anatomy of HDFS different from that of Oracle databases?

    2. How is data transferred from one DataNode to another DataNode?

    3. How do the nodes communicate with each other?

    4. What is a logical split?

    5. If a DataNode goes down, how are the blocks of that DataNode shifted to another active DataNode?

    6. What are the major components of the algorithm used by the NameNode to allocate the location of a block on different DataNodes?

    7. How do you scale the cluster configuration for a particular amount of data?

    8. How do you design a Hadoop cluster to store 200 TB of data (resources, configuration & number of nodes)?

    9. Write the algorithm by which the NameNode decides the location of the DataNodes that will store the blocks of a file.

    10. What is the job of the OutputCollector?

    11. In which case do we use a split size greater than the block size?

    12. Write down the difference between the Mapper and a map task.

    13. Write the job of the Combiner.

    14. Write a MapReduce program that introduces a Combiner class.

    15. Write an algorithm for a Partitioner class with the fields: name, test score, mode (weekend and weekdays).

    1. Which student scored highest on weekends and on weekdays?

    2. Which student, fresher or experienced, scored highest on weekdays and on weekends?
    ————————————————————————————————————————–

    Answers –

    1. Difference between a Hadoop transaction and an Oracle transaction –

       Hadoop                                    Oracle
       a. Key-value pair                         Record
       b. MapReduce (functional style)           SQL (declarative)
       c. De-normalized data                     Normalized data
       d. All varieties of data                  Structured data
       e. OLAP / batch / analytical queries      OLTP / real-time / point queries

    Difference between the HDFS read/write anatomy and an Oracle database –
    In HDFS the client first sends a request to the NameNode. For a write, the NameNode checks its metadata for DataNodes with
    available space and returns their locations; for a read, the NameNode looks up the block locations in its metadata and
    returns the list of DataNodes to the client. The client then streams the data to or from those DataNodes directly.

    An Oracle database, by contrast, uses a client/server architecture: the client is a database application that initiates a
    request for an operation on the database server, and it requests, processes, and presents the data managed by the server.
    The server receives and processes the SQL and PL/SQL statements that originate from the client applications.
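    This anatomy is visible from the client side through the HDFS FileSystem API. Below is a minimal sketch (not part of the
    original answer) of a write followed by a read; the file path is hypothetical, and the NameNode address is whatever
    fs.defaultFS points to in the loaded configuration.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // FileSystem.get() contacts the NameNode configured in fs.defaultFS.
            FileSystem fs = FileSystem.get(conf);

            // Write: the NameNode chooses DataNodes for each block; the client
            // then streams the bytes to the first DataNode in the pipeline.
            Path file = new Path("/demo/sample.txt");   // hypothetical path
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeBytes("hello hdfs\n");
            }

            // Read: the NameNode returns the block locations; the client reads
            // the blocks directly from the DataNodes that hold them.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file)))) {
                System.out.println(in.readLine());
            }
        }
    }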
    —————————————————————————————————————————-

    2. Data is transferred from one DataNode to another DataNode during cluster rebalancing, whenever the free space on a
    DataNode falls below a certain threshold. Data is also transferred when a DataNode fails: its blocks are re-replicated
    to other DataNodes that do not already hold a copy of the same block.
    —————————————————————————————————————————–

    3. The DataNodes communicate with the NameNode by sending heartbeat signals every 3 seconds. Every 10th heartbeat the
    DataNode sends a block report, from which the NameNode updates its metadata. The DataNodes also communicate with each
    other: for a write, the NameNode gives the client the locations of the DataNodes, the client writes the data to the first
    DataNode directly, and that DataNode forwards the data to the next DataNode in the pipeline, and so on.
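    As an illustration only (this is not Hadoop source code, and every name in it is invented), the reporting pattern
    described above looks roughly like this:

    import java.util.List;

    public class HeartbeatLoopSketch {
        // Hypothetical stand-in for the DataNode-to-NameNode protocol.
        interface NameNodeProtocol {
            void heartbeat(String dataNodeId);
            void blockReport(String dataNodeId, List<Long> blockIds);
        }

        static void run(NameNodeProtocol nameNode, String id, List<Long> localBlocks)
                throws InterruptedException {
            int beats = 0;
            while (true) {
                nameNode.heartbeat(id);                         // "I am alive" every ~3 seconds
                if (++beats % 10 == 0) {
                    nameNode.blockReport(id, localBlocks);      // fuller report, less often
                }
                Thread.sleep(3000);
            }
        }
    }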
    ——————————————————————————————————————————

    4. In a logical split the data is divided logically: an input split is a logical representation of the data stored in the
    blocks, and it describes the data that is to be processed by an individual mapper. The input arrives in byte-oriented
    form, and the RecordReader processes it into record-oriented form (key-value pairs).
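    A minimal sketch (not from the original answer) that makes the "logical" part visible: assuming a file-based input format,
    the split handed to a mapper is only a description of a path, an offset, and a length, not the data itself.

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class SplitInfoMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            // The split assigned to this mapper: just metadata describing a byte range.
            FileSplit split = (FileSplit) context.getInputSplit();
            System.out.println("file   = " + split.getPath());
            System.out.println("start  = " + split.getStart());
            System.out.println("length = " + split.getLength());
        }
    }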
    ——————————————————————————————————————————

    5. If a DataNode goes down it stops sending heartbeat signals, so the NameNode marks it as dead. The NameNode then checks
    its metadata for the blocks that were stored on that DataNode (their IDs, sizes, locations, etc.) and instructs other
    DataNodes to copy those blocks from the remaining replicas onto DataNodes that do not already hold a copy.
    ——————————————————————————————————————————

    6. The major components of the algorithm used by the NameNode to allocate the location of a block on different DataNodes
    are the nearest location, data redundancy, and minimum network traffic.
    ——————————————————————————————————————————

    9. Algorithm by which the NameNode decides the location of the DataNodes that will store the blocks of a file –
    a. The NameNode first checks three factors: the nearest DataNode, minimum network traffic, and data redundancy.
    b. Based on the replication factor, the NameNode tells the client which DataNodes to store the data on, such that the
    same block is never stored twice on the same DataNode.
    c. The NameNode records this placement in its metadata, which it persists in two files, the fsimage and the edit log.
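    The default HDFS placement policy is rack-aware: first replica on the writer's node, second on a different rack, third on
    another node of that second rack, and any further replicas spread over the remaining nodes. The following is a rough,
    illustrative sketch of that selection logic; it is not the NameNode's actual code, and all class and method names are
    invented.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Predicate;

    class ReplicaPlacementSketch {
        static class Node {
            final String host;
            final String rack;
            Node(String host, String rack) { this.host = host; this.rack = rack; }
        }

        private final Random random = new Random();

        List<Node> choosePlacement(Node writer, List<Node> liveNodes, int replication) {
            List<Node> chosen = new ArrayList<>();
            chosen.add(writer);                                   // 1st replica: the writer's node
            Node offRack = pick(liveNodes, n -> !n.rack.equals(writer.rack), chosen);
            if (offRack != null) {
                chosen.add(offRack);                              // 2nd replica: a different rack
                Node sameRack = pick(liveNodes, n -> n.rack.equals(offRack.rack), chosen);
                if (sameRack != null) chosen.add(sameRack);       // 3rd replica: same rack as the 2nd
            }
            while (chosen.size() < replication) {                 // extra replicas: any free node
                Node extra = pick(liveNodes, n -> true, chosen);
                if (extra == null) break;
                chosen.add(extra);
            }
            return chosen;
        }

        // Picks a random live node that satisfies the predicate and is not already used.
        private Node pick(List<Node> nodes, Predicate<Node> ok, List<Node> used) {
            List<Node> candidates = new ArrayList<>();
            for (Node n : nodes) {
                if (ok.test(n) && !used.contains(n)) candidates.add(n);
            }
            return candidates.isEmpty() ? null : candidates.get(random.nextInt(candidates.size()));
        }
    }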
    ——————————————————————————————————————————-

    10. The job of the OutputCollector is to collect the key-value pairs emitted by the mapper or reducer, i.e. to manage the
    intermediate output, which the framework then partitions and shuffles to the reducers.
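    For context, OutputCollector belongs to the older org.apache.hadoop.mapred API; in the newer org.apache.hadoop.mapreduce
    API the same role is played by Context. A minimal sketch of a mapper that uses it (the class name and word-splitting
    logic are just for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TokenMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            for (String word : value.toString().split("\\s+")) {
                // OutputCollector buffers this intermediate pair for the shuffle phase.
                output.collect(new Text(word), ONE);
            }
        }
    }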
    ——————————————————————————————————————————-

    11. By default the split size equals the HDFS block size. A split size greater than the block size is used when you want
    each mapper to process more data (fewer map tasks), for example when the per-mapper overhead is high relative to the work
    done; the trade-off is that a mapper may then have to read part of its split from a remote DataNode, losing some data
    locality.
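    A minimal sketch, assuming the new-API FileInputFormat, of how a larger split size can be requested; the 256 MB figure is
    only an example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "large splits");
            // With a 128 MB block size, this yields ~256 MB splits (two blocks per mapper).
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
        }
    }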
    ——————————————————————————————————————————-

    12. The Mapper is one of the classes used in a MapReduce program. The main task of the Mapper class is to read the data
    from the input location and, based on the input format, generate key-value pairs: the intermediate output, written on
    the local machine.

    A map task is the unit of work scheduled by the JobTracker to actually run that Mapper code; it processes one input
    split (normally one block) at a time.
    ——————————————————————————————————————————–

    13. The Combiner, also known as a mini-reducer, runs on every node that runs map tasks. Combiners are used to increase
    the efficiency of a MapReduce program: they aggregate the intermediate map output locally, on the individual mapper
    outputs, which reduces the amount of data that needs to be transferred across the network to the reducers.
    ——————————————————————————————————————————–

    14. A MapReduce program using a Combiner –
    (The input is assumed to be CSV lines of the form name,unit,salary. The mapper emits a "sum:count" pair per record so
    that the Combiner and the Reducer consume and produce the same types; a Combiner may run zero, one, or many times, so
    its input and output types must match the map output.)

    package Combiner;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class AverageSalary {

        // Mapper: reads "name,unit,salary" and emits (unit, "salary:1").
        public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] empDetails = value.toString().split(",");
                Text unitKey = new Text(empDetails[1]);
                double salary = Double.parseDouble(empDetails[2]);
                context.write(unitKey, new Text(salary + ":1"));
            }
        }

        // Combiner: locally merges the partial "sum:count" pairs of one map task,
        // so far less data is shuffled to the reducers.
        public static class Combine extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                double sum = 0;
                long count = 0;
                for (Text value : values) {
                    String[] parts = value.toString().split(":");
                    sum += Double.parseDouble(parts[0]);
                    count += Long.parseLong(parts[1]);
                }
                context.write(key, new Text(sum + ":" + count));
            }
        }

        // Reducer: merges the remaining "sum:count" pairs per unit and emits the average salary.
        public static class Reduce extends Reducer<Text, Text, Text, DoubleWritable> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                double sum = 0;
                long count = 0;
                for (Text value : values) {
                    String[] parts = value.toString().split(":");
                    sum += Double.parseDouble(parts[0]);
                    count += Long.parseLong(parts[1]);
                }
                context.write(key, new DoubleWritable(sum / count));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: AverageSalary <in> <out>");
                System.exit(-1);
            }
            Job job = Job.getInstance(conf, "Average salary");
            job.setJarByClass(AverageSalary.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Combine.class);
            job.setReducerClass(Reduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : -1);
        }
    }
    Reference – http://stackoverflow.com/questions/20212884/mapreduce-combiner
    ———————————————————————————————————————————-
