This topic contains 4 replies, has 4 voices, and was last updated by Rishi 3 weeks ago.

Viewing 5 posts - 1 through 5 (of 5 total)
#3310

    niteshchitti
    Participant

1. Algorithm for NN to allocate blocks on different data nodes
-> It depends on 3 factors:
i) Nearest DN
ii) Bandwidth / less traffic
iii) Data redundancy

3. How to modify the HB and BR time interval of the data node.
-> dfs.heartbeat.interval: default of 3 secs
dfs.blockreport.intervalMsec: default of 21600000 millisec (6 hours)

We can change these values in hdfs-site.xml, as sketched below.

    http://stackoverflow.com/questions/33857562/hadoop-heart-beat-and-block-report-time-interval
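For reference, a minimal hdfs-site.xml sketch with both properties; the values shown are just the defaults mentioned above, so adjust them as needed:

    <!-- hdfs-site.xml: illustrative overrides for the two intervals -->
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value> <!-- DN heartbeat interval in seconds; default 3 -->
    </property>
    <property>
      <name>dfs.blockreport.intervalMsec</name>
      <value>21600000</value> <!-- block report interval in milliseconds; default 21600000 (6 hours) -->
    </property>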

4. What is fsimage and what type of metadata does it store?
-> It is a file which contains the complete file system state at a point in time. Every file system modification is assigned a unique, increasing transaction ID.

6. Write the life cycle of the SNN in production.
-> The NN has its metadata stored in 2 files, the fsimage and the edit logs. The SNN periodically takes a backup of this metadata from the NN. If there is a failure while the edit logs are merged with the fsimage during cluster start-up, the backup taken by the SNN, saved as a checkpoint, is moved to the NN's metadata and the cluster is booted up.

7. What metadata does the NN hold in cache memory?
-> It holds 2 files: the fsimage, which contains the complete file system at a point in time, and the edit logs, which contain each file system change made after the most recent fsimage.

8. If any DN stops working, how will the blocks of the dead DN move to an active DN?
-> There is constant communication between the DN and the NN: the DN sends a signal to the NN, called a heartbeat. By default it sends this signal every 3 secs, stating that the DN is active, and it also sends a block report on every 10th HB. If the NN doesn't receive any signal, the NN allocates new live DNs for the dead DN's blocks and updates its metadata with the new DN locations.

9. Write the conditions in which the block size is changed.
-> It depends on the scenario, but mainly on the file size.

#3311

    Nitin

1. Algo for NN to allocate blocks on different data nodes?
Reply: WIP

2. Write the function called for splitting the user data into blocks?
Reply: InputFormat.getSplits(JobConf job, int numSplits) converts the data into logical splits. [Into physical blocks, couldn't find ???]

3. How to modify the heartbeat and block report time intervals of the data node.
Reply: update dfs.heartbeat.interval (and dfs.blockreport.intervalMsec for the block report interval) in hdfs-site.xml.

4. What is fsimage and what type of metadata does it store?
Reply: fsimage is an image file on disk in the Name Node. It contains metadata like the file replication level, modification/access times, permissions, and block size.
On Name Node start-up, it loads the metadata from the fsimage and then applies the edits from the edits file.

5. If 2 TB is given, what is the max expected metadata that will be generated?
Reply: 2*1024*1024 = 2097152 MB. Assuming block size = 64 MB.
Number of blocks = 2097152/64 = 32768
One block needs 150 bytes of metadata, so 32768 blocks will require 32768*150 = 4915200 bytes [4800 KB, or about 4.69 MB]

6. Write the life cycle of the SNN in production?
Reply: The SNN gets started by bin/start-dfs.sh on the node mentioned in the conf/masters file.
After a particular interval of time (fs.checkpoint.period), the SNN reads two files (fsimage, edits) from the NN and copies the same structure onto its own machine.
If the edits file fills up to fs.checkpoint.size, the SNN reads the two files from the NN even if the above period has not passed.
This keeps the edits file within the limit; both properties can be tuned, as sketched below.
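As a rough sketch, both checkpoint knobs can be set explicitly (in Hadoop 1.x these live in core-site.xml; the values shown are the usual defaults, given here only for illustration):

    <!-- core-site.xml: SNN checkpoint tuning (Hadoop 1.x property names) -->
    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value> <!-- seconds between checkpoints; default 3600 (1 hour) -->
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value> <!-- edits size in bytes that forces an early checkpoint; default 64 MB -->
    </property>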

7. What metadata does the NN hold in cache memory?
Reply: It contains the file-to-block mapping [list of blocks and their locations].

8. If any DN stops working, how will the blocks of the dead DN move to the active data nodes?
Answer: If some DN goes down, the NN will stop receiving heartbeats and block reports, so the NN will come to know that the DN is down.
The NN will check the load on the other DNs and, to maintain the replication factor, all blocks on the failed DN will be replicated to other Data Nodes.
The NN will use its metadata to get all the blocks of the affected files from the other replicated Data Nodes.

9. Write the conditions in which cases the block size changes.
Answer: If the file size is too big, it's always better to increase the block size from 64 MB to 128 MB (see the hdfs-site.xml sketch below).
If the file size is not a multiple of the block size, the last block will be smaller than the block size; for example, a 70 MB file will be divided into two blocks:
the first is 64 MB and the second is a 6 MB block.
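As a sketch, raising the cluster-wide default is a one-property change in hdfs-site.xml (dfs.block.size is the Hadoop 1.x name; newer releases spell it dfs.blocksize). The 128 MB value is just an example:

    <!-- hdfs-site.xml: raise the default block size from 64 MB to 128 MB -->
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value> <!-- bytes; 128 * 1024 * 1024 -->
    </property>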

    Module 2:
1. What external daemons are running in Standalone mode?
Reply:
No daemons run in Standalone mode. Standalone mode is just to debug and run some MapReduce programs on small files.

2. Can we set up a standalone cluster?
Reply:
Yes, we can.

3. How to control a job running on a DN?
Reply:
We can use the Job Tracker and Task Tracker web UIs.
The other option is hadoop job -list or hadoop job -status jobID.

4. How to define the trash path?
Reply: The trash path is hard coded and non-configurable as of now, i.e. /user/$USER/.Trash.
Trash gets enabled by configuring fs.trash.interval in core-site.xml, as sketched below.
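For illustration, a core-site.xml entry that enables the trash; the value is in minutes, and the one-day retention shown here is just an example:

    <!-- core-site.xml: keep deleted files in /user/$USER/.Trash for 1 day -->
    <property>
      <name>fs.trash.interval</name>
      <value>1440</value> <!-- minutes before trash is purged; 0 (the default) disables trash -->
    </property>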

#3836

    anjali_jayaraj
    Participant

1 a. Algorithm for NN to allocate blocks on different data nodes.
The algorithm defines 3 parameters which are taken into consideration before allocating blocks on different data nodes. They are:
1. Data node location: the block will be added to the nearest data node.
2. Lesser network traffic.
3. Data redundancy.
1 b. Write the function called for splitting the user data into blocks.
The InputFormat.getSplits() function.
2. How to modify the heartbeat and block report time intervals of the data node?
In hdfs-site.xml there is a parameter called 'dfs.heartbeat.interval' which determines the data node heartbeat interval in seconds.
The parameter 'dfs.namenode.stale.datanode.interval' controls how long the NN goes without heartbeats before marking a data node stale (see the sketch below).
The parameter 'dfs.blockreport.intervalMsec' is used for modifying the block report time interval in milliseconds.
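Of these, dfs.namenode.stale.datanode.interval is the only one not sketched earlier in the thread; an illustrative hdfs-site.xml entry (the value shown is the usual default) would be:

    <!-- hdfs-site.xml: how long without a heartbeat before a DN is marked stale -->
    <property>
      <name>dfs.namenode.stale.datanode.interval</name>
      <value>30000</value> <!-- milliseconds; default 30000 (30 seconds) -->
    </property>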
3. What is fsimage and what type of metadata does it store?
It's the snapshot of the file system when the name node is started. It contains metadata like the list of files, the list of blocks for each file,
and file attributes like the replication factor, access time, etc. (The list of data nodes for each block is not persisted in the fsimage; it is rebuilt from block reports.)
4. If 2 TB of data is given, what is the maximum expected metadata that will be generated?
Converting 2 TB to MB = 2*1024*1024 = 2097152 MB.
Default block size = 64 MB.
Number of blocks = 2097152/64 = 32768
One block needs 150 bytes of metadata, so 32768 blocks will require 32768*150 = 4915200 bytes [4800 KB, or about 4.69 MB]
5. Write the life cycle of the SNN in production.
The SNN also has an fsimage and edit log. The SNN retrieves the metadata details of the NameNode from the NN's fsimage and edit log.
It merges these two and keeps the result as a separate checkpoint.
The edit log is merged with the fsimage during the namenode restart. If the edit log is too large, merging may take a long time,
and there is even the risk of losing a huge amount of data. In that case the details are fetched from the SNN and restored into the
fsimage and edit log of the NN by using GET and POST methods.
6. What metadata does the NN hold in cache memory?
It contains the block mapping details [the blocks and the location, i.e. the data node on which each block is located].
7. If any DN stops working, how do the blocks of the dead DN move on to active data nodes?
If the communication between the DN and the NN is lost (the NN is not receiving any signal, i.e. heartbeat, from the DN), the NN will identify the DN as dead and then
start to migrate the blocks from some other active DN which holds replicas of the blocks that were present on the dead DN.
The blocks will be migrated to the DNs identified by the NN, so as to maintain the replication factor. The NameNode will update its metadata with
the new DN locations for those blocks after the migration has been done.
8. Write the conditions in which cases block sizes are changed.
Usually the block size is 64 MB. If the file size is big, then we can change the default block size to 128 MB or another multiple of 64 MB.

#4165

    RAHUL M
    Participant

Algorithm for NN to allocate blocks on different data nodes
-> It depends on 3 factors:
i) Nearest DN
ii) Bandwidth / less traffic
iii) Data redundancy

Hi,
The answer specified above is not an algorithm; those are the factors for the NN to allocate blocks on different nodes.
Could you please let me know the algorithm?

#4174

    Rishi

1) The namenode typically places the first replica on the local node (the one the client is running on), the second replica on a random remote rack, and the third replica on a random node of that same remote rack.

2) The HB can be modified by defining the parameter dfs.heartbeat.interval in hdfs-site.xml.

3) The fsimage file is a complete persistent checkpoint of the filesystem metadata. It contains a serialized form of all directory and file inodes. It doesn't record the datanodes on which blocks are stored.

4) Default block size 64 MB; (2*1024*1024/64)*150 bytes = 4915200 bytes (metadata per block = 150 bytes).

6) The NN holds the block mapping in cache; it constructs this by asking the DNs for their block lists when they join the cluster, and periodically afterwards, to ensure an up-to-date mapping.
