Q1. What replication factor is needed to complete a MapReduce job if the disk size per DataNode is 20 and the number of nodes is 40?
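Using the formula:
Disk size per node = (Data size * RF) / Total no. of nodes
the replication factor works out to 10. Note that the question does not state the data size: rearranging gives RF = (20 * 40) / Data size, so RF = 10 holds only if the data size is 80 (in the same units as the disk size).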
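As a quick check of the arithmetic, here is a minimal Python sketch that rearranges the formula for RF. The function names are hypothetical, and the data size of 80 is an assumption inferred from the stated answer, not a value given in the question.

```python
def replication_factor(disk_per_node, num_nodes, data_size):
    """Solve disk_per_node = data_size * rf / num_nodes for rf.

    All sizes must be in the same unit (e.g. TB).
    """
    return disk_per_node * num_nodes / data_size


def disk_per_node(data_size, rf, num_nodes):
    """The formula from the answer: storage needed on each node."""
    return data_size * rf / num_nodes


# Q1 gives disk size 20 and 40 nodes; data_size = 80 is an ASSUMPTION
# (the value that makes the stated answer RF = 10 consistent).
rf = replication_factor(disk_per_node=20, num_nodes=40, data_size=80)
print(rf)  # prints 10.0
```

Plugging the result back into the original formula (80 * 10 / 40) returns the disk size of 20, which confirms the rearrangement.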
Q2. What algorithm does the NameNode (NN) use to allocate blocks on different DataNodes (DNs)?
When a client wants to create a file, the HDFS client sends a request to the NN. The NN inserts the file name into the filesystem namespace and checks the replication factor (RF): all copies should be present before the NN proceeds. After this, the NN allocates a data block for the file and responds to the client with the identity of the DN and the destination data block. The client then transfers the block of data from its local file system (LFS) to the specified DN.
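The flow above does not say how the NN picks which DNs receive a block. In stock HDFS, the default placement policy for RF = 3 puts the first replica on the writer's node (if the writer is a DataNode), the second on a node in a different rack, and the third on a different node in the same rack as the second. The following is a simplified Python sketch of that idea, not the actual HDFS implementation; the function name `choose_targets`, the rack map, and the fallback rules are illustrative assumptions.

```python
import random


def choose_targets(writer, racks, rf=3, rng=None):
    """Pick `rf` distinct DataNodes for a new block (simplified sketch).

    1st replica: the writer's node if it is a DataNode, else a random node
    2nd replica: a node on a different rack than the 1st (if one exists)
    3rd replica: another node on the same rack as the 2nd (if one exists)
    Remaining slots (or fallbacks): random unused nodes.
    `racks` maps DataNode name -> rack name (hypothetical input format).
    """
    if rng is None:
        rng = random.Random()
    nodes = sorted(racks)
    if rf > len(nodes):
        raise ValueError("replication factor exceeds number of DataNodes")

    first = writer if writer in racks else rng.choice(nodes)
    targets = [first]

    if rf >= 2:
        remote = [n for n in nodes if racks[n] != racks[first]]
        if remote:
            targets.append(rng.choice(remote))

    if rf >= 3 and len(targets) == 2:
        second_rack = racks[targets[1]]
        same_rack = [n for n in nodes
                     if racks[n] == second_rack and n not in targets]
        if same_rack:
            targets.append(rng.choice(same_rack))

    # Fill any remaining slots with random nodes not chosen yet.
    unused = [n for n in nodes if n not in targets]
    rng.shuffle(unused)
    targets.extend(unused[: rf - len(targets)])
    return targets


racks = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(choose_targets("dn1", racks, rf=3, rng=random.Random(0)))
```

Spreading replicas over two racks means a single rack failure cannot destroy every copy, while keeping two replicas on one rack limits cross-rack write traffic.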
Q3. How do you modify the heartbeat and block report time intervals of a DN?
We can modify the values of the below mentioned properties:
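In stock Apache Hadoop, these two intervals are set in hdfs-site.xml through `dfs.heartbeat.interval` (in seconds, default 3) and `dfs.blockreport.intervalMsec` (in milliseconds, default 21600000, i.e. 6 hours in Hadoop 2 and later); it is worth confirming both against the hdfs-default.xml of your Hadoop version. The Python sketch below only renders such a configuration fragment with the standard library; the helper name `hdfs_site` is hypothetical and the values shown are just the defaults.

```python
import xml.etree.ElementTree as ET


def hdfs_site(properties):
    """Render a minimal hdfs-site.xml <configuration> from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


site_xml = hdfs_site({
    # Heartbeat interval, in seconds (default 3).
    "dfs.heartbeat.interval": 3,
    # Full block report interval, in milliseconds (default 21600000 = 6 hours).
    "dfs.blockreport.intervalMsec": 21600000,
})
print(site_xml)
```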
Q4. What is the fsimage, and what type of metadata does it store?
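The fsimage file is a snapshot of the filesystem metadata, written when the NameNode starts and at each checkpoint. It contains the complete state of the filesystem namespace at a point in time: the directory tree and, for each file, attributes such as owner, group, permissions, replication factor, block size, and the list of blocks that make up the file. It does not store block locations; the NN rebuilds those from DN block reports. Every filesystem modification is assigned a unique, monotonically increasing transaction ID.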
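A toy Python model can make the role of transaction IDs concrete: an fsimage records the last transaction it already contains, so only edits with a higher ID need replaying. The class and field names here are hypothetical and simplified; the real fsimage is a binary, version-specific file format.

```python
from dataclasses import dataclass, field


@dataclass
class INodeFile:
    # Per-file attributes of the kind the fsimage persists (simplified).
    path: str
    owner: str
    group: str
    permission: int   # e.g. 0o644
    replication: int
    block_size: int   # bytes
    block_ids: list = field(default_factory=list)  # ordered block IDs, no locations


@dataclass
class FsImage:
    last_txid: int    # last transaction already folded into this image
    files: dict = field(default_factory=dict)  # path -> INodeFile


def edits_to_replay(image, edits):
    """Return edit-log entries newer than the image, in transaction-ID order.

    `edits` is a list of (txid, operation) tuples. Transaction IDs are unique
    and monotonically increasing, so anything <= last_txid is already
    contained in the image.
    """
    newer = (e for e in edits if e[0] > image.last_txid)
    return sorted(newer, key=lambda e: e[0])
```

For example, with `FsImage(last_txid=100)` and edits with IDs 99, 101 and 102, only 101 and 102 are returned.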
Q5. Describe the lifecycle of the Secondary NameNode (SNN).
The SNN periodically reads the fsimage and edit logs and applies the modifications. When the NN starts up, it reads the HDFS state from an image file, the fsimage, and then applies the edits from the edits log file. It then writes the new HDFS state to the fsimage and starts normal operation with an empty edits file. Since the NN merges the fsimage and edits files only during startup, the edits log file could grow very large over time on a busy cluster. Another side effect of a larger edits file is that the next restart of the NN takes longer.
The SNN merges the fsimage and the edits log files periodically and keeps the edits log size within a limit. It usually runs on a different machine from the primary NN, since its memory requirements are of the same order as those of the primary NN.
The SNN stores the latest checkpoint in a directory structured the same way as the primary NameNode’s directory, so that the checkpointed image is always ready to be read by the primary NameNode if necessary.
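The checkpoint step can be sketched as a pure merge: start from the old image, replay every newer edit in transaction-ID order, and produce a new image plus an empty edit log. This is a toy Python model under simplifying assumptions: the namespace is reduced to path → replication factor, the operation names (`create`, `delete`, `setrep`) are hypothetical, and transferring files between the NN and SNN is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    last_txid: int
    namespace: dict = field(default_factory=dict)  # path -> replication factor


def apply_edit(namespace, op, path, arg=None):
    """Apply one edit-log operation (hypothetical op names) to a namespace dict."""
    if op in ("create", "setrep"):
        namespace[path] = arg
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op {op!r}")


def do_checkpoint(image, edits):
    """Merge an fsimage with its edits, as the SNN does periodically.

    `edits` is a list of (txid, op, path, arg) tuples.
    Returns (new_image, remaining_edits); the edit log can restart empty
    because every edit has been folded into the new image.
    """
    namespace = dict(image.namespace)  # work on a copy; the old image stays valid
    last = image.last_txid
    for txid, op, path, arg in sorted(edits, key=lambda e: e[0]):
        if txid <= last:
            continue  # already contained in the image
        apply_edit(namespace, op, path, arg)
        last = txid
    return Checkpoint(last_txid=last, namespace=namespace), []
```

Working on a copy mirrors why a checkpoint is safe: the previous image remains usable until the new one is complete.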
Q6. If 2 TB of data is given, what is the maximum amount of metadata expected to be generated?
The metadata generated is very small compared with the data itself, because it grows with the number of files and blocks rather than with the number of bytes stored.
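A commonly quoted rule of thumb is that each file, directory and block costs roughly 150 bytes of NameNode memory. The Python sketch below applies that rule under stated assumptions: a 128 MB block size (the Hadoop 2+ default), evenly sized files, and no extra cost counted for replica locations. The function name is hypothetical and the result is only an order-of-magnitude estimate.

```python
import math

BYTES_PER_OBJECT = 150  # rule of thumb: NN memory per file, directory or block


def estimate_nn_metadata(data_bytes, block_size, num_files):
    """Rough NameNode memory estimate for `data_bytes` stored in `num_files` files.

    Assumes evenly sized files; each file occupies ceil(file_size / block_size)
    blocks. Memory for replica locations and directories is ignored.
    """
    file_size = data_bytes / num_files
    blocks_per_file = math.ceil(file_size / block_size)
    objects = num_files * (1 + blocks_per_file)  # file objects + block objects
    return objects * BYTES_PER_OBJECT


TB, MB = 1024 ** 4, 1024 ** 2
one_file = estimate_nn_metadata(2 * TB, 128 * MB, num_files=1)
per_block = estimate_nn_metadata(2 * TB, 128 * MB, num_files=16384)
print(f"{one_file / MB:.1f} MiB")   # prints 2.3 MiB
print(f"{per_block / MB:.1f} MiB")  # prints 4.7 MiB
```

2 TB at 128 MB per block is 16,384 blocks, so the estimate stays in the low megabytes unless the data is split into many more, smaller files.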
Q7. What metadata does the NN hold in memory (cache)?
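The NN holds the metadata of each file: its name, size, owner, group, permissions, block size, replication factor, list of blocks, etc. It also keeps in memory the mapping of each block to the DNs that store it, which it builds from DN block reports.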
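Besides per-file attributes, the NN keeps, in memory only, which DataNodes hold each block. A minimal Python sketch of that block map, assuming a full block report replaces everything previously known about the reporting DN; the class and method names are hypothetical and much simpler than the real NN data structures.

```python
from collections import defaultdict


class BlockMap:
    """Toy model of the NN's in-memory block -> DataNode mapping."""

    def __init__(self):
        self.locations = defaultdict(set)  # block_id -> {datanode, ...}

    def process_block_report(self, datanode, block_ids):
        # A full report replaces what was previously believed about this DN.
        for holders in self.locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.locations[block_id].add(datanode)

    def holders(self, block_id):
        # .get() avoids creating empty entries for unknown blocks.
        return set(self.locations.get(block_id, ()))


bm = BlockMap()
bm.process_block_report("dn1", [1, 2])
bm.process_block_report("dn2", [2, 3])
print(bm.holders(2))
```

Because this mapping is rebuilt from reports rather than persisted, a restarted NN must wait for block reports before it knows where any block lives.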
Q8. If a DN stops working, how are the blocks of the dead DN moved to the active DataNodes?
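Every DN sends a heartbeat (HB) to the NN every 3 seconds by default. If the NN stops receiving HBs from a DN for long enough (about 10.5 minutes with default settings), it declares that DN dead. The blocks the dead DN held are not copied from it; instead, the NN marks them as under-replicated and schedules new replicas on other live DNs, copied from the surviving replicas, to restore the replication factor. Separately, each DN periodically sends the NN a block report listing all the blocks it holds (every 6 hours by default in Hadoop 2 and later), which the NN uses to update its metadata.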
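In stock HDFS the dead-node timeout is 2 × `dfs.namenode.heartbeat.recheck-interval` (default 300000 ms) + 10 × `dfs.heartbeat.interval` (default 3 s), which gives 630 s. The Python sketch below computes that timeout, finds dead DNs, and plans re-replication from surviving replicas. The function names and input formats are hypothetical, and target choice is deliberately naive; the real NN uses its placement policy and throttles re-replication work.

```python
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval default (seconds)
RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval default


def dead_node_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S, recheck_ms=RECHECK_INTERVAL_MS):
    """Seconds without a heartbeat before the NN marks a DN dead."""
    return 2 * recheck_ms / 1000 + 10 * heartbeat_s


def find_dead(last_heartbeat, now_s, timeout_s):
    """`last_heartbeat` maps DN -> time (s) of its most recent heartbeat."""
    return {dn for dn, t in last_heartbeat.items() if now_s - t > timeout_s}


def plan_re_replication(locations, dead, live, rf):
    """Return [(block_id, source_dn, target_dn), ...] to restore `rf` copies.

    `locations` maps block_id -> set of DNs holding it. Copies are always
    made from a surviving replica, never from the dead node.
    """
    plan = []
    for block_id, holders in sorted(locations.items()):
        alive = sorted(holders - dead)
        if not alive:
            continue  # every replica lost; nothing to copy from
        candidates = [dn for dn in sorted(live) if dn not in holders]
        missing = rf - len(alive)
        for target in candidates[:max(0, missing)]:
            plan.append((block_id, alive[0], target))
    return plan


print(dead_node_timeout_s())  # prints 630.0
```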