HDFS Interview Questions and Answers
Are you looking for a list of top-rated HDFS interview questions? Whether you are casually browsing for the best platform offering HDFS interview questions, or an experienced professional seeking the most frequently asked ones, stay with us for the HDFS questions that come up most often in interviews.
Are you dreaming of becoming a certified pro Hadoop developer? Then ask India's leading Big Data training institute how to become one, and take the advanced Big Data certification course under the guidance of its world-class trainers.
Algorithm for NN to allocate a block on different Data Nodes?
Ans: The Name Node places blocks on Data Nodes using a nearest-neighbor (rack-aware) placement policy: it prefers Data Nodes that are topologically close to the client, so the data travels the shortest network path. For example, suppose a cluster has Data Nodes in New Delhi, Mumbai, and Chennai, and a client in Andhra Pradesh sends a file for storage. The Name Node is responsible for finding the Data Node nearest to the client so the operation completes in the least time; since Chennai is the closest of the three locations, the Name Node returns the location of the Chennai Data Node to the client for storing the data.
Which function is called for splitting the user data into blocks?
Ans: A split() operation breaks the data into multiple chunks so they can be allocated to Data Nodes in the cluster. Suppose the file Abc.txt is split into six chunks named A1, B2, C3, D4, E5, F6. Sequentially, a chunk such as C3 triggers a request through the HDFS client to get a Data Node location from the Name Node, and in return the Name Node sends back the location of the first (nearest) Data Node for storing the data.
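In practice, the HDFS client performs this splitting internally rather than exposing a public split() call. Purely as an illustration of the fixed-size chunking idea, here is a small Python sketch; the 128 MB default block size is the only HDFS-specific value, while the function name and byte-slicing logic are illustrative assumptions, not an HDFS API:

```python
# Illustrative sketch: chunk a byte stream into fixed-size blocks,
# the way an HDFS client conceptually splits a file before asking
# the Name Node where each block should go. Not a real HDFS API.

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield successive fixed-size chunks of `data` (the last may be smaller)."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]

# Small demo with a toy block size so it runs instantly.
chunks = list(split_into_blocks(b"abcdefghij", block_size=4))
print(chunks)  # three chunks: 4 + 4 + 2 bytes
```

Each chunk would then be shipped to the Data Node whose location the Name Node returns.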
How to modify the heartbeat and block report time interval of Data node?
Ans: The heartbeat is used to check the status (active or inactive) of a Data Node in the cluster. Each Data Node sends a heartbeat to the Name Node every 3 seconds by default; if a Data Node falls inactive, the Name Node stops assigning operations to it. The block report carries the status of the blocks on a Data Node (block names, replication information, and so on) and is sent far less frequently, every 6 hours by default.
Both intervals can be modified through Hadoop's configuration file hdfs-site.xml, which holds the HDFS settings used by the Name Node, Secondary Name Node, and Data Nodes. Set the parameter "dfs.heartbeat.interval" to change the heartbeat interval, and "dfs.blockreport.intervalMsec" to change the block report interval, according to the requirement.
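The two property names above are real HDFS settings; the values below are simply the defaults, shown for illustration (note the units differ: dfs.heartbeat.interval is in seconds, dfs.blockreport.intervalMsec in milliseconds). A minimal hdfs-site.xml fragment might look like:

```xml
<configuration>
  <!-- DataNode heartbeat interval, in seconds (default: 3) -->
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
  <!-- Block report interval, in milliseconds (default: 21600000 = 6 hours) -->
  <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>21600000</value>
  </property>
</configuration>
```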
What is fsImage and which type of metadata does it store?
Ans: fsImage is a file in the Name Node's metadata directory that stores a persistent checkpoint of the file system namespace: the directory tree, file names, permissions, and per-file attributes such as the replication factor and the list of block IDs. Note that block-to-Data-Node locations are not persisted in fsImage; the Name Node rebuilds them from the block reports the Data Nodes send when the cluster starts up.
If 2 TB of data is given, what is the maximum expected metadata that will be generated?
Ans: With the default 128 MB block size: 2 TB = 2 × 1024 × 1024 MB = 2,097,152 MB, and 2,097,152 MB ÷ 128 MB = 16,384 blocks. Since the Name Node keeps roughly 150 bytes of metadata per block object, that works out to about 16,384 × 150 ≈ 2.4 MB of Name Node metadata.
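The arithmetic above can be checked in a few lines of Python; the 128 MB block size is the HDFS default, and the 150-bytes-per-block figure is the commonly quoted rule of thumb, not an exact value:

```python
# Number of 128 MB blocks needed to hold 2 TB of data.
MB_PER_TB = 1024 * 1024
block_size_mb = 128
data_mb = 2 * MB_PER_TB  # 2 TB expressed in MB

blocks = data_mb // block_size_mb
print(blocks)  # 16384

# Rough Name Node memory cost, assuming ~150 bytes of metadata per block.
metadata_bytes = blocks * 150
print(metadata_bytes)  # 2457600, i.e. about 2.4 MB
```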
Write the life cycle of SNN in production?
Ans: The Secondary Name Node (SNN) helps mitigate the Name Node single point of failure, but it is not a hot standby: it does not take over the Name Node's role in the cluster. The Name Node maintains two pieces of on-disk metadata, the edit log and fsImage. The SNN periodically takes a checkpoint of the Hadoop file system: it fetches the current edit log and fsImage from the Name Node, merges the logged edits into a new fsImage, and copies the result back. If the Name Node fails (for example, due to a network issue) and the entire cluster stops working, it can be restarted from this checkpointed fsImage, replaying only the recent edits to bring its metadata up to date. This keeps both the recovery time and the size of the edit log small.
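How often the SNN takes these checkpoints is controlled by two real hdfs-site.xml properties; the values shown below are the usual defaults, given here for illustration:

```xml
<configuration>
  <!-- Seconds between SNN checkpoints (default: 3600 = 1 hour) -->
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
  </property>
  <!-- Force a checkpoint after this many uncheckpointed transactions -->
  <property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>1000000</value>
  </property>
</configuration>
```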
If any DN stops working, how the blocks of dead DN will move to the active DN?
Ans: This happens automatically, and the Name Node (not the Data Node) is responsible for it. When a Data Node stops sending heartbeats, the Name Node marks it dead, and all blocks that had a replica on it become under-replicated. The Name Node then instructs Data Nodes holding the surviving replicas to copy those blocks to other active Data Nodes until the replication factor is restored. The replication factor, chosen when the data was first split into blocks (chunks), therefore never stays below its configured value for long.
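The Name Node does not declare a Data Node dead on the first missed heartbeat. The commonly documented timeout formula is 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval; with the default values (300 s recheck interval, 3 s heartbeat) this gives the well-known 10.5-minute window, as a quick Python check shows:

```python
# Time before the Name Node marks a Data Node dead, using the
# standard formula: 2 * recheck_interval + 10 * heartbeat_interval.
recheck_interval_s = 300   # dfs.namenode.heartbeat.recheck-interval (default 300000 ms)
heartbeat_interval_s = 3   # dfs.heartbeat.interval (default 3 s)

timeout_s = 2 * recheck_interval_s + 10 * heartbeat_interval_s
print(timeout_s, "seconds =", timeout_s / 60, "minutes")  # 630 seconds = 10.5 minutes
```

Only after this window expires does the re-replication described above begin.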