Question: What replication factor (R.F.) can be used to complete a MapReduce job if the disk size per DataNode is 20 TB and there are 40 nodes?
Answer: We have 80 TB of data to store and to run MapReduce on, and the DataNode configuration is the one given in the question.
Simple formula: disk size per node = (total amount of data * R.F.) / total no. of nodes
Disk size per node = 20 TB, no. of nodes = 40, total amount of data = 80 TB
Rearranged for R.F.: R.F. = (disk size per node * total no. of nodes) / total amount of data
= (20 * 40) / 80
= 800 / 80
R.F. = 10
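
The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `max_replication_factor` and its parameter names are made up for illustration and are not part of Hadoop, and it assumes all sizes are in TB and that the whole disk on every node is available to HDFS.

```python
def max_replication_factor(disk_per_node_tb, num_nodes, total_data_tb):
    """Return the replication factor implied by the simple capacity formula.

    Hypothetical helper for illustration only (not a Hadoop API).
    Assumes every node contributes its full disk to HDFS.
    """
    if total_data_tb <= 0:
        raise ValueError("total_data_tb must be positive")
    # Raw cluster capacity = disk size per node * number of nodes
    raw_capacity_tb = disk_per_node_tb * num_nodes
    # R.F. = raw capacity / total amount of data
    return raw_capacity_tb / total_data_tb


# Values from the question: 20 TB per node, 40 nodes, 80 TB of data
rf = max_replication_factor(20, 40, 80)
print(rf)  # 10.0
```

Note that with R.F. = 10 the 80 TB of data fills all 800 TB of raw capacity, so this value is best read as an upper bound; a real cluster would likely need some disk left over for intermediate MapReduce output.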