Question: What replication factor is needed to complete a MapReduce job if the disk size per datanode is 20 TB and there are 40 nodes?
Answer: We have 80 TB of data to store and to run MapReduce on. The datanode configuration follows from a simple formula:

disk size per datanode = (total amount of data * R.F.) / total no. of nodes

Given disk size = 20 TB, no. of nodes = 40, and total amount of data = 80 TB, rearrange the formula to solve for R.F.:

R.F. = (disk size * total no. of nodes) / total amount of data = (20 * 40) / 80 = 10

So R.F. = 10.
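
The arithmetic in this answer can be checked with a short script. This is only a sketch of the formula as stated: the function names `replication_factor` and `disk_size_per_node` are hypothetical, all sizes are assumed to be in TB, and it ignores the non-HDFS space (OS, logs, intermediate MapReduce output) that a real cluster would also need.

```python
def replication_factor(disk_size_tb, num_nodes, total_data_tb):
    """R.F. = disk size per node * number of nodes / total amount of data.

    Hypothetical helper; all sizes are assumed to be in TB.
    """
    if total_data_tb <= 0:
        raise ValueError("total_data_tb must be positive")
    return disk_size_tb * num_nodes / total_data_tb


def disk_size_per_node(total_data_tb, rf, num_nodes):
    """Original form of the formula: disk size = total data * R.F. / nodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return total_data_tb * rf / num_nodes


# Values from the question: 20 TB per datanode, 40 nodes, 80 TB of data.
rf = replication_factor(20, 40, 80)
print(rf)  # 10.0

# Plugging R.F. = 10 back in recovers the 20 TB per datanode.
print(disk_size_per_node(80, rf, 40))  # 20.0
```

Running both directions confirms that the rearranged formula agrees with the original one for these numbers.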