Let’s say you have 80 TB of data to store and want to run MapReduce over all of it.

Configuration of each datanode:
· 8 GB RAM
· 100 MB/s read-write speed

Total no. of nodes = 20

Let’s assume the replication factor is 4 and the block size is 64 MB.

By simple calculation, the disk size you will need per datanode is:
= Total amount of data * Replication factor / Total no. of nodes
= 80 * 4 / 20
= 16 TB

Now let’s say you need to run a MapReduce program on this 80 TB of data. Reading 80 TB at a speed of 100 MB/s using only 1 node will take:
= Total data / Read-write speed
= 80 * 1024 * 1024 / 100
= 838860.8 seconds
= 13981.01 minutes (about 233 hours)

With 20 datanodes reading in parallel, and assuming the data is spread evenly across them, you will be able to finish this job in:
= 13981.01 / 20
= 699.05 minutes (about 11.65 hours)
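The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and it assumes 1 TB = 1024 * 1024 MB and perfectly even parallel reads, as the example does. It ignores RAM, block size, and any MapReduce overhead.

```python
# Rough capacity and read-time estimate for the example cluster.
# Assumption: 1 TB = 1024 * 1024 MB, and reads scale linearly with node count.
MB_PER_TB = 1024 * 1024


def disk_per_node_tb(total_data_tb, replication_factor, num_nodes):
    """Raw disk needed on each datanode, in TB."""
    return total_data_tb * replication_factor / num_nodes


def read_time_seconds(total_data_tb, speed_mb_per_s, num_nodes=1):
    """Time to read the data set once, split evenly across num_nodes."""
    return total_data_tb * MB_PER_TB / speed_mb_per_s / num_nodes


print("Disk per datanode (TB):", disk_per_node_tb(80, 4, 20))
print("1 node  (minutes):", round(read_time_seconds(80, 100) / 60, 2))
print("20 nodes (minutes):", round(read_time_seconds(80, 100, 20) / 60, 2))
```

Dividing the seconds by 60 gives minutes and by 3600 gives hours, which is the step where the units are easy to mix up.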
Q. What replication factor is needed to complete the MapReduce job if the disk size per datanode is 20 TB and you have 40 nodes?
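One possible way to approach this is to rearrange the storage formula from the example (disk per node = total data * replication factor / nodes) and solve for the replication factor. The sketch below assumes the data set is still 80 TB and that the disk size is in TB; the question states neither, and the function name is hypothetical. It gives the largest replication factor that fits on the disks, not a recommended setting.

```python
import math


def max_replication_factor(disk_per_node_tb, num_nodes, total_data_tb):
    """Largest whole replication factor whose copies fit in the cluster's raw disk."""
    total_disk_tb = disk_per_node_tb * num_nodes
    return math.floor(total_disk_tb / total_data_tb)


# Assumed inputs: 20 TB per datanode, 40 nodes, 80 TB of data.
print(max_replication_factor(20, 40, 80))
```

In practice some disk space is also needed for intermediate MapReduce output, so a real cluster would normally use a lower value than this upper bound.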