Eshwar


Let’s say you have 80 TB of data to store and want to run MapReduce on it.
Configuration of datanodes
· 8 GB RAM
Total no. of nodes = 20
Let’s assume the replication factor is 4 and the block size is 64 MB.
By simple calculation, each datanode will need:
= Total amount of data * Replication factor / Total no. of nodes
= 80 TB * 4 / 20
= 16 TB (disk size per datanode)
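The storage calculation above can be sketched in a few lines of Python. The function name `disk_per_datanode_tb` is made up for illustration, and the sketch assumes blocks spread evenly across datanodes and ignores any non-HDFS overhead on the disks.

```python
def disk_per_datanode_tb(data_tb: float, replication_factor: int, num_nodes: int) -> float:
    """Raw disk needed on each datanode, in TB.

    Assumes replicated blocks are spread evenly over all datanodes
    (an idealisation; real clusters also need headroom for temp data).
    """
    return data_tb * replication_factor / num_nodes


# Values from the example: 80 TB of data, replication factor 4, 20 nodes.
print(disk_per_datanode_tb(80, 4, 20))  # 16.0
```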
Now let’s say you need to run a MapReduce program on this 80 TB of data.
Reading 80 TB of data at a speed of 100 MB/s using only 1 node will take:
= Total data / Read speed
= 80 * 1024 * 1024 MB / 100 MB/s
= 838860.8 seconds
= 13981.01 minutes (about 233.02 hours)
With 20 datanodes reading in parallel you will be able to finish this job in
= 13981.01 / 20
= 699.05 minutes (about 11.65 hours)
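As a rough check of the read-time estimate, here is a minimal Python sketch. The helper `read_time_seconds` is a hypothetical name. It assumes 1 TB = 1024 * 1024 MB, that every node sustains the same read speed, and that the work divides perfectly across nodes, which a real job will only approximate.

```python
def read_time_seconds(data_tb: float, throughput_mb_s: float, num_nodes: int = 1) -> float:
    """Time to read data_tb terabytes when num_nodes read in parallel.

    Assumes 1 TB = 1024 * 1024 MB and perfectly even, overhead-free
    parallelism across nodes (an idealisation).
    """
    data_mb = data_tb * 1024 * 1024
    return data_mb / throughput_mb_s / num_nodes


single = read_time_seconds(80, 100)        # one node
cluster = read_time_seconds(80, 100, 20)   # 20 datanodes

# Convert seconds to minutes (/60) and hours (/3600) explicitly.
print(f"1 node:   {single:.1f} s = {single / 60:.2f} min = {single / 3600:.2f} h")
print(f"20 nodes: {cluster:.1f} s = {cluster / 60:.2f} min = {cluster / 3600:.2f} h")
```

Keeping the unit conversions explicit makes it easy to see whether a figure is in seconds, minutes, or hours.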

Q. What replication factor can be used for the MapReduce job if the disk
size per datanode is 20 TB and you have 40 nodes?

Ans: 10, since (80 TB * R.F.) / 40 = 20 TB, so R.F. = 20 * 40 / 80 = 10.
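The answer comes from rearranging the disk-size formula to solve for the replication factor. A small Python sketch under the same even-distribution assumption follows; `replication_factor_for` is an illustrative name, not a Hadoop API.

```python
def replication_factor_for(data_tb: float, disk_per_node_tb: float, num_nodes: int) -> float:
    """Replication factor that exactly fills the cluster's raw disk.

    Rearranged from: disk_per_node = data * replication_factor / num_nodes.
    Assumes the whole disk on each datanode is available for HDFS blocks.
    """
    return disk_per_node_tb * num_nodes / data_tb


# Values from the question: 80 TB of data, 20 TB per datanode, 40 nodes.
print(replication_factor_for(80, 20, 40))  # 10.0
```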

