Let’s say you have 80 TB of data to store and you want to run MapReduce on it. Configuration of each DataNode:
· 8 GB RAM
Total no. of nodes = 20
Let’s assume the replication factor is 4 and the block size is 64 MB.
By simple calculation, the disk size you will need per DataNode = Total amount of data * Replication factor / Total number of nodes = 80 * 4 / 20 = 16 TB
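A minimal Python sketch of the sizing formula above. It assumes the data and all of its replicas are spread evenly across the DataNodes, and it ignores space needed for anything else (for example, intermediate MapReduce output). The function name `disk_per_datanode_tb` is hypothetical, chosen for illustration.

```python
# Sketch of the sizing rule: raw storage each DataNode needs so that every
# block replica fits, assuming data is spread evenly across all nodes.
# disk_per_datanode_tb is a hypothetical name used only for illustration.


def disk_per_datanode_tb(total_data_tb: float, replication_factor: int, num_nodes: int) -> float:
    """Return the disk space (in TB) each DataNode needs to hold all replicas."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Every byte is stored replication_factor times across the cluster.
    raw_storage_tb = total_data_tb * replication_factor
    # Divide the raw storage evenly over all DataNodes.
    return raw_storage_tb / num_nodes


if __name__ == "__main__":
    print(disk_per_datanode_tb(80, 4, 20))  # 16.0
```

With 80 TB, a replication factor of 4 and 20 nodes this gives 16.0 TB per DataNode, matching the hand calculation.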

Now let’s say you need to run a MapReduce program on this 80 TB of data. Reading 80 TB of data at a speed of 100 MB/s using only 1 node will take: Total data / Read-write speed = 80 * 1024 * 1024 / 100 = 838,860.8 seconds ≈ 13,981.01 minutes ≈ 233.02 hours. With 20 DataNodes reading in parallel (assuming the work splits evenly across them), you will be able to finish this job in 13,981.01 / 20 ≈ 699.05 minutes ≈ 11.65 hours.
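The read-time estimate can be checked with a short Python sketch. It assumes 1 TB = 1024 * 1024 MB (as in the calculation above) and perfect parallelism across nodes, with no scheduling, network, or data-skew overhead. The name `read_time_seconds` is hypothetical.

```python
# Sketch of the read-time estimate: total data / read speed, split evenly
# over the nodes. Assumes 1 TB = 1024 * 1024 MB and perfect parallelism.
# read_time_seconds is a hypothetical name used only for illustration.

MB_PER_TB = 1024 * 1024


def read_time_seconds(total_data_tb: float, speed_mb_per_s: float, num_nodes: int = 1) -> float:
    """Return the time in seconds to read all data, split evenly over num_nodes."""
    if speed_mb_per_s <= 0 or num_nodes <= 0:
        raise ValueError("speed and node count must be positive")
    total_mb = total_data_tb * MB_PER_TB
    return total_mb / speed_mb_per_s / num_nodes


if __name__ == "__main__":
    for nodes in (1, 20):
        secs = read_time_seconds(80, 100, nodes)
        print(f"{nodes:>2} node(s): {secs:,.1f} s = {secs / 60:,.2f} min = {secs / 3600:,.2f} h")
```

For 1 node this prints 838,860.8 s = 13,981.01 min = 233.02 h, and for 20 nodes 41,943.0 s = 699.05 min = 11.65 h. Printing seconds, minutes and hours side by side makes it easy to spot a unit slip between minutes and hours.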

——————-Task for you——————-
Q. What replication factor would you need to store the same 80 TB if the disk size per DataNode is 20 TB and you have 40 nodes?

How does the replication factor come out to 10? Can you give an explanation for it?

naveenkumar_mce
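Using Disk size per DataNode = Total amount of data * Replication factor / Total number of nodes, with X as the replication factor:
20 = (80 * X) / 40
X = (20 * 40) / 80 = 10 (replication factor)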

So the replication factor is 10.
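The answer rearranges the same sizing formula to solve for the replication factor. A minimal Python sketch of that rearrangement follows; the name `replication_factor` is hypothetical. The result is the largest replication factor the cluster's raw capacity can hold when every disk is completely full, so in practice you would round down and leave some free space.

```python
# Sketch: rearrange
#   disk_per_node = total_data * replication_factor / num_nodes
# to solve for the replication factor.
# replication_factor is a hypothetical name used only for illustration.


def replication_factor(disk_per_node_tb: float, num_nodes: int, total_data_tb: float) -> float:
    """Return the replication factor that exactly fills the cluster's raw capacity."""
    if total_data_tb <= 0:
        raise ValueError("total_data_tb must be positive")
    # Total raw capacity of the cluster divided by the logical data size.
    return disk_per_node_tb * num_nodes / total_data_tb


if __name__ == "__main__":
    print(replication_factor(20, 40, 80))  # 10.0
```

Plugging in the original example (16 TB per node, 20 nodes, 80 TB) gives back the assumed replication factor of 4.0, which is a quick sanity check on the rearranged formula.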

