This topic contains 1 reply, has 3 voices, and was last updated by  narmada 4 years, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
#1192

Let’s say you have 80 TB of data to store and want to run MapReduce on it. Datanode configuration:
· 8 GB RAM
Total no. of nodes = 20
Let’s assume the replication factor is 4 and the block size is 64 MB.
By a simple calculation you will need: disk size per datanode = (Total amount of data * Replication factor) / Total no. of nodes = (80 TB * 4) / 20 = 16 TB per datanode.
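The capacity calculation above can be sketched in Python (a minimal sketch; the 80 TB, replication factor 4, and 20 nodes are the figures from the example):

```python
# Disk required per datanode = (total data * replication factor) / number of nodes
total_data_tb = 80        # total data to store, in TB
replication_factor = 4    # each block is stored 4 times across the cluster
num_nodes = 20            # number of datanodes

disk_per_node_tb = total_data_tb * replication_factor / num_nodes
print(disk_per_node_tb)   # 16.0 TB of disk needed per datanode
```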

Now let’s say you need to run a MapReduce program on this 80 TB of data. Reading 80 TB at a speed of 100 MB/s using only 1 node will take: Total data / Read speed = (80 * 1024 * 1024 MB) / 100 MB/s = 838860.8 seconds ≈ 233.02 hours. With 20 datanodes reading in parallel you will be able to finish this job in about 233.02 / 20 ≈ 11.65 hours.
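The read-time estimate can be checked the same way (a minimal sketch; 80 TB, 100 MB/s, and 20 nodes come from the example, and the sketch assumes the read load splits evenly across nodes):

```python
# Time to read 80 TB sequentially at 100 MB/s, on 1 node vs. 20 nodes
total_data_mb = 80 * 1024 * 1024   # 80 TB expressed in MB
read_speed_mb_s = 100              # per-node read speed in MB/s
num_nodes = 20

seconds_one_node = total_data_mb / read_speed_mb_s   # 838860.8 seconds
hours_one_node = seconds_one_node / 3600             # convert seconds to hours
hours_all_nodes = hours_one_node / num_nodes         # assume an even split

print(round(hours_one_node, 2))   # 233.02 hours on a single node
print(round(hours_all_nodes, 2))  # 11.65 hours across 20 nodes
```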

——————-Task for you——————- Q. What will the replication factor be to complete the MapReduce job if the disk size per datanode is 20 TB and you have 40 nodes?

#1208

Participant

How will the replication factor be 10? Can you give an explanation for it?

#1398

naveenkumar_mce
Participant

Disk size per datanode = (Total amount of data * Replication factor) / Total no. of nodes
20 = (80 * X) / 40
X = (20 * 40) / 80 = 10 (replication factor)

So 10 is the replication factor.
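The rearrangement above can be verified in Python (a minimal sketch; 80 TB, 20 TB per node, and 40 nodes are the figures from the task):

```python
# Solve disk_per_node = (total_data * X) / num_nodes for the replication factor X
total_data_tb = 80      # total data to store, in TB
disk_per_node_tb = 20   # disk available per datanode, in TB
num_nodes = 40          # number of datanodes

replication_factor = disk_per_node_tb * num_nodes / total_data_tb
print(replication_factor)   # 10.0
```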


The forum ‘Hadoop Developer Batch(1 Nov 2014)’ is closed to new topics and replies.