Forum

This topic contains 0 replies, has 1 voice, and was last updated by  Eshwar 2 years, 3 months ago.

Viewing 1 post (of 1 total)

    Eshwar
    Participant

    Suppose you have 80 TB of data to store and want to run MapReduce on it.
    Configuration of each datanode:
    · 8 GB RAM
    · 100 MB/s read-write speed
    Total no. of nodes = 20
    Let’s assume the replication factor is 4 and the block size is 64 MB.
    By simple calculation, the disk size needed per datanode is:
    = Total amount of data * Replication factor / Total no. of nodes
    = 80 TB * 4 / 20
    = 16 TB (disk size per datanode)
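The storage arithmetic above can be checked with a short Python sketch. The function name `disk_per_node_tb` is made up for illustration, and the calculation assumes replicas are spread evenly across nodes with no space reserved for intermediate data or the operating system.

```python
def disk_per_node_tb(data_tb, replication_factor, nodes):
    """Raw disk needed per datanode, in TB (hypothetical helper).

    Assumes every block is stored `replication_factor` times and that
    the replicas are spread evenly over all nodes.
    """
    return data_tb * replication_factor / nodes


# Values from the post: 80 TB of data, replication factor 4, 20 nodes.
print(disk_per_node_tb(80, 4, 20))  # 16.0
```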
    Now let’s say you need to run a MapReduce program on this 80 TB of data.
    Reading 80 TB of data at 100 MB/s using only 1 node will take:
    = Total data / Read-write speed
    = 80 * 1024 * 1024 MB / 100 MB/s
    = 838860.8 seconds
    = 13981.01 minutes (about 233.02 hours)
    With 20 datanodes reading in parallel, the job can finish in:
    = 13981.01 / 20
    = 699.05 minutes (about 11.65 hours)
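As a rough check of the read-time estimate, here is a minimal Python sketch. The helper `read_time_seconds` is hypothetical, and it assumes perfectly parallel reads, data spread evenly across nodes, 1 TB = 1024 * 1024 MB, and no scheduling or network overhead, so real jobs would take longer.

```python
MB_PER_TB = 1024 * 1024  # binary units, matching the calculation in the post


def read_time_seconds(data_tb, mb_per_second, nodes=1):
    """Idealised time to read `data_tb` once, split evenly over `nodes`."""
    return data_tb * MB_PER_TB / mb_per_second / nodes


single = read_time_seconds(80, 100)             # one node
cluster = read_time_seconds(80, 100, nodes=20)  # 20 nodes in parallel

print(f"1 node:   {single:.1f} s = {single / 60:.2f} min = {single / 3600:.2f} h")
print(f"20 nodes: {cluster:.1f} s = {cluster / 60:.2f} min = {cluster / 3600:.2f} h")
```

Dividing seconds by 60 gives minutes and by 3600 gives hours, which is the unit conversion to watch when reading the figures above.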

    Q. What replication factor can be used for the MapReduce job if the disk
    size per datanode is 20 TB and the number of nodes is 40?

    Ans: 10, since (80 TB * replication factor) / 40 = 20 TB, so replication factor = 20 * 40 / 80 = 10.
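The answer comes from rearranging the storage formula to solve for the replication factor. A small Python sketch, with the hypothetical helper `max_replication_factor` and the same even-distribution assumption as before:

```python
def max_replication_factor(data_tb, disk_per_node_tb, nodes):
    """Largest replication factor the cluster's raw disk can hold.

    Rearranged from: data_tb * replication_factor / nodes = disk_per_node_tb
    """
    return disk_per_node_tb * nodes / data_tb


# Values from the question: 80 TB of data, 20 TB per datanode, 40 nodes.
print(max_replication_factor(80, 20, 40))  # 10.0
```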

Reply To: Hadoop Assignment Week 1 (Batch : 16/03/2015)