Forum
Topic: 2) write the function called for splitting the user data into blocks?


khazim (Participant)

Blocks are physical divisions of the data, while input splits are logical divisions. One input split can map to multiple physical blocks.
When Hadoop submits a job, it splits the input data logically, and each split is processed by a separate Mapper task.
The number of Mappers is equal to the number of splits.
One important thing to remember is that an InputSplit doesn't contain the actual data, only a reference (the storage locations) to the data.

A split basically holds two things: a length in bytes and a set of storage locations, which are just hostname strings.
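
To make the "length plus locations" idea concrete, here is a rough sketch of the shape of the InputSplit contract in the org.apache.hadoop.mapreduce API (paraphrased from memory, not copied from the Hadoop source); note that nothing in it returns the data itself:

    import java.io.IOException;

    // Paraphrased sketch of the InputSplit contract: only a size and a
    // set of host names, never the record data itself.
    public abstract class InputSplit {

        // Number of bytes covered by this split.
        public abstract long getLength()
                throws IOException, InterruptedException;

        // Hostnames of the nodes holding the underlying blocks, used by
        // the scheduler to run the map task close to its data.
        public abstract String[] getLocations()
                throws IOException, InterruptedException;
    }

The concrete FileSplit used for HDFS files simply stores a file path, a starting byte offset, a length, and the list of hosts.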

Both the block size and the split size are configurable. The default HDFS block size is 64 MB (128 MB in Hadoop 2.x and later), and the default split size is equal to the block size.
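
As a rough sketch of how a job might tune these (assuming the usual dfs.blocksize property and the FileInputFormat split-size helpers; exact property names can vary between Hadoop versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Block size applies to files written with this configuration;
            // existing files keep the block size they were created with.
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024);   // 128 MB

            Job job = Job.getInstance(conf, "split-size-demo");
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // Clamp the split size independently of the block size:
            // effective split size = max(minSize, min(maxSize, blockSize)).
            FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

            // ... set mapper/reducer/output classes and submit as usual ...
        }
    }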

1 data set = 1…n files; each file = 1…n blocks

1 mapper = 1 input split = 1…n blocks
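
For example, with the 64 MB defaults above, a single 200 MB file is stored as four blocks (64 + 64 + 64 + 8 MB); with the default split size that gives four input splits and therefore four map tasks.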

InputFormat.getSplits() is responsible for generating the input splits; each split then becomes the input for one mapper.

By default (in FileInputFormat), this method creates one input split for each HDFS block.
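
The per-block behaviour comes from the way the FileInputFormat implementation computes the split size. A minimal sketch of that calculation, assuming the usual max/min clamping formula, looks like this:

    // Sketch of FileInputFormat-style split sizing: the effective split
    // size is the block size clamped between the configured minimum and
    // maximum split sizes.
    public final class SplitSizing {

        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block
            long minSize   = 1L;                 // default minimum split size
            long maxSize   = Long.MAX_VALUE;     // default maximum split size

            // With the defaults the split size equals the block size,
            // which is why there is one split (and one mapper) per block.
            System.out.println(computeSplitSize(blockSize, minSize, maxSize));
        }
    }

Lowering the maximum below the block size produces more, smaller splits (and more mappers); raising the minimum above it merges blocks into fewer, larger splits.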
