This topic contains 1 reply, has 2 voices, and was last updated by  chinni 2 years, 10 months ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
  • #1961

    usha yogeshgowda

    1. How do you achieve a map-side join in Pig?
    2. How do you achieve a map-side join in Hive?
    3. Create two tables, one managed and one external, perform a map-side join and a reduce-side join on them, and compare the differences.



    1a) The inputs to each map must be partitioned and sorted in a specific way. Each input dataset must be divided into the same number of partitions, and it must be sorted by the same key (the join key) in each source. All the records for a particular key must reside in the same partition; this is mandatory. A map-side join can be used to join the outputs of several jobs that had the same number of reducers, the same keys, and output files that are not splittable (for example, because each output file is no bigger than an HDFS block).

    We can achieve this with the org.apache.hadoop.mapred.join.CompositeInputFormat class. The join type (inner or outer) is configurable using the join expression. For example, a source table is declared as: func ::= tbl(<class>,"<path>");

    The input format classes named in the expression must belong to the same API as the CompositeInputFormat in use; the org.apache.hadoop.mapreduce.lib.input classes below pair with org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat. We can set the expression on the CompositeInputFormat using:
    inner(tbl(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,"hdfs://localhost:8000/usr/data"),tbl(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,"hdfs://localhost:8000/usr/activity"));

    We can achieve the following kinds of joins using map-side techniques:
    1) Inner Join
    2) Outer Join
    3) Override: MultiFilter for a given key, preferring values from the rightmost source
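
    The sorting and partitioning requirement is what lets each mapper join its inputs with a single forward pass. A minimal plain-Java sketch of that merge step for an inner join is shown here. It is not Hadoop code: the class name MergeJoinSketch, the {key, value} record layout, and the sample data are assumptions made for illustration, and the two lists stand in for one matching pair of partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {

    // Inner-joins two lists of {key, value} records.
    // Precondition: both lists are already sorted by key (index 0).
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;                       // key only on the left: skip it
            } else if (cmp > 0) {
                j++;                       // key only on the right: skip it
            } else {
                String key = left.get(i)[0];
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd)[0].equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) jEnd++;
                // Emit every left/right combination for this key.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + "\t" + left.get(a)[1] + "\t" + right.get(b)[1]);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample partitions, both sorted by the join key.
        List<String[]> data = Arrays.asList(
                new String[]{"u1", "alice"},
                new String[]{"u2", "bob"},
                new String[]{"u4", "dan"});
        List<String[]> activity = Arrays.asList(
                new String[]{"u1", "login"},
                new String[]{"u1", "view"},
                new String[]{"u3", "login"},
                new String[]{"u4", "logout"});
        // Prints three tab-separated rows: u1/alice/login, u1/alice/view, u4/dan/logout.
        innerJoin(data, activity).forEach(System.out::println);
    }
}
```

    Because neither cursor ever moves backwards, no record has to be buffered beyond the current key, which is why this kind of join needs no shuffle or reduce phase. If either input were unsorted, or if a key could appear in a different partition, matches would be missed.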

    3a) Map-side join vs. regular (reduce-side) join: the map-side join completed the job in less time than the regular join.
    The map-side join completed its job without the help of any reducer, whereas the regular join needed at least one reducer.
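
    To illustrate why a regular (reduce-side) join needs at least one reducer, here is a small plain-Java simulation of its three phases. It is a sketch under assumptions, not Hadoop code: the class name ReduceSideJoinSketch, the "L"/"R" source tags, and the in-memory TreeMap standing in for the shuffle are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Inner-joins two lists of {key, value} records; inputs need NOT be sorted.
    static List<String> join(List<String[]> left, List<String[]> right) {
        // "Map" phase: tag every record with its source.
        // "Shuffle" phase: group the tagged records by join key (TreeMap sorts by key).
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] r : left) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[]{"L", r[1]});
        }
        for (String[] r : right) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new String[]{"R", r[1]});
        }

        // "Reduce" phase: for each key, separate the two sources and pair them up.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : groups.entrySet()) {
            List<String> ls = new ArrayList<>();
            List<String> rs = new ArrayList<>();
            for (String[] tagged : e.getValue()) {
                (tagged[0].equals("L") ? ls : rs).add(tagged[1]);
            }
            for (String l : ls) {
                for (String r : rs) {
                    out.add(e.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical unsorted sample inputs.
        List<String[]> data = Arrays.asList(
                new String[]{"u4", "dan"},
                new String[]{"u1", "alice"});
        List<String[]> activity = Arrays.asList(
                new String[]{"u1", "login"},
                new String[]{"u4", "logout"},
                new String[]{"u3", "login"});
        join(data, activity).forEach(System.out::println);
    }
}
```

    The grouping step is the expensive part: in a real job every record is sorted and moved across the network to the reducer that owns its key. A map-side join avoids that step because the inputs already arrive sorted and identically partitioned, which is consistent with the shorter run time reported above.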


The forum ‘BIG DATA & HADOOP INTERNSHIP BATCH – 2015’ is closed to new topics and replies.