1. How to achieve a map-side join in Pig?
2. How to achieve a map-side join in Hive?
3. Create two tables, one managed and one external, perform both a map-side join and a reduce-side join on them, and check the differences.
1a) The input to each map must be partitioned and sorted in a specific way. Each input dataset must be divided into the same number of partitions, and each must be sorted by the same key (the join key). All the records for a particular key must reside in the same partition; this is mandatory. A map-side join can therefore be used to join the outputs of several jobs that had the same number of reducers, the same keys, and output files that are not splittable, for example because each output file is no bigger than one HDFS block.
We can achieve this using the org.apache.hadoop.mapred.join.CompositeInputFormat class. The join type (inner or outer) is configurable through the join expression, whose grammar includes, for example:
func ::= tbl(<class>,"<path>")
An expression that can be set on the CompositeInputFormat looks like this (the class is given by name, without a ".class" suffix):
inner(tbl(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat,"hdfs://localhost:8000/usr/data"),tbl(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat,"hdfs://localhost:8000/usr/activity"))
We can achieve the following kinds of join using map-side techniques:
1) Inner Join
2) Outer Join
3) Override – a MultiFilter that, for a given key, prefers the values from the rightmost source
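To see why the inputs have to be partitioned and sorted the same way, here is a minimal sketch in plain Java of what a single map task can do with one pair of matching partitions. It does not use any Hadoop classes. MapSideJoinSketch, Rec and the sample user/activity records are hypothetical names and data made up for illustration, and the actual CompositeInputFormat implementation may work differently. Because both lists are already sorted by the join key, one forward pass over each is enough and no shuffle or reducer is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not part of Hadoop. */
final class MapSideJoinSketch {

    /** One key/value record of a partition that is sorted by key. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /**
     * Merge-joins two partitions that are both sorted by key.
     * outer == false: emit only keys present on both sides (inner join).
     * outer == true: also emit unmatched records, with "null" for the missing side.
     */
    static List<String> join(List<Rec> left, List<Rec> right, boolean outer) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            Rec l = left.get(i), r = right.get(j);
            int cmp = l.key.compareTo(r.key);
            if (cmp < 0) {                       // key exists only on the left
                if (outer) out.add(l.key + "\t" + l.value + "\tnull");
                i++;
            } else if (cmp > 0) {                // key exists only on the right
                if (outer) out.add(r.key + "\tnull\t" + r.value);
                j++;
            } else {                             // same key on both sides
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).key.equals(l.key)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).key.equals(l.key)) jEnd++;
                // emit every left/right combination for this key
                for (int a = i; a < iEnd; a++)
                    for (int b = j; b < jEnd; b++)
                        out.add(l.key + "\t" + left.get(a).value + "\t" + right.get(b).value);
                i = iEnd;
                j = jEnd;
            }
        }
        if (outer) {                             // leftovers on either side
            for (; i < left.size(); i++)
                out.add(left.get(i).key + "\t" + left.get(i).value + "\tnull");
            for (; j < right.size(); j++)
                out.add(right.get(j).key + "\tnull\t" + right.get(j).value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> users = List.of(new Rec("u1", "alice"), new Rec("u2", "bob"), new Rec("u4", "dan"));
        List<Rec> activity = List.of(new Rec("u1", "login"), new Rec("u1", "logout"),
                                     new Rec("u3", "login"), new Rec("u4", "login"));
        join(users, activity, false).forEach(System.out::println);  // inner join
        join(users, activity, true).forEach(System.out::println);   // outer join
    }
}
```

If either list were unsorted, or if a key could appear in a different partition on the other side, the single pass would miss matches. That is the reason the partitioning and sorting requirements are mandatory.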
3a) Map-side join vs. reduce-side join: The map-side join completed the job in less time than the reduce-side join.
The map-side join completed its job without the help of any reducer, whereas the reduce-side join needed at least one reducer.