Piggy Bank and its applications
The Piggy Bank is a place for Pig users to share their functions. The functions are contributed “as-is”.
To build a jar file that contains all available user-defined functions (UDFs), follow these steps:
Create a directory for the Pig source code:
cd into that directory:
Checkout the Pig source code:
svn checkout http://svn.apache.org/repos/asf/pig/trunk/ .
Build the project:
cd into the piggybank dir:
Build the piggybank:
After the build finishes, we should see a piggybank.jar file in that directory.
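The build steps above can be sketched as a small shell script. This is only a sketch under assumptions: the directory name pig-src and the DRY_RUN switch are made up for illustration, and it assumes that svn and ant are installed. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the Piggy Bank build steps.
# PIG_SRC (directory name) and DRY_RUN are hypothetical helpers for this sketch.
PIG_SRC="${PIG_SRC:-pig-src}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_piggybank() {
    # Create a directory for the Pig source code and cd into it.
    run mkdir -p "$PIG_SRC" || return 1
    run cd "$PIG_SRC" || return 1
    # Check out the Pig source code into the current directory.
    run svn checkout http://svn.apache.org/repos/asf/pig/trunk/ . || return 1
    # Build the project.
    run ant || return 1
    # cd into the piggybank dir and build the piggybank.
    run cd contrib/piggybank/java || return 1
    run ant || return 1
}

build_piggybank
```

Depending on the Pig and Hadoop versions, the ant build may need extra properties, so treat the plain ant calls as a starting point.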
How does the physical translator work when a Pig query is compiled?
After the logical plan is generated, script execution moves to the physical plan, which describes the physical operators Apache Pig will use to execute the Pig script. During the creation of the physical plan, the cogroup logical operator is converted into three physical operators: Local Rearrange, Global Rearrange and Package. Load and store functions usually get resolved in the physical plan. The physical plan more or less resembles a series of MapReduce jobs.
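One way to look at these operators is Pig's EXPLAIN statement, which prints the plans for an alias. The sketch below writes a small Pig script that cogroups two inputs and explains the result. The file names, field names and sample rows are made up for illustration, and it assumes that a pig executable is available for running the script in local mode.

```shell
#!/bin/sh
# Made-up sample inputs for the sketch.
printf 'alice,cat\nbob,dog\n' > owners.txt
printf 'alice,carol\nbob,dave\n' > friends.txt

# Write a Pig script that cogroups the two inputs and prints the plans.
cat > explain_cogroup.pig <<'EOF'
owners  = LOAD 'owners.txt'  USING PigStorage(',') AS (owner:chararray, pet:chararray);
friends = LOAD 'friends.txt' USING PigStorage(',') AS (owner:chararray, friend:chararray);
-- COGROUP is the logical operator that the physical plan splits into
-- Local Rearrange, Global Rearrange and Package.
grouped = COGROUP owners BY owner, friends BY owner;
-- EXPLAIN prints the logical, physical and MapReduce plans for an alias.
EXPLAIN grouped;
EOF

echo "Wrote explain_cogroup.pig; run it with: pig -x local explain_cogroup.pig"
```

In the physical plan section of the output we should be able to find the three operators that replaced the cogroup.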
How to run Pig commands from Hue?
First, we open the web UI and click on the Hue button. Then we log in with our Cloudera credentials. There we find a menu called “Query Editor”, from which we can choose the Pig Editor. In the editor we can write, save, and run commands.
Limitations of Pig
1> Pig does not cope well with poorly designed XML or JSON, or with flexible schemas.
2> It has problems dealing with unstructured data such as images, video, and audio.
3> Pig is built on top of MapReduce, which is batch-oriented.
How to achieve Performance Tuning in Pig?
i> Use Optimization:
Pig supports various optimization rules which are turned on by default.
ii> Use Types:
If types are not specified in the load statement, Pig assumes the type double for numeric computations.
iii> Reduce Your Operator Pipeline:
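We may split our projections into several steps for the clarity of our script, but each extra step adds an operator to the pipeline, so steps should be combined where that does not hurt readability.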
iv> Filter Early and Often:
In most cases it is beneficial to apply filters as early as possible, to reduce the amount of data flowing through the pipeline.
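Two of the tips above, declaring types and filtering early, can be sketched in one small Pig script. The file names, field names and the threshold are made up for illustration; the shell part only writes the script to disk.

```shell
#!/bin/sh
# Write a Pig script that declares types in LOAD and filters before joining.
cat > filter_early.pig <<'EOF'
-- Declaring types in the load statement avoids the default of double.
orders = LOAD 'orders.csv' USING PigStorage(',')
         AS (order_id:int, customer_id:int, quantity:int);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (customer_id:int, country:chararray);
-- Filter before the join so that fewer rows flow through the pipeline.
filtered = FILTER orders BY quantity > 10;
joined = JOIN filtered BY customer_id, customers BY customer_id;
STORE joined INTO 'large_orders_by_customer';
EOF

echo "Wrote filter_early.pig; run it with: pig -x local filter_early.pig"
```

Placing the FILTER after the JOIN would give the same result, but the join would then have to process every order row first.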
Piggy Bank and its application
-> A collection of user-defined Pig functions (UDFs).
-> A place for Pig users to share their functions.
-> Users can edit a function and contribute it back to the Piggy Bank.
-> Piggy Bank is Pig’s repository of user-contributed functions. They are distributed as part of the Pig distribution.
-> We need to register the Piggy Bank jar to use it. The jar is found at contrib/piggybank/java/piggybank.jar
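Registering the jar and calling one of its functions might look like the sketch below. The input file, its contents and the alias PB_UPPER are made up for illustration. The relative jar path assumes the script is run from the root of the Pig source tree, and the UPPER class is used on the assumption that the Piggy Bank build in use still ships it.

```shell
#!/bin/sh
# Made-up sample input for the sketch.
printf 'alice\nbob\n' > names.txt

# Write a Pig script that registers the Piggy Bank jar and uses one UDF.
cat > use_piggybank.pig <<'EOF'
-- Make the Piggy Bank functions available to this script.
REGISTER contrib/piggybank/java/piggybank.jar;
-- Give the fully qualified UDF a short alias (PB_UPPER is a made-up name).
DEFINE PB_UPPER org.apache.pig.piggybank.evaluation.string.UPPER();
names = LOAD 'names.txt' AS (name:chararray);
upper_names = FOREACH names GENERATE PB_UPPER(name);
DUMP upper_names;
EOF

echo "Wrote use_piggybank.pig; run it with: pig -x local use_piggybank.pig"
```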