#3184

Debarati chatterjee

How to achieve performance tuning in Pig?
Ans:
a. Use Optimization
Pig supports various optimization rules which are turned on by default. Become familiar with these rules.
b. Use Types
If types are not specified in the load statement, Pig assumes the type double for numeric computations. Much of the time your data fits a smaller type, such as int or long. Specifying the actual type speeds up arithmetic and has the added advantage of early error detection.
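To make the difference concrete, here is a minimal sketch of the same computation with and without declared types. The file name 'myfile' and the field names t, u, v are hypothetical, chosen only for illustration.

```pig
-- Query 1: no types declared, so t + u is computed as double.
A = LOAD 'myfile' AS (t, u, v);
B = FOREACH A GENERATE t + u;

-- Query 2: t and u declared as int, so t + u uses integer arithmetic
-- and a value that is not a valid int shows up as a problem early.
A_typed = LOAD 'myfile' AS (t:int, u:int, v);
B_typed = FOREACH A_typed GENERATE t + u;
```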
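c. Project Early and Often
Pig does not (yet) determine when a field is no longer needed and drop the field from the row. For example, a script that loads three fields but joins and groups on only one or two of them carries the unused fields through every stage unless you project them away right after the load.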
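A rough sketch of early projection follows. The inputs 'myfile' and 'myotherfile' and all field names are assumptions made up for this example, not part of the original post.

```pig
-- Load two hypothetical inputs with three fields each.
A  = LOAD 'myfile' AS (t, u, v);
A1 = FOREACH A GENERATE t, u;      -- v is never used later, drop it now

B  = LOAD 'myotherfile' AS (x, y, z);
B1 = FOREACH B GENERATE x;         -- only the join key x is needed

C  = JOIN A1 BY t, B1 BY x;
C1 = FOREACH C GENERATE u;         -- after the join only u is needed

D  = GROUP C1 BY u;
E  = FOREACH D GENERATE group, COUNT(C1);
```

Each FOREACH here narrows the row before the next expensive step (the join, then the group), so less data is carried between stages.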
d. Filter Early and Often
As with early projection, in most cases it is beneficial to apply filters as early as possible to reduce the amount of data flowing through the pipeline.
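As an illustration, the sketch below filters one input before a join instead of after it. The file names, field names, and the condition t > 0 are hypothetical. Pig's optimizer may already push some filters up on its own, so treat this as making the intent explicit rather than as a guaranteed speedup.

```pig
A = LOAD 'myfile' AS (t:int, u:int, v:int);
B = LOAD 'myotherfile' AS (x:int, y:int, z:int);

-- Filter before the join so fewer rows reach it.
A1 = FILTER A BY t > 0;
C  = JOIN A1 BY t, B BY x;

-- Less efficient alternative: join everything, then filter.
-- C_all = JOIN A BY t, B BY x;
-- C_late = FILTER C_all BY t > 0;
```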
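e. Reduce Your Operator Pipeline
For clarity of your script, you might choose to split a projection into several steps, for instance computing intermediate fields in one FOREACH and combining them in the next. Each extra step can add an operator to the pipeline, so consider folding such steps into a single statement where readability allows.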
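A small sketch of what this could look like, assuming a hypothetical input 'data' with three int fields. Depending on the Pig version, the optimizer may merge consecutive FOREACH statements by itself, so the gain is not guaranteed.

```pig
A = LOAD 'data' AS (a:int, b:int, c:int);

-- Two-step version: intermediate fields x and y, then the final result.
B = FOREACH A GENERATE a + b AS x, b + c AS y;
C = FOREACH B GENERATE x * y;

-- One-step version: same result with a single FOREACH.
C_single = FOREACH A GENERATE (a + b) * (b + c);
```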
f. Make Your UDFs Algebraic
Queries that can take advantage of the combiner generally run much faster (sometimes several times faster) than versions that don’t. The latest code significantly improves combiner usage; however, you need to do your part. If you have a UDF that works on grouped data and is by nature algebraic (meaning its computation can be decomposed into multiple steps), make sure you implement it as such. For details on how to write algebraic UDFs, see Algebraic Interface.
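The sketch below does not implement a UDF. It only shows the kind of query where an algebraic function pays off, using the built-in COUNT and SUM, which are algebraic. The input 'clicks_data' and its fields are hypothetical. A custom UDF would get the same treatment by implementing Pig's Algebraic interface in Java (its getInitial, getIntermed, and getFinal methods).

```pig
A = LOAD 'clicks_data' AS (user:chararray, clicks:int);
B = GROUP A BY user;

-- COUNT and SUM can be computed in partial steps, so Pig can use the
-- combiner to pre-aggregate on the map side before the reduce.
C = FOREACH B GENERATE group, COUNT(A), SUM(A.clicks);
```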

Prwatech