April 4, 2017 at 9:51 am #3163
Limitations of Pig
1. Low-latency queries are not supported in Pig.
2. Pig does not support random reads or writes.
3. Pig works well only for batch processing.
April 5, 2017 at 11:35 am #3173
Run a Pig command from Hue
Ans: Open the Hue web UI and log in with the Cloudera credentials. There is a “Query Editor” menu from which we can choose the Pig editor. In that editor we can write, run, and save commands.
April 5, 2017 at 11:35 am #3174
What are the limitations of Pig?
Ans. 1. When something goes wrong, Pig often just reports an execution error (for example, inside a UDF) without showing the type of error, such as a syntax error, a type error, or a logical error. At the very least, a developer should get a distinct error when the script has a syntax error.
2. Commands are not executed until you DUMP or STORE an intermediate or final result. This slows down the cycle of debugging and resolving an issue.
3. Low-latency queries are not supported in Pig, so it is not suitable for OLTP or interactive OLAP.
4. Pig cannot be used for random writes that update a small portion of the data.
April 5, 2017 at 11:40 am #3175
When we run Pig in local mode, will it convert the query into MapReduce (MR) or not?
Yes. Pig still compiles the query into MapReduce jobs; in local mode those jobs run in a single JVM on the local machine and use the local file system instead of HDFS.
April 5, 2017 at 11:41 am #3176
Limitations of Pig:
1) Code efficiency is relatively lower than with hand-written MapReduce.
2) Pig is built on top of MapReduce, which is batch-oriented.
3) It works well only for batch processing.
4) Low-latency queries are not supported in Pig.
April 5, 2017 at 11:43 am #3178
3) How does the physical translator work when a Pig query is compiled?
After the logical plan is generated, script execution moves to the physical plan, which describes the physical operators Apache Pig will use to execute the script. A physical plan is more or less like a series of MapReduce jobs, but it does not say how it will be executed in MapReduce. While the physical plan is being created, the COGROUP logical operator is converted into three physical operators: Local Rearrange, Global Rearrange, and Package. Load and store functions are usually resolved in the physical plan.
April 5, 2017 at 11:44 am #3179
How to achieve performance tuning in Pig?
Performance tuning in Pig can be achieved with the following:
1. Rely on Pig's built-in optimization rules.
2. Use types. If types are not specified in the LOAD statement, Pig assumes double for numeric computations; declaring the real type, such as int or long, speeds up arithmetic.
3. Project early: drop fields from each row as soon as they are no longer needed.
4. Filter early to reduce the amount of data flowing through the pipeline.
5. Reduce the number of operators in the pipeline.
April 5, 2017 at 11:44 am #3180
When we run Pig in local mode, will it convert the query into MR or not?
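Not on a cluster. In local mode Pig reads data from the local file system (LFS) instead of HDFS and runs on a single machine, so no Hadoop cluster or HDFS is needed. The query is still compiled into MapReduce jobs, but they run locally instead of being submitted to a cluster.
April 5, 2017 at 11:48 am #3181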
How does the physical translator work when a Pig query is compiled?
Pig goes through several steps when it converts a Pig Latin script into MapReduce jobs. After basic parsing and semantic checking, it produces a logical plan, which describes the logical operators Pig has to execute. Pig then produces a physical plan, which describes the physical operators needed to execute the script.
April 5, 2017 at 11:49 am #3182
* Optimized Logical Plan?
* Physical Plan?
Logical and physical plans are created during the execution of a Pig script. Pig checks the script statement by statement as the interpreter reads it. The logical plan is produced after basic parsing and semantic checking, and no data processing takes place while it is being created. For each line of the script, the operators are syntax-checked and added to the logical plan. If an error is encountered, an exception is thrown and execution ends; otherwise every statement becomes part of the logical plan. The logical plan is a graph of the script's logical operators, connected by the data flow between them.
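A toy Python model can make these front-end steps concrete. It is a sketch under stated assumptions, not Pig's parser: the class ToyLogicalPlan, its tiny operator set, and the aliases A to D are all hypothetical. Each statement is checked as it is added, the first error raises an exception, and only operator metadata (never data) is stored.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalOp:
    """One node of a toy logical plan: the alias it defines, its operator, and its input aliases."""
    alias: str
    op: str
    inputs: list = field(default_factory=list)


class ToyLogicalPlan:
    """Hypothetical model of a Pig-like front end building a logical plan.

    Statements are checked one at a time as they are added; no data is read
    or processed here, only operator metadata is recorded.
    """

    # Hypothetical, deliberately tiny operator vocabulary.
    KNOWN_OPS = {"LOAD", "FILTER", "FOREACH", "GROUP", "COGROUP", "JOIN", "STORE"}

    def __init__(self):
        self.ops = {}  # alias -> LogicalOp, in statement order

    def add(self, alias, op, inputs=()):
        # Parsing check: reject operators outside the known vocabulary.
        if op not in self.KNOWN_OPS:
            raise SyntaxError(f"unknown operator {op!r} for alias {alias!r}")
        # Semantic check: every input must refer to an alias defined earlier.
        for name in inputs:
            if name not in self.ops:
                raise NameError(f"alias {alias!r} uses undefined alias {name!r}")
        # Record the operator; nothing is executed at this point.
        self.ops[alias] = LogicalOp(alias, op, list(inputs))
        return self.ops[alias]


if __name__ == "__main__":
    plan = ToyLogicalPlan()
    plan.add("A", "LOAD")
    plan.add("B", "FILTER", ["A"])
    plan.add("C", "GROUP", ["B"])
    print([(op.alias, op.op, op.inputs) for op in plan.ops.values()])
    try:
        plan.add("D", "FOREACH", ["X"])  # X was never defined
    except NameError as err:
        print("stopped:", err)
```

Each operator records which aliases feed it, which is what lets a later stage walk the plan; a real Pig front end does far more (schemas, type checking, optimization rules).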
After the logical plan is generated, script execution moves to the physical plan, which describes the physical operators Apache Pig will use to execute the script. A physical plan is more or less like a series of MapReduce jobs, but it does not say how it will be executed in MapReduce. While the physical plan is being created, the COGROUP logical operator is converted into three physical operators: Local Rearrange, Global Rearrange, and Package. Load and store functions are usually resolved in the physical plan.
April 5, 2017 at 11:50 am #3184
How to achieve performance tuning in Pig?
a. Use Optimization
Pig supports various optimization rules, which are turned on by default. Become familiar with these rules.
b. Use Types
If types are not specified in the LOAD statement, Pig assumes double for numeric computations. Often your data is much smaller, for example int or long. Specifying the real type speeds up arithmetic computation and has the additional advantage of early error detection.
c. Project Early and Often
Pig does not (yet) determine when a field is no longer needed and drop the field from the row, so drop unneeded fields yourself, for example with FOREACH ... GENERATE, as early as possible.
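The effect can be sketched in plain Python (a simulation, not Pig; the relations A(t, u, v) and B(x, y, z) and the helper functions are hypothetical). A join followed by a count per u runs twice: once carrying every field through the join, and once projecting away v, y and z first. Both produce the same counts, but the early-projection rows are half as wide.

```python
from collections import Counter


def join(left, right, lkey, rkey):
    """Naive nested-loop inner join; each output row is left_row + right_row."""
    return [l + r for l in left for r in right if l[lkey] == r[rkey]]


def count_by_u_carry_all(a, b):
    """Join with every field still attached, then count rows per u."""
    joined = join(a, b, 0, 0)  # rows look like (t, u, v, x, y, z)
    return Counter(row[1] for row in joined), joined


def count_by_u_project_early(a, b):
    """Drop unused fields before the join, then count rows per u."""
    a1 = [(t, u) for t, u, _v in a]  # keep the join key t and the group key u
    b1 = [(x,) for x, _y, _z in b]   # keep only the join key x
    joined = join(a1, b1, 0, 0)      # rows look like (t, u, x)
    return Counter(row[1] for row in joined), joined


if __name__ == "__main__":
    # Hypothetical relations A(t, u, v) and B(x, y, z); v, y and z are never used.
    A = [(1, "a", "v" * 40), (2, "b", "v" * 40), (3, "a", "v" * 40)]
    B = [(1, "y1", "z" * 40), (3, "y3", "z" * 40)]
    counts_all, rows_all = count_by_u_carry_all(A, B)
    counts_early, rows_early = count_by_u_project_early(A, B)
    print(counts_all == counts_early)            # True
    print(len(rows_all[0]), len(rows_early[0]))  # 6 3
```

In a real MapReduce job the narrower rows mean less data serialized, shuffled, and spilled between the map and reduce phases.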
d. Filter Early and Often
As with early projection, in most cases it is beneficial to apply filters as early as possible to reduce the amount of data flowing through the pipeline.
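A similar plain-Python sketch (hypothetical orders and customers relations; not Pig code) compares filtering after a naive join with filtering before it. The results match, but in this example filtering first halves the number of row pairs the join has to compare.

```python
def join_on_first(left, right):
    """Naive nested-loop inner join on field 0; also returns how many row pairs were compared."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[0] == r[0]:
                out.append(l + r[1:])
    return out, comparisons


def filter_after_join(orders, customers, min_amount):
    """Join everything first, then keep rows whose amount is large enough."""
    joined, cost = join_on_first(orders, customers)
    return [row for row in joined if row[1] >= min_amount], cost


def filter_before_join(orders, customers, min_amount):
    """Filter orders first, so the join sees fewer rows."""
    kept = [row for row in orders if row[1] >= min_amount]
    return join_on_first(kept, customers)


if __name__ == "__main__":
    # Hypothetical relations: orders(customer_id, amount), customers(customer_id, name).
    orders = [(1, 5), (2, 50), (3, 500), (1, 70)]
    customers = [(1, "ann"), (2, "bob"), (3, "cy")]
    late, late_cost = filter_after_join(orders, customers, 60)
    early, early_cost = filter_before_join(orders, customers, 60)
    print(late == early, late_cost, early_cost)  # True 12 6
```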
e. Reduce Your Operator Pipeline
For clarity, you might choose to split a projection into several FOREACH steps; combining consecutive steps into one can improve query performance.
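As a rough illustration (plain Python, not Pig; the record layout with keys k1 and k2 and the helper names are hypothetical), splitting a projection into two steps means two passes over the data, while combining them into one step gives the same output in a single pass.

```python
def run_stages(rows, stages):
    """Apply each stage as its own full pass over the data; return the result and the pass count."""
    passes = 0
    for stage in stages:
        rows = [stage(row) for row in rows]
        passes += 1
    return rows, passes


def extract_keys(record):
    """Step 1 of the split version: pull k1 and k2 out of the record."""
    return record["k1"], record["k2"]


def concat_keys(pair):
    """Step 2 of the split version: concatenate the two keys."""
    return pair[0] + pair[1]


def extract_and_concat(record):
    """Combined version: both steps in a single pass."""
    return record["k1"] + record["k2"]


if __name__ == "__main__":
    # Hypothetical records, each a map with keys k1 and k2.
    records = [{"k1": "a", "k2": "b"}, {"k1": "c", "k2": "d"}]
    print(run_stages(records, [extract_keys, concat_keys]))  # (['ab', 'cd'], 2)
    print(run_stages(records, [extract_and_concat]))         # (['ab', 'cd'], 1)
```

Whether Pig already merges adjacent steps for you depends on its optimizer, so checking the plan with EXPLAIN is a reasonable way to confirm.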
f. Make Your UDFs Algebraic
Queries that can take advantage of the combiner generally run much faster (sometimes several times faster) than versions that can't. Newer Pig code makes much better use of the combiner, but you still need to do your part: if you have a UDF that works on grouped data and is, by nature, algebraic (meaning its computation can be decomposed into multiple steps), make sure you implement it as such. For details on writing algebraic UDFs, see the Algebraic Interface section of the Pig UDF documentation.
April 5, 2017 at 11:50 am #3185
How to implement a map-side join or a reduce-side join in Pig?
Map-side join: In a map-side (fragment-replicate) join, you hold one dataset in memory and join the other dataset against it record by record. In this type of join, the large relation is listed first, followed by one or more small relations. The small relations must be small enough to fit into main memory; if they don't, the process fails and an error is generated. In Pig it is requested with USING 'replicated', for example C = JOIN big BY id, small BY id USING 'replicated'; whereas a plain JOIN without a USING clause runs as a reduce-side join.
April 5, 2017 at 11:51 am #3186
PiggyBank and its applications
Piggy Bank is a place for Pig users to share the Java UDFs they have written. The functions are provided as-is: if you find a bug in a function, take the time to fix it and contribute the fix back to Piggy Bank.
April 5, 2017 at 11:52 am #3187
Run a Pig command from Hue.
Ans. First open the Hue web UI and log in with the Cloudera credentials. There is a “Query Editor” menu from which we can choose the Pig editor. There we can write a command and run it, and we can also save the command and its result.
April 5, 2017 at 11:59 am #3188
When we run Pig in local mode, will it convert the query into MR or not?
-> Yes, it still compiles the query into MapReduce jobs; they just run on the local machine against the local file system instead of on a Hadoop cluster with HDFS.