PIG
    RahulPainuly
    Participant

    1) Run a PIG command from Hue.
    Ans:
    We open the Hue web UI and log in with the Cloudera credentials. There we can find the "Query Editors" menu; from it we choose the Pig editor, and then we can write, save, and run the commands from the editor.
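
    A minimal script one might run from the Hue Pig editor (the HDFS path below is a hypothetical example):

        -- load a comma-separated file from HDFS and show the first few records
        data = LOAD '/user/cloudera/input.txt' USING PigStorage(',') AS (id:int, name:chararray);
        top5 = LIMIT data 5;
        DUMP top5;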

    2) When we run PIG in local mode, will it convert the query into MR or not?
    Ans:
    Yes. Even in local mode Pig compiles the script into the same logical, physical, and MapReduce plans; the difference is that the resulting jobs run inside a single JVM against the local file system (using Hadoop's local job runner) instead of being submitted to a cluster.
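
    For example, the same script can be run in either mode from the command line (script.pig is a placeholder name):

        # local mode: single JVM, local file system
        pig -x local script.pig

        # mapreduce mode (the default): jobs are submitted to the cluster
        pig -x mapreduce script.pig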

    3) How does the physical translator work during compilation of a Pig query?
    Ans:
    After the logical plan is generated, script execution moves to the physical plan, which describes the physical operators Apache Pig will use to execute the script. A physical plan is more or less like a series of MapReduce jobs, but the plan does not yet contain any reference to how it will be executed in MapReduce. During creation of the physical plan, the COGROUP logical operator is converted into three physical operators: Local Rearrange, Global Rearrange, and Package. Load and store functions are usually resolved in the physical plan.
    Ref: Google
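
    This can be seen with the EXPLAIN operator, which prints the logical, physical, and MapReduce plans for a relation (the file names and schemas below are hypothetical):

        users  = LOAD 'users.txt'  USING PigStorage(',') AS (uid:int, name:chararray);
        orders = LOAD 'orders.txt' USING PigStorage(',') AS (uid:int, amount:double);
        grouped = COGROUP users BY uid, orders BY uid;
        -- the physical plan printed here shows the COGROUP expanded into
        -- Local Rearrange, Global Rearrange, and Package operators
        EXPLAIN grouped;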

    4) Limitations of Pig
    Ans:
    1. Code efficiency is relatively low compared with hand-tuned MapReduce jobs.
    2. Pig is built on top of MapReduce, which is batch oriented, so it is not well suited to low-latency or interactive queries.

    5) Compilation Stages
    * Optimized Logical Plan?
    * Physical Plan?
    Ans:
    Logical and physical plans are created during the execution of a Pig script; the Pig interpreter checks the script before any data is processed. The logical plan is produced after basic parsing and semantic checking, and no data processing takes place while it is built. For each line in the Pig script, a syntax check is performed on the operators and a logical plan is created. Whenever an error is encountered in the script, an exception is thrown and execution ends; otherwise each statement in the script gets its own logical plan. A logical plan contains the collection of operators in the script but does not contain the edges between the operators. The logical plan is then optimized by rule-based transformations that are on by default, such as pushing filters and projections as close to the load as possible.
    After the optimized logical plan is generated, script execution moves to the physical plan, as described in the answer to question 3.
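
    A sketch of what the optimizer does (the files are hypothetical; PushUpFilter is one of the default rules):

        A = LOAD 'logs.txt'  AS (uid:int, url:chararray);
        B = LOAD 'users.txt' AS (uid:int, country:chararray);
        C = JOIN A BY uid, B BY uid;
        -- written after the join, but the PushUpFilter rule moves this
        -- filter above the join in the optimized logical plan, so less
        -- data flows into the join
        D = FILTER C BY B::country == 'US';
        DUMP D;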

    6) How to achieve performance tuning in PIG?
    Ans:
    a. Use Optimization
    Pig supports various optimization rules which are turned on by default. Become familiar with these rules.
    b. Use Types
    If types are not specified in the load statement, Pig assumes the type double for numeric computations. A lot of the time your data would fit a much smaller type, maybe int or long. Specifying the real type helps the speed of arithmetic computation, and it has the additional advantage of early error detection.
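
    For instance (mydata.txt and its schema are hypothetical):

        -- untyped: fields default to bytearray, and arithmetic falls back to double
        A = LOAD 'mydata.txt' AS (x, y);
        -- typed: arithmetic stays in int, and bad values are caught early
        B = LOAD 'mydata.txt' AS (x:int, y:int);
        C = FOREACH B GENERATE x + y;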
    c. Project Early and Often
    Pig does not (yet) determine when a field is no longer needed and drop the field from the row. Projecting away unused columns as soon as possible reduces the data carried through the pipeline. For example, say you have a query like the sketch below:
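
    (Adapted from the example in the Pig performance docs; the file names are placeholders.)

        A = LOAD 'myfile' AS (t, u, v);
        B = LOAD 'myotherfile' AS (x, y, z);
        -- only t, u and x are ever used, so drop v, y and z right away
        A1 = FOREACH A GENERATE t, u;
        B1 = FOREACH B GENERATE x;
        C = JOIN A1 BY t, B1 BY x;
        D = GROUP C BY u;
        E = FOREACH D GENERATE group, COUNT(C);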
    d. Filter Early and Often
    As with early projection, in most cases it is beneficial to apply filters as early as possible to reduce the amount of data flowing through the pipeline.
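
    For example (hypothetical inputs), filtering before the join shrinks the data that has to be shuffled:

        users  = LOAD 'users'  AS (uid:int, age:int);
        clicks = LOAD 'clicks' AS (uid:int, url:chararray);
        -- filter before the join rather than after it
        adults = FILTER users BY age >= 18;
        joined = JOIN adults BY uid, clicks BY uid;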
    e. Reduce Your Operator Pipeline
    For clarity of your script, you might choose to split your projections into several steps, but consolidating them into a single statement lets Pig do less work; for instance:
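
    (Adapted from the example in the Pig performance docs.)

        A = LOAD 'data' AS (in: map[]);
        -- split into several steps for clarity...
        B = FOREACH A GENERATE in#'k1' AS k1, in#'k2' AS k2;
        C = FOREACH B GENERATE CONCAT(k1, k2);
        -- ...but a single consolidated FOREACH is usually more efficient
        C2 = FOREACH A GENERATE CONCAT(in#'k1', in#'k2');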
    f. Make Your UDFs Algebraic
    Queries that can take advantage of the combiner generally run much faster (sometimes several times faster) than versions that don't. The latest code significantly improves combiner usage; however, you need to make sure you do your part. If you have a UDF that works on grouped data and is, by nature, algebraic (meaning its computation can be decomposed into multiple steps), make sure you implement it as such. For details on how to write algebraic UDFs, see the Algebraic interface.
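
    The built-in aggregates such as COUNT, SUM, MIN, and MAX are algebraic, so a query like the following can use the combiner to pre-aggregate on the map side:

        A = LOAD 'visits' AS (user:chararray, url:chararray);
        B = GROUP A BY user;
        -- COUNT implements the Algebraic interface, so partial counts are
        -- computed in the combiner before the reduce phase
        C = FOREACH B GENERATE group, COUNT(A);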

    REF: https://pig.apache.org/docs/r0.9.1/perf.html#performance-enhancers

    7) How to implement a map-side join or reduce-side join in PIG?
    Ans:
    Map-side join: In a map-side (fragment-replicate) join, you hold one dataset in memory and join it against the other dataset record by record. In this type of join the large relation is followed by one or more small relations. The small relations must be small enough to fit into main memory; if they don't, the process fails and an error is generated.

    Reduce-side join: In a reduce-side join, you group on the join key using Hadoop's standard merge sort. You should design your keys so that the dataset with the fewest records per key comes first, because the first group must be held in memory while the second one is streamed past it. In Pig, for a standard join you accomplish this by putting the largest dataset last.
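
    In Pig Latin both variants look like this (relation names and schemas are hypothetical):

        big   = LOAD 'big'   AS (k:int, v:chararray);
        small = LOAD 'small' AS (k:int, w:chararray);

        -- map-side (fragment-replicate) join: 'small' is replicated into
        -- memory on every mapper, so it must come last and fit in RAM
        m = JOIN big BY k, small BY k USING 'replicated';

        -- reduce-side (default) join: put the largest dataset last so it
        -- is streamed while the earlier inputs are held per key
        r = JOIN small BY k, big BY k;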

    8) Piggy Bank & its applications?
    Ans:
    Piggy Bank is a place for Pig users to share the Java UDFs they have written for use with Pig. The functions are contributed “as-is.” If you find a bug in a function, take the time to fix it and contribute the fix to Piggy Bank. If you don’t find the UDF you need, take the time to write and contribute the function to Piggy Bank.
    Note: Piggy Bank currently supports Java UDFs. Support for Python and JavaScript UDFs will be added at a later date.
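
    Using a Piggy Bank function means registering the jar and defining the UDF; for example (the jar path is installation-specific, and Reverse is one of the bundled string UDFs):

        REGISTER /usr/lib/pig/piggybank.jar;
        DEFINE Reverse org.apache.pig.piggybank.evaluation.string.Reverse();

        A = LOAD 'names' AS (name:chararray);
        B = FOREACH A GENERATE Reverse(name);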
