
This topic contains 3 replies, has 4 voices, and was last updated by sampathk123 2 years ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #1872 Reply

    somu s
    Member

    1. Why Pig?
    2. Advantages of using Pig?
    3. Pig features?
    4. Difference between Pig and SQL?
    5. What are the scalar datatypes in Pig?
    6. What are the complex datatypes in Pig?
    8. What is the purpose of the 'dump' keyword in Pig?
    9. What are the relational operations in Pig Latin?
    10. How to use the 'foreach' operation in Pig scripts?
    11. How to write a 'foreach' statement for the map datatype in Pig scripts?
    12. How to write a 'foreach' statement for the tuple datatype in Pig scripts?
    13. How to write a 'foreach' statement for the bag datatype in Pig scripts?
    14. Why should we use 'filters' in Pig scripts?

    #1939 Reply

    1>
    1. Ease of programming
    2. Optimization opportunities
    3. Extensibility

    2> i) Pig can be treated as a higher-level language:
    a) Increases programming productivity
    b) Decreases duplication of effort
    c) Opens the MapReduce programming system to more users

    ii) Pig insulates you against Hadoop complexity:
    a) Hadoop version upgrades
    b) Job configuration tuning

    3>
    1. Data flow language: the user specifies a sequence of steps, where each step performs a single high-level data transformation.
    2. User Defined Functions (UDFs)
    3. Debugging environment
    4. Nested data model

    4> https://www.quora.com/What-is-main-differences-between-hive-vs-pig-vs-sql
    Pig vs SQL:
    1. Pig is a data flow language; SQL is a structured query language geared toward OLTP.
    2. Pig is designed for terabyte- and petabyte-scale data on Hadoop; typical SQL databases are not.
    3. Pig is a scripting language; SQL systems rely on procedures, triggers, and functions.

    6> Pig has three complex types: maps, tuples, and bags.
    Map: a map is a chararray-to-data-element mapping, expressed in key-value pairs.
    Tuple: a tuple is a fixed-length, ordered collection of Pig data elements.
    Bag: a bag is an unordered collection of tuples.

    8> Dump displays the output of an alias on the screen:
    dump processed;

    9> Relational operators are the main tools Pig Latin provides to operate on your data. They allow you to transform it by sorting, grouping, joining, projecting, and filtering.
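    As an illustration, here is a minimal sketch chaining a few of these operators; the file names ('NYSE_daily', 'NYSE_dividends') and their fields are hypothetical placeholders:
    -- load two hypothetical datasets
    daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, close:float);
    divs = load 'NYSE_dividends' as (symbol:chararray, dividend:float);
    -- filter, join, order, and limit in sequence
    big = filter daily by close > 100.0;
    jnd = join big by symbol, divs by symbol;
    srtd = order jnd by dividend desc;
    top5 = limit srtd 5;
    dump top5;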

    10> Foreach takes a set of expressions and applies them to every record in the data pipeline.

    11>
    For the map type we use the hash ('#') operator.

    12> For the tuple type we use the dot ('.') operator.

    13> When you project fields in a bag, you are creating a new bag with only those fields.

    14>
    Filters are similar to the WHERE clause in SQL. A filter contains a predicate: if the predicate evaluates to true for a given record, that record is passed down the pipeline; otherwise, it is not. Predicates can use operators such as ==, !=, >=, and <=; of these, only == and != can be applied to maps and tuples.

    #1972 Reply

    chinni
    Participant

    1a) Pig is an Apache open source project that runs on Hadoop and provides an engine for executing data flows in parallel. It includes a language called Pig Latin for expressing these data flows. Pig Latin offers operations such as join, sort, and filter, plus the ability to write User Defined Functions (UDFs) for processing, reading, and writing data. Pig uses both HDFS and MapReduce, i.e., storage and processing.
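    To make the data-flow idea concrete, here is the classic word count as a minimal Pig Latin sketch (the input file 'input.txt' and field name 'line' are hypothetical):
    -- load raw text, one line per record
    lines = load 'input.txt' as (line:chararray);
    -- split each line into words and flatten the resulting bag
    words = foreach lines generate flatten(TOKENIZE(line)) as word;
    -- group identical words and count each group
    grpd = group words by word;
    counts = foreach grpd generate group as word, COUNT(words) as cnt;
    dump counts;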

    2a) A decrease in development time. This is the biggest advantage, especially considering the complexity, time spent, and maintenance burden of vanilla MapReduce jobs.

    3a) The Hadoop plugin comes with features that make it much easier to quickly run and debug Pig scripts.
    The main one is the ability to quickly run Pig scripts on your Hadoop cluster through a gateway machine. Having a gateway machine is a common setup for Hadoop clusters, especially secure clusters. These tasks build your project, rsync your Pig script and all of its transitive runtime dependencies to the gateway, and execute your script for you.

    4a) Pig Latin is a procedural cousin of SQL. Pig certainly has similarities to SQL, but more differences. SQL is a query language: the user asks a question in query form, and SQL describes the desired answer without saying how to compute it. If a user wants to perform multiple operations on tables, they must write multiple queries and use temporary tables to store intermediate results. SQL supports subqueries, but many users find subqueries confusing and difficult to form properly; using subqueries creates an inside-out design where the first step in the data pipeline is the innermost query. Pig, by contrast, is designed with a long series of data operations in mind, so there is no need to write the data pipeline as an inverted set of subqueries or to worry about storing data in temporary tables.
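    As a sketch of the difference, a two-step computation that would need a subquery or a temporary table in SQL reads as a straight-line pipeline in Pig Latin (the file 'NYSE_dividends' and its fields are hypothetical):
    -- step 1: average dividend per symbol
    divs = load 'NYSE_dividends' as (symbol:chararray, dividend:float);
    grpd = group divs by symbol;
    avgs = foreach grpd generate group as symbol, AVG(divs.dividend) as avg_div;
    -- step 2: keep only symbols whose average exceeds a threshold,
    -- with no temporary table and no nested subquery
    high = filter avgs by avg_div > 1.0;
    dump high;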

    5a) Scalar datatypes:
    int - 4 bytes,
    float - 4 bytes,
    double - 8 bytes,
    long - 8 bytes,
    chararray,
    bytearray
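    A minimal sketch declaring these scalar types in a load schema (the file 'employees' and its fields are hypothetical):
    -- declare scalar types in the schema
    emp = load 'employees' as (id:int, salary:float, bonus:double, hired:long, name:chararray, photo:bytearray);
    -- arithmetic over the numeric fields
    pay = foreach emp generate name, salary + bonus as total_pay;
    dump pay;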

    6a) map: a map in Pig is a chararray-to-data-element mapping, where the element can be any Pig data type, including a complex type.
    Example of a map: ['city'#'hyd','pin'#500086]
    In the above example, city and pin are the keys mapping to the values.
    tuple: a tuple has a fixed length and holds a collection of fields. A tuple contains multiple fields, and the fields are ordered.
    Example: (hyd,500086), which contains two fields.
    bag: a bag is a collection of tuples, which are unordered. Bag constants are constructed using braces, with the tuples in the bag separated by commas. For example: {('hyd',500086),('chennai',510071),('bombay',500185)}
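    Complex values can also be built from scalar fields using the TOTUPLE, TOBAG, and TOMAP built-ins available in recent Pig versions (a sketch; the file 'cities' and its fields are hypothetical):
    -- construct a tuple, a bag, and a map from scalar fields
    A = load 'cities' as (city:chararray, pin:int);
    B = foreach A generate TOTUPLE(city, pin) as t, TOBAG(city) as b, TOMAP('city', city) as m;
    dump B;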

    8a) dump displays the output of an alias on the screen:
    dump processed;
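    To put dump in context: it prints to the console (useful while debugging), whereas store writes results out. A minimal sketch, with 'input' and 'output' as hypothetical paths:
    processed = load 'input' as (name:chararray, city:chararray);
    dump processed; -- print to the console for inspection
    store processed into 'output'; -- write to the file system instead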

    9a) They are:
    a) foreach
    b) order by
    c) filter
    d) group
    e) distinct
    f) join
    g) limit

    10a) foreach takes a set of expressions and applies them to every record in the data pipeline.
    A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
    B = foreach A generate user, id;
    Positional references are preceded by a $ (dollar sign) and start from 0:
    C = load 'data';
    D = foreach C generate $2 - $1;

    11a) For the map type we use the hash ('#') operator:
    bball = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]);
    avg = foreach bball generate bat#'batting_average';

    12a) For the tuple type we use the dot ('.') operator:
    A = load 'input' as (t:tuple(x:int, y:int));
    B = foreach A generate t.x, t.$1;

    13a) When you project fields in a bag, you are creating a new bag with only those fields:
    A = load 'input' as (b:bag{t:(x:int, y:int)});
    B = foreach A generate b.x;
    We can also project multiple fields in a bag:
    A = load 'input' as (b:bag{t:(x:int, y:int)});
    B = foreach A generate b.(x, y);

    14a) Filters are similar to the WHERE clause in SQL. A filter contains a predicate: if the predicate evaluates to true for a given record, that record is passed down the pipeline; otherwise, it is not. Predicates can use operators such as ==, !=, >=, and <=; of these, only == and != can be applied to maps and tuples.
    A = load 'inputs' as (symbol, address);
    B = filter A by symbol matches 'CM.*';

    15) Why should we use the 'group' keyword in pig scripts?
    a) The group statement collects together records with the same key. In SQL, the group by clause creates a group that must feed directly into one or more aggregate functions. In Pig Latin there is no direct connection between group and aggregate functions.
    input2 = load 'daily' as (exchanges, stocks);
    grpds = group input2 by stocks;
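    Continuing this example, the aggregation happens in a separate statement, which shows how group and aggregate functions are decoupled in Pig (a sketch over the same hypothetical 'daily' input):
    -- aggregate in a separate foreach, not inside the group
    cnt = foreach grpds generate group as stock, COUNT(input2) as n;
    dump cnt;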

    #2004 Reply

    sampathk123
    Participant

    1) Running Pig in standalone mode:

    Pig can be run in 2 modes:
    1) Local mode - to run Pig in local mode, you need access to a single machine:
    pig -x local
    2) MapReduce mode - to run Pig in MapReduce mode, you need access to a Hadoop cluster and an HDFS installation:
    pig
    (or)
    pig -x mapreduce
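    Either mode can also run a script file directly instead of opening the interactive grunt shell ('myscript.pig' is a hypothetical file name):
    pig -x local myscript.pig
    pig -x mapreduce myscript.pig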

    2) When we run Pig in local mode, will it convert the query to MapReduce or not?
    - No. The Pig script runs on the local system, and by default Pig stores data in the local file system. For MapReduce mode, it is mandatory to start Hadoop, and files should be stored in HDFS.

    3) How does the physical translator work at the time of compilation of a Pig query?
    Pig goes through several steps when a Pig Latin script is converted into MapReduce jobs. After performing basic parsing and semantic checking, it produces a logical plan. The logical plan describes the logical operators that have to be executed by Pig during execution. After this, Pig produces a physical plan. The physical plan describes the physical operators needed to execute the script.
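    You can inspect these plans yourself with Pig's explain operator, which prints the logical, physical, and MapReduce plans for an alias (a sketch; the file 'data' and its fields are hypothetical):
    A = load 'data' as (x:int, y:int);
    B = group A by x;
    -- prints the logical, physical, and MapReduce plans for B
    explain B;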
