
This topic contains 2 replies, has 1 voice, and was last updated by  mamthakulal 3 years, 3 months ago.

#1286

mamthakulal
Participant

Question: Find the maximum temperature for each year

Step 1:

temp = load '/data' using PigStorage('\t') as (year:int, t:int);
dump temp;
result:
(2000,30)
(2000,28)
(2000,29)
(2001,32)
(2001,30)
(2005,29)
(2005,30)
(2007,28)
(2007,28)
(2007,31)
(2010,29)
(2011,32)
(2012,31)
(2012,33)
(2014,29)

describe temp;
result:
temp: {year: int,t: int}
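
For readers new to Pig, the load step can be mimicked in plain Python. This is only a sketch: SAMPLE stands in for the first rows of the '/data' file (its tab-separated layout is assumed from the load statement), and parse_rows is a hypothetical helper, not a Pig function.

```python
# Sketch of LOAD ... USING PigStorage('\t') AS (year:int, t:int):
# split each line on a tab and cast both fields to int.
# SAMPLE is an assumed stand-in for the start of the /data file.
SAMPLE = "2000\t30\n2000\t28\n2000\t29\n2001\t32\n"


def parse_rows(text):
    """Return a list of (year, t) tuples, one per non-empty line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        year, t = line.split("\t")
        rows.append((int(year), int(t)))
    return rows


temp = parse_rows(SAMPLE)
print(temp)  # [(2000, 30), (2000, 28), (2000, 29), (2001, 32)]
```

One difference worth knowing: Pig yields null for a field it cannot cast to int, whereas this sketch would raise a ValueError.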

Step 2:

grpbyt = group temp by year;
dump grpbyt;

result:
(2000,{(2000,30),(2000,28),(2000,29)})
(2001,{(2001,32),(2001,30)})
(2005,{(2005,29),(2005,30)})
(2007,{(2007,28),(2007,28),(2007,31)})
(2010,{(2010,29)})
(2011,{(2011,32)})
(2012,{(2012,31),(2012,33)})
(2014,{(2014,29)})

describe grpbyt;
result:
grpbyt: {group: int,temp: {year: int,t: int}}
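
The grouping in Step 2 can be pictured in plain Python as a dictionary from each year to the list ("bag") of rows for that year. A minimal sketch, using three rows from the sample data; group_by_year is a hypothetical helper name.

```python
from collections import defaultdict

# Three rows from the thread's sample data.
temp = [(2005, 29), (2005, 30), (2010, 29)]


def group_by_year(rows):
    """Mimic GROUP temp BY year: each key maps to the full rows that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # row[0] is the year field
    return dict(groups)


grpbyt = group_by_year(temp)
for year in sorted(grpbyt):
    print((year, grpbyt[year]))
```

As in the Pig output, each bag keeps the whole (year, t) row, so the year appears both as the group key and inside every tuple.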

illustrate grpbyt;
result:
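-----------------------------------------
| temp | year: bytearray | t: bytearray |
-----------------------------------------
|      | 2005            | 29           |
|      | 2005            | 30           |
|      | 2010            | 29           |
-----------------------------------------
-----------------------------
| temp | year: int | t: int |
-----------------------------
|      | 2005      | 29     |
|      | 2005      | 30     |
|      | 2010      | 29     |
-----------------------------
-------------------------------------------------------
| grpbyt | group: int | temp: bag({year: int,t: int}) |
-------------------------------------------------------
|        | 2005       | {(2005, 29), (2005, 30)}      |
|        | 2010       | {(2010, 29)}                  |
-------------------------------------------------------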

Step 3:
maxt = foreach grpbyt generate group, MAX(temp.t);
dump maxt;

result:
(2000,30)
(2001,32)
(2005,30)
(2007,31)
(2010,29)
(2011,32)
(2012,33)
(2014,29)
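
The result above can be cross-checked outside Pig. The sketch below recomputes the per-year maximum in plain Python from the rows dumped in Step 1; max_per_year is a hypothetical helper name, not part of Pig.

```python
# Rows exactly as dumped in Step 1.
temp = [
    (2000, 30), (2000, 28), (2000, 29), (2001, 32), (2001, 30),
    (2005, 29), (2005, 30), (2007, 28), (2007, 28), (2007, 31),
    (2010, 29), (2011, 32), (2012, 31), (2012, 33), (2014, 29),
]


def max_per_year(rows):
    """Mimic GROUP BY year followed by MAX(temp.t)."""
    best = {}
    for year, t in rows:
        if year not in best or t > best[year]:
            best[year] = t
    return sorted(best.items())


maxt = max_per_year(temp)
for row in maxt:
    print(row)
```

It prints the same eight (year, max) pairs as the dump of maxt.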

Step 4:
store maxt into '/tempresults';

#1287

mamthakulal
Participant

Q: Word Count

Step 1:

w = load '/words' as (wd:chararray);
dump w;
result:
(Hi how are you? I am Fine)
(Where are you? I am at Prwatech class)
(Are you learning Hadoop? Yes I am.)

Step 2:
w_split = foreach w generate FLATTEN(TOKENIZE(wd)) as word;
dump w_split;
result:
(Hi)
(how)
(are)
(you?)
(I)
(am)
(Fine)
(Where)
(are)
(you?)
(I)
(am)
(at)
(Prwatech)
(class)
(Are)
(you)
(learning)
(Yes)
(I)
(am.)

describe w_split;
w_split: {word: chararray}
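
To see why "you?" keeps its question mark: TOKENIZE splits only on space, double quote, comma, parentheses and star, and FLATTEN then turns each bag of words into one row per word. A rough Python sketch of that behaviour, run on the first two sample lines; tokenize is a hypothetical helper, and the regular expression is an approximation of Pig's default separators.

```python
import re


def tokenize(line):
    """Split on Pig's default TOKENIZE separators: space " , ( ) *"""
    return [tok for tok in re.split(r'[ ",()*]+', line) if tok]


# First two lines of the /words sample.
lines = [
    "Hi how are you? I am Fine",
    "Where are you? I am at Prwatech class",
]

# FLATTEN: one output row per token instead of one bag per line.
w_split = [word for line in lines for word in tokenize(line)]
print(w_split)
```

Because "?" and "." are not separators, punctuation stays attached to the word it follows.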

illustrate w_split;
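------------------------------------------
| w | wd: bytearray                      |
------------------------------------------
|   | Are you learning Hadoop? Yes I am. |
------------------------------------------
------------------------------------------
| w | wd: chararray                      |
------------------------------------------
|   | Are you learning Hadoop? Yes I am. |
------------------------------------------
-----------------------------
| w_split | word: chararray |
-----------------------------
|         | Are             |
|         | you             |
|         | learning        |
|         | Yes             |
|         | I               |
|         | am.             |
-----------------------------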

Step 3:

wrdgrp = group w_split by word;
dump wrdgrp;
result:

(I,{(I),(I),(I)})
(Hi,{(Hi)})
(am,{(am),(am)})
(at,{(at)})
(Are,{(Are)})
(Yes,{(Yes)})
(am.,{(am.)})
(are,{(are),(are)})
(how,{(how)})
(you,{(you)})
(Fine,{(Fine)})
(you?,{(you?),(you?)})
(Where,{(Where)})
(class,{(class)})
(Prwatech,{(Prwatech)})
(learning,{(learning)})

describe wrdgrp;
wrdgrp: {group: chararray,w_split: {word: chararray}}

illustrate wrdgrp;
---------------------------------------------
| w | wd: chararray                         |
---------------------------------------------
|   | Hi how are you? I am Fine             |
|   | Where are you? I am at Prwatech class |
|   | Are you learning Hadoop? Yes I am.    |
---------------------------------------------
-----------------------------
| w_split | word: chararray |
-----------------------------
|         | Hi              |
|         | how             |
|         | are             |
|         | you?            |
|         | I               |
|         | am              |
|         | Fine            |
|         | Where           |
|         | are             |
|         | you?            |
|         | I               |
|         | am              |
|         | at              |
|         | Prwatech        |
|         | class           |
|         | Are             |
|         | you             |
|         | learning        |
|         | Yes             |
|         | I               |
|         | am.             |
-----------------------------
---------------------------------------------------------------
| wrdgrp | group: chararray | w_split: bag({word: chararray}) |
---------------------------------------------------------------
|        | Are              | {(Are)}                         |
|        | Fine             | {(Fine)}                        |
|        | Hi               | {(Hi)}                          |
|        | I                | {(I), (I), (I)}                 |
|        | Prwatech         | {(Prwatech)}                    |
|        | Where            | {(Where)}                       |
|        | Yes              | {(Yes)}                         |
|        | am               | {(am), (am)}                    |
|        | am.              | {(am.)}                         |
|        | are              | {(are), (are)}                  |
|        | at               | {(at)}                          |
|        | class            | {(class)}                       |
|        | how              | {(how)}                         |
|        | learning         | {(learning)}                    |
|        | you              | {(you)}                         |
|        | you?             | {(you?), (you?)}                |
---------------------------------------------------------------

Step 4:

wrdcount = foreach wrdgrp generate group, COUNT(w_split);

dump wrdcount;
result:

(I,3)
(Hi,1)
(am,2)
(at,1)
(Are,1)
(Yes,1)
(am.,1)
(are,2)
(how,1)
(you,1)
(Fine,1)
(you?,2)
(Where,1)
(class,1)
(Prwatech,1)
(learning,1)
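
The counts above can be reproduced in plain Python from the tokens dumped in Step 2. A minimal sketch, with the token list copied from that dump; collections.Counter plays the role of GROUP followed by COUNT.

```python
from collections import Counter

# Tokens exactly as listed in the Step 2 dump of w_split.
w_split = (
    "Hi how are you? I am Fine "
    "Where are you? I am at Prwatech class "
    "Are you learning Yes I am."
).split()

# GROUP w_split BY word + COUNT(w_split) collapses to a single Counter.
wrdcount = Counter(w_split)
for word, n in sorted(wrdcount.items()):
    print((word, n))
```

Note that the count is case- and punctuation-sensitive: "are" and "Are", and "you" and "you?", are tallied as different words, just as in the Pig result.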

Step 5:
store wrdcount into '/wordcount_pig';

#1289

mamthakulal
Participant

Q: Word Size

Step 1:

w = load '/words' as (wd:chararray);
dump w;
result:
(Hi how are you? I am Fine)
(Where are you? I am at Prwatech class)
(Are you learning Hadoop? Yes I am.)

Step 2:

wrdgrp = group w by SIZE(wd);
dump wrdgrp;

result:
(25,{(Hi how are you? I am Fine)})
(34,{(Are you learning Hadoop? Yes I am.)})
(37,{(Where are you? I am at Prwatech class)})

Step 3:
wrdsizec = foreach wrdgrp generate group, COUNT(w);
dump wrdsizec;
result:
(25,1)
(34,1)
(37,1)
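
Since w holds whole lines, SIZE(wd) here is the number of characters in each line, not in each word. The sketch below checks the three sizes in plain Python, where len() stands in for SIZE on a chararray.

```python
from collections import Counter

# The three lines of the /words sample.
lines = [
    "Hi how are you? I am Fine",
    "Where are you? I am at Prwatech class",
    "Are you learning Hadoop? Yes I am.",
]

# GROUP w BY SIZE(wd) + COUNT(w): how many lines have each length.
wrdsizec = sorted(Counter(len(line) for line in lines).items())
print(wrdsizec)  # [(25, 1), (34, 1), (37, 1)]
```

To count words by their length instead, the same grouping would be applied to the tokenized relation (w_split in the word count post) rather than to w.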

Step 4:
store wrdsizec into '/Wordsizecount_pig';
