apuaparna
A. Find out the top 5 categories with maximum number of videos uploaded.
B. Find out the top 10 rated videos.
C. Find out the most viewed videos.

Sanchita Sen
A. Find out the top 5 categories with maximum number of videos uploaded.

The required query is:

a = LOAD '/user/cloudera/Proj' AS (a1:chararray, a2:chararray, a3:int, a4:chararray, a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);
word1 = FOREACH a GENERATE a4; -- keep only the category column
g = GROUP word1 BY $0;
count = FOREACH g GENERATE group, COUNT(word1) AS cnt;
co = ORDER count BY cnt DESC;
f = LIMIT co 5; -- keep only the top 5 rows
DUMP f;
OUTPUT

(Entertainment,908)
(Music,862)
(Comedy,414)
(People & Blogs,398)
(News & Politics,333)
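
The same query can be written with descriptive field names, which makes the grouping step easier to read. This is only a sketch: the names video_id and category are illustrative labels for a1 and a4, and the explicit PigStorage('\t') assumes the input is tab-delimited (which is also what LOAD assumes when no loader is given).

```pig
-- Sketch only: field names are illustrative, input is assumed to be tab-delimited.
videos = LOAD '/user/cloudera/Proj' USING PigStorage('\t')
         AS (video_id:chararray, a2:chararray, a3:int, category:chararray,
             a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);

-- Group all rows by category and count the rows in each group.
by_category = GROUP videos BY category;
counts = FOREACH by_category GENERATE group AS category, COUNT(videos) AS cnt;

-- Sort by count, largest first, and keep the first five rows.
ordered = ORDER counts BY cnt DESC;
top5 = LIMIT ordered 5;
DUMP top5;
```

Grouping the full relation directly skips the separate projection step. Note that COUNT skips rows whose first field is null, so the counts match the query above only if video_id is never null.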

apuaparna
B. Find out the top 10 rated videos.
Ans.
First, copy the input file, youtubedata.txt, from the local file system (LFS) to HDFS.
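
One way to do the copy without leaving the Grunt shell is the fs command, which passes its arguments to the Hadoop file system shell. This is a sketch that assumes youtubedata.txt sits in the directory Pig was started from and that the target is the same HDFS path used in the LOAD statement.

```pig
-- Assumption: youtubedata.txt is in the current local working directory.
fs -put youtubedata.txt /user/cloudera/proj
-- List the target to confirm the copy succeeded.
fs -ls /user/cloudera/proj
```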

Then load the file into Pig:
grunt> a = LOAD '/user/cloudera/proj' AS (a1:chararray, a2:chararray, a3:int, a4:chararray, a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);
grunt> f = FOREACH a GENERATE a1, a8;
grunt> g = ORDER f BY $1 DESC; -- sort the rows in descending order
grunt> h = LIMIT g 10; -- keep only the top 10 rows
grunt> DUMP h;
OUTPUT:
(kHmvkRoEowc,122129)
(EwTZ2xpQwpA,83514)
(rZBA0SKmQy8,75004)
(4DC4Rb9quKk,73257)
(LU8DDYz68kM,58850)
(Qit3ALTelOo,56767)
(irp8CNj9qBI,43774)
(3QL97xldoXc,37247)
(LTxO_pgMqys,35352)
(Md6rURKhZmA,34802)
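
If the top 10 list is needed later, it can be written to HDFS with STORE instead of being printed with DUMP. This is a sketch: the output path /user/cloudera/top10_rated is a hypothetical name, and the job will fail if that directory already exists.

```pig
a = LOAD '/user/cloudera/proj' AS (a1:chararray, a2:chararray, a3:int, a4:chararray,
                                   a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);
f = FOREACH a GENERATE a1, a8;
g = ORDER f BY $1 DESC;
h = LIMIT g 10;
-- Hypothetical output directory; written as comma-separated text.
STORE h INTO '/user/cloudera/top10_rated' USING PigStorage(',');
```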

MD SAJID AKHTAR
C. Find out the most viewed videos.

SOLUTION:

First, copy the input file from the local file system (LFS) to HDFS. Here the input file is youtubedata.txt.

Then load the file into Pig:
grunt> a = LOAD '/user/cloudera/youtube_proj' AS (a1:chararray, a2:chararray, a3:int, a4:chararray, a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);
grunt> b = FOREACH a GENERATE a1, a6; -- keep only columns a1 and a6 from the loaded file
grunt> c = ORDER b BY $1 DESC; -- sort the rows in descending order
grunt> res = LIMIT c 1; -- keep only the top row
grunt> DUMP res;

OUTPUT
(kHmvkRoEowc,122129) -- most viewed video: id = kHmvkRoEowc, value = 122129
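
LIMIT 1 returns a single row even when several videos share the highest value. Since the question asks for the most viewed videos, a variant that keeps every tied row may be useful. The sketch below computes the maximum of a6 first and then filters on it; the aliases all_rows and max_a6 are illustrative.

```pig
a = LOAD '/user/cloudera/youtube_proj' AS (a1:chararray, a2:chararray, a3:int, a4:chararray,
                                           a5:int, a6:int, a7:float, a8:int, a9:int, a10:int);
b = FOREACH a GENERATE a1, a6;

-- Put every row into one group so MAX sees the whole column.
all_rows = GROUP b ALL;
max_a6 = FOREACH all_rows GENERATE MAX(b.a6) AS mx;

-- max_a6 has exactly one row, so max_a6.mx can be used as a scalar.
res = FILTER b BY a6 == max_a6.mx;
DUMP res;
```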

