Forum

This topic contains 3 replies, has 3 voices, and was last updated by  MD SAJID AKHTAR 6 months, 1 week ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • Author
    Posts
  • #3216 Reply

    apuaparna
    Participant

    A. Find out the top 5 categories with maximum number of videos uploaded.
    B. Find out the top 10 rated videos.
    C. Find out the most viewed videos.

    #3217 Reply

    Sanchita Sen
    Participant

    A. Find out the top 5 categories with maximum number of videos uploaded.</strong

    The required query is:

    hadoop fs -copyFromLocal ‘/user/cloudera/desktop/Proj’ /user/cloudera/
    a = load ‘/user/cloudera/Proj’ as (a1:chararray, a2:chararray, a3:int, a4:chararray ,a5:int ,a6:int ,a7:float ,a8:int ,a9:int ,a10:int);
    word1 = FOREACH a GENERATE a4;
    g = GROUP word1 BY $0;
    count = FOREACH g GENERATE group, COUNT(word1) as cnt;
    co = ORDER count BY cnt DESC;
    f = LIMIT co 5; //This query will print top 5 values
    DUMP F;
    OUTPUT

    (Entertainment,908)
    (Music,862)
    (Comedy,414)
    (People & Blogs,398)
    (News & Politics,333)

    #3218 Reply

    apuaparna
    Participant

    B. Find out the top 10 rated videos.
    Ans.
    First we have to copy the input file from lfs to hdfs. input file is youtubedata.txt

    loudera@localhost ~]$ hadoop fs -copyFromLocal ‘/home/cloudera/Desktop/youtubedata.txt’ /user/cloudera/proj
    then we have to load the file to pig
    grunt> a = load ‘/user/cloudera/proj’ as (a1:chararray, a2:chararray, a3:int, a4:chararray ,a5:int ,a6:int ,a7:float ,a8:int ,a9:int ,a10:int);
    grunt> f = foreach a generate a1,a8;
    grunt> g = order f by $1 DESC; // this will store the result in descending order
    grunt> h = LIMIT c 10; // this query will store top 10 values
    grunt> dump h;
    OUTPUT:
    (kHmvkRoEowc,122129)
    (EwTZ2xpQwpA,83514)
    (rZBA0SKmQy8,75004)
    (4DC4Rb9quKk,73257)
    (LU8DDYz68kM,58850)
    (Qit3ALTelOo,56767)
    (irp8CNj9qBI,43774)
    (3QL97xldoXc,37247)
    (LTxO_pgMqys,35352)
    (Md6rURKhZmA,34802)

    #3219 Reply

    MD SAJID AKHTAR
    Participant

    C. Find out the most viewed videos.

    SOLUTION::

    First we have to copy the input file from lfs to hdfs.Here the input file is youtubedata.txt.
    [cloudera@localhost ~]$ hadoop fs -copyFromLocal ‘/home/cloudera/Desktop/youtube/youtubedata.txt’ /user/cloudera/youtube_proj

    Then,we have to load the file in pig.
    grunt>a = load ‘/user/cloudera/youtube_proj’ as (a1:chararray ,a2:chararray ,a3:int ,a4:chararray ,a5:int ,a6:int ,a7:float ,a8:int ,a9:int ,a10:int );
    grunt>b = foreach a GENERATE a1,a6; //This query will fetch the only a1 and a6 column from the loaded file.
    grunt> c = order b by $1 DESC; // This query will store the result in Descending order.
    grunt> res = LIMIT c 1;// This query will store the only top value.
    grunt> dump res;

    OUTPUT
    (kHmvkRoEowc,122129) //MostViewed-Video id=kHmvkRoEowc,value=122129)

Viewing 4 posts - 1 through 4 (of 4 total)
Reply To: Youtube Project
Your information:




cf22

Your Name (required)

Your Email (required)

Subject

Phone No

Your Message

Cart

  • No products in the cart.