How to Run a Pig Script in HDFS Mode

Pig scripts let you execute a set of Apache Pig commands as a single unit. This reduces the time and effort spent writing and running each command manually while doing Pig programming.
This article is a step-by-step guide to help you create and run your first Apache Pig script.

An Apache Pig script runs in one of two modes:
Local Mode: In local mode, the Pig script runs against the local file system. You don't need to store the data in HDFS; you work directly with files on the local file system.
HDFS Mode (MapReduce mode): In HDFS mode, the data must be stored in HDFS, and the Pig script processes it through Hadoop (a quick way to choose the mode from the command line is shown below).
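
As a quick illustration (assuming Pig is already on the PATH of the Cloudera Demo VM), the execution mode can be selected with the -x flag when starting Pig or running a script:
Command: pig -x local        (runs against the local file system)
Command: pig -x mapreduce    (runs against HDFS; this is the default)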
Pig Script in HDFS Mode:
Step 1: Write a script
Open an editor (e.g. gedit) in your Cloudera Demo VM environment:
Command: gedit pigsample.pig

Step 2: Create an input file with some data. Here I created a file named data.txt with some sample content.
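
The original post does not show the file's contents, but based on the schema used in Step 3 (fname, lname, city, profession), data.txt could look something like this, with fields separated by commas:
John,Smith,Bangalore,Engineer
Asha,Rao,Chennai,Doctor
Ravi,Kumar,Hyderabad,Teacher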

Step 3: Write the Pig Latin statements.
LOAD: loads the data from a file into a relation.
A = LOAD '/data.txt' USING PigStorage(',') AS (fname:chararray, lname:chararray, city:chararray, profession:chararray);
PigStorage(','): tells Pig how to read the input file. My input file is data.txt, and it uses a comma as the field delimiter.
FOREACH ... GENERATE: selects particular columns from a relation.
B = FOREACH A GENERATE fname, city, profession;
DUMP B; displays the contents of relation B as output.
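
Putting these statements together, pigsample.pig (the script created in Step 1) would look like this. Note that the '/data.txt' path assumes the file will be copied to the root of HDFS in Step 4:
A = LOAD '/data.txt' USING PigStorage(',') AS (fname:chararray, lname:chararray, city:chararray, profession:chararray);
B = FOREACH A GENERATE fname, city, profession;
DUMP B;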

Step 4: Copy the input file from the local file system to Hadoop (HDFS).
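
The original post does not show the exact command; assuming data.txt is in the current local directory and should land at /data.txt (the path used in the LOAD statement), the standard HDFS copy command is:
Command: hadoop fs -put data.txt /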

Step 5: Check whether the file was copied; it should now be visible in HDFS.
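
A simple way to verify (assuming the file was copied to the HDFS root as above) is to list the directory or print the file:
Command: hadoop fs -ls /
Command: hadoop fs -cat /data.txt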

Step 6: Run the Pig script using this command.
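
The command itself is missing from the original post; assuming the script is named pigsample.pig as in Step 1, it can be run in HDFS (MapReduce) mode with:
Command: pig pigsample.pig
or explicitly: pig -x mapreduce pigsample.pig. Once the MapReduce job completes, the DUMP B statement prints the selected columns to the console.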
