PIG Script Using UDF Functions

Pig ships with many built-in functions, but it also supports user-defined functions (UDFs) as a way to specify custom processing. This blog will help you understand how to write a UDF in Java and use it in a Pig script.
Step 1: Writing Java UDF
Create a new project in Eclipse (example: PrwatechUDF).
Now create a package under this project (example: com.pudfs.pgms).
Next, create the UDF class itself, UPPER.java, in this package.
Download the Pig library pig-0.8.0-cdh3u0-core.jar (or the Pig jar that matches your installed version) from the internet.
Right-click on JRE System Library, select Build Path -> Configure Build Path, then select ‘Add External JARs’, browse to the downloaded ‘pig-0.8.0-cdh3u0-core.jar’, and click ‘OK’.
Your version of the Pig jar file will now be added to the library.
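
With the Pig jar on the build path, UPPER.java can be filled in. What follows is a minimal sketch rather than a definitive listing: it assumes the UDF only needs to upper-case the first field it receives. So that it compiles and runs without the Pig jar, it declares two small stand-in types, Tuple and EvalFunc, that mimic only the methods used here. In the Eclipse project you would delete them, import org.apache.pig.EvalFunc and org.apache.pig.data.Tuple instead, add the package line for com.pudfs.pgms, and declare UPPER public. UpperDemo and its run helper are hypothetical and exist only to try the class locally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Stand-in for org.apache.pig.data.Tuple (assumption: only the two
// methods used below are modelled; this is not Pig's real interface).
interface Tuple {
    int size();
    Object get(int fieldNum) throws IOException;
}

// Stand-in for org.apache.pig.EvalFunc<T> (assumption: only exec is modelled).
abstract class EvalFunc<T> {
    public abstract T exec(Tuple input) throws IOException;
}

// The UDF itself. In the real project this class would be public and
// live in UPPER.java under the package com.pudfs.pgms.
class UPPER extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pig calls exec once per input tuple; nothing to convert means null.
        if (input == null || input.size() == 0) {
            return null;
        }
        Object field = input.get(0);
        // A null field stays null; anything else is upper-cased as text.
        return field == null ? null : field.toString().toUpperCase();
    }
}

// Hypothetical helper for trying the UDF without a Pig installation.
class UpperDemo {
    // Wraps the given values in a stand-in tuple and applies the UDF.
    static String run(Object... fields) {
        final List<Object> values = Arrays.asList(fields);
        Tuple tuple = new Tuple() {
            @Override public int size() { return values.size(); }
            @Override public Object get(int fieldNum) { return values.get(fieldNum); }
        };
        try {
            return new UPPER().exec(tuple);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("john")); // prints JOHN
    }
}
```

In a Pig script, a function packaged this way is normally called by its fully qualified name, for example com.pudfs.pgms.UPPER(fname), after the jar has been registered with a REGISTER statement.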

Create the jar file for this UDF. Right-click on the project ‘PrwatechUDF’, select Export -> JAR File, and click Next. Click the Browse button to choose where to save the jar file, name it PUDF.jar, and click Finish. Then transfer PUDF.jar to the Cloudera VM using FileZilla or WinSCP.

Step 2: Using UDF in Pig Script
Now log in to the Cloudera VM and create a sample data file using:
gedit data.txt (each record contains fname,lname,phoneno,city,profession)

The problem statement is to convert all the names and the profession in the data to upper case. To process this data with Pig, the file must be present in HDFS, because Pig is being run in MapReduce mode rather than local mode. Copy it into HDFS with the following command:
hadoop dfs -copyFromLocal data.txt /
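
To make the goal concrete, the effect on a single record can be sketched in plain Java, independent of Pig. This is only an illustration under stated assumptions: the class RecordUpperSketch, the method upperColumns, and the sample row are all made up for this example, and the choice of columns (fname, lname, and profession) follows the problem statement above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: shows the intended per-record result, not how Pig runs it.
class RecordUpperSketch {
    // Upper-cases the given column positions of one comma-separated record.
    static String upperColumns(String line, List<Integer> columns) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int index : columns) {
            // Positions outside the record are skipped rather than failing.
            if (index >= 0 && index < fields.length) {
                fields[index] = fields[index].toUpperCase(Locale.ROOT);
            }
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Made-up row in the order fname,lname,phoneno,city,profession.
        String row = "john,smith,5550100,Pune,doctor";
        // Columns 0, 1 and 4 are fname, lname and profession.
        System.out.println(upperColumns(row, Arrays.asList(0, 1, 4)));
        // prints JOHN,SMITH,5550100,Pune,DOCTOR
    }
}
```

In the Pig script the same effect comes from applying the UDF to the chosen fields of each loaded record, while the remaining fields are passed through unchanged.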

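Step 3: Write and execute the Pig script
Write the Pig script into the file pigsample.pig. The script should register PUDF.jar, load data.txt from HDFS, and apply the UPPER function to the name and profession fields. Create the file using:
gedit pigsample.pig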

Now execute the script using: pig pigsample.pig

The Pig script executes successfully, and the output shows that the fname and profession values in the data have been converted to upper case.
