Hadoop Flume Tutorial

  • Date: 24th February, 2019
  • By Prwatech
 

Hadoop Flume Tutorial

  Welcome to the world of Hadoop Flume tutorials. In this tutorial, you can explore how to fetch Twitter data with Flume. Learn more advanced tutorials on Flume configuration in Hadoop from India's leading Hadoop training institute, which provides an advanced Hadoop course for tech enthusiasts who want to explore the technology from scratch to an advanced level, like a pro. We at Prwatech, the pioneers of Hadoop training, offer an advanced certification course and Hadoop Flume setup to those who are keen to explore the technology in a world-class training environment.

Fetching Flume Data from Twitter

 
  Prerequisites:
  1. Ubuntu v12 or above
  2. Apache Flume 1.3.1 binary tarball (apache-flume-1.3.1-bin.tar.gz)
  3. flume-source-1.0-SNAPSHOT.jar (the Flume Twitter source)
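
Before starting, it helps to confirm that a JDK is available and that both downloads are in place. This is only a quick check, assuming the files were saved to the Cloudera desktop as in the steps below:

  $ java -version
  $ ls -lh /home/cloudera/Desktop/apache-flume-1.3.1-bin.tar.gz
  $ ls -lh /home/cloudera/Desktop/flume-source-1.0-SNAPSHOT.jar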
 

Twitter Data Analysis Using Flume

Make a new directory in /usr/lib for Flume:

  $ cd /usr/lib/
  $ sudo mkdir myflume

Move the apache-flume-1.3.1-bin.tar.gz archive to /usr/lib/myflume:

  $ sudo mv /home/cloudera/Desktop/apache-flume-1.3.1-bin.tar.gz /usr/lib/myflume

Untar the file:

  $ cd /usr/lib/myflume
  $ sudo tar -zxvf apache-flume-1.3.1-bin.tar.gz

Now we will have two entries in /usr/lib/myflume: the archive apache-flume-1.3.1-bin.tar.gz and the extracted directory apache-flume-1.3.1-bin.

The apache-flume-1.3.1-bin directory contains many subdirectories, one of which is lib. Move flume-source-1.0-SNAPSHOT.jar into this lib directory:

  $ sudo mv /home/cloudera/Desktop/flume-source-1.0-SNAPSHOT.jar /usr/lib/myflume/apache-flume-1.3.1-bin/lib/

Go to the conf directory:

  $ cd /usr/lib/myflume/apache-flume-1.3.1-bin/conf/

Create a copy of flume-env.sh.template as flume-env.sh in the same conf directory:

  $ sudo cp flume-env.sh.template flume-env.sh

Hence the conf directory will contain both flume-env.sh.template and flume-env.sh.

Configure flume-env.sh:

  $ sudo gedit flume-env.sh

Set the following values:

  JAVA_HOME=/usr/lib/jvm/java-6-sun
  FLUME_CLASSPATH="/usr/lib/myflume/apache-flume-1.3.1-bin/lib/flume-source-1.0-SNAPSHOT.jar"

(A quick way to confirm the installation at this point is sketched after the credential list below.)

CREATING API CREDENTIALS:

Go to the Twitter Application Management page and sign in with your username and password, then create a new app. Fill in the application details: Name, Description and Website; the Callback URL is not required. Finally, click on "Yes, I agree".

KEYS AND ACCESS TOKENS:

Open the Keys and Access Tokens tab of the new app.

NOTE: Get the following information and fill it in the flume.txt file:
  • ConsumerKey
  • ConsumerSecret
  • AccessToken
  • AccessTokenSecret
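
Before wiring these credentials into a configuration file, you can confirm that the Flume installation itself starts up. This is only a quick sanity check, assuming the install path used above:

  $ /usr/lib/myflume/apache-flume-1.3.1-bin/bin/flume-ng version

If this prints the Flume 1.3.1 version banner, the binaries are in place and the commands below should work.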
  Now move the flume.txt file to the Cloudera machine and create a new file in the conf directory:

  $ cd /usr/lib/myflume/apache-flume-1.3.1-bin/conf/
  $ sudo gedit flume.conf

Copy the content of flume.txt into this file and save it. (A sketch of what such an agent configuration typically contains is given after the run command below.)

Go to the bin directory and fire the final command:
  • $ cd /usr/lib/myflume/apache-flume-1.3.1-bin/bin/
  • $ ./flume-ng agent -n TwitterAgent -c conf -f /usr/lib/myflume/apache-flume-1.3.1-bin/conf/flume.conf
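
For reference, a TwitterAgent configuration of the kind used here usually looks like the sketch below. This is only an illustrative example, not the exact flume.txt from the tutorial: it assumes the Cloudera TwitterSource class shipped in the flume-source JAR, and the keys, keywords and HDFS path are placeholders you must replace with your own values.

  # Name the source, channel and sink for the agent started with -n TwitterAgent
  TwitterAgent.sources = Twitter
  TwitterAgent.channels = MemChannel
  TwitterAgent.sinks = HDFS

  # Twitter source: streams tweets matching the keywords using the API credentials
  TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
  TwitterAgent.sources.Twitter.channels = MemChannel
  TwitterAgent.sources.Twitter.consumerKey = <your ConsumerKey>
  TwitterAgent.sources.Twitter.consumerSecret = <your ConsumerSecret>
  TwitterAgent.sources.Twitter.accessToken = <your AccessToken>
  TwitterAgent.sources.Twitter.accessTokenSecret = <your AccessTokenSecret>
  TwitterAgent.sources.Twitter.keywords = hadoop, big data, flume

  # HDFS sink: writes the raw JSON tweets under /user/flume/tweets
  TwitterAgent.sinks.HDFS.channel = MemChannel
  TwitterAgent.sinks.HDFS.type = hdfs
  TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
  TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
  TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
  TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
  TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
  TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

  # In-memory channel buffering events between the source and the sink
  TwitterAgent.channels.MemChannel.type = memory
  TwitterAgent.channels.MemChannel.capacity = 10000
  TwitterAgent.channels.MemChannel.transactionCapacity = 100

The agent name TwitterAgent must match the -n option in the command above, and the NameNode address in hdfs.path (localhost:8020 here) depends on your cluster.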
  Now we use the virtual machine's web browser to see the records collected from Twitter. In the NameNode status page, browse the file system and go to: user > flume > tweets. As you can see, all the data collected from Twitter is in JSON format, which needs to be converted to CSV so that the user can understand the collected data. For this, we can use an online JSON-to-CSV converter.
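
If you prefer to do the conversion on the command line instead of an online converter, a rough sketch is shown below. It assumes the jq utility is available on the VM, that the HDFS sink wrote files with Flume's default FlumeData prefix, and that the standard tweet fields id_str, created_at, user.screen_name and text are the ones you want:

  $ hdfs dfs -cat /user/flume/tweets/FlumeData.* \
      | jq -r '[.id_str, .created_at, .user.screen_name, .text] | @csv' \
      > tweets.csv

Each JSON tweet becomes one CSV row; the @csv filter takes care of quoting commas and quotes inside the tweet text.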
