Flume With Twitter

Flume – What is it?

  • A data collection service for Hadoop
  • For distributed systems
  • Open source
  • Scalable
  • Reliable
  • Manageable
  • Fault tolerant

Flume – How does it work?

  • Flume uses agents, each of which has:
  • A source
      • Listens for events
      • Writes events to the channel
  • A channel
      • Queues event data as transactions
  • A sink
      • Writes event data to the target, e.g. HDFS
      • Removes the event from the queue
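
To make the source, channel, and sink roles concrete, the sketch below writes a minimal single-agent configuration to a file. It is modeled on the single-node example in the Flume user guide (a netcat source, a memory channel, and a logger sink). The agent name a1, the component names r1, c1, and k1, the port number, and the helper function write_example_conf are illustrative assumptions and are not part of the Twitter setup in this tutorial.

```shell
# Hypothetical helper: write a minimal Flume agent config to the path in $1.
write_example_conf() {
    cat > "$1" <<'EOF'
# One agent (a1) with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for events (here, lines of text on a TCP port)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: queues events between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: takes events off the channel and writes them out (here, to the log)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
}

# Example: write_example_conf ./example-agent.conf
```

Note that a source lists its channels in the plural (channels), while a sink reads from exactly one channel (channel).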

Download Flume in Cloudera

Type this command:

$ wget http://apache.mirrors.hoobly.com/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz

Check whether the Flume tar file is present

Command: ls

Create a flume-ng directory to hold the Flume tar file

Use this command:

$ sudo mkdir /usr/lib/flume-ng

Copy the Flume tar file to the flume-ng directory with this command

$ sudo cp apache-flume-1.4.0-bin.tar.gz /usr/lib/flume-ng/

Check whether the Flume tar file was copied

$ ls /usr/lib/flume-ng/

Change to the flume-ng directory and extract the Flume tar file using these commands:

$ cd /usr/lib/flume-ng/

/usr/lib/flume-ng$ sudo tar -xzvf /usr/lib/flume-ng/apache-flume-1.4.0-bin.tar.gz

Check whether the files were extracted

Command: /usr/lib/flume-ng$ ls
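
The copy, extract, and check steps above can also be wrapped in a small function that lists the archive before unpacking it, so that a truncated or corrupt download fails early. This is only a sketch: the function name extract_tarball is hypothetical, and because it is a shell function it cannot be prefixed with sudo, so writing under /usr/lib would require running it from a root shell.

```shell
# Hypothetical helper: verify a .tar.gz archive, then extract it.
#   $1 = path to the tarball
#   $2 = directory to extract into (created if missing)
extract_tarball() {
    tarball="$1"
    dest="$2"
    # tar -tzf only lists the contents; it fails on a corrupt archive
    if ! tar -tzf "$tarball" > /dev/null 2>&1; then
        echo "bad or missing archive: $tarball" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -C switches to the target directory before extracting
    tar -xzf "$tarball" -C "$dest"
}

# Example: extract_tarball apache-flume-1.4.0-bin.tar.gz "$HOME/flume-ng"
```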

Move flume-sources-1.0-SNAPSHOT.jar to the cloudera home directory using FileZilla

Use ifconfig to find the IP address of the Cloudera machine for the FileZilla connection.

Move the file from the cloudera home directory to the lib directory of apache-flume:

$ sudo mv /home/cloudera/flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/

Check whether flume-sources-1.0-SNAPSHOT.jar was moved to the lib folder of flume-ng

$ ls /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/

Create flume-env.sh in the conf directory of Apache Flume:

Open flume-env.sh:

Edit flume-env.sh as shown in the snapshot below:

Set the Java path and the path to the SNAPSHOT jar:
FLUME_CLASSPATH="/usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
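
As an alternative to editing the file by hand, the sketch below generates flume-env.sh and refuses to write a classpath that points at a missing jar. The function name write_flume_env is hypothetical, and the Java path is left out on purpose because its value is not given here; add it to the generated file yourself.

```shell
# Hypothetical helper: write flume-env.sh into a Flume conf directory.
#   $1 = conf directory of the Flume install
#   $2 = full path to the extra jar to put on Flume's classpath
write_flume_env() {
    conf_dir="$1"
    jar_path="$2"
    # Refuse to write a classpath entry for a jar that does not exist
    if [ ! -f "$jar_path" ]; then
        echo "jar not found: $jar_path" >&2
        return 1
    fi
    # Plain double quotes keep the generated line valid shell
    printf 'FLUME_CLASSPATH="%s"\n' "$jar_path" > "$conf_dir/flume-env.sh"
}

# Example (paths as used in this tutorial; needs write access to conf):
# write_flume_env /usr/lib/flume-ng/apache-flume-1.4.0-bin/conf \
#     /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar
```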

Open the conf file in Flume using gedit:

Log in to your own Twitter account at this URL:

Open your Twitter Apps:

Create a new application and enter all the details for it:

The application settings are used to configure the application:

Here we can create our access tokens:

Our access token can be used to make API requests on behalf of our own account:

Here we can access the required tokens:

Edit flume.conf:
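
The contents of flume.conf are not reproduced here, so the following is only a sketch of what such a file might look like. It assumes that flume-sources-1.0-SNAPSHOT.jar comes from Cloudera's Twitter example project, whose source class is com.cloudera.flume.source.TwitterSource. The agent name TwitterAgent, the component names, the keywords, the HDFS URL, and the helper function write_twitter_conf are all assumptions; the four credential values are placeholders to be replaced with the tokens from your Twitter application.

```shell
# Hypothetical helper: write a Twitter-to-HDFS agent config to the path in $1.
write_twitter_conf() {
    cat > "$1" <<'EOF'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: streams tweets (class assumed from Cloudera's Twitter example)
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
TwitterAgent.sources.Twitter.keywords = hadoop, big data

# Channel: in-memory queue between the source and the sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Sink: writes events to HDFS (URL and path are assumptions)
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
EOF
}

# Example: write_twitter_conf ./flume.conf
```

The fileType and writeFormat settings make the HDFS sink store the tweets as plain text rather than as sequence files, which keeps them readable when browsing the filesystem.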

Change directory to the bin folder of Apache Flume:

Start fetching the data from Twitter:
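
The exact start command is not shown here, so the sketch below is one plausible form of it using the standard flume-ng options. It assumes that flume.conf lives in the conf directory of the install and that the agent defined in it is named TwitterAgent; the function name start_flume_agent is hypothetical.

```shell
# Hypothetical helper: start a Flume agent from a given Flume install.
#   $1 = Flume home directory (contains bin/ and conf/)
#   $2 = agent name as defined in flume.conf (assumed default: TwitterAgent)
start_flume_agent() {
    flume_home="$1"
    agent_name="${2:-TwitterAgent}"
    # Fail early if the launcher script is missing or not executable
    if [ ! -x "$flume_home/bin/flume-ng" ]; then
        echo "flume-ng not found under $flume_home/bin" >&2
        return 1
    fi
    # --conf: config directory, --conf-file: agent definition,
    # --name: which agent in that file to run; -D sends logs to the console
    "$flume_home/bin/flume-ng" agent \
        --conf "$flume_home/conf" \
        --conf-file "$flume_home/conf/flume.conf" \
        --name "$agent_name" \
        -Dflume.root.logger=INFO,console
}

# Example: start_flume_agent /usr/lib/flume-ng/apache-flume-1.4.0-bin
```

The agent runs in the foreground until it is interrupted, so the HDFS checks that follow are easiest to do from the browser or a second terminal.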

Open the browser, click on NameNode status, and then click on Browse the filesystem:

Click on emptax.txt:

Click on tweets:

Click on Flume data file:

Finally, we can see the data that has been downloaded from Twitter:
