Flume With Twitter
Flume – What is it?
- A service for collecting, aggregating, and moving large volumes of event data into Hadoop
- Designed for distributed systems
- Open source
- Scalable
- Reliable
- Manageable
- Fault tolerant
Flume – How does it work?
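- Flume uses agents; each agent has:
  - A source, which
    - listens for events
    - writes events to the channel
  - A channel, which
    - queues event data; events are put into and taken from it in transactions
  - A sink, which
    - writes event data to a destination, e.g. HDFS
    - removes the events from the channel once they have been written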
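As a minimal sketch of the source, channel and sink wiring described above, the snippet below writes a single-agent configuration modeled on the example in the Apache Flume User Guide (netcat source, memory channel, logger sink). The agent name a1, the component names r1/c1/k1, port 44444 and the CONF_DIR variable are illustrative assumptions; the script only writes the file and does not start Flume.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal Flume agent configuration that wires
# one source -> one channel -> one sink (all names are illustrative).
set -euo pipefail

# Hypothetical output directory; set CONF_DIR to choose another one.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/example.conf" <<'EOF'
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for newline-terminated text events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory; puts and takes happen in transactions
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: log each event (an hdfs sink would write to HDFS instead)
a1.sinks.k1.type = logger

# Wiring: a source lists its channels, a sink drains exactly one channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

echo "wrote $CONF_DIR/example.conf"
```

Once Flume is installed, an agent like this could be started with bin/flume-ng agent --conf conf --conf-file "$CONF_DIR/example.conf" --name a1 -Dflume.root.logger=INFO,console, and test events sent with nc localhost 44444.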
Download Flume in the Cloudera VM
Type this command:
$ wget http://apache.mirrors.hoobly.com/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
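Mirrors usually carry only current releases, so the mirror above may no longer have 1.4.0; Apache keeps past releases on archive.apache.org. The sketch below builds that archive URL from a version number and adds a quick integrity check using tar -tzf. The helper names flume_url, fetch_flume and verify_tarball are hypothetical, and the archive URL layout is an assumption worth confirming in a browser first.

```bash
#!/usr/bin/env bash
# Sketch: parameterize the Flume version and sanity-check the download.
set -euo pipefail

FLUME_VERSION="${FLUME_VERSION:-1.4.0}"

# Print the binary tarball URL on archive.apache.org (assumed layout).
flume_url() {
  local version="$1"
  echo "https://archive.apache.org/dist/flume/${version}/apache-flume-${version}-bin.tar.gz"
}

# Download the tarball into the current directory (needs network access).
fetch_flume() {
  wget "$(flume_url "$1")"
}

# Succeed only if the file is a readable gzip-compressed tar archive.
verify_tarball() {
  tar -tzf "$1" > /dev/null 2>&1
}
```

For example: fetch_flume "$FLUME_VERSION" && verify_tarball "apache-flume-$FLUME_VERSION-bin.tar.gz"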
Check whether the Flume tarball is present
Command: $ ls
Create a flume-ng directory to hold the Flume tarball
Command:
$ sudo mkdir /usr/lib/flume-ng
Copy the Flume tarball to the flume-ng directory
Command:
$ sudo cp apache-flume-1.4.0-bin.tar.gz /usr/lib/flume-ng/
Check whether the tarball was copied
Command:
$ ls /usr/lib/flume-ng/
Change to that directory and extract the Flume tarball
Commands:
$ cd /usr/lib/flume-ng/
/usr/lib/flume-ng$ sudo tar -xzvf /usr/lib/flume-ng/apache-flume-1.4.0-bin.tar.gz
Check whether the files were extracted (an apache-flume-1.4.0-bin directory should now be listed)
Command: /usr/lib/flume-ng$ ls
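The mkdir, copy and extract steps above can be collected into one function. This is a sketch under two assumptions: the archive unpacks into a directory named after the tarball (apache-flume-1.4.0-bin, as the paths in this guide suggest), and you run it with root rights when the destination is under /usr/lib. install_flume is a hypothetical name. It passes -C to tar to extract straight into the destination, which makes the separate copy step optional.

```bash
#!/usr/bin/env bash
# Sketch: create the install directory, extract Flume into it, and check it.
set -euo pipefail

# install_flume TARBALL DEST_DIR   (hypothetical helper)
# Assumes the archive's top-level directory matches the tarball name.
install_flume() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  # -C extracts directly into DEST_DIR, so copying the tarball first is optional.
  tar -xzf "$tarball" -C "$dest"
  local home="$dest/$(basename "$tarball" .tar.gz)"
  # The flume-ng launcher should now exist under bin/.
  if [ ! -f "$home/bin/flume-ng" ]; then
    echo "flume-ng launcher not found under $home" >&2
    return 1
  fi
  echo "$home"
}
```

For example, as root: install_flume apache-flume-1.4.0-bin.tar.gz /usr/lib/flume-ng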
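Copy flume-sources-1.0-SNAPSHOT.jar from your host machine to the cloudera home directory (/home/cloudera) using FileZilla
Run ifconfig in the VM to find the IP address FileZilla should connect to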
Move the file from the cloudera directory to the lib directory of Apache Flume
Command:
$ sudo mv /home/cloudera/flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/
Check whether flume-sources-1.0-SNAPSHOT.jar was moved to the lib folder
Command:
$ ls /usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/
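A sketch of the jar-placement step with a sanity check. It relies on the flume-ng launcher adding the jars in FLUME_HOME/lib to the classpath, which is why lib/ is the target directory; install_source_jar is a hypothetical helper name, and it needs root rights for paths under /usr/lib.

```bash
#!/usr/bin/env bash
# Sketch: move the Twitter source jar into Flume's lib directory.
set -euo pipefail

# install_source_jar JAR FLUME_HOME   (hypothetical helper)
# The flume-ng launcher puts FLUME_HOME/lib/* on the classpath.
install_source_jar() {
  local jar="$1" flume_home="$2"
  if [ ! -d "$flume_home/lib" ]; then
    echo "no lib directory under $flume_home" >&2
    return 1
  fi
  mv "$jar" "$flume_home/lib/"
  # Confirm the jar arrived, mirroring the ls check in the steps above.
  ls "$flume_home/lib/$(basename "$jar")"
}
```

For example, as root: install_source_jar /home/cloudera/flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/apache-flume-1.4.0-bin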
Create flume-env.sh in the conf directory of Apache Flume and open it for editing
Set the Java path and the path of the SNAPSHOT jar in flume-env.sh (adjust JAVA_HOME to the JDK installed on your VM):
JAVA_HOME=/usr/lib/jvm/java-6-sun
FLUME_CLASSPATH="/usr/lib/flume-ng/apache-flume-1.4.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
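The flume-env.sh edits can also be scripted. Flume binary distributions typically ship conf/flume-env.sh.template, so this sketch copies the template when flume-env.sh does not exist yet and then appends the two settings. FLUME_HOME, JAVA_DIR and write_flume_env are assumptions for illustration; point JAVA_DIR at the JDK actually installed on your VM.

```bash
#!/usr/bin/env bash
# Sketch: create conf/flume-env.sh and append JAVA_HOME and FLUME_CLASSPATH.
set -euo pipefail

# Assumed install location and JDK path; override before calling.
FLUME_HOME="${FLUME_HOME:-/usr/lib/flume-ng/apache-flume-1.4.0-bin}"
JAVA_DIR="${JAVA_DIR:-/usr/lib/jvm/java-6-sun}"

write_flume_env() {
  local conf="$FLUME_HOME/conf"
  # Start from the shipped template if there is one and no flume-env.sh yet.
  if [ -f "$conf/flume-env.sh.template" ] && [ ! -f "$conf/flume-env.sh" ]; then
    cp "$conf/flume-env.sh.template" "$conf/flume-env.sh"
  fi
  # Unquoted EOF: $JAVA_DIR and $FLUME_HOME are expanded into the file.
  cat >> "$conf/flume-env.sh" <<EOF
JAVA_HOME=$JAVA_DIR
FLUME_CLASSPATH="$FLUME_HOME/lib/flume-sources-1.0-SNAPSHOT.jar"
EOF
}
```

Run it with root rights (for example by saving it as a script, adding a call to write_flume_env, and running it with sudo bash).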
Open the Flume configuration file (flume.conf in the conf directory) in gedit:
Log in to your Twitter account at this URL:
https://dev.twitter.com/user/login?destion=home
Open your Twitter apps page:
Create a new application and fill in all of its details:
Use the application settings page to configure the application:
Create your access token:
The access token lets the application make API requests on behalf of your own account:
Copy the required credentials (consumer key, consumer secret, access token and access token secret):
Edit flume.conf:
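A sketch of flume.conf for this setup, written by a shell function so the four Twitter credentials are not typed into the file by hand. It assumes flume-sources-1.0-SNAPSHOT.jar is the TwitterSource from Cloudera's cdh-twitter-example project, whose property names (consumerKey, consumerSecret, accessToken, accessTokenSecret, keywords) are used here. The environment variable names (TW_CONSUMER_KEY and so on), the keyword list and the HDFS path hdfs://localhost:8020/user/cloudera/tweets are hypothetical; replace them with your own values and NameNode address.

```bash
#!/usr/bin/env bash
# Sketch: write flume.conf for a Twitter source -> memory channel -> HDFS sink.
# TW_* variable names, keywords and the HDFS path are hypothetical defaults.
set -euo pipefail

write_twitter_conf() {
  local out="$1" v
  # Refuse to write a config with missing credentials.
  for v in TW_CONSUMER_KEY TW_CONSUMER_SECRET TW_ACCESS_TOKEN TW_ACCESS_TOKEN_SECRET; do
    if [ -z "${!v:-}" ]; then
      echo "missing $v" >&2
      return 1
    fi
  done
  cat > "$out" <<EOF
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = $TW_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = $TW_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = $TW_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = $TW_ACCESS_TOKEN_SECRET
TwitterAgent.sources.Twitter.keywords = ${TW_KEYWORDS:-hadoop, big data}

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = ${TW_HDFS_PATH:-hdfs://localhost:8020/user/cloudera/tweets}
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
EOF
  # The file holds credentials, so keep it private to its owner.
  chmod 600 "$out"
}
```

Setting fileType = DataStream with writeFormat = Text stores the raw text of each event, so the files are readable in the NameNode file browser; the HDFS sink's default SequenceFile format would be binary.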
Change to the bin folder of Apache Flume:
$ cd /usr/lib/flume-ng/apache-flume-1.4.0-bin/bin
Start fetching data from Twitter:
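Starting the agent means running the flume-ng launcher with the agent name used in flume.conf. A sketch, assuming the install path used throughout this guide and an agent named TwitterAgent; start_twitter_agent and the DRY_RUN switch are hypothetical conveniences. Because the function uses absolute paths, it works from any directory, not only from bin/.

```bash
#!/usr/bin/env bash
# Sketch: launch the TwitterAgent defined in conf/flume.conf.
# Set DRY_RUN=1 to print the command instead of running it.
set -euo pipefail

start_twitter_agent() {
  local flume_home="${1:-/usr/lib/flume-ng/apache-flume-1.4.0-bin}"
  # --name must match the property prefix (TwitterAgent) in flume.conf.
  local cmd=("$flume_home/bin/flume-ng" agent
    --conf "$flume_home/conf"
    --conf-file "$flume_home/conf/flume.conf"
    --name TwitterAgent
    -Dflume.root.logger=INFO,console)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The agent keeps running in the foreground and logs to the console; stop it with Ctrl+C once enough tweets have been collected.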
Open the browser, click NameNode status, and then click Browse the filesystem:
Click on emptax.txt:
Click on tweets:
Click on a Flume data file (the HDFS sink names its files with the prefix FlumeData by default):
Finally, the file shows the data downloaded from Twitter:
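The same files can be inspected from a terminal instead of the NameNode web UI. This sketch assumes the hdfs client is on the PATH and that the sink writes to /user/cloudera/tweets (a hypothetical path; use the hdfs.path from your flume.conf). With DataStream/Text output, each line of a FlumeData file is typically one tweet as raw JSON. show_tweets is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list and preview collected tweets with the HDFS shell.
# No pipefail here: head closing the pipe early must not count as a failure.
set -eu

TWEETS_DIR="${TWEETS_DIR:-/user/cloudera/tweets}"   # assumed sink directory

show_tweets() {
  local dir="${1:-$TWEETS_DIR}" n="${2:-5}"
  # List the files the HDFS sink has written.
  hdfs dfs -ls "$dir"
  # Print the first N lines; the quoted glob is expanded by HDFS, not bash.
  hdfs dfs -cat "$dir/FlumeData.*" | head -n "$n"
}
```

For example: show_tweets /user/cloudera/tweets 3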