Hadoop-Basic HBase Commands

  • OPEN TERMINAL AND GO TO HBASE SHELL :

cloudera@cloudera-vm:~$ hbase shell

  • CHECK WHAT TABLES EXIST IN THE SYSTEM :

hbase(main):001:0> list
TABLE

  • CREATE TABLE :

hbase(main):002:0> create 'batch', 'details'

  • ENTER DATA INTO THE TABLE :

hbase(main):003:0> put 'batch', 'row1', 'details:name', 'Rhiddhiman'

hbase(main):004:0> put 'batch', 'row2', 'details:name', 'Rohit'

hbase(main):005:0> put 'batch', 'row3', 'details:name', 'Dipankar'

hbase(main):006:0> put 'batch', 'row4', 'details:name', 'Kalyan'

  • CHECK DATA ENTERED IN THE TABLE :

hbase(main):007:0> scan 'batch'

  • CHANGE VALUE OF A PARTICULAR COLUMN IN A ROW :

hbase(main):008:0> put 'batch', 'row2', 'details:name', 'Jayanta'

  • CHECK DATA AFTER MODIFICATION :

hbase(main):009:0> scan 'batch'

  • CHANGE VALUE OF A PARTICULAR COLUMN IN A ROW :

hbase(main):010:0> put 'batch', 'row3', 'details:name', 'Dhrubajyoti'

  • CHECK DATA AFTER MODIFICATION :

hbase(main):011:0> scan 'batch'

  • CHECK VALUES THAT HAVE BEEN CHANGED (see the note on versions after this section) :

hbase(main):012:0> get 'batch', 'row2', {COLUMN => 'details:name', VERSIONS => 2}

hbase(main):013:0> put 'batch', 'row3', 'details:name', 'Banajit'

hbase(main):014:0> get 'batch', 'row3', {COLUMN => 'details:name', VERSIONS => 3}

  • ENTER REMAINING DATA :

hbase(main):015:0> put 'batch', 'row1', 'details:address', 'Marathahalli'

hbase(main):016:0> put 'batch', 'row1', 'details:age', '27'

hbase(main):017:0> put 'batch', 'row1', 'details:course', 'Hadoop'

hbase(main):018:0> put 'batch', 'row2', 'details:address', 'BTM'

hbase(main):019:0> put 'batch', 'row3', 'details:address', 'Whitefield'

hbase(main):020:0> put 'batch', 'row4', 'details:address', 'Electronics City'

  • CHECK DATA :

hbase(main):021:0> scan 'batch'

  • DESCRIPTION OF TABLE :

hbase(main):031:0> describe 'batch'
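Note on versions: the get commands above with VERSIONS => 2 or 3 only return older cell values if the 'details' column family actually keeps more than one version; depending on your HBase release the default may be as low as 1. A minimal sketch of raising the version count and, when you are finished, removing the table (the prompt numbers here are illustrative):

hbase(main):032:0> alter 'batch', NAME => 'details', VERSIONS => 3
hbase(main):033:0> get 'batch', 'row3', {COLUMN => 'details:name', VERSIONS => 3}
hbase(main):034:0> disable 'batch'
hbase(main):035:0> drop 'batch'

A table must be disabled before it can be dropped.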

Hadoop-Multinode Cluster

  • Prerequisites :
  1. VMware
  2. Ubuntu 12.04 image
  3. Hadoop 1.x

 

ON MASTER NODE :

On the master node, the masters file contains the IP address of the master only.
● Command: sudo gedit masters

On the master node, the slaves file contains the IP addresses of the slaves.
● Command: sudo gedit slaves

ON SLAVE NODES :

On the slave nodes, the masters file remains blank (for all the slave nodes).
● Command: sudo gedit masters

On the slave nodes, the slaves file contains that slave's own IP address.
● Command: sudo gedit slaves

  • Now edit the /etc/hosts file on both the master and the slaves (example file contents below).
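For concreteness, a minimal sketch of the file contents, assuming the example addresses 192.168.1.10 for the master and 192.168.1.11 / 192.168.1.12 for two slaves; substitute your own IPs or hostnames:

/etc/hosts (on every node):
192.168.1.10    master
192.168.1.11    slave1
192.168.1.12    slave2

masters file (on the master node):
master

slaves file (on the master node):
slave1
slave2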

Hadoop-Basic HDFS Commands

Open a terminal window. The current working directory is:

==> /home/training

#1. Print the Hadoop version : hadoop version

#2. List the contents of the root directory in HDFS : hadoop fs -ls /

#3. Report the amount of space used and available on the currently mounted filesystem : hadoop fs -df hdfs:/

#4. Count the number of directories, files and bytes under the paths that match the specified file pattern : hadoop fs -count hdfs:/

#5. Run a DFS filesystem checking utility : hadoop fsck /

#6. Run a cluster balancing utility : hadoop balancer

#7. Create a new directory named “hadoop” below the /user/training directory in HDFS. Since you’re currently logged in with the “training” user ID,/user/training is your home directory in HDFS : hadoop fs -mkdir /user/training/hadoop

#8. Add a sample text file from the local directory named “data” to the new directory you created in HDFS during the previous step : hadoop fs -put data/sample.txt /user/training/hadoop

#9. List the contents of this new directory in HDFS : hadoop fs -ls /user/training/hadoop

#10. Add the entire local directory called “retail” to the hadoop directory you created under /user/training in HDFS : hadoop fs -put data/retail /user/training/hadoop

#11. Since /user/training is your home directory in HDFS, any command that does not have an absolute path is interpreted as relative to that directory. The next command will therefore list your home directory, and should show the items you’ve just added there : hadoop fs -ls

#12. See how much space this directory occupies in HDFS : hadoop fs -du -s -h /user/training/hadoop

#13. Delete a file ‘customers’ from the “retail” directory : hadoop fs -rm hadoop/retail/customers

#14. Ensure this file is no longer in HDFS : hadoop fs -ls hadoop/retail/customers

#15. Delete all files from the “retail” directory using a wildcard : hadoop fs -rm hadoop/retail/*

#16. To empty the trash : hadoop fs -expunge

#17. Finally, remove the entire retail directory and all of its contents in HDFS : hadoop fs -rm -r hadoop/retail

#18. List the hadoop directory again : hadoop fs -ls hadoop

#19. Add the purchases.txt file from the local directory named “/home/training/” to the hadoop directory you created in HDFS : hadoop fs -copyFromLocal /home/training/purchases.txt hadoop/

#20. To view the contents of your text file purchases.txt which is present in your hadoop directory : hadoop fs -cat hadoop/purchases.txt

#21. Add the purchases.txt file from “hadoop” directory which is present in HDFS directory to the directory “data” which is present in your local directory : hadoop fs -copyToLocal hadoop/purchases.txt /home/training/data

#22. cp is used to copy files between directories present in HDFS : hadoop fs -cp /user/training/*.txt /user/training/hadoop

#23. ‘-get’ command can be used alternatively to ‘-copyToLocal’ command (see the combined example after this list) : hadoop fs -get hadoop/sample.txt /home/training/

#24. Display last kilobyte of the file “purchases.txt” to stdout : hadoop fs -tail hadoop/purchases.txt

#25. Default file permissions are 644 in HDFS. Use ‘-chmod’ command to change permissions of a file :
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chmod 600 hadoop/purchases.txt

#26. Default names of owner and group are training and training. Use ‘-chown’ to change owner name and group name simultaneously :
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chown root:root hadoop/purchases.txt

#27. Default name of group is training. Use ‘-chgrp’ command to change group name :
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chgrp training hadoop/purchases.txt

#28. Move a directory from one location to other :
hadoop fs -mv hadoop apache_hadoop

#29. Default replication factor of a file is 3. Use ‘-setrep’ command to change the replication factor of a file : hadoop fs -setrep -w 2 apache_hadoop/sample.txt

#30. Copy a directory from one cluster to another. Use the ‘distcp’ command to copy, the -overwrite option to overwrite existing files, and the -update option to synchronize both directories : hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop

#31. Command to make the name node leave safe mode :
sudo -u hdfs hdfs dfsadmin -safemode leave

#32. List all the hadoop file system shell commands : hadoop fs

#33. Last but not least, always ask for help! : hadoop fs -help
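Putting several of the commands above together, a minimal round-trip sketch, assuming the same /user/training home directory and purchases.txt file used earlier:

hadoop fs -mkdir /user/training/hadoop                              # create a working directory
hadoop fs -put /home/training/purchases.txt /user/training/hadoop/  # copy a local file into HDFS
hadoop fs -ls /user/training/hadoop                                 # confirm it arrived
hadoop fs -cat /user/training/hadoop/purchases.txt | head           # peek at the first lines
hadoop fs -get /user/training/hadoop/purchases.txt /tmp/            # copy it back to the local disk
hadoop fs -rm /user/training/hadoop/purchases.txt                   # remove it from HDFS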

Hadoop-GCP

Prerequisites Of GCP :

  1. GCP Account
  • First Create a GCP Account

It’s free, and Google gives you $300 in credits (roughly ₹21,000).

  • Now Open Google Cloud Platform

  • Open Console

  • Now you can see the GCP console, from where you can manage your cluster. Go to Dataproc and click on Clusters

  • Now Create Cluster

  • Name Your Cluster and Select Region

  • Select Cluster Mode

  • Configure Master and Slave Nodes

  • Master Node > Machine Type 4 CPUs > Primary Disk Size 32GB

  • Worker Nodes > Machine Type 1 CPU > Primary Disk Size 10GB

 

  • Click on Advanced options if you want to add a bucket or select an image

  • Click on Create

  • It will take a few seconds. Then you can see the status of the cluster as Running. (An equivalent gcloud command is sketched at the end of this section.)

 

  • Click on Cluster > VM Instances

  • Click on SSH

  • Using Command : cat /etc/os-release

You can see all the details of the Operating System

  • Use command : sudo jps

To check all the Hadoop daemons running on the node
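The same cluster can also be created from Cloud Shell or a local terminal with the gcloud CLI. A minimal sketch, assuming a hypothetical cluster name my-cluster, the us-central1 region, and machine types that roughly match the sizes above (exact flags and minimum disk sizes can vary between gcloud releases):

gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --master-machine-type=n1-standard-4 \
    --master-boot-disk-size=32GB \
    --num-workers=2 \
    --worker-machine-type=n1-standard-1 \
    --worker-boot-disk-size=10GB

Dataproc typically names the master VM my-cluster-m, which is the instance you would SSH into.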

Hadoop-Basic Linux Commands

File Commands (example session after the list) :
ls : directory listing
ls -al : formatted listing with hidden files
cd dir : change directory to dir
cd : change to home
pwd : show current directory
mkdir dir : create a directory dir
rm file : delete file
rm -r dir : delete directory dir
rm -f file : force remove file
rm -rf dir : force remove directory dir *
cp file1 file2 : copy file1 to file2
cp -r dir1 dir2 : copy dir1 to dir2; create dir2 if it is not present
mv file1 file2 : rename or move file1 to file2; if file2 is an existing directory, moves file1 into directory file2
ln -s file link : create symbolic link link to file
touch file : create or update file
cat > file : places standard input into file
more file : output the contents of file
head file : output the first 10 lines of file
tail file : output the last 10 lines of file
tail -f file : output the contents of file as it grows, starting with the last 10 lines
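A short example session tying a few of these together (the file and directory names are made up for illustration):

mkdir demo                      # create a directory named demo
cd demo                         # move into it
touch notes.txt                 # create an empty file
cp notes.txt backup.txt         # copy it
mv backup.txt old_notes.txt     # rename the copy
head /etc/passwd                # first 10 lines of a file
tail -f /var/log/syslog         # follow a growing file (Ctrl+C to stop)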

 

Process Management Commands :
ps : display your currently active processes
top : display all running processes
kill pid : kill process id pid
killall proc : kill all processes named proc *
bg : lists stopped or background jobs; resume a stopped job in the background
fg : brings the most recent job to foreground
fg n : brings job n to the foreground

 

SSH Commands (example after the list) :
ssh user@host : connect to host as user
ssh -p port user@host : connect to host on port port as user
ssh-copy-id user@host : add your key to host for user to enable a keyed or passwordless login
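For example, a typical passwordless-login setup might look like this (the user and host are placeholders):

ssh-keygen -t rsa                     # generate a key pair, accept the defaults
ssh-copy-id training@192.168.1.10     # install your public key on the remote host
ssh training@192.168.1.10             # log in; no password prompt this time
ssh -p 2222 training@192.168.1.10     # the same, against a non-default port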

 

Searching Commands (example after the list) :
grep pattern files : search for pattern in files
grep -r pattern dir : search recursively for pattern in dir
command | grep pattern : search for pattern in the output of command
locate file : find all instances of file
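For example (the paths and patterns are illustrative):

grep -r "Exception" /var/log/hadoop/    # search a directory tree for a pattern
ps aux | grep java                      # filter another command's output
locate core-site.xml                    # find a file by name via the locate database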

 

System Info Commands :
date : show the current date and time
cal : show this month’s calendar
uptime : show current uptime
w : display who is online
whoami : who you are logged in as
finger user : display information about user
uname -a : show kernel information
cat /proc/cpuinfo : cpu information
cat /proc/meminfo : memory information
man command : show the manual for command
df : show disk usage
du : show directory space usage
free : show memory and swap usage
whereis app : show possible locations of app
which app : show which app will be run by default

 

Compression Commands (example after the list) :
tar cf file.tar files : create a tar named file.tar containing files
tar xf file.tar : extract the files from file.tar
tar czf file.tar.gz files : create a tar with Gzip compression
tar xzf file.tar.gz : extract a tar using Gzip
tar cjf file.tar.bz2 : create a tar with Bzip2 compression
tar xjf file.tar.bz2 : extract a tar using Bzip2
gzip file : compresses file and renames it to file.gz
gzip -d file.gz : decompresses file.gz back to file
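For example (the file names are illustrative):

tar czf logs.tar.gz /var/log/hadoop/    # create a gzip-compressed archive
tar xzf logs.tar.gz                     # extract it again
gzip purchases.txt                      # produces purchases.txt.gz
gzip -d purchases.txt.gz                # restores purchases.txt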

 

Network Commands :
ping host : ping host and output results
whois domain : get whois information for domain
dig domain : get DNS information for domain
dig -x host : reverse lookup host
wget file : download file
wget -c file : continue a stopped download

 

Installation Commands :
Install from source:
./configure
make
make install
dpkg -i pkg.deb : install a package (Debian)
rpm -Uvh pkg.rpm : install a package (RPM)

 

Shortcuts Commands :
Ctrl+C : halts the current command
Ctrl+Z : stops the current command, resume with fg in the foreground or bg in the background
Ctrl+D : log out of current session, similar to exit
Ctrl+W : erases one word in the current line
Ctrl+U : erases the whole line
Ctrl+R : type to bring up a recent command
!! : repeats the last command
exit : log out of current session

 

Hadoop-AWS Configuration

  • Prerequisites :
  1. AWS account
  2. PuTTY and PuTTYgen (latest version)
  • Go to the given url :

  • Click on Create a Free Account :

 

  • If you don’t have an account, fill in your details; otherwise log in with your existing account

  • After signing in, click on EC2

  • Click on Launch Instance

  • On the left side, select the Free tier only filter

  • Select Operating System

  • Select General purpose t2.micro

  • At the bottom right, click Next : Configure Instance Details

  • Configure the instance

  • After the configuration is done, click Next: Add Storage at the bottom right

  • Set the size to 30 GiB

  • Click Next: Add Tags at the bottom right

  • Add tags according to your requirement

  • Click Next : Configure Security Group

  • Add Rule and Change Source

  • Click on Launch

  • Create or select a key pair and download the .pem file

  • Click On Launch Instances

  • If you are using a Windows machine, open a browser and download the putty.exe and puttygen.exe files

 

  • Open PuTTYgen, click on Load, and open your .pem file in PuTTYgen
  • Click on Save private key and save the private key (.ppk) on your local machine

  • Open PuTTY; in the left pane click on SSH, then Auth, and browse to the .ppk private key you just saved

  • Go to Session and paste your instance’s public DNS in the Host Name field.

  • Save the session and double-click it; you will get a console. (An equivalent AWS CLI launch command is sketched below.)
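If you prefer the command line, an instance of the same shape can be launched with the AWS CLI. A minimal sketch; the AMI ID, key-pair name, security-group ID, and device name below are placeholders and must be replaced with your own values:

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.micro \
    --key-name my-hadoop-key \
    --security-group-ids sg-0123456789abcdef0 \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30}}]'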

Hadoop-Singlenode Using Hadoop 2.x

  • Prerequisites:
  1. Hadoop 2.7.0
  2. Java-8-oracle
  3. Ubuntu 12.0 or above

 

  • Download Hadoop 2.7.0.tar.gz

  • After downloading, untar Hadoop.

  • Get into the hadoop directory

  • Go to the Hadoop configuration directory and configure the files (a minimal sketch of these files follows at the end of this list)

  • Edit core-site.xml

  • Edit hdfs-site.xml

  • Edit mapred-site.xml

  • Edit yarn-site.xml

  • Edit hadoop-env.sh

  • Set JAVA and HADOOP HOME in .bashrc file

  • Now format the namenode

  • Start all daemons

  • Check your daemons
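For reference, a minimal pseudo-distributed configuration for Hadoop 2.7 is sketched below. The install path /usr/local/hadoop and the Java path /usr/lib/jvm/java-8-oracle are assumptions; adjust them to your own layout. Each <property> block goes inside the <configuration> tags of the named file.

core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then format the namenode and start the daemons:
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps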

 

Hadoop-Singlenode Using Hadoop 1.x

  • Prerequisites :
  1. Hadoop 1.x
  2. Java V6
  3. Ubuntu 12.0 or above

Go to the below link and download the image of ubuntu 12.04

–> http://prwatech.in/online-softwares/

  • Open VMware Player, click Open a Virtual Machine, and select the path where you extracted the Ubuntu image. After that, select the .vmx file and click OK.

  • Now you can see the below screen in VMware Player

  • Double click on Ubuntu present in VMware Player. You will get a screen of the below image.

  • Update the repository:

Command: sudo apt-get update

  • Once the update is complete, install Java:

Command: sudo apt-get install openjdk-6-jdk

  • After Java has been installed, check whether Java is installed on your system with the command below:

Command: java -version

  • Install openssh-server:

Command: sudo apt-get install openssh-server

  • Download and extract hadoop:

–>http://prwatech.in/online-softwares/

Command: tar -xvf hadoop-1.2.0.tar.gz

  • Get into the hadoop-1.2.0/conf directory

  • Edit core-site.xml:

Command: sudo gedit core-site.xml

Write under configuration:

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:8020</value>

</property>

  • Edit mapred-site.xml:

Command: sudo gedit mapred-site.xml

Write under configuration:

<property>

<name>mapred.job.tracker</name>

<value>localhost:8021</value>

</property>

  • Edit hdfs-site.xml:

Command: sudo gedit hdfs-site.xml

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

  • Add java_home in hadoop-env.sh file:

Command: sudo gedit hadoop-1.2.0/conf/hadoop-env.sh

  • Uncomment the JAVA_HOME export shown below and set it to your Java path:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386

  • Get your IP Address:

Command: ifconfig

  • Create a ssh key:

Command: ssh-keygen -t rsa

  • Move the key to the authorized keys:

Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

  • Get into your bin directory.

  • Format the name node :

Command: sh hadoop namenode -format

  • Start the nodes:

Command: sh start-all.sh

  • To check that Hadoop started correctly (expected daemons are listed below) :

Command: jps
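On a healthy single-node Hadoop 1.x setup, jps should list roughly the following daemons (the process IDs on the left are illustrative and will differ):

2481 NameNode
2570 DataNode
2675 SecondaryNameNode
2771 JobTracker
2866 TaskTracker
2950 Jps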

 
