Hadoop 2.0 Installation on Ubuntu-part-2
Source the .bashrc file to apply the Hadoop environment variables without having to open a new shell:
$source ~/.bashrc
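For reference, a typical set of exports added to ~/.bashrc in part 1 of this guide might look like the following (an illustrative sketch; it assumes Hadoop 2.2.0 was extracted into the home directory and OpenJDK 6 is installed, so adjust the paths to your own setup):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
export HADOOP_HOME=$HOME/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
The $HADOOP_HOME and $HADOOP_CONF_DIR variables are used in the steps that follow.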
Setup the Hadoop Cluster
This section describes the detail steps needed for setting up the Hadoop Cluster and configuring the core Hadoop configuration files.
Configure JAVA_HOME
Configure JAVA_HOME in ‘hadoop-env.sh’. This file specifies environment variables that affect the JDK used by Apache Hadoop 2.2.0 daemons started by the Hadoop start-up scripts:
$cd $HADOOP_CONF_DIR
$pwd
You should now be in the hadoop-2.2.0/etc/hadoop/ directory.
$gedit hadoop-env.sh
Update the JAVA_HOME to:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
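To confirm the path is valid before proceeding, you can list the installed JDKs and check the Java version directly (on 64-bit Ubuntu the directory is typically java-6-openjdk-amd64 instead, so use whatever ls reports):
$ls /usr/lib/jvm/
$/usr/lib/jvm/java-6-openjdk-i386/bin/java -version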
Create NameNode and DataNode directory
Create DataNode and NameNode directories to store HDFS data.
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/namenode
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/datanode
Configure the Default File system
The 'core-site.xml' file contains the configuration settings for Apache Hadoop Core, such as I/O settings that are common to HDFS, YARN and MapReduce. Configure the default file system (parameter: fs.default.name) used by clients in core-site.xml:
$gedit core-site.xml
Add the following property inside the <configuration> element:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Here the hostname and port are the machine and port on which the NameNode daemon runs and listens; this also tells the NameNode which IP and port it should bind to. The commonly used port is 9000, and you can specify an IP address rather than a hostname.
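For example, if the NameNode were running on a machine reachable at 192.168.1.100 (an illustrative address, not one used elsewhere in this guide), the value would instead be:
<value>hdfs://192.168.1.100:9000</value>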
Configure the HDFS
This file contains the configuration settings for the HDFS daemons: the NameNode and the DataNodes.
Configure hdfs-site.xml and specify the default block replication, and the NameNode and DataNode directories for HDFS. The actual number of replications can be specified when a file is created; the default is used if no replication factor is specified at create time.
$gedit hdfs-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
Configure YARN framework
This file contains the configuration settings for YARN, in particular the NodeManager.
$gedit yarn-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Configure MapReduce framework
This file contains the configuration settings for MapReduce. Configure mapred-site.xml and specify framework details.
$cp mapred-site.xml.template mapred-site.xml
$gedit mapred-site.xml
Add the following property inside the <configuration> element:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit /etc/hosts file
Run ifconfig in a terminal and note down the IP address. Then put this IP address in the /etc/hosts file, save the file and close it.
$cd
$ifconfig
$sudo gedit /etc/hosts
In this file, the IP address, localhost and the hostname ubuntu are separated by tabs.
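As an illustration, if ifconfig reported 192.168.1.100 (your address will differ) and the machine's hostname is ubuntu, the relevant /etc/hosts line would look like this, with the fields tab-separated:
192.168.1.100    localhost    ubuntu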
Create an SSH key
Generate an RSA key pair with an empty passphrase:
$ssh-keygen -t rsa -P ""
Add the public key to the authorized keys file so that SSH to localhost works without a password:
$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
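To confirm that passwordless SSH works (the Hadoop start-up scripts rely on it), log in to the local machine; it should not prompt for a password:
$ssh localhost
$exit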
Now restart the system.
Start the DFS services
The first step in starting up your Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file systems of your cluster. This is required only the first time you install Hadoop. Do not format a running Hadoop file system, as this will erase all of your data.
To format the file-system, run the command:
$cd
$hadoop namenode -format
You are now all set to start the HDFS and YARN services, i.e. the NameNode, DataNode, ResourceManager and NodeManager, on your Apache Hadoop cluster.
$cd hadoop-2.2.0/sbin/
$./hadoop-daemon.sh start namenode
$./hadoop-daemon.sh start datanode
Start the YARN daemons, i.e. the ResourceManager and NodeManager. Cross-check the service start-up using jps (the JVM Process Status tool).
$./yarn-daemon.sh start resourcemanager
$./yarn-daemon.sh start nodemanager
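A quick way to cross-check is to run jps, which should now list all four daemons (the process IDs below are illustrative and will differ on your machine):
$jps
4856 NameNode
5034 DataNode
5412 ResourceManager
5587 NodeManager
5721 Jps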
Start the History server:
$./mr-jobhistory-daemon.sh start historyserver
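Once the daemons are running, you can also verify the cluster through the built-in web interfaces; in Hadoop 2.2.0 the NameNode UI listens on port 50070, the ResourceManager on port 8088 and the JobHistory server on port 19888 by default:
http://localhost:50070 (NameNode)
http://localhost:8088 (ResourceManager)
http://localhost:19888 (JobHistory server)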