Hadoop 2.0 Installation on Ubuntu-part-2
Source the .bashrc file to set the Hadoop environment variables without having to invoke a new shell:
$source .bashrc
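To confirm the variables from part 1 were picked up, you can print them after sourcing; a minimal sketch (HADOOP_HOME, HADOOP_CONF_DIR and JAVA_HOME are the variable names used in this series):

```shell
# Source .bashrc if present, then report each expected Hadoop variable.
[ -f "$HOME/.bashrc" ] && . "$HOME/.bashrc" || true
for v in HADOOP_HOME HADOOP_CONF_DIR JAVA_HOME; do
  eval "val=\$$v"              # indirect lookup of the variable named in $v
  if [ -n "$val" ]; then
    echo "$v = $val"
  else
    echo "$v is NOT set - check your .bashrc"
  fi
done
```

If any variable is reported as not set, re-check the export lines added in part 1 before continuing.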
Setup the Hadoop Cluster
This section describes the detail steps needed for setting up the Hadoop Cluster and configuring the core Hadoop configuration files.
Configure JAVA_HOME
Configure JAVA_HOME in "hadoop-env.sh". This file specifies environment variables that affect the JDK used by the Apache Hadoop 2.2.0 daemons started by the Hadoop start-up scripts:
$cd $HADOOP_CONF_DIR
$pwd
Now you should be in hadoop-2.2.0/etc/hadoop/ directory.
$gedit hadoop-env.sh
Update the JAVA_HOME to:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
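A quick way to sanity-check this path before starting any daemons; a sketch (the directory below is the 32-bit OpenJDK 6 location used in this guide, and may differ on your machine):

```shell
# Verify that JAVA_HOME actually contains a java binary.
# Adjust the path to match your installed JDK (e.g. java-6-openjdk-amd64 on 64-bit).
JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
if [ -x "$JAVA_HOME/bin/java" ]; then
  result="OK: $JAVA_HOME contains bin/java"
else
  result="WARNING: no java binary under $JAVA_HOME"
fi
echo "$result"
```

A warning here means the Hadoop daemons will fail to start later with a "JAVA_HOME is not set" error, so fix the path first.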
Create NameNode and DataNode directory
Create DataNode and NameNode directories to store HDFS data.
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/namenode
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/datanode
Configure the Default File system
The "core-site.xml" file contains the configuration settings for Apache Hadoop Core, such as I/O settings that are common to HDFS, YARN and MapReduce. Configure the default file system (parameter: fs.default.name) used by clients in core-site.xml:
$gedit core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Here the hostname and port are the machine and port on which the NameNode daemon runs and listens; this also tells the NameNode which IP and port it should bind to. The commonly used port is 9000, and you can specify an IP address rather than a hostname.
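For example, on a multi-node cluster the value could point at the NameNode machine instead of localhost; a sketch (the hostname "master" and the IP below are placeholders, not values from this guide):

```
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>   <!-- or e.g. hdfs://192.168.1.100:9000 -->
</property>
```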
Configure the HDFS
This file contains the configuration settings for the HDFS daemons: the NameNode and the DataNodes.
Configure hdfs-site.xml and specify the default block replication, and the NameNode and DataNode directories for HDFS. The actual number of replications can be specified when a file is created; the default is used if replication is not specified at creation time.
$gedit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
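Replication can also be chosen per file once the cluster is running; a sketch, assuming HDFS is up and using illustrative paths and a factor of 2 (not values from this guide):

```
$hadoop fs -D dfs.replication=2 -put sample.txt /data/sample.txt
$hadoop fs -setrep 2 /data/sample.txt
```

The first command sets the replication factor at creation time; the second changes it for an existing file.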

Configure YARN framework
This file contains the configuration settings for the YARN daemons, in particular the NodeManager.
$gedit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Configure MapReduce framework
This file contains the configuration settings for MapReduce. Configure mapred-site.xml and specify framework details.
$cp mapred-site.xml.template mapred-site.xml
$gedit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


Edit /etc/hosts file
Run ifconfig in the terminal and note down the IP address. Then add this IP address to the /etc/hosts file, save the file, and close it.
$cd
$ifconfig

$sudo gedit /etc/hosts
In this file, the IP address, localhost and ubuntu are separated by tabs.
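A sketch of how the file might look afterwards (192.168.1.100 is a placeholder; use the address that ifconfig reported, and your own hostname in place of ubuntu):

```
127.0.0.1	localhost
192.168.1.100	ubuntu
```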

Creating SSH keys
$ssh-keygen -t rsa -P ""

Append the public key to authorized_keys:
$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
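You can verify that passwordless login works before moving on; a sketch (the sshd service must be installed and running for this to succeed):

```
$chmod 600 $HOME/.ssh/authorized_keys
$ssh localhost
$exit
```

The ssh command should log you in without prompting for a password; if it still prompts, re-check the permissions on ~/.ssh and authorized_keys.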

Restart the system.
Start the DFS services
The first step in starting up your Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file systems of your cluster. This is required only the first time you install Hadoop. Do not format a running Hadoop file system; this will erase all your data.
To format the file-system, run the command:
$cd
$hadoop namenode -format

You are now all set to start the Hadoop services, i.e. the NameNode, DataNodes, ResourceManager and NodeManagers, on your Apache Hadoop cluster.
$cd hadoop-2.2.0/sbin/
$./hadoop-daemon.sh start namenode
$./hadoop-daemon.sh start datanode
Start the YARN daemons, i.e. the ResourceManager and NodeManager. Cross-check the service start-up using jps (the JVM Process Status tool).
$./yarn-daemon.sh start resourcemanager
$./yarn-daemon.sh start nodemanager
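Once all four daemons are up, jps should list something like the following (the process IDs are illustrative and will differ on your machine):

```
$jps
4856 NameNode
5034 DataNode
5389 ResourceManager
5612 NodeManager
5801 Jps
```

If any daemon is missing from the list, check its log file under hadoop-2.2.0/logs/ for the reason.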
Start the History server:
$./mr-jobhistory-daemon.sh start historyserver
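You can also confirm the daemons through their web interfaces; a sketch using curl (the ports below are the Hadoop 2.2.0 defaults):

```
$curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070   # NameNode web UI
$curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088    # ResourceManager web UI
$curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19888   # JobHistory server web UI
```

Each command should print 200 when the corresponding daemon is running; you can also open the same URLs in a browser.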
