Hadoop 2.0 Installation on Ubuntu – Part 2

Source the .bashrc file to set the Hadoop environment variables without having to invoke a new shell:
$source ~/.bashrc
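To confirm that the variables are now visible in the current shell (names as set in Part 1 of this guide), you can echo them:
$echo $HADOOP_HOME
$echo $HADOOP_CONF_DIR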

Set up the Hadoop Cluster

This section describes the detailed steps needed to set up the Hadoop cluster and configure the core Hadoop configuration files.
Configure JAVA_HOME
Configure JAVA_HOME in ‘hadoop-env.sh’. This file specifies environment variables that affect the JDK used by Apache Hadoop 2.2.0 daemons started by the Hadoop start-up scripts:
$cd $HADOOP_CONF_DIR
$pwd
Now you should be in the hadoop-2.2.0/etc/hadoop/ directory.
$gedit hadoop-env.sh
Update the JAVA_HOME to:
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
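To verify that this path actually points at a JDK on your machine (the exact directory varies with the Java version and architecture), you can check for the java binary and print its version:
$ls /usr/lib/jvm/java-6-openjdk-i386/bin/java
$/usr/lib/jvm/java-6-openjdk-i386/bin/java -version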


Create NameNode and DataNode directory
Create DataNode and NameNode directories to store HDFS data.
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/namenode
$mkdir -p $HADOOP_HOME/hadoop2_data/hdfs/datanode
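A quick recursive listing confirms that both directories exist before they are referenced in hdfs-site.xml:
$ls -R $HADOOP_HOME/hadoop2_data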

Configure the Default File system
The 'core-site.xml' file contains the configuration settings for Apache Hadoop Core, such as the I/O settings that are common to HDFS, YARN and MapReduce. Configure the default file system (parameter: fs.default.name) used by clients in core-site.xml.
$gedit core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>


Here, the hostname and port are the machine and port on which the NameNode daemon runs and listens. The setting also tells the NameNode which IP and port to bind to. The commonly used port is 9000, and you can specify an IP address rather than a hostname.
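For example, assuming the machine's address were 192.168.1.5 (an illustrative value; use the address reported by ifconfig on your system), the value could be written as:
<value>hdfs://192.168.1.5:9000</value>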

Configure the HDFS

This file contains the configuration settings for the HDFS daemons: the NameNode and the DataNodes.
Configure hdfs-site.xml and specify the default block replication, and the NameNode and DataNode directories for HDFS. The actual number of replications can also be specified when a file is created; the default is used if replication is not specified at create time.

$gedit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/user/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>

Configure YARN framework

This file contains the configuration settings for the YARN daemons, in particular the NodeManager's auxiliary shuffle service. Note that the auxiliary service name (mapreduce_shuffle) must match the name embedded in the class property key below.
$gedit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Configure MapReduce framework

This file contains the configuration settings for MapReduce. Copy mapred-site.xml from the supplied template, then specify YARN as the framework.
$cp mapred-site.xml.template mapred-site.xml

$gedit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Edit /etc/hosts file

Run ifconfig in the terminal and note down the IP address. Then put this IP address in the /etc/hosts file as shown below, save the file, and close it.
$cd
$ifconfig

$sudo gedit /etc/hosts

In this file, the IP address, localhost, and the hostname (ubuntu) are separated by tabs.
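A minimal sketch of the resulting /etc/hosts, assuming the hostname is ubuntu and ifconfig reported 192.168.1.5 (both illustrative; substitute your own values):
127.0.0.1	localhost
192.168.1.5	ubuntu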

Creating the SSH key
$ssh-keygen -t rsa -P ""

Append the public key to the authorized keys:

$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
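To verify that passwordless SSH works (the Hadoop start-up scripts depend on it), log in to localhost; it should not ask for a password:
$ssh localhost
$exit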

Now restart the system.

Start the DFS services

The first step in starting up your Hadoop installation is formatting the Hadoop file system, which is implemented on top of the local file systems of your cluster. This is required the first time you install Hadoop. Do not format a running Hadoop file system; this will erase all your data.
To format the file-system, run the command:
$cd
$hadoop namenode -format
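Note that the 'hadoop namenode' script is deprecated in Hadoop 2.x; the equivalent command through the hdfs script is:
$hdfs namenode -format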

You are now all set to start the Hadoop services, i.e. the NameNode, DataNode, ResourceManager, and NodeManager, on your Apache Hadoop cluster.
$cd hadoop-2.2.0/sbin/
$./hadoop-daemon.sh start namenode
$./hadoop-daemon.sh start datanode

Start the YARN daemons, i.e. the ResourceManager and NodeManager. Cross-check the service start-up using jps (the JVM Process Status tool).
$./yarn-daemon.sh start resourcemanager
$./yarn-daemon.sh start nodemanager
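With all four daemons running, jps should list something like the following (the process IDs are illustrative and will differ on your machine):
$jps
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode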

Start the History server:
$./mr-jobhistory-daemon.sh start historyserver
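You can also cross-check the daemons from a browser through their default web interfaces (the ports shown are the Hadoop 2.2.0 defaults):
NameNode: http://localhost:50070
ResourceManager: http://localhost:8088
JobHistory Server: http://localhost:19888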
