Hadoop 2.0 Installation on Ubuntu-part-1



Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu
Creating an Ubuntu VMware Player instance
The first step is to download an Ubuntu image and create an Ubuntu VMware Player instance.
Download the VMware image
Access the following link and download the Ubuntu 12.04 image:

Online Support

Open the image file
Extract the Ubuntu VM image and open it in VMware Player.
Click ‘Open a Virtual Machine’ and select the path where you extracted the image.


Select the ‘.vmx’ file and click ‘ok’.

Click on Play virtual machine.
The Ubuntu home screen appears, as in the following image.

Ubuntu home screen

The user details for the virtual instance are:
Username: user
Password: password

Open the terminal to access the file system.
Open a terminal

Update the OS packages and their dependencies

The first task is to run ‘apt-get update’, which downloads the package lists from the repositories so that apt has up-to-date information on the newest versions of packages and their dependencies.
$ sudo apt-get update

Install Java and the OpenSSH server for Hadoop 2.2.0
Use apt-get to install JDK 6 (OpenJDK) and the OpenSSH server.

$ sudo apt-get install openjdk-6-jdk
$ sudo apt-get install openssh-server
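Before moving on, it can help to confirm that the tools installed above are actually on the PATH. The following is only a minimal sketch: the function name check_commands is made up for this example, and it assumes that sshd (from openssh-server) is reachable on the PATH, which may not hold on every system.

```shell
# Hypothetical helper: report whether each named command is on the PATH.
# Returns 0 when every command is found, 1 otherwise.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# java and javac come from the JDK package; sshd comes from openssh-server.
check_commands java javac ssh sshd || echo "Install the missing packages before continuing."
```

If a command is reported missing, rerun the matching apt-get install step and check again.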


Download the Apache Hadoop 2.2.0 binaries
Download the Hadoop package
Download the binaries to your home directory. Use the default user ‘user’ for the installation.
In live production instances, a dedicated Hadoop user account is normally used for running Hadoop. A dedicated account is not mandatory, but it is recommended because it separates the Hadoop installation from other software applications and user accounts running on the same machine (for security, permissions, backups, etc.).
$ wget http://www.motorlogy.com/apache/hadoop/common/stable/hadoop-2.2.0.tar.gz


Extract the archive and review the package contents and configuration files.
$ tar -xvf hadoop-2.2.0.tar.gz
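To see what the archive will create before (or after) extracting it, you can list its top-level entries. This is a sketch only: the function name tar_top_level is made up for this example, and it assumes the archive was downloaded to the home directory as described above and that GNU tar is in use, which detects the gzip compression automatically when reading.

```shell
# Hypothetical helper: print the distinct top-level entries of a tar
# archive without extracting it.
tar_top_level() {
    tar -tf "$1" | cut -d/ -f1 | sort -u
}

# Assumed location: the archive downloaded to the home directory above.
archive="$HOME/hadoop-2.2.0.tar.gz"
if [ -f "$archive" ]; then
    tar_top_level "$archive"
else
    echo "archive not found: $archive"
fi
```

The later configuration steps refer to $HOME/hadoop-2.2.0, so a single top-level entry with that name is what you would hope to see here.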

Review the Hadoop configuration files.
With the virtual machine created and configured, the Ubuntu instance is now ready for the installation and configuration of an Apache Hadoop 2.2.0 single-node cluster. The following steps describe in detail how to install Apache Hadoop 2.2.0 and configure a single-node Apache Hadoop cluster.
Configure the Apache Hadoop 2.2.0 Single Node Server
This section explains the steps to configure the Single Node Apache Hadoop 2.2.0 Server on Ubuntu.
Update the Configuration files
Update the ‘.bashrc’ file for user ‘user’.
Move to the $HOME directory of ‘user’ and edit the ‘.bashrc’ file.

Update the ‘.bashrc’ file to add the Apache Hadoop environment variables for the user.
a) Change directory to home. $ cd
b) Edit the file

$ gedit .bashrc
Add the lines below to the .bashrc file.
# ---------- Set Hadoop environment variables: begin ----------
# Set Hadoop-related environment variables
export HADOOP_HOME=$HOME/hadoop-2.2.0
export HADOOP_CONF_DIR=$HOME/hadoop-2.2.0/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HOME/hadoop-2.2.0
export HADOOP_HDFS_HOME=$HOME/hadoop-2.2.0
export YARN_HOME=$HOME/hadoop-2.2.0
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HOME/hadoop-2.2.0/bin
# ---------- Set Hadoop environment variables: end ----------
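After saving the file, reload it with ‘. ~/.bashrc’ (or open a new terminal) so the variables take effect. The sketch below is one possible way to check the result; the function names install_root_of and check_dir_vars are made up for this example. The JAVA_HOME value above depends on the machine architecture, so the first helper guesses the JDK directory from wherever javac resolves to; treat its output as a hint, not a definitive value. It assumes a GNU/Linux readlink that supports -f.

```shell
# Hypothetical helper: resolve a command through its symlinks and print the
# directory two levels above it (for .../jdk/bin/javac this is .../jdk).
install_root_of() {
    cmd_path=$(command -v "$1") || return 1
    dirname "$(dirname "$(readlink -f "$cmd_path")")"
}

# Hypothetical helper: check that each named variable is set and points at
# an existing directory. Returns 0 only when every variable passes.
check_dir_vars() {
    rc=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value (directory does not exist)"
            rc=1
        else
            echo "$name=$value"
        fi
    done
    return "$rc"
}

if guess=$(install_root_of javac); then
    echo "javac suggests JAVA_HOME=$guess"
fi

check_dir_vars HADOOP_HOME HADOOP_CONF_DIR HADOOP_MAPRED_HOME \
    HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME JAVA_HOME \
    || echo "Fix the entries above in ~/.bashrc, then run: . ~/.bashrc"
```

If JAVA_HOME is reported as a missing directory, compare it with the directories under /usr/lib/jvm and with the path suggested by the javac check.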