Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu
Creating an Ubuntu VMware Player instance
The first step is to download an Ubuntu image and create an Ubuntu VMware Player instance.
Download the VMware image
Access the following link and download the Ubuntu 12.04 image:
Open the image file
Extract the Ubuntu VM image and open it in VMware Player.
Click ‘Open a Virtual Machine’ and select the path where you extracted the image.
Select the ‘.vmx’ file and click ‘OK’.
Click ‘Play virtual machine’.
Once the virtual machine has booted, you will see the home screen shown in the following image.
Ubuntu home screen
The user details for the virtual instance are:
Open the terminal to access the file system.
Open a terminal
Update the OS packages and their dependencies
The first task is to run ‘apt-get update’, which downloads the package lists from the repositories so that the system knows the newest available versions of packages and their dependencies.
$ sudo apt-get update
Install Java and the OpenSSH server for Hadoop 2.2.0
Use apt-get to install the JDK (the command below installs OpenJDK 6) and the OpenSSH server.
$ sudo apt-get install openjdk-6-jdk
$ sudo apt-get install openssh-server
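Before moving on, it can help to confirm that both packages actually landed on the PATH. The following is a minimal sketch, not part of the original procedure: the helper name check_tool is hypothetical, and it assumes that the OpenSSH server binary is called sshd and that /usr/sbin is on the PATH, as is usual on Ubuntu.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# java comes from the JDK package, sshd from openssh-server (assumed names).
check_tool java || echo "Install the JDK with apt-get first."
check_tool sshd || echo "Install openssh-server with apt-get first."
```

If either tool is reported as missing, rerun the matching apt-get install command above and check its output for errors.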
Download the Apache Hadoop 2.2.0 binaries
Download the Hadoop package
Download the binaries to your home directory. Use the default user ‘user’ for the installation.
In live production instances, a dedicated Hadoop user account is used for running Hadoop. A dedicated account is not mandatory, but it is recommended because it separates the Hadoop installation from other software applications and user accounts running on the same machine (for security, permissions, backups, etc.).
Extract the archive and review the package content and configuration files.
$ tar -xvf hadoop-2.2.0.tar.gz
Review the Hadoop configuration files.
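One way to review the configuration files is to list the etc/hadoop directory inside the extracted package, which is where Hadoop 2.x keeps files such as core-site.xml, hdfs-site.xml and yarn-site.xml. The sketch below assumes the archive was extracted in the home directory, so that the package lives in $HOME/hadoop-2.2.0; the variable HADOOP_DIR and the helper list_hadoop_conf are hypothetical names used only for illustration.

```shell
# Assumed location of the extracted package; adjust if you extracted elsewhere.
HADOOP_DIR="${HADOOP_DIR:-$HOME/hadoop-2.2.0}"

# Hypothetical helper: list the configuration files of a Hadoop 2.x directory.
list_hadoop_conf() {
    conf_dir="$1/etc/hadoop"
    if [ -d "$conf_dir" ]; then
        ls "$conf_dir"
    else
        echo "no configuration directory at $conf_dir" >&2
        return 1
    fi
}

list_hadoop_conf "$HADOOP_DIR" || echo "Check that the archive was extracted."
```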
With the virtual machine created and the prerequisites installed, the Ubuntu instance is now ready for the installation and configuration of an Apache Hadoop 2.2.0 Single Node Cluster. The rest of this guide describes in detail the steps to install Apache Hadoop 2.2.0 and configure a Single Node Apache Hadoop cluster.
Configure the Apache Hadoop 2.2.0 Single Node Server
This section explains the steps to configure the Single Node Apache Hadoop 2.2.0 Server on Ubuntu.
Update the Configuration files
Update the ‘.bashrc’ file for user ‘user’.
Move to the $HOME directory of ‘user’ and edit the ‘.bashrc’ file to add the important Apache Hadoop environment variables for that user.
a) Change to the home directory.
$ cd
b) Edit the file. It belongs to the current user, so sudo is not needed.
$ gedit .bashrc
Add the lines below to the .bashrc file.
# ---- Set Hadoop environment variables: Begin ----
# Set Hadoop-related environment variables
# Set JAVA_HOME
# Add Hadoop bin/ directory to PATH
# ---- Set Hadoop environment variables: End ----
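The comments above name three settings without showing their values. As a minimal sketch of what such lines could look like, and not the original listing, the block below uses assumed paths: HADOOP_HOME assumes the archive was extracted in the home directory, and JAVA_HOME assumes the 64-bit OpenJDK 6 location on Ubuntu. Both must be adjusted to match your machine.

```shell
# Assumption: the tarball was extracted in the home directory.
export HADOOP_HOME="$HOME/hadoop-2.2.0"

# Assumption: 64-bit OpenJDK 6 on Ubuntu. Check the real location with:
#   readlink -f "$(command -v java)"
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64"

# Make the hadoop command available from any directory.
export PATH="$PATH:$HADOOP_HOME/bin"
```

After saving the file, run `. ~/.bashrc` (or open a new terminal) so that the current shell picks up the new variables.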