
What is Hadoop 3.0

 

What is Hadoop 3.0?

Welcome to the world of the best Hadoop 3.0 tutorial. In this tutorial, one can easily learn what Hadoop 3.0 is, what is new in Hadoop 3.0, and which new features of Hadoop 3.0 are available and used by most Hadoop developers. Are you also dreaming of becoming a certified Pro Developer? Then stop just dreaming and get your Hadoop certification course from India's Leading Hadoop Training institute.

So follow the below-mentioned Hadoop 3.0 tutorial from Prwatech and learn the Hadoop Course like a pro from today itself under 15+ Years of Hands-on Experienced Professionals.

 


 

What is Hadoop 3.0?

The Apache Hadoop community has released a new version of Hadoop, called Hadoop 3.0. Its alpha and beta releases were made available so that downstream applications and end-users could try the platform and provide feedback, which was incorporated before the final release. Thousands of new fixes and improvements have been included in this release compared with the earlier minor release, 2.7.0. This blog will provide you with information about the new release of Hadoop and its features.

 

A number of significant enhancements have been incorporated in the new Hadoop version. The features are listed below, and they have proven to be very advantageous for Hadoop users. The Apache site has full information about the changes and enhancements made in the new version; you can refer to it for a closer look at those changes, while an overview of what is offered to Hadoop users is provided here.

 

New Features of Hadoop 3.0

Supports Java 8.

Supports HDFS Erasure Code.

Supports more than 2 NameNodes:- By running multiple standby NameNodes, the architecture is able to tolerate the failure of more than one NameNode in the system.

Support for Microsoft Azure Data Lake:- Hadoop now supports integration with Microsoft Azure Data Lake.

Scalability:- YARN Timeline Service v.2 chooses Apache HBase as the primary backing storage.

Shaded client jars:- New hadoop-client-api and hadoop-client-runtime artifacts shade Hadoop’s dependencies into a single jar. This avoids leaking Hadoop’s dependencies onto the application’s classpath.

MapReduce task-level native optimization:- MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.

Support for Erasure Coding in HDFS:- Considering the rapid growth trends in data and datacentre hardware, support for erasure coding in Hadoop 3.0 is an important feature for years to come. Erasure coding is a decades-old technique that lets a lost piece of data be reconstructed from parity information stored alongside the remaining pieces. It works much like an advanced RAID scheme that rebuilds data automatically when a disk fails, while using far less storage overhead than plain 3x replication (a brief command sketch follows the feature list below).

Support for Opportunistic Containers and Distributed Scheduling:- A notion of ExecutionType has been introduced, whereby Applications can now request for containers with an execution type of Opportunistic. Containers of this type can be dispatched for execution at an NM even if there are no resources available at the moment of scheduling.

Opportunistic containers are by default allocated by the central RM, but support has also been added to allow opportunistic containers to be allocated by a distributed scheduler which is implemented as an AMRMProtocol interceptor.
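To give a feel for how the erasure coding feature described above is used in practice, here is a small command sketch, assuming a running Hadoop 3.x cluster. The directory /data/cold is only an example path, and RS-6-3-1024k is one of the built-in policy names; substitute your own path and whichever policy your cluster supports.

Command: hdfs ec -listPolicies                                        (list the erasure coding policies the cluster knows about)
Command: hdfs ec -enablePolicy -policy RS-6-3-1024k                   (enable a built-in policy)
Command: hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k     (apply it to a directory)
Command: hdfs ec -getPolicy -path /data/cold                          (confirm which policy the directory uses)

New files written under that directory are then stored with erasure coding instead of 3x replication.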

 

Default port number changes

NameNode :– 9870
ResourceManager :– 8088
MapReduce JobHistory server:– 19888
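As a quick sanity check after an upgrade, the NameNode web UI should answer on its new default port. The hostname below is a placeholder for your own NameNode host:

Command: curl http://namenode-host:9870/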

 

In this Hadoop 3.0 tutorial, we have covered what Hadoop 3.0 is, what is new in the Hadoop 3.0 version, and the new features of the Hadoop 3.0 version. Take your Hadoop interest to the next level. Become a certified expert in Hadoop technology by getting enrolled with Prwatech E-learning, India's leading advanced Hadoop training institute in Bangalore. Register Now for more updates on Hadoop 3.0 technology. Our expert trainers will help you towards mastering real-world skills in relation to these Hadoop technologies.

 

 


Hadoop-Multinode Cluster setup

 

Hadoop multinode cluster setup on ubuntu, Vmware and windows

Hadoop multinode cluster setup: in this tutorial one can easily learn about the Hadoop multi-node cluster configuration that is available and used by most Hadoop developers. Are you dreaming of becoming a certified Pro Hadoop Developer? Then stop just dreaming and get your Hadoop certification course from India's Leading Hadoop Training institute in Bangalore.

 

In this tutorial, we will learn how to install a Hadoop multi-node cluster on Ubuntu and VMware. We will go through the various steps for the Hadoop multi-node cluster configuration on Ubuntu, starting with the platform requirements for the Hadoop multi-node cluster setup, the prerequisites and software required for installing Hadoop, and how to bring up the Hadoop multi-node cluster on the master node and the slave nodes. Do you want to set up a Hadoop multi-node cluster? Then follow the below-mentioned Hadoop multi-node cluster tutorial from Prwatech and learn the Hadoop course like a pro from today itself under 15+ Years of Hands-on Experienced Professionals.

 

Hadoop multi-node cluster setup

 

Prerequisites

 

  1. VMware
  2. Ubuntu image 12.04
  3. Hadoop 1.x

 


 

Hadoop multi-node cluster configuration

 


ON MASTER NODE

● Command: sudo gedit masters (to create masters)

 


 

On the master node, the masters file contains the IP address of the master only

● Command: sudo gedit masters

On the master node, the slaves file contains the IP addresses of all the slave nodes

● Command: sudo gedit slaves
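As an illustration of what these two files on the master node might end up containing for a cluster with one master and two slaves (the IP addresses below are placeholders, not values from this setup; use your own machines' addresses):

masters file:
192.168.1.100

slaves file:
192.168.1.101
192.168.1.102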

 

ON SLAVE NODE :

 


 

On slave nodes, the masters file remains blank (for all the slave nodes)

● Command: sudo gedit masters

On slave nodes, the slaves file contains the slave's own IP address

● Command: sudo gedit slaves

 

 

  • Now edit the /etc/hosts file on both the master and the slave nodes
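A minimal sketch of what the /etc/hosts entries could look like on every node, again with placeholder IP addresses and hostnames; replace them with the real addresses of your machines:

● Command: sudo gedit /etc/hosts

192.168.1.100   master
192.168.1.101   slave1
192.168.1.102   slave2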

 

 


 

Become a certified expert in Hadoop technology by getting enrolled from Prwatech E-learning India’s leading advanced Hadoop training institute in Bangalore.

 


Hadoop Basic HDFS Commands

 

Hadoop Basic HDFS Commands

 

Hadoop HDFS Commands, welcome to the world of Hadoop HDFS basic commands. Are you the one who is looking forward to knowing the Apache Hadoop HDFS command list that comes with Hadoop technology? Or the one who is very keen to explore the list of all the HDFS commands in Hadoop with examples? Then you've landed on the right path, which provides the standard and basic Hadoop HDFS commands.

If you are keen to learn the technology, then take the advanced certification course from the best Hadoop training institute, which can guide you through the course from level 0 to the advanced level. So don't just dream of becoming a certified Pro Developer; achieve it by choosing the best world-class Hadoop Training institute with world-class trainers.

We at Prwatech have listed some of the top Hadoop HDFS commands that every Hadoop developer should know about. So follow the below-mentioned Hadoop basic HDFS commands and learn the advanced Hadoop course from the best Hadoop trainers like a pro.

 

Open a terminal window to the current working directory.

==> /home/training

Print the Hadoop version

⇒ hadoop version

List the contents of the root directory in HDFS

⇒  hadoop fs -ls /

Count the number of directories, files, and bytes under the paths

⇒  hadoop fs -count hdfs:/

Run a DFS filesystem checking utility

⇒  hadoop fsck /

Run a cluster balancing utility

⇒  hadoop balancer

Create a new directory named "Hadoop" below the /user/training directory in HDFS.

⇒  hadoop fs -mkdir /user/training/Hadoop

 

Add a sample text file from the local directory named "data" to the new directory you created in HDFS during the previous step

⇒  hadoop fs -put data/sample.txt /user/training/Hadoop

 

List the contents of this new directory in HDFS

⇒  hadoop fs -ls /user/training/Hadoop

Add the entire local directory called "retail" to the /user/training/Hadoop directory in HDFS

⇒  hadoop fs -put data/retail /user/training/Hadoop

 

Since /user/training is your home directory in HDFS, any command that does not have an absolute path is interpreted as relative to that directory. The next command will therefore list your home directory, and should show the items you've just added there

⇒  hadoop fs -ls

 

Delete the file 'customers' from the "retail" directory

⇒  hadoop fs -rm Hadoop/retail/customers

Ensure this file is no longer in HDFS

⇒  hadoop fs -ls Hadoop/retail/customers

Delete all files from the "retail" directory using a wildcard

⇒  hadoop fs -rm Hadoop/retail/*

 

To empty the trash

⇒  hadoop fs -expunge

Finally, remove the entire retail directory and all of its contents in HDFS

⇒  hadoop fs -rm -r Hadoop/retail

 

List the Hadoop directory again

⇒  hadoop fs -ls Hadoop

Add the purchases.txt file from the local directory named "/home/training/" to the Hadoop directory you created in HDFS

⇒  hadoop fs -copyFromLocal /home/training/purchases.txt Hadoop/

 

To view the contents of your text file purchases.txt, which is present in your Hadoop directory

⇒  hadoop fs -cat Hadoop/purchases.txt

Copy the purchases.txt file from the "Hadoop" directory in HDFS to the directory "data" in your local file system

⇒  hadoop fs -copyToLocal Hadoop/purchases.txt /home/training/data

 

cp is used to copy files between directories present in HDFS

⇒  hadoop fs -cp /user/training/*.txt /user/training/Hadoop

The '-get' command can be used as an alternative to the '-copyToLocal' command

⇒  hadoop fs -get Hadoop/sample.txt /home/training/

Display the last kilobyte of the file "purchases.txt" to stdout

⇒  hadoop fs -tail Hadoop/purchases.txt

 

New files in HDFS get their permissions from a base mode of 666 combined with the configured umask (022 by default, which yields 644)

Use the '-chmod' command to change the permissions of a file
⇒  hadoop fs -ls Hadoop/purchases.txt

sudo -u hdfs hadoop fs -chmod 600 Hadoop/purchases.txt

 

The default names of the owner and group are training, training. Use '-chown' to change the owner name and group name simultaneously

⇒ hadoop fs -ls Hadoop/purchases.txt

sudo -u hdfs hadoop fs -chown root:root Hadoop/purchases.txt

 

The default name of the group is training. Use the '-chgrp' command to change the group name

⇒  hadoop fs -ls Hadoop/purchases.txt

sudo -u hdfs hadoop fs -chgrp training Hadoop/purchases.txt

 

Move a directory from one location to another

⇒ hadoop fs -mv Hadoop apache_hadoop

The default replication factor for a file is 3. Use the '-setrep' command to change the replication factor of a file

⇒  hadoop fs -setrep -w 2 apache_hadoop/sample.txt

 

Command to make the NameNode leave safe mode

⇒  sudo -u hdfs hdfs dfsadmin -safemode leave
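Before or after leaving safe mode, you can check the current safe mode status with the standard dfsadmin option:

⇒  sudo -u hdfs hdfs dfsadmin -safemode get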

 

List all the Hadoop file system shell commands

⇒  hadoop fs

Get detailed help on the file system shell commands

⇒  hadoop fs -help

We hope you like these Prwatech Hadoop basic HDFS commands. Get advanced certification from the world-class trainers of the best Hadoop Training Institute.

 

  • OPEN TERMINAL AND GO TO HBASE SHELL :

cloudera@cloudera-vm:~$ hbase shell

  • CHECK WHAT TABLES EXIST IN THE SYSTEM :

hbase(main):001:0> list
TABLE

 

  • CREATE TABLE :

hbase(main):002:0> create 'batch', 'details'

  • ENTER DATA INTO THE TABLE :

hbase(main):003:0> put 'batch', 'row1', 'details:name', 'Rhiddhiman'

hbase(main):004:0> put 'batch', 'row2', 'details:name', 'Rohit'

hbase(main):005:0> put 'batch', 'row3', 'details:name', 'Dipankar'

hbase(main):006:0> put 'batch', 'row4', 'details:name', 'Kalyan'

 

  • CHECK DATA ENTERED IN THE TABLE :

hbase(main):007:0> scan 'batch'

  • CHANGE VALUE OF A PARTICULAR COLUMN IN A ROW :

hbase(main):008:0> put 'batch', 'row2', 'details:name', 'Jayanta'

  • CHECK DATA AFTER MODIFICATION :

hbase(main):009:0> scan 'batch'

 

  • CHANGE VALUE OF A PARTICULAR COLUMN IN A ROW :

hbase(main):010:0> put 'batch', 'row3', 'details:name', 'Dhrubajyoti'

  • CHECK DATA AFTER MODIFICATION :

hbase(main):011:0> scan 'batch'

  • CHECK VALUE THAT HAS BEEN CHANGED :

hbase(main):012:0> get 'batch', 'row2', {COLUMN=>'details:name', VERSIONS=>2}

hbase(main):013:0> put 'batch', 'row3', 'details:name', 'Banajit'

hbase(main):014:0> get 'batch', 'row3', {COLUMN=>'details:name', VERSIONS=>3}
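A note on the VERSIONS option used above: a get with VERSIONS only returns older values if the column family actually retains multiple versions. If your 'details' family keeps just one version (the default on newer HBase releases), you can raise it from the same shell; the commands below are a sketch of that adjustment, and the value 5 is arbitrary:

disable 'batch'
alter 'batch', {NAME => 'details', VERSIONS => 5}
enable 'batch'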

 

  • ENTER REMAINING DATA :

hbase(main):015:0> put 'batch', 'row1', 'details:address', 'Marathahalli'

hbase(main):016:0> put 'batch', 'row1', 'details:age', '27'

hbase(main):017:0> put 'batch', 'row1', 'details:course', 'Hadoop'

hbase(main):018:0> put 'batch', 'row2', 'details:address', 'BTM'

hbase(main):019:0> put 'batch', 'row3', 'details:address', 'Whitefield'

hbase(main):020:0> put 'batch', 'row4', 'details:address', 'Electronics City'

  • CHECK DATA :

hbase(main):021:0> scan 'batch'

  • DESCRIPTION OF TABLE :

hbase(main):031:0> describe 'batch'

 


Hadoop Cluster on Google Cloud Platform (GCP)

 

Hadoop Cluster on Google Cloud Platform (GCP)

Hadoop Cluster on Google Cloud Platform (GCP): welcome to the world of advanced tutorials on Hadoop. Are you looking forward to creating a Hadoop cluster on Google Cloud Platform? Or looking for some help on how to set up Hadoop on GCP (Google Cloud Platform)? Then you've landed on the right path, which provides advanced tutorial-based concepts on the Hadoop cluster. In this tutorial, one can easily explore how to set up Hadoop on GCP (Google Cloud Platform) with a step-by-step explanation.

 

There are many possible ways to create a Hadoop cluster on the GCP platform; just follow the below-mentioned step-by-step process of the How to Setup Hadoop on GCP (Google Cloud Platform) tutorial, which was originally designed by India's Leading Big Data Training institute professionals, who also offer advanced Hadoop Course and Certification Programs.

 

Prerequisites

1. GCP Account

2. First create a GCP account if you don't already have one

3. It's free, and Google gives you $300 in credits (roughly ₹21,000)

♦ Now Open Google Cloud Platform

♦ Open Console

♦ To create an instance, go to Bigtable from the top bar

♦ Enter your instance name and choose the instance type (follow the below-given configuration)

♦ Select region and zone

♦ Click on 'Create' to create an instance

♦ An instance will be created as per your configuration

 

 

 

♦ Now tap on the console button to open up the terminal

♦ The terminal will be displayed

Now check your system configuration

Command : cat /etc/os-release

Cluster Instance

♦ Go to Dataproc and click on Cluster

 

Now create the cluster

♦ Name Your Cluster and Select Region

Configure Master and Slave Nodes

 

1. Master Node > Machine Type: 4 CPUs, Primary Disk Size: 32 GB

2. Worker Nodes > Machine Type: 1 CPU, Primary Disk Size: 10 GB
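If you prefer the command line over the console, roughly the same Dataproc cluster can be requested with the gcloud CLI. This is only a sketch: the cluster name, region and machine types below are assumptions chosen to mirror the sizes above, and your project may enforce different minimum disk sizes, so adjust as needed.

gcloud dataproc clusters create my-hadoop-cluster \
    --region us-central1 \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 32GB \
    --num-workers 2 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-size 10GB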

 

♦ Click on Advanced if you want to add any bucket or select an ISO

Click on Create

♦ It will take a few seconds. Then you can see the status of the cluster as Active and Running

 

♦ Click on Cluster > VM Instances

♦ Click on SSH

♦ It will start the terminal

♦ Using Command : cat /etc/os-release

You can see all the details of the Operating System

♦ Use the command: sudo jps

To check all the running Hadoop daemons on the node


Hadoop Basic Linux Commands

 

Hadoop Basic Linux Commands

 
Welcome to the world of the best Linux commands used in Hadoop. In this tutorial, one can easily learn a list of all the top-rated Hadoop basic Linux commands that are available and used by most Hadoop developers. Are you also dreaming of becoming a certified Pro Developer? Then stop just dreaming and get your Hadoop certification course from India's Leading Big Data Training institute.

 
So follow the below mentioned basic Linux commands for Hadoop from Prwatech and learn Hadoop Course like a pro from today itself under 15+ Years of Hands-on Experienced Professionals.

 

 

Basic Linux commands used in Hadoop

 

ls ⇒ directory listing

ls -al ⇒ formatted listing with hidden files

cd dir ⇒ change directory to dir

cd ⇒ change to home directory

pwd ⇒ show current directory

mkdir dir ⇒ create a directory dir

rm file ⇒ delete the file

rm -r dir ⇒ delete directory dir

rm -f file ⇒ force remove the file

rm -rf dir ⇒ force remove directory dir *

cp file1 file2 ⇒ copy file1 to file2

cp -r dir1 dir2 ⇒ copy dir1 to dir2; create dir2 if it is not present

mv file1 file2 ⇒ rename or move file1 to file2; if file2 is an existing directory, moves file1 into directory file2

ln -s file link ⇒ create a symbolic link to file

touch file ⇒ create or update file

cat > file ⇒ places standard input into the file

more file ⇒ output the contents of the file

head file ⇒ output the first 10 lines of the file

tail file ⇒ output the last 10 lines of the file

tail -f file ⇒ output the contents of the file as it grows, starting with the last 10 lines

Permission Commands:

 

chmod ⇒ modify file access rights

su ⇒ temporarily become the superuser

sudo ⇒ run a command as the superuser

chown ⇒ change file ownership

chgrp ⇒ change a file’s group ownership

System Info Commands :

 

date ⇒ show the current date and time

cal ⇒ show this month’s calendar

uptime ⇒ show current uptime

w ⇒ display who is online

whoami ⇒ who you are logged in as

finger user ⇒ display information about user

uname -a ⇒ show kernel information

cat /proc/cpuinfo ⇒ CPU information

cat /proc/meminfo ⇒ memory information

man command ⇒ show the manual for command

df ⇒ show disk usage

du ⇒ show directory space usage

free ⇒ show memory and swap usage

whereis app ⇒ show possible locations of app

which app ⇒ show which app will be run by default

 

Process Management Commands:

 

ps ⇒ display your currently active processes

top ⇒ display all running processes

kill pid ⇒ kill process id pid

killall proc ⇒ kill all processes named proc *

bg ⇒ list stopped or background jobs; resume a stopped job in the background

fg ⇒ bring the most recent job to the foreground

fg n ⇒ bring job n to the foreground

 

SSH Commands

 

ssh user@host ⇒ connect to host as user

ssh -p port user@host ⇒ connect to host on port port as user

ssh-copy-id user@host ⇒ add your key to host for user to enable a keyed or passwordless login

 

Searching Commands

 

grep pattern files ⇒ search for pattern in files

grep -r pattern dir ⇒ search recursively for pattern in dir

command | grep pattern ⇒ search for pattern in the output of command

locate file ⇒ find all instances of file
 

Network Commands

 

ping host ⇒ ping host and output results

whois domain ⇒ get whois information for domain

dig domain ⇒ get DNS information for domain

dig -x host ⇒ reverse lookup host

wget file ⇒ download file

wget -c file ⇒ continue a stopped download

 

Installation Commands

 

Install from source:

./configure
make
make install

dpkg -i pkg.deb ⇒ install a package (Debian)

 

Shortcuts Commands

 

♦ Ctrl+C ⇒  halts the current command
♦ Ctrl+Z ⇒  stops the current command; resume with fg in the foreground or bg in the background
♦ Ctrl+D ⇒  log out of current session, similar to exit
♦ Ctrl+W ⇒  erases one word in the current line
♦ Ctrl+U ⇒  erases the whole line
♦ Ctrl+R ⇒  type to bring up a recent command
♦ !! ⇒  repeats the last command
♦ exit ⇒  log out of current session

 
Thanks for reading. If you are also keen to learn the technology like a pro from scratch to the advanced level, then ask the world-class trainers of India's Leading Hadoop Training institute now and get the benefits of the Big Data Certification course from Prwatech.

 


Hadoop cluster on AWS setup

 

How to set up an Apache Hadoop Cluster on AWS

 

 

Apache Hadoop Installation and Cluster setup on AWS

Hadoop cluster on AWS setup: in this tutorial one can easily learn about Apache Hadoop installation and cluster setup on AWS, which is used by most Hadoop developers. Are you dreaming of becoming a certified Pro Hadoop Developer? Then stop just dreaming and get your Hadoop certification course from India's Leading Hadoop Training institute.

In this tutorial, we will learn how to configure an Apache Hadoop cluster on AWS. We will go through the various steps of the Hadoop configuration on AWS needed to set up the Apache Hadoop cluster, starting with the platform requirements for an Apache Hadoop cluster on AWS, the prerequisites for installing Apache Hadoop and the cluster on AWS, and the various software required for installing Hadoop. Do you want to set up the Apache Hadoop cluster on AWS? Then follow the below-mentioned How to Configure an Apache Hadoop Cluster on AWS tutorial from Prwatech and learn the Hadoop course like a pro from today itself under 15+ Years of Hands-on Experienced Professionals.

 

Apache Hadoop AWS configuration

♦ Prerequisites :

 

  1. AWS account
  2. PuTTY and PuTTYgen (latest version)

 

♦ Go to the given URL :

♦ Click on Create a Free Account :

♦ If you don't have an account then fill in your details, else log in with your existing account

♦ After signing in, go to and click on EC2

♦ Click on Launch Instance

♦ At the left side, click on the Free tier only checkbox :

 

♦ Select the Operating System

♦ Select General purpose t2.micro

♦ At the bottom right, click on Next: Configure Instance Details

♦ Configure the instance

♦ After the configuration is done, click on Next: Add Storage at the bottom right

♦ Make the size 30 GiB

♦ Click on Next: Add Tags, which is present at the bottom right

 

♦ Add tags according to your requirement

♦ Click on Next: Configure Security Group

♦ Add Rule and Change Source

♦ Click on Launch

♦ Select or create a key pair and download the .pem file

♦ Click on Launch Instances:
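For reference, the same kind of instance can also be launched from the AWS CLI instead of the console. The command below is only a sketch: the AMI ID, key pair name and security group ID are placeholders you must replace with your own values, and the root device name depends on the chosen AMI.

aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --key-name my-hadoop-key \
    --security-group-ids sg-0123456789abcdef0 \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30}}]'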

 

♦ If you are using a Windows machine, go to a browser and download the putty.exe and puttygen.exe files

♦ Open PuTTYgen, click on Load and open your .pem file in PuTTYgen

♦ Click on Save private key and save your private key on your local machine

♦ Go to PuTTY, on the left side click on SSH > Auth, and browse for the private key (.ppk) you generated with PuTTYgen.

♦ Go to Session and paste your instance's public DNS in the Host Name option.

♦ Save the Session and double click on it; you will get the console
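If you are on Linux or macOS you can skip PuTTY entirely and use the .pem file with the stock ssh client. The key file name, user name and DNS below are placeholders; the login user depends on the AMI (for example ubuntu for Ubuntu images or ec2-user for Amazon Linux).

chmod 400 my-hadoop-key.pem
ssh -i my-hadoop-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com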

 

 

Become a certified expert in Hadoop technology by getting enrolled from Prwatech E-learning India’s leading advanced Hadoop training institute in Bangalore. Register Now for more updates on Hadoop AWS configuration upgrades. Our expert trainers will help you towards mastering real-world skills in relation to these Hadoop technologies.


Hadoop Singlenode Using Hadoop 2.x

 

Install Hadoop Single node Cluster Using Hadoop 2.x

 

How to Install Hadoop Single node Cluster Using Hadoop 2.x: in this tutorial one can learn how to install Hadoop as a single-node cluster using Hadoop 2.x. We at Prwatech, the pioneers of Hadoop training, share information about Hadoop with those tech enthusiasts who want to explore the technology and become certified Big Data developers. Are you the one who is looking for the best platform which provides information about the installation process of a Hadoop single-node cluster using Hadoop 2.x? Or the one who is looking forward to taking the advanced certification course from India's Leading Big Data Training institute? Then you've landed on the right path.

Get a clear understanding of the installation process of a Hadoop single-node cluster using Hadoop 2.x with India's Leading Hadoop Training institute. The below-mentioned tutorial will help you understand the detailed information about installing a Hadoop single-node cluster using Hadoop 2.x, so just follow all the tutorials of India's Leading Best Big Data Training Institute and be a Pro Hadoop Developer.


Prerequisites:

1. Hadoop 2.7.0
2. Java 8 (Oracle)
3. Ubuntu 12.04 or above

Process of Setting up a Single Node Cluster Using Hadoop 2.x

 

Update the repository

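The screenshot for this step is omitted here; on Ubuntu the repository update is normally the standard package index refresh:

Command: sudo apt-get update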

 

Once the update is complete, go for the Java installation

Command : 

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

♦ After Java has been installed, check whether Java is installed on your system or not with the below command:

Command: java -version

 

Install openssh-server

Command: sudo apt-get install openssh-server


 

Download Hadoop 2.7.0.tar.gz

Command: wget  https://archive.apache.org/dist/hadoop/core/hadoop-2.7.0/hadoop-2.7.0.tar.gz


 

After downloading, untar the Hadoop archive

Command: tar -xvf hadoop-2.7.0.tar.gz

 

Get into Hadoop directory


 

Go to the Hadoop configuration directory (etc/hadoop) and configure the files

 

Edit core-site.xml

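The screenshot only shows the edited file, so as a reference, a minimal single-node core-site.xml usually carries just the default filesystem entry inside the configuration tags. The port 9000 is a common convention rather than a requirement, so keep whatever value your setup uses.

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>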

 

Edit hdfs-site.xml

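Again as a reference for what typically goes inside the configuration tags of hdfs-site.xml on a single node; a replication factor of 1 is the usual choice here, since there is only one DataNode.

<property>
<name>dfs.replication</name>
<value>1</value>
</property>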

 

Edit mapred-site.xml

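For mapred-site.xml on Hadoop 2.x, the essential entry tells MapReduce to run on YARN; a typical minimal configuration is:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>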

 

Edit yarn-site.xml

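For yarn-site.xml, the entry that MapReduce jobs rely on is the shuffle auxiliary service; a typical minimal configuration is:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>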

 

Edit hadoop-env.sh

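The usual edit in hadoop-env.sh is pointing JAVA_HOME at your Java installation. The path below matches the oracle-java8-installer package used earlier; adjust it if your Java lives elsewhere.

export JAVA_HOME=/usr/lib/jvm/java-8-oracle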

 

Set JAVA and HADOOP HOME in the .bashrc file

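As a sketch of those .bashrc entries (the Hadoop path below assumes you extracted hadoop-2.7.0 in your home directory; change both paths to match your machine):

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=$HOME/hadoop-2.7.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin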

 

Create an ssh key:

Command: ssh-keygen -t rsa


 

Moving the key to authorized key

Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

 

Copy the key to other host

Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@hostname

 

Now format the name node

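The command behind this step is the standard HDFS format call; run it only once on a fresh installation, because it wipes the NameNode metadata:

Command: hdfs namenode -format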

 

Start all daemons

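On Hadoop 2.x the daemons are normally started with the two scripts under sbin (start-all.sh still works but is deprecated):

Command: start-dfs.sh
Command: start-yarn.sh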

 

Check your daemons

Command: jps


Hadoop-Singlenode Using Hadoop 1.x

How to Install Hadoop Single node Cluster Using Hadoop 1.x

Install Hadoop Single node Using Hadoop 1.x: in this tutorial one can learn how to install Hadoop as a single-node cluster using Hadoop 1.x. Are you the one who is looking for the best platform which provides information about the installation process of a Hadoop single-node cluster using Hadoop 1.x? Or the one who is looking forward to taking the advanced certification course from India's Leading Big Data Training institute? Then you've landed on the right path.

The below-mentioned tutorial will help you understand the detailed information about installing a Hadoop single node using Hadoop 1.x, so just follow all the tutorials of India's Leading Best Big Data Training institute and be a Pro Hadoop Developer.

Prerequisites for Install Using Single Node Cluster Using Hadoop 1.x

  1. Hadoop 1.x
  2. Java 6
  3. Ubuntu 12.04 or above

Download Ubuntu 12.04

Download the latest software from Prwatech: go to the below link and download the image of Ubuntu 12.04

Site: http://prwatech.in/online-softwares/

VMware Player

Open VMware Player, click Open a Virtual Machine, and select the path where you have extracted the Ubuntu image. After that select the .vmx file and click OK.

Now you can see the below screen in VMware Player

Double click on the Ubuntu machine present in VMware Player. You will get a screen like the below image.

Step by Step Process of Install Hadoop Single node Cluster

Update the repository


Once the update is complete, use the below-mentioned command to install Java

Command: sudo apt-get install openjdk-6-jdk

After Java has been installed, check whether Java is installed on your system or not with the below command:

Command: java -version

Install openssh-server:

Command: sudo apt-get install openssh-server


Download and extract hadoop

Link: http://prwatech.in/online-softwares/

Command: tar -xvf hadoop-1.2.0.tar.gz

Get into hadoop-1.2.0 directory


Edit core-site.xml

Command: sudo gedit core-site.xml


Write under configuration:

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:8020</value>

</property>


Edit mapred-site.xml

Command: sudo gedit mapred-site.xml


Write under configuration:

<property>

<name>mapred.job.tracker</name>

<value>localhost:8021</value>

</property>


Edit hdfs-site.xml

Command: sudo gedit hdfs-site.xml


<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>


Add JAVA_HOME in the hadoop-env.sh file

Command: gedit hadoop-1.2.0/conf/hadoop-env.sh

Uncomment the JAVA_HOME export shown below and set it to your Java installation path:

Type: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386

Create an ssh key:

Command: ssh-keygen -t rsa


Moving the key to authorized key:

Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Copy the key to other hosts:

Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@hostname

Get into your bin directory.


Format the name node :

Command: sh hadoop namenode -format


Start the nodes.

Command: sh start-all.sh


To check Hadoop started correctly :

Command: jps

