Big Data And Hadoop
Learn from Highly Qualified & Experienced Industry Practitioners
Personalized Guidance from our 24*7 Support Team
Reschedule your Batch/Class at Your Convenience
Start Date | Duration | Class Days | Class Time (IST)
13th May | 40 Hrs. | Weekend - Sat | 10:00 AM
15th May | 40 Hrs. | Weekday - Mon | 8:00 AM
27th May | 40 Hrs. | Weekend - Sat | 1:00 PM
29th May | 40 Hrs. | Weekday - Mon | 11:00 AM
What is Big Data?
Big Data is a collection of massive amounts of data. We live in the data age, and it is not easy to measure the total volume of data, or to manage and process it. This flood of Big Data comes from many different sources,
such as the New York Stock Exchange, Facebook, Twitter, aircraft, Walmart, etc.
The world's information is roughly doubling (growing about 1.8 times) every two years.
Still, 80% of this data is in unstructured format, which is very difficult to store, process, or retrieve. So we can say that all this unstructured data is Big Data.
Why Hadoop Is Called the Future of the Information Economy
Hadoop is a Big Data framework that helps store, process, and analyze unstructured data on commodity hardware. It is an open source software framework written in Java that supports distributed applications. It was introduced by Doug Cutting and Michael J. Cafarella in mid-2006. Yahoo was the first commercial user of Hadoop (2008).
Hadoop comes in two generations: Hadoop 1.0 and Hadoop 2.0, the latter based on the YARN (Yet Another Resource Negotiator) architecture. Hadoop is named after a toy elephant that belonged to Doug Cutting's son.
Big Data Growth & Future Market
Commercial growth of BIG DATA and HADOOP
The world's information is doubling every two years, and today's market agenda is to convert volume into value. At present, every company is said to spend 30% of its investment on maintaining Big Data. On this basis, the prediction for 2020 is that data centers will grow 10X and storage devices 100X to hold this enormous volume of Big Data, and managing it will require massive manpower. The opportunity in Big Data and Hadoop is predicted to be 1000X today's requirement by 2020.
IBM is one of the giant users of Big Data: 10% of IBM's revenue ($1,036 million) comes from Big Data.
Big Data revenue of the other top five companies: HP $664 million, Teradata $435 million, Dell $425 million, Oracle $415 million, SAP $368 million.
Job Titles for Hadoop Professionals
Job opportunities for talented software engineers in fields of Hadoop and Big Data are enormous and profitable. Zest to become proficient and well versed in Hadoop environment is all that is required for a fresher. Having technical experience and proficiency in fields described below can help you move up the ladder to great heights in the IT industry.
A Hadoop Architect is an individual or a team of experts who manage petabytes of data and provide documentation for Hadoop-based environments around the globe. An even more crucial role of a Hadoop Architect is to oversee administrators and managers and get the best out of their efforts. A Hadoop Architect also needs to govern Hadoop on large clusters. Every Hadoop Architect must have solid experience in Java, MapReduce, Hive, HBase, and Pig.
A Hadoop Developer is one who has a strong hold on programming languages such as Core Java, SQL, jQuery, and other scripting languages. A Hadoop Developer has to be proficient in writing well-optimized code to manage huge amounts of data. Working knowledge of Hadoop-related technologies such as Hive, HBase, and Flume helps in building a successful career in the IT industry.
Hadoop Scientist or Data Scientist is a more technical term replacing Business Analyst. They are professionals who generate, evaluate, spread and integrate the humongous knowledge gathered and stored in Hadoop environments. Hadoop Scientists need to have an in-depth knowledge and experience in business and data. Proficiency in programming languages such as R, and tools such as SAS and SPSS is always a plus point.
With colossal database systems to be administered, a Hadoop Administrator needs to have a profound understanding of the design principles of Hadoop. Extensive knowledge of hardware systems and strong interpersonal skills are crucial. Experience in core technologies such as Hadoop MapReduce, Hive, Linux, Java, and database administration helps an administrator stay a forerunner in the field.
Data Engineers / Hadoop Engineers are those who create the data-processing jobs and build the distributed MapReduce algorithms for data analysts to utilize. Data Engineers with experience in Java and C++ will have an edge over others.
Big Data Hadoop Analysts need to be well versed in tools such as Impala, Hive, and Pig, and also need a sound understanding of how business intelligence is applied on a massive scale. Hadoop Analysts need to come up with cost-efficient breakthroughs that are faster in jumping between silos and migrating data.
Introduction to Big Data and Hadoop
HADOOP CLUSTER CONFIGURATION AND DATA LOADING
Learning Objectives - After this module, you will understand the Hadoop Cluster Architecture and Setup, Important Configuration files in a Hadoop Cluster, and Data Loading Techniques.
Hadoop Cluster Architecture, Hadoop Cluster Configuration files, Hadoop Cluster Modes, Multi-Node Hadoop Cluster, A Typical Production Hadoop Cluster, MapReduce Job execution, Common Hadoop Shell commands.
Data Loading Techniques:
Hadoop Copy Commands, Hadoop Project: Data Loading
Hadoop Multinode Cluster & Architecture
Learning Objectives – This module will help you understand multiple Hadoop Server roles such as NameNode and DataNode, and MapReduce data processing. You will also understand the Hadoop 1.0 Cluster setup and configuration, the steps in setting up Hadoop Clients using Hadoop 1.0, and important Hadoop configuration files and parameters.
Hadoop Installation and Initial Configuration, Deploying Hadoop in fully-distributed mode, deploying a multi-node Hadoop cluster, Installing Hadoop Clients, Hadoop server roles and their usage, Rack Awareness, Anatomy of Write and Read, Replication Pipeline, Data Processing
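To give a feel for the MapReduce data processing covered in this module, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It runs on a single machine and does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in order over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    sample = ["big data and hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers run in parallel on different nodes and the framework performs the shuffle over the network; this sketch only mirrors the flow of data between the phases.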
Backup, Monitoring, Recovery and Maintenance
Learning Objectives – In this module, you will understand all the regular Cluster Administration tasks such as adding and removing Data Nodes, Name Node recovery, configuring Backup and Recovery in Hadoop, diagnosing Node Failures in the Cluster, Hadoop Upgrade, etc.
Setting up Hadoop Backup, whitelisting and blacklisting data nodes in a cluster, setting up quotas, upgrading a Hadoop cluster, copying data across clusters using distcp, Diagnostics and Recovery, Cluster Maintenance, Configuring Rack awareness
PIG (Analytics using PIG) & PIG Latin
In this module, you will learn about analytics with Pig: Pig Latin scripting, complex data types, and the different cases in which to work with Pig. It also covers the execution environment, operations, and transformations.
Topics: About Pig, Pig Installation, Pig Latin scripting, complex Data Types, File Formats, where to use Pig when there is MapReduce (MR), operations & transformations, compilation, Load, Filter, Join, foreach, Hadoop scripting, Pig UDF, Pig project.
Hive & HQL with Analytics
In this module, we will discuss Hive, a data warehouse package that analyzes structured data. It covers Hive installation, loading data, and storing data in different tables.
Topics: About Hive, Hive Installation, Managed table, External table, Complex data Types, execution engine, Partitioning & Bucketing, Hive UDF, Hive queries (sorting, aggregating, Joins, Subqueries), Map-side and reduce-side joins, Hive project.
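The idea behind partitioning and bucketing can be sketched in a few lines of plain Python. This is an illustration only, not Hive code: it assumes a hypothetical table with a partition column country and an integer bucketing column user_id, a hypothetical bucket count of 4, and it treats the hash of an integer key as the integer itself. The hash Hive actually applies depends on the column type and the Hive version.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration

def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Pick a bucket as hash(key) modulo bucket count (integer key assumed to hash to itself)."""
    return user_id % num_buckets

def layout(rows):
    """Group rows by partition value (country), then by bucket of user_id."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row["country"]][bucket_for(row["user_id"])].append(row["user_id"])
    # Convert nested defaultdicts to plain dicts for easy printing and comparison.
    return {partition: dict(buckets) for partition, buckets in table.items()}

if __name__ == "__main__":
    rows = [
        {"country": "IN", "user_id": 5},
        {"country": "IN", "user_id": 9},
        {"country": "US", "user_id": 6},
    ]
    print(layout(rows))
```

In Hive, each partition is stored as a separate directory and each bucket as a file inside it, so a query that filters on the partition column only has to read the matching directory.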
Advanced Hive, NoSQL Databases and HBase
In this module, you will understand Advanced Hive concepts such as UDFs. You will also acquire in-depth knowledge of what HBase is, how to load data into HBase, and how to query data from HBase using a client.
Topics: Hive: Data manipulation with Hive, User Defined Functions, Appending Data into an existing Hive Table, Custom Map/Reduce in Hive, Hadoop Project: Hive Scripting. HBase: Introduction to HBase, Client APIs and their features, Available Clients, HBase Architecture, MapReduce Integration.
Sqoop (Real world datasets and analysis)
Learning Objectives - This module covers importing and exporting data from an RDBMS (MySQL, Oracle) to HDFS and vice versa.
What is Sqoop?
Importing and exporting data using Sqoop
Provisioning Hive Metastore
What are the features of Sqoop?
What are the performance benchmarks in our cluster for Sqoop?
Multiple hands-on cases using the HBase client
Flume (Twitter Datasets & Analysis)
What is Flume?
Importing Data using Flume
Twitter Data Analysis using Hive
Hadoop 2.0, MRv2 and YARN
In this module, you will understand the newly added features in Hadoop 2.0, namely, YARN, MRv2, NameNode High Availability, HDFS Federation, support for Windows etc.
Topics: Schedulers: Fair and Capacity, Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN, Upgrading your existing MRv1 code to MRv2, Programming in the YARN framework.
What is Oozie
• Kinds of Oozie Jobs
• Developing & Running an Oozie Workflow (MapReduce, Hive, Pig, Sqoop)
• Configuring Oozie Workflows
• Kinds of Nodes
Advanced Topic:
Hadoop Project Environment
In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and specifications of the project. This module will also cover Apache Oozie Workflow Scheduler for Hadoop Jobs.
- PowerPoint Presentation covering all classes.
- Recorded Video Sessions on Big Data and Hadoop with LMS Access (lifetime support).
- Quiz, Assignment & POC.
- On Demand Online Support.
- Discussion Forum.
i. Sample Question Papers of Cloudera Certification.
ii. Technical Notes & Study Material.