Big Data And Hadoop
18th October. Duration: 40 Hrs. Class Days: weekends. Class Time (IST): 10:30 AM
27th October. Duration: 40 Hrs. Class Days: weekdays. Class Time (IST): 11:30 AM
1st November. Duration: 40 Hrs. Class Days: weekends. Class Time (IST): 2:00 PM
3rd November. Duration: 40 Hrs. Class Days: weekdays. Class Time (IST): 1:00 PM
What is Big Data?
Big Data is a collection of huge amounts of data. We live in the data age, and it is not easy to measure the total volume of this data, let alone manage and process it. The flood of Big Data comes from many different sources,
such as the New York Stock Exchange, Facebook, Twitter, aircraft, and Walmart.
The world's information is doubling roughly every two years (about 1.8 times).
About 80% of that data is still in unstructured form, which is very difficult to store, process, or retrieve. This unstructured data is what we call Big Data.
Why Hadoop Is Called the Future of the Information Economy
Hadoop is a Big Data framework that helps store, process, and analyze unstructured data on commodity hardware. It is an open source software framework written in Java that supports distributed applications. It was introduced by Doug Cutting and Michael J. Cafarella in mid-2006. Yahoo was the first commercial user of Hadoop (2008).
Hadoop exists in two generations: Hadoop 1.0 and Hadoop 2.0, the latter based on the YARN (Yet Another Resource Negotiator) architecture. Hadoop is named after a toy elephant that belonged to Doug Cutting's son.
Big Data Growth & Future Market
Commercial growth of BIG DATA and HADOOP
The world's information is doubling every two years, and today's market agenda is to convert volume into value. At present, companies are investing about 30% of their investment in maintaining Big Data. On this basis, the prediction is that by 2020 data centers will grow 10X and storage devices 100X to hold this enormous volume of Big Data, and managing it will require massive manpower. Opportunities in Big Data and Hadoop are predicted to be 1000X today's requirement by 2020.
IBM is one of the giant users of Big Data, with Big Data revenue of $1,036 million (10%).
Big Data revenue of the other top five companies: HP $664 million, Teradata $435 million, Dell $425 million, Oracle $415 million, SAP $368 million.
Job Titles for Hadoop Professionals
Job opportunities for talented software engineers in the fields of Hadoop and Big Data are enormous and profitable. A zest to become proficient and well versed in the Hadoop environment is all that is required of a fresher. Technical experience and proficiency in the roles described below can help you move up the ladder to great heights in the IT industry.
A Hadoop Architect is an individual or team of experts who manage petabytes of data and provide documentation for Hadoop-based environments around the globe. An even more crucial role of a Hadoop Architect is to oversee administrators and managers and direct their efforts. A Hadoop Architect also needs to govern Hadoop on large clusters. Every Hadoop Architect must have solid experience in Java, MapReduce, Hive, HBase, and Pig.
A Hadoop Developer is one who has a strong hold on Core Java, SQL, jQuery, and other scripting languages. A Hadoop Developer has to be proficient in writing well-optimized code to manage huge amounts of data. Working knowledge of Hadoop-related technologies such as Hive, HBase, and Flume helps in building a successful career in the IT industry.
Hadoop Scientist or Data Scientist is a more technical term replacing Business Analyst. They are professionals who generate, evaluate, spread, and integrate the vast knowledge gathered and stored in Hadoop environments. Hadoop Scientists need in-depth knowledge and experience in business and data. Proficiency in programming languages such as R, and tools such as SAS and SPSS, is always a plus.
With colossal database systems to administer, a Hadoop Administrator needs a profound understanding of the design principles of Hadoop. Extensive knowledge of hardware systems and strong interpersonal skills are crucial. Experience in core technologies such as Hadoop MapReduce, Hive, Linux, Java, and database administration helps an administrator stay at the forefront of the field.
Data Engineers / Hadoop Engineers are those who create the data-processing jobs and build the distributed MapReduce algorithms for data analysts to use. Data Engineers with experience in Java and C++ will have an edge over others.
Big Data Hadoop Analysts need to be well versed in tools such as Impala, Hive, and Pig, and need a sound understanding of applying business intelligence on a massive scale. Hadoop Analysts need to come up with cost-efficient breakthroughs that are faster at moving between silos and migrating data.
Introduction of Big Data And Hadoop
Introduction to Big Data & Hadoop Architecture
In this module, we will discuss Big Data: how it impacts our social life and the important role it plays, how Hadoop helps to manage and process Big Data, the Hadoop ecosystem and its architecture, and how the Hadoop components HDFS and MapReduce store and process Big Data.
Topics: Big Data, Role of Big Data, Hadoop components, Hadoop architecture, HDFS, MapReduce, NameNode, JobTracker, Secondary NameNode, DataNode, Data pipelining, TaskTracker, Rack awareness, Anatomy of file read & write.
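To make the HDFS topics above more concrete, here is a minimal plain-Python sketch of how a file is cut into blocks and how replicas of one block could be spread across racks. It does not use any Hadoop API. The 128 MB block size and replication factor of 3 are assumed defaults (both are configurable in HDFS), and the function names, node names, and rack layout are hypothetical. The placement rule shown (first replica on the writer's node, the other two on two nodes of one remote rack) is a simplified version of rack-aware placement and assumes the remote rack has at least two nodes.

```python
BLOCK_SIZE_MB = 128   # assumed block size; configurable in HDFS
REPLICATION = 3       # assumed replication factor; configurable in HDFS


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data.
        blocks.append(remainder)
    return blocks


def place_replicas(writer_node, topology):
    """Choose three nodes for one block (simplified rack awareness).

    topology maps a rack name to the list of DataNodes on that rack.
    Replica 1 goes to the writer's node, replicas 2 and 3 go to two
    different nodes on a single remote rack.
    """
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    remote_rack = next(r for r in topology if r != local_rack)
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]


# Hypothetical two-rack cluster used only for illustration.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(split_into_blocks(300))           # [128, 128, 44]
print(place_replicas("dn1", topology))  # ['dn1', 'dn3', 'dn4']
```

Keeping one replica local and two on another rack is a trade-off: the write stays cheap, while the data survives the loss of a whole rack.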
Playing Around the Cluster & Setting Up a Hadoop Cluster
In this module, we will learn to set up a Hadoop cluster in five different modes, configure the important files, and load and process data.
Topics: Hadoop cluster on a single node (pseudo mode), Multi-node cluster, Cssh cluster, Configuring files, Data loading, MapReduce processing, Recovering the Secondary NameNode, Adding & deleting DataNodes, Balancing.
Hadoop MapReduce Framework & Implementation
In this module, we will work on the MapReduce framework: how MapReduce is applied to data stored in HDFS, what input splits, input formats, and output formats are, and the overall MapReduce process with the different stages that process the data.
Topics: Mapper, Reducer, Driver, Input split, Partitioner, Combiner, Shuffling, Input format, Output format, Text input/output format, SequenceFile format, NLine format, JVM reuse, Record Reader, Job scheduler, Safe Mode, Compression & decompression (codecs, LZO, Snappy).
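The stages listed above (map, partition, shuffle, reduce) can be illustrated with a small word count. This is a plain-Python simulation that runs in a single process, not the Hadoop MapReduce API. The function names are hypothetical, and the byte-sum partitioner is only a deterministic stand-in for a hash partitioner.

```python
from collections import defaultdict


def mapper(line):
    """Map stage: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def partitioner(key, num_reducers):
    """Decide which reducer receives a key (stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce stage: add up all counts collected for one word."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Driver: run map, shuffle (group by key per partition), then reduce."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partitioner(key, num_reducers)][key].append(value)

    results = {}
    for partition in partitions:
        # Each reducer sees its keys in sorted order.
        for key in sorted(partition):
            word, total = reducer(key, partition[key])
            results[word] = total
    return results


counts = run_job(["big data", "big hadoop data"])
print(sorted(counts.items()))  # [('big', 2), ('data', 2), ('hadoop', 1)]
```

Because every occurrence of a word is routed to the same partition, each reducer can total its words without talking to the others. That is the property the shuffle stage exists to guarantee.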
Advanced MapReduce (MRv2)
In this module, we will work with advanced MapReduce to process complex data. We will work with new components such as Counters and the Distributed Cache, which supplies additional data while processing, as well as custom Writables and serialization.
Topics: Counters, Distributed Cache, Data localization, Speculative execution, JVM reuse, MRUnit testing, Unit testing, Advanced MapReduce framework.
Pig (Analytics Using Pig) & Pig Latin
In this module, we will learn about analytics with Pig: Pig Latin scripting, complex data types, the different cases where Pig is used, the execution environment, and operations and transformations.
Topics: About Pig, Pig installation, Pig Latin scripting, Complex data types, File formats, When to use Pig instead of MapReduce, Operations & transformations, Compilation, Load, Filter, Join, Foreach, Hadoop scripting, Pig UDFs, Pig project.
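The Load, Filter, Join, and Foreach operations named above form a typical Pig data flow. The sketch below imitates that flow in plain Python to show what each step does to the data. It is an analogy only, not Pig Latin and not a Pig API, and the helper names and sample records are hypothetical.

```python
from collections import defaultdict


def load(rows, schema):
    """LOAD: turn raw tuples into records with named fields."""
    return [dict(zip(schema, row)) for row in rows]


def filter_by(relation, predicate):
    """FILTER: keep only the records that satisfy the predicate."""
    return [record for record in relation if predicate(record)]


def join(left, right, key):
    """JOIN: inner join of two relations on a shared field."""
    index = defaultdict(list)
    for record in right:
        index[record[key]].append(record)
    return [{**l, **r} for l in left for r in index[l[key]]]


def foreach(relation, generate):
    """FOREACH ... GENERATE: project or transform every record."""
    return [generate(record) for record in relation]


# Hypothetical sample data.
users = load([(1, "asha", 34), (2, "ravi", 17)], ("id", "name", "age"))
orders = load([(1, 250), (1, 100), (2, 75)], ("id", "amount"))

adults = filter_by(users, lambda r: r["age"] >= 18)
joined = join(adults, orders, "id")
result = foreach(joined, lambda r: (r["name"], r["amount"]))
print(result)  # [('asha', 250), ('asha', 100)]
```

Each step takes a relation and returns a new one. That is the same shape a Pig Latin script has, where each statement names the result of transforming an earlier relation.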
Hive & HQL with Analytics
In this module, we will discuss Hive, a data warehouse package that analyzes structured data, along with Hive installation, loading data, and storing data in different kinds of tables.
Topics: About Hive, Hive installation, Managed tables, External tables, Complex data types, Execution engine, Partitioning & bucketing, Hive UDFs, Hive queries (sorting, aggregating, joins, subqueries), MapReduce-side joins, Hive project.
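Partitioning and bucketing, listed in the topics above, both decide where a row is physically stored. The plain-Python sketch below shows the idea: rows are first grouped by the value of a partition column, then spread across a fixed number of buckets by hashing another column. It does not call Hive. The `col=value` naming mirrors how partition directories are commonly laid out, the modulo on an integer key is a simplified stand-in for Hive's hash function, and the column names and rows are hypothetical.

```python
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Pick a bucket for an integer key (simplified stand-in for a hash)."""
    return value % num_buckets


def layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows by partition value, then by bucket number."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition = f"{partition_col}={row[partition_col]}"
        bucket = bucket_for(row[bucket_col], num_buckets)
        tree[partition][bucket].append(row)
    return tree


# Hypothetical rows of a user table.
rows = [
    {"country": "IN", "user_id": 7},
    {"country": "IN", "user_id": 4},
    {"country": "US", "user_id": 5},
]

table = layout(rows, "country", "user_id", num_buckets=2)
for partition in sorted(table):
    for bucket in sorted(table[partition]):
        ids = [r["user_id"] for r in table[partition][bucket]]
        print(partition, "bucket", bucket, ids)
```

A query that filters on the partition column only needs to read one group. Bucketing on the join key means matching rows from two tables fall in buckets with the same number.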
Advanced Hive, NoSQL Databases and HBase
In this module, you will understand advanced Hive concepts such as UDFs. You will also acquire in-depth knowledge of what HBase is, how you can load data into HBase, and how to query data from HBase using a client.
Topics: Hive: Data manipulation with Hive, User Defined Functions, Appending data into an existing Hive table, Custom Map/Reduce in Hive, Hadoop Project: Hive scripting. HBase: Introduction to HBase, Client APIs and their features, Available clients, HBase architecture, MapReduce integration.
Advanced HBase and ZooKeeper
This module will cover advanced HBase concepts. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster, why HBase uses ZooKeeper, and how to build applications with ZooKeeper.
Topics: HBase: Advanced usage, Schema design, Advanced indexing, Coprocessors, Hadoop Project: HBase tables. The ZooKeeper Service: Data model, Operations, Implementation, Consistency, Sessions, States.
Hadoop 2.0, MRv2 and YARN
In this module, you will understand the newly added features in Hadoop 2.0, namely, YARN, MRv2, NameNode High Availability, HDFS Federation, support for Windows etc.
Topics: Schedulers: Fair and Capacity, Hadoop 2.0 New Features: NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN, Upgrading your existing MRv1 code to MRv2, Programming in the YARN framework.
Hadoop Project Environment
In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and specifications of the project. This module will also cover Apache Oozie Workflow Scheduler for Hadoop Jobs.