1)What is the difference between the Secondary NameNode (a poorly named component of Hadoop), the Checkpoint Node, and the Backup Node?
2)What are the side data distribution techniques?
3)What is shuffling in MapReduce?
4)What is partitioning?
5)Can we change a file cached by the distributed cache?
6)What if the JobTracker machine goes down?
7)What are the four modules that make up the Apache Hadoop framework?
8)Which modes can Hadoop be run in? List a few features of each mode.
9)Can we deploy the JobTracker on a machine other than the NameNode?
10)Where are Hadoop’s configuration files located?
Side data is the extra, small, read-only data that a MapReduce job needs in order to process the main dataset. The challenge is to make the side data available to all the map or reduce tasks.
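As a rough illustration of how a task might use side data, here is a minimal Python sketch. It assumes the side data is a small lookup table that every task can hold in memory; the station IDs, names, and the make_mapper helper are hypothetical and not part of any Hadoop API.

```python
def make_mapper(side_data):
    """Return a map function that enriches records using read-only side data."""
    def map_fn(record):
        station_id, reading = record
        # Look up the station name; unknown IDs fall back to the raw ID.
        return side_data.get(station_id, station_id), reading
    return map_fn


# Hypothetical side data: a small table mapping station IDs to names.
stations = {"S1": "Oslo", "S2": "Bergen"}
map_fn = make_mapper(stations)

main_dataset = [("S1", 12.5), ("S2", 9.0), ("S9", 3.1)]
enriched = [map_fn(record) for record in main_dataset]
```

Every task gets its own copy of the same table and only reads from it, which is why side data needs to be small and static.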
Shuffling is the process of transferring data from the mappers to the reducers. It is necessary because the reducers would otherwise have no input. More precisely, the process by which the intermediate output of the mappers is sorted and sent across to the reducers is known as shuffling.
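The effect of shuffling can be sketched in a few lines of Python. This is a single-process simulation, not how Hadoop moves data over the network; the shuffle function and the sample word-count pairs are illustrative assumptions.

```python
from collections import defaultdict


def shuffle(mapper_outputs):
    """Group intermediate (key, value) pairs by key and sort them by key.

    mapper_outputs holds one list of (key, value) pairs per mapper.
    The result is a list of (key, [values]) in sorted key order, which is
    the shape of input that a reducer receives.
    """
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


# Hypothetical output of two mappers in a word-count job.
mapper_outputs = [
    [("hadoop", 1), ("hdfs", 1), ("hadoop", 1)],
    [("yarn", 1), ("hadoop", 1)],
]
reducer_input = shuffle(mapper_outputs)
```

Here reducer_input contains one entry per distinct word, with the values from both mappers collected together.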
A MapReduce partitioner makes sure that all the values for a single key go to the same reducer, and it helps distribute the map output evenly across the reducers. It redirects the mapper output to the reducers by determining which reducer is responsible for a particular key.
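A hash-based partitioner can be sketched as follows. Hadoop's default HashPartitioner takes the key's hash code modulo the number of reduce tasks; this Python version substitutes zlib.crc32 as the hash only because it is stable across runs, and the partition and route names are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Return the index of the reducer responsible for this key."""
    # crc32 stands in for the key's hash code (an assumption of this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Place each (key, value) pair in the bucket of its reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


pairs = [("hadoop", 1), ("hdfs", 1), ("hadoop", 1), ("yarn", 1)]
buckets = route(pairs, num_reducers=3)
```

Because the reducer index depends only on the key, both ("hadoop", 1) pairs always end up in the same bucket.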
If the JobTracker goes down, all running jobs are halted, because the JobTracker is a single point of failure.
No. The distributed cache tracks cached files by timestamp, so a cached file should not be changed while the job is executing.
a. Hadoop Common: The common utilities that support the other Hadoop modules.
b. Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
c. Hadoop YARN: A framework for job scheduling and cluster resource management.
d. Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
a) Standalone, or local mode, which is one of the least commonly used environments. When it is used, it’s usually only for running MapReduce programs. Standalone mode lacks a distributed file system, and uses a local file system instead.
b) Fully distributed mode, which is most commonly used in production environments. Unlike pseudo-distributed mode, fully distributed mode runs all daemons on a cluster of machines rather than a single one.
Yes. In a production cluster, the JobTracker usually runs on a machine other than the NameNode.
The Hadoop configuration files are located in the conf subdirectory.
11>List Hadoop's three configuration files?
12>What are slaves and masters in Hadoop?
13>How many datanodes can run on a single Hadoop cluster?
14>What is job tracker in Hadoop?
15>How many job tracker processes can run on a single Hadoop cluster?
16>What sorts of actions does the job tracker process perform?
17>How does job tracker schedule a job for the task tracker?
18>What does the mapred.job.tracker command do?
19>What is PID?
20>What is jps?
The three configuration files are-
The masters are-
The slaves are-
Any number of DataNodes can run on a single Hadoop cluster.
JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster, and it runs in its own JVM process.
Only one JobTracker can run on a single Hadoop cluster.
Client applications submit jobs to the Job tracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
Job tracker monitors the task tracker nodes for signs of activity. If there is not enough activity, job tracker transfers the job to a different task tracker node.
Job tracker receives a notification from task tracker if the job has failed. From there, job tracker might submit the job elsewhere, as described above. If it doesn’t do this, it might blacklist either the job or the task tracker.
When a client submits a job to the JobTracker, the JobTracker searches for a node with a free slot and then schedules the task on that node.
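The idea of locating a TaskTracker with an available slot "at or near the data" can be sketched in Python. This is a simplified illustration, not the JobTracker's actual scheduling code; the choose_tracker function, the node names, and the rack labels are all hypothetical.

```python
def choose_tracker(block_hosts, trackers):
    """Pick a TaskTracker for a task whose input block lives on block_hosts.

    trackers maps a node name to {"rack": str, "free_slots": int}.
    Preference order: a node that holds the data, then a node on the same
    rack as the data, then any node with a free slot.
    Returns None when no slot is free anywhere.
    """
    available = [name for name, info in trackers.items()
                 if info["free_slots"] > 0]
    data_racks = {trackers[h]["rack"] for h in block_hosts if h in trackers}

    for name in available:
        if name in block_hosts:
            return name  # data-local
    for name in available:
        if trackers[name]["rack"] in data_racks:
            return name  # rack-local
    return available[0] if available else None


# Hypothetical three-node cluster spread over two racks.
trackers = {
    "node1": {"rack": "rack1", "free_slots": 0},
    "node2": {"rack": "rack1", "free_slots": 1},
    "node3": {"rack": "rack2", "free_slots": 2},
}
chosen = choose_tracker(["node1"], trackers)
```

In this example node1 holds the data but has no free slot, so the sketch falls back to node2 on the same rack.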
jps is a command used to check whether the NameNode, DataNode, JobTracker, and TaskTracker are running.
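Each line of jps output has a process ID (PID) followed by the name of the Java class. A check for missing daemons could be sketched as below; the missing_daemons helper and the sample output with its PIDs are made up for illustration.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "JobTracker", "TaskTracker"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each useful line looks like "<pid> <ClassName>".
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(set(expected) - running)


# Hypothetical jps output; the PIDs are invented for this example.
sample = """\
4821 NameNode
4907 DataNode
5012 JobTracker
5230 Jps
"""
missing = missing_daemons(sample)
```

With this sample, the TaskTracker is reported as missing because no line names it.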
21>Is there another way to check whether Namenode is working?
22>How would you restart Namenode?
23>What is fsck?
24>What is a map in Hadoop?
25>What is a reducer in Hadoop?
26>What are the parameters of mappers and reducers?
27>Is it possible to rename the output file, and if so, how?
28>List the network requirements for using Hadoop.
29>Which port does SSH work on?
30>What is streaming in Hadoop?
23>fsck stands for file system check.
24>‘Map’ is responsible for reading data from the input location and, based on the input type, generating key-value pairs, that is, an intermediate output on the local machine.
25>‘Reducer’ is responsible for processing the intermediate output received from the mapper and generating the final output.
26>The four basic parameters of a mapper are <LongWritable, Text, Text, IntWritable>, and the four basic parameters of a reducer are <Text, IntWritable, Text, IntWritable>. In each case they are the input key, input value, output key, and output value types.
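The four type parameters can be mirrored with Python type hints in a word-count sketch. This only illustrates the key and value shapes: int stands in for LongWritable and IntWritable, str stands in for Text, and the map_fn and reduce_fn names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def map_fn(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Input key/value: (byte offset, line). Output key/value: (word, 1)."""
    for word in line.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Input key/value: (word, counts). Output key/value: (word, total)."""
    return word, sum(counts)


mapped = list(map_fn(0, "hadoop hdfs hadoop"))
total = reduce_fn("hadoop", [1, 1])
```

The mapper's output types (str, int) match the reducer's input types, just as <Text, IntWritable> appears in both parameter lists above.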
27>Yes, we can rename the output file by implementing a multiple-output format class (for example, by extending MultipleOutputFormat).
30>Streaming is a feature of the Hadoop framework that allows us to write MapReduce programs in any programming language that can read standard input and write standard output.
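A streaming-style word count could look like the Python sketch below. The mapper and reducer only read lines from an input stream and write tab-separated lines to an output stream; the simulate helper is a hypothetical local stand-in for the sort that the framework performs between the two stages.

```python
import io
from itertools import groupby


def run_mapper(stdin, stdout):
    """Emit one "word<TAB>1" line for every word read from stdin."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def run_reducer(stdin, stdout):
    """Sum the counts per word; input lines must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in stdin if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        stdout.write(f"{word}\t{total}\n")


def simulate(text):
    """Run mapper, sort, and reducer in one process (local test only)."""
    mapped = io.StringIO()
    run_mapper(io.StringIO(text), mapped)
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    run_reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


result = simulate("hadoop hdfs\nhadoop\n")
```

In a real streaming job the two functions would live in separate scripts wired to sys.stdin and sys.stdout, and the scripts would be passed to the streaming jar through its -mapper and -reducer options.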