SEMINAR ON HADOOP

Guided by: Prof. D. V. Chaudhari
Seminar by: Namrata Sakhare, Roll No: 65, B.E. Comp
HISTORY OF HADOOP

Large businesses needed to process terabytes and petabytes of data. Initially, this data was handled by a single powerful computer, but one machine can only scale up to a certain limit. To solve this problem, Google published MapReduce.

MapReduce: a system that supports distributed computing on large data sets across clusters of computers.

Many other businesses faced the same scaling problem. Therefore, Doug Cutting developed an open-source implementation of the MapReduce system, called HADOOP.
WHAT IS HADOOP?
•Hadoop is a framework of tools.
•Its objective is to support running applications on big data.
•It is an open-source set of tools, distributed under the Apache License.
•It is a powerful tool designed for deep analysis and transformation of very large data sets.
BIG DATA
•The keyword behind Hadoop is BIG DATA.
•Big data faces three challenges (the "3 Vs"): Volume, Velocity, and Variety.
TRADITIONAL APPROACH

[Diagram: BIG DATA is processed by a single powerful computer, which eventually hits its processing limits.]
HADOOP APPROACH

[Diagram: BIG DATA is broken into pieces; each piece is computed in parallel on a separate machine; the partial results are then combined into one result.]

COMPUTATION OF DATA
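The split–compute–combine flow above can be sketched in plain Python. This is a toy illustration using a thread pool, not part of Hadoop itself:

```python
from concurrent.futures import ThreadPoolExecutor

def compute(piece):
    # Each worker handles its own piece of the data independently.
    return sum(piece)

def split_compute_combine(big_data, workers=4):
    # 1. Break the big data into roughly equal pieces.
    size = max(1, len(big_data) // workers)
    pieces = [big_data[i:i + size] for i in range(0, len(big_data), size)]
    # 2. Compute every piece in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(compute, pieces))
    # 3. Combine the partial results into one final result.
    return sum(partials)

print(split_compute_combine(list(range(1000))))  # → 499500, same as sum(range(1000))
```

Hadoop applies the same idea, but the pieces live on different physical machines and the framework moves the computation to wherever the data is stored.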
ARCHITECTURE
MapReduce
HDFS
Task tracker
Name Node
Date Node
Job Tracker
MASTER SLAVE ARCHITECTURE

[Diagram: one Master node running the Name Node and the Job Tracker; multiple Slave nodes, each running a Data Node and a Task Tracker.]
JOB TRACKER

The Job Tracker runs on the master. It accepts MapReduce jobs from clients and assigns the map and reduce tasks to the Task Trackers on the slave nodes.
NAME NODE

The Name Node runs on the master. It holds the metadata of HDFS: which blocks make up each file and which Data Nodes store those blocks.
TASK TRACKER AND DATA NODE

Each slave node runs a Data Node, which stores blocks of data for HDFS, and a Task Tracker, which executes the tasks assigned to it by the Job Tracker.
FAULT TOLERANCE FOR DATA (HDFS)

HDFS replicates each block of data on multiple Data Nodes (three by default). If a Data Node fails, the Name Node re-replicates its blocks from the surviving copies, so no data is lost.
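The re-replication idea can be sketched as a toy simulation. The replication factor of 3 matches HDFS's default; the node and block names are illustrative, and real HDFS placement is far more sophisticated (rack awareness, load balancing):

```python
import random

REPLICATION = 3  # HDFS's default replication factor

def place_blocks(blocks, nodes):
    # Place each block on REPLICATION distinct data nodes.
    return {b: set(random.sample(nodes, REPLICATION)) for b in blocks}

def handle_node_failure(placement, failed, nodes):
    # Re-replicate every block the failed node held, using a surviving copy.
    for block, holders in placement.items():
        holders.discard(failed)
        while len(holders) < REPLICATION:
            candidates = [n for n in nodes if n != failed and n not in holders]
            holders.add(random.choice(candidates))

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)
handle_node_failure(placement, "node3", nodes)
# Every block still has 3 replicas, none of them on the failed node.
assert all(len(h) == 3 and "node3" not in h for h in placement.values())
```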
FAULT TOLERANCE FOR PROCESSING (MAPREDUCE)

If a Task Tracker fails, the Job Tracker detects the failure and reschedules that tracker's tasks on another Task Tracker, so the job still completes.
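A minimal sketch of this rescheduling idea. All names here (`run_job`, `flaky_run`, the tracker labels) are illustrative, not Hadoop APIs:

```python
def run_job(tasks, trackers, run_task):
    # The "job tracker": assign each task to a tracker; if that tracker
    # fails, reschedule the task on another tracker until it succeeds.
    results = {}
    for task in tasks:
        for tracker in trackers:
            try:
                results[task] = run_task(tracker, task)
                break  # task succeeded, move on to the next task
            except RuntimeError:
                continue  # this tracker failed; try the next one
        else:
            raise RuntimeError(f"no tracker could run {task}")
    return results

def flaky_run(tracker, task):
    # Simulate a dead tracker: "tracker1" always fails.
    if tracker == "tracker1":
        raise RuntimeError("tracker down")
    return f"{task} done on {tracker}"

out = run_job(["map_0", "map_1"], ["tracker1", "tracker2"], flaky_run)
print(out["map_0"])  # → map_0 done on tracker2
```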
MASTER BACK UP
Task tracke
rData node
Task tracke
r
Task tracke
r
Task tracke
r
Data node
Data node
Data node
Data node Name node
Task tracker Job tracker
Master
Slave
Tables are
backed up
EASY PROGRAMMING

The programmer does not have to worry about:
•where the file is located
•how to manage failures
•how to break computations into pieces
•how to program for scaling
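Concretely, the programmer only writes two functions, map and reduce; the framework handles everything in the list above. A word-count sketch in plain Python (no Hadoop required, so the real job-submission API is not shown — `run_mapreduce` stands in for the framework):

```python
from collections import defaultdict

def map_fn(line):
    # map: emit a (word, 1) pair for every word in a line of input.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # reduce: sum all the counts emitted for one word.
    return word, sum(counts)

def run_mapreduce(lines):
    # The framework's job: run the mappers, shuffle their output by key,
    # then hand each key's values to the reducer.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_fn(line):
            grouped[word].append(count)
    return dict(reduce_fn(w, c) for w, c in grouped.items())

print(run_mapreduce(["big data big", "data"]))  # → {'big': 2, 'data': 2}
```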
FEATURES OF HADOOP

Main features of Hadoop:
•Works on a distributed model: it runs on numerous low-cost computers instead of a single powerful computer.
•Linux-based set of tools: it works on the Linux operating system.
TOOLS IN HADOOP

Tools in the Hadoop ecosystem:
•Sqoop
•Flume
•Oozie
•Pig
•Mahout
•HBase
•Hive
IMPLEMENTATION OF HADOOP

Companies using Hadoop include:
•Yahoo
•IBM
•Facebook
•Amazon
•American Airlines
•The New York Times
•eBay
THANK YOU…