Hadoop Basic Concepts
Marian Faryna
Buzzword Bingo
● Big Data
● Ecosystem
● HDFS
● Replica factor
● Data locality
● Hive
● Data variety
● Column-based storage
● Commodity hardware
● Resource manager
● MapReduce
● High Availability
● Coordination service
● Shuffling
● Eventual consistency
● Name Node
WHY HADOOP?
PROBLEMS WITH RELATIONAL DATABASES
HADOOP CAN:
● Process large data sets efficiently
● Work with structured and unstructured data
● Process data in different modes
ADDITIONAL HADOOP PROS
● High Availability
● Horizontal Scalability
● Commodity Hardware
● BASE Principle
Hadoop BASE Principle:
● Basically Available
● Soft state
● Eventually consistent
Who is using Hadoop
100-node cluster. 800 scrobbles per second, 40 million per day.
Hadoop cluster consists of 532 nodes. 120 million active users, 300+ million search queries daily.
1,100 nodes processing 12 PB of stored data. 200+ million active users; 30 million users update their statuses at least once each day.
1,650-node cluster. 75+ million active users, 30+ million songs, 1+ billion plays per day.
WHAT IS HADOOP?
Hadoop is an ecosystem for the distributed storage and distributed processing of large data sets.
HADOOP Ecosystem
Column-Based Storage
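To make the row-versus-column distinction concrete, here is a minimal Python sketch that stores the same records row-wise and column-wise. The sample table, field names, and helper functions are hypothetical. It shows that an aggregate over one field only needs that field's column in the columnar layout, while the row layout visits every whole record.

```python
from typing import Any, Dict, List

# Hypothetical sample records, row-oriented: one dict per record.
rows: List[Dict[str, Any]] = [
    {"user": "alice", "country": "UA", "plays": 120},
    {"user": "bob", "country": "PL", "plays": 75},
    {"user": "carol", "country": "UA", "plays": 300},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list (column) per field."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for field_name, value in record.items():
            columns.setdefault(field_name, []).append(value)
    return columns


def total_plays_row_store(records: List[Dict[str, Any]]) -> int:
    # Every whole record is visited, even though only "plays" is needed.
    return sum(record["plays"] for record in records)


def total_plays_column_store(columns: Dict[str, List[Any]]) -> int:
    # Only the "plays" column is read; "user" and "country" are never touched.
    return sum(columns["plays"])


columns = to_columns(rows)
print(columns["plays"])                   # [120, 75, 300]
print(total_plays_row_store(rows))        # 495
print(total_plays_column_store(columns))  # 495
```

Real columnar formats used with Hive (for example ORC or Parquet) also compress and encode each column separately; this sketch models only the layout.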
Coordination Service
Name Node
Journal Node
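In an HDFS High Availability setup, the active NameNode writes its edit log to a group of JournalNodes, and an edit counts as committed once a majority of them acknowledge it. This is why an odd number of JournalNodes (at least three) is recommended. The Python sketch below models only that majority rule; the function names are hypothetical and nothing here talks to a real cluster.

```python
from typing import List


def majority(n: int) -> int:
    """Smallest number of acknowledgements that forms a majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many of n JournalNodes may be down while a majority can still acknowledge."""
    return (n - 1) // 2


def edit_is_committed(acks: List[bool]) -> bool:
    """acks[i] is True if JournalNode i wrote the edit and acknowledged it.

    The edit counts as committed only when a majority of all JournalNodes acknowledged it.
    """
    return sum(acks) >= majority(len(acks))


print(tolerated_failures(3))                    # 1
print(tolerated_failures(5))                    # 2
print(edit_is_committed([True, True, False]))   # True
print(edit_is_committed([True, False, False]))  # False
```

With three JournalNodes, one can fail without blocking writes; adding a fourth does not raise that limit, which is why even counts bring no benefit.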
Basically Available
Highly Available
Replica Factor
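The replica factor controls how many copies of each HDFS block are kept. A minimal Python sketch, assuming the defaults of recent Hadoop versions (128 MB blocks via `dfs.blocksize`, replication 3 via `dfs.replication`), splits a file into blocks and assigns each block to three distinct DataNodes. The placement here is a simple round-robin chosen for illustration; real HDFS uses a rack-aware policy instead. Node names and helper functions are hypothetical.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size (dfs.blocksize default in Hadoop 2.x and later)
REPLICATION = 3      # assumed replica factor (dfs.replication default)


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the sizes of the blocks a file of file_size_mb is split into."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb  # last block holds only the remainder
    return sizes


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (round-robin, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("replica factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(n_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = split_into_blocks(300)
placement = place_replicas(len(blocks), nodes)
print(blocks)                     # [128, 128, 44]
print(placement)                  # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
print(sum(blocks) * REPLICATION)  # 900 (MB of raw disk for a 300 MB file)
```

Losing any single node still leaves at least two copies of every block, at the cost of tripling raw storage.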
HDFS
Eventual Consistency
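Eventual consistency means replicas may disagree for a while but converge once updates propagate. The Python sketch below is a generic illustration of that idea, not Hadoop code: two hypothetical replicas of a key-value store exchange entries in an anti-entropy step and resolve conflicts with a "last write wins" rule based on an assumed logical timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """One copy of a key-value store; each value carries a logical timestamp."""
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy step: adopt any entry from `other` that is newer (last write wins)."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)


a, b = Replica("a"), Replica("b")
a.write("status", "online", ts=1)  # the write lands on replica a only
print(b.read("status"))            # None: b is temporarily stale (soft state)
b.sync_from(a)
a.sync_from(b)
print(a.read("status"), b.read("status"))  # online online: the replicas have converged
```

Between the write and the sync, a reader of replica `b` sees old data; that window is the trade-off BASE accepts in exchange for availability.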
BASE Principle
Soft State
MAPREDUCE
MapReduce algorithm
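The three phases of the MapReduce algorithm (map, shuffle, reduce) can be sketched in memory in a few lines of Python. The function `map_reduce` and the sample temperature data are hypothetical; a real Hadoop job runs the same phases in parallel across the cluster and moves the shuffled data over the network.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[Any],
    map_fn: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reduce_fn: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    # Map phase: each input record independently yields zero or more (key, value) pairs.
    intermediate: List[Tuple[Any, Any]] = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Shuffle phase: group every value under its key, so that all values
    # for one key end up at the same reducer.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: combine each key's values; keys are visited in sorted order,
    # mirroring how a Hadoop reducer receives its keys sorted.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# Usage (hypothetical data): maximum temperature per year from "year,temp" lines.
readings = ["1949,111", "1950,22", "1949,78", "1950,-11"]


def parse(line: str):
    year, temp = line.split(",")
    yield year, int(temp)


result = map_reduce(readings, parse, lambda year, temps: max(temps))
print(result)  # {'1949': 111, '1950': 22}
```

Because `map_fn` looks at one record at a time and `reduce_fn` at one key at a time, both phases can be split across many machines without coordination.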
MapReduce example
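Word count is the classic MapReduce example. A minimal Python sketch, using hypothetical input lines, writes the mapper and reducer as separate functions and emulates Hadoop's shuffle-and-sort step by sorting the intermediate pairs so equal keys become adjacent before grouping.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum all partial counts for one word."""
    return word, sum(counts)


def word_count(lines: List[str]) -> List[Tuple[str, int]]:
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # shuffle and sort: bring equal keys together
    return [
        reducer(word, [count for _, count in group])
        for word, group in groupby(pairs, key=itemgetter(0))
    ]


docs = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
print(word_count(docs))  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```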
Altogether
Data Locality
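Data locality means moving computation to the data instead of moving data to the computation. Hadoop schedulers distinguish node-local, rack-local, and off-rack task placement. The Python sketch below is a simplified, hypothetical version of that preference order for one map task reading one block; the node and rack names are assumptions.

```python
from typing import Dict, List, Optional, Set


def choose_node(block_replicas: Set[str], free_nodes: List[str],
                rack_of: Dict[str, str]) -> Optional[str]:
    """Pick a node for a map task that reads one block.

    Preference order: node-local (the node holds a replica), then rack-local
    (same rack as some replica), then any free node.
    """
    for node in free_nodes:
        if node in block_replicas:
            return node  # node-local: the block is read from local disk
    replica_racks = {rack_of[n] for n in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node  # rack-local: data crosses only the rack switch
    return free_nodes[0] if free_nodes else None  # off-rack, or no free node at all


rack_of = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2", "dn5": "rack3"}
replicas = {"dn1", "dn3"}
print(choose_node(replicas, ["dn4", "dn3"], rack_of))  # dn3 (node-local)
print(choose_node(replicas, ["dn5", "dn2"], rack_of))  # dn2 (rack-local)
print(choose_node(replicas, ["dn5"], rack_of))         # dn5 (off-rack)
```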
Resource Manager
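A resource manager grants applications containers (bundles of memory and CPU) on worker nodes. In YARN this is done by pluggable schedulers such as the Capacity or Fair Scheduler; the Python sketch below swaps in plain first-fit allocation to show the bookkeeping only. The class and function names and all capacities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    app: str
    memory_mb: int
    vcores: int


def allocate(request: ContainerRequest, nodes: List[NodeManager]) -> Optional[str]:
    """First-fit: grant the request on the first node with enough free capacity."""
    for node in nodes:
        if node.free_memory_mb >= request.memory_mb and node.free_vcores >= request.vcores:
            node.free_memory_mb -= request.memory_mb
            node.free_vcores -= request.vcores
            return node.name
    return None  # no capacity right now; a real scheduler would queue the request


nodes = [NodeManager("nm1", 4096, 2), NodeManager("nm2", 8192, 4)]
requests = [
    ContainerRequest("app1", 2048, 1),
    ContainerRequest("app1", 4096, 2),
    ContainerRequest("app2", 4096, 2),
    ContainerRequest("app2", 4096, 2),
]
print([allocate(r, nodes) for r in requests])  # ['nm1', 'nm2', 'nm2', None]
```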
IT’S ALL ABOUT DATA PROCESSING
And remember
Thanks!
Any questions?