Hadoop Basic Concepts Marian Faryna

Hadoop core concepts


Page 1: Hadoop core concepts

Hadoop Basic Concepts
Marian Faryna

Page 2: Hadoop core concepts

Buzzword Bingo

● BigData
● Ecosystem
● HDFS
● Replica factor
● Data locality
● Hive
● Data variety
● Column-based storage
● Commodity hardware
● Resource manager
● MapReduce
● High Availability
● Coordination service
● Shuffling
● Eventual consistency
● Name Node

Page 3: Hadoop core concepts

WHY HADOOP?

Page 4: Hadoop core concepts

PROBLEMS WITH RELATIONAL DATABASES

Page 5: Hadoop core concepts

HADOOP CAN:

● Process large data sets effectively
● Work with structured/unstructured data
● Process data in different modes

Page 6: Hadoop core concepts

ADDITIONAL HADOOP PROS

● High Availability
● Horizontal Scalability
● Commodity Hardware
● BASE Principle

Page 7: Hadoop core concepts

Hadoop BASE Principle:

● Basically Available
● Soft state
● Eventually consistent

Page 8: Hadoop core concepts

Who is using Hadoop

● 100-node cluster. 800 scrobbles per second, 40 million per day
● 532-node Hadoop cluster. 120 million active users, 300+ million search queries daily
● 1100 nodes processing 12 PB of stored data. 200+ million active users; 30 million users update their status at least once a day
● 1650-node cluster. 75+ million active users, 30+ million songs, 1+ billion plays per day

Page 9: Hadoop core concepts

WHAT IS HADOOP?

Page 10: Hadoop core concepts

Hadoop is an ecosystem for distributed processing and distributed storage of large data sets

Page 11: Hadoop core concepts

HADOOP Ecosystem

column-based storage

coordination service

Page 12: Hadoop core concepts
Page 13: Hadoop core concepts

New Node

Journal Node

Basically Available

Highly Available

replica factor

HDFS
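
The replica factor mentioned on this slide can be illustrated with a little arithmetic. The sketch below is not taken from the slides: the function name `hdfs_footprint` is hypothetical, and the defaults (128 MB blocks, replication factor 3) are assumed from common HDFS defaults in Hadoop 2 and later. It estimates how many blocks a file is split into and how much raw disk space its replicas occupy across the cluster.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate the block count and raw storage for one file in HDFS.

    Hypothetical helper for illustration only; the defaults are assumed
    from common HDFS settings, not stated in the slides.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different data nodes.
    block_replicas = blocks * replication
    # A partial last block only uses the space it needs, so the raw usage
    # is simply the file size multiplied by the replica factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


blocks, replicas, raw_mb = hdfs_footprint(1024)
print(blocks, replicas, raw_mb)
```

Under these assumptions a 1 GB file becomes 8 blocks and 24 block replicas, and takes about 3 GB of raw disk. This is the price paid for availability on commodity hardware: losing one node still leaves other copies of every block.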

Page 14: Hadoop core concepts

Eventual Consistency

BASE Principle

Page 15: Hadoop core concepts

Soft State

BASE Principle

Page 16: Hadoop core concepts

MAPREDUCE

Page 17: Hadoop core concepts

MapReduce algorithm

Page 18: Hadoop core concepts

MapReduce example
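
The example shown on the slide did not survive as text, so here is a minimal single-process sketch of the classic word-count job, assuming that is a fair stand-in. It only mimics the three stages (map, shuffle, reduce) in plain Python; the function names are hypothetical and this is not Hadoop's actual API, where each stage would run in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster each line (input split) would be mapped on the
    # node that already stores it; here everything runs in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big cluster"]))
```

The shuffle step is the only point where data for the same key has to meet in one place, which is why it tends to dominate network traffic in a real job.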

Page 19: Hadoop core concepts

Altogether

data locality

resource manager

Page 20: Hadoop core concepts

And remember

IT’S ALL ABOUT DATA processing

Page 21: Hadoop core concepts

Thanks! Any questions?