Introducing Big Data


Page 1: Introducing Big Data

Pravin Singh

introducing

BIG DATA

Page 2: Introducing Big Data

WHAT THE HECK IS BIG DATA?

Any collection of data sets so large and complex that it becomes difficult to process using current data management tools or traditional data processing applications.

Volume

• Exceeds physical limits of vertical scalability

Velocity

• Decision window small due to data change rate

Variety

• Many different formats make integration expensive

Page 3: Introducing Big Data

WHY SO LOW-COST?

Source: EMC

Page 4: Introducing Big Data

WHY SO LOW-COST?

Source: EMC

Page 5: Introducing Big Data

WHY SO FAST?

• Massive Parallel Processing

• Data Locality

• Optimized for write once – read many

• Sequential reads, not random access

Page 6: Introducing Big Data

1 Hello Hadoop!

You have an interesting name.

Page 7: Introducing Big Data

Hadoop Architecture

Source: Hortonworks

Page 8: Introducing Big Data

The Hadoop Zoo

HDFS

MapReduce

Pig Hive HCat Giraph Mahout

Zookeeper

Page 9: Introducing Big Data

The Real Simple Hadoop Architecture

MapReduce Engine

• JobTracker

• TaskTracker 1, TaskTracker 2, …, TaskTracker N

HDFS Cluster

• NameNode

• DataNode 1, DataNode 2, …, DataNode N

Page 10: Introducing Big Data

2 Hello HDFS!

Have we met before?

Page 11: Introducing Big Data

HDFS

My Data.txt (150 MB)

• Split into blocks: 64 MB, 64 MB, 22 MB

• Name Node

• Block copies shown in the diagram: 64 MB, 64 MB; 64 MB, 64 MB; 22 MB, 22 MB
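The split shown on this slide is plain arithmetic, and a minimal Python sketch can reproduce it. The function name `split_into_blocks` is hypothetical, and the 64 MB block size is simply the value shown on the slide; a real cluster may be configured with a different block size.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file would be cut into.

    Illustrative only: this mimics the arithmetic on the slide, not the
    actual HDFS client code.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever is left over.
        blocks.append(remainder)
    return blocks


print(split_into_blocks(150))  # [64, 64, 22]
```

The last block is smaller than the others because it holds only the 22 MB left after two full 64 MB blocks.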

Page 12: Introducing Big Data

3 Hello MapReduce!

Have you lost some weight?

Page 13: Introducing Big Data

MapReduce

Input File → Map → <Key, Value> pairs → Shuffle & Sort → <Key, Value> pairs → Reduce → Result
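The flow on this slide can be sketched in a few lines of single-process Python. This is a toy illustration of the three stages, not Hadoop's API: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `run_job`) and the sample input lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group values by key and return groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run the three stages in order over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle_and_sort(mapped))


result = run_job(["hadoop stores data", "hadoop processes data"])
print(result)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster the map and reduce calls run in parallel on different machines; here they run one after another so the data flow is easy to follow.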

Page 14: Introducing Big Data

MapReduce

Big Data for Dummies.txt

How many times do the words “Big data” and “Hadoop” show up?

Page 15: Introducing Big Data

MapReduce

<Big data, 7> <Hadoop, 4>

<Big data, 9> <Hadoop, 6>

<Big data, 3> <Hadoop, 8>

<Big data, 7> <Big data, 9> <Big data, 3> <Hadoop, 4> <Hadoop, 6> <Hadoop, 8>

<Big data, 7, 9, 3> <Hadoop, 4, 6, 8>

<Big data, 19> <Hadoop, 18>
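The numbers on this slide can be checked with a short Python sketch. It assumes the first three lines are the outputs of three separate map tasks, one per chunk of the input file; the variable names are hypothetical, and the code only illustrates the data flow rather than running on Hadoop.

```python
from collections import defaultdict

# Assumed reading of the slide: one list of <Key, Value> pairs per map task.
map_outputs = [
    [("Big data", 7), ("Hadoop", 4)],
    [("Big data", 9), ("Hadoop", 6)],
    [("Big data", 3), ("Hadoop", 8)],
]

# Shuffle & sort: gather every pair and order them by key.
# Python's sort is stable, so values keep their original order per key.
all_pairs = [pair for output in map_outputs for pair in output]
shuffled = sorted(all_pairs, key=lambda pair: pair[0])

# Group the values that share a key.
grouped = defaultdict(list)
for key, value in shuffled:
    grouped[key].append(value)

# Reduce: sum the grouped values for each key.
totals = {key: sum(values) for key, values in grouped.items()}

print(dict(grouped))  # {'Big data': [7, 9, 3], 'Hadoop': [4, 6, 8]}
print(totals)         # {'Big data': 19, 'Hadoop': 18}
```

The totals match the last line of the slide: 7 + 9 + 3 = 19 and 4 + 6 + 8 = 18.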

Page 16: Introducing Big Data

Let’s Play MapReduce!

’Coz All Talk and No Play Makes a Session a Dull Affair.

Page 17: Introducing Big Data

Questions? Comments? Feedback?

Page 18: Introducing Big Data

See you at the (Data) Lake Next Time.

THANK YOU!