2014.08.12.ATPESC data analyticspress3.mcs.anl.gov/computingschool/files/2014/01/...Aug 12, 2014 · @Facebook 8/12/2014 ATPESC Avery Ching. Make the world more open & connected

Exascale Data Analytics@Facebook

8/12/2014 ATPESC

Avery Ching

Make the world more open & connected

How are analytics used?

DATA

Ads Targeting

Site Integrity

News FeedProduct Insights (Ads, Pages, Platform)

Recommendations

People-you-may-know Pages-you-may-like

A/B Test Experimentation

Framework

Graph Search

DATA

Ads Targeting

Site Integrity

News FeedProduct Insights (Ads, Pages, Platform)

Recommendations

People-you-may-know Pages-you-may-like

A/B Test Experimentation

Framework

Graph Search

>1.23B monthly

active users

4.75B content items shared daily

350M photos

uploaded daily

10B messages

daily

Evolution of the Warehouse

2007!

!

=>Relational Database

2011!

Sophisticated Hadoop Ecosystem

!

Log(Category,Event,Parameters)

Social

User contentTransform

Graph Search index

2014

Warehouse !!!!

600 TB/day 10PB/day

300PB Project 1 EB “soon”

2014Global Data Warehouse

Today

Data Ingestion

Scrapes

Log line reaches central storage (10s)

Log Storage

Copier-Loader

Log line reaches warehouse (1hr)

WarehouseMySQL

Log line generated: <user_id, photo_id>

Scribe

Core Analytics

Hive Query Compiler

Execution Engine

Corona Map-Reduce

Job Scheduler Resource Manager

Data Pipelines

HDFS !!!

Social Graph

User ContentTransform

Graph Search

Real-Time Stream Processing

Warehouse

PumaLog Storage

Stream processed (<1min)

Application Insights20M

15M

10M

5M

0M

AUG 8 AUG 10 AUG 12 AUG 14 AUG 16 AUG 18 AUG 20 AUG 22 AUG 24 AUG 26 AUG 28 AUG 30 SEP 1 SEP 3

Act

ive

Use

rs

Daily Active People Weekly Active People Monthly Active People

Real-Time Analysis

ScubaODS

Fast Answers

Query Parser Planner and Scheduler

HDFS !!!

PRESTO

Distributed SQL Execution Engine

Product Engineer

Business Analyst

Data Scientist

http://prestodb.io/

http://prestodb.io

Graph Analytics Graph & ML Apps

Giraph Platform

HDFS !!!

Corona Job Scheduler & Resource Manager

https://giraph.apache.org/

https://giraph.apache.org

Putting it all together

Scribe

Puma Streaming Analytics

Hive Query Compiler

Execution Engine

Corona Map-Reduce

Scheduler and Resource Manager

HDFS !!

HDFS !!

HDFS !!

Data Pipelines Giraph Graph

Analytics

Presto Interactive

Query

Scuba Real-time

Slice & Dice

Scaling the Platform

Storage

Not all Data is Equal

0

100000

300000

250000

150000

200000

50000

60 120

Partition Age

Nu

mb

er o

f acc

esse

s/d

ay

Data LifecycleHot Data Cold DataWarm Data

Format: Encoding, Compression, Sorting

Replication: Erasure Codes

Hardware: Storage Hierarchy

Storage Format

Application Input Stream Load/Transform

Hive Storage Format

HDFS Files

RcFile

High-availability

Sync Header

101 102 103 104 105

111 112 113 114 115

121 122 123 124 125

131 132 133 134 135

A B C D...

101 111 121 131

102 112 122 132

103 113 123 133

104 114 124 134

105 115 105 135

...

Table RCFile

Column Encodings

User Action Country

18 like ID

17 post BR

90 like US

38 comment BR

23 comment US

42 like US

Index Key

0 comment

1 friend

2 like

3 photo

4 post

2

4

2

0

0

2

* ORCFile developed in collaboration with HortonWorks

In practice

• Type-agnostic Encodings 20%

80%

• Adaptive Encoding

5x => 8x

• CPU/Space tradeoff

Reader and Writer PerformanceReads and writes need to be fast

Amortize Decisions

Memory Management

Lazy Operations

Experimental Results

https://github.com/facebook/hive-dwrf

https://github.com/facebook/hive-dwrf

Data LifecycleHot Data Cold DataWarm Data

Format: Encoding, Compression, Sorting

Replication: Erasure Codes

Hardware: Storage Hierarchy

Tying it all together

0

80

40

20

60

Sep-11

Dec-11

Mar-1

2

Jun-12

Dec-12

Mar-1

3

Sep-12

Simulated: 25 PB raw input data, 8% MOM growth Base physical = RCFile + 3x replication Optimized physical = ORCFile + Sorting + Erasure codes (applied gradually)

Base physical size !Optimized physical size

Compute

MapReduce == Easy parallel programming

!

Data partitioning / parallel processing

Scheduling with locality

Failure handling

MapReduce - Word count example

map(Object key, Text value) { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { context.write(new Text(itr.nextToken()), new IntWritable(1)); } }

Input: “An elephant is an elephant” Output: {an, 1} {elephant, 1} {is, 1} {an, 1} {elephant, 1}

Map (load text lines from TextInputFormat)

reduce(Text key, Iterable<IntWritable> values) { int sum = 0; for (IntWritable val : values) { sum += values.get(); } context.write(key, new IntWritable(sum)); }

Input: Map output pairs Output: {an, 2} {elephant, 2} {is, 1}

Reduce (Store text lines in TextOutputFormat)

Hadoop (v1)

Resource Management

Job ManagementJob Client

Task Tracker

1

Task Tracker

0

Task Tracker

2

M0 R0

M1 R1

M0

M1 R1

M0

M1

Job Tracker

R0

R1

R0

Temporary Solution: Queue up Jobs

Please take

a Number

9

Hadoop (v1)

Resource Management

Job ManagementJob Client

Task Tracker

1

Task Tracker

0

Task Tracker

2

M0 R0

M1 R1

M0

M1 R1

M0

M1

Job Tracker

R0

R1

R0

Corona

Resource ManagementJob Client

Task Tracker

1

Task Tracker

0

Task Tracker

2

M0 R0

M1 R1

M0

M1 R1

M0

M1

R0

R1

R0

Cluster Mgr

Resource Management

Job Management

Job Client

Task Tracker

1

Task Tracker

0

Task Tracker

2

M0 R0

M1 R1

M0

M1 R1

M0

M1

R0

R1

R0

Cluster Mgr

J0 J1 J1J0J1J0

J

Corona

Multi-Tenancy Challenges800+ users, 10’s of teams

Ads, BI, Data Science, Feed, Growth, Infrastructure etc.

!

!

!

!

!

!

!

!

bi.critical bi.nonslabi.sla bi.adhoc

Multi-Tenancy Challenges!

What happens to resources when pool is under-utilized?

bi

critical nonslasla adhoc

Pool Groups

Pool Priorities

!

Many other Improvements!

Resource Sandboxing

Online Upgrades

Restartable Job Trackers

Corona can scale to ...!

4000+ node Clusters

120K+ Jobs/Day

20m+ Tasks Completed/Day

What’s next?

!

Multi-Temperate Data

Seamless Multi-Namespace Query

Distributed Machine Learning

Questions?

Documents

2014.08.12.ATPESC data analyticspress3.mcs.anl.gov/computingschool/files/2014/01/...Aug 12, 2014 · @Facebook 8/12/2014 ATPESC Avery Ching. Make the world more open & connected