November 12, 2014 | Las Vegas, NV

Eva Tse, Netflix

(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014

DESCRIPTION

As Netflix expands its service to more countries, devices, and content, it continues to evolve its big data analytics platform to meet the growing need for product and consumer insights. This year, Netflix overhauled the platform: it upgraded to Hadoop 2, transitioned to the Parquet file format, experimented with Pig on Tez for the ETL workload, and adopted Presto as its interactive query engine. In this session, Netflix discusses its latest architecture, how it was built on the Amazon EMR infrastructure, the contributions made to the open source community, and some performance numbers for running a big data warehouse on Amazon S3.

Page 9:

[Diagram: data ingestion into Amazon S3. Event data: cloud apps → Suro → Ursula → Amazon S3 (every 15 min). Dimension data: Cassandra → SS tables → Aegisthus → Amazon S3 (daily).]

Page 10:

[Diagram: platform architecture with Storage, Compute, Service, and Tools layers; Amazon S3 is the storage layer]

Page 11:

[Diagram: v2.0 platform architecture with Storage, Compute, Service, and Tools layers; Amazon S3 is the storage layer]

Page 14:

• Works well on Amazon Simple Storage Service (S3)

Page 19:

YARN-1864

YARN-2026

YARN-2012

YARN-2214

YARN-2360

YARN-2540

Page 24:

[Diagram: Pig compilation pipeline. Logical Plan → Physical Plan, then either MRCompiler → MR Plan → MR Execution Engine, or TezCompiler → Tez Plan → Tez Execution Engine.]

Page 27:

A Distributed SQL Query Engine for Big Data

Page 28:

techblog.netflix.com

Page 31:

21 committed PRs and 14 PRs in review

Page 36:

techblog.netflix.com

Page 44:

[Diagram: v2.0 platform architecture with Storage, Compute, Service, and Tools layers; Amazon S3 is the storage layer]

Page 49:

YARN-1864

YARN-2026

YARN-2012

YARN-2214

YARN-2360

YARN-2540

HIVE-6783

HIVE-6785

HIVE-6938

HIVE-7800

PARQUET-100

PARQUET-106

PARQUET-2

PARQUET-22

PARQUET-70

PARQUET-75

PARQUET-92

PARQUET-99

PIG-3986

Page 59:

Talk      Time                Title
PFC-305   Wednesday, 1:15pm   Embracing Failure: Fault Injection and Service Reliability
BDT-403   Wednesday, 2:15pm   Next Generation Big Data Platform at Netflix
PFC-306   Wednesday, 3:30pm   Performance Tuning EC2
DEV-309   Wednesday, 3:30pm   From Asgard to Zuul, How Netflix’s proven Open Source Tools Can Accelerate and Scale Your Services
ARC-317   Wednesday, 4:30pm   Maintaining a Resilient Front-Door at Massive Scale
PFC-304   Wednesday, 4:30pm   Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209   Wednesday, 4:30pm   Cloud Migration, Dev-Ops and Distributed Systems
APP-310   Friday, 9:00am      Scheduling using Apache Mesos in the Cloud

Page 60:

http://bit.ly/awsevals