53

Netflix viewing data architecture evolution - QCon 2014

Embed Size (px)

DESCRIPTION

Netflix's architecture for viewing data has evolved as streaming usage has grown. Each generation was designed for the next order of magnitude, and was informed by learnings from the previous. From SQL to NoSQL, from data center to cloud, from proprietary to open source, look inside to learn how this system has evolved. (from talk given at QConSF 2014)

Citation preview

Page 1: Netflix viewing data architecture evolution - QCon 2014
Page 2: Netflix viewing data architecture evolution - QCon 2014

Please read the notes associated with each slide for the full context of the

presentation

Page 3: Netflix viewing data architecture evolution - QCon 2014

Who am I?

Philip Fisher-Ogden• Director of Engineering @

Netflix

• Playback Services (making “click play” work)

• 6 years @ Netflix, from 10 servers to 10,000s

Page 4: Netflix viewing data architecture evolution - QCon 2014

Story

Netflix streaming – 2007 to present

Page 5: Netflix viewing data architecture evolution - QCon 2014

Device Growth

20071 device

200810s of devices

200910s of devices

2010100s of devices

2011+1000+ devices

Page 6: Netflix viewing data architecture evolution - QCon 2014

Experience Evolution

Page 7: Netflix viewing data architecture evolution - QCon 2014

Subscribers & Viewing

53M global subscribers

50 countries

>2 billion hours viewed per month

Page 8: Netflix viewing data architecture evolution - QCon 2014

Improved Personalization

Better Experience

Viewing

Virtuous Cycle

Page 9: Netflix viewing data architecture evolution - QCon 2014

Viewing Data

Who, What, When, Where, How Long

Page 10: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

What have I watched?

Page 11: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

Where was I at?

Page 12: Netflix viewing data architecture evolution - QCon 2014

Real time data use cases

What else am I watching?

Page 13: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

Page 14: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

Page 15: Netflix viewing data architecture evolution - QCon 2014

Active Sessions

Last Position

Viewing History

Data Feed

Generic Architecture

Start Stop

Collect

ProcessStream State

Session Summary

Event Stream

Provide

Page 16: Netflix viewing data architecture evolution - QCon 2014

Architecture Evolution

• Different generations

• Pain points & learnings

• Re-architecture motivations

Page 17: Netflix viewing data architecture evolution - QCon 2014

Real Time Data

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 18: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 19: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1

Start Stop

SessionsLogs / Events

History / Position

SQL

Page 20: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 1 pain points

• Scalability

– DB scaled up not out

• Event Data Analytics

– ad hoc

• Fixed schema

Page 21: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 22: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2 motivations

• Scalability

– Scale out not up

• Flexible schema

– Key/value attributes

• Service oriented

Page 23: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2

Start Stop

No

SQL

50 data partitions

Viewing Service

Page 24: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 2 pain points

• Scale out

– Resharding was painful

• Performance

– Hot spots

• Disaster Recovery

– SimpleDB had no backups

Page 25: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3

2007 2009 20102008 2011 2012 2013 2014 Future

SQL

No

SQL

Cac

hin

g

redismemcached

Page 26: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 landscape

• Cassandra 0.6

• Before SSDs in AWS

• Netflix in 1 AWS region

Page 27: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 motivations

• Order of magnitude increase in requests

• Scalability

– Actually scale out rather than up

Page 28: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

StatelessTier

(fallback)

Sessions

Viewing History

Mem

cach

ed

Page 29: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Start

Stop

Page 30: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

Page 31: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

update

Page 32: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

snapshot

Sessions

Page 33: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 writes

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

Start

Stop

Viewing History

Mem

cach

ed

Page 34: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

What have I

watched?

Viewing History

Mem

cach

ed

View Summary

Page 35: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

Latest PositionsWhere

was I at?

Viewing History

StatelessTier

(fallback)

Mem

cach

ed

Page 36: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 reads

Vie

win

g Se

rvic

e

StatefulTier

What else am I

watching?

Active Sessions

Page 37: Netflix viewing data architecture evolution - QCon 2014

gen 3 - Requests ScaleOperation Scale

Create (start streaming) 1,000s per second

Update (heartbeat, close) 100,000s per second

Append (session events/logs) 10,000s per second

Read viewing history 10,000s per second

Read latest position 100,000s per second

Page 38: Netflix viewing data architecture evolution - QCon 2014

gen 3 – Cluster ScaleCluster Scale

Cassandra Viewing History ~100 hi1.4xl nodes~48 TB total space used

Viewing Service Stateful Tier ~1700 r3.2xl nodes50GB heap memory per node

Memcached ~450 r3.2xl/xl nodes~8TB memory used

Page 39: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 pain points

• Stateful tier

– Hot spots

– Multi-region complexity

• Monolithic service

• read-modify-write poorly suited for memcached

Page 40: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 3 learnings

• Distributed stateful systems are hard

– Go stateless, use C*/memcached/redis…

• Decompose into microservices

Page 41: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Vie

win

g Se

rvic

e

StatefulTier

0

1

n-2

n-1

Active Sessions

Latest Positions

View Summary

StatelessTier

(fallback)

Viewing History

Sessions

Mem

cach

ed

Page 42: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Stream State/Event Collectors

Stateless Microservices

Data Processors

Data Services Data Feeds

Page 43: Netflix viewing data architecture evolution - QCon 2014

Real Time Data – gen 4

Viewing History

Session State Session Positions

Session Events

Data Tiers

red

isre

dis

Page 44: Netflix viewing data architecture evolution - QCon 2014

Session Analytics

• Summarize detailed event data

• Non-real time, but near real time

• Some shared logic with real time

Page 45: Netflix viewing data architecture evolution - QCon 2014

Session Analytics - Processing

2007 2009 20102008 2011 2012 2013 2014 Future

CustomService

(Java on AWS)

Mantis

Bat

ch

Nea

r R

eal-

Tim

e

Stre

am

Pro

cess

ing

Page 46: Netflix viewing data architecture evolution - QCon 2014

Session Analytics - Storage

2007 2009 20102008 2011 2012 2013 2014 Future

Bat

ch

Nea

r R

eal-

Tim

e

Stre

am

Pro

cess

ing

Page 47: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 1

• Storage • Processing

Sess

ion

sLo

gs

Page 48: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 1 pain points

• MapReduce good for batch

– Not for near real time

• Complexity

– Code in 2 systems / frameworks

– Operational burden of 2 systems

Page 49: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 2

• Storage • Processing

Session Events & Logs

Java

Page 50: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 2 learnings

• Reduced complexity

– shared code and ops

• Batch still available

• New bottleneck

– harder to extend logic

Page 51: Netflix viewing data architecture evolution - QCon 2014

Session Analytics – gen 3 (*)

• Storage • Processing

Mantis

Storm

Samza

Spark Streaming

Stream Processing Frameworks

Page 52: Netflix viewing data architecture evolution - QCon 2014

Takeaways

• Polyglot Persistence– One size fits all doesn’t

fit all

• Strong opinions, loosely held– Design for long term, but

be open to redesigns

Page 53: Netflix viewing data architecture evolution - QCon 2014

Thanks!

@philip_pfo