Near line systems to improve Netflix recommendations Gopal Krishnan Feb 2015


Page 1: Nearline systems to improve Netflix recommendations

Near line systems to improve Netflix recommendations

Gopal Krishnan

Feb 2015

Page 2: Nearline systems to improve Netflix recommendations

About me

Gopal Krishnan

Director, Consumer Science Engineering

Netflix, Inc.

Driving innovation by AB testing the member experience.

Twitter: @sgkrishnan

LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905

Page 3: Nearline systems to improve Netflix recommendations

Netflix: global streaming video service for TV and movies

Page 4: Nearline systems to improve Netflix recommendations

Netflix is available on 1000+ devices

Page 5: Nearline systems to improve Netflix recommendations

More than 57M members globally

• In more than 50 countries

• Planning to launch in all (200+) countries in 2 years.

Page 6: Nearline systems to improve Netflix recommendations

Netflix consumes 34% of peak downstream bandwidth in North America

Page 7: Nearline systems to improve Netflix recommendations

Netflix consumes 6% of peak upstream bandwidth in North America

Page 8: Nearline systems to improve Netflix recommendations

What does my team do?

• Help improve the rate of innovation by AB testing changes to the member experience

• Infrastructure for algorithmic support

– Feature value store to help model training

– Services to store and serve explicit data sources

– Services to collect, process, validate, and serve implicit data sources

– Caching services

• Data improves our understanding of end to end user behavior

Page 9: Nearline systems to improve Netflix recommendations

Every part of Netflix is personalized

Page 10: Nearline systems to improve Netflix recommendations

Every part of Netflix is personalized

Page 11: Nearline systems to improve Netflix recommendations

Every part of Netflix is personalized

Page 12: Nearline systems to improve Netflix recommendations

NETFLIX RECOMMENDATIONS WITH ONLINE MICRO SERVICES

Page 13: Nearline systems to improve Netflix recommendations

Life Cycle of Netflix Recommendation Data

Devices

Data Collection

Offline Big Data Analysis

Netflix recommendation: online services

Netflix API

Netflix beacon telemetry

Page 14: Nearline systems to improve Netflix recommendations

Data Collection: explicit inputs

Plays

Star ratings

Page 15: Nearline systems to improve Netflix recommendations

Data Collection: explicit inputs

Page 16: Nearline systems to improve Netflix recommendations

Data Collection: explicit inputs

Virtual plays from new user on-boarding

Page 17: Nearline systems to improve Netflix recommendations

Outputs from offline analysis

Devices

Data Collection

Offline Big Data Analysis

Netflix recommendation: online services

Netflix API

Netflix beacon telemetry

“Implicit” Data Services

Popularity Targeting

User clustering

Page 18: Nearline systems to improve Netflix recommendations

Recommendations combine both online and aggregated offline data

Devices

Data Collection

Offline Big Data Analysis

Netflix recommendation: online services

Netflix API

Netflix beacon telemetry

“Explicit” Data Services

My List On Ramp

Taste pref

“Implicit” Data Services

Popularity Targeting

User clustering

Page 19: Nearline systems to improve Netflix recommendations

WHY BOTHER WITH NEAR LINE SYSTEMS THEN?

Page 20: Nearline systems to improve Netflix recommendations

Our algorithms became too complex to compute online without adding latency.

Near line systems improve our availability story.

Near line systems allow us to innovate at a greater velocity.

Page 21: Nearline systems to improve Netflix recommendations

Near line systems improve agility and availability

Devices

Data Collection

Big Data Analysis (Hadoop, Teradata)

Netflix recommendation: online services

Pre-computed recommendations

“Explicit” Data Services

“Implicit” Data Services

Post-process at run time

Page 22: Nearline systems to improve Netflix recommendations

Manhattan pre-compute engine

Manhattan: Netflix pre-compute engine

Video Ranker

Row selection

Similars

Top picks
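
The pre-compute-then-post-process pattern from the two slides above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `PRECOMPUTED` store, the `FALLBACK` list, and the `recommend` function are hypothetical names, not part of Manhattan or any Netflix API.

```python
from typing import Dict, List, Set

# Hypothetical store of rankings produced ahead of time by a pre-compute job.
PRECOMPUTED: Dict[str, List[str]] = {
    "member-1": ["video-a", "video-b", "video-c", "video-d"],
}

# Hypothetical non-personalized fallback, used when nothing was pre-computed.
FALLBACK: List[str] = ["video-x", "video-y", "video-z"]


def recommend(member_id: str, already_played: Set[str], limit: int = 3) -> List[str]:
    """Serve a pre-computed ranking, post-processed at request time."""
    # The expensive ranking was done earlier; the online path is a lookup.
    ranked = PRECOMPUTED.get(member_id, FALLBACK)
    # Cheap run-time step: drop titles played since the pre-compute ran.
    fresh = [video for video in ranked if video not in already_played]
    return fresh[:limit]
```

Because the online path only performs a lookup and a filter, it stays fast, and the fallback keeps the service answering even when no pre-computed result exists for a member.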

Page 23: Nearline systems to improve Netflix recommendations

What data would improve recommendations even further?

Page 24: Nearline systems to improve Netflix recommendations

All UI Events from all key platforms

• Moving beyond explicit inputs from users, we would like to track all member activity to derive deeper insights.

• Challenges include:

– 1000s of device platforms

– Non-standardized UIs across different platforms

– Lack of earlier focus on tracking the browse experience

Page 25: Nearline systems to improve Netflix recommendations

Patterns arise in aggregate

Page 26: Nearline systems to improve Netflix recommendations

Challenges with collecting UI Events

• Consistent data semantics across lots of device and UI platforms.

• Scaling to handle billions of events.

• Near real-time semantic data quality and validation

• Dealing with data loss (low power devices, loss at the network, etc.)

Page 27: Nearline systems to improve Netflix recommendations

Canaries for data quality

Near real time feedback and validation on data quality.
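
One way such a canary check might look, as a rough sketch only: a batch of UI events from a canary device is validated against a set of required fields, and the canary is flagged when too many events fail. The field names and the 1% threshold are assumptions made for illustration; the talk does not describe the actual schema or rules.

```python
from typing import Dict, Iterable, List

# Hypothetical required fields for a UI event; the real schema is not given.
REQUIRED_FIELDS = ("device_type", "event_type", "timestamp_ms")


def is_valid(event: Dict[str, object]) -> bool:
    """An event passes if every required field is present and not None."""
    return all(event.get(field) is not None for field in REQUIRED_FIELDS)


def invalid_fraction(events: Iterable[Dict[str, object]]) -> float:
    """Share of events in a batch that fail validation (0.0 if empty)."""
    batch: List[Dict[str, object]] = list(events)
    if not batch:
        return 0.0
    bad = sum(1 for event in batch if not is_valid(event))
    return bad / len(batch)


def canary_healthy(events: Iterable[Dict[str, object]],
                   max_invalid: float = 0.01) -> bool:
    """True when the invalid share stays at or below the assumed threshold."""
    return invalid_fraction(events) <= max_invalid
```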

Page 28: Nearline systems to improve Netflix recommendations

“Trending” on Netflix

Now being AB tested

Page 29: Nearline systems to improve Netflix recommendations

Near line systems for Netflix recommendations

Devices

Data Collection

Big Data Analysis (Hadoop, Teradata)

Netflix recommendation: online services

Pre-computed recommendations

“Explicit” Data Services

“Implicit” Data Services

Post-process at run time

Near line data processing and serving systems

Page 30: Nearline systems to improve Netflix recommendations

“Trending on Netflix” near line system

Take rates (play/impression), Kafka stream

Cassandra

dashboards

Stream processing (ETA: low number of minutes)

Play starts (Kafka stream)

1000s / sec

Impressions (Kafka stream)

millions / sec
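
The take rate named on this slide is plays divided by impressions. A minimal sketch of computing it over one time window is shown below in plain Python, standing in for the real stream processor. The `(timestamp, video id)` event shape and the `take_rates` function are assumptions for illustration, not the production job.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, video id); assumed shape


def take_rates(plays: Iterable[Event], impressions: Iterable[Event],
               window_start: int, window_seconds: int) -> Dict[str, float]:
    """Take rate (plays / impressions) per video within one time window."""
    window_end = window_start + window_seconds

    def count(events: Iterable[Event]) -> Counter:
        # Keep only events whose timestamp falls inside the window.
        return Counter(video for ts, video in events
                       if window_start <= ts < window_end)

    play_counts = count(plays)
    impression_counts = count(impressions)
    # Videos with no impressions in the window are skipped, which also
    # avoids dividing by zero.
    return {video: play_counts[video] / shown
            for video, shown in impression_counts.items()}
```

In a streaming setting the same calculation would run once per window over small batches, with the two counters fed by the play and impression streams.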

Page 31: Nearline systems to improve Netflix recommendations

“Trending on Netflix” near line system

Play starts (Kafka stream)

1000s / sec

Impressions (Kafka stream)

millions / sec

Stream processing

• Windowed operations

• Small batches

• Merging streams

• Flexibility

Take rates

Impressions rollup

Personalized Ranked videos

Merged to generate “Trending on Netflix”
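
The merge step on this slide could be sketched as below. The talk does not say how take rates and personalized rankings are combined, so the rule here (keep videos above an assumed take-rate threshold, then order them by the member's personalized score) is purely an illustrative assumption, as are the names `trending_row`, `min_take_rate`, and `personal_score`.

```python
from typing import Dict, List


def trending_row(take_rate: Dict[str, float],
                 personal_score: Dict[str, float],
                 min_take_rate: float = 0.1,
                 limit: int = 10) -> List[str]:
    """Merge trending signals with a member's personalized scores."""
    # Keep only videos that are trending and that this member has a score for.
    candidates = [video for video, rate in take_rate.items()
                  if rate >= min_take_rate and video in personal_score]
    # Highest personalized score first; ties broken by video id so the
    # output is deterministic.
    candidates.sort(key=lambda video: (-personal_score[video], video))
    return candidates[:limit]
```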

Page 32: Nearline systems to improve Netflix recommendations

Spark Streaming at Netflix

• Collaborating with Databricks to make sure Spark (batch and streaming) works well in a cloud environment

– Resiliency and scalability testing

• Actively studying the scaling requirements of our algorithms on both Spark batch and Spark streaming.

Page 33: Nearline systems to improve Netflix recommendations

Spark at Netflix

• Several different use cases where we are interested in Spark – both batch and streaming.

• The largest Spark batch production cluster, used for personalization, is 150 m3.2xlarge instances.

• Netflix has both Spark batch and Spark streaming in production.

Page 34: Nearline systems to improve Netflix recommendations

Spark at Netflix

• Integrating with Spark through Scala (mostly), Python, and some SQL.

• Python typically via IPython notebook integration.

• Running in standalone mode or on Mesos.

Page 35: Nearline systems to improve Netflix recommendations

Spark: areas to watch

• We have not really tested the multi-tenancy boundaries yet. For now we mostly spin up purpose-built clusters.

• Tuning the jobs and optimizing performance of jobs remains a challenge as we make steady inroads.

• Incrementally getting better with stability and scale as we tackle larger use cases this year.

Page 36: Nearline systems to improve Netflix recommendations

Netflix Tech Blog

• Tech blog about the “Trending on Netflix” row published today.

• Watch for upcoming tech blog posts from Netflix in the coming weeks: one on near line systems and another on Spark.

Page 37: Nearline systems to improve Netflix recommendations

Now Hiring leaders and engineers!

Talk to me in person or at

Twitter: @sgkrishnan

LinkedIn: https://www.linkedin.com/pub/gopal-krishnan/0/7a7/905

Page 38: Nearline systems to improve Netflix recommendations