The Netflix API for a global service

Katharina Probst, Engineering Manager, API
DevNexus, February 2016

What is Netflix?

Stream TV shows and movies anywhere, any time.


Global! (Except China and places where we can’t operate for legal reasons.)


Netflix Originals


Scale

❏ Netflix's share of peak US traffic: 37% of downstream, almost 7% of upstream

❏ 75 million subscribers worldwide, and growing

Source: http://www.sandvine.com/news/global_broadband_trends.asp


Netflix API

❏ Architecture
❏ Resiliency
❏ Developer velocity
❏ Tooling and DevOps
❏ Current and future directions


Netflix API: Architecture


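Architecture diagram: requests pass through ELB and Zuul (gateway) to the API, which calls services such as the Personalization Engine, User Info, Ratings, Similar Movies, the A/B Test Engine, and more.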


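Diagram (layers and owners):
❏ Application: client and server code for endpoints such as /tv/home, communicating over the Internet (UI teams)
❏ Java service layer, built on RxJava and Hystrix (API team)
❏ Mid-tier services (service teams)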


What is the API used for?

Examples:
❏ Discovery
❏ Recommendations
❏ Movie metadata
❏ Ratings
❏ Sign-up and Profiles
❏ Playback
❏ Bookmarks
❏ DRM
❏ A/B testing


Direct dependencies on other services


Netflix API: Resiliency


Hystrix Primer

❏ Protection from and control over latency and failure from dependencies
❏ Stop cascading failures in a complex distributed system
❏ Fall back and gracefully degrade
❏ Fail fast and rapidly recover

https://github.com/Netflix/Hystrix
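Hystrix is a Java library, and the sketch below neither uses nor mimics its API. It is a minimal JavaScript illustration of three ideas from the primer (fall back on failure, fail fast once a dependency keeps failing, and recover after a cool-down), assuming synchronous dependency calls. `CircuitBreaker`, `failureThreshold`, and `resetTimeoutMs` are hypothetical names.

```javascript
// Minimal circuit breaker with fallback, illustrating ideas behind Hystrix.
// All names and thresholds are hypothetical; this is not the Hystrix API.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeoutMs = resetTimeoutMs;     // how long to stay open before a trial call
    this.now = now;                           // injectable clock, handy for testing
    this.consecutiveFailures = 0;
    this.openedAt = null;                     // null means the circuit is closed
  }

  isOpen() {
    if (this.openedAt === null) return false;
    // Once the timeout passes, let one trial call through to probe recovery.
    return this.now() - this.openedAt < this.resetTimeoutMs;
  }

  execute(run, fallback) {
    if (this.isOpen()) {
      return fallback(); // fail fast: don't call the struggling dependency at all
    }
    try {
      const result = run();
      this.consecutiveFailures = 0; // a success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      return fallback(); // degrade gracefully instead of propagating the error
    }
  }
}

// Example: a ratings dependency that is down.
let clock = 0;
const breaker = new CircuitBreaker({ failureThreshold: 2, resetTimeoutMs: 100, now: () => clock });
let ratingsCalls = 0;
const getRatings = () => { ratingsCalls += 1; throw new Error("ratings service unreachable"); };
const noRatings = () => []; // fallback: render the row without ratings

breaker.execute(getRatings, noRatings); // fails, falls back
breaker.execute(getRatings, noRatings); // fails, falls back, circuit opens
breaker.execute(getRatings, noRatings); // fails fast: getRatings is not called
console.log(ratingsCalls); // 2
```

Injecting the clock keeps the example deterministic. A production breaker would also bound latency with timeouts and isolate each dependency in its own thread pool, which Hystrix does and this sketch does not.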

Dependency diagram: the API calls services such as the Personalization Engine, User Info, Ratings, Similar Movies, the A/B Test Engine, and more.

Don’t let this happen. (Diagram: a failing dependency takes the API down with it.)

Fallback Response

Do this instead. (Diagram: the API serves a fallback response in place of the failing dependency.)
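The fallback-response idea can be pictured as a request that fans out to several dependencies and substitutes a default for whichever one fails, so the rest of the response still renders. This is a sketch under stated assumptions: calls are synchronous, and `assembleHomePage`, the dependency names, the data, and the fallback values are all hypothetical.

```javascript
// Assemble one response from several dependencies, each with its own fallback,
// so a single failing dependency degrades one field instead of the whole request.
// Dependency names, data, and fallback values are hypothetical.
function assembleHomePage(userId, dependencies) {
  const response = {};
  for (const [field, { call, fallback }] of Object.entries(dependencies)) {
    try {
      response[field] = call(userId);
    } catch (err) {
      response[field] = fallback; // fallback response for this dependency only
    }
  }
  return response;
}

const dependencies = {
  userInfo: { call: (id) => ({ id, name: "Profile 1" }), fallback: { id: null, name: "Guest" } },
  personalizedRows: {
    call: () => { throw new Error("personalization engine timed out"); },
    fallback: ["Popular on Netflix"], // generic, non-personalized row
  },
  ratings: { call: () => ({ 80000001: 4 }), fallback: {} },
};

console.log(assembleHomePage(42, dependencies));
```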


Failure Injection Testing (FIT)

Goal: Study how the system behaves when failures occur (e.g., a backend service is unreachable).
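The slide gives FIT's goal, not its mechanics. The sketch assumes one possible design: each request carries a failure scenario naming the dependencies that should fail, and a single injection point that wraps every outbound call consults it. `callDependency`, `failureScenario`, `InjectedFailure`, and the dependency names are hypothetical.

```javascript
// Failure injection sketch: a request carries a "failure scenario" naming the
// dependencies that should fail, and every outbound call passes through one
// injection point that consults it. All names here are hypothetical.
class InjectedFailure extends Error {}

function callDependency(context, name, call) {
  if (context.failureScenario.has(name)) {
    throw new InjectedFailure(`injected failure: ${name} unreachable`);
  }
  return call();
}

// A request handler with a fallback for ratings but none for user info.
function handleRequest(context) {
  const user = callDependency(context, "userInfo", () => ({ name: "Profile 1" }));
  let ratings;
  try {
    ratings = callDependency(context, "ratings", () => [5, 4, 3]);
  } catch (err) {
    ratings = []; // fallback
  }
  return { user, ratings };
}

// Study the behavior under different scenarios, one request at a time.
for (const scenario of [[], ["ratings"], ["userInfo"]]) {
  const context = { failureScenario: new Set(scenario) };
  try {
    console.log(scenario, "->", JSON.stringify(handleRequest(context)));
  } catch (err) {
    console.log(scenario, "-> request failed:", err.message);
  }
}
```

Scoping the scenario to a request means a test can fail one dependency for a few requests without affecting everyone else.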


More automated failure testing

Goal: Find groups of service calls that are needed for success.

http://techblog.netflix.com/2016/01/automated-failure-testing.html
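The linked post describes Netflix's actual approach; the sketch below is a deliberately naive brute-force stand-in, not that approach. It tries combinations of dependency failures against a hypothetical model of one request and reports the minimal combinations that break it, which is another way of naming the groups of calls the request needs. Its cost grows exponentially with the number of dependencies. `minimalFailureSets`, `requestSucceeds`, and the dependency names are hypothetical.

```javascript
// Naive search for minimal sets of dependency failures that break a request.
// Exponential in the number of dependencies; for illustration only.
// The request model below is hypothetical.
const allDependencies = ["userInfo", "personalization", "popularCache", "ratings"];

// The request needs user info, plus either personalized rows or cached popular
// rows as a fallback. Ratings failures are absorbed by a fallback.
function requestSucceeds(failed) {
  if (failed.has("userInfo")) return false;
  return !(failed.has("personalization") && failed.has("popularCache"));
}

function minimalFailureSets(deps, succeeds) {
  // Enumerate every non-empty subset, smallest first.
  const subsets = [];
  for (let mask = 1; mask < 1 << deps.length; mask++) {
    subsets.push(deps.filter((_, i) => mask & (1 << i)));
  }
  subsets.sort((a, b) => a.length - b.length);

  const found = [];
  for (const subset of subsets) {
    // A superset of a set that already breaks the request is not minimal.
    if (found.some((f) => f.every((d) => subset.includes(d)))) continue;
    if (!succeeds(new Set(subset))) found.push(subset);
  }
  return found;
}

console.log(minimalFailureSets(allDependencies, requestSucceeds));
```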


Autoscaling & Capacity Management

http://nflx.it/1LvqLUi


Autoscaling & Capacity Management

❏ Red: traffic for the current week (time on the x-axis)
❏ Black: traffic for the previous week, for comparison
❏ What happened on February 7? The Super Bowl!


AWS controls are reactive and don’t scale up fast enough


Fine-grained Control with Scryer Complements AWS Controls

❏ Faster scale-up, lower cost
❏ Use the reactive policy for organic scale-down
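The slide does not describe Scryer's prediction models. The sketch assumes the simplest possible predictor, last week's traffic at the same time of day (echoing the week-over-week comparison on the earlier slide), and shows how a predictive signal and a reactive one can be combined so that prediction drives scale-up while the reactive policy governs organic scale-down. The capacity constant and all function names are hypothetical.

```javascript
// Combining a predictive scale-up signal with a reactive policy.
// Hypothetical capacity: 300 requests/second per instance at the target utilization.
const TARGET_RPS_PER_INSTANCE = 300;

function instancesFor(rps) {
  return Math.ceil(rps / TARGET_RPS_PER_INSTANCE);
}

// Predictive (assumed predictor): last week's traffic at the same time,
// looked up a little ahead of now so capacity is ready before demand arrives.
function predictiveDesired(lastWeekRpsAhead) {
  return instancesFor(lastWeekRpsAhead);
}

// Reactive: size for the traffic observed right now.
function reactiveDesired(currentRps) {
  return instancesFor(currentRps);
}

// Taking the max adds capacity as soon as either signal asks for it and removes
// it only once both agree, so scale-down follows the observed (reactive) load.
function desiredInstances(currentRps, lastWeekRpsAhead) {
  return Math.max(predictiveDesired(lastWeekRpsAhead), reactiveDesired(currentRps));
}

// Ramping toward a peak: the prediction leads the observed traffic.
console.log(desiredInstances(30000, 60000));
// After the peak: the prediction drops, and the reactive policy scales down as load falls.
console.log(desiredInstances(20000, 10000));
```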


Netflix API: Developer velocity


Lots of devices, lots of variety


Different interaction models


And just to make things a little more interesting…

❏ A/B tests
❏ Profiles
❏ Localization


Add server-side scripting capability

❏ Reduce network chattiness

❏ Support device optimizations

❏ Enable faster development for internal users


Discrete HTTP requests pay network tax repeatedly


Single, optimized request; pay network tax once

Client data assembly logic pushed to server
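A way to see the network tax: count the round trips a device pays when it assembles a page itself, versus when the same assembly logic runs in a server-side script. The services, the data, and the `overTheNetwork` counter are hypothetical; the point is the count, not the data.

```javascript
// Counting network round trips: discrete client requests vs. one request to a
// server-side script that assembles the data next to the services.
// Services, data, and the round-trip counter are hypothetical.
const services = {
  getLists: (userId) => ["list-1", "list-2", "list-3"],
  getListItems: (listId) => [`${listId}/title-a`, `${listId}/title-b`],
};

let networkRoundTrips = 0;
function overTheNetwork(fn) {
  networkRoundTrips += 1; // each device/server exchange pays the network tax
  return fn();
}

// Client-side assembly: one request for the lists, then one per list.
function clientAssembledHome(userId) {
  const lists = overTheNetwork(() => services.getLists(userId));
  return lists.map((id) => ({ id, items: overTheNetwork(() => services.getListItems(id)) }));
}

// Server-side script: the same assembly logic runs on the server,
// so the device makes a single, optimized request.
function homeScript(userId) {
  const lists = services.getLists(userId);
  return lists.map((id) => ({ id, items: services.getListItems(id) }));
}

networkRoundTrips = 0;
clientAssembledHome(7);
console.log("client-side assembly:", networkRoundTrips, "round trips");

networkRoundTrips = 0;
overTheNetwork(() => homeScript(7));
console.log("server-side script:", networkRoundTrips, "round trip");
```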


Local method vs. remote API

Remote API: GET /users/{user_id}/lists

Local method: getLists(userId)
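The contrast between the two styles can be sketched with the same `getLists` method exposed once as a remote REST resource (matched from the URL path and returned as serialized JSON) and once called directly, in-process, as a server-side script would. The route table, `handleHttp`, and the data are hypothetical.

```javascript
// The same data two ways: a remote REST resource that a device calls over HTTP,
// and a local method that a server-side script calls in-process.
// The route table and data are hypothetical.
function getLists(userId) {
  return [`${userId}:my-list`, `${userId}:continue-watching`];
}

// Remote style: match the request path against a route template.
const routes = [
  { method: "GET", pattern: /^\/users\/([^/]+)\/lists$/, handler: (m) => getLists(m[1]) },
];

function handleHttp(method, path) {
  for (const route of routes) {
    const match = route.method === method && path.match(route.pattern);
    if (match) return { status: 200, body: JSON.stringify(route.handler(match)) };
  }
  return { status: 404, body: "" };
}

// Remote: the caller gets serialized JSON back over the network.
console.log(handleHttp("GET", "/users/123/lists"));
// Local: a server-side script just calls the method and gets objects back.
console.log(getLists("123"));
```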


Impact on velocity and collaboration

❏ UI (script) changes can happen independently

❏ Script changes can be pushed to running servers, so they are decoupled from the API push schedule

❏ Decoupling leads to greater developer velocity
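One way to decouple script changes from the API push schedule is a registry that holds several versions of each endpoint script and switches the active version on a running server. This sketch assumes that design; `ScriptRegistry`, its methods, and the version labels are hypothetical, and a real system would also need to compile, validate, and isolate uploaded scripts.

```javascript
// Endpoint scripts published to a running server without redeploying the server.
// The registry, its methods, and the version labels are hypothetical.
class ScriptRegistry {
  constructor() {
    this.versions = new Map(); // endpoint -> Map(version -> handler)
    this.active = new Map();   // endpoint -> active version
  }

  publish(endpoint, version, handler) {
    if (!this.versions.has(endpoint)) this.versions.set(endpoint, new Map());
    this.versions.get(endpoint).set(version, handler);
  }

  activate(endpoint, version) {
    if (!this.versions.get(endpoint)?.has(version)) {
      throw new Error(`unknown script ${endpoint}@${version}`);
    }
    this.active.set(endpoint, version); // takes effect on the next request
  }

  handle(endpoint, request) {
    const version = this.active.get(endpoint);
    if (version === undefined) throw new Error(`no active script for ${endpoint}`);
    return this.versions.get(endpoint).get(version)(request);
  }
}

const registry = new ScriptRegistry();
registry.publish("/tv/home", "v1", (req) => ({ rows: 10, user: req.userId }));
registry.activate("/tv/home", "v1");
console.log(registry.handle("/tv/home", { userId: 1 }));

// A UI team ships a new version; the server keeps running.
registry.publish("/tv/home", "v2", (req) => ({ rows: 12, user: req.userId }));
registry.activate("/tv/home", "v2");
console.log(registry.handle("/tv/home", { userId: 1 }));
```

Rolling back is the same operation as rolling forward: activate the previous version.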


Netflix API: Tooling and DevOps


Run 1% of your traffic on the new code and see how it does
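A minimal sketch of sending about 1% of traffic to a canary. It assumes routing is decided per request key by hashing, so a given customer consistently lands on the same cluster; that stickiness is a design choice of this sketch, not something the slide specifies. FNV-1a is used only as a small, well-known hash, and `pickCluster` and the cluster names are hypothetical.

```javascript
// Route about 1% of requests to the canary, deterministically per key.
// FNV-1a (32-bit) is used as a simple, well-known hash; names are hypothetical.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime mod 2^32
  }
  return hash >>> 0;
}

function pickCluster(requestKey, canaryPercent = 1) {
  return fnv1a32(requestKey) % 100 < canaryPercent ? "api-canary" : "api-prod";
}

// Check the split over many hypothetical customers.
let canary = 0;
const total = 100000;
for (let i = 0; i < total; i++) {
  if (pickCluster(`customer-${i}`) === "api-canary") canary++;
}
console.log(`${((100 * canary) / total).toFixed(2)}% of requests went to the canary`);
```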


So you’ve run a canary. Now what?

Compare the canary against a control cluster:
❏ Response codes: 2xx, 4xx, 5xx
❏ Latency
❏ Network
❏ Busy threads
❏ Load, memory consumption
❏ ...
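A simplified sketch of comparing a canary against its control: for each metric where higher is worse, compute the canary-to-control ratio, check it against a per-metric tolerance, and summarize the result as a score. `compareCanary`, the metric names, values, tolerances, and the scoring rule are hypothetical; a real analysis would compare time series rather than single numbers.

```javascript
// Compare canary metrics against the control cluster (current production code).
// Metric names, values, tolerances, and the scoring rule are hypothetical.
function compareCanary(control, canary, tolerances) {
  const results = {};
  for (const [metric, tolerance] of Object.entries(tolerances)) {
    const base = control[metric];
    // Ratio of canary to control; a zero baseline is only "equal" if the canary is also zero.
    const ratio = base === 0 ? (canary[metric] === 0 ? 1 : Infinity) : canary[metric] / base;
    // All metrics here are "higher is worse", so only an increase can fail.
    results[metric] = { ratio, pass: ratio <= 1 + tolerance };
  }
  const passed = Object.values(results).filter((r) => r.pass).length;
  return { results, score: Math.round((100 * passed) / Object.keys(results).length) };
}

const control = { errors5xx: 12, latencyP99Ms: 180, busyThreads: 40, memoryMb: 2100 };
const canary = { errors5xx: 30, latencyP99Ms: 185, busyThreads: 41, memoryMb: 2150 };
const tolerances = { errors5xx: 0.2, latencyP99Ms: 0.1, busyThreads: 0.1, memoryMb: 0.1 };

const report = compareCanary(control, canary, tolerances);
console.log(report.score, report.results.errors5xx);
```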


Successful canary → red/black push
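Red/black push, sketched as state changes on a cluster: a new server group comes up next to the old one, traffic moves to the new group, and the old group stays up but disabled, so a rollback is just re-enabling it. `Cluster`, its methods, and the group names are hypothetical, and the sketch assumes the new group has already passed health checks; a tool such as Spinnaker orchestrates the real steps against cloud resources.

```javascript
// Red/black push: bring up a new server group, switch traffic to it, and keep
// the old group running but disabled for fast rollback.
// Names are hypothetical; health checks are assumed to have passed already.
class Cluster {
  constructor(initialGroup) {
    this.groups = new Map([[initialGroup, { enabled: true }]]);
  }

  enabledGroups() {
    return [...this.groups].filter(([, g]) => g.enabled).map(([name]) => name);
  }

  redBlackPush(newGroup) {
    const previous = this.enabledGroups();
    this.groups.set(newGroup, { enabled: true }); // new group starts taking traffic
    for (const name of previous) {
      this.groups.get(name).enabled = false; // old group: no traffic, still running
    }
    return previous;
  }

  rollback(previousGroups, badGroup) {
    for (const name of previousGroups) this.groups.get(name).enabled = true;
    this.groups.get(badGroup).enabled = false;
  }
}

const cluster = new Cluster("api-v041");
const previous = cluster.redBlackPush("api-v042");
console.log(cluster.enabledGroups()); // new group serves traffic
cluster.rollback(previous, "api-v042");
console.log(cluster.enabledGroups()); // old group is back, no redeploy needed
```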


Continuous Delivery with Spinnaker

http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html


Quickly see status of all clusters

http://techblog.netflix.com/2015/09/moving-from-asgard-to-spinnaker.html


Prod is a little different…


The things you can do:
❏ … with server groups
❏ … with instances


Script Management


Operations



Real-time analysis

http://www.slideshare.net/g9yuayon/qcon-talk-on-netflix-mantis-a-stream-processing-system

Submit a query, see requests in real time.
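A toy, in-process stand-in for the idea of submitting a query against a live stream of requests. It is not Mantis's API: `runQuery`, the `where`/`select` query shape, the event fields, and the device names are hypothetical, and a finite array stands in for an unbounded stream.

```javascript
// "Submit a query, see requests in real time": apply a filter query to a stream
// of request events as they arrive. Toy stand-in; all names are hypothetical.
function* requestStream(events) {
  for (const event of events) yield event; // in reality, an unbounded live feed
}

function* runQuery(stream, query) {
  for (const event of stream) {
    const matches = Object.entries(query.where).every(([field, value]) => event[field] === value);
    if (matches) yield query.select.map((field) => event[field]);
  }
}

const events = [
  { path: "/tv/home", status: 200, device: "ps4", latencyMs: 85 },
  { path: "/tv/home", status: 500, device: "roku", latencyMs: 1200 },
  { path: "/search", status: 500, device: "roku", latencyMs: 950 },
];

// Which requests from one device family are failing right now?
const query = { where: { device: "roku", status: 500 }, select: ["path", "latencyMs"] };
for (const row of runQuery(requestStream(events), query)) console.log(row);
```

Because both functions are generators, each matching request is emitted as soon as it is seen rather than after the whole stream is read.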


Netflix API: Current and future directions


What we’ve grown to

● > 900 active endpoints
● ~60 direct dependencies
● 78 thread pools
● 1000+ threads
● High memory usage


Script isolation & node

❏ Groovy scripts run as part of the API process

❏ UI teams would like to use other languages (in particular node.js)

var response = model.get("todos[0..2]['name','done']");

Diagram (proposed layers): UI/device scripts (node), Falcor, API remote service layer, client libs, services.
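The `model.get` call on this slide asks Falcor for a path set, a compact way of naming several paths at once. To make that concrete, the sketch expands the slide's path set into its individual paths. `parsePathSet` is a simplified, hypothetical parser that handles only a leading key followed by bracketed inclusive ranges (`a..b`, as in Falcor) or comma-separated keys; Falcor's own path parsing covers much more.

```javascript
// Expand a Falcor-style path set string into the individual paths it names.
// Simplified and hypothetical: handles a leading key followed by bracketed
// inclusive ranges (a..b) or comma-separated keys, not the full Falcor syntax.
function parsePathSet(pathSet) {
  const head = pathSet.match(/^[A-Za-z_$][\w$]*/);
  if (!head) throw new Error(`cannot parse: ${pathSet}`);
  const segments = [[head[0]]];

  const bracket = /\[([^\]]*)\]/g;
  let match;
  while ((match = bracket.exec(pathSet)) !== null) {
    const body = match[1].trim();
    const range = body.match(/^(\d+)\s*\.\.\s*(\d+)$/);
    if (range) {
      const from = Number(range[1]);
      const to = Number(range[2]);
      segments.push(Array.from({ length: to - from + 1 }, (_, i) => from + i));
    } else {
      // Comma-separated keys, with surrounding quotes stripped.
      segments.push(body.split(",").map((k) => k.trim().replace(/^['"]|['"]$/g, "")));
    }
  }

  // Cartesian product of all segments.
  return segments.reduce(
    (paths, keys) => paths.flatMap((p) => keys.map((k) => [...p, k])),
    [[]]
  );
}

console.log(parsePathSet("todos[0..2]['name','done']"));
```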


Thin client libraries

❏ Fat client libraries have business logic and multiple dependencies

❏ Move business logic and dependencies to services

Diagram (proposed layers): UI/device scripts (node), Falcor, API remote service layer, thin client libs, services.


Remove metadata from API servers

❏ Metadata takes up significant memory in API servers

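❏ Challenge: reduce chattiness between the API and the metadata service

Diagram (proposed layers): UI/device scripts (node), Falcor, API remote service layer, thin client libs, services, with metadata moved out of the API servers into a separate Metadata Service.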
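The slide names the challenge but not the solution. One common way to cut chattiness to a remote service is to collect the IDs a request needs, de-duplicate them, fetch them in a single batched call, and cache the results for the rest of the request. The sketch assumes that approach; `MetadataLoader`, `getMetadataBatch`, and the video IDs are hypothetical.

```javascript
// Reducing chattiness to a remote metadata service: collect the video IDs a
// request needs, de-duplicate them, fetch them in one batched call, and cache
// the results for the rest of the request. All names here are hypothetical.
class MetadataLoader {
  constructor(fetchBatch) {
    this.fetchBatch = fetchBatch; // (ids: number[]) => Map<number, object>
    this.pending = new Set();     // IDs requested but not fetched yet
    this.cache = new Map();       // request-scoped cache of fetched metadata
  }

  want(videoId) {
    if (!this.cache.has(videoId)) this.pending.add(videoId);
  }

  flush() {
    if (this.pending.size === 0) return;
    const results = this.fetchBatch([...this.pending]); // one remote call
    for (const [id, metadata] of results) this.cache.set(id, metadata);
    this.pending.clear();
  }

  get(videoId) {
    if (!this.cache.has(videoId)) throw new Error(`metadata for ${videoId} not loaded`);
    return this.cache.get(videoId);
  }
}

let remoteCalls = 0;
const metadataService = {
  getMetadataBatch(ids) {
    remoteCalls += 1;
    return new Map(ids.map((id) => [id, { id, title: `Title ${id}` }]));
  },
};

const loader = new MetadataLoader((ids) => metadataService.getMetadataBatch(ids));
const rows = [[101, 102, 103], [102, 104], [101, 105]]; // three rows with overlapping videos
for (const row of rows) row.forEach((id) => loader.want(id));
loader.flush();
const titles = rows.map((row) => row.map((id) => loader.get(id).title));
console.log(remoteCalls, titles);
```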


In the beginning...

Katharina Probst | www.linkedin.com/in/katharinaprobst