Improving the power of a picture via A/B testing Gopal Krishnan Director of Engineering Dale Elliott Senior Software Engineer Kenny Xie Senior Data Scientist

Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain


Page 1: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Improving the power of a picture via A/B testing

Gopal Krishnan, Director of Engineering

Dale Elliott, Senior Software Engineer

Kenny Xie, Senior Data Scientist

Page 5: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

TV is a lean back experience

Page 6: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

90 seconds

Page 8: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pop Quiz

Page 9: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

A round plane figure whose boundary (the circumference) consists of points equidistant from a fixed point (the center).

Page 11: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

A round plane figure whose boundary (the circumference) consists of points equidistant from a fixed point (the center).

Page 13: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Can we do better?

Page 15: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Sensitivity test

Page 16: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

The Short Game

Page 17: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Single title A/B test result

14% better 6% better

Page 18: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Testable Hypothesis

Page 19: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Displaying better artwork will result in greater engagement and retention by helping members discover stories they will enjoy even faster.

Page 20: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Data Driven

Page 21: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Netflix API service

Beacon (telemetry collection service)

Hive (computes artwork performance metrics for every title/country/locale pair)

Netflix Image Library

Device (PS3, website, etc.)

Feedback loop

Serve artwork based on A/B logic

Feed with artwork based on perf metric

Collect plays & client impressions

Page 22: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Anatomy of artwork

Page 23: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Stable Image id for ground truth data

source-file-id-1 source-file-id-2 source-file-id-3

Lineage-id-1

Page 24: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Diversity matters

Page 25: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Diversity matters

Page 26: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pop Quiz

1 2 3

4 5 6

Page 27: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Building the A/B tests

vs.

Page 28: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Pairs of Explore and Exploit Tests

Explore Test

Current production explore

New explore

Exploit Test

Current production exploit

New exploit

Winner

Winner

● No member overlap

● Explore and exploit allocation happens simultaneously

Page 29: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Multi-title explore allocation test

Page 30: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Cell 1 Cell 2 Cell 3 Cell 4 Cell 5 Cell 6

Title 1 Control Image Test Image 1 Test Image 2 Test Image 3 Test Image 4 Test Image 5

Title 2 Control Image Test Image 1 Test Image 2 Test Image 3 Test Image 4 Test Image 5

... ... ... ... ... ... ...

Title n Control Image Test Image 1 Test Image 2 Test Image 3 Test Image 4 Test Image 5

Test Evolution: Single Title to Multiple Titles

Single title, multi-cell test

Page 31: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

• Our A/B infrastructure is optimized for comparing test cells to each other

• Need to compare data across cells for one title among many

• Avoid creating hundreds of tests (one per title)

Page 32: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Solution:

• Treat all the members who see a title’s images as a virtual test

• Impression tracking -- not just test cell allocation -- defines test population per title

Engineering implementation / complexity

Allocated Members

Title A impressions

Title B impressions
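
The "virtual test" idea above can be sketched as a grouping step: for each title, the test population is the set of allocated members who actually had an impression of one of that title's images. This is only an illustration under assumptions; the `Impression` class, the `VirtualTests` helper, and the in-memory collections are hypothetical and not part of the talk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical record of one impression: a member saw a given image of a title.
final class Impression {
    final long memberId;
    final long titleId;
    final int imageIndex;

    Impression(long memberId, long titleId, int imageIndex) {
        this.memberId = memberId;
        this.titleId = titleId;
        this.imageIndex = imageIndex;
    }
}

final class VirtualTests {
    // Builds one virtual test per title:
    // titleId -> imageIndex -> members who actually saw that image.
    // Impressions from members outside the allocated set are ignored.
    static Map<Long, Map<Integer, Set<Long>>> populations(
            Set<Long> allocatedMembers, List<Impression> impressions) {
        Map<Long, Map<Integer, Set<Long>>> byTitle = new HashMap<>();
        for (Impression imp : impressions) {
            if (!allocatedMembers.contains(imp.memberId)) {
                continue; // member is not allocated to the test
            }
            byTitle.computeIfAbsent(imp.titleId, t -> new HashMap<>())
                   .computeIfAbsent(imp.imageIndex, i -> new TreeSet<>())
                   .add(imp.memberId);
        }
        return byTitle;
    }
}
```

An allocated member with no impression of a title does not appear in that title's population, which is the point of defining the population by impressions rather than by cell allocation alone.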

Page 33: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Problems with multi-title, multi-cell test

• Cohorts of testers who all saw the same set of images

• Same number of images for every title

Page 34: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Single-cell explore allocation test

Page 35: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Title 1

“Cells” 1 2 3 4 5 6

Image Control Image 1 Image 2 Image 3 Image 4 Image 5

Title 2

“Cells” 1 2 3 4

Image Control Image 1 Image 2 Image 3

Test Evolution: Images per title

Multi-cell explore evolves to Single-cell explore

Devolves?

Virtual Tests inside one test cell

Page 36: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

Goals

• No cohorts

• Image stickiness

• No persistent storage

We used a deterministic, pseudo-random calculation

• new Random(memberID * titleId).nextInt(numImages)
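
A minimal sketch of that calculation, wrapped in a method. The seed expression is the one from the slide; the `ImageSelector` class name, the parameter types, and the argument check are assumptions added for illustration.

```java
import java.util.Random;

final class ImageSelector {
    // Returns an index in [0, numImages). The same (memberId, titleId) pair
    // always yields the same index, so the image is "sticky" for a member
    // without storing anything.
    static int selectImageIndex(long memberId, long titleId, int numImages) {
        if (numImages <= 0) {
            throw new IllegalArgumentException("numImages must be positive");
        }
        // Seed taken from the slide: memberID * titleId.
        return new Random(memberId * titleId).nextInt(numImages);
    }
}
```

Because the seed depends on both the member and the title, two members who share an image for one title are not tied to the same image for other titles, which is how this avoids cohorts. Note that any two pairs with the same product share a seed; the talk does not say whether that mattered in practice.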

Page 37: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Netflix API Service

Engineering implementation / complexity

No persistence needed

Cells | Cell 1 | Cell 2

Title 1 | Ctrl Image | Random of [Ctrl, Test 1, ... Test X1]

Title 2 | Ctrl Image | Random of [Ctrl, Test 1, ... Test X2]

... | ... | ...

Title n | Ctrl Image | Random of [Ctrl, Test 1, ... Test Xn]

Image Data Feed

(Title ID, Image Lists)

Netflix Image Lib.

Random assignment to all test members.

Single-cell explore test

Page 38: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

● No more cohorts

● Flexible

● Clear winners for many titles

● Overall win based on key metrics

Can we do better?

Result

Page 39: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Problems

• Overexposure of under-performing images

• Underexposure of niche titles

• Unfair burden on testers

Page 40: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Title-level allocation test

Page 41: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Solution: Title-Level Allocation

• Limit allocated members per title

• Less exposure of under-performing images

• Still get enough data to determine winner

• Allocate from a gigantic pool

• More exposure for niche titles

• Spreads testing burden

Page 42: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Test Evolution: Testers per title

Title A

Title B

Title C

Title A

Title B

● Some titles have few testers in the small pool

● Most titles have full testing allocation from larger pool

Page 43: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Engineering implementation / complexity

• Goals from previous test

  • No cohorts

  • Image stickiness

  • No persistent storage

• New goals

  • Less exposure for under-performing images

  • More exposure for niche titles

  • Faster decision and rollout of winning images

• This time, we needed to persist the allocations

Page 44: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Netflix API Service

Architecture

Image Data Feed

Yellow Square

(Y2)

Netflix Image Library

Member Allocated?

Title fully Allocated?

Allocate with Random Assignment

Log and store Allocation

Select Assigned Image

Select Control Image

Select Assigned Image

No

No

Yes

Yes

Title Metadata Service (VMS)

Kafka
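
The decision flow on this slide can be sketched roughly as follows. The ordering of the branches is an assumption inferred from the box labels (the extracted text does not preserve the arrows), and the `TitleLevelAllocator` class, the per-title cap, the in-memory map standing in for the persisted allocations, and index 0 as the control image are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class TitleLevelAllocator {
    // Assumption for this sketch: image index 0 is the control image.
    static final int CONTROL_IMAGE = 0;

    private final int maxMembersPerTitle;
    private final Random random;
    // In-memory stand-in for persisted allocations:
    // titleId -> memberId -> assigned image index.
    private final Map<Long, Map<Long, Integer>> allocations = new HashMap<>();

    TitleLevelAllocator(int maxMembersPerTitle, long seed) {
        this.maxMembersPerTitle = maxMembersPerTitle;
        this.random = new Random(seed);
    }

    int selectImage(long memberId, long titleId, int numImages) {
        Map<Long, Integer> forTitle =
                allocations.computeIfAbsent(titleId, t -> new HashMap<>());

        // Member allocated? Yes: select the assigned image (stickiness).
        Integer assigned = forTitle.get(memberId);
        if (assigned != null) {
            return assigned;
        }
        // Title fully allocated? Yes: select the control image.
        if (forTitle.size() >= maxMembersPerTitle) {
            return CONTROL_IMAGE;
        }
        // Otherwise allocate with random assignment, then store the allocation.
        int imageIndex = random.nextInt(numImages);
        forTitle.put(memberId, imageIndex);
        return imageIndex;
    }

    int allocatedCount(long titleId) {
        Map<Long, Integer> forTitle = allocations.get(titleId);
        return forTitle == null ? 0 : forTitle.size();
    }
}
```

Capping the number of allocated members per title is what limits exposure of under-performing images while still leaving each title with a test population.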

Page 45: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Oops

● Underestimated traffic

● Many titles allocated per member at once

● Write to Y2 for every allocation

Result: Service disruption; we had to turn off the test

Page 46: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Netflix API Service

Scaling

Image Data Feed

Yellow Square

(Y2)

Netflix Image Library

Allocate with Random Assignment

Log and store Allocation

Kafka

Stream Processor

1 write per member every 30 sec.

Storing allocations as they occurred overloaded Yellow Square.

Now, we log them to a stream and consolidate many writes into one.
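
The consolidation step can be sketched as a buffer that collects allocations per member and flushes each member's pending allocations as a single write. This models only the batching idea; the `AllocationBatcher` class and `AllocationStore` interface are hypothetical, the sketch is single-threaded, and in the talk the events arrive through Kafka and a stream processor rather than direct method calls.

```java
import java.util.HashMap;
import java.util.Map;

final class AllocationBatcher {
    // Hypothetical sink standing in for the allocation store.
    interface AllocationStore {
        void write(long memberId, Map<Long, Integer> titleToImage);
    }

    private final AllocationStore store;
    // memberId -> (titleId -> imageIndex) collected since the last flush.
    private final Map<Long, Map<Long, Integer>> pending = new HashMap<>();

    AllocationBatcher(AllocationStore store) {
        this.store = store;
    }

    // Called for every allocation event read from the stream.
    void onAllocation(long memberId, long titleId, int imageIndex) {
        pending.computeIfAbsent(memberId, m -> new HashMap<>())
               .put(titleId, imageIndex);
    }

    // Called on a fixed interval (the slide mentions 30 seconds).
    // Issues one write per member and returns the number of writes.
    int flush() {
        int writes = 0;
        for (Map.Entry<Long, Map<Long, Integer>> entry : pending.entrySet()) {
            store.write(entry.getKey(), entry.getValue());
            writes++;
        }
        pending.clear();
        return writes;
    }
}
```

With this shape, a member who is allocated to many titles in one page load costs one store write per interval instead of one write per title.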

Page 48: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Who to Test on?

Test on the same population you are planning to roll out the changes to

Page 49: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Two Member Cohorts

• New Members are assigned to the experimental condition at the time of sign-up

• Existing Members are assigned to the experimental condition any time after their free trial has ended

Page 50: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Decision Focuses More on New Members

• A “pure” sample which is not tainted by a previous Netflix experience

• A more sensitive sample (“on the fence”)

Page 51: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Tiers of Metrics

• Primary: Customer retention

• Secondary: Streaming hours

• Tertiary: all other customer engagement metrics

  • Play rate

  • Number of Netflix visits

  • ...

Page 52: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

How to Pick the Winner in Explore?

• Take fraction = (number of users who played the title) / (number of users who saw the title)

• Correlated with retention

• Measurable from day one
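
A minimal sketch of the take fraction and a naive winner pick, using the definition above. The `TakeFraction` class and its method names are hypothetical, and choosing the highest raw ratio is a simplification: the talk does not describe the statistical criteria actually applied, and later slides stress that defining a play and an impression is not trivial.

```java
final class TakeFraction {
    // Take fraction = members who played the title / members who saw the title.
    static double takeFraction(long membersWhoPlayed, long membersWhoSaw) {
        if (membersWhoSaw <= 0) {
            throw new IllegalArgumentException("membersWhoSaw must be positive");
        }
        return (double) membersWhoPlayed / membersWhoSaw;
    }

    // Returns the index of the image with the highest take fraction,
    // or -1 if no image has any impressions. Ties go to the lower index.
    static int bestImage(long[] played, long[] saw) {
        if (played.length != saw.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int best = -1;
        double bestFraction = -1.0;
        for (int i = 0; i < played.length; i++) {
            if (saw[i] <= 0) {
                continue; // no impressions, nothing to measure
            }
            double fraction = takeFraction(played[i], saw[i]);
            if (fraction > bestFraction) {
                bestFraction = fraction;
                best = i;
            }
        }
        return best;
    }
}
```

For example, with made-up counts of 30, 45, and 20 plays out of 1000 impressions each, `bestImage` returns index 1.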

Page 53: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 54: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 55: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

What is a Play?

Page 56: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 57: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 58: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does Impression Location Matter?

Page 59: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does it Matter How Many Impressions it Takes to Play?

Netflix just recommended an awesome show to me and I am going to watch it!!!

Page 60: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Does it Matter How Many Impressions it Takes to Play?

I have seen the show on Netflix a few times. Maybe, I should try it...

Page 61: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Take Fraction is NOT as trivial as its definition implies.

Page 62: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

How to Make the Final Decision?

Final decision is based on the exploit test

• Retention movement

• Streaming hours movement

• Engagement with titles explored in the test, titles not explored in the test

• ….

Page 63: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Our Image Selection Test is a Win!

• Improved customer retention

• Improved customer engagement

Page 64: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Some Learnings

Page 65: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Emotions are excellent for conveying complex nuances

Page 66: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Great stories travel - but regional nuances can be powerful

Page 67: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Nice Guys Often Finish Last

Page 68: Improving the power of a picture at Netflix -- the Science and Engineering Behind the Curtain

Contact: Gopal Krishnan, Dale Elliott, Kenny Xie

More details are available on the Netflix Tech Blog.

Talk to us outside at the booth.