Ad Serving at Spotify Scale A journey of incremental full stack overhaul Kinshuk Mishra, Director of Engineering [email protected] @_kinshukmishra

Qcon London 2017 - Architecture overhaul - Ad serving @ Spotify scale

Page 1: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Ad Serving at Spotify Scale

A journey of incremental full stack overhaul

Kinshuk Mishra, Director of Engineering

[email protected]

@_kinshukmishra

Page 2: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

A lucky mistake

Page 3: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Expected consequences

Page 4: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Sarcastic empathy

Page 5: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Some valuable feedback

Page 6: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

The unintended consequence

Artist engagement for exposed users went up

Page 7: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

The unintended consequence

Promising insights about content promotion use-case

Page 8: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

The unintended consequence

Confirmation that the ad server is a powerful messaging platform

Page 9: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Why should you care?

Page 10: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Introduction

Ad technology stack

Architecture Evolution

Page 11: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Introduction

Ad technology stack

Architecture Evolution

Page 12: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

What I do

● Founded ads engineering team at Spotify in 2011

● Build all things ads engineering - team & software

● Major focus areas:

○ Ad delivery (Backend and Web)

○ Multi-platform native ads (Client Platform)

○ Ad performance (ML and Data)

Page 13: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

3 noteworthy things

Page 14: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Full stack refactor

Evolution at scale

Pragmatic choices

Page 15: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

100,000,000+ MAU

Page 16: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

50,000,000+ Subscribers

Page 17: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

30,000,000+ Songs

Page 18: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

2,000,000,000+ Playlists

Page 19: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

$5,000,000,000+ Revenue paid to rightsholders

Page 20: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

60 Markets

Page 21: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Platform Ubiquity

Page 22: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Freemium business model

Page 23: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Ad

Page 24: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale
Page 25: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale
Page 26: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale
Page 27: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Introduction

Ad technology stack

Architecture Evolution

Page 28: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Beauty of Ad Server

Relevancy Pacing Unique View Sequence Optimization

Page 29: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Complexity of Ad tech ecosystem

Page 30: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

In essence it is pretty simple

Client

User Profile database

Ad Server

Campaign Management Portal

Billing/Reporting

Ad campaign database

Data Collection System
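The components listed on this slide can be read as a simple request flow: the client asks the ad server for an ad, the ad server consults the user profile and the active campaigns, picks a match, and the impression is recorded for billing and reporting. The Python sketch below illustrates only that shape. The class names, the `country` and `target_countries` fields, and the first-match selection rule are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    country: str  # hypothetical targeting attribute


@dataclass
class Campaign:
    campaign_id: str
    target_countries: set  # hypothetical targeting rule


@dataclass
class AdServer:
    profiles: dict   # stands in for the user profile database
    campaigns: list  # stands in for the ad campaign database
    impressions: list = field(default_factory=list)  # stands in for data collection

    def serve(self, user_id):
        """Return the id of the first campaign whose targeting matches the user."""
        profile = self.profiles.get(user_id)
        if profile is None:
            return None
        for campaign in self.campaigns:
            if profile.country in campaign.target_countries:
                # Record the impression so billing/reporting can count it later.
                self.impressions.append((user_id, campaign.campaign_id))
                return campaign.campaign_id
        return None


server = AdServer(
    profiles={"u1": UserProfile("u1", "SE"), "u2": UserProfile("u2", "BR")},
    campaigns=[Campaign("c-nordics", {"SE", "NO"}), Campaign("c-uk", {"GB"})],
)
print(server.serve("u1"))  # matches the hypothetical Nordics campaign
print(server.serve("u2"))  # no campaign targets this user
```

A real ad server would rank candidates rather than take the first match; the sketch keeps only the lookup, decision, and logging steps that the slide names.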

Page 31: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Spotify Ads infrastructure in 2011

Desktop
Edge Service
Log Delivery
HDFS
User Profile
Batch
Basic Ad Server
Campaign Management
Billing/Reporting

Page 32: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Spotify Ads infrastructure in 2017

iOS
Android
Desktop
Web
Chromecast/Playstation/FireTV
Ads SDK
Edge Service
Ad Aggregation Service
Log Delivery
GCS
User Profile
Targeting Service
DMP
Stream
Batch
Ad Server
Decision
Delivery
Ad Exchanges
Campaign Management
Optimization
Modeling
Self-Serve Portal
Creative Generation
Payments
Billing/Reporting

Page 33: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Multi-platform clients

(Same component diagram as on Page 32, Spotify Ads infrastructure in 2017.)

Page 34: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Data collection

(Same component diagram as on Page 32, Spotify Ads infrastructure in 2017.)

Page 35: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Intelligence

(Same component diagram as on Page 32, Spotify Ads infrastructure in 2017.)

Page 36: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Ad Delivery

(Same component diagram as on Page 32, Spotify Ads infrastructure in 2017.)

Page 37: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Demand fulfillment

(Same component diagram as on Page 32, Spotify Ads infrastructure in 2017.)

Page 38: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Now you know too

Ad server is a powerful messaging platform

Page 39: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Introduction

Ad technology stack

Architecture Evolution

Page 40: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Architecture overhaul is hard

● While keeping the business running

● While innovating on new products

● When you should have done it yesterday

Page 41: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Why did Spotify evolve Ads architecture?

Page 42: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Future needs

● Growth in scale

● Emergence of new client platforms

● Cheap cloud computing

● New products to meet business objectives

● Technical debt

Page 43: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

The 3 stories

Page 44: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fixing the legacy mess

Story 1

Page 45: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Original ad server design

Edge Service
Router: hash(userid)
Ad server ring with partitions
Ad server instance
Memcache (shown four times in the diagram)
Campaign DB
User DB

Desktop
Rendering Ad trigger decisioning
Ads Ranking
Ads Caching
Ad batching & fetch communication
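The router on this slide picks an ad server partition from hash(userid). The slide does not say how the hash is mapped onto the ring, so the Python sketch below assumes the simplest scheme, a stable hash taken modulo the number of partitions. The function name `partition_for` and the choice of md5 are illustrative only.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user id to a partition with a stable hash (md5 is an arbitrary choice)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


users = [f"user-{i}" for i in range(1000)]

# The same user always lands on the same partition, so per-user state
# (for example a Memcache entry) can sit next to that ad server instance.
assert partition_for("user-42", 4) == partition_for("user-42", 4)

# Resizing the ring remaps many users, which strands whatever state
# was cached on their old partition.
moved = sum(partition_for(u, 4) != partition_for(u, 5) for u in users)
print(f"{moved} of {len(users)} users change partition when going from 4 to 5")
```

Under this assumed scheme, routing and state are tied together: any change to the ring size moves users away from their cached state, which is one way a hashed ring with a cache used as a data store becomes fragile.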

Page 46: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Problems

Page 47: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Stateful service with faulty persistence

Page 48: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Cache as a data store

Page 49: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Service cluster as a hashed ring

Page 50: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Ad decisioning in Client

Page 51: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Batch Client-Server Calls

Page 52: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fix strategy

Page 53: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fix tactic

Page 54: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Isolate refactor to one system at a time

Page 55: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

The ad server transition

Edge Service
Log Delivery
HDFS
User Profile
Batch
Smart Ad Server
Campaign Management
Billing/Reporting
Ad Server Proxy (routing)
Basic Ad Server

Gradual transition from basic to smart ad serving

Desktop
Rendering Ad trigger decisioning
Ads Ranking
Ads Caching
Ad batching & fetch communication
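The slide shows an Ad Server Proxy routing between the basic and the smart ad server during a gradual transition. One common way to do this is a percentage rollout keyed on a stable hash of the user id, sketched below in Python. The talk does not describe the actual routing rule, so `bucket`, `route`, and the `smart_percent` parameter are assumptions made for illustration.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Stable bucket in [0, 100) derived from the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, smart_percent: int) -> str:
    """Send a fixed share of users to the new backend, the rest to the old one."""
    return "smart" if bucket(user_id) < smart_percent else "basic"


users = [f"user-{i}" for i in range(10_000)]
for percent in (0, 10, 50, 100):
    share = sum(route(u, percent) == "smart" for u in users) / len(users)
    print(f"smart_percent={percent:3d} -> {share:.1%} of users on the smart ad server")
```

Because the bucket depends only on the user id, raising `smart_percent` adds users to the new server without moving anyone back, and lowering it rolls the change back for the most recently added users first.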

Page 56: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

After the ad server transition

Proxy Service
Log Delivery
HDFS
User Profile
Batch
Campaign Management
Billing/Reporting
Smart Ad Server

Desktop
Rendering Ad trigger decisioning
Ads Ranking
Ads Caching
Ad batching & fetch communication

Page 57: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Lean, mean and fast

Story 2

Page 58: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Division of responsibilities

Before (Desktop):
Rendering Ad trigger decisioning
Ads Ranking
Ads Caching
Ad batching & fetch communication

After (iOS, Android, Desktop, Web, Ads SDK):
Ad decisioning
Ad fetch orchestration
Client context
Ad Trigger & Render

Page 59: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Problems

Page 60: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Thick Clients

Page 61: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Logic duplication

Page 62: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Tightly coupled monolith

Page 63: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fix strategy

Page 64: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Reduce State Management

Page 65: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Break monolith into services

Page 66: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Isolate platform independent logic into a lib

Page 67: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fix tactic

Page 68: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Design your systems to be master of one thing

Page 69: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Remember division of responsibilities?

BAD (Desktop):
Rendering Ad trigger decisioning
Ads Ranking
Ads Caching
Ad batching & fetch communication

GOOD (iOS, Android, Desktop, Web, Ads SDK):
Ad decisioning
Ad fetch orchestration
Client context
Ad Trigger & Render

Page 70: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Multiplatform Client design

iOS
Android
Desktop
Web
Chromecast/Playstation/FireTV
Ads SDK
Proxy Service
Ad Aggregation Service
Log Delivery
GCS
User Profile
Targeting Service
DMP
Stream
Batch
Ad Server
Decision
Delivery
Ad Exchanges
Campaign Management
Modeling
Self-Serve Service
Creative Generation
Payments
Billing/Reporting

Page 71: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Knowledge is power, Unreliable data is your enemy

Story 3

Page 72: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Event Stream, Historical

ETL1 → UserEntity1(attribute1, attribute2)
ETL2 → UserEntity1(attribute1, attribute3)
ETL3 → UserEntity1(attribute1, attribute3’)

Page 73: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale
Page 74: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Problems

Page 75: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Duplicate, undiscoverable and fragmented datasets

Page 76: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Metric inaccuracy

Page 77: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Overloaded Data Infra

Page 78: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Fix strategy

Page 79: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Focus on reliable and timely log delivery

Page 80: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale
Page 81: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Data engineering with SLA
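A delivery SLA for a dataset can be expressed as a maximum delay between the close of a time partition and the moment the data becomes available. The Python sketch below checks that for hourly partitions. The two-hour limit, the hourly granularity, and the name `meets_sla` are assumptions for illustration; the talk does not state the actual SLA.

```python
from datetime import datetime, timedelta


def meets_sla(hour_start: datetime, delivered_at: datetime,
              max_delay: timedelta = timedelta(hours=2)) -> bool:
    """True if the hourly partition was delivered within max_delay of the hour closing."""
    hour_end = hour_start + timedelta(hours=1)
    return delivered_at - hour_end <= max_delay


hour = datetime(2017, 3, 6, 10, 0)
on_time = meets_sla(hour, datetime(2017, 3, 6, 12, 30))  # 1.5 h after the hour closed
late = meets_sla(hour, datetime(2017, 3, 6, 14, 0))      # 3 h after the hour closed
print(on_time, late)
```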

Page 82: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Dataset canonicalization
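The diagram on Page 72 shows three ETL jobs each producing its own version of UserEntity1, with overlapping and conflicting attributes. One way to canonicalize such fragments is to merge them into a single record while surfacing the attributes whose values disagree. The Python sketch below does that; the last-record-wins rule and the sample values are assumptions, not the approach described in the talk.

```python
def canonicalize(records):
    """Merge per-pipeline records for one entity; later records win on conflict.

    Returns the merged record and the set of attributes whose values disagreed.
    """
    merged, conflicts = {}, set()
    for record in records:
        for key, value in record.items():
            if key in merged and merged[key] != value:
                conflicts.add(key)
            merged[key] = value
    return merged, conflicts


# Hypothetical outputs of three ETL jobs describing the same user.
etl1 = {"attribute1": "a", "attribute2": "b"}
etl2 = {"attribute1": "a", "attribute3": "c"}
etl3 = {"attribute1": "a", "attribute3": "c-prime"}

canonical, conflicts = canonicalize([etl1, etl2, etl3])
print(canonical)
print(conflicts)
```

Reporting the conflicting attributes, rather than silently overwriting them, makes it possible to trace a metric discrepancy back to the pipelines that disagree.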

Page 83: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Some useful lessons learnt from architectural overhaul

Page 84: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Test with minimal impact radius

Page 85: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Mistakes are inevitable

Page 86: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Speed up build decisions

Page 87: Qcon London 2017 -  Architecture overhaul - Ad serving @ Spotify scale

Think for tomorrow, Solve for today