NOW TV and Linear Streaming: The unpredictable scalability challenge
Tom Maule – NOW TV Solution Architect
Who am I?
• Tom Maule – Solution Architect at NOW TV, Sky
– Previously Senior Java Developer on the NOW TV Platform team (since project inception in early 2012)
– I have also previously worked in the defence and telecoms industries
– linkedin.com/in/tommaule
– @tommaule
Abstract
• NOW TV Introduction
• Linear streaming challenges
• 7th April 2014
• Fixes and improvements
• 13th April 2015
• Future work and next steps
Introduction - Overview
• NOW TV is the online no-contract TV streaming service from Sky
• Available on over 60 devices including the award-winning NOW TV Box
• NOW TV offers movies and entertainment VOD and linear content, and for the first time in the UK, pay-as-you-go Sports linear content
Introduction – Streaming Features
VOD Streaming
VOD DRM
Linear Streaming
Linear DRM
Concurrency Limits
Introduction - NOW TV Architecture
[Architecture diagram. Components: Video Assets, live video stream, VOD Transcoding, Linear Transcoding, CDN, User device, NOW TV Platform (Load Balancers, Services), Content Metadata, Account Data. Labelled flows: Asset upload, Stream upload, Manifest and video chunks, Content metadata and User services. Monitoring & alerting: Logs, Splunk, MMS, Icinga, New Relic.]
Video On Demand (VOD)
• Video content, available on demand, whenever users want it
• Platform load is predictable; just ask Netflix, Amazon Instant Video, YouTube, etc.
Linear Streaming
• Unlike other OTT (Over-the-Top) providers, NOW TV offers streaming of live channels
• Load from linear streaming is typically NOT predictable
• Load is driven by live events, not by time of day
What happened? (7th April 2014)
• High load stressed our database
• Retries only compounded the problem
• Observed issues:
– Customers couldn’t start new streams
– Existing streams were terminated
– Concurrency errors during and shortly after the outage
– Very high read and write queues in MongoDB
– Entitlement and Viewing History APIs performed very slowly
– A high proportion of time was spent updating indexes in MongoDB
Issues to Address
• Heartbeating resiliency
• Concurrency inaccuracies
• Entitlement checking
• Products storage
• Viewing History
• Indexes in MongoDB
• MongoDB write lock
Heartbeating: Introduction
• After playout initiation, actual video chunks are served by the CDN and don't touch our platform
• Lightweight heartbeats call back to our platform every 10 minutes to notify us of continued playout
• NOW TV uses heartbeats to:
– Enforce concurrency rules
– Enforce entitlement
– Record bookmark positions (VOD only)
[Diagram: the device receives video chunks from the CDN and sends heartbeats to NOW TV at a 10-minute interval]
Heartbeating: Previously
• Previously, a non-OK heartbeat response would terminate playout on the user’s device
• Fail in favour of NOW TV
– When the NOW TV platform is unavailable, existing playouts are terminated on the next heartbeat
[Diagram: the heartbeat to NOW TV receives a non-OK response and playout of video chunks from the CDN ends]
Heartbeating: Today
• Today, playout continues unless a specific STOP heartbeat response is received
• Fail in favour of the customer
– Existing streams will NOT be terminated if NOW TV becomes unavailable
[Diagram: the heartbeat to NOW TV receives a non-STOP response and the device keeps receiving video chunks from the CDN]
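The difference between the two heartbeat policies can be sketched in a few lines of Python. This is only an illustration of the rule described on the slides, not the NOW TV client code: the function names are hypothetical, the response is modelled as a plain string, and None is assumed to stand for a timeout or any other failure to reach the platform.

```python
from typing import Optional


def should_continue_playout_previously(response: Optional[str]) -> bool:
    """Old rule (fail in favour of NOW TV): only an OK response keeps playing."""
    return response == "OK"


def should_continue_playout_today(response: Optional[str]) -> bool:
    """New rule (fail in favour of the customer): only an explicit STOP ends playout.

    `response` is a hypothetical string form of the heartbeat reply; None is an
    assumed stand-in for a timeout or an unreachable platform.
    """
    return response != "STOP"


if __name__ == "__main__":
    for reply in ("OK", "STOP", "ERROR", None):
        print(reply,
              should_continue_playout_previously(reply),
              should_continue_playout_today(reply))
```

Under the new rule an outage (None or an error) no longer ends an existing stream; only a deliberate STOP does.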
Heartbeating: Future
• Game of Thrones Linear customers produce ripple-effect heartbeating
– Due to heartbeats being fixed to a 10-minute period
• In future, we will randomise the first heartbeat period in an attempt to smooth out these ripples
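One way to randomise the first heartbeat period is sketched below. The slides do not say which distribution or range would be used, so the uniform draw between 0 and 10 minutes is an assumption, as are the function and constant names.

```python
import random
from typing import List

HEARTBEAT_PERIOD_SECONDS = 600  # the 10-minute interval from the slides


def heartbeat_times(start: float, count: int, rng: random.Random) -> List[float]:
    """Return the times (in seconds) of the first `count` heartbeats of a stream.

    Only the first period is randomised (assumed: uniform over one period).
    Later heartbeats keep the fixed 10-minute spacing, so streams that all
    started at the same moment end up spread across the period.
    """
    first_delay = rng.uniform(0, HEARTBEAT_PERIOD_SECONDS)
    return [start + first_delay + i * HEARTBEAT_PERIOD_SECONDS for i in range(count)]


if __name__ == "__main__":
    rng = random.Random()
    # Three streams starting together no longer heartbeat at the same instants.
    for _ in range(3):
        print([round(t) for t in heartbeat_times(0.0, 3, rng)])
```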
Concurrency: Introduction
• Concurrency of 2 streams is managed through the concept of Playout Slots
• A playout slot keeps track of a currently playing stream
• Slots are allocated on playout initiation
[Diagram: each Play request to NOW TV adds a slot to the playouts document in MongoDB]
No streams playing:
{ "playouts": [] }
After the first Play:
{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
After the second Play:
{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
Concurrency: Introduction
• Slots are updated on heartbeats to refresh the timestamp
• Slots are terminated on an END event
[Diagram: an END event sent to NOW TV removes a slot from the playouts document in MongoDB, which frees it for the next Play]
Two streams playing:
{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
After END:
{ "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
After a new Play:
{ "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "CBF789", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
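The slot lifecycle described above (allocate on Play, refresh on heartbeat, remove on END) can be sketched with an in-memory dictionary standing in for the MongoDB document. The function names, the exception class and the 10-minute slot timeout are assumptions made for illustration; only the document shape and the limit of 2 streams come from the slides.

```python
import uuid
from typing import Dict

MAX_CONCURRENT_STREAMS = 2   # from the slides
SLOT_TIMEOUT_SECONDS = 600   # assumed: a slot expires 10 minutes after its last heartbeat


class ConcurrencyLimitReached(Exception):
    """Raised when every playout slot is held by a live stream."""


def play(doc: Dict, content_id: str, now: float) -> str:
    """Allocate a slot on playout initiation and return its id."""
    # Forget slots whose last heartbeat is older than the timeout.
    doc["playouts"] = [s for s in doc["playouts"]
                       if now - s["heartbeat"] < SLOT_TIMEOUT_SECONDS]
    if len(doc["playouts"]) >= MAX_CONCURRENT_STREAMS:
        raise ConcurrencyLimitReached()
    slot_id = uuid.uuid4().hex
    doc["playouts"].append({"id": slot_id, "heartbeat": now, "content": content_id})
    return slot_id


def heartbeat(doc: Dict, slot_id: str, now: float) -> bool:
    """Refresh the slot's timestamp; return False if the slot is unknown."""
    for slot in doc["playouts"]:
        if slot["id"] == slot_id:
            slot["heartbeat"] = now
            return True
    return False


def end(doc: Dict, slot_id: str) -> None:
    """Remove the slot when an END event arrives."""
    doc["playouts"] = [s for s in doc["playouts"] if s["id"] != slot_id]
```

In this sketch a slot whose END event never arrives stays in the list until the timeout passes, which is the behaviour that the "Previously" slide describes as a problem.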
Concurrency: Previously
• Failure to receive an END event (due to an app crash or connectivity loss) blocked a slot until timeout
• Previously, this blocked subsequent playouts for up to 10 minutes
• “Concurrency limit reached” errors were seen after our service had been restored on GoT night
[Diagram: a new Play request arrives at NOW TV while both slots are still held in MongoDB]
{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
Concurrency: Today
• Now, slots allocated to the same Device ID can be ‘reclaimed’
• No more “Concurrency limit reached” errors following app crashes or service outages
[Diagram: devices box1 and box2 each hold a slot in MongoDB; a new Play from box1 reclaims the slot already held by box1]
Before the new Play:
{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }
After the new Play from box1:
{ "playouts": [ { "id": "FCE987", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }
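Reclaiming a slot by Device ID can be sketched as follows. The document shape and the example ids are taken from the slide; the function name, its parameters and the exception class are hypothetical, and a plain dictionary stands in for the MongoDB document.

```python
from typing import Dict

MAX_CONCURRENT_STREAMS = 2  # from the slides


class ConcurrencyLimitReached(Exception):
    """Raised when all slots are held by other devices."""


def play(doc: Dict, slot_id: str, content_id: str, device_id: str, now: str) -> Dict:
    """Allocate a slot, reusing any slot already held by the same device."""
    new_slot = {"id": slot_id, "heartbeat": now,
                "content": content_id, "deviceId": device_id}
    slots = doc["playouts"]
    for index, slot in enumerate(slots):
        if slot.get("deviceId") == device_id:
            # Same device: the old slot is stale (crash, lost END), so reclaim it.
            slots[index] = new_slot
            return new_slot
    if len(slots) >= MAX_CONCURRENT_STREAMS:
        raise ConcurrencyLimitReached()
    slots.append(new_slot)
    return new_slot


if __name__ == "__main__":
    doc = {"playouts": [
        {"id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1"},
        {"id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2"},
    ]}
    play(doc, "FCE987", "<content_id>", "box1", "<timestamp>")
    print([s["id"] for s in doc["playouts"]])
```

A third device would still be refused while box1 and box2 hold the two slots, so the concurrency rule itself is unchanged.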
Entitlements: Introduction
• Entitlement is granted based upon the products purchased and the content being consumed
• Products and content are tagged with entitlement tags
• Tag intersection indicates entitlement to consume
[Diagram: products and content items labelled with entitlement tags (sports, entertainment, movies)]
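The tag-intersection rule is simple enough to show directly. This is a minimal sketch using Python sets; the function name and the way tags are passed in are assumptions, and the tag values are the ones shown on the slide.

```python
from typing import Iterable, Set


def is_entitled(product_tags: Iterable[Set[str]], content_tags: Set[str]) -> bool:
    """Return True when any tag on the customer's products matches a content tag."""
    owned: Set[str] = set()
    for tags in product_tags:
        owned |= tags
    # A non-empty intersection indicates entitlement to consume.
    return bool(owned & content_tags)


if __name__ == "__main__":
    products = [{"entertainment"}, {"movies"}]
    print(is_entitled(products, {"movies"}))
    print(is_entitled(products, {"sports"}))
```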
Entitlements: Previously
• Entitlement checking was not efficient, as it was checked by content ID:
– /entitlement/movie/<id>
– /entitlement/episode/<id>
– /entitlement/stream/<id>
• Entitlement was checked on every details page before any call-to-action
• Content tags almost never changed
Entitlements: Today
• Entitlement checking by tag(s) was introduced
– /entitlement/tags/movies
• Entitlement checking now only needs to occur once per collection or ‘section’ of the app
• Where entitlement checking by content ID is still necessary, tags are cached in memory
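Because content tags almost never change, an in-memory cache in front of the tag lookup is a natural fit. The sketch below uses Python's functools.lru_cache; the content store, the function names and the cache size are hypothetical, and cache expiry is deliberately left out.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable

# Hypothetical stand-in for the content metadata store.
_CONTENT_TAGS = {
    "movie1": frozenset({"movies"}),
    "stream1": frozenset({"sports"}),
}


@lru_cache(maxsize=10_000)
def tags_for_content(content_id: str) -> FrozenSet[str]:
    """Look up a content item's entitlement tags; repeat calls hit the cache."""
    return _CONTENT_TAGS.get(content_id, frozenset())


def is_entitled_to_content(customer_tags: Iterable[str], content_id: str) -> bool:
    """Check entitlement by content ID using the cached tags."""
    return bool(frozenset(customer_tags) & tags_for_content(content_id))


if __name__ == "__main__":
    print(is_entitled_to_content({"movies"}, "movie1"))
    print(is_entitled_to_content({"movies"}, "movie1"))
    print(tags_for_content.cache_info())
```

The second check for the same content ID is served from memory, so only the first one touches the store.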
Product Storage: Previously
• Every purchase and renewal of any product resulted in a new Product entity in MongoDB
[Diagram: one Product entity per purchase or renewal. Entertainment: June, July, August, September, October and November 2015. Movies: August, September, October and November 2015. Sports: 20th July 2015 and 12th September 2015.]
Product Storage: Today
• We store entitlement entities instead of products, updating on renewals rather than duplicating
[Diagram: the same purchases and renewals (Entertainment June to November 2015, Movies August to November 2015, Sports 20th July 2015 and 12th September 2015), now held as entitlement entities that are updated on renewal]
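The move from "one entity per purchase" to "one entitlement updated on renewal" can be sketched as an upsert keyed on account and entitlement. A dictionary stands in for the collection here; the key layout and the field names (accountId, tag, validUntil) are assumptions for illustration and are not taken from the slides.

```python
from datetime import date
from typing import Dict, Tuple

EntitlementStore = Dict[Tuple[str, str], Dict]


def record_purchase(store: EntitlementStore, account_id: str,
                    tag: str, valid_until: date) -> None:
    """Create the entitlement on first purchase, extend it on renewal."""
    key = (account_id, tag)
    current = store.get(key)
    if current is None:
        store[key] = {"accountId": account_id, "tag": tag, "validUntil": valid_until}
    elif valid_until > current["validUntil"]:
        # Renewal: update the existing entity instead of adding another one.
        current["validUntil"] = valid_until


if __name__ == "__main__":
    store: EntitlementStore = {}
    for month in range(6, 12):   # Entertainment, June to November 2015
        record_purchase(store, "account1", "entertainment", date(2015, month, 28))
    for month in range(8, 12):   # Movies, August to November 2015
        record_purchase(store, "account1", "movies", date(2015, month, 28))
    print(len(store))
```

Ten purchases and renewals leave two entities in the store rather than ten. With a MongoDB driver the same idea would normally be expressed as an update with the upsert option rather than an insert.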
Viewings & Bookmarks: Introduction
• Viewing a VOD asset => Viewing
• Heartbeating during a VOD asset => Bookmark
• Viewings and Bookmarks were stored separately
• No capping or archiving
Viewings & Bookmarks: Previously
• Upon fetching a customer’s viewing history, multiple database queries were made:
– 1 query to the viewings collection to fetch n viewings for the customer
– n queries to the bookmarks collection to fetch the bookmark position for each viewing
• TOTAL: n + 1 MongoDB queries for a single request!
• Some customers had thousands of items in their viewing history!
Viewings:
{ "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
{ "_id": "bcd345", "accountId": "account1", "contentId": "movie2", "timestamp": "<timestamp>" }
{ "_id": "cde456", "accountId": "account1", "contentId": "episode1", "timestamp": "<timestamp>" }
Bookmarks:
{ "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
{ "_id": "edc765", "accountId": "account1", "contentId": "movie2", "position": 2854 }
{ "_id": "dcb543", "accountId": "account1", "contentId": "episode1", "position": 3542 }
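The n + 1 pattern can be made visible with a small in-memory collection that counts its queries. CountingCollection and viewing_history are hypothetical names and not part of any MongoDB driver; the documents are the ones shown on the slide.

```python
from typing import Dict, List


class CountingCollection:
    """In-memory stand-in for a collection that counts find() calls."""

    def __init__(self, docs: List[Dict]):
        self.docs = docs
        self.queries = 0

    def find(self, **criteria) -> List[Dict]:
        self.queries += 1
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]


def viewing_history(viewings: CountingCollection,
                    bookmarks: CountingCollection, account_id: str) -> List[Dict]:
    """Old approach: one query for the viewings, then one per viewing."""
    history = []
    for viewing in viewings.find(accountId=account_id):
        marks = bookmarks.find(accountId=account_id, contentId=viewing["contentId"])
        history.append({
            "contentId": viewing["contentId"],
            "timestamp": viewing["timestamp"],
            "position": marks[0]["position"] if marks else None,
        })
    return history


if __name__ == "__main__":
    contents = [("movie1", 1187), ("movie2", 2854), ("episode1", 3542)]
    viewings = CountingCollection(
        [{"accountId": "account1", "contentId": c, "timestamp": "<timestamp>"} for c, _ in contents])
    bookmarks = CountingCollection(
        [{"accountId": "account1", "contentId": c, "position": p} for c, p in contents])
    viewing_history(viewings, bookmarks, "account1")
    print(viewings.queries + bookmarks.queries)
```

With the three viewings from the slide the request costs four queries; with thousands of viewings it costs thousands.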
Viewings & Bookmarks: Today
• The original reason for keeping viewings and bookmarks separate was no longer apparent
• Now, viewings and bookmarks are merged
– Unnecessary document ID replaced with compound ID, improving indexing efficiency
– Shortened field names, reducing storage consumption and further improving indexing efficiency
Viewing:
{ "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
Bookmark:
{ "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
View History (merged, compound ID):
{ "_id": { "accountId": "account1", "contentId": "movie1" }, "position": 1187, "timestamp": "<timestamp>" }
View History (shortened field names):
{ "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }
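The merge of a Viewing and its Bookmark into one View History document can be sketched as a pure function. The input and output shapes follow the slide; the function names are hypothetical, and leaving out "pos" when no bookmark exists is an assumption.

```python
from typing import Dict, List, Optional


def to_view_history(viewing: Dict, bookmark: Optional[Dict] = None) -> Dict:
    """Build the merged document with a compound _id and shortened field names."""
    merged = {
        "_id": {"aid": viewing["accountId"], "cid": viewing["contentId"]},
        "ts": viewing["timestamp"],
    }
    if bookmark is not None:
        merged["pos"] = bookmark["position"]  # assumed: omitted when there is no bookmark
    return merged


def history_for(view_history: List[Dict], account_id: str) -> List[Dict]:
    """Single pass over one collection replaces the earlier n + 1 lookups."""
    return [d for d in view_history if d["_id"]["aid"] == account_id]


if __name__ == "__main__":
    viewing = {"_id": "abc123", "accountId": "account1",
               "contentId": "movie1", "timestamp": "<timestamp>"}
    bookmark = {"_id": "fed987", "accountId": "account1",
                "contentId": "movie1", "position": 1187}
    print(to_view_history(viewing, bookmark))
```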
Mongo Indexes
Viewing:
{ "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }
Indexes:
{ "_id": "abc123" }
{ "_id": "abc123", "accountId": "account1" }
{ "_id": "abc123", "accountId": "account1", "timestamp": "<timestamp>" }
{ "_id": "abc123", "accountId": "account1", "contentId": "movie1" }
Bookmark:
{ "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }
Indexes:
{ "_id": "fed987" }
{ "_id": "fed987", "accountId": "account1", "contentId": "movie1" }
View History:
{ "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }
Indexes:
{ "_id": { "aid": "account1", "cid": "movie1" } }
{ "_id.aid": "account1", "ts": "<timestamp>" }
Mongo Write Locks: Previously
[Diagram: one Mongo instance with two databases. Database 1 holds Collection 1 and Collection 2; Database 2 holds Collection 3 and Collection 4. Each collection contains several documents.]
Mongo Write Locks: Today
[Diagram: one Mongo instance with four databases (Database 1 to Database 4) holding Collection 1 to Collection 4 and their documents]
NOW TV Customer Base 2014 - 2015
• Our customer base TRIPLED, again, in the year up to April 2015
[Chart: customer base for 2013, 2014 and 2015]
What happened? (13th April 2015)
• Good platform availability throughout
• 2.5x the load that affected us just one year earlier
• Twice the normal concurrency for a typical Monday night
Recognition
MongoDB Innovation Award 2015 recognises organisations that are creating ground-breaking applications. These projects represent the best and most innovative work in the industry over the last year.
DTG Innovation Award 2015 recognises organisations which have driven innovation in a particular technology or sector.
What’s Next For NOW TV?
• Our growth is expected to continue along the same trajectory
• Moving to active-active datacentre architecture for increased resiliency
• Cloud-based ‘overflow’ scaling for high-load events
• Microservices
• Sub-system resiliency
Credits
• The entire NOW TV Technology team are credited with our success
– Platform Software Engineers
– Platform Quality Assurance Engineers
– Dev-Ops Engineers
– App Developers & Testers
– Analysts, scrum masters and management
• Be a part of our future success, work for NOW TV at Sky
– www.workforsky.com
– @workforsky