How Sky Uses Cassandra Video Platform
Dr. Strange
A Little About Me
Richard Ellingham
• Part of the Site Reliability Engineering team at Sky UK within the OTT SWE Department. OTT refers to content from a third party that is delivered to an end-user, with the ISP simply transporting IP packets
• I've been involved with Cassandra at Sky for c. 3 years
• Run a small number of core teams that form a Centre of Excellence
• Specialise in areas of Performance Testing, Reliability and Persistence Engineering, specifically Cassandra
So What?
Online Video Platform (OVP)
Talk: What critical services does Cassandra underpin that allow customers to stream content to their devices?
Primarily, OVP provides playout services for the OTT client devices. This includes:
• Authorisation of content entitlements for a given user/device - UMV
• Policing Device and Stream Concurrency - DCM
• Enforcing Stream Start Parental PIN checks - DCM
• Providing URLs for Streaming and Download - TM, LCM, DM
• Granting and Issuing DRM Tokens - DRM
• Managing Content Bookmarks - BM
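To make the "Policing Device and Stream Concurrency" item concrete, here is a minimal in-memory sketch of a heartbeat-based concurrency check. It is not Sky's DCM implementation: the class name, the per-user stream limit and the heartbeat timeout are all hypothetical values chosen for illustration, and a plain dictionary stands in for the persistence layer.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConcurrencyTracker:
    """Hypothetical stand-in for a stream concurrency check (not Sky's DCM)."""

    max_streams: int = 2         # assumed per-user limit, not from the talk
    heartbeat_ttl: float = 60.0  # assumed seconds before a silent stream expires
    _streams: dict = field(default_factory=dict)  # user -> {device: last heartbeat}

    def _live(self, user, now):
        # Drop any stream whose last heartbeat is older than the TTL.
        devices = self._streams.setdefault(user, {})
        expired = [d for d, seen in devices.items() if now - seen > self.heartbeat_ttl]
        for device in expired:
            del devices[device]
        return devices

    def start_stream(self, user, device, now=None):
        """Return True if the stream may start, False if the limit is reached."""
        now = time.time() if now is None else now
        devices = self._live(user, now)
        if device not in devices and len(devices) >= self.max_streams:
            return False
        devices[device] = now
        return True

    def heartbeat(self, user, device, now=None):
        """Refresh a live stream; return False if it has already expired."""
        now = time.time() if now is None else now
        devices = self._live(user, now)
        if device not in devices:
            return False
        devices[device] = now
        return True
```

In a Cassandra-backed design, a per-row TTL could play the role of the expiry logic here, so that streams which stop sending heartbeats age out without an explicit delete. Whether the real service works this way is not stated in the slides.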
A Little About Cassandra Estate
Estate
• Circa 100 production nodes, about 500 nodes in total across all environments
• Run a combination of Apache Cassandra and DataStax Cassandra
• Circa 10 clusters, a combination of multi-tenant and dedicated
• Largest cluster is c. 35 nodes holding 4TB, smallest is 6 nodes
• Active/Active across multiple DCs
• We can support up to 3M concurrent users streaming content
A Little About Cassandra Estate
[Diagram: a Cassandra cluster spanning two data centres, Hemel and Slough. Each data centre has Rack1, Rack2 and Rack3, and each rack holds Node1, Node2 and Node3.]
• Majority of estate is DSE Cassandra 4.8.x
• We only run Cassandra workloads for DSE
• CentOS 7.2 (12 vCPUs)
• Java 1.8 with G1GC (32GB heaps)
• 500GB data mount
• Minimum of 2 DCs with 3 Racks per DC
• Minimum of 2 nodes per Rack
• Clients read/write at local_quorum
• Use a Downgrade Policy
• 1 Seed defined per Rack
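The combination of three racks per DC, local_quorum reads and writes, and a downgrade policy can be illustrated with a small arithmetic sketch. The slides do not state the replication factor, so the RF of 3 per DC used in the usage lines is an assumption. The `downgrade` helper is a simplified illustration of how a downgrading retry policy might pick a weaker consistency level from the number of live replicas; it is not the driver's actual code, and the slides do not say which policy implementation is used.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_survives(rf_per_dc: int, racks_per_dc: int, racks_down: int) -> bool:
    """True if LOCAL_QUORUM is still reachable in one DC after losing whole racks.

    Assumes replicas are spread as evenly as possible across racks, and takes
    the worst case where the racks holding the most replicas are the ones lost.
    """
    base, extra = divmod(rf_per_dc, racks_per_dc)
    per_rack = [base + (1 if i < extra else 0) for i in range(racks_per_dc)]
    lost = sum(sorted(per_rack, reverse=True)[:racks_down])
    return rf_per_dc - lost >= quorum(rf_per_dc)


def downgrade(alive_replicas: int):
    """Simplified sketch: pick the strongest level the live replicas can satisfy."""
    if alive_replicas >= 3:
        return "THREE"
    if alive_replicas == 2:
        return "TWO"
    if alive_replicas == 1:
        return "ONE"
    return None  # nothing alive: the request fails


# Assumed RF of 3 per DC (not stated in the slides), with 3 racks per DC.
print(quorum(3))                       # replicas needed for LOCAL_QUORUM
print(local_quorum_survives(3, 3, 1))  # one rack lost
print(local_quorum_survives(3, 3, 2))  # two racks lost
print(downgrade(1))                    # level a downgrade might fall back to
```

Under these assumptions, one replica lands in each rack, so a single rack failure still leaves two of three replicas and local_quorum holds. Losing two racks drops below quorum, which is the situation where a downgrade policy trades consistency for availability.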
In the Beginning - MPOD
Challenges - Monolithic System
• Release cycle was slow, c. 6 weeks
• Hard to scale
• Persistence layer scaling was limited: there is only so far you can scale up
• Data layer was running an Oracle RAC cluster cross-DC using Active Data Guard
• Data footprint was tricky to manage
• Scheduling changes was tricky
• Availability: we weren't getting the uptimes we wanted
• Hard to make truly active/active with bi-directional GoldenGate (nightmare!)
Some challenges we needed to address
Availability & Reliability
• Multi-DC, truly Active/Active
• 100% service uptime
• Distributed architecture

Scalability
• Use micro-service architecture, with services decoupled from the monolithic architecture
• Scale-out architecture

Build New Capabilities
• Needed quick release cycles to introduce new features and ways of consuming content

International Presence
• Need to be able to offer propositions quickly to other territories
Some challenges we needed to address
Typical Load Profile
• High profile events see c. 700k concurrent users (e.g. GoT, Football)
• Sharp ramp up
• Sharp rises just before start of programme
• Sharp rises just before end of programme
OVP – Online Video Platform / Streaming Platform Service
[Diagram: playout flow through the Video Platform (SPS), with steps numbered 1 to 17. Components shown: Client, Identity, AC, UMV API, TM API, LCM API, DCM API, DRM API, DM API, UMV, TM, LCM, DCM, DM, Content Discovery, Gonzales, Commerce, MCS, WIG, Headend, DRM License Issuers (Playready, Cisco VGC, Fairplay, Marlin), Oogway, CRM API, CBS, IDS, CRM, and CDN (Akamai, Level3).
Flow labels: Content Discover/Browse; Input: Credentials, Returns: OAUTH Token; Input: OAUTH Token, Returns: User Token; Confirm: OAUTH Token; Retrieve Entlmnts; Cache Entlmnts; Retrieve Metadata; Persist Metadata; ConcrncyCheck; Return: DRM Token; Input: DRM Token, Returns: DRM License; Stream Content; Heartbeat; Save Events; Browse Events; Royalties; Fraud; Call Centre.]