Machine Learning meets Databases
Ioannis Papapanagiotou
Cloud Database Engineering
Create personalized recommendations for discovering engaging video content that maximizes member joy.
Personalize Everything
90 seconds
What do caches touch?
● Signing up*
● Logging in
● Choosing a profile
● Picking liked videos
● Personalization*
● Loading home page*
● Scrolling home page*
● A/B tests
● Video image selection
● Searching*
● Viewing title details
● Playing a title*
● Subtitle / language prefs
● Rating a title
● My List
● Video history*
● UI strings
● Video production*
* multiple caches involved
Key-Value store optimized for AWS and tuned for Netflix
Ephemeral Volatile Cache
What is EVCache?
● Distributed, sharded, replicated key-value store
● Tunable in-region and global replication
● Based on Memcached
● Resilient to failure
● Topology aware
● Linearly scalable
● Seamless deployments
Why Optimize for AWS
● Instances disappear
● Zones fail
● Regions become unstable
● Network is lossy
● Customer requests bounce between regions
Failures happen and we test all the time
EVCache Use @ Netflix
● Hundreds of terabytes of data
● Trillions of ops / day
● Tens of billions of items stored
● Tens of millions of ops / sec
● Millions of replications / sec
● Thousands of servers
● Hundreds of instances per cluster
● Hundreds of microservice clients
● Tens of distinct clusters
● 3 regions
Architecture
[Diagram: Server (Memcached, EVCar); Application (Client Library, Client); Eureka (Service Discovery)]
Architecture
[Diagram: zones us-west-2a, us-west-2b, us-west-2c; three clients]
Reading (get)
[Diagram: zones us-west-2a, us-west-2b, us-west-2c; one client; Primary and Secondary read paths]
Writing (set, delete, add, etc.)
[Diagram: zones us-west-2a, us-west-2b, us-west-2c; three clients]
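The read and write slides can be sketched as a toy model. This is a minimal sketch, not the EVCache client: it assumes a get is served from the client's own zone first (the "Primary" path) and falls back to another zone (the "Secondary" path), and that every write is sent to the replica in every zone. The class name `ZoneReplicatedCache` and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class ZoneReplicatedCache:
    """Toy model: one in-memory replica per availability zone (hypothetical)."""

    def __init__(self, zones: List[str]) -> None:
        self.replicas: Dict[str, Dict[str, str]] = {zone: {} for zone in zones}

    def set(self, key: str, value: str) -> None:
        # Assumption: a write fans out to the replica in every zone.
        for replica in self.replicas.values():
            replica[key] = value

    def delete(self, key: str) -> None:
        # Deletes fan out the same way as sets.
        for replica in self.replicas.values():
            replica.pop(key, None)

    def get(self, key: str, local_zone: str) -> Optional[str]:
        # Primary path: try the replica in the caller's own zone.
        value = self.replicas[local_zone].get(key)
        if value is not None:
            return value
        # Secondary path: fall back to a replica in any other zone.
        for zone, replica in self.replicas.items():
            if zone != local_zone and key in replica:
                return replica[key]
        return None
```

With three zones, clearing one replica to simulate a lost instance still lets a client in that zone read the value through the secondary path.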
Use Case: Lookaside Cache
[Diagram: Application (Client Library, Client) and a REST/gRPC Client; four nodes labeled S and four labeled C; arrows show data flow]
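The lookaside pattern named on this slide can be illustrated with a short sketch. It assumes the usual cache-aside flow: the application checks the cache first and, on a miss, calls the backing service and stores the result. `LookasideCache` and `load_from_service` are hypothetical names, and a plain dictionary stands in for the cache cluster.

```python
from typing import Callable, Dict


class LookasideCache:
    """Cache-aside wrapper around a slower backing service (hypothetical)."""

    def __init__(self, load_from_service: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}
        self._load = load_from_service
        self.misses = 0

    def get(self, key: str) -> str:
        # Hit: answer from the cache without touching the service.
        if key in self._cache:
            return self._cache[key]
        # Miss: ask the service, then populate the cache for later readers.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Drop a stale entry so the next get reloads it from the service.
        self._cache.pop(key, None)
```

Repeated reads of the same key reach the service only once until the entry is invalidated.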
Use Case: Transient Data Store
[Diagram: three applications, each with Client Library and Client, arranged along a time axis]
Use Case: Primary Store
[Diagram: offline / nearline precomputes for recommendations; offline services and online services; online application (Client Library, Client); arrows show data flow]
Use Case: Impression store
[Diagram: Hive; offline services and online services; online application (Client Library, Client); arrows show data flow]
Pipeline of Personalization
[Diagram: Compute A through Compute E across offline services and online services; Online 1 and Online 2; arrows show data flow]
Additional Features
Kafka
● Global data replication
● Consistency metrics
Key Iteration
● Cache warming
● Lost instance recovery
● Backup (and restore)
Cross-Region Replication
[Diagram: Region A and Region B, each with an APP, Kafka, a Repl Relay, and a Repl Proxy]
1) mutate
2) send metadata
3) poll msg
4) get data for set
5) https send msg
6) mutate
7) read
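The seven steps of the replication diagram can be walked through in a small simulation. This is a sketch under assumptions: the mapping of steps to components (the application writes and publishes metadata, the relay polls and fetches the value, the proxy applies the write in the other region) is one reading of the step labels, a `deque` stands in for Kafka, and a direct function call stands in for the HTTPS hop. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class Region:
    """One region: a local cache plus a queue standing in for Kafka."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.queue: Deque[Tuple[str, str]] = deque()


def app_set(region: Region, key: str, value: str) -> None:
    region.cache[key] = value           # 1) mutate the local cache
    region.queue.append(("set", key))   # 2) send metadata only, not the value


def app_delete(region: Region, key: str) -> None:
    region.cache.pop(key, None)         # 1) mutate the local cache
    region.queue.append(("delete", key))  # 2) send metadata


def proxy_apply(dest: Region, op: str, key: str, value: Optional[str]) -> None:
    # 6) the proxy in the destination region mutates its local cache
    if op == "set" and value is not None:
        dest.cache[key] = value
    elif op == "delete":
        dest.cache.pop(key, None)


def relay_once(source: Region, dest: Region) -> bool:
    """Process one queued message; return False when the queue is empty."""
    if not source.queue:
        return False
    op, key = source.queue.popleft()    # 3) poll msg
    value = None
    if op == "set":
        value = source.cache.get(key)   # 4) get data for set
        if value is None:
            return True                 # key vanished before relay; skip it
    proxy_apply(dest, op, key, value)   # 5) send msg (HTTPS in the diagram)
    return True
```

Step 7, the read, is then an ordinary lookup in the destination region's cache. Sending only metadata through the queue keeps messages small, at the cost of one extra cache read by the relay for each set.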
Open Source
https://github.com/netflix/EVCache (client and REST proxy)
Viewing History
Requirements for Viewing History
● Time series dataset
● Support high writes
● Cross region replication
● Large Dataset
Growth of Viewing History
1) Massively scalable architecture
2) Multi-datacenter, multi-directional replication
3) Linear scale performance
4) Transparent fault detection and recovery
5) Flexible, dynamic schema data
Viewing History
1) Apply Custom Filters (user, device, subtitle, episode, season)
2) Tunable consistency to trade off performance against data consistency
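The trade-off behind tunable consistency can be illustrated with the generic quorum rule: with N replicas, a read of R replicas is guaranteed to see the latest write to W replicas when R + W > N. The sketch below is a toy model of that rule, not the implementation of any particular datastore. `QuorumStore` is a hypothetical name, and the read deliberately picks the replicas least likely to overlap with the write set.

```python
from typing import List, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value)


class QuorumStore:
    """Toy N-replica store with per-request read and write levels."""

    def __init__(self, n: int) -> None:
        self.replicas: List[Optional[Versioned]] = [None] * n
        self.version = 0

    def write(self, value: str, w: int) -> None:
        # Only the first w replicas receive the write; the others stay
        # stale in this toy model (no background repair).
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)

    def read(self, r: int) -> Optional[str]:
        # Worst case for overlap: consult the LAST r replicas.
        start = len(self.replicas) - r
        answers = [x for x in self.replicas[start:] if x is not None]
        if not answers:
            return None
        # The highest version wins.
        return max(answers)[1]
```

Lower R and W values touch fewer replicas per request, which is cheaper but can return stale data. Raising them until R + W > N forces the read set and write set to share at least one replica.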
Growth of Viewing History
New Data Model
Use Case: A/B Metadata
● Wanted to capture information about each test
  ○ Owner
  ○ Properties
  ○ Start time/End Time
  ○ Allocation
Dynomite
A framework that makes non-distributed data stores distributed. It can be used with many key-value storage engines.
Features: highly available, automatic failover, node warmup, tunable consistency, backups/restores
Pluggable Storage Engines
● Layer on top of a non-distributed key value data store
  ○ Peer-peer, Shared Nothing
  ○ Auto-Sharding
  ○ Multi-datacenter
  ○ Linear scale
  ○ Replication
  ○ Gossiping
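One common way to get auto-sharding on top of a non-distributed store is a token ring, sketched below. This is an illustration of the general technique only. The hash function (MD5 truncated to 32 bits), the evenly spaced tokens, and the names `TokenRing` and `owners_per_datacenter` are assumptions and are not taken from Dynomite.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 32


def token_for(key: str) -> int:
    # Assumption: derive a 32-bit token from the first 4 bytes of an MD5 digest.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class TokenRing:
    """Nodes own evenly spaced tokens on a ring of size 2**32."""

    def __init__(self, node_names: List[str]) -> None:
        step = RING_SIZE // len(node_names)
        self.tokens = [i * step for i in range(len(node_names))]
        self.nodes = list(node_names)

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping to node 0.
        idx = bisect.bisect_left(self.tokens, token_for(key)) % len(self.tokens)
        return self.nodes[idx]


def owners_per_datacenter(rings: Dict[str, TokenRing], key: str) -> Dict[str, str]:
    # Multi-datacenter replication: one owning node per datacenter ring.
    return {dc: ring.owner(key) for dc, ring in rings.items()}
```

Each datacenter has its own ring, so a write for a given key lands on exactly one node per datacenter, and the same key always maps to the same node within a ring.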
Replication
Dyno - Java Client
● Connection Pooling
● Load Balancing
● Effective failover
● Pipelining
● Scatter/Gather
● Metrics
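Two of the client features listed above, load balancing and failover, can be sketched together. This is not the Dyno API. It is a minimal Python illustration that assumes round-robin host selection and retries on the next host when an operation raises `ConnectionError`. `RoundRobinPool` and `AllHostsFailed` are hypothetical names.

```python
import itertools
from typing import Callable, List


class AllHostsFailed(Exception):
    """Raised when every host in the pool rejected the operation."""


class RoundRobinPool:
    """Rotate through hosts; on a connection error, fail over to the next."""

    def __init__(self, hosts: List[str]) -> None:
        self.hosts = list(hosts)
        self._next = itertools.cycle(range(len(self.hosts)))

    def execute(self, operation: Callable[[str], str]) -> str:
        # Load balancing: each call starts at the next host in rotation.
        start = next(self._next)
        # Failover: try every host at most once before giving up.
        for offset in range(len(self.hosts)):
            host = self.hosts[(start + offset) % len(self.hosts)]
            try:
                return operation(host)
            except ConnectionError:
                continue
        raise AllHostsFailed("no host could serve the request")
```

A caller passes a function that performs the request against a given host, and the pool decides which host that is.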
Moving Across Storage Engines
Data Explorer for Dynomite (UI)
Open Source
https://github.com/netflix
/dynomite Proxy (C)
/dyno Client (Jedis)
/dynomite-manager Sidecar (Tomcat Container)
/dyno-queues Distributed queue recipe (Java)
Other Datastores
● Source of truth: Hive backed by S3
● Elastic Search
● MySQL, Postgres, AWS Aurora
We are Hiring!
https://jobs.netflix.com/jobs/865007
Twitter: @ipapapa
LinkedIn: https://www.linkedin.com/in/ipapapa/
Github: https://github.com/ipapapa
Thank you.