This talk was given by Subbu Subramanian (Staff Software Engineer @ LinkedIn) in 2012 at Netflix.
Databus
Recruiting Solutions, 1/29/2013
INTRODUCTION
LinkedIn by Numbers
• World's largest professional network
• 187M+ members world-wide as of Q3 2012, growing at the rate of two per second
• 85 of Fortune 100 companies use Talent Solutions to hire
• > 2.6M company pages
• > 4B search queries
• 75K+ developers leveraging our APIs
• 1.3M unique publishers
The Consequence of Specialization in Data Systems
• Data consistency is critical!
• Data flow is essential
Solution: Databus
[Diagram: data change events flow from the primary DB into Databus, which feeds the downstream consumers: standardization services, the search index, the graph index, and read replicas. Applications write updates only to the primary DB.]
Two Ways
1. Application code dual writes to database and pub-sub system
   – Easy on the surface
   – Consistent?
2. Extract changes from database commit log
   – Tough but possible
   – Consistent!
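The consistency question in the dual-write option can be made concrete. Below is a minimal sketch (the class, method names, and crash simulation are hypothetical, not Databus code): if the process dies between the database commit and the pub-sub publish, the subscriber silently misses a change, whereas a subscriber that replays the commit log cannot, because the log entry is the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: why dual writes can silently diverge.
public class DualWriteHazard {
    static List<String> db = new ArrayList<>();     // the database
    static List<String> pubSub = new ArrayList<>(); // the pub-sub system

    // Dual write: two independent writes, no shared transaction.
    static void dualWrite(String value, boolean crashAfterCommit) {
        db.add(value);                // commit to the database
        if (crashAfterCommit) return; // simulated crash: publish is lost
        pubSub.add(value);            // publish to the pub-sub system
    }

    // Log extraction: subscribers replay the DB commit log itself.
    static List<String> replayCommitLog() {
        return new ArrayList<>(db);
    }

    public static void main(String[] args) {
        dualWrite("profile-update-1", false);
        dualWrite("profile-update-2", true); // dies between the two writes
        System.out.println("db=" + db.size() + " pubSub=" + pubSub.size());
        System.out.println("replayed=" + replayCommitLog().size());
        // prints "db=2 pubSub=1" then "replayed=2"
    }
}
```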
Key Design Decisions: Semantics
• Logical clocks attached to the source
  – Physical offsets could be used for internal transport
  – Simplifies data portability
• Pull model
  – Restarts are simple
  – Derived State = f(Source state, Clock)
  – + Idempotence = Timeline Consistent!
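The last two bullets can be sketched in a few lines. In this hypothetical example (names are illustrative, not the Databus API), each event carries a logical clock value (SCN); applying events in order and skipping anything at or below the checkpoint makes replay after a restart a no-op, which is what makes the derived state timeline consistent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Derived State = f(Source state, Clock) + idempotence.
public class TimelineConsumer {
    record Event(long scn, String key, String value) {}

    final Map<String, String> derivedState = new HashMap<>();
    long lastScn = 0; // checkpoint: highest SCN applied so far

    void onEvent(Event e) {
        if (e.scn() <= lastScn) return; // duplicate from replay: ignore
        derivedState.put(e.key(), e.value());
        lastScn = e.scn();
    }

    public static void main(String[] args) {
        TimelineConsumer c = new TimelineConsumer();
        List<Event> stream = List.of(
            new Event(100, "m1", "v1"),
            new Event(200, "m1", "v2"));
        stream.forEach(c::onEvent);
        // Restart: the pull model replays from the checkpoint; re-applying
        // the same window changes nothing.
        stream.forEach(c::onEvent);
        System.out.println(c.derivedState.get("m1") + " @ " + c.lastScn);
        // prints "v2 @ 200"
    }
}
```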
Key Design Decisions: Systems
• Isolate fast consumers from slow consumers
  – Workload separation between online, catch-up, bootstrap
• Isolate sources from consumers
  – Schema changes
  – Physical layout changes
  – Speed mismatch
• Schema-aware
  – Filtering, projections
  – Typically network-bound, so spare CPU can be spent on filtering
Requirements
• Timeline consistency
• Guaranteed, at-least-once delivery
• Low latency
• Schema evolution
• Source independence
• Scalable consumers
• Handle slow/new consumers without affecting happy ones (look-back requirements)
ARCHITECTURE
Initial Design (2007)
[Diagram: the DB feeds a Relay with an in-memory buffer covering ~3 hrs of changes (example SCNs 70000 to 102400). Happy consumers do a proxied pull through the relay; a slow consumer that falls behind the buffer drops to a direct pull against the DB.]
Pros:
1. Consumer scaling
2. Some isolation
Cons: slow consumers overwhelming the DB
Software Architecture
Four Logical Components
• Fetcher – fetch from DB, relay, …
• Log Store – store log snippet
• Snapshot Store – store moving data snapshot
• Subscription Client – orchestrate pull across these
The Databus System
[Diagram: the Relay's in-memory buffer holds ~3 hrs of recent changes (example SCNs 70000 to 102400). Behind it, the Bootstrap Service pairs a Log Server holding ~10 days of log storage with a Snapshot Store of infinite retention. Happy consumers pull from the relay; slow consumers are served by the bootstrap service.]
The Relay
• Change event buffering (~2 to 7 days)
• Low latency (10 to 15 ms)
• Filtering, projection
• Hundreds of consumers per relay
• Scale-out, high availability through redundancy
Deployment Options
• Option 1: Peered deployment
• Option 2: Clustered deployment
The Bootstrap Service
• Catch-all for slow / new consumers
• Isolates the source OLTP instance from large scans
• Log Store + Snapshot Store
• Optimizations
  – Periodic merge
  – Predicate push-down
  – Catch-up versus full bootstrap
• Guaranteed progress for consumers via chunking
• Implementations
  – Database (MySQL)
  – Raw files
• Bridges the continuum between stream and batch systems
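The catch-up-versus-full-bootstrap decision can be illustrated with a small routing sketch (hypothetical names; not the actual client logic): where a consumer resumes depends on how its checkpoint SCN compares with the retention windows of the relay and the bootstrap log store.

```java
// Hypothetical sketch: choosing where a consumer resumes pulling from.
public class PullRouter {
    enum Source { RELAY, BOOTSTRAP_CATCHUP, BOOTSTRAP_SNAPSHOT }

    static Source route(long consumerScn, long relayMinScn, long logMinScn) {
        if (consumerScn >= relayMinScn)
            return Source.RELAY;              // still inside the relay buffer
        if (consumerScn >= logMinScn)
            return Source.BOOTSTRAP_CATCHUP;  // replay from the log store
        return Source.BOOTSTRAP_SNAPSHOT;     // full bootstrap: snapshot, then catch up
    }

    public static void main(String[] args) {
        long relayMin = 70000, logMin = 30000; // example retention boundaries
        System.out.println(route(90000, relayMin, logMin)); // prints "RELAY"
        System.out.println(route(50000, relayMin, logMin)); // prints "BOOTSTRAP_CATCHUP"
        System.out.println(route(0,     relayMin, logMin)); // prints "BOOTSTRAP_SNAPSHOT"
    }
}
```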
The Consumer Client Library
• Glue between Databus infrastructure and business logic in the consumer
• Isolates the consumer from changes in the Databus layer
• Switches between relay and bootstrap as needed
• API
  – Callback with transactions
  – Iterators over windows
Fetcher Implementations
• Oracle
  – Trigger-based
• MySQL
  – Custom-storage-engine based
• In the labs
  – Alternative implementations for Oracle
  – OpenReplicator integration for MySQL
Meta-data Management
• Event definition, serialization and transport
  – Avro
• Oracle, MySQL
  – Avro definition generated from the table schema
• Schema evolution
  – Only backwards-compatible changes allowed
• Isolation between upgrades on producer and consumer
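The backwards-compatibility rule can be shown in miniature. The sketch below mirrors Avro's schema-resolution idea without using the Avro library itself (field names and the map-based "record" are hypothetical): a reader on a newer schema can decode records written under an older one as long as every added field carries a default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of backwards-compatible schema evolution.
public class SchemaEvolution {
    // v1 writer schema has {memberId}; the v2 reader schema adds {headline}
    // with a default, so old records remain decodable.
    static Map<String, Object> readWithV2(Map<String, Object> v1Record) {
        Map<String, Object> out = new HashMap<>(v1Record);
        out.putIfAbsent("headline", ""); // new field: default fills the gap
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> old = Map.of("memberId", 42);
        System.out.println(readWithV2(old).containsKey("headline")); // prints "true"
    }
}
```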
Scaling the Consumers (Partitioning)
• Server-side filtering
  – Range, mod, hash
  – Allows the client to control the partitioning function
• Consumer groups
  – Distribute partitions evenly across a group
  – Move partitions to available consumers on failure
  – Minimize re-processing
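A server-side mod filter is the simplest of the three filter kinds to sketch. In this hypothetical example (names are illustrative), the relay evaluates the consumer-supplied filter before sending events, so each member of a consumer group receives only its share of the key space.

```java
// Hypothetical sketch: server-side mod partitioning of the event stream.
public class ModPartitionFilter {
    final int numBuckets; // total partitions in the consumer group
    final int bucket;     // the partition this consumer owns

    ModPartitionFilter(int numBuckets, int bucket) {
        this.numBuckets = numBuckets;
        this.bucket = bucket;
    }

    // True if this event's key falls in the consumer's partition.
    boolean matches(long key) {
        return Math.floorMod(key, numBuckets) == bucket;
    }

    public static void main(String[] args) {
        ModPartitionFilter f = new ModPartitionFilter(4, 1);
        System.out.println(f.matches(5)); // prints "true": 5 mod 4 == 1
        System.out.println(f.matches(8)); // prints "false": 8 mod 4 == 0
    }
}
```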
A NEW CONSUMER
Development with Databus – Client Library
[Diagram: application consumers implement the Stream Event Callback API and the Bootstrap Event Callback API (callbacks such as onDataEvent(DbusEvent, Decoder)); the Databus Client Library exposes register(consumers, sources, filter), start(), and shutdown(), and invokes the consumers' callbacks.]
Databus Consumer Implementation

class MyConsumer extends AbstractDatabusStreamConsumer {
  public ConsumerCallbackResult onDataEvent(DbusEvent e, DbusEventDecoder d) {
    // use the map-like Avro GenericRecord
    GenericRecord g = d.getGenericRecord(e, null);
    // or use the auto-generated Java class
    MyEvent typed = d.getTypedValue(e, null, MyEvent.class);
    …
    return ConsumerCallbackResult.SUCCESS;
  }
}
Starting the client

public static void main(String[] args) {
  // configure
  DatabusHttpClientImpl.Config clientConfig = new DatabusHttpClientImpl.Config();
  clientConfig.loadFromFile("mydbus", "mdbus.props");
  DatabusHttpClientImpl client = new DatabusHttpClientImpl(clientConfig);
  // register callback
  MyConsumer callback = new MyConsumer();
  client.registerDatabusStreamListener(callback, null,
      "com.linkedin.events.member2.MemberProfile");
  // start client library
  client.startAndBlock();
}
Event Callback APIs
PERFORMANCE
Relay Throughput
Consumer Throughput
End-to-End Latency
Snapshot vs Catchup