This talk was given by Subbu Subramanian (Staff Software Engineer @ LinkedIn) in 2012 at Netflix.
Databus
Recruiting Solutions, 1/29/2013
INTRODUCTION
LinkedIn by Numbers
• World's largest professional network
• 187M+ members world-wide as of Q3 2012, growing at the rate of two per second
• 85 of Fortune 100 companies use Talent Solutions to hire
• > 2.6M company pages
• > 4B search queries
• 75K+ developers leveraging our APIs
• 1.3M unique publishers
The Consequence of Specialization in Data Systems
• Data consistency is critical!
• Data flow is essential
Solution: Databus
[Diagram: data change events flow from the primary DB into Databus, which feeds the downstream consumers: standardization services, the search index, the graph index, and read replicas. Applications write updates only to the primary DB.]
Two Ways
1. Application code dual writes to database and pub-sub system
   – Easy on the surface
   – Consistent?
2. Extract changes from database commit log
   – Tough but possible
   – Consistent!
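The consistency question in the dual-write option can be made concrete. Below is a minimal sketch (the class, method names, and crash simulation are hypothetical, not Databus code): if the process dies between the database commit and the pub-sub publish, the subscriber silently misses a change, whereas a subscriber that replays the commit log cannot, because the log entry is the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: why dual writes can silently diverge.
public class DualWriteHazard {
    static List<String> db = new ArrayList<>();     // the database
    static List<String> pubSub = new ArrayList<>(); // the pub-sub system

    // Dual write: two independent writes, no shared transaction.
    static void dualWrite(String value, boolean crashAfterCommit) {
        db.add(value);                // commit to the database
        if (crashAfterCommit) return; // simulated crash: publish is lost
        pubSub.add(value);            // publish to the pub-sub system
    }

    // Log extraction: subscribers replay the DB commit log itself.
    static List<String> replayCommitLog() {
        return new ArrayList<>(db);
    }

    public static void main(String[] args) {
        dualWrite("profile-update-1", false);
        dualWrite("profile-update-2", true); // dies between the two writes
        System.out.println("db=" + db.size() + " pubSub=" + pubSub.size());
        System.out.println("replayed=" + replayCommitLog().size());
        // prints "db=2 pubSub=1" then "replayed=2"
    }
}
```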
Key Design Decisions: Semantics
• Logical clocks attached to the source
  – Physical offsets could be used for internal transport
  – Simplifies data portability
• Pull model
  – Restarts are simple
  – Derived State = f(Source state, Clock)
  – + Idempotence = Timeline Consistent!
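The last two bullets can be sketched in a few lines. In this hypothetical example (names are illustrative, not the Databus API), each event carries a logical clock value (SCN); applying events in order and skipping anything at or below the checkpoint makes replay after a restart a no-op, which is what makes the derived state timeline consistent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Derived State = f(Source state, Clock) + idempotence.
public class TimelineConsumer {
    record Event(long scn, String key, String value) {}

    final Map<String, String> derivedState = new HashMap<>();
    long lastScn = 0; // checkpoint: highest SCN applied so far

    void onEvent(Event e) {
        if (e.scn() <= lastScn) return; // duplicate from replay: ignore
        derivedState.put(e.key(), e.value());
        lastScn = e.scn();
    }

    public static void main(String[] args) {
        TimelineConsumer c = new TimelineConsumer();
        List<Event> stream = List.of(
            new Event(100, "m1", "v1"),
            new Event(200, "m1", "v2"));
        stream.forEach(c::onEvent);
        // Restart: the pull model replays from the checkpoint; re-applying
        // the same window changes nothing.
        stream.forEach(c::onEvent);
        System.out.println(c.derivedState.get("m1") + " @ " + c.lastScn);
        // prints "v2 @ 200"
    }
}
```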
Key Design Decisions: Systems
• Isolate fast consumers from slow consumers
  – Workload separation between online, catch-up, bootstrap
• Isolate sources from consumers
  – Schema changes
  – Physical layout changes
  – Speed mismatch
• Schema-aware
  – Filtering, projections
  – Typically network-bound, so spare CPU can be spent on filtering
Requirements
• Timeline consistency
• Guaranteed, at-least-once delivery
• Low latency
• Schema evolution
• Source independence
• Scalable consumers
• Handle slow/new consumers without affecting happy ones (look-back requirements)
ARCHITECTURE
Initial Design (2007)
[Diagram: the DB feeds a Relay with an in-memory buffer covering ~3 hrs of changes (example SCNs 70000 to 102400). Happy consumers do a proxied pull through the relay; a slow consumer that falls behind the buffer drops to a direct pull against the DB.]
Pros:
1. Consumer scaling
2. Some isolation
Cons: slow consumers overwhelming the DB
Software Architecture
Four Logical Components
• Fetcher – fetch from DB, relay, …
• Log Store – store log snippet
• Snapshot Store – store moving data snapshot
• Subscription Client – orchestrate pull across these
The Databus System
[Diagram: the Relay's in-memory buffer holds ~3 hrs of recent changes (example SCNs 70000 to 102400). Behind it, the Bootstrap Service pairs a Log Server holding ~10 days of log storage with a Snapshot Store of infinite retention. Happy consumers pull from the relay; slow consumers are served by the bootstrap service.]
The Relay
• Change event buffering (~2 to 7 days)
• Low latency (10 to 15 ms)
• Filtering, projection
• Hundreds of consumers per relay
• Scale-out, high availability through redundancy
Deployment Options
• Option 1: Peered deployment
• Option 2: Clustered deployment
The Bootstrap Service
• Catch-all for slow / new consumers
• Isolates the source OLTP instance from large scans
• Log Store + Snapshot Store
• Optimizations
  – Periodic merge
  – Predicate push-down
  – Catch-up versus full bootstrap
• Guaranteed progress for consumers via chunking
• Implementations
  – Database (MySQL)
  – Raw files
• Bridges the continuum between stream and batch systems
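The catch-up-versus-full-bootstrap decision can be illustrated with a small routing sketch (hypothetical names; not the actual client logic): where a consumer resumes depends on how its checkpoint SCN compares with the retention windows of the relay and the bootstrap log store.

```java
// Hypothetical sketch: choosing where a consumer resumes pulling from.
public class PullRouter {
    enum Source { RELAY, BOOTSTRAP_CATCHUP, BOOTSTRAP_SNAPSHOT }

    static Source route(long consumerScn, long relayMinScn, long logMinScn) {
        if (consumerScn >= relayMinScn)
            return Source.RELAY;              // still inside the relay buffer
        if (consumerScn >= logMinScn)
            return Source.BOOTSTRAP_CATCHUP;  // replay from the log store
        return Source.BOOTSTRAP_SNAPSHOT;     // full bootstrap: snapshot, then catch up
    }

    public static void main(String[] args) {
        long relayMin = 70000, logMin = 30000; // example retention boundaries
        System.out.println(route(90000, relayMin, logMin)); // prints "RELAY"
        System.out.println(route(50000, relayMin, logMin)); // prints "BOOTSTRAP_CATCHUP"
        System.out.println(route(0,     relayMin, logMin)); // prints "BOOTSTRAP_SNAPSHOT"
    }
}
```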
The Consumer Client Library
• Glue between Databus infrastructure and business logic in the consumer
• Isolates the consumer from changes in the Databus layer
• Switches between relay and bootstrap as needed
• API
  – Callback with transactions
  – Iterators over windows
Fetcher Implementations
• Oracle
  – Trigger-based
• MySQL
  – Custom-storage-engine based
• In the labs
  – Alternative implementations for Oracle
  – OpenReplicator integration for MySQL
Meta-data Management
• Event definition, serialization and transport
  – Avro
• Oracle, MySQL
  – Avro definition generated from the table schema
• Schema evolution
  – Only backwards-compatible changes allowed
• Isolation between upgrades on producer and consumer
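The backwards-compatibility rule can be shown in miniature. The sketch below mirrors Avro's schema-resolution idea without using the Avro library itself (field names and the map-based "record" are hypothetical): a reader on a newer schema can decode records written under an older one as long as every added field carries a default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of backwards-compatible schema evolution.
public class SchemaEvolution {
    // v1 writer schema has {memberId}; the v2 reader schema adds {headline}
    // with a default, so old records remain decodable.
    static Map<String, Object> readWithV2(Map<String, Object> v1Record) {
        Map<String, Object> out = new HashMap<>(v1Record);
        out.putIfAbsent("headline", ""); // new field: default fills the gap
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> old = Map.of("memberId", 42);
        System.out.println(readWithV2(old).containsKey("headline")); // prints "true"
    }
}
```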
Scaling the Consumers (Partitioning)
• Server-side filtering
  – Range, mod, hash
  – Allows the client to control the partitioning function
• Consumer groups
  – Distribute partitions evenly across a group
  – Move partitions to available consumers on failure
  – Minimize re-processing
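A server-side mod filter is the simplest of the three filter kinds to sketch. In this hypothetical example (names are illustrative), the relay evaluates the consumer-supplied filter before sending events, so each member of a consumer group receives only its share of the key space.

```java
// Hypothetical sketch: server-side mod partitioning of the event stream.
public class ModPartitionFilter {
    final int numBuckets; // total partitions in the consumer group
    final int bucket;     // the partition this consumer owns

    ModPartitionFilter(int numBuckets, int bucket) {
        this.numBuckets = numBuckets;
        this.bucket = bucket;
    }

    // True if this event's key falls in the consumer's partition.
    boolean matches(long key) {
        return Math.floorMod(key, numBuckets) == bucket;
    }

    public static void main(String[] args) {
        ModPartitionFilter f = new ModPartitionFilter(4, 1);
        System.out.println(f.matches(5)); // prints "true": 5 mod 4 == 1
        System.out.println(f.matches(8)); // prints "false": 8 mod 4 == 0
    }
}
```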
A NEW CONSUMER
Development with Databus – Client Library
[Diagram: application consumers implement the Stream Event Callback API and the Bootstrap Event Callback API (callbacks such as onDataEvent(DbusEvent, Decoder)); the Databus Client Library exposes register(consumers, sources, filter), start(), and shutdown(), and invokes the consumers' callbacks.]
Databus Consumer Implementation

class MyConsumer extends AbstractDatabusStreamConsumer {
  public ConsumerCallbackResult onDataEvent(DbusEvent e, DbusEventDecoder d) {
    // use the map-like Avro GenericRecord
    GenericRecord g = d.getGenericRecord(e, null);
    // or use the auto-generated Java class
    MyEvent typed = d.getTypedValue(e, null, MyEvent.class);
    …
    return ConsumerCallbackResult.SUCCESS;
  }
}
Starting the client

public static void main(String[] args) {
  // configure
  DatabusHttpClientImpl.Config clientConfig = new DatabusHttpClientImpl.Config();
  clientConfig.loadFromFile("mydbus", "mdbus.props");
  DatabusHttpClientImpl client = new DatabusHttpClientImpl(clientConfig);
  // register callback
  MyConsumer callback = new MyConsumer();
  client.registerDatabusStreamListener(callback, null,
      "com.linkedin.events.member2.MemberProfile");
  // start client library
  client.startAndBlock();
}
Event Callback APIs
PERFORMANCE
Relay Throughput
Consumer Throughput
End-to-End Latency
Snapshot vs Catchup