Kineograph: Taking the Pulse of a Fast-Changing and Connected World


Speaker: LIN Qian

http://www.comp.nus.edu.sg/~linqian

Information

time-sensitive

rich connections

Challenges

1. Timeliness guarantees

2. Graph

3. Graph-mining

Kineograph

distributed in-memory graph storage

incremental graph mining

Graph nodes

Ingest nodes

Continuous data feeds

Global consistent snapshots

Incremental computation on a static graph snapshot

Progress table

Snapshooter

Graph Storage

Computation

 Master

Graph computation

Graph updates

Graph nodes

storage layer

computation layer

Storage layer

key/value store

logical partitions

Graph partitioning

edge-cut

no locality consideration
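A minimal sketch of locality-agnostic partitioning, assuming vertices are hashed to a fixed number of logical partitions and the partitions are spread over graph nodes round-robin. The partition count, the hash function, and all names here are illustrative assumptions, not details taken from the slides.

```python
import hashlib
from typing import Dict, List

NUM_LOGICAL_PARTITIONS = 64  # assumed value, chosen only for illustration


def logical_partition(vertex_id: str, num_partitions: int = NUM_LOGICAL_PARTITIONS) -> int:
    """Hash a vertex id to a logical partition, ignoring graph locality."""
    digest = hashlib.sha1(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign_partitions(num_partitions: int, graph_nodes: List[str]) -> Dict[int, str]:
    """Map logical partitions to physical graph nodes round-robin (assumed policy)."""
    return {p: graph_nodes[p % len(graph_nodes)] for p in range(num_partitions)}


def edge_is_cut(src: str, dst: str, num_partitions: int = NUM_LOGICAL_PARTITIONS) -> bool:
    """With an edge-cut, an edge is cut when its endpoints hash to different partitions."""
    return logical_partition(src, num_partitions) != logical_partition(dst, num_partitions)
```

Because placement depends only on the vertex id, any ingest node can route an update without consulting a directory; the price is that neighboring vertices often land in different partitions.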

Snapshot

ingest nodes

graph nodes

global progress table

Ingest node

graph-update operations

sequence number

Epoch commit protocol

[Figure: ingest nodes, the progress table (entries s1 … sn), the snapshooter with the global tx vector, and graph nodes holding partitions u and v. Epoch specified by progress table and snapshooter.]
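The epoch commit idea can be sketched as follows: ingest nodes tag updates with sequence numbers and report progress to the progress table, the snapshooter reads the table as a vector, and each graph node applies only the buffered updates covered by that vector. This is a simplified single-process sketch; the class names, the `take_vector` method, and the per-epoch apply order (sorted by ingest node, then sequence number) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Update = Tuple[str, int, str]  # (ingest node id, sequence number, operation)


class ProgressTable:
    """Latest sequence number each ingest node has reported."""

    def __init__(self, ingest_ids: List[str]) -> None:
        self.latest: Dict[str, int] = {i: 0 for i in ingest_ids}

    def report(self, ingest_id: str, seq: int) -> None:
        self.latest[ingest_id] = max(self.latest[ingest_id], seq)

    def take_vector(self) -> Dict[str, int]:
        # The snapshooter's read: one sequence number per ingest node.
        return dict(self.latest)


class GraphNode:
    """Buffers incoming updates and applies them one epoch at a time."""

    def __init__(self) -> None:
        self.pending: List[Update] = []
        self.applied: List[str] = []

    def receive(self, ingest_id: str, seq: int, op: str) -> None:
        self.pending.append((ingest_id, seq, op))

    def commit_epoch(self, vector: Dict[str, int]) -> int:
        # Apply updates covered by the vector in a fixed order (assumed policy),
        # so every graph node reaches the same snapshot without global locking.
        ready = sorted(u for u in self.pending if u[1] <= vector[u[0]])
        self.pending = [u for u in self.pending if u[1] > vector[u[0]]]
        self.applied.extend(op for _, _, op in ready)
        return len(ready)
```

Updates with sequence numbers beyond the vector stay buffered and become part of a later epoch.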

Graph update / compute Pipeline

[Figure: timeline of incoming tweets, snapshot construction (Si-1, Si, Si+1), and graph computation (Ci), with time marks ti-1, ti, ti', ti'' and the epoch interval.]

Timeliness

Consistency

no global serialization (different from 2PL or timestamp ordering)

Atomicity


Deterministic vertex creation
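One way to picture deterministic vertex creation: if a vertex id is derived purely from the external key (for example a user name or hashtag), then any ingest node that mentions the same entity produces the same id, and creation becomes an idempotent get-or-create. The sketch below assumes a hash-derived id; the `vertex_id` function and `Partition` class are hypothetical names, not the system's API.

```python
import hashlib
from typing import Dict


def vertex_id(kind: str, external_key: str) -> int:
    """Derive a vertex id from the entity itself, so no coordination is needed."""
    digest = hashlib.sha256(f"{kind}:{external_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Partition:
    """Holds vertices keyed by their deterministic id."""

    def __init__(self) -> None:
        self.vertices: Dict[int, dict] = {}

    def get_or_create(self, kind: str, external_key: str) -> dict:
        # Repeated or concurrent "create" requests for one entity collapse
        # into a single vertex because they all compute the same id.
        vid = vertex_id(kind, external_key)
        return self.vertices.setdefault(
            vid, {"kind": kind, "key": external_key, "out_edges": set()}
        )
```

With this scheme an update such as "add edge u -> v" never fails because one endpoint was created by a different ingest node under a different id.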

Computation layer

incremental graph-mining

vertex-based computation model

Incremental Graph Computation

[Flowchart: Init, Detect Vertex Status, Compute New Vertex Values, Change Significantly? (Y / N), Propagate Updates, Updates from other vertices, Graph-Scale Aggregation]
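The loop in this flowchart can be illustrated with a small incremental shortest-path update: start from the distances of the previous snapshot, mark the endpoints of new edges as changed, recompute, and propagate only when the value changes by more than a threshold. This is a sketch under assumptions (unit edge weights, edge additions only, a single process); the function name and `threshold` parameter are illustrative, and the graph-scale aggregation step is left out.

```python
import math
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def incremental_sssp(
    out_edges: Dict[str, Set[str]],
    dist: Dict[str, float],
    new_edges: Iterable[Tuple[str, str]],
    threshold: float = 0.0,
) -> int:
    """Update `dist` in place after adding `new_edges`; return vertices recomputed."""
    work = deque()
    # Detect vertex status: the target of each new edge may have changed.
    for u, v in new_edges:
        out_edges.setdefault(u, set()).add(v)
        out_edges.setdefault(v, set())
        work.append((v, dist.get(u, math.inf) + 1))

    recomputed = 0
    while work:
        v, candidate = work.popleft()
        old = dist.get(v, math.inf)
        # Change significantly? Only then store the value and propagate.
        if old - candidate > threshold:
            dist[v] = candidate
            recomputed += 1
            for w in out_edges.get(v, ()):
                work.append((w, candidate + 1))
    return recomputed
```

Only vertices whose distance actually improves are touched, which is the saving an incremental run has over recomputing the whole snapshot.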

Push model

sender-side aggregation

Pull model

read a subset of neighbors
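The two communication styles can be contrasted in a few lines. In the push sketch, messages bound for the same destination vertex are combined before leaving the sender; in the pull sketch, a vertex reads only the neighbors it cares about. Both functions and their parameters (`combine`, `wanted`) are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def aggregate_at_sender(
    messages: Iterable[Tuple[str, float]],
    combine: Callable[[float, float], float] = lambda a, b: a + b,
) -> Dict[str, float]:
    """Push model: merge messages per destination so one message is sent each."""
    combined: Dict[str, float] = {}
    for dst, value in messages:
        combined[dst] = combine(combined[dst], value) if dst in combined else value
    return combined


def pull_value(
    vertex: str,
    in_neighbors: Dict[str, List[str]],
    values: Dict[str, float],
    wanted: Callable[[str], bool],
) -> float:
    """Pull model: sum the values of only the selected in-neighbors."""
    return sum(values[n] for n in in_neighbors[vertex] if wanted(n))
```

Sender-side aggregation reduces network traffic when many local vertices update the same remote vertex, while pulling avoids receiving updates the vertex would ignore.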

Execution model

BSP + Dynamic scheduling

3 apps

TunkRank

SP

K-exposure

TunkRank
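TunkRank estimates a user's influence from followers: each follower Y contributes (1 + p · Influence(Y)) / |Following(Y)| to every user Y follows, where p is a retweet probability. The sketch below is a plain batch fixed-point iteration of that formula, not the incremental version; the default `p=0.05` and the iteration count are assumed values.

```python
from typing import Dict, Set


def tunkrank(following: Dict[str, Set[str]], p: float = 0.05, iterations: int = 50) -> Dict[str, float]:
    """following[y] is the set of users that y follows."""
    users = set(following)
    for targets in following.values():
        users |= targets

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        nxt = {u: 0.0 for u in users}
        for y, targets in following.items():
            if not targets:
                continue
            # Y's attention is split evenly across everyone Y follows.
            share = (1 + p * influence[y]) / len(targets)
            for x in targets:
                nxt[x] += share
        influence = nxt
    return influence
```

For example, with `following = {"b": {"a"}, "c": {"a", "b"}}`, user c has no followers and scores 0, b scores 0.5, and a scores 1.5 + 0.5p.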

SP

K-exposure

Fault tolerance among servers

Paxos-based solution

Ingest node failure

incarnation number
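A rough sketch of how an incarnation number can guard against a failed ingest node, assuming each update carries an (incarnation, sequence number) pair and that recovery seals the old incarnation at the last sequence number known to be recorded. The `GraphNodeBuffer` class and its `seal` method are hypothetical, and the exact recovery steps are an assumption rather than something the slides spell out.

```python
from typing import Dict, List, Tuple

Tagged = Tuple[str, int, int, str]  # (ingest id, incarnation, sequence number, op)


class GraphNodeBuffer:
    """Buffers updates and discards those from a sealed incarnation."""

    def __init__(self) -> None:
        self.sealed: Dict[Tuple[str, int], int] = {}  # (ingest id, incarnation) -> last valid seq
        self.pending: List[Tagged] = []

    def receive(self, ingest_id: str, incarnation: int, seq: int, op: str) -> bool:
        limit = self.sealed.get((ingest_id, incarnation))
        if limit is not None and seq > limit:
            return False  # late message from a dead incarnation
        self.pending.append((ingest_id, incarnation, seq, op))
        return True

    def seal(self, ingest_id: str, incarnation: int, last_seq: int) -> None:
        # Drop buffered updates the failed incarnation never finished reporting.
        self.sealed[(ingest_id, incarnation)] = last_seq
        self.pending = [
            u for u in self.pending
            if not (u[0] == ingest_id and u[1] == incarnation and u[2] > last_seq)
        ]
```

The restarted ingest node then continues under the next incarnation number, so its new updates cannot be confused with leftovers from before the failure.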

Fault tolerance @ storage layer

quorum-based replication
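Quorum-based replication can be sketched as: send each update to every replica of a partition and treat it as durable once enough replicas acknowledge. The sketch assumes a majority quorum and an in-memory `Replica` stand-in; both are illustrative choices, not details given in the slides.

```python
from typing import List, Optional


class Replica:
    """In-memory stand-in for one replica of a logical partition."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.store: dict = {}

    def put(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError("replica down")
        self.store[key] = value


def quorum_write(replicas: List[Replica], key: str, value: str, quorum: Optional[int] = None) -> bool:
    """Return True when at least `quorum` replicas (default: a majority) accept the write."""
    needed = quorum if quorum is not None else len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica.put(key, value)
            acks += 1
        except ConnectionError:
            continue  # tolerate a failed replica as long as the quorum is met
    return acks >= needed
```

With three replicas and a majority quorum, one replica can be down without blocking graph updates.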

Fault tolerance @ computation layer

roll back & re-execute

primary/backup replication

Incremental expansion

Decaying

C#

17,000 LOC

Twitter feeds

8M vertices, 29M edges

100M tweets at 100K tweets/sec

power-law

Graph-update throughput

Incremental vs. Non-incremental

Scalability

Incoming data rate

Failure recovery