160
A Brief History Of Time In Riak

A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

  • Upload
    others

  • View
    25

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A Brief History Of TimeIn Riak

Page 2: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Time in Riak

• Logical Time

• Logical Clocks

• Implementation details

Page 3: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Mind the GapHow a venerable, established, simple data structure/algorithm

was botched multiple times.

Page 4: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Order of Events

• Dynamo And Riak

• Temporal and Logical Time

• Logical Clocks of Riak Past

• Now

Page 5: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

TL;DR

• The Gap between theory and practice is:

• Real

• Deep and steep sided

• Scattered Invariants are hard to enforce

Page 6: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Why Riak?

Page 7: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Scale Up

$$$Big Iron (still fails)

Page 8: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Scale Out

Commodity Servers CDNs, App servers

Expertise

Page 9: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Fault Tolerance

Page 10: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Low Latency

Page 11: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Low Latency

Amazon found every 100ms of latency cost them 1% in sales.

Page 12: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Low Latency

Google found an extra 0.5 seconds in search page generation time dropped traffic by 20%.

Page 13: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Trade Off

Page 14: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

CAP

Page 15: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

C A

http://aphyr.com/posts/288-the-network-is-reliable

Page 16: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

C A

Page 17: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

C A

Page 18: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

C AP P

Page 19: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Availability

When serving reads and writes matters more than consistency of data. Defered consistency.

Page 20: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Eventual Consistency

Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. 

--Wikipedia

Page 21: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Overview

Page 22: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Overview Erlang implementation of Dynamo

Page 23: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

{“key”: “value”}

Page 24: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Overview Consistent Hashing

Page 25: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

• 160-bit integer keyspace

• divided into fixed number of evenly-sized partitions/ranges

• partitions are claimed by nodes in the cluster

• replicas go to the N partitions following the key

node 0

node 1

node 2

node 3

hash(“users/clay-davis”)

N=3

The Ring

Page 26: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

VNODES

supervisor process

basic unit of concurrency

32, 64, 128, 256, 512....

# vnodes = ring_size

10-50 vnodes / node

Page 27: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

preflist

{SHA1(key)

Page 28: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

AvailabilityAny non-failing node can respond to any

request

--Gilbert & Lynch

Page 29: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

7. Fault Tolerance

node 0

node 1

node 2

node 3

Replicas are stored N - 1 contiguous partitions

node 2offline

put(“cities/london”)

Page 30: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

7. Fault Tolerance

node 0

node 1

node 2

node 3

Replicas are stored N - 1 contiguous partitions

node 2offline

put(“cities/london”)

FALLBACK “SECONDARY”

node 2HINTED HANDOFF

Page 31: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Overview Quorum

Page 32: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Quora: For Consistency• How many Replicas must respond: 1, quorum, all?

• Strict Quorum: Only Primaries are contacted

• Sloppy Quorum: Fallbacks are contacted

• Fast Writes? W=1

• Fast Reads? R=1

• Read Your Own Writes? PW+PR>N

Page 33: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client X Client Y

PUT “sue”

C’

PUT “bob”

A’ B’

Sloppy

Page 34: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client X Client Y

PUT “sue”

C’

PUT “bob”

NO!!!! :(

Strict

Page 35: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

–Google Book Search p.148 “The Giant Anthology of Science Fiction”, edited by Leo Margulies and Oscar Jerome Friend, 1954

"'Time,' he said, 'is what keeps everything from happening at once.'"

Page 36: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal Clocksposix time number line

Thursday, 1 January 1970

0 129880800 1394382600

Now-ishMy Birthday

Page 37: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

temporal clocks• CAN

• A could NOT have caused B

• A could have caused B

• CAN’T

• A caused B

Page 38: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Physics Problem

4,148 km 14 ms Light 21 ms fibre

SF NY

PUT “bob” 1394382600000

PUT “sue” 1394382600020

Page 39: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

DynamoThe Shopping Cart

Page 40: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A B

HAIRDRYER

Page 41: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A B

HAIRDRYER

Page 42: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A B

PENCIL CASE

HAIRDRYER

Page 43: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A B

PENCIL CASEHAIRDRYER

Page 44: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

A B

[HAIRDRYER], [PENCIL CASE]

Page 45: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Clocks, Time, And the Ordering of Events

• Logical Time

• Causality

• A influenced B

• A and B happened at the same time

Leslie Lamport http://dl.acm.org/citation.cfm?id=359563

Page 46: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Detection of Mutual Inconsistency in

Distributed Systems

Version Vectors - updates to a data item

http://zoo.cs.yale.edu/classes/cs426/2013/bib/parker83detection.pdf

Page 47: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors or Vector Clocks?

http://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/

version vectors - updates to a data item

Page 48: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

Page 49: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

{a, 1}

{b, 1}

{c, 1}{a, 2}

[ ]{a, 2}, {b, 1}, {c, 1}

Page 50: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

{a, 1}

{b, 1}

{c, 1}{a, 2}

[ ]{a, 2}, {b, 1}, {c, 1}

Page 51: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

{a, 1} {b, 1} {c, 1}

{a, 2}

[ ]{a, 2}, {b, 1}, {c, 1}

Page 52: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

{a, 1} {b, 1} {c, 1}

{a, 2}

[ ]{a, 2}, {b, 1}, {c, 1}

Page 53: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

A CB

{a, 1} {b, 1} {c, 1}

{a, 2}

[ ]{a, 2}, {b, 1}, {c, 1}

Page 54: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors

[{a,2}, {b,1}, {c,1}]

Page 55: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Update

[{a,2}, {b,1}, {c,1}]

Page 56: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Update

[{a,2}, {b,2}, {c,1}]

Page 57: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Update

[{a,2}, {b,3}, {c,1}]

Page 58: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Update

[{a,2}, {b,3}, {c,2}]

Page 59: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Descends

• A descends B : A >= B

• A has seen all that B has

• A summarises at least the same history as B

Page 60: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Descends

[{a,2}, {b,3}, {c,2}] [{a,2}, {b,3}, {c,2}]

[{a,2}, {b,3}, {c,2}] []

>=[{a,4}, {b,3}, {c,2}] [{a,2}, {b,3}, {c,2}]

Page 61: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Dominates

• A dominates B : A > B

• A has seen all that B has, and at least one more event

• A summarises a greater history than B

Page 62: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Dominates

[{a,1}] []

>[{a,4}, {b,3}, {c,2}] [{a,2}, {b,3}, {c,2}]

[{a,5}, {b,3}, {c,5}, {d, 1}] [{a,2}, {b,3}, {c,2}]

Page 63: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Concurrent

• A concurrent with B : A | B

• A does not descend B AND B does not descend A

• A and B summarise disjoint events

• A contains events unseen by B AND B contains events unseen by A

Page 64: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Concurrent

[{a,1}] [{b,1}]

|[{a,4}, {b,3}, {c,2}] [{a,2}, {b,4}, {c,2}]

[{a,5}, {b,3}, {c,5}, {d, 1}] [{a,2}, {b,4}, {c,2}, {e,1}]

Page 65: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,
Page 66: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Merge

• A merge with B : A ⊔ B

• A ⊔ B = C

• C >= A and C >= B

• If A | B C > A and C > B

• C summarises all events in A and B

• Pairwise max of counters in A and B

Page 67: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Merge

[{a,1}] [{b,1}]

⊔[{a,4}, {b,3}, {c,2}] [{a,2}, {b,4}, {c,2}]

[{a,5}, {b,3}, {c,5}, {d, 1}] [{a,2}, {b,4}, {c,2}, {e,1}]

Page 68: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Version Vectors Merge

[{a,1}{b,2}]

[{a,4}, {b,4}, {c,2}]

[{a,5}, {b,3}, {c,5}, {d, 1},{e,1}]

Page 69: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Syntactic Merging

• Discarding “seen” information

• Retaining concurrent values

• Merging divergent clocks

Page 70: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

[{a,4}, {b,3}, {c,2}]

[{a,2}, {b,3}, {c,2}]

A

B

“Bob”

“Sue”

?

Page 71: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

[{a,4}, {b,3}, {c,2}]

[{a,2}, {b,3}, {c,2}]

A

B

“Bob”

“Sue”

Bob

Page 72: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

1429533664000

A

B

“Bob”

“Sue”

?1429533662000

Page 73: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

1429533664000

A

B

“Bob”

“Sue”

Bob?1429533662000

Page 74: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

[{a,4}, {b,3}, {c,2}]

A

B

“Bob”

“Sue”

?[{a,2}, {b,4}, {c,2}]

Page 75: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

[{a,4}, {b,3}, {c,2}]

A

B

“Bob”

“Sue”

[Bob, Sue][{a,2}, {b,4}, {c,2}]

Page 76: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

1429533664000

A

B

“Bob”

“Sue”

?1429533664001

Page 77: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Temporal vs Logical

1429533664000

A

B

“Bob”

“Sue”

Sue?1429533664001

Page 78: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Summary

• Eventually Consistent Systems allow concurrent updates

• Temporal timestamps can’t capture concurrency

• Logical clocks (Version vectors) can

Page 79: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

History Repeating“Those who cannot remember the past are condemned

to repeat it"

Page 80: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Terms

• Local value

• Incoming value

• Local clock

• Incoming clock

Page 81: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Version Vectors

• Who’s the actor?

Page 82: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak 0.n Client Side IDs

• Client Code Provides ID

• Riak increments Clock at API boundary

• Riak syntactic merge and stores object

• Read, Resolve, Rinse, Repeat.

Page 83: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak 0.n Client Side IDs

• Client Code Provides ID

• Riak increments Clock at API boundary

• Riak syntactic merge and stores object

• Read, Resolve, Rinse, Repeat.

Page 84: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• If incoming clock descends local

• Write incoming as sole value

Page 85: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• If local clock descends incoming clock

• discard incoming value

Page 86: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• If local and incoming clocks are concurrent

• merge clocks

• store incoming value as a sibling

Page 87: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• Client reads merged clock + sibling values

• sends new value + clock

• new clock dominates old

• Store single value

Page 88: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• What Level of Consistency Do We Require?

Page 89: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client Side IDs Bad

• Unique actor ID:: database invariant enforced by client!

• Actor Explosion (Charron-Bost)

• No. Entries == No. Actors

• Client Burden

• RYOW required

Page 90: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

RYOW

• Invariant: strictly increasing events per actor.

• PW+PR > N

• Availability cost

• Bug made it impossible!

Page 91: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

When P is F

• Get preflist

• count primaries

• Send request to N

• Don’t check responder status!

Page 92: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P1

P2

F P

F

F3

Client

Preflist=[P1, P2, F3]

PUT pw=2

Page 93: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P1

P2

F P

F

F3

Client

Preflist=[P1, P2, F3]

PUT pw=2

Page 94: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P1

P2

F P

F

F3

Client

Preflist=[P1, P2, F3]

PUT pw=2

Page 95: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P1

P2

F P

F

F3

Client

Preflist=[P1, P2, F3]

OK pw=2

OK

OK

Page 96: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P

P2

F1

P3

F F

Client

Preflist=[P2, P3, F1]

GET pr=2

Page 97: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

P

P2

F1

P3

F F

Client

Preflist=[P2, P3, F1]

not found pr=2

notfound

notfound

Page 98: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• If local clock ([{c, 1}]) descends incoming clock ([{c,1}])

• discard incoming value

Page 99: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• Read not_found []

• store “bob” [{c, 1}]

• read “bob” [{c, 1}]

• store [“bob”, “sue”] [{c, 2}]

Page 100: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client VClock

• Read not_found []

• store “bob” [{c, 1}]

• read not_found []

• store “sue” [{c, 1}]

Page 101: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Client Side ID RYOW

• Read a Stale clock

• Re-issue the same OR lower event again

• No total order for a single actor

• Each event is not unique

• System discards as “seen” data that is new

Page 102: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks Riak 1.n

• No more VV, just say Context

• The Vnode is the Actor

• Vnodes act serially

• Store the clock with the Key

• Coordinating Vnode, increments clock

• Deliberate false concurrency

Page 103: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

C1 C2

RIAK

GET Foo GET Foo

Page 104: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

C1 C2

RIAK

[{a,1},{b4}]->”bob”

[{a,1},{b4}]->”bob”

Page 105: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

C1 C2

RIAK

PUT [{a,1},{b,4}]=“Rita”

PUT [{a,1},{b,4}]=“Sue”

Page 106: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

C1 C2

PUTFSM1 PUTFSM2

VNODE Q

RITASUE

VNODE

Page 107: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

VNODE Q

RITA

VNODE a [{a,2},{b,4}]=“SUE”

[{a,1},{b,4}]

Page 108: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks False Concurrency

VNODE Q

[{a,3},{b,4}]=[RITA,SUE]

VNODE a

[{a,2},{b,4}]=“SUE”

Page 109: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock

• If incoming clock descends local

• Increment clock

• Write incoming as sole value

• Replicate

Page 110: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock• If incoming clock does not descend local

• Merge clocks

• Increment Clock

• Add incoming value as sibling

• Replicate

Page 111: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock GOOD

• Far fewer actors

• Way simpler

• Empty context PUTs are siblings

Page 112: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock BAD

• Possible latency cost of forward

• No more idempotent PUTs

• Store a SET of siblings, not LIST

• Sibling Explosion

• As a result of too much false concurrency

Page 113: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

• False concurrency cost

• Many many siblings

• Large object

• Death

Page 114: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

• Data structure

• Clock + Set of Values

• False Concurrency

Page 115: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

Page 116: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C1 C2

RIAK

GET Foo GET Foo

Page 117: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C1 C2

RIAK

not_found

not_found

Page 118: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C1

RIAK

PUT []=“Rita”[{a,1}]->”Rita”

Page 119: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C2

RIAK

PUT []=“Sue”[{a,2}]->[”Rita”, “Sue”]

Page 120: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C1

RIAK

PUT [{a, 1}]=“Bob”

[{a,3}]->[”Rita”, “Sue”, “Bob”]

Page 121: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Sibling Explosion

C2

RIAK

PUT [{a,2}]=“Babs”[{a,4}]->[”Rita”, “Sue”, “Bob”, “Babs”]

Page 122: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock

• Trick to “dodge’ the Charron-Bost result

• Engineering, not academic

• Tested (quickchecked in fact!)

• Action at a distance

Page 123: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Dotted Version VectorsDotted Version Vectors: Logical Clocks for Optimistic Replication

http://arxiv.org/abs/1011.5808

Page 124: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks + Dots Riak 2.n

• What even is a dot?

• That “event” we saw back a the start

A

{a, 1}

{a, 2}

Page 125: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Oh Dot all the Clocks

• Data structure

• Clock + List of Dotted Values

[{{a, 1}, “bob”}, {{a, 2}, “Sue”}]

Page 126: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock• If incoming clock descends local

• Increment clock

• Get Last Event as dot (eg {a, 3})

• Write incoming as sole value + Dot

• Replicate

Page 127: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClock• If incoming clock does not descend local

• Merge clocks

• Increment Clock

• Get Last Event as dot (eg {a, 3})

• Prune siblings!

• Add incoming value as sibling

• Replicate

Page 128: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Oh drop all the dots

• Prune Siblings

• Remove any siblings who’s dot is seen by the incoming clock

• if Clock >= [Dot] drop Dotted value

Page 129: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks[{a, 4}]

Rita

Sue

Babs

Bob

[{a, 3}]

Pete

Page 130: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks + Dots

[{a, 4}]

Rita

Sue

Babs

Bob

[{a, 3}]

Pete{a,1}

{a,2}

{a,3}

{a,4}

Page 131: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks + Dots

[{a, 4}]

Babs

[{a, 3}]

Pete

{a,4}

Page 132: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Vnode VClocks + Dots

[{a, 5}]

Babs

Pete

{a,4}

{a,5}

Page 133: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Dotted Version Vectors

• Action at a distance

• Correctly capture concurrency

• No sibling explosion

• No Actor explosion

Page 134: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679

Page 135: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Riak Overview Read Repair. Deletes.

Page 136: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client X

PUT “bob”

Page 137: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Read Repair

Replica A Replica B Replica C

ClientGET

“Bob”

“Bob”

not_found

Page 138: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Read Repair

Replica A Replica B Replica C

Client

“Bob”

“Bob”!!!

Page 139: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client X

DEL ‘k’ [{a, 4}, {b, 3}]

C’

Page 140: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

C’

Del FSM

GET

Page 141: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

C’

Del FSM

GET A=Tombstone, B=Tombstone, C=not_found

Page 142: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Read Repair

Replica A Replica B Replica C

“Tombstone”!!!

Page 143: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

C’

Client

GET A=Tombstone, B=Tombstone, C=Tombstone

FSMnot_found

Page 144: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

C’

REAP

FSM

Page 145: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client X

PUT “sue” []

Sue [{a, 1}]

C’

Page 146: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

C’

Hinted Hand off tombstone

Page 147: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client

GET A=Sue[{a,1}], B=Sue[{a,1}], C=Tombstone [{a,4}, {b1}]

FSMnot_found

Ooops!

Page 148: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679 Lingering Tombstone

• Write Tombstone

• One goes to fallback

• Read and reap primaries

• Add Key again

• Tombstone is handed off

• Tombstone clock dominates, data lost

Page 149: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679 Other flavours

• Back up restore

• Read error

Page 150: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679 RYOW?

• Familiar

• History repeating

Page 151: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679 Per Key Actor Epochs

• Every time a Vnode reads a local “not_found”

• Increment a vnode durable counter

• Make a new actor ID

• <<VnodeId, Epoch_Counter>>

Page 152: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

KV679 Per Key Actor Epochs

• Actor ID for the vnode remains long lived

• No actor explosion

• Each key gets a new actor per “epoch”

• Vnode increments highest “Epoch” for it’s Id

• <<VnodeId, Epoch>>

Page 153: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Replica A Replica B Replica C

Client

GET A=Sue[{a:2,1}], B=Sue[{a:2,1}], C=Tombstone [{a:1,4}, {b1}]

FSM

[Sue, tombstone]

Page 154: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Per Key Actor Epochs BAD

• More Actors (every time you delete and recreate a key _it_ gets a new actor)

• More computation (find highest epoch for actor in Version Vector)

Page 155: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Per Key Actor Epochs GOOD

• No silent dataloss

• No actor explosion

• Fully backwards/forward compatible

Page 156: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Are we there yet?

?

Page 157: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Summary

• Client side Version Vectors

• Invariants, availability, Charron-Bost

• Vnode Version Vectors

• Sibling Explosion

Page 158: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Summary

• Dotted Version Vectors

• “beat” Charron-Bost

• Per-Key-Actor-Epochs

• Vnodes can “forget” safely

Page 159: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Summary

• Temporal Clocks can’t track causality

• Logical Clocks can

Page 160: A Brief History Of Time - ChalmersA Brief History Of Time In Riak Time in Riak • Logical Time • Logical Clocks • Implementation details Mind the Gap How a venerable, established,

Summary

• Version Vectors are EASY!

• Version Vectors are HARD!

• Mind the Gap!