20
Exploiting Commutativity For Practical Fast Replication Seo Jin Park and John Ousterhout

Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Exploiting Commutativity ForPractical Fast Replication

Seo Jin Park and John Ousterhout

Page 2: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Problem: consistent replication adds latency and throughput overheads§ Why? Replication happens after ordering

● Key idea: exploit commutativity to enable fast replication before ordering

● CURP (Consistent Unordered Replication Protocol)§ Clients replicate in 1 round-trip time (RTT) if operations are commutative§ Simple augmentation on existing primary-backup systems

● Results§ RAMCloud’s performance improvements

● Latency: 14 µs → 7.1 µs (no replication: 6.1 µs)● Throughput: 184 kops/sec → 728 kops/s (~4x)

§ Redis cache is now fault-tolerant with small cost (12% latency ↑, 18% throughput ↓)

Slide 2

Overview

Page 3: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Unreplicated Systems: 1 RTT for operation

● Replicated Systems: 2 RTTs for operations

Slide 3

Consistent Replication Doubles Latencies

client

Server

① write x = 1

② ok X: 1

client

Primary Backup

① write x = 1

④ ok X: 1③ ok② X: 1

X: 1 X: 1X: 1

Page 4: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Slide 4

Strawman 1 RTT Replication

Client

Client

Primary

… x←1 x←2x←1

x←2

… x←2 x←1

Backups

…Strong consistency is broken!

State: x = 1

State: x = 2

Page 5: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Consistent replication protocols must solve two problems:§ Consistent Ordering: all replicas should appear to execute operations in the same order§ Durability: once completed, an operation must survive crashes.

● Previous protocols combined the two requirements

Slide 5

What Makes Consistent Replication Expensive?

Client

Client

Client Primary

x←3 x←1 x←2

x←1

x←3

x←2

Backups

x←3 x←1 x←2

x←3 x←1 x←2

x←3 x←1 x←2

1 RTTfor serialization

Time to completean operation 1 RTT

for replication

Page 6: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● For performance: cannot do totally ordered replication in 2 RTTs

● Replicate just for durability & exploit commutativity to defer ordering§ Safe to reorder if operations are commutative (e.g. updates on different keys)

● Consistent Unordered Replication Protocol (CURP): § When concurrent operations commute, replicate without ordering§ When not, fall back to slow totally-ordered replication

Slide 6

Exploiting Commutativity to Defer Ordering

Page 7: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Primary returns execution results immediately (before syncing to backups)

● Clients directly replicate to ensure durability

Slide 7

Overview of CURP

Client

Client

Primary

… x←1 x←2 y←5 z←7y←5

z←7

… x←1 x←2

Backups

z←7

y←5

Witnesses

async

1 RTTTime to complete

an operation

Witness● No ordering info● Temporary until primary replicates to backups● Witness data are replayed during recovery

garbagecollection

Page 8: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Slide 8

Normal Operation

ClientPrimary

execute

“z←7”async

record“z←7”accepted

result:ok

z←7

y←5

Witnesses

… x←1 x←2

Backups

… x←1 x←2 y←5 z←7

● Clients send an RPC request to primary and witnesses in parallel● If all witnesses accepted (saved) request, client can complete operation

safely without sync to backups.

Page 9: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Slide 9

Normal Operation (continued.)

ClientPrimary

… x←1 x←2 y←5 z←7① execute “z←2”

… x←1 x←2

① record“z←2”rejected

② sync“z←2” y←5 z←7

z←7

y←5

Witnesses

Backupssynced

● If any witness rejected (not saved) request, client must wait for sync to backups.§ Operation completes in 2 RTTs mostly (worst case 3 RTTs)

● When a primary receives a sync request, usually syncing to backups is already completed or at least initiated

Page 10: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Slide 10

Crash Recovery

Primary

… x←1 x←2 y←5 z←7 … x←1 x←2

Backups

z←7

y←5

Witnesses

async

New Primaryy←5 z←7… x←1 x←2

② retry

y←5 z←7

y←5 z←7

async

State: x = 2, y = 5, z = 7

State: x = 2, y = 5, z = 7

● First load from a backup and then replay requests in a witness● Example:

Page 11: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

1. Replay from witness may be out of order§ Witnesses only keep commutative requests

2. Primaries may reveal not-yet-durable data to other clients§ Primaries detect & block reads of unsynced data

3. Requests replayed from witness may have been already recovered from backup§ Detect and avoid duplicate execution using RIFL

Slide 11

3 Potential Problems for Strong Consistency

Page 12: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Witness has no way to know operation order determined by primary● Witness detects non-commutative operations and rejects them

§ Then, client needs to issue explicit sync request to primary

● Okay to replay in any order

Slide 12

P1. Replay From Witness May Be Out Of Order

Witness

y←5x←3

Witness

Clientx←3

y←5acceptedx←4

Clientrejected

Page 13: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Primary doesn’t know if an operation is made durable in witnesses● Subsequent operations (e.g. reads) may externalize the new data

● Must wait for sync to backups if a client request depends on any unsynced updates

x = 3

Slide 13

P2. Primaries May Reveal Not-yet-durable Data

Client A

Client BPrimary

… x←1 x←2 x←3

x←3

read x… x←1 x←2

Backups

async

Recorded in witnesses or not?

State: x = 3

Page 14: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

State: x: 3, y: 7

● A client request may exist both in backups and witnesses.

● Replaying operations recovered by backups endangers consistency

Slide 14

P3. Replay Can Cause Duplicate Executions

Primary

… x←1 x←2 x←3 z←7 … x←1 x←2 x←3

Backups

z←7

x←2

Witnesses

async

New primary

… x←1 x←2 x←3 x←2 z←7② retry

Detect & ignore duplicate requests, e.g. RIFL [SOSP’15]①

State: x: 2, y: 7

Page 15: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

Performance Evaluation of CURP

Slide 15

Perf

orm

ance

Ø Implemented in Redis and RAMCloud

Consistency

RAMCloud

CURP

CU

RP

Fast KV cache

Replicated in-memory KV store

Consistently replicated &As fast as no replication

Page 16: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

0.5

0.001

0.01

0.1

1

0 10 20 30 40 50 60 70

Fra

ctio

n o

f W

rite

s

Latency (µs)

Original (3 B)CURP (3 B, 3 W)CURP (1 B, 1 W)

Unreplicated

● Writes are issued sequentially by a client to a master§ 40 B key, 100 B value

§ Keys are randomly (uniform dist.) selected from 2 M unique keys

Slide 16

RAMCloud’s Latency after CURP

Median7.1 μs vs. 14 µsConfiguration

● Xeon 4 cores (8 T) @ 3 GHz

● Mellanox Connect-X 2InfiniBand (24 Gbps)

● Kernel-bypassing transport

Page 17: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Thanks to CURP, can batch replication requests without impacting latency: improves throughput

● Each client issues writes (40B key, 100B value) sequentially

Slide 17

RAMCloud’s Throughput after CURP

4x

~6% per replica

0

100

200

300

400

500

600

700

800

900

0 5 10 15 20 25 30Write

Thro

ughput (k

write

per

seco

nd)

Client Count (number of clients)

UnreplicatedCURP (1 B, 1 W)CURP (3 B, 3 W)Original (3 B)

Page 18: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Design§ Garbage collection§ Reconfiguration handling (data migration, backup crash, witness crash)§ Read operation§ How to extend CURP to quorum-based consensus protocols

● Performance§ Measurement with skewed workloads (many non-commutative ops)§ Resource consumption by witness servers§ CURP’s impact on Redis’ performance

Slide 18

See Paper for…

Page 19: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Rely on commutativity for fast replication§ Generalized Paxos (2005): 1.5 RTT§ EPaxos (SOSP’13) : 2 RTTs in LAN, expensive read

● Rely on the network’s in-order deliveries of broadcasts§ Special networking hardware: NOPaxos (OSDI’16), Speculative Paxos (NSDI’15)§ Presume & rollback: PLATO (SRDS’06), Optimistic Active Replication (ICDCS’01)§ Combine with transaction layer for rollback: TAPIR (SOSP’15), Janus (OSDI’16)

● CURP is§ Faster than other protocols using commutativity§ Doesn’t require rollback or special networking hardware§ Easy to integrate with existing primary-backup systems

Slide 19

Related work

Page 20: Exploiting Commutativity For Practical Fast Replication...Exploiting Commutativity For Practical Fast Replication ... Crash Recovery Primary …x←1x←2y←5z←7 …x←1x←2 Backups

● Total order is not necessary for consistent replication● CURP clients replicate without ordering in parallel with sending requests

to execution servers è 1 RTT● Exploit commutativity for consistency● Improves both latency (2 RTTs -> 1 RTT) and throughput (4x)

§ RAMCloud’s latency: 14 µs → 7.1 µs (no replication: 6.1 µs)

§ Throughput: 184 kops/sec → 728 kops/s (~4x)

Slide 20

Conclusion