
Consensus in distributed computing

Page 1: Consensus in distributed computing

CONSENSUS IN DISTRIBUTED COMPUTING

LET’S TALK ABOUT…

Page 2: Consensus in distributed computing

RUBEN TAN LONG ZHENG

▸ CTO of Neuroware, Inc

▸ We Do Blockchain Stuff™

▸ Co-founder of Javascript Developers Malaysia

▸ Proud owner of 2 useless cats

▸ @roguejs

Page 3: Consensus in distributed computing

SUPER HIGH-LEVEL OVERVIEW

▸ Consensus in Distributed Computing

▸ Consensus

▸ Agreeing that something is the truth

▸ Distributed Computing

▸ A network of nodes operating together

Page 5: Consensus in distributed computing

FAILURE MODES

▸ Fail-stop = a node dies

▸ Fail-recover = a node dies and comes back later (Jesus/Zombie)

▸ Byzantine = a node misbehaves

Page 6: Consensus in distributed computing

BYZANTINE GENERALS PROBLEM

▸ One of the first impossibility proofs in computer communications

▸ Impossible to solve in a perfect manner

▸ Originated from the Two Generals’ Problem (1975)

▸ Explored in detail in the paper by Leslie Lamport, Robert Shostak and Marshall Pease: The Byzantine Generals Problem (1982)

Page 7: Consensus in distributed computing

[Diagram: generals A, B, C, D, E and F surround the ENEMY; one of them is the TRAITOR. Messages shown: "ATTACK!" (×3), "RETREAT!" (×3) and a combined "ATTACK! RETREAT!".]

Page 8: Consensus in distributed computing

[Diagram: generals A, B, C, D, E and F surround the ENEMY; one of them is the TRAITOR. Traitor: "MUAHAHA, NO CONSENSUS!" The enemy routs the fleeing army; the attackers have insufficient force and are destroyed.]

Page 9: Consensus in distributed computing

BYZANTINE FAULT TOLERANCE

▸ Byzantine Fault

▸ Any fault that presents different symptoms to different observers (some generals attack, some generals retreat)

▸ Byzantine Failure

▸ The loss of a system service reliant on consensus due to a Byzantine Fault

▸ Byzantine Fault Tolerance

▸ A system that is resilient/tolerant of a Byzantine Fault
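A minimal Python sketch of a Byzantine fault, in the spirit of the generals diagram. The setup is an assumption for illustration, not something the slides specify: four loyal generals broadcast their own vote honestly, a fifth general also sends one message to each of them, and every loyal general follows the majority of what it received. The names `loyal_votes`, `decide` and `run` are hypothetical.

```python
from collections import Counter

# Assumed votes of the four loyal generals; each one tells everybody the same thing.
loyal_votes = {"A": "attack", "B": "retreat", "C": "attack", "D": "retreat"}


def decide(orders):
    """Assumed rule: a loyal general follows the most common order it received."""
    return Counter(orders).most_common(1)[0][0]


def run(fifth_general_messages):
    """Return the decision of every loyal general.

    fifth_general_messages maps each loyal general to what the fifth general
    told that particular general.
    """
    decisions = {}
    for general in loyal_votes:
        received = list(loyal_votes.values()) + [fifth_general_messages[general]]
        decisions[general] = decide(received)
    return decisions


# An honest fifth general says the same thing to everyone.
honest_messages = {g: "attack" for g in loyal_votes}

# A traitor shows different symptoms to different observers.
split_messages = {"A": "attack", "B": "attack", "C": "retreat", "D": "retreat"}

print(run(honest_messages))  # every loyal general reaches the same decision
print(run(split_messages))   # the loyal generals split into two camps
```

With the honest sender all four loyal generals agree; with the split messages two of them attack and two retreat, which is exactly the loss of consensus the slide calls a Byzantine Failure.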

Page 10: Consensus in distributed computing

ON A SIDENOTE…

▸ Distributed computing is inherently unreliable

▸ Peter Deutsch, Bill Joy, Tom Lyon and James Gosling

▸ The Eight Fallacies of Distributed Computing (1994-1997)

▸ Today, we still have engineers who believe in some, if not all, of the fallacies

Page 11: Consensus in distributed computing

EIGHT FALLACIES OF DISTRIBUTED COMPUTING

▸ The network is reliable

▸ Latency is zero

▸ Bandwidth is infinite

▸ The network is secure

▸ Topology does not change

▸ There is only one administrator

▸ Transport cost is zero

▸ The network is homogeneous (same platform)

Page 12: Consensus in distributed computing

When you believe in any of the eight fallacies…

Page 13: Consensus in distributed computing

CONSENSUS

The Real Talk Begins™

Page 14: Consensus in distributed computing

CONSENSUS OVERVIEW

▸ Achieving Consensus = distributed system acting as one entity

▸ Consensus Problem = getting nodes in a distributed system to agree on something (value, operation, etc)

▸ Basically… consensus = THE HIVE MIND

▸ Common Examples

▸ Committing transactions to a database

▸ Synchronising clocks

Page 15: Consensus in distributed computing

FLP IMPOSSIBILITY PROOF

▸ Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson

▸ Impossibility of Distributed Consensus with One Faulty Process (1985) - Dijkstra (dike-stra) Prize (2001)

▸ In synchronous settings, it is possible to reach consensus at the cost of time

▸ In an asynchronous setting, no deterministic protocol can guarantee consensus if even 1 node may crash

Page 17: Consensus in distributed computing

SOLVING THE CONSENSUS PROBLEM

▸ Strong consensus follows these properties:

▸ Termination - all nodes eventually decide on a value

▸ Agreement - all nodes decide on the same value

▸ Validity - the value decided must be proposed by a node (AKA no default value to fall back on)

▸ Termination + Agreement + Validity = Consensus
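The three properties can be checked mechanically against the outcome of a run. A small Python sketch, assuming a run is summarised as two dictionaries (the function name `check_consensus` and the data layout are hypothetical, chosen only for illustration):

```python
def check_consensus(proposals, decisions):
    """Check the three properties for one finished run.

    proposals: dict mapping node name -> value that node proposed
    decisions: dict mapping node name -> value that node decided, or None
               if the node never decided
    """
    decided = {value for value in decisions.values() if value is not None}
    return {
        # Termination: every node has decided on some value.
        "termination": all(value is not None for value in decisions.values()),
        # Agreement: no two nodes decided on different values.
        "agreement": len(decided) <= 1,
        # Validity: whatever was decided was proposed by some node.
        "validity": decided <= set(proposals.values()),
    }


proposals = {"n1": 7, "n2": 5, "n3": 7}

print(check_consensus(proposals, {"n1": 7, "n2": 7, "n3": 7}))     # all three hold
print(check_consensus(proposals, {"n1": 7, "n2": None, "n3": 7}))  # termination fails
print(check_consensus(proposals, {"n1": 7, "n2": 5, "n3": 7}))     # agreement fails
print(check_consensus(proposals, {"n1": 0, "n2": 0, "n3": 0}))     # validity fails
```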

Page 18: Consensus in distributed computing

CONSENSUS PROTOCOLS

▸ 2 Phase Commit

▸ 3 Phase Commit

▸ Basic Paxos

▸ The Future…

Page 19: Consensus in distributed computing

2 PHASE COMMIT

▸ Simplest consensus protocol

▸ Phase 1 - Proposal

▸ A node (called coordinator) proposes a value to all other nodes, then gathers votes

▸ Phase 2 - Commit-or-abort

▸ The coordinator sends:

▸ Commit if all nodes voted yes. All nodes commit the new value

▸ Abort if 1 or more nodes voted no. All nodes abort the value
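The two phases can be sketched in a few lines of Python. This is an in-memory illustration with no network and no failures; the `Node` class, its `will_vote_yes` flag and the `two_phase_commit` function are hypothetical names, not part of any library.

```python
class Node:
    """A participant that votes on a proposal and later commits or aborts it."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes  # assumed: the vote is fixed up front
        self.pending = None                 # value proposed but not yet committed
        self.value = None                   # last committed value

    def on_propose(self, value):
        self.pending = value
        return self.will_vote_yes

    def on_commit(self):
        self.value, self.pending = self.pending, None

    def on_abort(self):
        self.pending = None


def two_phase_commit(nodes, value):
    """The coordinator's side of the protocol. Returns "commit" or "abort"."""
    # Phase 1 - Proposal: send the value to every node and gather the votes.
    votes = [node.on_propose(value) for node in nodes]

    # Phase 2 - Commit-or-abort: commit only if every single node voted yes.
    if all(votes):
        for node in nodes:
            node.on_commit()
        return "commit"
    for node in nodes:
        node.on_abort()
    return "abort"


happy = [Node("a"), Node("b"), Node("c")]
print(two_phase_commit(happy, 42), [n.value for n in happy])

unhappy = [Node("a"), Node("b", will_vote_yes=False), Node("c")]
print(two_phase_commit(unhappy, 42), [n.value for n in unhappy])
```

A single "no" vote is enough to abort, so no node ends up holding the value unless all of them do.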

Page 20: Consensus in distributed computing

[Diagram: a coordinator (COOR.) and four nodes. The coordinator proposes a value.]

Page 21: Consensus in distributed computing

[Diagram: a coordinator (COOR.) and four nodes. All nodes vote yes or no.]

Page 22: Consensus in distributed computing

[Diagram: a coordinator (COOR.) and four nodes. The coordinator sends commit if all nodes voted yes; it sends abort otherwise. All nodes now update themselves to contain the proposed value, or all nodes abort.]

Page 23: Consensus in distributed computing

2 PHASE COMMIT

▸ Agreement - every node accepts the value from the coordinator at phase 2 = YES

▸ Validity - commit/abort originated from the coordinator = YES

▸ Termination - no loops in the steps, doesn’t run forever = YES

▸ Therefore, 2 phase commit fulfils the requirements of a consensus protocol

Page 24: Consensus in distributed computing

2 PHASE COMMIT

▸ Blocking failure when coordinator fails before sending proposal to all nodes

[Diagram: a coordinator (COOR.) and three nodes. The coordinator proposes a value.]

Page 25: Consensus in distributed computing

2 PHASE COMMIT

▸ Blocking failure when coordinator fails before sending proposal to all nodes

[Diagram: a coordinator (COOR.) and three nodes. A node receives the proposed value, votes yes, and is now waiting for commit.]

Page 26: Consensus in distributed computing

2 PHASE COMMIT

▸ Blocking failure when coordinator fails before sending proposal to all nodes

[Diagram: the coordinator (COOR.), three nodes and a NEW COOR. The coordinator crashes… and a different coordinator comes in to propose a different value.]

Page 27: Consensus in distributed computing

2 PHASE COMMIT

▸ Blocking failure when coordinator fails before sending proposal to all nodes

[Diagram: the coordinator (COOR.), three nodes and the NEW COOR. The node cannot accept the new proposal because it is waiting on commit. It cannot abort because the first coordinator might recover.]

Page 28: Consensus in distributed computing

2 PHASE COMMIT

▸ Guarantees safety, but not liveness

▸ Safety = all nodes agree on a value proposed by a node

▸ Liveness = should still be able to function when some nodes crash
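The blocking case can be reproduced with a tiny participant state machine. This is a sketch under assumptions: the state names `INIT` and `PREPARED` and the `Participant` class are hypothetical, and the coordinator crash is modelled simply by never calling `on_commit`.

```python
class Participant:
    """A 2PC node that refuses new proposals while it waits for a commit."""

    def __init__(self):
        self.state = "INIT"
        self.pending = None   # value voted on, not yet committed
        self.value = None     # last committed value

    def on_propose(self, value):
        if self.state == "PREPARED":
            # Already voted yes for another value: it cannot vote again, and it
            # cannot abort either, because the first coordinator might recover.
            return False
        self.state, self.pending = "PREPARED", value
        return True

    def on_commit(self):
        self.value, self.pending, self.state = self.pending, None, "INIT"


node = Participant()

# The first coordinator proposes 7, the node votes yes...
first_vote = node.on_propose(7)

# ...then the coordinator crashes, so on_commit() is never called.
# A new coordinator shows up with a different value.
second_vote = node.on_propose(5)

print(first_vote, second_vote, node.state, node.value)
```

The node never holds a conflicting value (safety), but it is stuck in `PREPARED` and cannot make progress (no liveness).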

Page 29: Consensus in distributed computing

3 PHASE COMMIT

▸ Similar to 2 Phase Commit, with an extra phase (duh)

▸ Phase 1 - Proposal - same as 2PC

▸ Phase 2 - Pre-approve - similar to 2PC commit-or-abort, but nodes reply with ACK instead

▸ Phase 3 - Do Commit - now the nodes commit

▸ Tolerant of node crashes, but not network partitions

▸ Won’t cover in detail

Page 30: Consensus in distributed computing

PAXOS

▸ Presented by Leslie Lamport in The Part-Time Parliament (published 1998)

▸ Named after a fictional legislature on the Greek island of Paxos

▸ Remains as:

▸ The hardest to understand in theory

▸ The hardest to implement

▸ The closest we get to reaching ideal consensus

Page 31: Consensus in distributed computing

PAXOS

▸ Used in:

▸ Apache ZooKeeper (via Zab, a Paxos-like protocol)

▸ Google Chubby (BigTable)

▸ Google Spanner

▸ Apache Mesos

▸ Apache Cassandra

▸ etc

Page 32: Consensus in distributed computing

PAXOS

▸ Components:

▸ Proposers

▸ Proposes values to other nodes

▸ Acceptors

▸ Respond to proposers with votes

▸ Store the chosen value & decision state

▸ A server can act as both 1 Proposer & 1 Acceptor

Page 33: Consensus in distributed computing

PAXOS

▸ Uses a two-phase approach:

▸ Broadcast Prepare

▸ Find out if there’s already a chosen value

▸ Block older proposals that have yet to be completed

▸ Broadcast Accept

▸ Ask acceptors to accept a value

Page 34: Consensus in distributed computing

PAXOS

▸ Prepare(n)

▸ n = proposal number, built as [max++]~[server id]: the highest round number seen so far, incremented, then joined with the server id so that every n is unique

▸ Return(p, v)

▸ p = proposal number

▸ v = current accepted value (if any)

▸ Accept(p, v)

▸ p = proposal number

▸ v = value to be accepted
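One way to read the [max++]~[server id] scheme, sketched in Python. The representation is an assumption: a proposal number is modelled as a `(round, server_id)` tuple, which Python compares left to right, and the `ProposalNumbers` class is a hypothetical helper.

```python
class ProposalNumbers:
    """Generates proposal numbers that are unique per server and always increase."""

    def __init__(self, server_id):
        self.server_id = server_id
        self.max_round = 0  # highest round number seen in any message so far

    def observe(self, n):
        """Remember the round of a proposal number seen from another server."""
        self.max_round = max(self.max_round, n[0])

    def next(self):
        """max++ joined with the server id."""
        self.max_round += 1
        return (self.max_round, self.server_id)


server_1 = ProposalNumbers(server_id=1)
server_2 = ProposalNumbers(server_id=2)

n1 = server_1.next()   # first proposal from server 1
server_2.observe(n1)   # server 2 learns about it
n2 = server_2.next()   # so server 2 picks a higher round
n3 = server_1.next()   # same round as n2, but a different server id

print(n1, n2, n3)
```

Two servers can land on the same round, but the server id breaks the tie, so no two proposals ever share a number.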

Page 35: Consensus in distributed computing

PAXOS

▸ Proposal Phase

▸ Proposer generates a proposal number p

▸ Proposer broadcasts p and a value v

▸ Acceptor checks whether p is higher than its min-p, and sets min-p = p if so

▸ Acceptor replies with its acc-p and acc-v, if any

▸ Proposer waits for replies from a majority

▸ If any reply carries an acc-p, the proposer replaces v with the acc-v of the highest returned acc-p

Page 36: Consensus in distributed computing

PAXOS

▸ Accept Phase

▸ Proposer sends p and v to all acceptors

▸ Acceptors check whether p is lower than min-p, and ignore the request if so. Otherwise, acc-p = min-p = p and acc-v = v

▸ Acceptors reply accepted or rejected

▸ If a majority accepted, terminate with v. Otherwise, restart the Proposal Phase with a new p
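The two phases can be put together in a short single-value Paxos sketch in Python. It is an in-memory illustration under assumptions, not a production implementation: there is no network, no crash and no retry loop, proposal numbers are plain integers, the prepare step carries only the number (as in Prepare(n)), and the names `Acceptor` and `propose` are hypothetical. The fields `min_p`, `acc_p` and `acc_v` mirror min-p, acc-p and acc-v from the slides.

```python
class Acceptor:
    def __init__(self):
        self.min_p = 0      # highest proposal number seen so far
        self.acc_p = None   # number of the accepted proposal, if any
        self.acc_v = None   # value of the accepted proposal, if any

    def prepare(self, p):
        """Proposal phase: promise only if p is higher than min-p."""
        if p > self.min_p:
            self.min_p = p
            return (True, self.acc_p, self.acc_v)
        return (False, None, None)

    def accept(self, p, v):
        """Accept phase: ignore p if it is lower than min-p."""
        if p < self.min_p:
            return False
        self.min_p = self.acc_p = p
        self.acc_v = v
        return True


def propose(acceptors, p, v):
    """Run one round. Returns the value terminated with, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    # Proposal phase: broadcast p and wait for a majority.
    replies = [acceptor.prepare(p) for acceptor in acceptors]
    promises = [(acc_p, acc_v) for ok, acc_p, acc_v in replies if ok]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, adopt the one with the highest acc-p.
    accepted = [(acc_p, acc_v) for acc_p, acc_v in promises if acc_p is not None]
    if accepted:
        v = max(accepted, key=lambda pair: pair[0])[1]

    # Accept phase: send p and v to all acceptors and count the acceptances.
    acks = sum(acceptor.accept(p, v) for acceptor in acceptors)
    return v if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]

first = propose(acceptors, p=1, v=7)    # nobody has accepted anything yet
second = propose(acceptors, p=2, v=5)   # finds 7 already accepted and adopts it
stale = propose(acceptors, p=1, v=9)    # proposal number too old, round fails

print(first, second, stale)
```

The second proposer wanted 5 but ends up terminating with 7, so a value that was already accepted by a majority is never overwritten.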

Page 37: Consensus in distributed computing

[Diagram: proposer P holding value 7 and acceptors A1, A2, A3. v7 is proposed with p1.]

A1, A2, A3: MIN-P 0, ACC-P -, ACC-V -

Page 38: Consensus in distributed computing

[Diagram: proposer P sends "P1 7" to the acceptors. v7 is proposed with p1.]

A1: MIN-P 1, ACC-P -, ACC-V -
A2, A3: MIN-P 0, ACC-P -, ACC-V -

Page 39: Consensus in distributed computing

[Diagram: proposer P sends "P1 7" to the acceptors. v7 is proposed with p1.]

A1, A2: MIN-P 1, ACC-P -, ACC-V -
A3: MIN-P 0, ACC-P -, ACC-V -

Page 40: Consensus in distributed computing

[Diagram: proposer P sends "P1 7" to the acceptors. v7 is proposed with p1.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 41: Consensus in distributed computing

[Diagram: one acceptor reply (ACC-P -, ACC-V -) returns to proposer P. v7 is proposed with p1.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 42: Consensus in distributed computing

[Diagram: two acceptor replies (ACC-P -, ACC-V -) return to proposer P. v7 is proposed with p1.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 43: Consensus in distributed computing

[Diagram: proposer P has two replies (ACC-P -, ACC-V -). Has majority! Since acc-p and acc-v are both null, we know that we are the only proposer in the network so far.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 44: Consensus in distributed computing

[Diagram: proposer P sends "P1 7" to the acceptors. Now, we send out p and v in the accept phase.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 45: Consensus in distributed computing

[Diagram: the acceptors receive "P1 7". Acceptors update acc-p and acc-v.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 46: Consensus in distributed computing

[Diagram: one "Accept!" reply returns to proposer P.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 47: Consensus in distributed computing

[Diagram: two "Accept!" replies return to proposer P.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 48: Consensus in distributed computing

[Diagram: two "Accept!" replies have reached proposer P. Oh look, we have majority! v7 is the chosen value then!]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 49: Consensus in distributed computing

[Diagram: a last "Accept? :(" reply reaches proposer P after the majority. P: "Shuddup, nobody loves you".]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 50: Consensus in distributed computing

PAXOS - MULTI PROPOSERS

▸ What if there were multiple proposers?

▸ Brace yourself, It’s Complicated™ (not really)

Page 51: Consensus in distributed computing

[Diagram: proposers P1 and P2 with acceptors A1, A2, A3. P1 holds value 7 and sends "P1 7" to the acceptors.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 52: Consensus in distributed computing

[Diagram: proposers P1 and P2 with acceptors A1, A2, A3. Two acceptor replies (ACC-P -, ACC-V -) return to P1.]

A1, A2, A3: MIN-P 1, ACC-P -, ACC-V -

Page 53: Consensus in distributed computing

[Diagram: P1 has sent "P1 7"; P2 now sends "P2 5". v5 is proposed with p2.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 54: Consensus in distributed computing

[Diagram: an acceptor replies to P2, which holds value 5, with ACC-P 1, ACC-V 7.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 55: Consensus in distributed computing

[Diagram: P2 now holds "P2 7". The value of p2 is changed to 7.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 56: Consensus in distributed computing

[Diagram: P2 sends "P2 7" to the acceptors. Broadcast accept phase with p2 and v7.]

A1, A2, A3: MIN-P 1, ACC-P 1, ACC-V 7

Page 57: Consensus in distributed computing

[Diagram: the acceptors have received "P2 7". Both proposers succeed! No blocking here.]

A1, A2, A3: MIN-P 2, ACC-P 1, ACC-V 7

Page 58: Consensus in distributed computing

BASIC PAXOS

▸ This is BASIC Paxos: 2PC with a twist (Quorum)

▸ It has vulnerabilities!

▸ Best of 2PC (safety), with stronger liveness: a majority of acceptors is enough to make progress

▸ Most consensus algorithms are variants of Paxos

▸ Forms the basis of Distributed Computing research
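Why the quorum twist works: any two majorities of the same set of acceptors share at least one member, so a later proposer always hears about a value an earlier majority accepted. A brute-force check in Python (the helper names `majorities` and `all_majorities_overlap` are hypothetical, and only the smallest majorities are enumerated, which is enough because larger ones contain them):

```python
from itertools import combinations


def majorities(nodes):
    """All smallest-possible majorities (quorums) of the given nodes."""
    size = len(nodes) // 2 + 1
    return [set(quorum) for quorum in combinations(nodes, size)]


def all_majorities_overlap(nodes):
    """True if every pair of majorities has at least one node in common."""
    quorums = majorities(nodes)
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


three = ["A1", "A2", "A3"]
five = ["A1", "A2", "A3", "A4", "A5"]

print(len(majorities(three)), all_majorities_overlap(three))
print(len(majorities(five)), all_majorities_overlap(five))
```

This is also why Paxos keeps working when a minority of acceptors is down, whereas 2PC needs a yes from every node.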

Page 59: Consensus in distributed computing

CLOSING…

▸ Basic Paxos is not Byzantine Fault Tolerant

▸ It is a challenge to create a consensus protocol (termination, agreement, validity) that is Byzantine Fault Tolerant

▸ Nakamoto Consensus (aka Bitcoin consensus) skirts around Byzantine problems by imposing proof-of-work

▸ Raft is a separate consensus algorithm designed as an easier-to-understand alternative to Paxos, used in etcd and Consul

Page 60: Consensus in distributed computing

PAXOS - BEST GEEKY PICKUP LINE NEVER

Ruben Tan
