Byzantine fault-tolerance

COMP 413

Fall 2002

Overview

• Models
  – Synchronous vs. asynchronous systems
  – Byzantine failure model

• Secure storage with self-certifying data

• Byzantine quorums

• Byzantine state machines

Models

Synchronous system: bounded message delays (implies reliable network!)

Asynchronous system: message delays are unbounded

In practice (Internet): reasonable to assume that network failures are eventually fixed (weak synchrony assumption).

Model (cont’d)

• Data and services (state machines) can be replicated on a set of nodes R.

• Each node in R fails independently, with identical probability (i.i.d. failures)

• Can specify a bound f on the number of nodes that can fail simultaneously
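The i.i.d. assumption is what turns a chosen bound f into a failure probability: the chance that more than f of n nodes are down at the same moment is a binomial tail. A minimal sketch, assuming a hypothetical per-node failure probability p; n = 3f+1 is used only as an illustrative replica count:

```python
from math import comb


def prob_more_than_f_fail(n: int, f: int, p: float) -> float:
    """P(more than f of n nodes are failed at once), each independently with probability p."""
    # Binomial tail: sum over k = f+1 .. n simultaneous failures.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(f + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability
    for f in (1, 2, 3):
        n = 3 * f + 1  # illustrative replica count for this f
        print(f"f={f}, n={n}: P(more than f failures) = {prob_more_than_f_fail(n, f, p):.2e}")
```

For small p, raising f (and n with it) makes exceeding the bound much less likely, which is the usual argument for choosing a modest f.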

Model (cont’d)

Byzantine failures

• no assumption about nature of fault

• failed nodes can behave in arbitrary ways

• may act as an intelligent adversary (compromised node), with full knowledge of the protocols

• failed nodes may conspire (act as one)

Self-certifying data

Byzantine quorums

• Data is not self-certifying (multiple writers without shared keys)

• Idea: replicate data on a sufficient number of replicas (relative to f) to be able to rely on a majority vote

Byzantine quorums: r/w variable

Representative problem: implement a read/write variable

Assuming no concurrent reads or writes, for now

Assuming trusted clients, for now

Byzantine quorums: r/w variable

How many replicas do we need?

• Clearly, need at least 2f+1, so we have a majority of good nodes

• write(x): send x to all replicas, wait for acknowledgments (must get at least f+1)

• read(x): request x from all replicas, wait for responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)

Does this work? Yes, but only if

• system is synchronous (bounded msg delay)

• faulty nodes cannot forge messages (messages are authenticated!)
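A single-process toy version of the 2f+1 scheme, assuming the two conditions above hold: in one process every correct replica answers immediately and no reply can be forged. The names Replica, write and read are hypothetical, and a faulty replica is modeled as one that drops writes and returns junk on reads:

```python
from collections import Counter


class Replica:
    """Toy replica; a faulty one ignores writes and answers reads with junk."""

    def __init__(self, faulty: bool = False):
        self.faulty = faulty
        self.value = None

    def store(self, x) -> bool:
        if not self.faulty:
            self.value = x
        return True  # a faulty replica may acknowledge a write it dropped

    def load(self):
        return "junk" if self.faulty else self.value


def write(replicas, x) -> int:
    """Send x to every replica and return the number of acknowledgments."""
    return sum(1 for r in replicas if r.store(x))


def read(replicas, f: int):
    """Majority vote over all responses (synchrony: every correct replica answers)."""
    value, count = Counter(r.load() for r in replicas).most_common(1)[0]
    # All f+1 correct replicas report the same value, and the f faulty ones
    # cannot reach f+1 identical votes, so a value with f+1 votes is the written one.
    if count < f + 1:
        raise RuntimeError("no value has f+1 identical votes")
    return value


if __name__ == "__main__":
    f = 1
    replicas = [Replica() for _ in range(f + 1)] + [Replica(faulty=True) for _ in range(f)]
    print("acks:", write(replicas, 42))
    print("read:", read(replicas, f))
```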

Now, assume

• Weak synchrony (network failures are fixed eventually)

• messages are authenticated (e.g., signed with sender’s private key)

Byzantine quorums: r/w variable

Let’s try 3f+1 replicas (known lower bound)

• write(x): send x to all replicas, wait for 2f+1 responses (must have at least f+1 good replicas with correct value)

• read(x): request x from all replicas, wait for 2f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? – no, it is possible that the f nodes that did not respond were good nodes holding the new value!)
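The failure is an instance of quorum-intersection counting. Writing q for the number of responses each operation waits for, a write quorum W and a read quorum R in an n-replica system share at least 2q - n replicas, and up to f of those may be faulty. A short derivation for n = 3f+1, q = 2f+1:

```latex
\begin{aligned}
|W \cap R| &\ge |W| + |R| - n = 2(2f+1) - (3f+1) = f + 1,\\
|\{\text{correct replicas in } W \cap R\}| &\ge (f + 1) - f = 1 .
\end{aligned}
```

So a reader is only guaranteed a single response carrying the new value, while the other 2f responses may all carry the old one, far short of the f+1 identical votes a majority read needs.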

Byzantine quorums: r/w variable

Let’s try 4f+1 replicas

• write(x): send x to all replicas, wait for 3f+1 responses (must have at least 2f+1 good replicas with correct value)

• read(x): request x from all replicas, wait for 3f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? – no, it is possible that the f faulty nodes vote with the good nodes that have an old value of x!)
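The 3f+1 and 4f+1 failures, and the 5f+1 fix, can be checked by counting votes in the worst case. A sketch under stated assumptions: both operations wait for n - f responses as on these slides, the adversary places all f faulty replicas in both quorums, and the faulty replicas always vote for the old value. Function names are illustrative:

```python
def worst_case_votes(n: int, f: int) -> tuple[int, int]:
    """Worst-case (new_votes, old_votes) seen by a reader after a completed write.

    Assumes: writes and reads each wait for q = n - f responses, the f faulty
    replicas sit in both quorums, and they always vote for the old value.
    """
    q = n - f                      # the other f replicas may never answer
    good = n - f                   # correct replicas in the system
    good_written = q - f           # correct replicas that acked the write
    good_read = q - f              # correct replicas among the read responses
    # Pigeonhole: correct replicas that both got the write and answered the read.
    new_votes = max(0, good_written + good_read - good)
    old_votes = q - new_votes      # everyone else in the read quorum backs the old value
    return new_votes, old_votes


def majority_vote_is_safe(n: int, f: int) -> bool:
    new_votes, old_votes = worst_case_votes(n, f)
    return new_votes > old_votes


if __name__ == "__main__":
    for f in (1, 2, 3):
        for k in (3, 4, 5):
            n = k * f + 1
            new_votes, old_votes = worst_case_votes(n, f)
            print(f"f={f} n={k}f+1={n}: new={new_votes} old={old_votes} "
                  f"safe={majority_vote_is_safe(n, f)}")
```

Working the same counts symbolically for n = kf+1 gives (k-3)f+1 votes for the new value against 2f for the old one, so majority vote is safe exactly when k ≥ 5.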

Byzantine quorums: r/w variable

Let’s try 5f+1 replicas

• write(x): send x to all replicas, wait for 4f+1 responses (must have at least 3f+1 good replicas with correct value)

• read(x): request x from all replicas, wait for 4f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)

• Actually, can use only 5f replicas if data is written with monotonically increasing timestamps
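The slide does not spell out how the timestamps are used. One read rule that makes 5f replicas suffice, offered here as an assumption: vote on (value, timestamp) pairs, and break a tie in votes in favor of the higher timestamp. With n = 5f and quorums of n - f = 4f, the worst case leaves the new value and the old value tied at 2f votes each, and any pair fabricated by the f faulty replicas gets at most f votes:

```python
from collections import Counter


def read_with_timestamps(responses):
    """Pick a value from (value, timestamp) responses.

    Hypothetical rule: the pair reported by the most replicas wins, and a tie
    in votes is broken in favor of the higher timestamp.
    """
    votes = Counter(responses)
    (value, ts), _count = max(votes.items(), key=lambda item: (item[1], item[0][1]))
    return value, ts


if __name__ == "__main__":
    # f = 1, n = 5f = 5 replicas, reader waits for n - f = 4 responses.
    # Worst case: 2 correct replicas hold the new write, 1 correct replica
    # still holds the old one, and the faulty replica backs the old value.
    responses = [("new", 2), ("new", 2), ("old", 1), ("old", 1)]
    print(read_with_timestamps(responses))
```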

Still rely on trusted clients

• Malicious client could send different values to replicas, or send a value to fewer than a full quorum

• To fix this, need a Byzantine agreement protocol among the replicas

Still don’t handle concurrent accesses

Still don’t handle group changes
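A small demonstration of why a malicious client breaks the quorum scheme even when every replica is correct. Assuming 5f+1 replicas and reads that wait for n - f responses, a hypothetical client that stores "a" on half the replicas and "b" on the rest lets two readers, who hear from different quorums, return different values with no concurrent write in progress:

```python
from collections import Counter


def majority(responses):
    """Most common value among the responses."""
    return Counter(responses).most_common(1)[0][0]


def equivocation_demo(f: int) -> tuple[str, str]:
    """Two readers after one malicious write, with all 5f+1 replicas correct."""
    n = 5 * f + 1
    q = n - f                      # responses each read waits for
    # The malicious client stores "a" on the first half and "b" on the rest.
    stored = ["a" if i < n // 2 else "b" for i in range(n)]
    reader_1 = majority(stored[:q])       # the last f replicas are slow for reader 1
    reader_2 = majority(stored[n - q:])   # the first f replicas are slow for reader 2
    return reader_1, reader_2


if __name__ == "__main__":
    print(equivocation_demo(1))
```

No vote threshold at the readers can fix this, because every replica is answering honestly; the replicas themselves must first agree on what was written.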

Byzantine state machine

BFT (Castro, 2000)

• Can implement any service that behaves like a deterministic state machine

• Can tolerate malicious clients

• Safe with concurrent requests

• Requires 3f+1 replicas

• 5 rounds of messages (request, pre-prepare, prepare, commit, reply)
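A rough sense of what the five rounds cost: counting point-to-point messages for one request in the normal case, under assumptions that simplify the real protocol (each multicast sent as n - 1 separate messages, the primary sends no PREPARE, no batching, no view changes). The function name is illustrative:

```python
def messages_per_request(f: int) -> dict[str, int]:
    """Point-to-point message count for one request in the normal case.

    Assumptions: n = 3f + 1, every multicast is sent as n - 1 separate
    messages, the primary does not send a PREPARE, and there is no batching.
    """
    n = 3 * f + 1
    return {
        "request": 1,                    # client -> primary
        "pre-prepare": n - 1,            # primary -> each backup
        "prepare": (n - 1) * (n - 1),    # each backup -> every other replica
        "commit": n * (n - 1),           # every replica -> every other replica
        "reply": n,                      # every replica -> client
    }


if __name__ == "__main__":
    for f in (1, 2, 3):
        counts = messages_per_request(f)
        print(f"f={f}, n={3 * f + 1}: {sum(counts.values())} messages", counts)
```

The prepare and commit rounds are all-to-all, so the count grows quadratically in n; for f = 1 the sketch gives 29 messages.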

Byzantine state machine

• Clients send requests to one replica

• Correct replicas execute all requests in the same order

• An atomic multicast protocol among replicas ensures that all replicas receive and execute all requests in the same order

• Since all replicas start in the same state, correct replicas produce identical results

• Client waits for f+1 identical results from different replicas
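A toy illustration of state machine replication with a Byzantine replica, assuming the atomic multicast has already fixed the order of operations. Each hypothetical CounterReplica applies the same deterministic operations, one of the 3f+1 replicas lies in its replies, and the client accepts a result once f+1 distinct replicas report it:

```python
from collections import defaultdict


class CounterReplica:
    """Deterministic state machine: a single integer counter."""

    def __init__(self, replica_id: int, faulty: bool = False):
        self.replica_id = replica_id
        self.faulty = faulty
        self.state = 0

    def execute(self, op: int) -> int:
        self.state += op  # same ops in the same order -> same state
        return -1 if self.faulty else self.state  # a faulty replica lies in its reply


def client_accept(replies, f: int):
    """replies: iterable of (replica_id, result) in arrival order."""
    supporters = defaultdict(set)
    for replica_id, result in replies:
        supporters[result].add(replica_id)  # count distinct replicas only
        if len(supporters[result]) >= f + 1:
            return result  # at least one correct replica vouches for it
    return None  # not enough matching replies yet


if __name__ == "__main__":
    f = 1
    replicas = [CounterReplica(i, faulty=(i == 0)) for i in range(3 * f + 1)]
    ordered_ops = [5, -2, 10]  # order fixed by the atomic multicast
    for op in ordered_ops:
        replies = [(r.replica_id, r.execute(op)) for r in replicas]
        print(op, "->", client_accept(replies, f))
```

Counting distinct replica ids matters: a faulty replica that repeats its reply must not count twice toward the f+1 threshold.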

BFT protocol

BFT: Protocol overview

• Client c sends m = <REQUEST,o,t,c>σc to the primary (o = operation, t = monotonic timestamp).

• Primary p assigns sequence number n to m and sends <PRE-PREPARE,v,n,m>σp to the other replicas (v = current view; the view number determines which replica acts as primary).

• If replica i accepts the message, it sends <PREPARE,v,n,d,i>σi to the other replicas (d is the hash of the request). This signals that i agrees to assign sequence number n to m in view v.
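A sketch of the backup's side of these two steps, assuming a hypothetical request encoding (JSON with sorted keys) and SHA-256 as the hash; the real protocol also verifies signatures and checks the sequence number against a window, which is omitted here. A backup accepts a pre-prepare only if it is for its current view and it has not already accepted a different digest for the same view and sequence number:

```python
import hashlib
import json
from dataclasses import dataclass


def digest(request: dict) -> str:
    """d: hash of a canonical encoding of the request (SHA-256 is an assumption)."""
    encoded = json.dumps(request, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class PrePrepare:
    v: int  # view
    n: int  # sequence number
    d: str  # digest of the request


@dataclass(frozen=True)
class Prepare:
    v: int
    n: int
    d: str
    i: int  # id of the sending replica


class Backup:
    def __init__(self, replica_id: int, view: int):
        self.replica_id = replica_id
        self.view = view
        self.accepted = {}  # (v, n) -> digest this backup agreed to

    def on_pre_prepare(self, pp: PrePrepare):
        """Return the PREPARE to multicast, or None if the pre-prepare is rejected."""
        if pp.v != self.view:
            return None  # not for the view this backup is in
        if self.accepted.get((pp.v, pp.n), pp.d) != pp.d:
            return None  # primary already bound (v, n) to a different request
        self.accepted[(pp.v, pp.n)] = pp.d
        return Prepare(pp.v, pp.n, pp.d, self.replica_id)


if __name__ == "__main__":
    req = {"o": "put x 1", "t": 1, "c": "client-7"}
    backup = Backup(replica_id=2, view=0)
    print(backup.on_pre_prepare(PrePrepare(0, 1, digest(req))))
    other = {"o": "put x 2", "t": 1, "c": "client-7"}
    print(backup.on_pre_prepare(PrePrepare(0, 1, digest(other))))  # None: conflicting digest
```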

BFT: Protocol overview

• Once replica i has a pre-prepare and 2f matching prepare messages from different replicas (2f+1 matching messages in all, counting the pre-prepare), it sends <COMMIT,v,n,d,i>σi to the other replicas. At this point, correct replicas agree on an order of requests within a view.

• Once replica i has prepared m and has collected 2f+1 matching commit messages (possibly including its own), it executes m, then sends <REPLY,v,t,c,i,r>σi to the client. (The commit phase is needed to preserve this order across view changes.)

• More complexity related to view changes and garbage collection of message logs

• Public-key signatures are the bottleneck: a variant of the protocol uses symmetric crypto (MACs) to provide authenticated channels. (This is not easy: MACs are less powerful, since a MAC cannot prove authenticity to a third party!)
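The two quorum thresholds of the protocol overview can be written as predicates over a replica's message log. A minimal sketch modeled on the prepared and committed-local predicates in Castro and Liskov's PBFT paper: prepared needs the pre-prepare plus 2f matching PREPAREs from distinct backups (2f+1 matching messages counting the pre-prepare), and committed-local additionally needs 2f+1 matching COMMITs. Signatures, view changes and garbage collection are left out, and the class names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    kind: str  # "PREPARE" or "COMMIT"
    v: int     # view
    n: int     # sequence number
    d: str     # request digest
    i: int     # sender replica id


class ReplicaLog:
    """Message log of one replica, with the two quorum predicates."""

    def __init__(self, f: int):
        self.f = f
        self.pre_prepares = {}  # (v, n) -> digest from the primary
        self.votes = set()      # duplicate messages collapse automatically

    def add_pre_prepare(self, v: int, n: int, d: str) -> None:
        self.pre_prepares[(v, n)] = d

    def add(self, vote: Vote) -> None:
        self.votes.add(vote)

    def _senders(self, kind: str, v: int, n: int, d: str) -> set:
        return {x.i for x in self.votes if (x.kind, x.v, x.n, x.d) == (kind, v, n, d)}

    def prepared(self, v: int, n: int, d: str) -> bool:
        # Pre-prepare plus 2f matching PREPAREs from distinct backups
        # (assumes the primary itself never sends a PREPARE).
        return (self.pre_prepares.get((v, n)) == d
                and len(self._senders("PREPARE", v, n, d)) >= 2 * self.f)

    def committed_local(self, v: int, n: int, d: str) -> bool:
        # Prepared plus 2f+1 matching COMMITs from distinct replicas (own included).
        return (self.prepared(v, n, d)
                and len(self._senders("COMMIT", v, n, d)) >= 2 * self.f + 1)


if __name__ == "__main__":
    log = ReplicaLog(f=1)
    log.add_pre_prepare(v=0, n=1, d="d1")
    for sender in (1, 2):
        log.add(Vote("PREPARE", 0, 1, "d1", sender))
    print("prepared:", log.prepared(0, 1, "d1"))
    for sender in (0, 1, 2):
        log.add(Vote("COMMIT", 0, 1, "d1", sender))
    print("committed-local:", log.committed_local(0, 1, "d1"))
```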
