Chapter 4

Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
wenbing@ieee.org

Building Dependable Distributed Systems

Building Dependable Distributed Systems, Copyright Wenbing Zhao 1


Data and Service Replication
- Replication resorts to the use of space redundancy to achieve high availability
  - Instead of running a single copy of the service, multiple copies are used
  - Usually deployed across a group of physical nodes for fault isolation
- Data and service replication usually use different approaches
  - Transactional data replication
  - Optimistic replication (omitted)
  - Balance consistency and performance: CAP theorem (omitted)


Data and Service Replication
- Service replication: state machine replication
  - Each replica is modeled as a state machine: state, interface, deterministic state change via interface
  - Replica consistency issue: coordination needed
    - Total order of requests to the server replicas
    - Sequential execution of requests
- Data replication: direct access on data
  - Operation on data: read or write
  - Context: transaction processing => concurrent access to replicated data essential
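The state machine model can be illustrated with a minimal sketch. The `CounterReplica` class and its two operations are hypothetical, chosen only because addition and multiplication do not commute, so the effect of ordering is visible. Replicas that apply the same requests in the same total order end in the same state; a replica that sees a different order diverges.

```python
from dataclasses import dataclass, field


@dataclass
class CounterReplica:
    """A replica modeled as a deterministic state machine (illustrative only)."""
    state: int = 0
    log: list = field(default_factory=list)

    def apply(self, request):
        """The interface: the only way to change the state."""
        op, amount = request
        if op == "add":
            self.state += amount
        elif op == "mul":
            self.state *= amount
        else:
            raise ValueError(f"unknown operation: {op}")
        self.log.append(request)
        return self.state


# The same requests, delivered in the same total order to every replica.
ordered_requests = [("add", 2), ("mul", 5), ("add", 1)]

replicas = [CounterReplica() for _ in range(3)]
for request in ordered_requests:
    for replica in replicas:
        replica.apply(request)

states = [replica.state for replica in replicas]  # all replicas agree

# A replica that sees the requests in a different order diverges.
out_of_order = CounterReplica()
for request in reversed(ordered_requests):
    out_of_order.apply(request)
```

Here `states` holds the same value three times, while `out_of_order.state` differs, which is why total ordering and sequential execution are both required.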


Service Replication
- State is encapsulated
- Clients interact with exported interfaces (APIs)
- Replication algorithm used to coordinate replicas (for consistency)
- Fault tolerance middleware


Replication Styles
- Active replication
  - Every input (request) is executed by every replica
  - Every replica generates the outputs (replies)
  - Voting is needed to cope with non-fail-stop faults
- Passive replication
  - One of the replicas is designated as the primary replica
  - Only the primary replica executes requests
  - The state of the primary replica is transferred to the backups periodically or after every request is processed
- Semi-active replication
  - One of the replicas is designated as the leader (or primary)
  - The leader determines the order of execution
  - Every input is executed by every replica per the leader's instruction
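As a rough illustration of the passive style, the sketch below has only the primary execute requests and copies its state to the backups after every request. The `Replica` and `PassiveGroup` classes and the round-robin `fail_over` policy are assumptions made for this example, and fail-stop faults are assumed throughout.

```python
import copy


class Replica:
    def __init__(self):
        self.state = {}

    def execute(self, key, value):
        self.state[key] = value
        return "ok"

    def install_checkpoint(self, checkpoint):
        # Backups never execute requests; they only install transferred state.
        self.state = copy.deepcopy(checkpoint)


class PassiveGroup:
    """The primary executes; backups only receive state."""

    def __init__(self, n_replicas):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.primary = 0

    def handle(self, key, value):
        primary = self.replicas[self.primary]
        reply = primary.execute(key, value)
        # State transfer after every request (periodic transfer is the alternative).
        for i, backup in enumerate(self.replicas):
            if i != self.primary:
                backup.install_checkpoint(primary.state)
        return reply

    def fail_over(self):
        """Promote the next backup after the primary fails (fail-stop assumed)."""
        self.primary = (self.primary + 1) % len(self.replicas)


group = PassiveGroup(3)
group.handle("x", 1)
group.handle("y", 2)
group.fail_over()
new_primary_state = group.replicas[group.primary].state
```

Because state is transferred after each request, the promoted backup already holds both updates. With periodic transfer instead, requests processed since the last checkpoint would have to be recovered some other way.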


Active Replication

[Figure: actively replicated client object A invokes actively replicated server object B through RM components; duplicate invocations and duplicate responses are suppressed.]


Active Replication with Voting

Question: to cope with f (non-malicious) faults, how many replicas are needed?
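One way to reason about the question is through majority voting. If up to f replicas may return a wrong (but non-malicious) reply, then 2f + 1 replies always contain at least f + 1 matching correct ones. The `vote` helper below is a hypothetical sketch under that assumption; the slide itself does not state the answer.

```python
from collections import Counter


def vote(replies, f):
    """Return the reply reported by at least f + 1 replicas, or None."""
    if len(replies) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 replies to mask f faulty ones")
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


f = 1
replies = [42, 42, 17]      # one replica returned a wrong value
result = vote(replies, f)
```

With f = 1 the voter accepts the value reported twice and discards the outlier. If no value reaches f + 1 votes, more than f replicas must have been faulty and the voter returns `None`.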


Passive Replication

[Figure: passively replicated client object A sends an invocation to the primary replica of passively replicated server object B and receives the response; the primary replica transfers its state to the backups through RM components.]

Question: can passive replication tolerate non-fail-stop faults?


Semi-Active Replication

[Figure: semi-actively replicated client object A sends an invocation to semi-actively replicated server object B and receives the response; the primary replica sends ordering info to the other replicas through RM components.]


Implementation of Service Replication: Ensuring Strong Replica Consistency
- For active replication, use a group communication system or a consensus algorithm that guarantees total ordering of all messages (plus deterministic processing in each replica)
- Passive replication with systematic checkpointing
- Semi-active replication
  - Use two-phase commit


Total Ordering of Messages
- What is total ordering of messages?
  - All replicas receive the same set of messages in the same order
  - Atomic multicast: if a message is delivered to one replica, it is also delivered to all non-faulty replicas
- With replication, we need to ensure total ordering of messages sent by a group of replicas to another group of replicas
  - FIFO ordering between one sender and a group is not sufficient

[Figure: messages m1 and m2 multicast to a group of replicas.]
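A fixed sequencer is one of several ways to obtain a total order; it is used here only as an assumption to keep the sketch short, and the slides do not prescribe it. The hypothetical `Sequencer` stamps every message with a global sequence number, and each `Replica` delivers strictly in sequence-number order, holding back messages that arrive early.

```python
import itertools


class Sequencer:
    """Hypothetical fixed sequencer: stamps each message with a global sequence number."""

    def __init__(self):
        self._next = itertools.count(1)

    def stamp(self, message):
        return (next(self._next), message)


class Replica:
    def __init__(self):
        self.pending = {}
        self.next_seq = 1
        self.delivered = []

    def receive(self, stamped):
        seq, message = stamped
        self.pending[seq] = message
        # Deliver in sequence-number order, holding back anything after a gap.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
m1 = sequencer.stamp("m1 from sender A")
m2 = sequencer.stamp("m2 from sender B")

r1, r2 = Replica(), Replica()
# The network hands the two messages to the replicas in different orders.
for stamped in (m1, m2):
    r1.receive(stamped)
for stamped in (m2, m1):
    r2.receive(stamped)
```

Per-sender FIFO ordering alone would allow `r1` and `r2` to deliver m1 and m2 in different orders, since the two messages come from different senders. The global sequence number removes that freedom.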


Potential Sources of Non-determinism
- Multithreading
  - The order of accesses of shared data by different threads might not be the same at different replicas
- System calls/library calls
  - A call at one replica might succeed while the same call might fail at another replica, e.g., memory allocation, file access
- Host/process specific information
  - Host name, process id, etc.
  - Local clocks: gettimeofday()
- Interrupts
  - Delivered and handled asynchronously: a big problem
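The host-specific and clock-related sources can be made concrete with a small sketch. The first handler reads the process id and the local clock, so two replicas would store different values for the same request. The second handler takes those values as inputs carried in the ordered message, which is one common way to sidestep the problem (an assumption here, not something the slide prescribes). The function names and the message layout are hypothetical.

```python
import os
import time


def nondeterministic_handler(state, request):
    """Reads host-specific values, so replicas diverge."""
    state[request] = (os.getpid(), time.time())
    return state


def deterministic_handler(state, request, timestamp, origin):
    """All non-deterministic inputs arrive as part of the ordered message."""
    state[request] = (origin, timestamp)
    return state


# One replica (for example the primary) samples the values once ...
message = {"request": "create-order", "timestamp": 1700000000.0, "origin": "replica-0"}

# ... and every replica applies the same message.
replica_states = [
    deterministic_handler({}, message["request"], message["timestamp"], message["origin"])
    for _ in range(3)
]
```

All entries of `replica_states` are identical because nothing in `deterministic_handler` depends on the host it runs on.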


Data Replication
- Transactional data replication
  - Read/write ops on a set of data items within the scope of a transaction
  - At the transaction level, executions appear to be sequential (one-copy serializable)
  - Actual ops on each data item are often concurrent
- Optimistic data replication
  - Eventual consistency: eventually, all updates will be propagated to all data items


Transactional Data Replication
- One-copy serializable
  - A transactional data replication algorithm should ensure that the replicated data appear to the clients as a single copy
  - The interleaving of the execution of the transactions should be equivalent to a sequential execution of those transactions on a single copy of the data
- Make read ops cheaper than updates: read ops are more prevalent
- It is challenging to design sound replication algorithms


Wrong Data Replication Algorithms
- Write-all
  - A read op on a data item x can be mapped to any replica of x
  - A write on x must be applied to all replicas of x
- Problem: what if a replica becomes faulty?
  - Blocking! Any single replica fault brings down the entire system!


Wrong Data Replication Algorithms
- Write-all-available
  - A read op on a data item x can be mapped to any replica of x
  - A write on x is applied to the available replicas of x
- Problem: cannot ensure one-copy serializable execution!
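The two flawed schemes can be contrasted in a few lines. This is a simplified sketch with hypothetical names (`Replica`, `write_all`, `write_all_available`, `read_any`), and it ignores transactions and locking entirely. It shows write-all blocking as soon as one replica is down, and write-all-available letting a recovered replica serve a stale value.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}


def write_all(replicas, key, value):
    """Write-all: refuse the write unless every replica is up."""
    if not all(r.available for r in replicas):
        raise RuntimeError("write blocked: a replica is unavailable")
    for r in replicas:
        r.data[key] = value


def write_all_available(replicas, key, value):
    """Write-all-available: update whichever replicas are up."""
    for r in replicas:
        if r.available:
            r.data[key] = value


def read_any(replica, key):
    """A read may be mapped to any single replica."""
    return replica.data.get(key)


a, b = Replica("a"), Replica("b")
write_all([a, b], "x", 1)

b.available = False                      # b fails
try:
    write_all([a, b], "x", 2)
    blocked = False
except RuntimeError:
    blocked = True

write_all_available([a, b], "x", 2)      # succeeds, but b misses the update
b.available = True                       # b comes back without catching up
stale, fresh = read_any(b, "x"), read_any(a, "x")
```

After the run, `stale` and `fresh` differ: two reads of the same item return different values depending on which replica serves them, so the replicated data no longer looks like a single copy.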


Attempting to Fix Write-All-Available
- Problem caused by accessing the not-fully-recovered replica => how about preventing this?
- Still won't work
  - Ti: R(x), W(y)
  - Tj: R(y), W(x)
  - Ti does not precede Tj because Tj reads y before Ti writes to y
  - Tj does not precede Ti because Ti reads x before Tj writes to x
  - Hence, Ti and Tj are not serializable!
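The argument can be checked mechanically with a precedence (conflict) graph: an edge from one transaction to another means the first must come earlier in any equivalent serial order, and a cycle means no such order exists. The sketch below is illustrative. It treats x and y as logical data items and assumes an interleaving in which both reads happen before either write, as the example describes. The helper names are hypothetical.

```python
from itertools import combinations


def precedence_edges(schedule):
    """schedule: list of (transaction, op, item) in execution order.
    Returns edges (a, b) meaning a must precede b in any equivalent serial order."""
    edges = set()
    for (t1, op1, item1), (t2, op2, item2) in combinations(schedule, 2):
        conflicting = item1 == item2 and "W" in (op1, op2)
        if t1 != t2 and conflicting:
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Both reads happen before either write.
schedule = [("Ti", "R", "x"), ("Tj", "R", "y"), ("Ti", "W", "y"), ("Tj", "W", "x")]
edges = precedence_edges(schedule)
serializable = not has_cycle(edges)
```

The graph contains both Ti -> Tj (via x) and Tj -> Ti (via y), so `serializable` is false. In the replicated setting the conflicting operations land on different replicas, which is why no single replica notices the cycle.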


Insight into the Problem
- The problem is caused by the fact that conflicting operations are performed at different replicas
  - We must prevent this from happening
- A solution: use quorum-based consensus
- What is a quorum?
  - Given a system with n processes, a quorum is formed by a subset of the processes in the system
  - Any two quorums must intersect in at least one process
  - Read quorum: a quorum formed for read ops
  - Write quorum: a quorum formed for write ops
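The intersection requirement is easy to test by brute force on a small system. The sketch below uses majority subsets as one possible quorum construction (an assumption for illustration; other constructions exist) and checks that every pair of them shares a process, while smaller subsets do not.

```python
from itertools import combinations


def majority_quorums(processes):
    """All subsets containing more than half of the processes."""
    size = len(processes) // 2 + 1
    return [set(q) for q in combinations(processes, size)]


def pairwise_intersect(quorums):
    """True if every pair of quorums shares at least one process."""
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


processes = ["p1", "p2", "p3", "p4", "p5"]
quorums = majority_quorums(processes)
ok = pairwise_intersect(quorums)

# Subsets of size 2 out of 5 can be disjoint, so they are not quorums.
too_small = [set(q) for q in combinations(processes, 2)]
not_ok = pairwise_intersect(too_small)
```

With five processes, any two subsets of size three must overlap because together they would otherwise need six distinct processes.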


A Quorum-Based Replication Algorithm
- Basic idea:
  - Write ops apply to a write quorum
  - Read ops apply to a read quorum
  - Fault tolerance: given N replicas in total and write quorum size W (>= read quorum size R), the system can tolerate up to N - W failures
- Quorum rule
  - Each replica is assigned a positive weight, e.g., 1
  - A read quorum has a minimum total weight RT
  - A write quorum has a minimum total weight WT
  - RT + WT > total weight and 2 WT > total weight
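The two inequalities in the quorum rule do different jobs: RT + WT > total weight forces every read quorum to overlap every write quorum, and 2 WT > total weight forces any two write quorums to overlap. A small checker, with hypothetical function names and unit weights assumed, makes the rule and the N - W bound concrete.

```python
def check_quorum_rule(weights, rt, wt):
    """Check RT + WT > total weight and 2 * WT > total weight."""
    total = sum(weights.values())
    return rt + wt > total and 2 * wt > total


def max_tolerated_failures(n, w):
    """With unit weights and W >= R, a write still needs W live replicas."""
    return n - w


weights = {"r1": 1, "r2": 1, "r3": 1, "r4": 1, "r5": 1}
valid = check_quorum_rule(weights, rt=2, wt=4)       # 2 + 4 > 5 and 8 > 5
invalid = check_quorum_rule(weights, rt=2, wt=3)     # 2 + 3 is not > 5
tolerated = max_tolerated_failures(n=5, w=4)
```

Choosing a small RT makes reads cheap at the cost of a larger WT, which in turn lowers the number of failures that writes can survive.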


A Quorum-Based Replication Algorithm
- Since an update is applied to a quorum of replicas, we need to track which replica has the latest value => use version numbers
  - Version number is incremented after each update
- Read rule
  - A read on data x is mapped to a read quorum of replicas of x
  - Each replica returns both the value of x and its version number
  - The client selects the value that has the highest version number
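The client side of the read rule reduces to picking the reply with the largest version number. A minimal sketch, assuming each reply is a `(value, version)` pair (the representation and the `quorum_read` name are hypothetical):

```python
def quorum_read(replies):
    """replies: list of (value, version) pairs from a read quorum.
    Returns the pair with the highest version number."""
    if not replies:
        raise ValueError("no replies from the read quorum")
    return max(replies, key=lambda reply: reply[1])


# Three replicas answer; one of them missed the latest update.
replies = [("blue", 3), ("green", 2), ("blue", 3)]
value, version = quorum_read(replies)
```

The stale reply is harmless because the read quorum is guaranteed to include at least one replica from the most recent write quorum.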


A Quorum-Based Replication Algorithm
- Write rule
  - A write op on data x is mapped to a write quorum of replicas of x
  - First, retrieve the version numbers from the replicas and set v = vmax + 1 for this write op
  - Write to the replicas (in the write quorum) with the new value and version # v
  - Each replica overwrites both its value and its version number (with v)
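Putting the write rule next to a matching read gives a small end-to-end sketch. The classes and function names are hypothetical, a single data item is assumed, and concurrency control between simultaneous writers is deliberately left out. The sizes N = 5, W = 3, R = 3 satisfy the quorum rule with unit weights.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

    def write(self, value, version):
        # Overwrite both the value and the version number.
        self.value, self.version = value, version


def quorum_write(write_quorum, value):
    """Apply the write rule to the replicas in write_quorum."""
    v = max(replica.version for replica in write_quorum) + 1
    for replica in write_quorum:
        replica.write(value, v)
    return v


def quorum_read(read_quorum):
    """Return the (value, version) pair with the highest version."""
    newest = max(read_quorum, key=lambda replica: replica.version)
    return newest.value, newest.version


replicas = [Replica() for _ in range(5)]       # N = 5, W = 3, R = 3
v1 = quorum_write(replicas[0:3], "a")          # replicas 0, 1, 2
v2 = quorum_write(replicas[2:5], "b")          # replicas 2, 3, 4
latest = quorum_read(replicas[0:3])            # overlaps the last write at replica 2
```

The second write learns the current version from replica 2, which is in both write quorums, and the read finds the newest value through the same replica even though replicas 0 and 1 still hold the older one.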


Quorum-Based Replication Algorithm: Example
