Fault Tolerance and Replication

This PowerPoint presentation has been adapted from: (1) web.njit.edu/~gblank/cis633/Lectures/Replication.ppt


Content

• Introduction
• System model and the role of group communication
• Fault tolerant services
• Case study: Bayou and Coda
• Transactions with replicated data




Introduction

Replication

• Duplicate limited or heavily loaded resources to provide access to them and to ensure access after failures.

• Replication is important for performance enhancement, increased availability and fault tolerance.


• Performance enhancement
  – Data are replicated between several originating servers in the same domain.
  – The workload is shared between the servers by binding all the server IP addresses to the site's DNS name.
  – This increases performance at little cost to the system.
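A toy model of the DNS-based load sharing described above, for illustration only: RoundRobinResolver is a hypothetical class (not a real DNS server), the host name is a placeholder, and the addresses come from the 192.0.2.0/24 documentation range. It assumes the name server rotates its address list on each query and that each client connects to the first address it receives.

```python
from collections import deque


class RoundRobinResolver:
    """Hypothetical name server that rotates one name's address list per query."""

    def __init__(self, name, addresses):
        self.name = name
        self.addresses = deque(addresses)

    def resolve(self, name):
        if name != self.name:
            raise KeyError(name)
        answer = list(self.addresses)
        self.addresses.rotate(-1)  # the next query starts with the next server
        return answer


resolver = RoundRobinResolver("www.example.com",
                              ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
# A simple client connects to the first address in each answer.
first_choices = [resolver.resolve("www.example.com")[0] for _ in range(6)]
print(first_choices)
```

Successive clients are spread across the three servers in turn, which is how the workload gets shared without any coordination between the servers themselves.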


• Increased availability
  – Replication is a technique for automatically maintaining the availability of data despite server failures.
  – If data are replicated at two or more failure-independent servers, then client software may be able to access data at an alternative server should the default server fail or become unreachable.
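A common back-of-the-envelope model (an assumption added here, not a claim from the slides) quantifies the benefit of failure-independent servers: if each of n servers is down independently with probability p, the data are unavailable only when all n are down, so availability is 1 - p^n.

```python
def availability(p_fail: float, n: int) -> float:
    """Probability that at least one of n failure-independent servers is up,
    assuming each server is down independently with probability p_fail."""
    if not 0.0 <= p_fail <= 1.0 or n < 1:
        raise ValueError("need 0 <= p_fail <= 1 and n >= 1")
    return 1 - p_fail ** n


for n in (1, 2, 3):
    # With a 5% chance of each server being down, extra replicas help quickly.
    print(f"{n} server(s): availability = {availability(0.05, n):.6f}")
```

With p = 0.05 this prints 0.950000, 0.997500 and 0.999875: a second replica raises availability from 95% to 99.75%. The model only holds if failures really are independent.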


• Fault tolerance
  – Highly available data are not necessarily correct data (they may be out of date).
  – A fault-tolerant service always guarantees strictly correct behavior despite a certain number and type of faults. Correctness concerns the freshness of the data supplied to the client and the effects of the client's operations upon the data.


• Replication requirements:
  – Transparency
    • Users should not need to be aware that data are replicated, and the performance and utility of information retrieval should not be noticeably different from those of unreplicated data.
  – Consistency
    • Different copies of replicated data should be the same. When data are changed, the changes are distributed to all replica servers.




System Model & The Role of Group Communication

Introduction

• The data in the system are composed of objects (e.g., files, components, Java objects, etc.).

• Each logical object is implemented by a collection of physical objects called replicas, each stored on a computer.

• The replicas of a given object are not necessarily identical, at least not at any particular point in time: some replicas may have received updates that others have not.


System Model


• Replica managers (RMs)
  – Components that contain the objects on a particular computer and perform operations on them.
• Front ends (FEs)
  – Components that handle clients' requests.
  – They communicate with one or more of the replica managers by message passing.
  – A front end may be implemented in the client's address space, or it may be a separate process.



• Five phases in performing a request upon replicated objects [Wiesmann et al. 2000]:
  1. Request: the front end issues the request to one or more RMs, either by communicating with a single RM (which may communicate with the other RMs) or by multicasting it to all of them.
  2. Coordination: the RMs coordinate to prepare to execute the request. This may require ordering the operations.
  3. Execution: the RMs execute the request (perhaps tentatively, so that its effects can be undone later).
  4. Agreement: the RMs reach agreement on the effect of the request.
  5. Response: one or more RMs pass a response back to the front end.



The role of group communication

• Managing replica managers with group communication is complex, especially in the case of dynamic groups.
  – A group membership service may be used to manage the addition and removal of replica managers, and to detect and recover from crashes and faults.


• Tasks of a group membership service:
  1. Provide an interface for group membership changes.
  2. Implement a failure detector.
  3. Notify members of group membership changes.
  4. Perform group address expansion for multicast delivery of messages.




[Figure: services provided for process groups: group membership management (join, leave, fail), group address expansion, and multicast communication (group send) to the process group]
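The four tasks of a group membership service can be sketched in a single process. This is a hypothetical illustration: GroupMembershipService and its methods are invented names, there is no networking, and the failure detector itself is omitted; suspect() stands in for the notification a real detector would raise.

```python
class GroupMembershipService:
    """Toy, single-process membership service for a group of replica managers.
    All names here are hypothetical; there is no real networking."""

    def __init__(self):
        self.view = []        # current members, in join order
        self.view_id = 0      # incremented on every membership change
        self.inboxes = {}     # member name -> list of delivered items

    def _install_view(self):
        # Task 3: notify every current member of the new membership view.
        self.view_id += 1
        for member in self.view:
            self.inboxes[member].append(("view", self.view_id, tuple(self.view)))

    def join(self, member):
        # Task 1: interface for membership changes (addition).
        if member not in self.view:
            self.view.append(member)
            self.inboxes.setdefault(member, [])
            self._install_view()

    def leave(self, member):
        # Task 1: interface for membership changes (removal).
        if member in self.view:
            self.view.remove(member)
            self._install_view()

    def suspect(self, member):
        # Task 2 stand-in: a failure detector would call this when it suspects
        # a member has crashed; the member is then excluded from the group.
        self.leave(member)

    def group_send(self, message):
        # Task 4: expand the group address into the current member list.
        for member in self.view:
            self.inboxes[member].append(("msg", message))


gms = GroupMembershipService()
for rm in ("rm1", "rm2", "rm3"):
    gms.join(rm)
gms.suspect("rm2")            # rm2 is suspected of having crashed
gms.group_send("update x=1")  # delivered only to rm1 and rm3
print(gms.view, gms.inboxes["rm1"][-2:])
```

Because the sender addresses the group rather than individual RMs, the excluded member silently stops receiving updates, and the remaining members learn of the change through the new view.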




Fault Tolerant Services

Introduction

• Replicating data and functionality at replica managers can be used to provide a service that is correct despite process failures.
  – A replicated service is correct if it keeps responding despite faults and clients cannot tell the difference between the service provided by replication and one provided by a single correct copy of the data.


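• One correctness criterion for replicated objects is linearizability.
  – The model assumes that every operation is synchronous:
    • Clients must wait for one operation to complete before starting another.
  – A replicated shared object service is linearizable if, for any execution, some interleaving of the clients' operations meets the specification of a single correct copy of the objects and is consistent with the real times at which the operations occurred.
  – A replicated shared object service is sequentially consistent if, for any execution, some interleaving of the clients' operations meets the specification of a single correct copy of the objects and the order of operations in that interleaving is consistent with the program order in which each individual client performed them.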
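For a single shared register, sequential consistency can be checked by brute force: try every interleaving of the clients' operations that preserves each client's program order, and ask whether any of them could have been produced by one correct copy. This is a sketch with hypothetical helper names; it ignores real time, so it checks sequential consistency rather than linearizability, and it is exponential in the history length.

```python
def interleavings(histories):
    """Yield every merge of the per-client histories that preserves each
    client's own program order."""
    if all(not h for h in histories):
        yield []
        return
    for i, h in enumerate(histories):
        if h:
            rest = histories[:i] + [h[1:]] + histories[i + 1:]
            for tail in interleavings(rest):
                yield [h[0]] + tail


def matches_single_copy(ops, initial=0):
    """Replay an interleaving against one correct copy of a register:
    every read must return the most recently written value."""
    value = initial
    for kind, v in ops:
        if kind == "write":
            value = v
        elif v != value:  # a read that a single copy could not have returned
            return False
    return True


def sequentially_consistent(histories, initial=0):
    """True if some program-order-preserving interleaving is legal."""
    return any(matches_single_copy(ops, initial) for ops in interleavings(histories))


client_a = [("write", 1)]
client_b_ok = [("read", 0), ("read", 1)]   # sees the old value, then the new one
client_b_bad = [("read", 1), ("read", 0)]  # sees the new value, then the old one
print(sequentially_consistent([client_a, client_b_ok]))
print(sequentially_consistent([client_a, client_b_bad]))
```

The second history fails because no single copy can return the new value and then revert to the old one for the same client.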



Update process

• Read-only requests have no impact on the replicated object.

• Update processes need to be managed properly to avoid inconsistency.

• A strategy to avoid inconsistency:
  – Make all updates to a primary copy of the data and copy them to the other replicas (passive replication).
  – If the primary fails, one of the backups is promoted to act as primary.

Passive (primary-backup) replication


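• The sequence of events when a client requests an operation:
  1. Request: the front end issues the request, with a unique identifier, to the primary replica manager.
  2. Coordination: the primary takes each request atomically, in the order received, and checks the identifier in case it has already executed the request (if so, it simply resends the stored response).
  3. Execution: the primary executes the request and stores the response.
  4. Agreement: if the request is an update, the primary sends the updated state, the response and the identifier to all the backups, which update their state and send an acknowledgement.
  5. Response: the primary responds to the front end, which passes the response back to the client.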
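The five steps of passive replication can be sketched with in-memory objects. This is an illustrative assumption, not a protocol implementation: there is no network, backups always acknowledge, the class and function names are hypothetical, and failover is triggered by hand. It shows why the primary stores responses keyed by the request identifier: after a crash, a front end that retransmits a request to the new primary gets the stored response instead of a second execution.

```python
class ReplicaManager:
    """A backup RM holding a copy of a simple key-value object."""

    def __init__(self, name):
        self.name = name
        self.state = {}      # the replicated object
        self.responses = {}  # request id -> response already produced

    def apply_update(self, request_id, key, value, response):
        # Agreement (backup side): install the update and remember the response.
        self.state[key] = value
        self.responses[request_id] = response
        return "ack"


class Primary(ReplicaManager):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def handle(self, request_id, key, value):
        # Coordination: a retransmitted request is answered, not re-executed.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution: perform the update and store the response.
        self.state[key] = value
        response = f"ok {key}={value}"
        self.responses[request_id] = response
        # Agreement: propagate to every backup and wait for all acknowledgements.
        acks = [b.apply_update(request_id, key, value, response) for b in self.backups]
        if any(a != "ack" for a in acks):
            raise RuntimeError("a backup failed to acknowledge")
        # Response: hand the result back to the front end.
        return response


def promote(backup, remaining_backups):
    """Failover: a backup becomes primary, keeping its state and stored
    responses so that requests retransmitted by front ends are not re-executed."""
    new_primary = Primary(backup.name, remaining_backups)
    new_primary.state = dict(backup.state)
    new_primary.responses = dict(backup.responses)
    return new_primary


b1, b2 = ReplicaManager("b1"), ReplicaManager("b2")
primary = Primary("p", [b1, b2])
print(primary.handle("fe1-1", "x", 10))
# The primary crashes; b1 takes over and the front end retransmits fe1-1.
primary = promote(b1, [b2])
print(primary.handle("fe1-1", "x", 10))  # answered from stored responses
print(primary.handle("fe1-2", "x", 20))
```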



• It provides fault tolerance at a cost in performance:
  – There is a high overhead in updating the replicas, so it gives lower performance than non-replicated objects.
• To reduce this cost:
  – Allow read-only requests to be made to backup RMs, but send all updates to the primary. (Backups may lag slightly behind the primary, so reads served by them may be stale.)
  – This is of limited value for transaction processing systems, but is very effective for decision support systems (mostly read-only requests).


Active replication


• Active replication steps:
  1. Request: the front end attaches a unique identifier to the request and multicasts it to the RMs using totally ordered, reliable multicast. The front end is assumed to fail only by crashing.
  2. Coordination: every correct RM receives the request in the same total order.
  3. Execution: every RM executes the request.
  4. Agreement: no separate agreement phase is required, because of the multicast delivery semantics.
  5. Response: each RM sends its response to the front end, which manages the responses according to the failure assumptions and the multicast algorithm.
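Active replication treats each RM as a state machine: same initial state plus same order of requests gives the same results everywhere. In the sketch below, total order comes from a hypothetical sequencer that numbers every request (one of several ways to build totally ordered multicast, assumed here to be reliable and never to fail), and each RM holds back requests until it can execute them in sequence order. The front end accepts a result once enough RMs agree; all names are illustrative.

```python
import itertools


class StateMachineRM:
    """An RM that executes requests strictly in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.balance = 0
        self.next_seq = 0
        self.holdback = {}  # requests delivered ahead of their turn wait here
        self.replies = []   # (request id, result) sent back to front ends

    def deliver(self, seq, request):
        self.holdback[seq] = request
        while self.next_seq in self.holdback:
            req_id, op, amount = self.holdback.pop(self.next_seq)
            if op == "deposit":
                self.balance += amount
            elif op == "withdraw" and self.balance >= amount:
                self.balance -= amount
            self.replies.append((req_id, self.balance))
            self.next_seq += 1


class Sequencer:
    """Hypothetical sequencer giving every multicast a global number; assumed
    not to fail, and delivery is assumed reliable."""

    def __init__(self, group):
        self.group = group
        self._numbers = itertools.count()

    def multicast(self, request):
        seq = next(self._numbers)
        for rm in self.group:
            rm.deliver(seq, request)


def front_end_result(rms, req_id, needed=1):
    """Accept a result once `needed` RMs report the same value: 1 is enough
    for crash failures; f + 1 masks up to f arbitrary (Byzantine) failures."""
    votes = {}
    for rm in rms:
        for rid, result in rm.replies:
            if rid == req_id:
                votes[result] = votes.get(result, 0) + 1
                if votes[result] >= needed:
                    return result
    return None


rms = [StateMachineRM(f"rm{i}") for i in range(3)]
group = Sequencer(rms)
group.multicast(("fe-1", "deposit", 100))
group.multicast(("fe-2", "withdraw", 30))
print([rm.balance for rm in rms], front_end_result(rms, "fe-2", needed=2))
```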



• The model assumes totally ordered and reliable multicast.
  – This is equivalent to solving consensus, which requires either a synchronous system or a technique such as failure detectors in an asynchronous system.
  – The model can be simplified if updates are commutative, so that the effect of two operations is the same in either order.
    • E.g., a bank account: daily deposits and withdrawals can be applied in any order unless the balance goes below zero. If a process avoids overdrafts, the effects are commutative.
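The commutativity point can be checked directly by applying one day's transactions in every possible order. The operations and amounts below are made up for illustration; a withdrawal that would overdraw the account is rejected.

```python
from itertools import permutations


def apply_all(balance, ops, allow_overdraft=True):
    """Apply deposits and withdrawals in the given order. With
    allow_overdraft=False a withdrawal that would go below zero is rejected."""
    for op, amount in ops:
        if op == "deposit":
            balance += amount
        elif op == "withdraw" and (allow_overdraft or balance >= amount):
            balance -= amount
        # otherwise the withdrawal is rejected and has no effect
    return balance


day = [("deposit", 50), ("withdraw", 30), ("deposit", 20), ("withdraw", 60)]

# Starting at 100, no order can overdraw, so every order gives the same result.
safe = {apply_all(100, list(p), allow_overdraft=False) for p in permutations(day)}
# Starting at 50, some orders trigger a rejection, so the order now matters.
risky = {apply_all(50, list(p), allow_overdraft=False) for p in permutations(day)}
print(safe, risky)
```

Starting from 100, `safe` contains the single balance 80; starting from 50, `risky` contains several different final balances, so replicas would need a common order after all.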




Case study: Bayou and Coda

Introduction

• Replication techniques can be applied to make services highly available:
  – Clients are given access to the service with reasonable response times.
  – Fault-tolerant systems send updates so that all correct RMs receive them as soon as possible.
    • This may be unacceptable for highly available systems.
    • It may be desirable to increase performance by providing slower (but still acceptable) updates with a minimal set of RMs.
  – Weaker consistency tends to require less agreement and provides more availability.


Bayou

• Bayou is an approach to high availability:
  – Users working in a disconnected fashion can make any updates in any partition at any time, with the updates recorded at any replica manager.
  – The replica managers are required to detect and manage conflicts when two partitions are rejoined and their updates are merged.
  – Domain-specific policies, called operational transformations, are used to resolve conflicts by giving priority to some partitions.


• Bayou holds state values in a database to support queries and updates.

• Updates are a special case of a transaction, using the equivalent of a stored procedure to guarantee the ACID properties.

• Eventually every RM receives the same set of updates and applies them so that their databases are identical.

• However, because propagation is delayed, in an active system with a continuous stream of updates the databases may never actually be identical.



• Bayou update resolution:
  – Updates are marked as tentative when they are first applied to a database.
  – Once coordination with the other RMs makes it possible to resolve conflicts and place the updates in a canonical order, they are committed.
  – Once committed, updates remain applied in their allotted order. Usually, the canonical order is decided by a designated primary RM.
  – Every update includes a dependency check and a merge procedure.
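A minimal sketch of Bayou-style update resolution, assuming writes are plain Python functions. Every write carries a dependency check and a merge procedure; a replica keeps committed writes in canonical order followed by tentative writes, and re-evaluates the tentative ones whenever the canonical order changes. The meeting-room example and all names are hypothetical, and a real Bayou server undoes and redoes tentative writes rather than rebuilding its database from scratch.

```python
class BayouWrite:
    """A Bayou-style write: an update plus its dependency check and merge
    procedure (all plain Python functions in this sketch)."""

    def __init__(self, wid, update, dependency_check, merge_procedure):
        self.wid = wid
        self.update = update
        self.dependency_check = dependency_check
        self.merge_procedure = merge_procedure

    def apply(self, db):
        if self.dependency_check(db):
            self.update(db)            # no conflict: apply as intended
        else:
            self.merge_procedure(db)   # conflict: application-specific fix-up


class BayouReplica:
    def __init__(self):
        self.committed = []   # writes in canonical order
        self.tentative = []   # writes not yet placed in the canonical order

    def receive(self, write):
        self.tentative.append(write)

    def commit(self, wid):
        # In this sketch a primary RM fixes the canonical order by calling commit().
        write = next((w for w in self.tentative if w.wid == wid), None)
        if write is None:
            raise KeyError(wid)
        self.tentative.remove(write)
        self.committed.append(write)

    def database(self):
        # Rebuild for simplicity: committed writes first, then tentative ones,
        # each re-running its dependency check against the state so far.
        db = {}
        for write in self.committed + self.tentative:
            write.apply(db)
        return db


def book_room(wid, who, preferred, alternative):
    """Hypothetical meeting-room booking: try `preferred`, else `alternative`."""
    def is_free(db):
        return preferred not in db

    def book(db):
        db[preferred] = who

    def merge(db):
        if alternative not in db:
            db[alternative] = who  # otherwise the conflict is left unresolved

    return BayouWrite(wid, book, is_free, merge)


replica = BayouReplica()
replica.receive(book_room("w1", "alice", "mon-10", "mon-11"))
replica.receive(book_room("w2", "bob", "mon-10", "tue-10"))
before = replica.database()   # tentative order: w1 then w2
replica.commit("w2")          # the primary commits w2 first
after = replica.database()
print(before)
print(after)
```

Before the commit, alice tentatively holds the Monday 10:00 slot; once the primary places bob's write first in the canonical order, alice's dependency check fails and her merge procedure moves her to Monday 11:00.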



• In Bayou, replication is not transparent to the application.
  – Knowledge of the application semantics is required to increase data availability while maintaining a replication state that can be called eventually sequentially consistent.
• Disadvantages include increased complexity for application programmers and users.
• The operational transformation approach is particularly suited to groupware, where workers access documents remotely.



Coda

• The Coda file system is a descendant of the Andrew File System (AFS).
  – It was designed to address several requirements that AFS does not meet, particularly the requirement to provide high availability despite disconnected operation.
  – It was developed in a research project at Carnegie Mellon University.
  – A growing number of AFS users worked on laptops:
    • There was a need to support disconnected use of replicated data and to increase performance and availability.


• The Coda architecture:
  – Coda has Venus processes at the client computers and Vice processes at the file servers.
  – The Vice processes are replica managers.
  – The set of servers holding replicas of a file volume is its volume storage group (VSG).
  – Clients access a subset known as the available volume storage group (AVSG), which varies as servers are connected or disconnected.
  – Updates are distributed by broadcasting them to the AVSG after a file is closed.
  – If the AVSG is empty (disconnected operation), files are cached until the client is reconnected.



• Coda uses an optimistic replication strategy:
  – Files can be updated when the network is partitioned or during disconnected operation.
• A Coda version vector (CVV) is a vector timestamp, kept with each file at each site, that is used to determine whether there are any conflicts among updates at the time of reconnection.
  – If there is no conflict, the updates are performed.
  – Coda does not attempt to resolve conflicts. If there is a conflict, the file is marked inoperable and the owner of the file is notified.
  – This is done at the AVSG level, so conflicts may recur at the VSG level.
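Conflict detection with CVVs amounts to comparing vector timestamps: if one vector is at least as large as the other in every entry, one copy simply supersedes the other; if each is larger somewhere, the updates were concurrent and Coda reports a conflict. A sketch, assuming CVVs are equal-length lists with one counter per server in the VSG (compare_cvv is a hypothetical name).

```python
def compare_cvv(a, b):
    """Compare two Coda version vectors for the same file, one entry per
    server in the VSG. Returns 'equal', 'a_newer', 'b_newer' or 'conflict'."""
    if len(a) != len(b):
        raise ValueError("CVVs must cover the same volume storage group")
    a_dominates = all(x >= y for x, y in zip(a, b))
    b_dominates = all(y >= x for x, y in zip(a, b))
    if a_dominates and b_dominates:
        return "equal"
    if a_dominates:
        return "a_newer"   # b's copy can simply be brought up to date
    if b_dominates:
        return "b_newer"
    return "conflict"      # concurrent updates made in different partitions


# Three servers; S1 and S2 were partitioned from S3 while both sides updated.
print(compare_cvv([2, 2, 1], [1, 1, 2]))
print(compare_cvv([2, 2, 1], [1, 1, 1]))
```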




Transactions with Replicated Data

Introduction

• To clients, transactions on replicated objects should appear the same as transactions on non-replicated objects.

• Client transactions are interleaved in a serially equivalent manner.

• One-copy serializability:
  – The effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects.


• Three replication schemes for network partitions:
  – Available copies with validation
    • Available copies replication is applied in each partition. When a partition is repaired, a validation procedure is applied and any inconsistencies are dealt with.
  – Quorum consensus
    • A subgroup must have a quorum (enough members) in order to be allowed to continue providing a service in the presence of a partition. When a partition is repaired (and when a replica manager restarts after a failure), replica managers bring their objects up to date by means of recovery procedures.
  – Virtual partition
    • A combination of quorum consensus and available copies. If a virtual partition has a quorum, it can use available copies replication.



Available copies

• Allows some RMs to be unavailable.

• Updates must be made to all available replicas of the data, with provisions to restore and update an RM that has crashed.
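A minimal sketch of the available copies rules, assuming in-memory replicas with an `up` flag: reads go to any one available replica, writes go to every available replica, and a recovering replica is brought up to date before it rejoins. The names are hypothetical, and the sketch ignores the transaction-level validation that a real available copies scheme also needs.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.up = True


def read(replicas):
    """Read from any one available replica."""
    for r in replicas:
        if r.up:
            return r.value
    raise RuntimeError("no replica available")


def write(replicas, value):
    """Write to every replica that is currently available."""
    available = [r for r in replicas if r.up]
    if not available:
        raise RuntimeError("no replica available")
    for r in available:
        r.value = value
    return [r.name for r in available]


def recover(replica, replicas):
    """Bring a crashed replica up to date from an available one before it
    serves requests again (a stand-in for a real recovery procedure)."""
    others = [r for r in replicas if r is not replica]
    replica.value = read(others)
    replica.up = True


rs = [Replica("X"), Replica("Y"), Replica("Z")]
write(rs, 1)
rs[0].up = False            # X crashes
updated = write(rs, 2)      # only Y and Z receive the update
recover(rs[0], rs)
print(updated, [r.value for r in rs])
```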



Available copies with validation

• An optimistic approach that allows updates in different partitions of a network.

• When the partition is repaired, conflicts must be detected and compensating actions must be taken.

• This approach is limited to situations in which such compensation is possible.


Quorum consensus

• A pessimistic approach to replicated transactions.

• A quorum is a subgroup of RMs that is large enough to give it the right to carry out transactions even if some RMs are not available.

• This limits updates to a single subset of the RMs, which update the other RMs after a partition is repaired.

• Gifford's file replication:
  – A quorum scheme in which a number of votes is assigned to each copy of a replicated file.
  – A certain number of votes is required for read operations (R) and for update operations (W). W must exceed half the total votes, and R + W must exceed the total, so that two write quorums, or a read quorum and a write quorum, always share at least one copy.
  – The rest of the RMs are updated as a background task when they are available.
  – Copies of data without enough read votes are considered weak copies and may be read locally, with limits assumed on their currency and quality.
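Gifford's voting rules can be sketched as follows, assuming each copy stores a (version, value) pair and that `reachable` lists the copies in the current partition. The constraints r + w > total votes and 2w > total votes are Gifford's; the function names and the example votes are hypothetical.

```python
def valid_quorums(votes, r, w):
    """Gifford's constraints on read quorum r and write quorum w."""
    total = sum(votes.values())
    # r + w > total: every read quorum overlaps every write quorum.
    # 2w > total: any two write quorums overlap.
    return r + w > total and 2 * w > total


def gather(votes, reachable, needed):
    """Pick reachable copies until they hold `needed` votes (None if impossible)."""
    chosen, collected = [], 0
    for name in reachable:
        chosen.append(name)
        collected += votes[name]
        if collected >= needed:
            return chosen
    return None


def quorum_read(copies, votes, reachable, r):
    quorum = gather(votes, reachable, r)
    if quorum is None:
        raise RuntimeError("no read quorum in this partition")
    # The copy with the highest version number in the quorum is current.
    return max((copies[n] for n in quorum), key=lambda c: c[0])


def quorum_write(copies, votes, reachable, w, value):
    quorum = gather(votes, reachable, w)
    if quorum is None:
        raise RuntimeError("no write quorum in this partition")
    # Write quorums overlap, so this quorum already holds the latest version.
    version = 1 + max(copies[n][0] for n in quorum)
    for n in quorum:
        copies[n] = (version, value)   # copies outside the quorum stay stale
    return version


votes = {"A": 1, "B": 1, "C": 1}
copies = {n: (0, "v0") for n in votes}   # name -> (version, value)
r, w = 2, 2
print(valid_quorums(votes, r, w))
quorum_write(copies, votes, ["A", "B"], w, "v1")   # C misses the update
print(quorum_read(copies, votes, ["B", "C"], r))   # still sees v1 via B
```

A partition containing only C cannot gather two votes, so it can neither read nor write; the stale copy at C is harmless because any read quorum also includes an up-to-date copy.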


Virtual Partition Algorithm

• This approach combines quorum consensus, to handle partitions, with available copies, for faster read operations.

• A virtual partition is an abstraction of a real partition and contains a set of replica managers.
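A sketch of the virtual partition idea, assuming the same (version, value) copies and votes as a quorum scheme. In this sketch a virtual partition is only formed if its members hold both a read quorum and a write quorum, and inside it the available copies rules apply (read any one member, write all members). The class is hypothetical and omits the logical timestamps and the protocol by which replica managers agree to join a virtual partition.

```python
def has_quorum(members, votes, needed):
    return sum(votes[m] for m in members) >= needed


class VirtualPartition:
    """Sketch: quorum consensus decides whether a virtual partition may exist;
    inside it, available copies rules apply (read one, write all members)."""

    def __init__(self, members, votes, r, w, copies):
        if not (has_quorum(members, votes, r) and has_quorum(members, votes, w)):
            raise RuntimeError("virtual partition lacks a read or write quorum")
        self.members = list(members)
        self.copies = copies   # replica name -> (version, value)
        # The members include a read quorum, which overlaps the last write
        # quorum, so the newest version among them is current; copy it to all.
        newest = max((copies[m] for m in self.members), key=lambda c: c[0])
        for m in self.members:
            copies[m] = newest

    def read(self):
        return self.copies[self.members[0]][1]   # any single member will do

    def write(self, value):
        version = self.copies[self.members[0]][0] + 1
        for m in self.members:
            self.copies[m] = (version, value)
        return version


votes = {"A": 1, "B": 1, "C": 1, "D": 1}   # r = 2, w = 3 satisfy Gifford's rules
copies = {"A": (1, "v1"), "B": (0, "v0"), "C": (1, "v1"), "D": (1, "v1")}
vp = VirtualPartition(["A", "B", "C"], votes, r=2, w=3, copies=copies)
first_read = vp.read()   # B was brought up to date when the partition formed
vp.write("v2")           # D is cut off and keeps the older version
print(first_read, copies)
```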




• Issues:
  – If network partitions are intermittent, different virtual partitions can form:
    • Overlapping virtual partitions violate one-copy serializability.
  – Where partitions are uncommon, logical timestamps are used to select a consistent virtual partition: the one with the higher logical timestamp is chosen.



End of the Chapter …
