© Chinese University, CSE Dept. Distributed Systems / 12 - 1 Distributed Systems Topic 12: Recovery and Fault Tolerance Computer Science & Engineering

Distributed Systems

Topic 12: Recovery and Fault Tolerance

Computer Science & Engineering Department

The Chinese University of Hong Kong


Outline

1 Introduction

2 Transaction Recovery

3 Fault Tolerance

4 Hierarchical and Group Masking of Faults

5 CORBA Fault Tolerance Service

6 Summary


1 Introduction

Fault tolerance: the survival attribute of systems.

Fault-tolerant applications:

– transaction based

– process control

Recovery aspects of distributed transactions.

The design of real-time services.

– Fail-stop vs Byzantine failure.

Masking failures in a service.

CORBA fault tolerance service.


1 Basic Approaches

Fault Detection:

– Push Model: Server objects send heartbeat messages to Fault Manager.

– Pull Model: Fault Manager polls (or pings) server objects through their is_alive() interface.
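
The two detection models can be sketched as follows. This is a minimal illustration, not the CORBA mechanism: the class name PushFaultManager, the function pull_check, and the timeout value are assumptions made for the example, and servers are modelled as plain callables.

```python
import time
from typing import Callable, Dict, List, Optional


class PushFaultManager:
    """Push model: servers report in; silence longer than a timeout is suspected as a fault."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, server_id: str, now: Optional[float] = None) -> None:
        # Record the arrival time of a heartbeat message from a server object.
        self.last_seen[server_id] = time.monotonic() if now is None else now

    def suspected(self, now: Optional[float] = None) -> List[str]:
        # A server is suspected when its last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


def pull_check(servers: Dict[str, Callable[[], bool]]) -> List[str]:
    """Pull model: the manager calls each server's is_alive(); an error counts as a fault."""
    faulty = []
    for server_id, is_alive in servers.items():
        try:
            if not is_alive():
                faulty.append(server_id)
        except Exception:
            faulty.append(server_id)
    return sorted(faulty)
```

In the push model the servers carry the cost of reporting, while in the pull model the Fault Manager decides how often to poll.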

Data Recovery:

– Checkpoint and rollback: Save the server object states. Roll back to checkpointed states at recovery.

– Message logging and replay: Log all messages. Replay them at recovery.
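
A small sketch of how the two data recovery techniques can be combined: roll back to the last checkpoint, then replay the messages logged since then. The CounterServer class and its message format are hypothetical, and the checkpoint and log are kept in memory here, whereas a real server would keep both on stable storage.

```python
import copy
from typing import Dict, List, Tuple


class CounterServer:
    """Toy server object whose state is a dict of named counters."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.checkpoint: Dict[str, int] = {}
        self.log: List[Tuple[str, int]] = []  # messages received since the last checkpoint

    def handle(self, name: str, delta: int) -> None:
        self.log.append((name, delta))  # log the message first, then apply it
        self._apply(name, delta)

    def _apply(self, name: str, delta: int) -> None:
        self.state[name] = self.state.get(name, 0) + delta

    def take_checkpoint(self) -> None:
        # Save the current state; earlier messages no longer need to be replayed.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def recover(self) -> None:
        # Roll back to the checkpointed state, then replay the logged messages.
        self.state = copy.deepcopy(self.checkpoint)
        for name, delta in self.log:
            self._apply(name, delta)
```

Replay only gives the right state if message handling is deterministic, which is the same requirement placed on server groups later in this topic.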


2 Transaction Recovery

Recovery concerns data durability (permanent and volatile data) and failure atomicity.

A server keeps data in volatile memory and records committed data in a recovery file.

Recovery manager:

– Save data items in permanent storage

– Restore the server’s data items after a crash

– Reorganize the recovery file for better performance

– Reclaim storage space (in the recovery file)


2 Intentions List

An intentions list of a server is a list of data item names and values altered by a transaction.

The server uses the intentions list when a transaction commits or aborts.

When a server prepares to commit, it must have saved the intentions list in its recovery file.

The recovery files contain sufficient information to ensure the transaction is committed by all the servers.


2 Entries in Recovery File

Type of entry          Description of contents of entry

Data item              A value of a data item

Transaction status     Transaction identifier, transaction status (prepared, committed, aborted) and others

Intentions list        Transaction identifier and a sequence of intentions, each of which consists of <id of data item>, <position in recovery file of value of data item>


2.1 Logging

A log contains the history of all the transactions performed by a server.

The recovery file contains a recent snapshot of the values of all the data items in the server followed by a history of transactions.

When a server is prepared to commit, the recovery manager appends all the data items in its intentions list to the recovery file.

The recovery manager associates a unique identifier with each data item.
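
The logging steps can be sketched as appends to a recovery file. This is a minimal sketch under stated assumptions: the RecoveryFile class, the tuple layout of the entries, and the account values are invented for the example, and a Python list stands in for a file on permanent storage.

```python
from typing import Dict, List


class RecoveryFile:
    """Append-only list standing in for a recovery file on permanent storage."""

    def __init__(self) -> None:
        self.entries: List[tuple] = []

    def append(self, entry: tuple) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1  # position of the new entry in the file


def prepare(log: RecoveryFile, tid: str, intentions: Dict[str, int]) -> None:
    """Write the changed values, then the intentions list, then the prepared status."""
    positions = [(item_id, log.append(("data", value)))
                 for item_id, value in intentions.items()]
    log.append(("intentions", tid, positions))
    log.append(("status", tid, "prepared"))


def commit(log: RecoveryFile, tid: str) -> None:
    """A commit only needs to append the new transaction status."""
    log.append(("status", tid, "committed"))
```

For example, preparing and committing a transaction T1 that sets A to 80 and B to 220 appends two data entries, one intentions list pointing at positions 0 and 1, and two status entries.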


2.1 Log for Banking Service


2.1 Recovery by Logging

Recovery of data items:

– Recovery manager is responsible for restoring the server’s data items.

– The most recent information is at the end of the log.

– A recovery manager gets corresponding intentions list from the recovery file.

Reorganizing the recovery file:

– Checkpointing: the process of writing the current committed values (checkpoint) to a new recovery file.

– Can be done periodically or right after recovery.
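
Recovery and checkpointing can be sketched as a backward scan of the log. The entry layout used here, tuples tagged "checkpoint", "data", "intentions", and "status", is an assumption made for the example and is not prescribed by the slides. The sketch also simplifies checkpointing: a real recovery manager would carry prepared but undecided transactions over into the new file.

```python
from typing import Dict, List


def recover(entries: List[tuple]) -> Dict[str, int]:
    """Rebuild committed values by reading the log from its end (most recent first)."""
    status: Dict[str, str] = {}
    values: Dict[str, int] = {}
    for entry in reversed(entries):
        kind = entry[0]
        if kind == "status":
            status.setdefault(entry[1], entry[2])  # keep the latest status per transaction
        elif kind == "intentions" and status.get(entry[1]) == "committed":
            for item_id, position in entry[2]:
                if item_id not in values:  # a more recent committed value wins
                    values[item_id] = entries[position][1]
        elif kind == "checkpoint":
            for item_id, value in entry[1].items():
                values.setdefault(item_id, value)
            break  # everything older is covered by the checkpoint
    return values


def checkpoint(entries: List[tuple]) -> List[tuple]:
    """Start a new recovery file holding only the current committed values."""
    return [("checkpoint", recover(entries))]
```

Because the newest information sits at the end of the log, reading backwards lets recovery stop at the first checkpoint it meets.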


2.2 Shadow Versions

The shadow versions technique uses a map to locate versions of the server’s data items in a file called a version store.

The versions written by each transaction are shadows of the previous committed versions.

When prepared to commit, any changed data are appended to the version store.

When committing, a new map is made. Once it is complete, the new map replaces the old map.
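
A minimal sketch of shadow versions, assuming an in-memory list as the version store and a dict as the map. The ShadowStore class and its method names are hypothetical. Replacing the map reference in one assignment stands in for the atomic replacement of the map on disk.

```python
from typing import Dict, List


class ShadowStore:
    def __init__(self, initial: Dict[str, int]) -> None:
        self.version_store: List[int] = list(initial.values())
        self.map: Dict[str, int] = {item: i for i, item in enumerate(initial)}
        self.pending: Dict[str, Dict[str, int]] = {}  # tid -> item -> position of its shadow

    def prepare(self, tid: str, changes: Dict[str, int]) -> None:
        # Append shadow versions; the committed versions stay untouched.
        shadows = {}
        for item_id, value in changes.items():
            self.version_store.append(value)
            shadows[item_id] = len(self.version_store) - 1
        self.pending[tid] = shadows

    def commit(self, tid: str) -> None:
        # Build the complete new map first, then replace the old map in one step.
        new_map = dict(self.map)
        new_map.update(self.pending.pop(tid))
        self.map = new_map

    def abort(self, tid: str) -> None:
        # The shadows become garbage; the map never pointed at them.
        self.pending.pop(tid, None)

    def read(self, item_id: str) -> int:
        return self.version_store[self.map[item_id]]
```

Until the map is replaced, readers still see the previous committed versions, so a crash before the replacement leaves the old state intact.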


2.2 Shadow Versions Example


2.2 Log and 2PC


2.3 Recovery of 2PC


3 Fault Tolerance

Two contrasting points on distributed systems:

– The operation of a service depends on the correct operation of other services.

– Joint execution of a set of servers is less likely to fail than any one of the individual components.

Designers of a service should specify its correct behavior and the way it may fail.

Failure semantics: a description of the ways a service may fail. Clients can use it to mask the service's failures.


3 Characteristics of Faults

Class of failure (with subclasses): Description

Omission failure: A server omits to respond to a request.

Response failure: Server responds incorrectly to a request.

– Value failure: Returns wrong value.

– State transition failure: Has wrong effect on resources (for example, sets wrong values in data items).

Timing failure: Response not within a specified time interval.

Crash failure: Repeated omission failure: a server repeatedly fails to respond to requests until it is restarted.

– Amnesia-crash: A server starts in its initial state, having forgotten its state at the time of the crash.

– Pause-crash: A server restarts in the state before the crash.

– Halting-crash: Server never restarts.


3 Fail-Stop vs Byzantine Failures

A fail-stop server is one that fails cleanly. That is, it either functions, or else it crashes.

Byzantine failure behavior is used to describe the worst possible failure semantics of a server: it fails maliciously or arbitrarily.

Byzantine agreement aims to ensure correct behavior within response-time requirements in the presence of faulty hardware.

The number of servers it needs depends on whether messages can be authenticated.


3 Byzantine Generals


3 Byzantine Agreement Algorithms

Byzantine agreement algorithms send more messages and use more active servers.

When messages can be authenticated, 2N+1 servers are required to tolerate N bad servers.

When messages cannot be authenticated, 3N+1 servers are required.

With enough good servers, solutions require O(N^2) messages with constant delay time.
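
As a worked example of the server counts stated above, the following helper (a hypothetical name, written only to make the arithmetic concrete) returns the group size for a given number of bad servers.

```python
def servers_required(n_bad: int, authenticated: bool) -> int:
    """Group size from the figures above: 2N+1 with authentication, 3N+1 without."""
    if n_bad < 0:
        raise ValueError("n_bad must be non-negative")
    return (2 if authenticated else 3) * n_bad + 1
```

Tolerating one bad server therefore takes 3 servers when messages can be authenticated and 4 when they cannot.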

Fortunately, the good news is ...


4 Hierarchical Masking of Faults

We describe two approaches to masking faults: hierarchical failure masking and group failure masking.

In hierarchical failure masking, a higher-level server tries to mask faults at a lower level.

When a lower-level failure cannot be masked, it is converted to a higher-level exception.

Example: In the request-reply (RR) protocol, a server crash that cannot be masked is converted into an exception raised to the client.
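
A sketch of hierarchical masking in a request-reply layer: a lost reply is masked by retransmitting, and repeated failure is converted into a higher-level exception. The names do_operation and ServiceUnavailable, the retry count, and the use of TimeoutError to signal a missing reply are assumptions for illustration.

```python
from typing import Callable


class ServiceUnavailable(Exception):
    """Higher-level exception reported to the client when masking fails."""


def do_operation(send: Callable[[str], str], request: str, retries: int = 3) -> str:
    for _ in range(retries):
        try:
            return send(request)  # success: the lower-level fault, if any, was masked
        except TimeoutError:
            continue  # omission failure: mask it by retransmitting the request
    # Repeated omission looks like a crash and cannot be masked at this level.
    raise ServiceUnavailable(f"no reply to {request!r} after {retries} attempts")
```

The client sees either a normal reply or one well-defined exception, never the individual timeouts.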


4 Group Failure Masking

A service can be made fault tolerant by implementing it as a group of servers.

A group is t-fault tolerant if it can tolerate up to t member failures.

For fail-stop failures, t+1 servers are needed.

For Byzantine failures, 2t+1 servers are needed.

To ensure correctness, the server program must be deterministic, and each operation must be atomic w.r.t. other operations.
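
The difference between the two group sizes can be sketched from the client side. With fail-stop members any reply is correct, so the first one suffices. With Byzantine members a value is accepted only when t+1 members agree. The function names are hypothetical, members are modelled as callables, a crash is signalled by ConnectionError, and replies are assumed to be hashable values.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def fail_stop_call(members: Sequence[Callable], request) -> Hashable:
    """Fail-stop members answer correctly or not at all: the first answer wins."""
    for member in members:
        try:
            return member(request)
        except ConnectionError:
            continue  # this member has crashed; try the next one
    raise RuntimeError("all members failed")


def voting_call(members: Sequence[Callable], request, t: int) -> Hashable:
    """Accept a value only when at least t+1 members of a 2t+1 group agree on it."""
    answers = []
    for member in members:
        try:
            answers.append(member(request))
        except ConnectionError:
            pass  # a silent member simply contributes no vote
    if answers:
        value, votes = Counter(answers).most_common(1)[0]
        if votes >= t + 1:
            return value
    raise RuntimeError("no value reached t+1 matching answers")
```

Voting only works if correct members return identical replies for the same request, which is why the server program must be deterministic.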


4 Group Failure Masking

A group can be closely synchronized or loosely synchronized.

In a closely synchronized group of servers:

– All members execute requests immediately.

– Server programs are both deterministic and atomic.

– Suitable for real-time systems and Byzantine failures.

In a loosely synchronized group of servers:

– One server (primary) performs requests, others (backup) log the requests and take over if needed.

– Requires fewer resources but takes longer to recover.
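
A loosely synchronized group can be sketched as a primary that executes requests while backups only log them. On failover the new primary first replays its log, which is why recovery takes longer. The Replica and LooseGroup classes are hypothetical, and failure detection is reduced to an alive flag.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.state: Dict[str, int] = {}
        self.log: List[Tuple[str, int]] = []  # requests received but not yet executed

    def catch_up(self) -> None:
        # Execute the logged requests in order to reach the primary's last state.
        for key, value in self.log:
            self.state[key] = value
        self.log.clear()


class LooseGroup:
    def __init__(self, members: List[Replica]) -> None:
        self.members = members
        self.primary = members[0]

    def request(self, key: str, value: int) -> None:
        if not self.primary.alive:
            self.fail_over()
        self.primary.state[key] = value  # only the primary performs the request
        for member in self.members:
            if member is not self.primary and member.alive:
                member.log.append((key, value))  # backups just log it

    def fail_over(self) -> None:
        live = [m for m in self.members if m.alive]
        if not live:
            raise RuntimeError("no live member left")
        self.primary = live[0]
        self.primary.catch_up()
```

Backups do no execution work during normal operation, at the price of replaying the log before they can serve requests.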


5 CORBA Fault Tolerance Service

[Figure: CORBA architecture showing Application Objects, CORBAfacilities, CORBAservices, the Object Request Broker, and Fault Tolerance]


5 Outline of Fault Tolerant CORBA

Fault Tolerance Properties

Replication Styles, Membership Styles, Consistency Styles, Fault Monitoring Styles

Infrastructure-Controlled and Application-Controlled

Object Group References and Alternative Destinations

At-Most-Once Invocation (repeated requests detected)

Fault Detection and Notification

Checkpointing and Logging
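
Detection of repeated requests can be illustrated with a reply cache keyed by a request identifier. This is only a sketch of the general idea: the AtMostOnceServer class and the (client_id, request_id) key are hypothetical and are not taken from the CORBA specification.

```python
from typing import Any, Callable, Dict, Tuple


class AtMostOnceServer:
    def __init__(self, operation: Callable[[Any], Any]) -> None:
        self.operation = operation
        self.replies: Dict[Tuple[str, int], Any] = {}  # cached reply per request identifier

    def invoke(self, client_id: str, request_id: int, arg: Any) -> Any:
        key = (client_id, request_id)
        if key not in self.replies:
            # First time this request is seen: execute it and remember the reply.
            self.replies[key] = self.operation(arg)
        # A repeated request returns the cached reply without executing again.
        return self.replies[key]
```

A client that retransmits after a failover can then reuse the same identifier without causing the operation to run twice.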


5 Architectural Overview


5 Fault Detectors, FaultNotifier, Fault Analyzer, and ReplicationManager


5 Property Management

interface PropertyManager {
    void set_default_properties(in Properties props)
        raises (InvalidProperty, UnsupportedProperty);
    Properties get_default_properties();
    void remove_default_properties(in Properties props)
        raises (InvalidProperty, UnsupportedProperty);
    void set_type_properties(in TypeId type_id, in Properties overrides)
        raises (InvalidProperty, UnsupportedProperty);
    Properties get_type_properties(in TypeId type_id);
    void remove_type_properties(in TypeId type_id, in Properties props)
        raises (InvalidProperty, UnsupportedProperty);
    void set_properties_dynamically(in ObjectGroup object_group, in Properties overrides)
        raises (ObjectGroupNotFound, InvalidProperty, UnsupportedProperty);
    Properties get_properties(in ObjectGroup object_group)
        raises (ObjectGroupNotFound);
};


5 ObjectGroupManager & GenericFactory

// Specification of ObjectGroupManager Interface
// which ReplicationManager Inherits
interface ObjectGroupManager {
    ObjectGroup create_member(in ObjectGroup object_group, in Location the_location, in TypeId type_id …)
    ObjectGroup add_member(in ObjectGroup object_group, in Location the_location, in Object member);
    ObjectGroup remove_member(in ObjectGroup object_group, in Location the_location);
    ObjectGroup set_primary_member(in ObjectGroup object_group, in Location the_location);
    Locations locations_of_members(in ObjectGroup object_group);
    ObjectGroup get_object_group_ref(in ObjectGroup object_group);
    ObjectGroupId get_object_group_id(in ObjectGroup object_group);
};

// Specification of GenericFactory Interface
// which ReplicationManager Inherits and Application Objects Implement
interface GenericFactory {
    typedef unsigned long long FactoryCreationId;
    Object create_object(in TypeId type_id, in Criteria the_criteria, out FactoryCreationId factory_creation_id);
    void delete_object(in FactoryCreationId factory_creation_id);
};


5 Replication Management

// Specification of ReplicationManager Interface
interface ReplicationManager :
    PropertyManager, ObjectGroupManager, GenericFactory
{
    void register_fault_notifier(in FaultNotifier fault_notifier);
    FaultNotifier get_fault_notifier()
        raises (InterfaceNotFound);
};


5 Logging and Recovery

// Specification of Checkpointable Interface
// which Updateable and Application Objects Inherit
interface Checkpointable {
    State get_state()
        raises (NoStateAvailable);
    void set_state(in State s)
        raises (InvalidState);
};

// Specification of Updateable Interface
// which Application Objects Inherit
interface Updateable : Checkpointable {
    State get_update()
        raises (NoUpdateAvailable);
    void set_update(in State s)
        raises (InvalidUpdate);
};
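
To show how an application object might support checkpointing, here is a plain Python analogue of get_state and set_state. It is not a CORBA language binding. The Account class is hypothetical, and the sketch assumes that State is an opaque byte sequence, encoded here as JSON.

```python
import json


class InvalidState(Exception):
    """Raised when a supplied state cannot be applied (mirrors the IDL exception name)."""


class Account:
    def __init__(self, balance: int = 0) -> None:
        self.balance = balance

    def get_state(self) -> bytes:
        # Serialize everything needed to rebuild this object elsewhere.
        return json.dumps({"balance": self.balance}).encode("utf-8")

    def set_state(self, s: bytes) -> None:
        # Restore from a checkpoint; reject anything that does not parse.
        try:
            data = json.loads(s)
            self.balance = int(data["balance"])
        except (ValueError, KeyError, TypeError) as exc:
            raise InvalidState("cannot restore from supplied state") from exc
```

A backup or a newly created member can be brought up to date by passing it the bytes returned by get_state on a live member.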


5 Fault Detection and Notification

// Specification of PullMonitorable Interface
// which Application Objects Inherit
interface PullMonitorable {
    boolean is_alive();
};

// Specification of FaultNotifier Interface
interface FaultNotifier {
    typedef string ConsumerId;
    void push_structured_fault(in CosNotification::StructuredEvent event);
    CosNotifyFilter::Filter create_subscription_filter(in string constraint_grammar)
        raises (CosNotifyFilter::InvalidGrammar);
    ConsumerId connect_structured_fault_consumer(
        in CosNotifyComm::StructuredPushConsumer push_consumer,
        in CosNotifyFilter::Filter filter);
    void disconnect_consumer(in ConsumerId connection)
        raises (CosEventComm::Disconnected);
    …
};


6 Summary

Transaction recovery

– long-life application and data integrity

– atomic commit protocol is the key

– checkpoints and logging in a recovery file

Fault tolerance

– real-time application

– importance of fault semantics

– primary-backup server for fail-stop failures

– closely synchronized group for Byzantine failures

Emerging CORBA Fault Tolerance Service
