

The Journal of Systems and Software 72 (2004) 401–415

www.elsevier.com/locate/jss

A reactive system architecture for building fault-tolerant distributed applications

Changgui Chen a,*, Weijia Jia a,*, Wanlei Zhou b,*

a Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong
b School of Information Technology, Deakin University, Burwood, Vic. 3125, Australia

Received 22 November 2002; accepted 28 March 2003

Available online 5 December 2003

Abstract

Most fault-tolerant application programs cannot cope with constant changes in their environments and user requirements because they embed policies and mechanisms together, so that if the policies or mechanisms are changed the whole programs have to be changed as well. This paper presents a reactive system approach to overcoming this limitation. The reactive system concepts are an attractive paradigm for system design, development and maintenance because they separate policies from mechanisms. In the paper we propose a generic reactive system architecture and use group communication primitives to model it. We then implement it as a generic package which can be applied in any distributed application. The system performance shows that it can be used in a distributed environment effectively.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Fault-tolerant computing; Reactive systems; Group communication; Distributed applications; Software development

1. Introduction

Computer systems may fail due to hardware and software faults. In many cases, such failures may have disastrous results; for instance, failures in an airline traffic control system may result in plane crashes (Cristian, 1991; Cristian et al., 1996). With the ever-growing dependency being placed on distributed systems, the requirement for reliability of these systems is increased. It is therefore essential to build distributed systems that can tolerate component failures.

However, the development of distributed and fault-tolerant computing systems is a very difficult task. One of the reasons is that, in normal practice, most fault-tolerant computing policies and mechanisms are deeply embedded into application programs; if one of the policies or mechanisms is to be changed, the whole program has to be changed as well. These applications therefore cannot cope with changes in environments, policies and mechanisms (Zhou, 1999). To build better fault-tolerant distributed applications that can adapt to constant changes in environments and user requirements, it is necessary to separate fault-tolerant computing policies and mechanisms from application programs.

The work was supported by HK RGC, China grant nos. CityU 1055/01E and CityU 1076/00E; CityU Strategic grant nos. 7001355 and 7001446. Corresponding authors. E-mail addresses: [email protected], [email protected] (C. Chen), [email protected] (W. Jia), [email protected] (W. Zhou). 0164-1212/03/$ - see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0164-1212(03)00098-0

In this paper we propose a novel approach––the reactive system approach––to achieving this goal. The reactive system concepts are an attractive paradigm for system design, development and maintenance because they can separate policies from mechanisms (Chen and Zhou, 2000). Reactive systems were defined by Harel and Pnueli as systems that maintain ongoing interactions with their environments, rather than producing some final result on termination (Caspi et al., 1994; Harel and Pnueli, 1985). Such systems are often concurrent and distributed. Much research has been done on the development of reactive systems since the 1990s (see, e.g., Boasson, 1998; Bounabat et al., 1999; Boussinot, 1991; Harel and Shtul-Trauring, 1990; Harel and Politi, 1998; Quintero, 1996; Systa, 1996), but most of it concentrates on process control (such as controlling a robot) and also lacks modularity. These approaches mainly stress system controllers' behaviors and place no emphasis on the mechanisms for obtaining information or responding to outside events. If the control algorithms for a system are changed, the whole system has to be changed. Thus, these methods do not provide a flexible system architecture to tolerate component failures.

[Fig. 1. The generic reactive system architecture: DMMs at the policy level, sensors and actuators at the mechanism level, and application objects at the application level.]

In our research, we try to apply the reactive system concepts in developing fault-tolerant distributed applications. We propose a generic reactive system architecture model, and use the Agent (Bounabat et al., 1999; Bussmann and Demazeau, 1994) and Actor (Agha, 1986; Quintero, 1996) concepts to build the reactive modules. The model is constructed with group communication mechanisms and implemented as a generic package in the Java language. We also evaluate the system performance to demonstrate its effectiveness and potential benefits when it is used in a distributed environment.

The rest of the paper is organized as follows: in the next section, we introduce the reactive system model. The group communication services used in the system are addressed in Section 3. Section 4 discusses several implementation issues. In Section 5 we evaluate the system performance. An application example is given in Section 6. Finally we summarize our work in Section 7.

2. The reactive system model

2.1. The generic reactive system architecture

A reactive system is composed of a number of components, such as controllers, sensors, actuators, and physical objects in its environment, which may run in parallel (Boasson, 1996). Sensors are used to acquire environment information, which is sent to the controllers. The controllers make decisions based on the predefined policies and the information received from sensors. To maintain the interaction with its environment, the system uses actuators to react to its environment by changing the states of application objects according to the decisions received from the controllers. Therefore, we can say that a reactive system uses sensors and actuators to implement the mechanisms that interact with its environment or applications; its controllers, which we call decision-making managers (DMMs), are used to implement the policies regarding the control of the applications (Boasson, 1993). Hence we obtain a generic reactive system architecture, as depicted in Fig. 1, where DMMs represent decision-making managers; sensors and actuators connect the DMMs with application objects by receiving inputs or sending outputs of the system.

In this model, sensors can be attached to applications to obtain their states (or equivalently, to monitor events about the applications). These events are sent to DMMs, which make decisions and react to them by using actuators to change the states of the applications. This model represents a generic reactive system and consists of three levels: policies, mechanisms and applications:

• Policies––The policy level deals with the system policies regarding the control of application objects. For example, in fault-tolerant computing, it may determine what strategies are used in detecting component failures, what information is to be collected from the application programs, and what techniques are used in masking and/or tolerating component failures. These policies are implemented through DMMs.

• Mechanisms––The mechanism level deals with all mechanisms for implementing policies such as fault-tolerant computing strategies. For example, it deals with mechanisms used in detecting and reporting component failures, and mechanisms used in masking and recovering from component failures. These mechanisms are implemented through sensors and actuators.

• Applications––The application level deals with issues about fault-tolerant computing application objects, such as database servers, replicas, network routers, etc.

The major advantage of this model is the separation of policies and mechanisms, i.e., if a policy is changed it may have no impact on the related mechanisms, and vice versa. For example, if a decision-making condition based on two sensors was "AND" and is now changed to "OR", the sensors can still be used without any changes required, i.e., the mechanism level can remain unchanged. This advantage leads to a better software architecture and is of great significance in developing fault-tolerant and distributed computing applications, since it separates fault-tolerant computing policies and mechanisms from applications (Edwards et al., 1997; Zhou, 1999).
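To make the AND-to-OR example concrete, here is a minimal Java sketch (Java being the paper's implementation language) in which the decision condition lives only in the DMM, so swapping the policy never touches the sensors. The class and method names are illustrative assumptions, not the actual API of the package described later.

```java
import java.util.function.BiPredicate;

// Mechanism level: a sensor only reports a state; it knows nothing about
// how its reports are combined at the policy level.
class BooleanSensor {
    private final boolean state;
    BooleanSensor(boolean state) { this.state = state; }
    boolean report() { return state; }
}

// Policy level: the DMM holds the decision condition as replaceable data.
class Dmm {
    private BiPredicate<Boolean, Boolean> condition;
    Dmm(BiPredicate<Boolean, Boolean> condition) { this.condition = condition; }
    void setCondition(BiPredicate<Boolean, Boolean> c) { this.condition = c; }
    boolean decide(BooleanSensor s1, BooleanSensor s2) {
        return condition.test(s1.report(), s2.report());
    }
}

public class PolicySwapDemo {
    public static void main(String[] args) {
        BooleanSensor s1 = new BooleanSensor(true);
        BooleanSensor s2 = new BooleanSensor(false);
        Dmm dmm = new Dmm((a, b) -> a && b);    // policy: AND
        System.out.println(dmm.decide(s1, s2)); // false
        dmm.setCondition((a, b) -> a || b);     // policy changed to OR ...
        System.out.println(dmm.decide(s1, s2)); // true: sensors untouched
    }
}
```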


[Fig. 2. A DMM agent: the decisional function dec maps external states and signalings, under the policies P, to decisions.]

2.2. Reactive modules

The reactive system model mainly consists of DMMs, sensors, actuators and application objects. Since these components are all active (they form the nature of the reactive system), we introduce the Actor concept, originally proposed by Hewitt (1977) and later developed by Agha (1986), to model them. In general, an actor represents an active object that has a clearly defined purpose, which is an abstraction, or distillation, of its various functional capabilities. This is both implied and enforced by the encapsulation shell of the actor. This shell suggests that the contained functionality is to be viewed as a conceptual unit. An actor may have its own execution thread and can, therefore, operate concurrently with other active objects in its domain. The actor is the basic concept in real-time object-oriented modelling (ROOM) (Selic et al., 1994) and it can be used to model reactive components.

However, a DMM in a reactive system is more like an agent. It has knowledge-based abilities to make decisions and change the states of the system, thus DMMs can be considered as decisional agents. Therefore, in this paper we model a reactive system as a distributed computing system consisting of DMM agents and sensor/actuator actors.

2.2.1. DMM agents

The purpose of a DMM agent is to make decisions according to the predefined policies and the information collected from sensors, and then send them to actuators. The model of DMM agents is built on the decisional model (Bussmann and Demazeau, 1994), allowing the representation of objects according to their behavioral aspects and their degree of intelligence.

Definition 1. A DMM agent is a 5-tuple ⟨E, S, D, P, dec⟩, where

• E: set of external states received from sensors. Each one represents at any given time an object state from the environment.

• S: set of signalings received by the agent. Each signaling reflects at any given time the state of the controlled tools or objects used to achieve a specific goal.

• D: set of decisions generated by the agent. Each decision is a solution concerning process behavior in the future.

• P: the agent's control policies. Each decision is made according to the predefined policies.

The sets above indicate the received events (E, S), the emitted (output) events (D), and the internal events (P). The decisional function dec is described as follows.

Decisional function. dec is a decisional function that defines the behavior of a DMM agent:

dec: E × S × P → D, (e, s, p) → d, with dec(e, s, p) = d ⇒ [e ∧ s ∧ p ↔ d]

where ⇒ stands for "leads to" and ↔ means "simultaneous" (the same applies in the following). This means that, depending on a predefined policy p, as soon as an external object state e and a signaling s are received, the corresponding decision d is instantaneously produced by the function dec. The structure of the DMM agent is shown in Fig. 2.
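A minimal Java sketch of Definition 1, with external states, signalings and decisions reduced to strings and the policy P to a lookup table; the names and the string encoding are illustrative assumptions, not the paper's implementation.

```java
import java.util.Map;

// dec : E x S x P -> D, here with E, S, D as strings and P as a map from
// (state, signaling) pairs to decisions.
public class DmmAgent {
    private final Map<String, String> policy; // P: "e/s" -> d

    public DmmAgent(Map<String, String> policy) { this.policy = policy; }

    // dec(e, s, p) = d: the decision is produced as soon as an external
    // state e and a signaling s are received, under the predefined policy.
    public String dec(String e, String s) {
        return policy.getOrDefault(e + "/" + s, "no-op");
    }

    public static void main(String[] args) {
        DmmAgent dmm = new DmmAgent(Map.of("server-down/idle", "activate-replica"));
        System.out.println(dmm.dec("server-down", "idle")); // activate-replica
        System.out.println(dmm.dec("server-up", "idle"));   // no-op
    }
}
```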

2.2.2. Sensor actors

Sensors/actuators are modelled as actors. The purpose of a sensor is to capture the state information of application objects and then report to its subscribers. For instance, it may either report to its subscribers immediately once an event occurs (an event sensor), or periodically check the state of an application and then report to the subscribers (a polling sensor). A generic sensor actor has a main function, sen:

sen: A → E, a → e, with sen(a) = e ⇒ [a ↔ e]

where

• A: set of actions exerted on the system and captured by the sensor. Each action is undergone by an outside object.

• E: set of external states delivered by the sensor. Each one represents at any given time an object state from the environment.

This means that, depending on an outside action a, the corresponding state e (of external objects) is instantaneously produced by the function sen.
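The two sensor styles can be sketched as follows; the callback-based design and all names are illustrative assumptions, not the package's actual classes.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// An event sensor pushes sen(a) = e to its subscriber as soon as an
// action a occurs.
class EventSensor {
    private final Consumer<String> subscriber;
    EventSensor(Consumer<String> subscriber) { this.subscriber = subscriber; }
    void onAction(String action) { subscriber.accept("state-of:" + action); }
}

// A polling sensor samples the application state on a fixed period and
// reports each sample to its subscriber.
class PollingSensor extends Thread {
    private final Supplier<String> applicationState;
    private final Consumer<String> subscriber;
    private final long periodMillis;
    PollingSensor(Supplier<String> app, Consumer<String> sub, long periodMillis) {
        this.applicationState = app; this.subscriber = sub;
        this.periodMillis = periodMillis;
        setDaemon(true);
    }
    @Override public void run() {
        while (true) {
            subscriber.accept(applicationState.get());
            try { Thread.sleep(periodMillis); } catch (InterruptedException e) { return; }
        }
    }
}

public class SensorDemo {
    public static void main(String[] args) throws Exception {
        EventSensor es = new EventSensor(e -> System.out.println("event: " + e));
        es.onAction("button-pressed");
        PollingSensor ps = new PollingSensor(() -> "app-ok",
                e -> System.out.println("poll: " + e), 100);
        ps.start();
        Thread.sleep(250); // let the polling sensor report a few times
    }
}
```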

2.2.3. Actuator actors

The purpose of an actuator is to perform actions according to the decisions received from DMMs. For instance, it may perform the task of setting a value, activating a buddy server, or switching a light. A generic actuator actor can be described by a function, act:

act: D → A, d → a, with act(d) = a ⇒ [d ↔ a]

where

• D: set of decisions received from DMM agents. Each decision is a solution concerning process behavior in the future.

• A: set of actions emitted by the actuator. Each action will be carried out to change the states of the controlled tools or objects.

This means that, depending on a decision d made by a DMM, the corresponding action a is instantaneously produced by the function act.
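Taken together, the three module functions compose into a single reaction path from an outside action to a performed action; the sketch below fixes the signaling and policy for brevity, and all names are illustrative.

```java
import java.util.function.Function;

// An outside action a yields sen(a) = e, the DMM turns it into dec(e) = d
// (signaling and policy fixed here), and the actuator applies act(d) = a'.
public class ReactionPath {
    public static void main(String[] args) {
        Function<String, String> sen = action -> "state:" + action;
        Function<String, String> dec = state ->
                state.equals("state:crash") ? "activate-replica" : "no-op";
        Function<String, String> act = decision -> "performed:" + decision;

        Function<String, String> react = sen.andThen(dec).andThen(act);
        System.out.println(react.apply("crash"));     // performed:activate-replica
        System.out.println(react.apply("heartbeat")); // performed:no-op
    }
}
```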

2.2.4. Communication protocols

Each agent or actor provides one or more openings or interface components, which we call ports, to communicate with other entities in its environment. The message exchange between these ports complies with access protocols. We give the protocols for the DMM's ports in the following; the sensors' and actuators' protocols are similar.

A DMM agent has a set of ports to communicate with other objects. The access protocols on the input and output ports are defined as follows:

protocol class Input
  in:  {{signal, Message}, {error, ErrorCode}}
  out: {{enable, Command}, {disable, Command}}

protocol class Output
  in:  {done, Message}
  out: {{decision, Policy}, {exstate, Message}, {error, ErrorCode}}

where "Input" and "Output" are protocol class names, and each pair {signal, data-type} specifies an individual message: the first element represents the content of the appropriate data object, and the second the name of the appropriate data type.
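The Input protocol class above can be rendered as plain Java message types (this sketch needs Java 17 for sealed interfaces and records); the type names are illustrative assumptions, not the package's actual classes.

```java
// The {signal, data-type} pairs of the Input protocol's "in" direction,
// rendered as one message type per pair.
sealed interface InputMsg permits Signal, ErrorMsg {}
record Signal(String message) implements InputMsg {}
record ErrorMsg(int errorCode) implements InputMsg {}

public class PortDemo {
    // A port accepts only the message types its protocol class declares.
    static String handle(InputMsg msg) {
        if (msg instanceof Signal s) return "signal:" + s.message();
        if (msg instanceof ErrorMsg e) return "error:" + e.errorCode();
        return "unknown";
    }
    public static void main(String[] args) {
        System.out.println(handle(new Signal("report"))); // signal:report
        System.out.println(handle(new ErrorMsg(404)));    // error:404
    }
}
```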

3. Group communication services

In the reactive system model proposed above, a DMM may subscribe to multiple sensors and actuators, and a sensor/actuator may report to (or receive from) multiple DMMs as well (but is attached to one application only). The communications between these reactive modules are very complicated. One of the most important issues in these communications is the ordering of message delivery. Different arrival orders of messages from sensors to DMMs (or from DMMs to actuators) may cause the system to generate different results; for instance, in a traffic light control system, different orders of vehicle arrival at an intersection have different effects on traffic light switching. Component failure is another important issue that has to be addressed to make fault tolerance possible in the reactive system. The correctness of the reactive system model is, therefore, subject to different orderings of message delivery and to component failures.

To solve these problems we can use group communication services such as message ordering and the multicast service, which provide a useful infrastructure for fault tolerance in distributed applications (Birman and van Renesse, 1994). A reactive system is a proper candidate for using group communication primitives, since reports originating from sensors need to be multicast to a group of DMMs, and so do decisions from DMMs to a group of actuators.

3.1. Ordering constraints

Without a mechanism to ensure ordered delivery of messages, when two originators multicast to a group at about the same time, their messages may not arrive in the same relative order at the members of the group. For instance, this may happen if one of the messages is dropped by one of the recipients and has to be retransmitted. This is the result of different network latencies on the communication links between the members and the different speeds of the machines on which the group members are running.

In general, ordering constraints can be categorised into four types: FIFO, causal, total and total + causal (Birman, 1993), reflecting different semantic requirements of message delivery. Among these ordering constraints, FIFO is the weakest, and total ordering is the strictest and also the most expensive. In the reactive system model, FIFO ordering means that reports sent by the same sensor are processed by DMMs in the order they are sent, and likewise for decisions sent by the same DMM to actuators. Causal ordering means that, if two messages have a cause-effect (happened-before) relation, this relation should be preserved at all members. The cause-effect relation is understood in the reactive system as the relation between decisions and the related reports on which those decisions were based. Since decisions (sent to actuators) and reports (sent to DMMs) never reach the same component, we do not consider this ordering in the reactive system model. Total ordering in the model means that all messages from different sensors are delivered to every DMM in the same order, and all decisions from different DMMs are delivered to every actuator in the same order.

To ensure the correct semantics of the reactive system, a sensible arrival order of message delivery has to be assigned and enforced over the reactive component groups; in other words, message delivery among DMMs, sensors and actuators should have ordering constraints. At least the FIFO ordering is the default constraint among them. This guarantees that all DMMs get the correct information, and applications get the correct responses to their inputs. Since the order of messages from different sensors may be a fatal factor affecting the decisions DMMs make, and the order of decisions is crucial to actuators as well, it is necessary to apply total ordering in the reactive system model.

A reactive system mainly has two types of communication group for multicasting: the sensor-to-DMM group, in which sensors multicast reports to groups of DMMs, and the DMM-to-actuator group, in which DMMs multicast decisions to groups of actuators. We apply the total ordering constraint to both types of group, i.e., all messages from different sensors are delivered to every DMM in the same order, and all decisions from different DMMs are delivered to every actuator in the same order. This constraint is achieved by implementing the corresponding ordering protocol addressed in Section 4.
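One common way to realize total ordering, shown here only as an illustration (the paper's own protocol is the subject of Section 4), is a sequencer: all senders route their multicasts through one point, which stamps a global sequence number so every member delivers in that single order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// A real protocol must also tolerate sequencer failure; this sketch
// only demonstrates the ordering idea.
class Sequencer {
    private final AtomicLong next = new AtomicLong();
    private final List<List<String>> inboxes = new ArrayList<>();
    List<String> join() {
        List<String> inbox = new ArrayList<>();
        inboxes.add(inbox);
        return inbox;
    }
    synchronized void multicast(String msg) {
        long seq = next.getAndIncrement();           // single stamping point
        for (List<String> inbox : inboxes) inbox.add(seq + ":" + msg);
    }
}

public class TotalOrderDemo {
    public static void main(String[] args) {
        Sequencer group = new Sequencer();
        List<String> dmm1 = group.join(), dmm2 = group.join();
        group.multicast("sensorA:report");   // order is fixed at the sequencer,
        group.multicast("sensorB:report");   // so every DMM sees the same one
        System.out.println(dmm1.equals(dmm2)); // true
    }
}
```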

3.2. Atomic multicast service

A component failure may lead to incomplete communication in the system. For example, if a sensor fails during the transmission of its report, some DMMs may receive the report whereas some may not. This may lead to a conflict if the DMMs that received the report are making decisions while the other DMMs are not. A similar situation arises when a DMM fails while sending its decisions to actuators. Therefore, complete communication must be guaranteed in the case of component failures. We define the completeness requirement as follows: if a DMM receives a report from a sensor, all other DMMs that subscribe to this sensor should receive it as well; similarly, all actuators that are appointed by a DMM should receive a decision from the DMM if any one of them receives it.

This property guarantees that all DMMs receive the same set of reports from a sensor, so that they all have this information to make decisions, and that all actuators receive the same decisions from their DMMs, even in the case of component failures. The property is achieved through the atomic multicast service, which ensures that a multicast primitive either sends messages to all receivers or to none of them, a property known as multicast atomicity or atomic multicast (Birman et al., 1991).

An atomic multicast protocol has to guarantee all-or-none semantics. It is a protocol that tolerates the sender's failure in the middle of multicasting. The protocol should guarantee two properties: (1) in the absence of a sender's failure, a multicast is received by all operational members; (2) in the presence of a sender's failure, a multicast is either received by all operational members or by none of them. Note that we use the term operational members instead of members to ensure that the multicast proceeds despite any receiver's failure during the transmission. We implement the multicast atomicity protocol in the reactive system model to guarantee complete communication in Section 4.
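A textbook way to obtain the all-or-none property, sketched here as an illustration rather than as the protocol of Section 4, is for every receiver to relay a newly seen message to the rest of the group before delivering it; a sender crash mid-multicast is then completed by the first receiver.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Each member relays a message it has not seen before, so a multicast that
// reaches any operational member eventually reaches them all, even if the
// original sender crashed in the middle of sending.
class Member {
    final String name;
    final Set<String> delivered = new LinkedHashSet<>();
    Set<Member> group;
    Member(String name) { this.name = name; }
    void receive(String msg) {
        if (!delivered.add(msg)) return;             // duplicate: stop relaying
        for (Member m : group) if (m != this) m.receive(msg);
    }
}

public class AtomicMulticastDemo {
    public static void main(String[] args) {
        Member a = new Member("a"), b = new Member("b"), c = new Member("c");
        Set<Member> group = Set.of(a, b, c);
        a.group = group; b.group = group; c.group = group;
        a.receive("report-1");   // a sensor crashed after reaching only a ...
        System.out.println(b.delivered.contains("report-1")); // true
        System.out.println(c.delivered.contains("report-1")); // true
    }
}
```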

3.3. Fault tolerance in the reactive system

The objective of our reactive system model is to develop fault-tolerant computing applications. We therefore have to make the reactive system model itself fault-tolerant, i.e., the system must continue to work and maintain correctness even in the presence of component failures. We give the following resolutions in the case of DMM, sensor or actuator failures:

• A DMM crash. A DMM is most likely attached to a server replica or runs independently on a host. Since most reactive systems are distributed systems and replication technology is widely used in such systems, we can assume that there are at least two DMMs (including a replica) running in a reactive system. If a DMM crashes, its function is taken over by another DMM attached to a replica until it recovers. In the actuator group it is assigned to, a certain protocol has to be invoked to ensure all the actuators receive the complete decisions.

• A sensor/actuator crash. A sensor/actuator either runs independently of an application object or is directly embedded into the application. If it is embedded into the application, the sensor/actuator fails only when the application crashes; in this case, the system has a predefined policy for dealing with application object failures. If the sensor/actuator is a stand-alone entity, we have the following policy to deal with its failure. We choose a DMM as a coordinator in the system and attach to each application object an embedded sensor/actuator which has the same function as the stand-alone sensor/actuator. These embedded sensors/actuators are subscribed to by the coordinator only, i.e., they report to the coordinator DMM only (the coordinator otherwise functions normally). Once a stand-alone sensor/actuator fails, its subscribers notify the coordinator, which then activates the embedded sensor/actuator attached to the application. The DMMs subscribing to the failed sensor/actuator then communicate with the embedded sensor/actuator via the coordinator until the failed sensor/actuator recovers. Of course, a certain protocol has to be invoked to ensure the subscribers receive the complete information from the failed sensor.

By using these strategies the system can continue to work even in the case of a DMM, sensor or actuator failure. To maintain system consistency, we have to ensure complete communication between the failed component and the rest of the components in all cases.


3.4. Membership management

Both reliable and ordered multicast services are implemented through group membership management. Membership is the information about a group that shows who is currently in the group, in other words, who is apparently operational. It can be changed by members leaving or joining, and especially by crash leaving. When the membership changes, the system may suffer from incorrect scenarios. For example, a sensor crash failure leads to a membership change and causes its subscribers to receive incomplete reports and the system to generate errors. Therefore, it is necessary to develop relevant protocols to maintain system consistency when the membership changes. Group membership management is a mechanism for making the system consistent and fault-tolerant.

Group construction is very cumbersome in a reactive system. As mentioned above, there are two types of group in the system: the sensor-to-DMM group and the DMM-to-actuator group. In the case that different sensors are subscribed to by different DMMs, or different DMMs appoint different actuators, we have to build a number of groups based on each sensor and its subscribers, or each DMM and its actuators. Sensors subscribed to by the same DMMs form a sensor-to-DMM group, and DMMs appointing the same actuators form a DMM-to-actuator group. In each of these groups, the group members are not equally weighted, i.e., only some group members are senders that multicast messages to the rest of the members, which are only receivers. It is, therefore, essential to differentiate the group members when membership changes. The membership management protocol is also discussed in Section 4.
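The group construction rule, sensors with the same subscriber set falling into one sensor-to-DMM group, can be sketched by keying groups on subscriber sets; all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Groups sensors by their set of subscribing DMMs: two sensors with the
// same subscribers land in the same sensor-to-DMM group. The symmetric
// DMM-to-actuator construction would key DMMs on their actuator sets.
public class GroupBuilder {
    public static Map<Set<String>, List<String>> sensorGroups(
            Map<String, Set<String>> subscriptions) {
        Map<Set<String>, List<String>> groups = new HashMap<>();
        subscriptions.forEach((sensor, dmms) ->
                groups.computeIfAbsent(dmms, k -> new ArrayList<>()).add(sensor));
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subs = Map.of(
                "s1", Set.of("dmm1", "dmm2"),
                "s2", Set.of("dmm1", "dmm2"),
                "s3", Set.of("dmm3"));
        // s1 and s2 share subscribers, so they form one group; s3 is alone.
        System.out.println(sensorGroups(subs).size()); // 2
    }
}
```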

4. Implementation issues

The implementation of the reactive system model includes implementing the three reactive modules: DMM, sensor and actuator. Their design given above is very generic, so we can implement them as generic classes using the Java language. Java virtual machines, which are rapidly becoming available on every computing platform, provide a virtual, homogeneous platform for distributed and parallel computing on a global scale (Arnold and Gosling, 1996; Gosling, 1997).

The implementation of the reactive system involves the reactive control protocols and the network application programming interfaces (APIs). The network API often includes primitives for both reliable and unreliable point-to-point network communication. Java provides these two communication primitives, multicast datagram and stream-based communication, so we implement the generic DMM, sensor and actuator classes with these two communication patterns.

The reactive control protocols include all the protocols used to implement the group communication services discussed in Section 3. They are implemented as objects based on the network API primitives and embedded in the generic classes. The generic Java classes also include a number of utility objects which are at the lowest layer and manage basic resources such as group views, communication channels and threads. Thus the Java reactive classes can be understood as composite objects consisting of multi-layered objects.

4.1. Two communication patterns

4.1.1. Multicast datagram communication

The multicast datagram method is a communication

mechanism used in the UDP protocol and the group

communication. It uses multicast datagrams to imple-

ment communication between entities within a single-threaded entity. Using a single thread of control, whenever a specific event occurs, a datagram can be sent, with an address reserved for multicasting, onto a subnet where a group of entities are located and receive it. Other entities can connect to the subnet simply by creating a multicast socket and joining the group.

In order to avoid packets being flooded onto other

networks, Java multicast datagrams include a TTL (time

to live) field in the header to reduce packets unneces-

sarily circulating and congesting networks. The pro-

grammer can increase the TTL to allow a packet to travel further; for example, the TTL could be:

• 0: all subscribers on the same host machine receive

datagrams from a sensor.

• 1: all subscribers on the same subnet receive data-

grams from a sensor.

• 32: all subscribers located on any of the subnets receive
packets from sensors also located on any of the
subnets (not necessarily on the same subnet).
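The TTL settings above can be sketched with Java's datagram API. This is a minimal sketch: the group address, port and class name are illustrative assumptions, not values prescribed by the paper.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Sketch of a multicast sensor endpoint (illustrative names throughout).
public class MulticastSensor {
    // Open a multicast socket with the chosen scope:
    // 0 = same host, 1 = same subnet, 32 = across subnets.
    public static MulticastSocket openSensorSocket(int port, int ttl) throws IOException {
        MulticastSocket socket = new MulticastSocket(port);
        socket.setTimeToLive(ttl);
        return socket;
    }

    // Build a report datagram addressed to a reserved multicast group.
    public static DatagramPacket buildReport(byte[] payload, String group, int port)
            throws IOException {
        return new DatagramPacket(payload, payload.length,
                InetAddress.getByName(group), port);
    }
}
```

A subscriber would then create its own `MulticastSocket` on the same port and join the same group address to start receiving reports.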

However, many routers provide only limited support

for multicasting. Packets are allowed to be multicasted

within a subnet, but not to pass through the router into

other subnets. This may become a problem in the case

that a DMM subscribes to multiple sensors or a sensor

reports to multiple DMMs, where multiple entities are

located on different subnets. To overcome this, a tunnelling approach is adopted, as shown in Fig. 3.

The tunnelling approach builds a tunnel between two

subnets and establishes two software entities, one lo-

cated on a subnet, and another on a remote subnet. On

the first subnet, a courier joins the multicast group, and

retransmits the locally multicast packets using either

datagrams or sockets to another remote subnet. At the

other end of the tunnel, a publisher receives these packets and multicasts them onto the remote subnet

using the same address/port as the class.
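The courier/publisher pair can be sketched as follows. To keep the sketch self-contained, the tunnel and the remote subnet are modelled with in-memory structures rather than real sockets, and all names are our own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the tunnelling approach: a courier on one subnet forwards
// locally multicast packets through a "tunnel" to a publisher, which
// re-multicasts them on the remote subnet (modelled here by a list).
public class MulticastTunnel {
    private final BlockingQueue<byte[]> tunnel = new LinkedBlockingQueue<>();
    final List<byte[]> remoteSubnet = new ArrayList<>(); // stand-in for re-multicast

    // Courier side: picks up a locally multicast packet and pushes it into the tunnel.
    public void courierForward(byte[] packet) {
        tunnel.add(packet);
    }

    // Publisher side: drains the tunnel and re-multicasts each packet on the
    // remote subnet, using the same group address/port as the original.
    public void publisherPump() {
        byte[] p;
        while ((p = tunnel.poll()) != null) {
            remoteSubnet.add(p);
        }
    }

    public int delivered() { return remoteSubnet.size(); }
}
```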

Page 7: A reactive system architecture for building fault-tolerant distributed applications

Fig. 3. Tunnelling multicast packets between subnets.

C. Chen et al. / The Journal of Systems and Software 72 (2004) 401–415 407

4.1.2. Stream-based communication

The stream-based communication can provide reli-

able communication between entities. Current TCP

protocol used in Internet adopts this communication

method. Since each stream-based communication is

dedicated to one connection between two entities, a

stream-based class must be implemented as a multi-

threaded entity to handle multiple connections with a

group of entities. Using the Java multiple threads, a message is sent out to each receiver using dedicated

connections established between a sender and receivers.

Each connection is handled by its own thread of exe-

cution. Other entities can subscribe to this sender by

simply requesting a connection to it, which will be

handled by a new thread.

It is required that a sensor (or DMM) class sends its

reports (or decisions) to its multiple subscribers (or actuators) simultaneously. In order to achieve this in the

stream-based sensor or DMM class, we invoke the

ThreadGroup method from Java. This method uses a

ThreadGroup object to place and control a number of

threads in a class, i.e. each DMM or sensor creates a

ThreadGroup object, into which each new thread cre-

ated to handle a connection is placed so as it can control

the processing of these threads. Using this method, the sensor (or DMM) can invoke each thread in the group

to send reports (or decisions) to its subscribers (or ac-

tuators) at the same time, rather than the threads having

to send to them individually and asynchronously. Fig. 4

depicts the architecture of a stream-based sensor class.

Fig. 4. The stream-based sensor architecture.
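As a rough sketch of this ThreadGroup-based fan-out: each subscriber connection gets its own thread placed in one ThreadGroup, and a report is handed to all of the threads at once rather than sent sequentially. The per-connection sockets are modelled here as in-memory queues, and the names are of our own choosing.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a stream-based sensor fanning out one report to all subscribers.
public class StreamSensor {
    private final ThreadGroup connections = new ThreadGroup("subscribers");
    private final List<ConcurrentLinkedQueue<String>> outboxes = new CopyOnWriteArrayList<>();

    // A new subscriber gets a dedicated outbox; in the real class it would
    // also get a dedicated connection thread created inside the ThreadGroup.
    public ConcurrentLinkedQueue<String> subscribe() {
        ConcurrentLinkedQueue<String> outbox = new ConcurrentLinkedQueue<>();
        outboxes.add(outbox);
        return outbox;
    }

    // Report an event: one thread per subscriber, all placed in the same
    // ThreadGroup and started together, so delivery is simultaneous.
    public void report(String event) throws InterruptedException {
        Thread[] senders = new Thread[outboxes.size()];
        for (int i = 0; i < senders.length; i++) {
            final ConcurrentLinkedQueue<String> outbox = outboxes.get(i);
            senders[i] = new Thread(connections, () -> outbox.add(event));
        }
        for (Thread t : senders) t.start();
        for (Thread t : senders) t.join();
    }
}
```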

4.2. Group communication protocols

4.2.1. Total ordering protocol

There are generally two algorithms used to implement

total ordering. One is to generate a unique sequence

number (USN) for each message delivery (Birman and van Renesse, 1994). Thus, messages can be processed in

a unique sequential order group wide. The other one is

the token-ring algorithm. The USN algorithm has two

approaches as well: centralised approach and distributed

approach. We adopt the simpler approach––the centralised sequencer––for implementing the total ordering of

message delivery.

The centralised sequencer is a straightforward technique that allows one member to be the USN generator

(named sequencer). When a message is originated from a

component, the component sends a USN request to the

sequencer. The sequencer simply keeps a USN counter

that increases by 1 each time a USN request is received.

The value of the USN counter is then returned to the

component. Then the component attaches the USN to

the message, which is sent to other components later on. We use a supervisory DMM as the sequencer. To be

able to decide whether a total-ordering operation is deliverable,

each member keeps a variable USN_major in its local

space to record the maximum USN executed so far. If a

total-ordering operation that arrives holds the next USN,

then this operation is ready to be executed. Otherwise

the operation is deferred until lower USN operations are

performed. Here we give the full algorithm for the centralised sequencer method:

Listing 4.1––Assigning the USN protocol

Rule 1: Acquiring a USN from the sequencer. At each

member site:

while (true) {
  receive(application, m);  // m is a total-ordering message
  send(the sequencer, USN-request);
  receive(the sequencer, USN-reply);
  m.USN := USN-reply.USN;
  multicast(m);
}

Rule 2: Assigning the USN to a member’s request. At the

sequencer site:

int USN_counter := 0;
while (true) {
  receive(member, USN-request);
  USN_counter := USN_counter + 1;
  USN-reply := USN_counter;
  send(the member, USN-reply);
}

Rule 3: Checking if a totally-ordered message is executable. At each member site:

int USN_major := 0;


while (true) {
  receive(member, m);
  if (m.USN == USN_major + 1)
    m is executable;
  else
    m is deferred;
}
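In plain Java, Listing 4.1 can be sketched as follows. The network sends and receives are elided, with method calls standing in for messages; the class names are ours.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the centralised-sequencer total ordering of Listing 4.1.
public class UsnOrdering {
    // The sequencer (a supervisory DMM) keeps one counter group-wide (Rule 2).
    public static class Sequencer {
        private int usnCounter = 0;
        public synchronized int nextUsn() { return ++usnCounter; }
    }

    // Each member tracks the highest USN executed so far (USN_major).
    public static class Member {
        private int usnMajor = 0;
        private final List<Integer> deferred = new ArrayList<>();
        private final List<Integer> executed = new ArrayList<>();

        // Rule 3: execute if the message carries the next USN, else defer it.
        public void receive(int usn) {
            if (usn == usnMajor + 1) {
                executed.add(usn);
                usnMajor = usn;
                // a previously deferred message may now be executable
                while (deferred.remove(Integer.valueOf(usnMajor + 1))) {
                    executed.add(++usnMajor);
                }
            } else {
                deferred.add(usn);
            }
        }
        public List<Integer> executed() { return executed; }
    }
}
```

A member that receives USN 2 before USN 1 defers it; once USN 1 arrives, both are executed in sequence, giving the same order at every member.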

4.2.2. Multicasting atomicity protocol

The atomic multicast protocol is used to ensure the

complete communication between group members even

in the case of member crashes. It is implemented by

letting a sender multicast sequentially, i.e. one multicast

after another. The sender ensures that before starting a

new multicast, the previous one is confirmed to have
reached all operational members. We refer to a multicast
that may not be received by all receivers as an unstable
multicast, and a stable multicast means that the multi-

cast is received by all operational members. The imple-

mentation is based on the assumption that a reliable

FIFO communication can be established between group

members. This is achievable by creating a TCP/IP reli-

able channel between any two members. The basic idea of achieving multicasting atomicity is

to let operational receivers exchange their latest multicast received from the crashed member. The atomic multicast protocol only deals with sender failures. We first choose a DMM as a coordinator in a group (sensor-to-DMM or DMM-to-actuator). Let G = {g1, ..., gn} (n > 2) be the membership of the group. Each member gi, i = 1, ..., n, keeps a vector of the latest multicast from every sender: mcast[g1, ..., gm], where mcast[gk] is the latest multicast message from gk, k = 1, ..., m, m < n. If gi is a sender, its vector is null. Upon the reception of a message m_i^gk from site gk at gi, mcast[gk] := m_i^gk.

When a member first detects the failure of a sender gk (not receiving any message from it within a maximum time frame), it sends an m_atom(gk) message indicating the failure of gk to the coordinator. Upon receiving this message, the coordinator initiates the multicasting atomicity protocol by multicasting it to the group. After receiving the m_atom(gk) from the coordinator, each group member that receives messages from gk replies to the coordinator, attaching its latest message received from gk. The coordinator then collects the messages that each member received from gk. Since multicasts are sent in sequential order, the collected messages will be either m_i^gk or m_{i+1}^gk. Based on this information, the coordinator is able to conclude that m_{i+1}^gk is the unstable message, and finally multicasts it to all operational members. These operational members then receive the latest multicast from the failed member gk. We give the full multicasting atomicity protocol as follows. The protocol starts when the first member detects the crash of gk and sends an m_atom(gk) message to the coordinator.

Listing 4.2––Atomic multicasting protocol

Step 1: The coordinator initializes the protocol by multicasting the m_atom(gk) to all group members.

Step 2: Upon receiving the m_atom(gk) from the coordinator, each non-coordinator member gi that has a communication channel with gk and receives messages from it replies with an m_atom-reply(gi, mcast[gk]) message to the coordinator. mcast[gk] is the latest multicast received from gk and kept at gi.

Step 3: Upon receiving the m_atom-reply(gi, mcast[gk]) messages from all non-coordinator members, the coordinator concludes that the unstable multicast is the m_{i+1}^gk from the crashed member, and multicasts an m_atom-commit(m_{i+1}^gk) message to the group.

Step 4: Upon receiving the m_atom-commit(m_{i+1}^gk) from the coordinator, each non-coordinator member that received messages from gk adds the m_{i+1}^gk to its message buffer.

The advantage of this protocol is that, in the absence

of failures, no extra message is needed to guard against a sender failure, except that an n-vector is used at each member

to store the latest multicasts received from other group

members.
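The coordinator's conclusion step can be sketched as follows. The sketch assumes each reply carries the sequence number of the member's mcast[gk] entry; the helper names are ours, not the paper's.

```java
import java.util.Collection;

// Sketch of Step 3 of Listing 4.2: because the crashed sender multicast
// sequentially, every survivor holds either the i-th or the (i+1)-th
// message; the one with the larger sequence number is the unstable
// multicast that must be re-multicast to the group.
public class CrashAtomicity {
    // Each reply carries the sequence number of the latest multicast a
    // member received from the crashed sender g_k (its mcast[g_k] entry).
    public static int unstableSeq(Collection<Integer> latestSeqs) {
        int max = Integer.MIN_VALUE;
        for (int s : latestSeqs) max = Math.max(max, s);
        return max; // m_{i+1}: received by some members but not all
    }

    // A member that only holds m_i must have m_{i+1} added to its buffer.
    public static boolean needsCommit(int memberSeq, int unstable) {
        return memberSeq < unstable;
    }
}
```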

4.2.3. Membership management protocol

We use an object view to represent the membership.

The view is initialised to contain a set of initial members {g1, ..., gn}, and is encapsulated with two basic operations, leaveGroup() and joinGroup(), to remove or add a

member from (or to) the view, respectively. View change

events are delivered in total order at all members.

Members start from the same initial view––view_0. The ith view is view_i, i = 0, 1, 2, ...; view_i and view_{i+1} differ only by the addition or deletion of one member. A view update event (join/leave) is sent to the group coordinator first, from where the event is multicasted to the other members by the coordinator, with view_{i+1} replacing view_i, i.e. changing view_i to view_{i+1}. This will

ensure the same set of view change operations to be

delivered in total order at all members. Wang (1999)

presents the detailed protocols for membership changes

by a join/leave event including a crash leaving.
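A minimal sketch of such a view object follows. The names are illustrative, and the coordinator-side total ordering of view change events is assumed to happen outside this class.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the membership view: each successful join or leave produces
// exactly one view change, moving every member from view_i to view_{i+1}.
public class GroupView {
    private final Set<String> members = new LinkedHashSet<>();
    private int viewIndex = 0; // the i in view_i

    public GroupView(Set<String> initialMembers) { members.addAll(initialMembers); }

    public synchronized void joinGroup(String member) {
        if (members.add(member)) viewIndex++;    // view_{i+1} = view_i + member
    }

    public synchronized void leaveGroup(String member) {
        if (members.remove(member)) viewIndex++; // view_{i+1} = view_i - member
    }

    public int viewIndex() { return viewIndex; }
    public Set<String> view() { return new LinkedHashSet<>(members); }
}
```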

4.3. Reactive classes

4.3.1. DMM classes

The generic DMM class implements the following

functions: first, it subscribes to sensors by establishing

connections with the sensors and then waits for reports

from them. Upon receiving reports from sensors, it will

process them and make decisions according to the pre-

defined policy. The decisions will then be sent to the related actuators to change the relevant applications' states. The generic DMM class leaves report processing


and decision making empty––these will be implemented

by specific DMMs.

Both the multicast DMM and stream-based DMM

classes are implemented. The multicast DMM uses one

major thread of control to achieve the above functions

while the stream-based DMM class is implemented as a multi-threaded entity consisting of multi-

ple objects each of which handles a specific task. Utility

objects in the DMM include the following:

• Communication handler. This object sets up the con-

nections between this DMM and the sensors/actua-

tors in the groups it subscribes to. Multicasting

primitives are methods encapsulated in this object.
• Group manager. A group object managing the lo-

cal views of an operational component. It basi-

cally provides two methods, leaveGroup() and

joinGroup(), to remove a group member when it

leaves and to add a member when a new one joins

respectively.

• USN and USNAssignor. USNAssignor assigns the

USN to each decision sent to different actuators. The USN object checks the deliverability of each report re-

ceived from sensors. These two objects implement the

USN protocol described by Listing 4.1.

The DMM class also has some mini-protocol objects

which are based on utility objects to implement rela-

tively smaller protocols, such as multicast atomicity

protocol, member joining and leaving protocols for membership management, etc.

• StateTransfer. It implements the state transfer proto-

col. The protocol guarantees that a new member joins
the group in a consistent way.

• CrashAtomicity. It performs the crash atomicity pro-

tocol (Listing 4.2) that guarantees the atomic multi-

casting property when a member crashes. It also updates each member's local view to remove the

crashed component.

• VoluntaryLeave. It performs the voluntary leave pro-

tocol and updates each member's local view to re-

move the voluntarily leaving member.

The root DMM object is constructed by aggregating

and initializing the necessary lower layer objects, which are instantiated from their classes, and linking the main

service of DMM such as information processing and

decision making, etc.
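The division of labour between the generic DMM and a specific DMM can be sketched as follows. The class and method names are illustrative, not the paper's actual signatures.

```java
// Sketch of the generic DMM skeleton: connection handling and decision
// delivery are generic, while report processing and decision making are
// left abstract for application-specific DMMs to fill in (the policy).
public abstract class GenericDmm {
    // Left empty by the generic class; a specific DMM implements its policy here.
    protected abstract String makeDecision(String report);

    // Generic path: a sensor report comes in, a decision goes out to actuators.
    public final String onReport(String report) {
        String decision = makeDecision(report);
        // ... send decision to the subscribed actuators (elided) ...
        return decision;
    }
}
```

Changing the fault-tolerance policy then means supplying a different `makeDecision`, without touching the generic mechanism.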

4.3.2. Sensor/actuator classes

The generic Java sensor class implements the func-

tions of monitoring events and reporting to DMMs. It

can be subscribed to by many other entities and is capable of reporting to them simultaneously. The Java sensor first

builds connections with its subscribers and then moni-

tors for events and reports to its subscribers once events

occur. The generic sensor class leaves information
capturing empty; it will be implemented by specific

sensors.

The generic Java actuator class implements the

functions of receiving decisions from DMMs and then performing actions to change the states of applications.
It can be subscribed to by multiple DMMs and is able to

receive decisions from them and execute them one by

one. In the actuator class, there is no need to deal with

the synchronization problem.

Both multicast and stream-based sensor/actuator

classes are implemented. They are similar to the

DMMs', i.e., they have the same objects as the DMM does, and only the contents of these objects are different.

5. Performance study

We have conducted a series of tests to evaluate the

performance of the above Java DMM, sensor and ac-

tuator classes. The purpose of these tests is to determine the response time of a reactive system to outside events,

i.e., the time used by DMMs, sensors and actuators to

respond to the outside inputs. According to the time

they take, we can evaluate the effectiveness of the system

running in different environments.

5.1. Test description

The evaluation is based on the average response time (ART) of the system over input events. The simplest

situation is that the system consists of only one DMM,
one sensor and one actuator. The response time we

measure for this case starts from the sensor being trig-

gered and then reporting an event to the DMM, and

ends at the actuator receiving a decision message from

the DMM. The sensor is embedded into a test application that triggers the sensor to report to the DMM when

an empty event (a message) has occurred. The DMM

makes a response by simply producing a decision mes-

sage and then sends it to the actuator. The actuator does

nothing but send an acknowledgment message back
to the DMM. For simplicity, we embed the sensor and ac-
tuator together so that a sensor entity has the functions of both
sensor and actuator.
In cases of a composition of multiple DMMs and sensors,

we take an average time from those results for each

DMM and sensor. In the tests with a single sensor and

a composition of multiple DMMs, the sensor measures the
time it needs to report to, and (as the actuator) receive
from, each DMM; this time is then added to a total and
divided by the number of DMMs subscribed to the sensor. This
provides an average overhead associated with an increasing number of DMMs for this single sensor. In the

tests with multiple sensors and DMMs composition, we


Fig. 5. The multicasting ART without the ordering constraint.

Fig. 6. The ART of the multicasting communication.


first measure an average overhead for each sensor ac-

cording to the above method; then we obtain the
average time for multiple sensors by adding all the
overheads to a total and dividing by the number of
sensors.
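The averaging scheme described above amounts to the following computation; this is a sketch with invented names.

```java
// Sketch of the ART averaging: each sensor accumulates its round-trip
// times to every DMM and divides by the number of DMMs for a per-sensor
// overhead; the system ART averages those values over all sensors.
public class ArtCalculator {
    // Average overhead for one sensor over its subscribed DMMs.
    public static double sensorArt(double[] perDmmTimesMs) {
        double total = 0;
        for (double t : perDmmTimesMs) total += t;
        return total / perDmmTimesMs.length;
    }

    // System ART over multiple sensors.
    public static double systemArt(double[][] perSensorTimesMs) {
        double total = 0;
        for (double[] sensor : perSensorTimesMs) total += sensorArt(sensor);
        return total / perSensorTimesMs.length;
    }
}
```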

We have conducted three groups of tests, each in a
different distribution environment: a local host, a
local subnet and remote subnets, respectively. Under

each of these groups, we also conducted two groups of

tests: the first group using the multicast datagram com-

munication and the second using the stream-based

communication. Furthermore, in each of these groups,

we have run two groups of tests, one with the total
ordering constraint and the other without it, to compare the effects of the ordering constraint.

The tests are performed on a collection of networked

(10 Mbps Ethernet) Sun SPARC machines running the Solaris

operating system, across two Deakin campuses (Geelong

and Rusden, 100 km apart). Sun's JDK 1.2 Java inter-

preter is used to run all modules. Each group is tested

1000 times with an average time taken from these results.

5.2. Test results

5.2.1. Effects of multicasting classes

Multicasting DMMs and sensors/actuators use data-

grams to send and receive messages between each other.

For every event detected, a multicasting sensor sends

out a message, which can be received by all its sub-

scribers (DMMs) on the same sub-network almost simultaneously. Therefore, increasing the number of

DMMs would have a marginal influence on the results

in this case. Similarly, all actuators on the same subnet

can receive a message from a DMM simultaneously.

The tests showed that the ART of the system is not

affected by an increasing number of sensors, i.e., each

sensor performs its function independently of
the other sensors. This is because each multicasting sensor uses a dedicated communication channel with a
group of subscribers; thus different sensors are not in-

fluenced by each other. However, the test results did

show that the ART is increased slightly with an in-

creasing number of DMMs and dramatically with the

total ordering constraint on the system.

Fig. 5 depicts the ART of the multicasting classes in

the case of one sensor/actuator without the ordering constraint. It shows an expected gradual increase of the

ART of the multicasting communication as the sensor/

actuator is located further and further away from the

DMMs. The main observation from this figure is the

following:

• First, the multicasting ART on a single host testing

has a significant increase with an increasing number of DMMs, whereas it is almost unaffected by the

number of DMMs on a local subnet or remote sub-

nets testing! (Both ARTs increase slightly at almost
the same speed.) Fig. 5 shows that, at the

beginning, i.e., in the case of 1 or 2 DMMs, the

ART on a local host is the lowest. However, it over-

takes that on a local subnet or remote subnets after

the number of DMMs increases to 3 or 9, respectively. The reason for this is that, in the case of local host

testing, all modules are running on a single machine,

thus more DMMs will require more resources from

the machine and the response time will be much
longer. While on a local subnet or remote subnets,

each multicasting DMM is running on a separate

host and subscribes to a sensor by simply creating a

multicast socket and joining the group; hence, it will not add much extra overhead to the network.

• Second, the ARTs on a local subnet and remote sub-

nets testing have a big difference, i.e., the ART on re-

mote subnets is much greater than that on a local
subnet. This is because there is the added overhead
of tunnelling datagrams from one subnet
to another in the case of remote subnets testing.

When the total ordering constraint applies, the ART

increases more dramatically. Fig. 6 depicts the com-


Fig. 7. The stream-based ART without ordering constraint.


Fig. 8. The ART with stream-based communication.


parison of the ART in the cases of local host testing and

local subnet testing, where ‘‘*’’ denotes the test results

with the total ordering constraint. We find that there are

big differences between the ARTs with and without the

constraint. As the number of DMMs increases, the dif-
ferences grow larger and larger. This means that
the ART with the ordering constraint increases more
dramatically than the ART without the constraint does. In

this case the ART on a local subnet (or remote subnets)

increases significantly with an increasing number of

DMMs, not like those in the test of no ordering con-

straint that only increase slightly.

The reason for this is that, when the total ordering

constraint applies, each message delivery will be checked
by the USN object; thus the larger the number of messages
is, the heavier the overhead incurred on the system. In
particular, in the multicast datagram communication, a
message can easily be delivered out of order, so the sys-
tem will take a longer time to correct it.

5.2.2. Effects of stream-based classes

Stream-based DMMs and sensors/actuators use separate threads to manage each connection between them.

When an event is detected a sensor invokes its threads to

report to each of its subscribers (DMMs) simulta-

neously. Each thread measures the time the sensor needs

to report and receive from each DMM. The test results

confirmed our intuition that as the number of DMMs

increases, the ART of the system increases as well.

However, the results showed that the ART does not increase with an increasing number of sensors, i.e., the

ART is not affected by the number of sensors. This is the

similar situation to the multicasting classes, i.e., it is

because each stream-based sensor uses a dedicated

socket to connect with its subscribers (DMMs) and

different sensors use different sockets so that their

communications with their subscribers are not influ-

enced by each other.

Fig. 7 shows the three groups of tests in the case of

one sensor/actuator without the total ordering con-

straint. From Fig. 7, we have two observations:

• First, the time curve for a local host is steeper than

those for a local subnet and remote subnets. We

can see that, at the beginning (1 DMM), the ART

on a local host is smaller than those on a local subnet and remote subnets. However, when the number of

DMMs increases further (3 or more DMMs), it is big-
ger than in the other two cases. That means, in

the latter two cases, the ART increases much less

than it does on a local host! This is similar to the situa-
tion with the multicasting classes, i.e., each DMM or

sensor/actuator runs on an individual host in the lo-

cal subnet or remote subnets testing, whereas all DMMs and sensors/actuators are executed on one

host in the local host testing, thus with an increasing

number of DMMs, the overheads associated with the

local host testing increase significantly, whereas only

limited network overheads occur in the local subnet

or remote subnets testing.

• Second, the ARTs on a local subnet or remote subnets increase at almost the same speed with an increas-

ing number of DMMs, and the ART on remote

subnets is greater than that on a local subnet (but

not very much, compared with the multicasting

classes). This is because communication among re-
mote subnets has more overhead and requires more

system resources than that on a local subnet.

Fig. 8 depicts the comparison of the ART with or

without the ordering constraint in the cases of local host

testing and local subnet testing, where ‘‘*’’ denotes the

constraint. We can find that there are slight increases of

the ARTs when the tests are constrained by the order-
ing, unlike those in the multicasting communication,
which show a dramatic increase. This is because stream-

based communication is more reliable and messages in this communication are rarely delivered out of


Fig. 9. Replication manager and database server.


order; thus the total ordering constraint only adds the

time to assign the USN to messages and check for each

one, and this time is insignificant.

5.3. Discussions

The first observation from both multicasting and

stream-based class tests shows that, with an increasing

number of DMMs, the ARTs in the local subnet and

remote subnets testing increase much less than they do in

the local host testing! This observation is a good result

for the reactive system model, because most reactive

systems are distributed systems and their DMMs require

reports from sensors located in different hosts. The re-sult shows that reactive components running in a dis-

tributed environment are more effective than when
running on a local host!

Multicast datagram communication is a common

unreliable communication service which does not guar-

antee any reliability. Messages sent by using such a

service may get lost, duplicated and delivered out of

order. When the ordering constraint is applied to message delivery, a large overhead is incurred on

the system. Therefore, multicast datagrams may have

limited applications in distributed systems covering

multiple subnets. However, from our test results we

found that within a local subnet, the multicasting com-

munication without the ordering constraint is the fastest

way to respond to events.

Stream-based communication is more reliable and may be more suited to distributed systems than multi-

cast datagrams. When message delivery in such a com-

munication is applied with the ordering constraint, there
is only a slight increase in time for the
system. It also incurs no tunnelling overhead when

covering multiple subnets. Thus we recommend stream-

based communication as the implementation method for

a distributed computing system covering multiple sub-nets.

6. Application––A replicated database system

We have discussed the reactive system approach so

far. In this section we give an application example to

demonstrate the potential benefits of using this approach. The application is a replicated database

system. Replication, which is the maintenance of on-line

copies of data and other resources by using replicas, is a

key to the effectiveness of distributed systems, in which it

can provide enhanced performance, high availability

and fault-tolerance (Helal et al., 1996).

A distributed replication system within a wide area

network (WAN) is composed of several subnets located several to thousands of miles apart and connected by
gateways (Davidson, 1984). Replication duplicates data-
base servers, i.e., it produces replicas of them. On each subnet, there are one or more

database server groups, which are comprised of replicas

running on each workstation. For simplicity, we only

include two subnets connected by one gateway in the

network configuration, as depicted in Fig. 9. Each

workstation has a database server and a replication

manager running respectively. All database servers (or

replicas) store identical information initially, and each of them can accept client requests that read or update

stored information independently. A replication man-

ager can access the local database server and other

replicas through JDBC. A client connects to a replica-

tion manager and issues transaction requests through it

to obtain database services. Fig. 9 shows two worksta-

tions located on two subnets, where RP stands for a

replication manager; DB represents a database server.

The task of the replicated system is to maintain the

data consistency among all the replicas throughout the

whole network, even in the case of component, site or

link failures (Davidson, 1985). If a client requires a read-

only operation, this request can be served by the local

replication manager reading from the local database

server. If a client wants to perform an update operation,

the operation has to be performed in all database serv-ers. This can be done if all the replicas are running well

without faults. However, component, site or link failures

may occur in the system, and it would be a problem to

perform an update operation on faulty replica(s) or rep-
lica(s) located on a faulty site or a partitioned subnet. In

these cases, some strategies have to be invoked to

maintain the data consistency. Here we discuss compo-

nent failures (also called crash failures) only and give the failure scenarios next.

6.1. Crash failure

Crash failures in a replicated database system include

a database server failure, a replication manager failure,

or a computer (workstation) failure. They may cause the


Fig. 10. Using polling sensors––PS stands for a polling sensor; Act is

an actuator.


data to become inconsistent and the system to provide incorrect

services. The fault-tolerant requirement is that the sys-

tem should continue to work even in the case of failures.

We have the following strategies to deal with crash

failures that occur in the system (Zhou and Eide, 1998):

• A database server fails. For example, assume DB1 on

Computer 1 fails. In this case, RP1 on Computer 1

has to re-direct all requests to DB2 on Computer 2.

If such a request is an update request, then RP1 has

to store such an update in a stable storage (e.g. disk)

and has to perform it on the failed database DB1

when it recovers. Similarly, when a client issues an

update operation through RP2, RP2 has to store that operation in stable storage and perform it on DB1

when it recovers.

• A replication manager fails. For example, assume RP1

on Computer 1 fails. In that case, all requests have to

be submitted through RP2 on Computer 2.

• A computer fails. For example, assume that Com-

puter 1 fails. In this case, all servers running on Com-

puter 1 fail. All requests have to be submitted to RP2 on Computer 2. If there is an update operation, it has

to be recorded in the stable storage and has to be per-

formed on DB1 when Computer 1 recovers (and DB1

recovers as well).
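The update-redirection strategy in the first and third cases can be sketched as follows. This is a minimal illustrative model, not the paper's actual code: the class and method names are ours, and stable storage is modeled as an in-memory log rather than a disk file.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of a replication manager's failover behavior.
class ReplicationManager {
    private boolean localDbAlive = true;
    // "Stable storage" modeled as an in-memory log for illustration;
    // the real system would persist this to disk.
    final Queue<String> stableLog = new ArrayDeque<>();
    final Queue<String> localDb  = new ArrayDeque<>();  // stands in for DB1
    final Queue<String> remoteDb = new ArrayDeque<>();  // stands in for DB2

    void localDbFailed() { localDbAlive = false; }

    // An update is applied to the live replica(s); if the local replica
    // is down, the update is logged so it can be replayed on recovery.
    void update(String op) {
        if (localDbAlive) {
            localDb.add(op);
        } else {
            stableLog.add(op);   // record for later replay on the failed DB
        }
        remoteDb.add(op);        // always applied on the surviving replica
    }

    // On recovery, replay the logged updates against the local replica.
    void localDbRecovered() {
        localDbAlive = true;
        while (!stableLog.isEmpty()) {
            localDb.add(stableLog.poll());
        }
    }
}
```

After replay, both replicas hold the same sequence of updates, which is the consistency requirement the strategies above aim for.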

In cases 2 and 3, it is easy for a client to know whether a replication manager is alive through the program interface. If the local replication manager fails, a client can submit requests to the other replication manager. Hence it is simple to deal with a replication manager failure. In cases 1 and 3, however, it is essential for a replication manager to know whether a database server is alive, and how to detect and deal with a database server failure remains a problem. If the failure detecting mechanism and the failure processing policy are both embedded into the replication manager, then once the detecting mechanism is changed the replication manager has to be changed as well. Thus this strategy lacks flexibility and cannot adapt to constant changes.

We can use the Java reactive system modules introduced above to deal with crash failures occurring in the replicated database system. We use sensors to detect the possible failures of various system objects, and DMMs to implement various fault-tolerant policies. Our method provides a flexible architecture and can adapt to constant changes.

6.2. Fault detection

As discussed earlier, a replication manager crash or a computer crash is simple to deal with, so we only discuss a database server failure here. When a database server crashes, it is essential for the replication managers to know this so that they can take certain measures to further process client requests. To achieve this, we run a Java DMM, a Java sensor and a Java actuator on each computer. Each DMM subscribes to all sensors and actuators running on all computers. Since there are different types of reactive modules, we have several choices:

• Using stand-alone DMMs: Stand-alone DMMs can run independently on each host and operate concurrently with other objects. Actuators are embedded into the replication managers for transmitting decisions from the DMMs to the replication managers. To detect a database server failure, we then have two options for sensors: polling sensors and event sensors.

  1. Using polling sensors. A polling sensor can run independently on each computer to check the status (liveness) of the database server periodically and then report to the DMMs subscribing to it, as depicted in Fig. 10, where we only include two workstations located on two subnets. Once a database server fails, the polling sensor monitoring it will report to both DMMs. In this case, the DMMs and polling sensors run independently on Computers 1 and 2, while the actuators are embedded into the RPs.

  2. Using event sensors. We may want to know of a database server failure immediately once it occurs. In this case we can use event sensors instead of polling sensors on each computer. The DMMs and actuators are the same as above, but two event sensors are attached to RP1 and RP2 respectively to report the failure of the connection between RP1 and DB1 or between RP2 and DB2, as depicted in Fig. 11. If a database server fails, the connection between this replica and the relevant replication manager fails as well. Thus the corresponding event sensor will catch this event and report to both DMMs immediately.

Fig. 11. Using event sensors (ES stands for an event sensor; Act is an actuator).

• Using embedded DMMs: We can also use embedded DMMs to implement the above functions. In this case, an embedded DMM is embedded into the replication manager on each computer; thus we do not need actuators, since the DMMs can instruct the replication managers directly, as depicted in Fig. 12. The sensors in this case are polling sensors which run independently on each computer. Once a database server fails, the polling sensor monitoring it will report to both DMMs.

In all cases, both DMMs on Computers 1 and 2 will receive reports from the sensors about database server failures if any occur. They will then make certain decisions (as described earlier) and use the actuators to instruct RP1 and RP2 to process clients' requests properly.
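The sensor-to-DMM reporting path described above can be sketched as follows. This is an illustrative sketch rather than the paper's actual class library: `PollingSensor`, `Subscriber` and the liveness probe are hypothetical names, and the periodic timer thread is reduced to an explicit `poll()` step for clarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Subscribers (DMMs in the paper's terms) receive sensor reports.
interface Subscriber {
    void report(String sensorId, boolean alive);
}

// A simplified polling sensor: each poll() probes the monitored
// database's status and pushes the result to every subscribed DMM.
class PollingSensor {
    private final String id;
    private final BooleanSupplier liveness;          // probe of the monitored DB
    private final List<Subscriber> dmms = new ArrayList<>();

    PollingSensor(String id, BooleanSupplier liveness) {
        this.id = id;
        this.liveness = liveness;
    }

    void subscribe(Subscriber dmm) { dmms.add(dmm); }

    // In the real system this would run on a timer thread;
    // here a single poll step is exposed directly.
    void poll() {
        boolean alive = liveness.getAsBoolean();
        for (Subscriber dmm : dmms) {
            dmm.report(id, alive);
        }
    }
}
```

An event sensor would follow the same subscribe/report pattern, except that it fires on a connection-failure event instead of on a timer.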

6.2.1. Detecting strategy

Fig. 12. Using embedded DMMs.

Let S be a sensor. We use Sr to denote the sensor report about the state of a replica r that S monitors, or about the connection state between a replica r and a replication manager. For a polling sensor, if Sr is true, the replica r is alive; whereas for an event sensor, if Sr is true, a connection failure has occurred between the replica r and the related replication manager. We use "¬" to denote that no event (report) occurred. Hence we have

if ¬Sr, then r is faulty, where S is a polling sensor    (1)

and

if Sr, then r is faulty, where S is an event sensor    (2)

The time interval for evaluating "¬" in (1) is set to be greater than the polling interval of S. Formulas (1) and (2) are the strategies used by the DMMs to detect faults. Once faults are detected, the DMMs use the solutions described in Section 6.1 to deal with them.
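Detection rules (1) and (2) amount to a single predicate that a DMM can evaluate for each sensor report; the sketch below uses our own hypothetical names, not the paper's API:

```java
// The two sensor types distinguished by the detection strategy.
enum SensorType { POLLING, EVENT }

class FaultDetector {
    // Rule (1): for a polling sensor, the absence of an "alive" report
    // (¬Sr, evaluated over an interval longer than the polling period)
    // means the replica is faulty.
    // Rule (2): for an event sensor, the report Sr itself signals a
    // connection failure, hence a faulty replica.
    static boolean replicaFaulty(SensorType type, boolean sr) {
        return (type == SensorType.POLLING) ? !sr : sr;
    }
}
```

In other words, the same subscription machinery serves both sensor types; only the interpretation of the report flips between them.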

7. Conclusion

This paper has presented the design, implementation and evaluation of a reactive system model for building better fault-tolerant distributed applications. The model is flexible, reliable and fault-tolerant. The main advantage of the reactive system model is the separation of mechanisms and policies in software development. To build this model, we introduced group communication mechanisms, which provide fault-tolerant capability in the reactive system. The implementation of the model is based on multicast datagram communication and stream-based communication respectively. The three reactive modules (DMM, sensor and actuator) are implemented as generic Java classes which can be applied in distributed applications. The performance evaluation shows that the model with stream-based communication is more reliable and effective when running in a distributed environment. The example presented in the paper demonstrates the usefulness of our reactive system approach.

References

Agha, G., 1986. Actors: A Model of Concurrent Computation in

Distributed Systems. The MIT Press, Cambridge, MA.

Arnold, K., Gosling, J., 1996. The Java Programming Language.

Addison-Wesley Publishing Company, Reading, MA.

Birman, K.P., 1993. The process group approach to reliable distributed

computing. Communications of the ACM (December), 37–53.

Birman, K.P., Schiper, A., Stephenson, P., 1991. Lightweight causal

and atomic group multicast. ACM Transactions on Computer

Systems (August).

Birman, K.P., van Renesse, R., 1994. Reliable Distributed Computing

with the ISIS Toolkit. IEEE Computer Society Press, Silver Spring,

MD.

Boasson, M., 1993. Control systems software. IEEE Transactions on

Automatic Control 38 (7), 1094–1107.


Boasson, M., 1996. Modeling and simulation in reactive systems. In: Proceedings of the Fourth Workshop on Parallel and Distributed Real-time Systems (WPDRTS '96). IEEE Computer Society Press, Silver Spring, MD, pp. 27–34.

Boasson, M., 1998. Software architecture for distributed reactive

systems. In: Lecture Notes in Computer Science, no. 1521 (1).

Springer-Verlag, Berlin.

Bounabat, B., Romadi, R., Labhalla, S., 1999. Designing multi-agent

reactive systems: a specification method based on reactive deci-

sional agents. In: Lecture Notes in Computer Science, no. 1733.

Springer-Verlag, Berlin, pp. 197–210.

Boussinot, F., 1991. Reactive C: An extension of C to program reactive

systems. Software––Practice and Experience 21 (4), 401–428.

Bussmann, S., Demazeau, Y., 1994. An agent model combining reactive and cognitive capabilities. In: Proc. Int'l Conf. on Intelligent Robots and Systems (IROS '94), Munchen.

Caspi, P., Girault, A., Pilaud, D., 1994. Distributing reactive systems. In: The ISCA Int'l Conf. on Parallel and Distributed Computing Systems (PDCS '94), Las Vegas, USA.

Chen, C., Zhou, W., 2000. Design and implementation of reactive systems for building fault-tolerant applications. In: Proc. of the 18th Int'l Conf. on Applied Informatics (AI '2000), Innsbruck, Austria. ACTA Press, USA, pp. 497–500.

Cristian, F., 1991. Understanding fault-tolerant distributed systems.

Communications of the ACM (February), 56–78.

Cristian, F., Dancey, B., Dehn, J., 1996. Fault-tolerance in air traffic

control systems. ACM Transaction on Computer Systems (Au-

gust), 265–286.

Davidson, S.B., 1984. Optimism and consistency in partitioned

distributed database systems. ACM Transactions on Database

Systems 9 (3), 456–481.

Davidson, S.B., 1985. Consistency in partitioned networks. ACM

Computer Surveys 17 (3), 341–370.

Edwards, S. et al., 1997. Design of embedded systems: formal models,

validation, and synthesis. Proceedings of IEEE 85 (3), 366–390.

Gosling, J., 1997. A Feel of Java. IEEE Computer 30 (6), 52–57.

Harel, D., Pnueli, A., 1985. On the development of reactive systems. In: Apt, K.R. (Ed.), Logics and Models of Concurrent Systems. Springer-Verlag, Berlin, pp. 477–498.

Harel, D., Politi, M., 1998. Modeling Reactive Systems with State-

charts: The Statemate Approach. McGraw-Hill Companies, New

York.

Harel, D., Shtul-Trauring, A., 1990. STATEMATE: A working

environment for the development of complex reactive systems.

IEEE Transaction on Software Engineering 16 (4), 403–414.

Helal, A.A. et al., 1996. Replication Techniques in Distributed

Systems. Kluwer Academic Publishers, Dordrecht.

Hewitt, C., 1977. Viewing control structure as patterns of passing

messages. Journal of Artificial Intelligence 8 (3), 323–364.

Quintero, J.A.D., 1996. Reactive PASCAL and the event calculus. In: Sigmund, U.C., Thielscher, M. (Eds.), Proc. of the Workshop at FAPR '96: Reasoning about Actions and Planning in Complex Environments, Darmstadt, Germany.

Selic, B., Gullekson, G., Ward, P.T., 1994. Real-Time Object-Oriented

Modeling. John Wiley & Sons, New York.

Systa, K., 1996. The Disco tool. Tampere University of Technology, Tampere, Finland. Available from http://www.cs.tut.fi/laitos/Disco/tool.fm.html.

Wang, L., 1999. A Toolkit for Constructing Service Replication

Systems, Ph.D. thesis, Deakin University.

Zhou, W., 1999. Detecting and tolerating failures in a loosely

integrated heterogeneous database system. Computer Communi-

cations 22, 1056–1067.

Zhou, W., Eide, E., 1998. Java sensors and their applications.

Proceedings of the 21st Australian Computer Science Conference

(ACSC 98), Perth, Australia, pp. 345–356.

Dr. Changgui Chen received his Ph.D. degree from the School of Computing and Mathematics, Deakin University, Australia in 2002, and his Master degree from the University of Science and Technology of China in 1993. His research areas cover Distributed Systems, Fault-tolerant Computing, Internet Computing, and E-Commerce. His Ph.D. dissertation is entitled "A Reactive System Architecture for Building better Fault-Tolerant Distributed Applications". From October 2001, Dr. Chen worked at City University of Hong Kong as a Senior Research Assistant for seven months. He is currently an Associate Lecturer in the School of Information Technology, Deakin University.

Dr. Weijia Jia received his BSc, MSc and Ph.D. degrees, all in Computer Science, in 1982, 1984 and 1993, respectively. He joined the German National Research Centre for Information Science (GMD) in St. Augustin in 1993 as a research fellow. In 1995 he joined the Department of Computer Science/Department of Computer Engineering and Information Technology, City University of Hong Kong, where he is currently an (adjunct) associate professor. His research interests include computer networks, distributed systems, and multicast and anycast QoS routing protocols for the Internet and wireless communications. In these areas he has published more than 150 papers and books/chapters in international journals and conference proceedings. He has been the Principal Investigator of 18 research projects supported by RGC Research Grants, Hong Kong, and Strategic Research Grants, CityU. He has served on many program committees and as PC Co-Chair for many (IEEE) international conferences. He is a member of IEEE and has been listed in Marquis Who's Who in the World, 2000, 2001 and 2002.

Wanlei Zhou received the B.Eng and M.Eng degrees from Harbin Institute of Technology, Harbin, China in 1982 and 1984, respectively, and the PhD degree from The Australian National University, Canberra, Australia, in 1991. He is currently a full Professor and the Acting Head of the School of Information Technology, Deakin University, Melbourne, Australia. Before joining Deakin University, Professor Zhou was a programmer at Apollo/HP in Massachusetts, USA, a Chief Software Engineer at HighTech Computers in Sydney, Australia, a Lecturer at the National University of Singapore, and a Lecturer at Monash University, Melbourne, Australia. His research interests include theoretical and practical issues of building distributed systems, Internet computing, distributed and heterogeneous databases, mobile computing, performance evaluation, and fault-tolerant computing. He has published more than 100 papers in refereed international journals and refereed international conference proceedings. He was the Program Committee Co-Chair of the 2000 IEEE International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2000) and of ICA3PP 2002. Since 1997 Dr. Zhou has been involved in the organization of more than 30 international conferences as PC Chair, Session Chair, Publication Chair, and PC member.