Building reliable distributed systems with CORBA

Building Reliable Distributed Systems with CORBA

Sean Landis*Motorola Inc., Schaumburg, IL, e-mail: [email protected]

Silvano Maffeis†Olsen & Associates, Seefeldstr. 233, CH-8008 Zurich, Switzerland, e-mail: [email protected]

New classes of large-scale distributed applicationswill have to deal with unpredictable communicationdelays, with partial failures, and with networks thatpartition. In addition, sophisticated applications suchas teleconferencing, video-on-demand, and concurrentsoftware engineering require a group communicationabstraction. These paradigms are not adequately ad-dressed by CORBA. CORBA mainly deals with point-to-point communication and offers no support for thedevelopment of reliable applications that exhibit pre-dictable behavior in distributed systems. In this arti-cle, we present extensions to CORBA which providegroup communication, reliability, and fault tolerance.We also describe Orbix+Isis and Electra—two CORBAobject request brokers that support the implementa-tion of reliable distributed applications and groupware.

c© 1997 John Wiley & Sons, Inc.

Introduction

The object paradigm has successfully been appliedto the design and implementation of graphical user in-terfaces, application frameworks, device simulators, andobject-oriented databases. Object-oriented programmingof distributed applications is the next logical step.

Object-oriented distributed programming (OODP) isa variation of the client–server model. In OODP, ob-jects encapsulate an internal state and make it accessiblethrough a well-defined interface. Client applications mayimport an interface, bind to a remote instance of it, andissue remote object invocations.

Object-oriented design and implementation of dis-tributed systems provides many benefits. The serviceguaranteed by an object is clearly separated from thetechnology implementing the service. The programming

Received July 1, 1995; revised June 1, 1996; accepted September 11,1996

*Masters candidate at Cornell University; former employee of IsisDistributed Systems.

†Supported by the Swiss National Science Foundation while onleave at Cornell University, Ithaca, NY.

c© 1997 John Wiley & Sons, Inc.

language which was used to implement an object, theprogramming libraries, or the underlying hardware canbe exchanged as long as the object supports its old inter-face. Because object-oriented distributed systems allowthe interchange of parts, they tend to evolve more easily.

The OMG Common Object Request Broker Architec-ture (CORBA) standard [10] permits the design of opendistributed systems in an object-oriented fashion by pro-viding an infrastructure that allows objects to communi-cate independent of the specific programming languagesand techniques used to implement the objects [15]. Opensystems avoid excessive dependence on a single manu-facturer, or on a certain operating system, hardware, orprogramming language, and thus offer customers free-dom of choice. Development risks are reduced, and en-terprises can choose the components of a distributed sys-tem according to price and performance criteria.

Limitations of Present ORB Technology

The absence of an abstraction which provides mul-ticast requests to groups of CORBA objects in an ef-ficient way greatly complicates the design and imple-mentation of applications such as teleconferencing, workflow management, and concurrent software engineering.The present version of CORBA [10] is based mainly ona point-to-point communication paradigm.

Moreover, CORBA does not adequately support theimplementation of reliable distributed applications. Thebehavior of a reliable distributed application would bepredictable despite partial failures and partitioned net-works. Many real-world applications have to deal withproblems related to partial failures, but the developmentof reliable applications with current ORB technology re-quires a large amount of additional work.

Enhancements for ORB Reliability

During the past several years, we have been design-ing and implementing CORBA environments for reliabledistributed applications. The results of our work are a set

THEORY AND PRACTICE OF OBJECT SYSTEMS, Vol. 3(1), 31–43 1997 CCC 1074-3227/97/010031-13

Server

Object

Replicated

Host A

Server

Object

Replicated

Host B

Server

Object

Replicated

Host C

Client

Object

Object Group

request

reply

FIG. 1. Server implemented as an object group.

of new abstractions as well as Orbix+Isis [5] and Electra[7]—two object request brokers for reliable distributedsystems. We enhanced the OMA Model [15] with objectgroups and virtually synchronous program execution.

The rest of the article is structured as follows.The next section stipulates requirements for a reli-able CORBA environment. Then we deal with the low-level system support necessary to implement the reli-able CORBA. After that, we give a brief overview ofOrbix+Isis and Electra. Examples of reliable applica-tions which can be built with this kind of Object RequestBroker Interface (ORB) will be given in the followingsection. In the penultimate section, we compare our ap-proach with approaches taken in other projects. Finally,we summarize our findings and conclude the article.

Requirements for a Reliable CORBA

This section describes the features we feel are nec-essary to support a reliable CORBA. These features canbe provided by the communication system and can betransparent to the programmer.

Object Groups

The reliability of CORBA objects could be increasedby developing an abstraction that manages groups of dis-tributed objects. This abstraction would shield program-mers from much of the complexity inherent in distributedsystems. A researched and tested method for providingsuch an abstraction in associating CORBA objects to-gether in object groups [8].

An object group is a group of replicated or associ-ated CORBA objects implementing the same interface.

Client communication with an object group is by reli-able multicast. The client program invokes operationsvia a single object group reference without knowing thereferences of the individual objects in the group. Thus,client applications interact with a service that remainsthe same whether the service is implemented by a sin-gleton CORBA object or by a group.

In CORBA, a client binds to and sends requests to asingle object. With object groups, a client binds to thegroup and requests are sent to all members. Object groupbehavior can be transparent to the client:

• The application code used to bind to an object groupis the same as for binding to a single object.

• The client sends a single request and receives a singlereply, as it would with a single object. This is imple-mented without any single point of failure by havingthe client runtime doing the demultiplexing.

• The object group handles all reliability protocols au-tomatically so they are invisible to the client.

If desired, the application programmer can easily gainaccess to all server replies and implement reply collationstyles such as majority voting. Thus, transparency is op-tional.

As will be explained later, object groups provide faulttolerance, load sharing, efficient data distribution, andobject migration. Figure 1 shows an object group withthree member objects.

Failure Detection

Reliable group computing requires that requests areacknowledged by all group members. Failure detectioncan prevent a client from blocking waiting indefinitelyfor responses from failed group members.

The runtime performs automatic failure detectionwithout disrupting communication among active groupmembers. Unresponsive group members are identifiedusing a timeout failure detector which, after a specifiedamount of time, will declare an unresponsive memberfailed. The runtime forwards failure notifications to en-sure that operational objects have a consistent opinionon which objects have failed.

With failure detection and propagation of failure noti-fications, the runtime can transparently maintain consis-tency among object group members and can guaranteemulticast request completion.

Active Replication. Active replication is a style of dis-tributed group programming which assumes that everymember of the group is actively maintaining replicatedstate. Each object group member shares the same inter-face and equivalent implementation semantics to ensurethat each member responds to identical requests in thesame way.

32 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

ClientA B

R1C

time 1. Original Group

2. Object C admitted

FIG. 2. Disagreement in the view.

Servers implemented with actively replicated objectstolerate object, host, and network failures; as long asone replica is accessible, the service remains available.Active replication can support load-balanced servers andobject migration.

Active replica object groups manage group member-ship and provide a way for objects joining the groupto be updated and become exact replicas. Group mem-bers use a view to maintain a groupwide agreement ofgroup membership. State transfer provides a way for newmembers to become replicas. Monitoring enables groupmembers to receive notifications of group membershipchanges.

Views. A view is a dynamic list of object referencesrepresenting the object group membership. Changes ingroup membership trigger the automatic installation of anew view at each group member. Consistent group viewsprovide a mechanism for performing reliable multicaststo the object group.

When a view is stable, each object (client and server)sees an identically ordered list of group members. Re-quest delivery occurs only when the view is stable.Clients need up-to-date view information to detect acomplete failure of the group and to enable communi-cation styles where the application programmer wantsto suspend the client until a majority of the servers hasreplied to a request. Servers need up-to-date view infor-mation to know about the fraction of the work load theyare responsible for, and to implement primary-backupcomputation styles.

Figure 2 illustrates one way that request delivery to anunstable group can cause application problems. A clientrequests a distributed database search. To speed up theoperation, each of the objects in the group is to searcha different portion of the database. The client sends therequest to a group consisting of objects A and B, but be-fore B receives the request, C is admitted to the group. A,having received the request when two objects were in thegroup, searches one-half of the database. B receives therequest when three objects are in the group, and searchesone-third of the database. Object C does not receive therequest. When results are returned to the client, only five-sixths of the database have been searched. View consis-

until view is stable3. R2 is queued

Client

admittance2. C must wait for

A

CR1

R2

B

R1

time 1. Original Group

Reply

R2

R2

Replyq

FIG. 3. Delivery to a stable view.

tency is maintained by controlling when an object cangain admission to a group. Pending requests are deliv-ered before a new object is admitted. While an objectis being admitted, requests are delayed until the view isstable. Figure 3 shows request delivery to a stable view.R2 is delivered after the view is stable. View manage-ment is performed by the runtime and is transparent tothe programmer.

State Transfer. State transfer is a feature that invokesobject member functions to transfer object state fromexisting object group members to a new member beingadmitted to the group.

State transfer is a key component of active replicationbecause it allows new objects to join the group and be-come exact replicas. Objects join the group to increasefault tolerance or when they recover from failure.

When a new object is admitted to the group, the run-time system calls the state send member function onone or more members of the object group. This func-tion packages state information for the runtime systemwhich transfers it over the network to the new member.Upon arrival, the runtime calls the state receive memberfunction of the new object which uses the state informa-tion to update its state.

Group membership changes are not complete untilstate transfer has successfully completed. At that point,the group view becomes stable and pending requests canbe delivered to the group.

time

R1

ClientCA B

FIG. 4. Incomplete request delivery due to failure.

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 33

Object Monitoring. Monitoring is a feature that al-lows objects to be notified of changes in object groupmembership. The runtime transparently invokes moni-toring functions on group member objects when a newobject is admitted to the group, or when a member failsor leaves the group. Monitoring gives group members theoption of taking action in response to group membershipchanges.

Communication

For reliability, the communication system must pro-vide guarantees beyond those provided by traditional net-work protocols. These guarantees can be built upon ex-isting protocols such as TCP/IP.

Request Atomicity. Request atomicity guarantees thatrequests sent to object groups will be received by allmembers or by none.

Without atomic delivery, replicated objects may be-come inconsistent. Figure 4 illustrates how a problemcan arise when a client fails after sending a request to anobject group. Object A receives the request, but the clientfails before its runtime can send the request to objectsB and C. Without request atomicity, a programmer mustfind a way to recover from incomplete request delivery.

Atomic request delivery can be implemented withinthe communication system by having an object that re-ceived the request forward the request to the rest of thegroup in case of a client failure (Fig. 6). Thus, all objectsreceive the request.

Request Ordering. Most applications using groupcomputing require some type of request ordering guar-antee. Two useful types of ordering are total orderingand causal ordering.

Total Order. The order in which requests are de-livered to distributed objects can be compromised by avariety of delays such as network congestion, operatingsystem latency, or resource contention. Total ordering ofrequests ensures that multiple requests concurrently sentto an object group by different clients will be deliveredto every group member in the same order.

time

CA B

R1

R2 R1

R1

R2

R2

FIG. 5. Unordered request delivery.

time

R1

ClientCA B

A forwards R1

FIG. 6. Request delivery despite failure.

Figure 5 illustrates a potential request ordering prob-lem. R1 and R2, two update requests to the same record,are sent by two clients to a group of replicated objects.Objects A and B receive R1 first and then R2. Because ofnetwork latency, object C receives R2 before R1. In sub-sequent queries, objects A and B may behave differentlyfrom object C.

By providing total ordering of requests, consistencycan be maintained for all object group members. Figure7 illustrates how requests are delivered in the same orderto all group members.

The runtime delays delivery of R2 to object C untilafter R1 is delivered. Total ordering of requests greatlysimplifies the development of replicated servers, becausethe programmer can safely assume that every object inthe group receives requests in the same order.

Causal Order. Causal ordering ensures that if a re-quest is potentially dependent upon a prior request, theprevious request is delivered to an object first [6]. Causalorder is weaker than total order, but incurs less communi-cation overhead. Causal ordering provides an importantalternative since it is sufficient for many applications andis less expensive to guarantee than total ordering. Totalordering requires a delay for every request while the run-time determines the ordering of messages. With causalordering, the delay is only required for requests whichare potentially causally related; unrelated requests can bedelivered immediately.

If causal ordering is not present, a problem can oc-cur when a causal dependency exists in a chain of asyn-chronous requests. In Figure 9, a client sends R1, re-questing a file update, to object A. The client next noti-fies object B of the update with R2. Object B sends R3to access what it believes is the current file. But R1 was

CA B

R1

q

R1R2

R2 R1

R2

time

R2 is queued to

be delivered

after R1

FIG. 7. Totally ordered request delivery.


FIG. 8. Reliable CORBA.

delayed and has not yet arrived at object A! Object B isaccessing the wrong data.

By maintaining causal ordering, potentially dependentasynchronous requests are processed in the correct causalorder as shown in Figure 10. With causal ordering, objectA is assured of accessing the correct data because thedelivery of R3 is delayed by the runtime system untilafter R1 is delivered.

Virtual Synchrony. In virtually synchronous systems,all significant events—the delivery of requests, failurenotifications, and group membership changes—appearto occur at the same time in all processes. Owing tovirtual synchrony, the behavior of a distributed applica-tion is predictable despite partial failures, asynchronouscommunication, and dynamic view changes. Virtual syn-chrony relieves programmers of problems such as hand-shaking and distributed consensus. The virtual synchronymodel was originally developed in the context of the Isisproject [2].

Atomic request delivery, view management, consis-tent failure detection, and ordering of events are thebuilding blocks of the virtual synchrony model. Virtualsynchrony simplifies developing distributed servers be-cause code can be written as if events occur within eachserver at the same time and in the same order. A requestdelivery or a group view change appears to be simulta-neous at all object group members, even though morethan one action occurs in more than one process, and atdifferent times. Our model for reliable CORBA providesvirtual synchrony.

System Support

Reliable CORBA needs sophisticated support fromthe underlying communication system (Fig. 8). Thecommunication system should provide multicast, groupmembership management, ordering of events, and virtualsynchrony.

The CORBA static invocation interface (SII), dy-namic invocation interface (DII), ORB interface, and ba-sic object adapter (BOA) are based upon an ORB corelayer. The ORB core mainly provides low-level deliv-ery of CORBA requests. The ORB core itself is imple-mented on top of a virtual synchrony toolkit such as Isisor Horus.

Isis [2] or Horus [13] provide virtual synchronythrough a rather low-level C programming interface.Neither is CORBA compliant, but both can provide afoundation for a reliable ORB.

Isis

The Isis ReliableTM Software Developer Kit is the firstcommercially available environment to provide virtualsynchrony and high performance. Isis enables the im-plementation of fault-tolerant software on loosely cou-pled hardware through a library of C functions. Core ser-vices include process groups, reliable multicast, orderingof events, and failure monitoring. Higher-level servicesinclude a message spooling facility, a distributed re-source manager, and a message publication-subscriptionlayer. Isis runs on various operating systems includ-ing UNIX, Microsoft WindowsTM, Windows NTTM, andVMS. Orbix+Isis, presented in the next section, uses Isisas its communication system.

Horus

The Horus toolkit, a research project in developmentat Cornell University, offers flexible group communica-tion support by providing extensively layered and highlyconfigurable protocol objects. Horus allows users of ap-plications to pay only for services they use. Groups withdifferent communication needs can coexist in a singlesystem.

Horus runs on UNIX and supports communicationprotocols such as UDP, Deering-IP, ATM, and Machmessages. The Electra ORB presented in the first sec-tion runs on both Horus and Isis.

Orbix+Isis

Orbix+Isis is the first commercially availablesystem that supports the creation of fault-tolerantCORBA-compliant applications. Orbix+Isis integratesthe OrbixTM CORBA-compliant C++ development en-vironment from IONA Technologies, Ltd., with the IsisReliable runtime technology from Isis Distributed Sys-tems, Inc.

System Design

Orbix+Isis builds upon the strengths of its componentproducts. Orbix provides the advantages of a CORBA en-


timeClient A B

R1

R2

R3

FIG. 9. Causal ordering problem.

vironment: a standard object-oriented programming en-vironment, abstract interface definition, and distributedobjects. Isis provides process groups, view management,state transfer, request ordering, and virtual synchrony.

Fault tolerance and performance aspects of Orbix+Isisapplications are specified in the Isis repository (IsR). TheIsR externalizes this information to allow changes in ap-plication behavior without code modification or recom-pilation.

Orbix+Isis provides a gentle migration path for Or-bix applications that need a higher degree of fault tol-erance. Orbix+Isis features are transparent to the clientprogram: fault-tolerant object group behavior does notrequire modification of client code.

A server implementation class gains fault-tolerant ob-ject group behavior by inheriting from a base class.Orbix+Isis provides base classes for two styles of ob-ject groups: active replica and event stream.

Active Replica Execution Style

Active replica object groups provide fault toleranceand load balancing through active replication. Statetransfer and monitoring are implemented in the base Ac-

tiveReplica class as do-nothing virtual functions.To support state transfer, the server programmer over-

rides a send state and a receive state function. When anew object joins the object group, the send state functionis called on one or more members to stream state infor-mation onto an Orbix Request object which is passedto the function. The Orbix+Isis runtime delivers the re-quest to the new member and calls the receive state func-

timeClient A B

R1

R2

R3Q

FIG. 10. Causal request delivery.

tion which streams information off the Request objectand updates its state.

The programmer can control which objects send stateby overriding a virtual function which returns a nonzerovalue if the object should send state. The default functionalways selects the oldest object group member.

Monitoring is supported in a similar fashion byoverriding two virtual functions. Orbix+Isis calls thenewMember() function when an object joins the group,and memberLeft() when a member leaves.

The active replica execution style object group offersthree communication styles: multicast, client’s choice,and coordinator/cohort. These styles are specified on aper-operation basis in the IsR. A single active replica ob-ject group can simultaneously support operations usingeach communication style.

Multicast. The multicast communication style pro-vides highly fault-tolerant operations. Every member ofthe object group receives client requests. If the operationinterface specifies a return value, each member replies.To maintain client transparency, only the first reply isreturned to the caller.

The programmer can gain access to all return valuesby writing an Orbix smart proxy (a client side proxyobject provides communication support to the objectgroup). The smart proxy inherits from the default proxyand adds behavior to process all return values. Smartproxies can be written by the server programmer andtransparently installed on the client program.

For each IDL operation, there are two member func-tions on the proxy: one with a standard C++ map-ping signature, and another with an extended signature.The extended signature version consists of CORBA se-quences for the return values, and for all out and inoutparameters.

The default proxy simply returns the first value ofeach outbound sequence. By writing a smart proxy, theprogrammer can use the extended version to examine thesequences and craft an application specific reply to thecaller.

Smart proxies allow a computation to be dividedamong group members. The partial results can be com-bined upon receipt, and a single answer can be pres-ented to the client. This type of operation fully exploitsthe multiprocessing capabilities available in a distributedsystem.

Client’s Choice. The Client’s Choice communicationstyle is an optimization of the multicast style for read-only operations. It invokes the request on only one ob-ject in the group. The Orbix+Isis runtime automaticallychooses a member to receive the request by calling aclient side chooser function. If the chosen member fails,the chooser function is automatically called to choose a


Event

Receiver

Event

Receiver

Event

Receiver

Host A

Host B

Host C

Object Group

Event

History

Event

Stream

Event

Log

Client

Object

Client

Object

Client

Object

FIG. 11. Orbix+Isis event stream.

new member. The default chooser selects members roundrobin, but the server programmer can register an alter-native chooser function.

A query to an actively replicated database would bemore efficient as a client’s choice operation, since it isusually faster to send the query to only one member.

Coordinator/Cohort. The coordinator/cohort styleload-balances computations which are expensive relativeto communication cost. A two-phase protocol chooses asingle member as coordinator. The coordinator performsthe operation and then sends replies to the client andall other members (called cohorts). Cohorts can use thereply to update local state.

If the coordinator fails, a chooser function is automat-ically called to select a cohort which will become the newcoordinator. By default, the chooser function chooses theoldest member of the group as coordinator, but the serverprogrammer can register an alternative chooser function.

Event Stream Execution Style

Event stream execution style object groups supportasynchronous requests using a publication/subscriptionparadigm. In this paradigm, a CORBA implementation isregistered with an event stream, which represents a clear-inghouse for publication and subscription. All interfaceoperations are defined in IDL as CORBA oneway oper-ations, and are used by the client to send asynchronousevents to the event stream. The event stream holds eventsand forwards them to subscriber objects, which are calledevent receivers, as shown in Figure 11.

Event receivers can join or leave groups at will, andcan be configured to receive a user-defined number ofevents from the backlog upon joining the object group.After the initial receipt of backlog, event receivers will

continue to receive events as they arrive on the eventstream.

The event stream execution style decouples clientsfrom servers and can provide persistence for events.Clients typically send events to the event stream with-out knowing or caring if any event receivers are listening.The client may then cease communication, but the eventscan be kept by the event stream. The event stream canbe configured to keep an event log on disk to providerecovery from a total failure of the stream.

Each member of an event stream receives events in thesame order. An event stream is managed by a replicatedgroup whose fault-tolerance parameters are configurableon a per-stream basis in the IsR.

An event receiver can checkpoint its position in theevent stream and use it as a backlog starting point forthe next join. In this way, an event receiver that has beeninactive for a period of time will obtain all events.

Electra

Like Orbix+Isis, Electra [7, 9] is an implementationof the reliable CORBA presented earlier. Electra is not acommercial product, but is being used to research prob-lems related to the migration of CORBA objects, statereconciliation after the repair of a partitioned network,communication over ATM, and so forth. Electra differsfrom Orbix+Isis mainly in respect to adaptability of theORB to various communication systems, and in the wayclient transparency is provided.

System Design

Electra can run on various toolkits and operating sys-tems; the current version supports both the Horus andthe Isis toolkits. We believe that Electra can be ported toAmoeba, Chorus, Consul, Transis, and other platformsproviding multicast and threads, without much effort.

Electra is layered as depicted in Figure 12. The ORB,and BOA are based on the dynamic invocation interface(DII) which can be seen as the core of the ORB. Thecore itself is built atop of a multicast RPC module sup-porting asynchronous RPC to both singleton and groupdestinations.

It would have been relatively easy to build the mul-ticast RPC module directly on Horus. Nevertheless, forthe sake of flexibility and portability, we decided to basethe module on a toolkit-independent veneer, called theVirtual Machine.

The virtual machine interface specifies operations fordefining communication end points, for aggregating endpoints to groups, for asynchronous message passing, andfor creating lightweight processes. The RPC module andthe virtual machine communicate by means of downcallsand upcalls. A virtual machine interface suitable for Ho-rus and Isis is described in [9]. VOS is a virtual operating


DeviceDrivers IPC

Threads

OperatingSystem

Virtual Machine

Adaptor Object

Horus, Isis, etc.

VOS

Multicast RPC Module

ORB BOADII SII

FIG. 12. Electra architecture.

system layer used by applications to interact with the un-derlying operating system in a portable and thread-safeway. The VOS mainly provides operations for file andmemory management.

A toolkit-dependent adaptor object maps the virtualmachine interface onto the proprietary API provided bythe underlying toolkit. An adaptor object encapsulates allof the code which is specific to a toolkit and necessaryto support the multicast RPC module. To port Electrato a new toolkit, programmers only have to develop anappropriate adaptor object. Our adaptors for Horus andIsis comprise < 1000 lines of C++ code each.

Electra applications can be reconfigured to run on an-other toolkit by simply relinking them with the appro-priate adaptor object. Recompilation of applications isnot necessary; therefore, applications delivered in binaryform can be reconfigured as well.

Invocation Styles

Electra supports transparent and nontransparent mul-ticast, as well as synchronous, asynchronous, anddeferred-synchronous requests. All invocation styles areavailable both through the static and dynamic invocationinterface.

In transparent mode, an object group appears to theclient to be a highly available singleton object. In con-trast, nontransparent communication permits program-mers to access the individual group member’s results toan invocation. Each CORBA operation is thus mappedinto two different SII operations, one for transparentand one for nontransparent multicast. The nontranspar-ent signature employs CORBA sequences for the argu-ments which are passed back to the client, i.e., for the

out and inout arguments, the return value, and for theCORBA environment object.

The synchronous, asynchronous, or deferred-syn-chronous invocation style can be selected on a per-request basis through the CORBA environment objectwhich is passed along with every invocation in Electra.Consider the following IDL interface:

// IDLinterface example {

void op1(in float i, out float o);};

The following client code demonstrates how the oper-ation can be called in a synchronous, asynchronous, anddeferred-synchronous way:

// C++// upcall procedure for asynchronous invocation:

void op1 upcall(Float outF, const Environment& env){// . . .

}void proc1(example var& ref){

Float outF:Environment sync, async, defer;sync.call type(SyncCall);async.call type(AsyncCall);defer.call type(DeferCall);

// Synchronous (blocking) invocation:ref—> op1(7.3, outF, sync):// at this point outF holds the result.

// Asynchronous (nonblocking) invocation:ref—> op1(7.3, outF, async, op1 upcall);// outF is undefined. The result will be passed// to the upcall.

. . .// Deferred-synchronous invocation:

ref—> op1(7.3, outF, defer);// outF is undefined.// perform local computations . . .

defer.wait(); // suspends the caller only if necessary.// at this point outF holds the result.

}The synchronous call suspends the issuing thread until

the reply has arrived. After the call, outF contains theresult returned by the server. In case the object refer-ence ref is bound to an object group, the call is sus-pended only until the first member reply has arrived.This default behavior can be changed by the Environ-ment::num replies member function.

In asynchronous mode, the issuing thread is not sus-pended and outF remains undefined. As soon as thereply is received by the caller’s ORB, the op1 upcallprocedure is started with its own thread of execution,and with outF as parameter. If the server has returned


an exception, it will be assigned to the Environmentparameter of op1 upcall.

The deferred-synchronous call works much like theasynchronous one. However, by issuing the wait mem-ber on the Environment object, the caller is suspendeduntil a reply is received from the server. When waitreturns, the outF argument is defined. Thus, the En-vironment parameter acts like a “promise” object inthat it permits the caller to synchronize with the remoteinvocation at a later point, and thus to overlap commu-nication with computation.

The next code fragment demonstrates transparent andnontransparent multicast. By using CORBA sequencesin place of an operation’s out, inout, and Environ-ment arguments, programmers gain access to the repliesfrom individual group members:

// C++void proc2(example var& ref){

Float outF; FloatSeq outFSeq;Environment sync; EnvironmentSeq defer;sync.call type(SyncCall);defer.length(1);defer[0].call type(DeferCall);defer[0].num replies(MAJORITY);

// transparent multicast (as in proc1):ref—> op1(7.3, outF, sync);. . .

// nontransparent, deferred-synchronous// multicast:

ref—> op1(7.3, outFSeq, defer);// local computations. . .

defer[0].wait();// outFSeq now contains the replies of// a majority of the members:

for(ULong i =0; i <outFSeq.length(); i++){// do something with outFSeq[i]}

}

The Electra BOA

Creating object groups as well as joining and remov-ing objects from groups is accomplished by special Elec-tra operations which were included into the CORBABOA interface:

// C++class BOA {public:

// BOA interface:Object ptr create(const ReferenceData&,InterfaceDef ptr, ImplementationDef ptr);void dispose(Object ptr);

// Electra-specific operations:static void create group(Object ptr group;const ProtocolPolicy& policy=default protocol policy);void join(Object ptr group);void leave(Object ptr group);static void destroy group(Object ptr group);

virtual void get state(AnySeq& state,Boolean& done);virtual void set state(const AnySeq& state,Boolean done);virtual void view change(const View& newView);

};

The BOA::create group member function cre-ates a new object group and binds the object-referencegroup to it. The policy argument is used to tell theunderlying toolkit what kind of multicast protocol toemploy, e.g., for total ordering or causal ordering. OnHorus, the programmer can specify ATM as transportlayer and pick from a variety of ordering protocols to beplaced atop the ATM layer.

Objects in the network join or leave a group simply byretrieving the group reference from the name server andby issuing the join or leave member function withthe reference as parameter. The destroy group mem-ber function irrevocably destroys an object group. Notethat the group members themselves are not destroyed.

When an object joins a nonempty group, Electra ob-tains the internal state of some group member by in-voking its get state member. Subsequently, Electratransfers the state to the newcomer and invokes the new-comer’s set state member. A large state can be trans-ferred in fragments. For this purpose, Electra continuesto invoke the state transfer functions until the done re-turn argument of get state becomes TRUE.

An object’s state is represented as a sequenceof CORBA any objects. The programmer writesapplication-specific get state and set state mem-ber functions.

The view change member function of an objectis invoked whenever another object joins or leaves thegroup. The newView object contains information on thenew cardinality of the group as well as the object refer-ences of the group members.

The Electra Object Services

Electra provides fault-tolerant object services such asa replicated OMG naming service, a distributed life-cycleservice, and an event stream service. The life-cycle ser-vice is useful for creating objects and object groups ondemand, for replicating objects at runtime, and for moni-toring applications. The event stream service implementsasynchronous requests using a publication/subscriptionparadigm.


Examples

This section describes two applications whose devel-opment is considerably simplified by reliable CORBA.The first example shows how a replicated directoryserver can be implemented. The second example dealswith a reliable stock exchange ticker.

A Fault-Tolerant Directory Service

Consider a replicated directory service specified bythe following CORBA IDL declaration:

interface directory {// Register an entry under a key.

void insert(in string key, in any entry);// Retrieve an entry by key.

void lookup(in string key, out any entry);// Remove the entry with key.

void remove(in string key, out any entry);};

This interface declares a directory object whichmaintains entries consisting of a string and an anyobject. To provide a fault-tolerant replicated service, twocriteria must be met:

1. All directory objects must be updated identically.This is accomplished by atomic request delivery andtotally ordered request multicast.

2. When a new directory object joins a group it mustbe able to replicate the state of the objects alreadyin the group. This is accomplished with view changedetection and state transfer.

Without reliable extensions to CORBA, this typeof application is extremely difficult to write. AlthoughCORBA does support a kind of multicast capabilitythrough the send multiple requests DII opera-tion, it does not address the vital issues of atomicity,ordering, state transfer, or view management.

Without atomicity and ordering, there is no way toensure consistent replicated state across all directoryobjects. Without state transfer, new objects have no wayto become replicated. Finally, without view management,none of the other mechanisms are possible.

Given the above interface declaration, the IDL com-pilers of Orbix+Isis and Electra generate a set of C++files containing the static invocation interface of the di-rectory, the invocation stubs, and a file containing askeleton of the service with one C++ member functionper operation declared in the interface. This file also pro-vides a skeleton for the state transfer member functionsof the directory object which can be completed bythe programmer as follows (in Electra syntax):

// C++voidim directory:: get state(AnySeq& state, . . . ){// Pack all directory entries into// the “state” object:state.length(2 * keys.length());for(ULong i =0; i < 2 * keys.length(); i += 2){state[i] <<= keys[i/2];state[i+1] <<= entries[i/2];};

};

voidim directory:: set state(const AnySeq& state, . . . ){// Unpack the received directory entries.keys.length(state.length()/2);entries.length(state.length()/2);for(ULong i =0; i < state.length(); i += 2){state[i] >>= keys[i/2];state[i+1] >>= entries[i/2];};

};

To increase the degree of fault-tolerance of a di-rectory service, a directory object implementa-tion is created and joined to the respective directoryobject group. The get state member function of agroup member is automatically invoked by the ORB,and the state object is marshaled, transferred to thenewcomer, unmarshaled, and passed to the newcomer’sset state member function. Owing to totally orderedmulticast and to virtual synchrony, the internal states ofthe group members remain consistent despite member-ship changes and crashes, and despite client applicationscreating and removing entries from the service whilemembership changes are occurring.

To migrate a directory object, a new group is cre-ated and the object joined to it. Then, a new directoryobject is created on the destination host and joined to thegroup. The state of the obsolete object is automaticallycopied to the new object and the two directory ob-jects will run synchronized. Now, the obsolete object cansimply be destroyed.

A Reliable Stock Exchange Ticker Application

The next example deals with a reliable stock exchangeticker service that supplies timely stock market informa-tion to a dynamic set of receivers. The ticker service ob-tains stock quotes from one or more external data feedsand multicasts them to a group of receivers that havesubscribed to the service. A receiver can be representedby following CORBA IDL interface:

// IDLinterface ticker {

oneway void firstQuote(in string symbol,in string date, in long time, in float price);


FIG. 13. Reliable stock exchange ticker.

oneway void nextQuote(in string symbol,in string date, in long time, in float price);oneway void lastQuote(in string symbol,in string date, in long time, in float high,in float low, in float close, in float volume,in float priceVol);. . .

};

The firstQuote operation serves to submit the firstquote of the day for a given stock. symbol representsthe ticker symbol, e.g., “IBM”, “MSFT”, or “AAPL”.To submit an individual quote, nextQuote is invoked.The lastQuote operation is invoked in order to sub-mit the last quote of the day, where high contains thehighest quote of that day, low the lowest, and closethe last quote. volume provides the number of sharestraded during the day, priceVol the price per volumeproduct. As depicted in Figure 13, an event stream ob-ject group is used to transport the quotes. This ensuresthat:

• The communication infrastructure does not becomea single point of failure. For the sake of fault toler-ance, the event stream service is implemented by areplicated group.

• Receivers can access a predefined backlog of pastquotes. When a trader starts his ticker, he will be pre-sented with the backlog. This mechanism also sup-ports the disconnected operation of mobile laptopcomputers.

• All receivers obtain the same quotes in exactly thesame order, which is made possible by reliable, totallyordered group communication.

• Information feeds as well as receivers can join andleave the system dynamically without affecting theother receivers or the consistency of the data.

• The quotes are sent to the receivers by object groupinvocations, which can be transmitted by a protocolsuch as IP-Multicast to ensure scalability and goodutilization of the available network bandwidth.

• The programming effort necessary to ensure reliabil-ity of the ticker service is kept at a minimum.

The data feed is provided through a coordina-tor/cohort object group. Each member of the coordina-tor/cohort group has access to a different feed providingthe same quotes, but only the coordinator of the groupwill transmit the quotes through the event stream. Whenthe coordinator fails, a new coordinator is automaticallyelected and the transmission of the quotes is promptly re-sumed. Using a coordinator/cohort group ensures faulttolerance of the feed and that each quote is obtained andtransmitted only once.

The coordinator object injects quotes into the streamby simply invoking Event Stream operations declared ininterface ticker. For instance:

firstQuote(“IBM”, “950613”, 765101, 91.250);nextQuote(“IBM”, “950613”, 765102, 91.250);nextQuote(“IBM”, “950613”, 765103, 91.125);nextQuote(“IBM”, “950613”, 765104, 91.750);. . .lastQuote(“IBM”, “950613”, 870233, 92.250,90.875, 91.750, 2905.0, 266.5337);

Logging of events, multicast, and fault tolerance isprovided by the underlying event stream and coordina-tor/cohort frameworks. CORBA object request brokerssuch as Orbix+Isis and Electra offer the fundamentalsystem support for this kind of application, namely ob-ject groups, reliable multicast, and virtual synchrony.

Related Work

In this section, we give a brief comparison of our ap-proach with approaches taken in other object-orientedsystems.

ANSA Interface Groups

An interface group abstraction was added to the Ad-vanced Network Systems Architecture (ANSA) plat-form. The interface group abstraction permits multicastof ANSA operations and group membership manage-ment. Similar to Electra, ANSA supports a transparentand a nontransparent group invocation style. Program-mers can select between FIFO and total ordering of themulticasts delivered to group members.

The current implementation of the facility is verystraightforward. To maintain total ordering and toperform group management, multicasts are channeledthrough a group member acting as the sequencer of thegroup. A sequencer can become a performance bottle-neck and represents a single point of failure. Multicastis realized by the repeated use of single messages andnot by exploiting a hardware or software multicast facil-ity where available. In contrast, Orbix+Isis and Electraemploy fault-tolerant, distributed protocols [2] to pro-vide ordering and to perform group management. Otherdifferences are that ANSA is not CORBA compliant and


does not ensure virtually synchronous program execu-tion.

Arjuna, Avalon

Many of today’s OODP environments offer atomictransactions to ensure consistency between distributeddata objects. Two well-known examples are Arjuna [14]and Avalon-C++ [3]. Such systems are well suited forapplications which need to maintain consistency forlong-lived shared data objects. When a failure occurs,transactions are aborted and partially completed opera-tions are rolled back. Often, aborted transactions can berestarted only when the defect has been repaired, whichcan cause client applications to be suspended for a longtime.

In contrast, Orbix+Isis and Electra are well suitedfor applications that require high availability and cannottolerate long delays: for example, fault-tolerant client–server applications and groupware. For this kind of appli-cation, active replication, nonblocking communication,and virtual synchrony can lead to increased performance.For a comparison of the virtual synchrony and the trans-actional model, refer to [4].

Note, however, that OMG Common Object ServicesSpecification (COSS) [11] standardizes a transactionmonitor object service. Virtual synchrony provides thelow-level system support necessary to implement a re-liable and scalable transaction monitor. Implementing aCOSS transaction monitor on top of Orbix+Isis or Elec-tra is an area of future investigation.

COSS Server Group

It has recently been proposed that group communi-cation and fault tolerance be added to CORBA by in-corporating a server group abstraction into the OMA.An extension of the OMG Common Object Services hasbeen suggested for that purpose [1]. The proposed groupmanagement API is similar to the Electra BOA opera-tions described earlier. Similar to Orbix+Isis, conglom-eration functions can be provided which combine resultsfrom all group members into a single response for theclient.

A server group service works as follows: Upon re-ceiving a multicast request, the client’s ORB dispatchesit to the server group object. The server group object in-vokes a user-defined decomposition function to producea set of subservice requests, and the group members aresubsequently invoked. The server group object awaits thereplies, combines them via a user-defined result combi-nation function, and returns the result to the client.

Compared to our work, the advantage of the servergroup abstraction is that it can be added to an existingORB without having to modify it. The disadvantages are

that a server group object can become a performancebottleneck and a single point of failure, and that virtualsynchrony is at best guaranteed between the membersof a group, but not in the whole application. Moreover,coherent membership management and state transfer arenot addressed in the proposal, which makes the develop-ment of fault-tolerant objects difficult. Nevertheless, theserver group abstraction is valuable for systems whichdo not require full virtual synchrony, e.g., certain group-ware applications or parallel queries among independentdatabases.

Our approach is based on the realization that reliabil-ity is guaranteed only when the ORB is built on a com-munication system that implements the virtual synchronymodel, since all communication must pass through thesystem. This means that reliability and fault tolerancecannot simply be “added” to an ORB by means of acommon object service.

Conclusions

The present version of CORBA does not specify ab-stractions for the implementation of distributed applica-tions whose behavior must be predictable despite par-tial failures, partitioned networks, failed communicationlinks, and asynchronous communication. Process groupsand virtual synchrony, on the other hand, provide the ba-sis to build reliable and fault-tolerant distributed systems.Toolkits such as Isis and Horus implement the processgroup and virtual synchrony paradigm. However, theirAPIs are not CORBA compliant and rather low-level.

To combine the benefits of CORBA and the virtualsynchrony paradigm, we suggest that a CORBA objectrequest broker should be based on a toolkit such as Isisor Horus. We believe that the combination of the groupcommunication model with the CORBA model will leadto a compelling programming paradigm for future dis-tributed systems.

We described how to enhance CORBA to achieve re-liability and to support efficient one-to-many interactionwith the abstraction of an object group. An object grouppermits the programmer to treat a collection of CORBAobjects as if they were a single entity, and clients invokeoperations on object groups without needing to know theexact membership of the group. Members of an objectgroup have a consistent view of which objects are inthe group. Coherent failure notification as well as statetransfer are provided.

We also presented Orbix+Isis and Electra, the firsttwo CORBA object request brokers to support the en-hanced model. Application areas of our technology arethe replicated directory service and the reliable stock ex-change ticker outlined earlier, video-on-demand, group-ware, distributed multimedia, and various kinds of fault-tolerant client–server applications.


References

[1] Adler, R. M. (June 26–29, 1995). Group-oriented coordinationextensions to OMG’s OMA/CORBA. Object Management Grouppresentation. San Jose, CA.

[2] Birman, K. P., & van Renesse, R. (1994). Reliable distributed com-puting with the Isis toolkit. IEEE Computer Society Press.

[3] Eppinger, J. L., Mummert, L. B., & Spector, A. Z. (1991). Camelotand Avalon. Morgan Kaufmann.

[4] Guerraoui, R., & Schiper, A. (1994). Transaction Model vs. virtualsynchrony model: Bridging the gap. In Distributed systems: Fromtheory to practice: Lecture notes in computer science 938, Birman,K., Mattern, F., & Schiper, A., eds., Springer-Verlag.

[5] Isis Distributed Systems, Iona Technologies, Ltd. Orbix+Isis pro-grammer’s guide.

[6] Lamport, L. (July 1978). “Time, clocks and the ordering of eventsin a distributed system. Communications of the ACM, 21.

[7] Maffeis, S. (June 1995). Adding group communication and fault-tolerance to CORBA. Proceedings of the 1995 USENIX Confer-ence on Object-Oriented Technologies, Monterey, CA.

[8] Maffeis, S. (June 1996). The object group design pattern. Pro-

ceedings of the 1996 USENIX Conference on Object-OrientedTechnologies, Toronto, Canada.

[9] Maffeis, S. (1995). Run-time support for object-oriented dis-tributed programming. Ph.D. thesis, University of Zurich, Depart-ment of Computer Science.

[10] Object Management Group. The common object request broker:Architecture and specification. Rev. 2.0.

[11] Object Management Group. Common object services specifica-tion, Vol. I.

[12] Oskiewicz, E. & Edwards, N. (1993). A model for interface groups.Report AR.002.01, ANSA, Architecture Projects ManagementLimited, Cambridge UK.

[13] van Renesse, R., Birman, K. P., & Maffeis, S. (April 1996). Horus:A flexible group communication system. Communications of theACM, 39.

[14] Shrivastava, S. K., Dixon, G. N., & Parrington, G. D. An overviewof the Arjuna distributed programming system. Computing Lab-oratory, Technical Report, University of Newcastle upon Tyne,UK.

[15] Soley, R. M. Object management architecture guide. Object Man-agement Group Document.


Documents

Building reliable distributed systems with CORBA