Performance of adaptive CORBA middleware

J. Parallel Distrib. Comput. 64 (2004) 201–218

Shikharesh Majumdar,* E-Kai Shen, and Istabrak Abdul-Fatah

Department of Systems and Computer Engineering, Carleton University, Ottawa, Ont., Canada K1S 5B6

Received 5 July 2000; revised 6 November 2003

*Corresponding author. E-mail address: [email protected] (S. Majumdar).
0743-7315/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2003.11.008

Abstract

Middleware provides inter-operability and transparent location of servers in a heterogeneous distributed environment. A careful design of the middleware software is required, however, for achieving high performance. This research proposes an adaptive middleware architecture for CORBA-based systems. The adaptive middleware agent that maps an object name to the object reference has two modes of operations. In the handle-driven mode, it returns a reference for the requested object to the client that uses this reference to re-send the request for the desired operation to the server, whereas in the forwarding mode it forwards the entire client request to the server. The server upon invocation performs the desired operation and returns the results to the client. An adaptive ORB dynamically switches between these two modes depending on the current system load. Using a commercial middleware product called Orbix-MT, we have implemented a skeletal performance prototype for the adaptive ORB. Based on measurements made on a network of workstations and a synthetic workload we observe that the adaptive ORB can produce a significant benefit in performance in comparison to a pure handle-driven or a pure forwarding ORB. Our measurements provide valuable insights into system behavior and performance.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Middleware performance; CORBA performance; Adaptive middleware architectures; Distributed system performance; High-performance middleware

1. Introduction

The ability to run distributed applications over a set of diverse platforms is crucial for achieving scalability as well as for gracefully handling the evolution in hardware and platform design. An effective middleware system is required to provide the glue that holds the distributed system together and to achieve high system performance. Using such middleware software, it is possible for two application components written in different languages and implemented on top of different operating systems to communicate with one another. This paper is concerned with the performance of middleware systems that provide this communication infrastructure and inter-operability in a heterogeneous distributed environment. Early work on middleware for process-oriented systems is discussed in [11]. This research, however, focuses on distributed object computing (DOC), which is currently one of the most desirable paradigms for application implementation.

There is a great deal of interest in client–server systems that employ DOC [3]. DOC combines the advantages of both distributed computing and object-oriented (OO) technology. It promotes reusability of software and enhances the use of commercial off-the-shelf (COTS) software components for the construction of large applications, and it provides high performance and reliability by distributing application functionality over multiple computing nodes. Heterogeneity often appears in client–server systems that span multiple different platforms. The common object request broker architecture (CORBA) [15,31] is a standard proposed by the Object Management Group (OMG) for the construction of OO client–server systems in which clients can receive service from servers that may be implemented using different programming languages and run on diverse platforms. Both the clients and servers use a common standard interface definition language (IDL) for interfacing with the ORB, which provides client–server inter-communication as well as a number of other facilities such as location and trading for services through the ORB agent [17].

The client–middleware–server interaction architecture is observed to have a strong impact on system performance. This paper is motivated by the investigation of a number of such architectures presented in [2]. These architectures refer to the way a request from the client is routed to the server and a reply from the server is sent back to the client. In a handle-driven (H-ORB) architecture, when a client wants to request a service it sends the server name to an entity called the agent in the middleware system. The agent performs a name to object reference (IOR) mapping and sends an IOR or handle back to the client. The client uses the handle to contact the server and receive the desired service. Various COTS ORBs, including Visibroker [4] (previously known as ORBeline) and Orbix-MT [8], which is used in this research, use such a handle-driven architecture. In some of these middleware products, the entity that provides a dynamic directory service by mapping a server name to a handle is called the locator or name server, but we will refer to the entity as an agent in this paper. In a forwarding (F-ORB) architecture, the client sends the entire request for service to the agent, which locates the appropriate server and forwards the request to it. The server performs the desired service and sends a response back to the client. Each of these architectures is observed to be superior for a specific workload. In particular, the F-ORB produces a better performance at lower system load (small number of clients) whereas the H-ORB is observed to be superior at higher system load (large number of clients). In line with the results in [2], we have observed a similar relationship between the throughputs achieved by the handle-driven ORB in Orbix-MT and a forwarding ORB developed by us. Because it uses fewer messages, an F-ORB performs better than the H-ORB at lower system loads, but because it is more complex in construction, the F-ORB saturates at higher system loads, giving rise to a software bottleneck [5,13] that impedes scalability. A more detailed discussion of the observation is presented in Sections 5.5 and 6. The research presented in this paper combines the good attributes of the two architectures into an adaptive ORB (A-ORB). Depending on the system load, the interaction architecture switches between a handle-driven and a forwarding mode. A report on a preliminary investigation of the adaptive middleware is presented in [29]. The important contributions of this research are briefly summarized below.

* To the best of our knowledge, the research at Carleton University that is described in this paper is one of the first works in adaptive middleware systems. Our results indicate that a significant benefit in performance can be achieved by using a hybrid ORB architecture that changes its behavior and adapts to changes in system workload conditions.

* An important observation is that even without having access to the source code of a COTS middleware product, it is possible to use it for devising an adaptive ORB. An even higher system performance can be expected if changes to the ORB source code are possible.

* The results of experiments described in this paper provide valuable insights into system behavior and performance that are important to the designers and users of middleware-based systems. The alleviation of software bottlenecks at the agent through multithreading is proposed. The effect of the degree of multithreading on system performance is discussed.

Simulation, analytic modelling, and prototyping with measurement are three popular methods used in system performance evaluation. We have chosen the third approach because it does not require the simplifying assumptions that often underlie analytic and simulation models, and because it captures the effect of real system behavior and overheads that are difficult to parameterize in the model-based approaches. The performance prototype of the adaptive middleware is developed in C++ using Orbix-MT. The measurement environment consists of a network of Sun workstations inter-connected by a 10 Mbps hub and running Solaris 2.6. The research results presented in this paper are expected to be most useful in the context of performance-demanding CORBA applications in which, every time a service is required, a client contacts the ORB agent for a handle to an appropriate server object. There are several motivations for applications to adopt such an approach, including the following.

* Since a client contacts the agent before every service request, effective load balancing can be achieved when multiple copies of the same object exist on the system. A number of commercial products such as Visibroker and Orbix [8] perform such load balancing by sending successive requests for the same service to different object instances. If clients send requests directly to object instances through their local proxies, for example, some of the copies may be heavily loaded while others remain idle.

* In the event of an object crash or object migration, the agent, which knows the locations of all object instances, can route subsequent requests to the appropriate object instance.

* The CORBA trading service [17] allows the steering of requests to object references with respect to some of the characteristics or quality of service that objects can provide. Systems that provide a trading service may need to choose different server objects at different points in time for providing the same service. Routing every request with the help of the trading agent is important in such a situation.


The paper is organized as follows. A brief discussion of CORBA and related work is presented in the next section. A description of the adaptive ORB is provided in Section 3. The experimental environment is presented in Section 4 and the results of the experiments are discussed in Section 5. Insights gained into system behavior and performance are described in Section 6. Section 7 presents our conclusions.

2. Overview of CORBA and related work

CORBA is a specification for a standard architecture for distributed OO systems that use the client–server paradigm. The interactions among clients and servers in a CORBA system are mediated through a special entity called the object request broker (ORB). The services provided by it can vary from one system to another and include name to address mapping, load balancing, as well as fault tolerance. When a client wants to receive a particular service it interacts with an agent or broker. As mentioned earlier, we use the term agent in this paper to refer to this entity. The agent binds the object name to a handle that can be used for sending the service request to the appropriate server object. All inter-communications among clients, servers, and agents are coordinated by the ORB core (the communication infrastructure of the ORB), which handles the transmission of requests as well as the reception of results and exceptions. A detailed discussion of the design of clients and servers in a CORBA environment is beyond the scope of the paper, but a short description of the steps in the development of a CORBA-compliant system using C++ as the programming language and persistent servers is presented.

First, we need to specify, design, and write the object interfaces using the OMG interface definition language. The IDL source code is compiled to produce the client stubs and server skeletons that serve as the glue at their respective sides between the ORB core and the application implementations. IDL interfaces are mapped to C++ classes and interface operations are mapped to member functions of those classes. The CORBA objects are then implemented using the C++ language. The main program in the server as well as the client application code are developed. Before invoking a method on a server object the client process must contact the ORB agent, which locates the object and obtains the object reference that is needed for sending the request to the appropriate server. In many commercial off-the-shelf ORB products, such as Visibroker and Orbix, the agent also performs load balancing. If more than one instance of the server is available for serving this request, the agent attempts to select the server in such a way that load is spread uniformly among all these eligible server instances. The server, upon starting, must register with the agent so that its location is stored in the agent's directory. Different architectures can be used for the interactions among the client, servers, and the agent. We have investigated the performances of three different architectures in [1]. A hybrid system that combines two of these interaction architectures is the focus of attention of this paper.

A large body of research exists in the field of client–server systems. A representative set of previous research on middleware-based DOC is presented next.

2.1. Related work

Different coordination models are possible for effectively handling the diverse design issues of distributed systems. A number of models for coordination of the interactions between client, server, and the middleware agent are presented in [3]. Various techniques for developing the server and client sides of a distributed application are presented in [7,23,24]. A comparison of a number of such techniques, including a socket network programming interface, C++ wrappers for sockets, and CORBA middleware, is presented in [24]. There is a rapid growth of interest in CORBA middleware [27]. The use of CORBA in enterprise computing and network management is discussed in [6] and [30], respectively. A number of different architectures for multithreaded systems and their impact on performance of CORBA-based systems are described in [20]. Extension of CORBA to ubiquitous computing is presented in [19], whereas reflective middleware architectures using language-independent component-based models are described in [18]. Most of the research that concerns system performance has concentrated on programming and measurements on existing middleware systems [22,25,26] and not on the interaction architecture that this paper focuses on. More recently, real-time middleware systems have started receiving attention. A standard for real-time CORBA is discussed in [14]. Special architectural features and optimizations required to develop real-time ORB end systems that can provide end-to-end quality of service (QoS) guarantees to applications are discussed in [21]. The use of an open communication interface for QoS provisioning in CORBA is presented in [32]. Middleware services for aiding an application to reconfigure itself for attaining a desired level of QoS when the system conditions change are discussed in [10]. The use of smart proxies for supporting QoS-sensitive services is considered in [9]. A pluggable protocol framework for improving latency in real-time CORBA-based systems is described in [16]. A middleware that performs effective resource management for achieving high system performance is discussed in [12]. None of these existing works concerns the devising of a client–middleware–server interaction architecture for achieving high performance, which is the focus of this paper.

3. The adaptive ORB

Since we did not have access to the source code of Orbix-MT, we have implemented the adaptive ORB as a separate process called the a-agent that registers with the Orbix naming service. A high-level description of the client–agent–server interactions is provided in Fig. 1, which is explained with the sequence of events presented next.

1. The client wants to invoke a method on an object and performs the a-bind operation.
2. The agent looks up the object handle. Depending on the system state and configuration parameters (explained later in this section), the system either operates in the handle-driven mode and the object handle is sent to the client (2H in Fig. 1), or it operates in the forwarding mode and the request is forwarded to the desired server (2F in Fig. 1).
3. If a handle was returned to the client, the client uses the handle to invoke the requested operation (3H in Fig. 1).
4. The server performs the requested operation and sends the results back to the client (3F and 4H in Fig. 1).

Fig. 1. High level interaction for the Adaptive ORB.
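The two paths through Fig. 1 can be compared by counting messages. The following is a minimal sketch, assuming that each numbered step in the figure is a single message costing a fixed inter-node delay and that queueing and acknowledgements are ignored. The function names and the additive latency model are illustrative assumptions and are not taken from the prototype.

```python
# Illustrative only: the step labels follow Fig. 1; everything else is assumed.
HANDLE_DRIVEN_STEPS = ["1", "2H", "3H", "4H"]  # bind, handle returned, invoke, reply
FORWARDING_STEPS = ["1", "2F", "3F"]           # bind, request forwarded, reply


def message_count(mode):
    """Number of messages for one request; mode is 'H' (handle) or 'F' (forward)."""
    steps = HANDLE_DRIVEN_STEPS if mode == "H" else FORWARDING_STEPS
    return len(steps)


def unloaded_latency_ms(mode, delay_ms, service_ms):
    """Latency of one request with no queueing: one delay per message plus service."""
    return message_count(mode) * delay_ms + service_ms
```

Under these assumptions the handle-driven path always costs one more message than the forwarding path, which is why forwarding is attractive as long as the forwarding queue at the agent stays short.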

When a client wants to obtain an object reference and performs an a-bind operation, the request is sent to the a-agent (instead of the Orbix agent). Whenever a request arrives, the a-agent determines the object reference and forwards the request to the appropriate object if the number of requests waiting to be forwarded is less than a threshold value, QL. Otherwise, the object handle is returned to the client, which uses this handle to contact the server directly and obtain the desired service. The a-agent consists of a main listener thread, a forwarding thread, and a returning thread (see Fig. 2). Two shared queues that are protected by mutual exclusion locks are used for dispatching requests. The basic idea is to switch between a handle-driven and a forwarding mode depending on the load at the a-agent. The listener thread in the A-ORB listens to the bind requests and obtains the handle associated with the requested object. Depending on the length of the forwarding queue and the threshold QL, either the request is inserted into the forwarding queue or the handle is inserted into the returning queue.

A high-level algorithm describing the operations of the a-agent is specified in the appendix. Salient features of the algorithm are briefly discussed in this section. In addition to registering the a-agent with the Orbix agent, the main listener thread performs a number of initialization operations such as reading the experiment configuration file, binding with the servers, and initializing resources used by the forwarding and the returning threads. The registration is necessary so that the clients can obtain the reference for the a-agent process from the Orbix agent. This step would not be required if one had access to the ORB source code and could directly modify the existing algorithm for the Orbix naming service. Similarly, the binding with the servers would be unnecessary if the a-agent were designed from scratch or if we could modify the software for the Orbix naming service. In addition to the method name and associated parameters, a client request includes a stringified handle of the client. This is required by the a-agent (server), which needs to return the server handle (result) to the client. The request also contains a number of performance parameters that depend on the particulars of the experiment being performed and are explained in Section 5. Requests arriving from the clients are processed by the listener thread. The request is saved and a response is sent back to the client. Since synchronous communication is used, the client remains blocked after generating the request. When the request has been received, the client is unblocked and starts waiting for a message from the agent or the server. Whether the message is sent by the server or by the agent depends on the forwarding or handle-returning decision made by the a-agent. If the length of the forwarding queue is less than the threshold, a forwarding decision is taken by the listener thread, which inserts the client request in the forwarding queue.

As shown in the appendix, if the number of waiting requests is greater than or equal to the threshold, the request is inserted in the returning queue. The returning thread in the a-agent periodically checks the returning queue. If the queue is not empty, it removes the first item from the queue and sends back the object handle by invoking a predetermined method in the client (identified by the stringified handle sent by the client along with the original service request). The main responsibility of the forwarding thread is to forward a request to the appropriate server object. It periodically checks the forwarding queue for client requests that need to be forwarded. When a request is forwarded, the stringified client handle in the request is used by the server for returning the result to the client. The server returns the results of the operation by invoking an appropriate method in the client application. Thus, the method in the client invoked by the server should contain the necessary application code that is to be executed after the results of the request are available. Both the forwarding and returning threads in the agent sleep for a specific duration of time for simulating the inter-node delay incurred in transferring the message. Note that when a process sleeps its CPU is free and can be used for running another ready process. A more detailed discussion of the a-agent implementation is provided in [28]. A discussion of the application in terms of the clients and servers is presented in the following section.

Fig. 2. The architecture of the agent in an A-ORB. The listener (main) thread receives a_bind() requests from clients and puts each request into either the forwarding queue or the returning queue, based on the forwarding queue length. The forwarding thread polls the forwarding queue and gets an entry (request) if the queue is not empty; otherwise, it sleeps for a given period of time. The request is forwarded to the selected server. The returning thread polls the returning queue and gets an entry (client's objref) if the queue is not empty; otherwise, it sleeps for a given period of time. The server's handle is returned to the client. Note: All servers' objrefs are obtained at the initialization phase and stored locally. The selection of servers is based on a round-robin strategy.
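The listener's decision can be sketched in a few lines. This is a simplified, single-threaded model written for illustration and is not the authors' algorithm from the appendix: the class and method names are hypothetical, the mutual exclusion locks and the sleep-based polling of the real threads are omitted, and the round-robin choice is made over the instances of a single service.

```python
from collections import deque
from itertools import cycle


class AAgentSketch:
    """Hypothetical model of the a-agent's forward-or-return decision."""

    def __init__(self, ql, servers):
        self.ql = ql                         # threshold QL
        self.forwarding_queue = deque()      # requests waiting to be forwarded
        self.returning_queue = deque()       # handles waiting to be returned
        self._next_server = cycle(servers)   # round-robin over server instances

    def on_bind(self, client, request):
        """Listener step: pick a server, then forward or return its handle."""
        server = next(self._next_server)
        if len(self.forwarding_queue) < self.ql:
            self.forwarding_queue.append((client, request, server))
            return "forward"
        self.returning_queue.append((client, server))
        return "return_handle"

    def forwarding_step(self):
        """Forwarding thread body: take one queued request, or None if empty."""
        return self.forwarding_queue.popleft() if self.forwarding_queue else None
```

With `ql=0` the test `len(queue) < ql` never succeeds, so every bind returns a handle. With `ql` at least as large as the number of clients, every bind is forwarded.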

4. Experimental environment

We have developed an adaptive ORB based on Orbix-MT. Using the commercial middleware product we have implemented special Unix processes that collaborate with the Orbix naming service to achieve the adaptive middleware system, and a synthetic application consisting of clients and servers. A discussion of the A-ORB architecture was presented in the previous section. A simple client–server application with two distinct services that use the ORB was built. A client process behaves in a cyclic manner. In each cycle, it makes one request to Server A and one to Server B. It performs an a-bind operation to obtain the server handle before every request. A synchronous communication mechanism is used: after a request is made, the client waits until the response is received. After receiving the response, it immediately proceeds to make a request to the other server. Note that after an a-bind request, the client may get a response from the a-agent or from the requested server. Two specific methods, one for handling each of these responses, are included in the client. Note that there is a small additional complexity for the clients using the adaptive ORB; methods for eliminating this are part of the plans for future research (see Section 7). The synthetic application provided flexibility in experimentation with various levels of different workload parameters such as the message size, the service time at each server, and the inter-node delay. Such an application is appropriate for the high-level research on adaptive middleware presented in this paper.

The software developed in our lab consists of the a-agent and the synthetic application, which includes a variable number of clients and four server processes. Experiments were performed by executing the system on a network of seven Sun workstations running under the Solaris 2.6 operating system. The workstations are inter-connected by a 10 Mbps LAN. Four workstations are used for running four server processes (two instances of Server A and two instances of Server B), each of which runs on a separate machine. The two copies of each server are independent and do not share data, but enable the system to handle more load. The agent is run on a separate workstation. The clients were distributed and run on the remaining workstations. When a service is requested from a particular server, the server process goes into a for loop and consumes a pre-determined amount of CPU time. Inter-node delays are simulated by forcing the sender process (thread) to sleep for a specific duration of time.


5. Results of experiments

We ran a number of experiments for investigating the performance of the A-ORB under different threshold values as well as system and workload parameters. A representative set of results is presented in this paper. More data can be found in [28]. Before presenting the results of the experiments, the factors used to control the workload of the experimental system are briefly described.

Number of clients (N): The total number of clients on the system.

Inter-node communication delay (D): Client and server nodes may be connected to the agent or among themselves via several nodes in a geographically dispersed system. Since each intermediate node uses a store-and-forward mechanism, the delay in transferring a message from a sender to a receiver is expected to increase with the number of such intermediate nodes. The impact of this inter-node communication delay on performance warrants investigation. The delay experienced in message delivery is simulated by forcing the client, server, or a thread in the a-agent process to sleep for D units of time. Except in the discussion provided in Section 5.4, which focuses on non-equidistant nodes, the delays between all node pairs are assumed to be the same.

Request service times (SA, SB): Some systems may make relatively low demands on server computation times whereas other systems may be computation intensive. The influence of server execution times SA at Server A and SB at Server B on the performance of the A-ORB is investigated.

Message size (L): The length of the message used for sending the request as well as for sending back the reply is represented by L.

Threshold (QL): Threshold value used by the A-ORB (defined in Section 3). The specific value chosen for QL determines the length of the forwarding queue at which the A-ORB switches from one mode to the other.

QL is an important parameter for the a-agent. A small QL implies that the A-ORB switches to the handle-driven mode for a smaller buildup of requests in the forwarding queue, whereas a large QL means that the system operates in the forwarding mode until the length of the forwarding queue reaches this large value. Four different QL values were investigated in most experiments. Using additional QL values is possible but would increase the number of experiments. The four levels of QL were found to be enough for demonstrating the utility of adaptive middleware that this paper focuses on. Further discussion on the determination of appropriate threshold values is presented in Section 6. Note that QL=0 corresponds to a purely handle-driven architecture since a handle is returned for each request to the corresponding client. Similarly, QL=24 gives rise to a purely forwarding architecture since with N≤24 every request is forwarded to the requested server.

In each experiment D, L, SA, and SB were held at fixed values whereas N was used as the variable factor. Note that by using different combinations of SA, SB, D, and L it is possible to investigate various types of computation-bound and communication-bound systems. In a computation-bound system the servers are heavily loaded whereas the message sizes are small. A communication-bound system is characterized by large messages and/or long inter-node delays but small service times for client requests. Various combinations of parameters have been used in our experiments to cover a broad range of such systems. The number and the choice of levels for each of the factors reflect a compromise between representativeness and an acceptable run time for experiments.

Performance measures: The performance measures of interest are briefly summarized.

The overall system throughput (X) in client requests per second: This is the average total system throughput that is obtained by the summation of the mean throughputs of all the clients. The throughput of a client is defined as the rate of completion of client cycles. This parameter is a measure of capacity for the system. Note that the throughput and mean cycle completion time (R) for a client are related by Little's law: N=XR.

Process and CPU Utilizations (U): Utilization measures often provide further insights into system performance and help in identifying hardware and software bottlenecks. The CPU utilization for a specific process is the proportion of time the CPU is used for executing the process. Agents as well as clients and servers execute cyclically in the experimental system. The utilization of a process (thread) is the ratio of the sum of its execution time and message transfer time (including waiting for message reception) to its cycle time. A hardware device such as a CPU becomes a performance bottleneck when it is utilized 100%. A software bottleneck is said to occur at a process when the process is busy 100% of the time in executing and waiting for a message it has sent to be received, while its underlying CPU is underutilized.

Most of these performance indices are computed from the measurement data obtained by placing Unix system calls for getting the time of day and CPU times at appropriate positions in the code. In each experiment, client cycles are repeated a large number of times with a specific combination of D, L, SA, and SB that are held constant in every cycle. A factor-at-a-time approach is adopted in our experiments and Table 1 presents a summary of all the levels of the parameters used in the experiments. In order to understand the effect of the variable parameter in an experiment on system behavior and performance, the fixed parameters are held at small values while the variable parameter covers a range of values.

Table 1
Parameters for the experiment sets

Main parameter            Experiment set 1      Experiment set 2      Experiment set 3
Number of clients         1, 2, 4, 8, 16, 24    1, 2, 4, 8, 16, 24    1, 2, 4, 8, 16, 24
Service time SA (ms)      10, 50, 250           10                    10
Service time SB (ms)      15, 75, 375           15                    15
Inter-node delay (ms)     0                     0                     125, 250, 500
Message size (bytes)      150                   4800, 9600, 19200     150

Fig. 3. The results of experiment set 1, L=150 bytes, D=0 ms: (a) SA, SB=10, 15 ms, (b) SA, SB=50, 75 ms, (c) SA, SB=250, 375 ms. [Each panel plots the overall system throughput X (client requests/s) against the number of clients (N) for QL=0, 1, 3, and 24.]

The number of client cycles in an experiment was large enough to produce a confidence interval that is within ±5% of the mean at a confidence level of 95% for the performance measure of interest. This is appropriate for the qualitative nature of the research presented in this paper.
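The post-processing of measured cycle times can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: the paper does not say how the confidence interval was computed, so a normal approximation is assumed here, the target width is left as a parameter (for example 0.05 for ±5%), and all function names are hypothetical. Throughput is derived from Little's law, N=XR, with X expressed in client cycles per second.

```python
import math
import statistics

Z_95 = 1.96  # assumed normal-approximation critical value for 95% confidence


def relative_half_width(samples):
    """Half-width of the 95% confidence interval as a fraction of the mean."""
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return Z_95 * std_err / mean


def enough_cycles(cycle_times_s, target):
    """True once the interval is tighter than target (a fraction of the mean)."""
    return len(cycle_times_s) >= 2 and relative_half_width(cycle_times_s) < target


def system_throughput(n_clients, cycle_times_s):
    """Little's law, N = X * R, solved for X with R the mean cycle time."""
    return n_clients / statistics.mean(cycle_times_s)
```

A run would keep collecting client cycles until `enough_cycles` holds for the measure of interest, and only then report the throughput.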

5.1. Experiment set 1: effect of server demands

The relationship between server demands and the performance of the A-ORB is investigated. Three different sets of server demands were experimented with and the results are presented in Fig. 3. Note that in addition to the workload parameters the cycle time is also affected by the communication overheads associated with the exchange of messages between different workstations. For the smallest value of SA, SB the best performance is achieved with a threshold of 24. Since N ≤ 24, QL=24 implies that the client request is always forwarded to the server. Since the server execution times are small, the expected waiting time in the forwarding queue is small in comparison to the additional delay incurred in returning a handle to the client. As a result, always forwarding the request results in the best performance. Using QL=1 or 3 produces comparable results only at small values of N. Using QL=0 implies that a handle is always returned to the client irrespective of the system state. Since returning the handle results in an extra message transfer, a purely handle-driven ORB exhibits the worst performance under a workload condition in which message transfer delays play a significant role in determining system throughput.

The importance of using an adaptive ORB becomes apparent when SA, SB=50, 75 ms (see Fig. 3(b)). Since the workload is characterized by a larger server execution time, the server demand has a stronger influence on response time in comparison to the previous case. For medium to high load, the A-ORB with QL=1 or 3 outperforms the purely handle-driven (QL=0) and forwarding (QL=24) ORBs. A forwarded message has to wait in the forwarding queue until the server completes its current cycle and is ready to receive the message. Since a larger server execution time tends to increase this queueing delay, an inferior throughput is achieved by the purely forwarding ORB in comparison to the adaptive system. The purely handle-driven ORB performs the worst for the same reasons given for the previous case.

Approximately the same throughput is achieved with the A-ORB irrespective of QL values when a set of higher server demands (SA, SB=250, 375 ms) is used (see Fig. 3(c)). The server is the performance bottleneck in this situation and the strategy adopted at the a-agent for coordination has little effect on performance. This is explained further with the data presented in Fig. 4, which shows the utilizations of different system components. For the two lower sets of server demands and QL=3, either the forwarding thread or the returning thread is the performance bottleneck (see Fig. 4(a) and (b)). At higher values of N, depending on the workload conditions, one of them saturates and its utilization becomes close to 1. It is interesting to note that the utilization of the agent CPU tends to decrease at higher values of N. This is because at higher system loads the a-agent threads spend a considerable amount of time waiting or sleeping for a message to be received, during which the CPU remains idle. The utilizations of the servers are observed to be much smaller in comparison (see Table 2). Note that because an agent thread waits for the reception of the message by the client or the server, its CPU utilization is lower than its process utilization. Since a synchronous mechanism is used for inter-node communication by Orbix-MT, a forwarding or returning thread waits until the transmitted message is received. With an increase in N this waiting time increases, and saturation of the agent process that contains these threads occurs approximately at N=4 for SA, SB=10, 15 ms and at N=8 for SA, SB=50, 75 ms. The performance bottleneck shifts to Server B for SA, SB=250, 375 ms, as captured by the server utilization close to 1 (see Fig. 4(c)). Since Server B is the performance bottleneck, the QL values used at the agent process have little impact on overall system performance. Although Fig. 4(c) presents results for QL=1, Server B is observed to be the bottleneck for other values of QL as well [28].

Fig. 4. Agent and server utilizations for experiment set 1: (a) SA, SB=10, 15 ms, (b) SA, SB=50, 75 ms, (c) SA, SB=250, 375 ms. Panels (a) and (b) plot the agent utilization UA for QL=3 (x: TM, o: TR, *: TF, +: CPUG); panel (c) plots the server utilization US for QL=1 (x: PA, o: PB, *: CPUA, +: CPUB).
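The role of the threshold in this experiment can be made concrete with a minimal sketch of the routing rule given in the Appendix, where a bind request is placed in the returning queue when the forwarding queue already holds at least QL requests. The function name and the string labels are hypothetical and chosen only for illustration.

```python
def route_request(forwarding_queue_length: int, ql: int) -> str:
    """Decide how the a-agent treats a newly arrived bind request.

    Returns "handle" when the request goes to the returning queue
    (a handle is sent back to the client) and "forward" otherwise.
    """
    if forwarding_queue_length >= ql:
        return "handle"
    return "forward"


# QL=0: the condition always holds, so a handle is always returned.
# QL=24 with at most 24 clients: an arriving request finds fewer than
# 24 requests queued, so every request is forwarded.
```

With this rule the two pure architectures are simply the extreme settings of QL, and intermediate values such as QL=1 or 3 give the adaptive behavior.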

5.2. Experiment set 2: effect of message size

The system throughputs achieved with three different message sizes are displayed in Fig. 5. As in the results of the previous experiment captured in Fig. 3, a purely handle-driven approach (QL=0) produces the worst performance in all the cases. As demonstrated in Figs. 5(a) and (b), comparable performances are achieved with the three other threshold values for L=4800 and 9600 Bytes. For larger message sizes, the performance of the adaptive ORB (QL=1 or 3) is significantly better than that of a purely forwarding ORB (QL=24). The rationale for the observed system behavior is briefly summarized.

The relative performance of a handle returning and a forwarding operation depends on the inter-node delay, message size, and the length of the forwarding queue. The delay in forwarding a message increases with an increase in the number of enqueued messages in the queue. Returning a handle bypasses the forwarding queue but incurs a delay in the returning queue. Although an additional message is required for returning the handle, note that the length of this message is very small. Whether a forwarding or handle returning operation is preferable depends on the relative values of the delays associated with these operations. At lower values of L the overall message transmission time is smaller, as a result of which the average size of the forwarding queue is likely to be smaller in comparison to workloads characterized by larger values of L. For the parameters used in the experiments whose results are presented in Figs. 5(a) and (b), always returning the handle produces the largest delay. Switching over to a forwarding mode when fewer than QL messages are waiting in the forwarding queue (QL=1 or 3) gives a much superior performance, especially at higher values of N. The comparable performance of a purely forwarding ORB (QL=24) indicates that most messages find a small number of messages in the forwarding queue, as a result of which most of them are forwarded with QL=1 or 3. More queueing delay is experienced with QL=24 when the message size is longer (L=19200 Bytes), and a purely forwarding ORB demonstrates an inferior performance in comparison to the adaptive ORB with QL=1 or 3 (see Fig. 5(c)).

Table 2
Server utilizations (D=0 ms, L=150 Bytes, QL=3)

Process/CPU   SA, SB=10, 15 ms   SA, SB=50, 75 ms
Server A      0.22               0.28
Server B      0.25               0.38
CPU A         0.21               0.27
CPU B         0.24               0.37

Fig. 5. The results of experiment set 2, SA, SB=10, 15 ms, D=0 ms: (a) L=4800 Bytes, (b) L=9600 Bytes, (c) L=19200 Bytes. Each panel plots the overall system throughput X (client requests/sec) for QL=0 (x), QL=1 (o), QL=3 (*), and QL=24 (+).

5.3. Experiment set 3: effect of inter-node delay

The importance of dynamically switching the mode of operation between a handle-driven and a forwarding mode is captured in Fig. 6. At higher values of N the adaptive ORB (QL=1 or 3) demonstrates a substantially better performance in comparison to a purely handle-driven (QL=0) and a purely forwarding (QL=24) ORB. For example, when D=125 ms and N=24, approximately a 75% higher throughput is achieved with QL=1 or 3 in comparison to QL=0 or 24 (see Fig. 6(b)). The results presented in Fig. 6(c) correspond to a workload in which D is the dominant parameter. A purely handle-driven architecture always incurs a message transfer delay that is greater than 4D. Clearly a forwarding operation is preferable when the length of the forwarding queue is small. Although a forwarding operation gives rise to fewer message transfers, a purely forwarding ORB may give rise to a substantial queueing delay when there is a large number of messages to be forwarded. The length of the forwarding queue is expected to increase with D. Thus, switching over to a handle-driven mode for larger queue sizes gives rise to a better performance, and the adaptive ORB with QL=1 or 3 demonstrates the highest throughput. As expected, the maximum throughput attained by the system when D=250 ms is lower in comparison to that with D=125 ms. The relative performances of the A-ORB with the four different values of QL are similar to those of the system with D=125 ms. It is interesting to note that a purely handle-driven ORB demonstrates a performance comparable to that of a purely forwarding ORB (see Fig. 6(b) for example). This indicates that at higher values of D, when the inter-node delay has the strongest influence on performance, the additional message transfer delay incurred by a handle-driven ORB is comparable to the queueing delay incurred by a message in the forwarding queue. The adaptive ORB demonstrates the best performance at higher values of N.

Fig. 6. The results of experiment set 3, SA, SB=10, 15 ms, L=150 Bytes: (a) D=50 ms, (b) D=125 ms, (c) D=250 ms. Each panel plots the overall system throughput X (client requests/sec) for QL=0 (x), QL=1 (o), QL=3 (*), and QL=24 (+).

Table 3
Inter-node delays for systems with non-equidistant nodes

Case   Client–server delay (ms)   Client–agent delay (ms)   Server–agent delay (ms)
1      50                         250                       250
2      250                        250                       50
3      250                        50                        250

5.4. Systems with non-equidistant nodes

The previous discussion was based on systems characterized by the same inter-node delay. An investigation of the A-ORB when the delay in message transfer between a given pair of nodes can be different from that between a different pair of nodes is presented next. We have considered three different cases (see Table 3). As in the previous set of experiments a client, a server, and the agent run on different workstations. In each case the inter-node delay between one pair of nodes is smaller than the inter-node delays between the other pairs, which are held at 250 ms. The other factors are held at fixed levels: L=150 Bytes, SA, SB=10, 15 ms. A discussion of the results captured in Fig. 7 is provided.

Case 1: Client closer to the server: The purely handle-driven ORB (QL=0) and the purely forwarding ORB (QL=24) perform comparably with one another (see Fig. 7(a)). The contributions of the inter-node delays to the client response times are 550 ms for a forwarding operation and 600 ms for a handle returning operation (see Table 3). The delay introduced in a purely forwarding ORB is slightly lower, but the agent behaves as a central dispatcher that becomes saturated at higher system load. Thus the small improvement in message transmission times seems to be balanced by the increase in queueing delays experienced by messages at the forwarding queue. As a result, both of these pure ORB architectures perform close to each other. Using an adaptive system (QL=1 or 3) produces benefits especially at higher system loads: approximately a 100% throughput improvement is observed, for example, at N=24.

Fig. 7. Systems with non-equidistant nodes, L=150 Bytes, SA, SB=10, 15 ms: (a) D_CA=250 ms, D_AS=250 ms, D_CS=50 ms, (b) D_CA=250 ms, D_AS=50 ms, D_CS=250 ms, (c) D_CA=50 ms, D_AS=250 ms, D_CS=250 ms. Each panel plots the overall system throughput X (client requests/sec) for QL=0 (x), QL=1 (o), QL=3 (*), and QL=24 (+).

Case 2: Server closer to the agent: As shown in Fig. 7(b), the purely forwarding ORB (QL=24) performs significantly better than a purely handle-driven ORB (QL=0). This is because fewer messages are used in the forwarding operation and a smaller inter-node delay is incurred in forwarding the message from the agent to the server. As a result, approximately twice the throughput is achieved with the purely forwarding ORB. The adaptive ORB (QL=1 or 3) demonstrates an even higher performance in comparison to the purely forwarding ORB.

Case 3: Client closer to the agent: As shown in Fig. 7(c), the purely handle-driven ORB (QL=0) performs better than a purely forwarding ORB (QL=24). An extra message is exchanged for the handle returning operation in comparison to the forwarding operation. Nevertheless, the lower inter-node delay in returning the handle to the client, which is closer to the agent node, gives rise to an approximately 100% higher throughput for the purely handle-driven ORB in comparison to the purely forwarding ORB at higher values of N. It is encouraging to note that a further improvement in performance is achieved by using an adaptive system: over 25% increase in throughput is achieved at N=24 by using QL=1 or 3 in comparison to the purely handle-driven ORB.

The results presented in Cases 1–3 highlight the importance of using a hybrid middleware system that adapts to the current system state.
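The inter-node delay contributions quoted for Case 1 (550 ms and 600 ms) can be reproduced with a small sketch. The message paths assumed here are inferred from those two numbers and from the earlier remark that a handle-driven request incurs at least 4D: three hops for a forwarded request (client to agent, agent to server, server to client) and four hops for a handle-driven one (client to agent, agent to client, client to server, server to client). The function name is hypothetical.

```python
def inter_node_delay_contributions(d_ca, d_as, d_cs):
    """Return (forwarding_ms, handle_ms) inter-node delay totals for one request.

    d_ca, d_as, d_cs: client-agent, agent-server and client-server delays in ms.
    Queueing and processing times are deliberately ignored.
    """
    forwarding = d_ca + d_as + d_cs        # client->agent, agent->server, server->client
    handle = d_ca + d_ca + d_cs + d_cs     # client->agent, agent->client, client->server, server->client
    return forwarding, handle


# (D_CA, D_AS, D_CS) for the three cases, as listed in the caption of Fig. 7.
CASES = {1: (250, 250, 50), 2: (250, 50, 250), 3: (50, 250, 250)}

for case, delays in CASES.items():
    print(case, inter_node_delay_contributions(*delays))
```

These totals ignore queueing at the agent and the time an agent thread spends blocked during a synchronous send, so they are not expected to predict the measured throughput ordering on their own.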

5.5. Effect of multithreading

As indicated earlier in Section 5.1, due to the synchronous nature of inter-process communication the agent in the adaptive ORB may become a software bottleneck. Consider, for example, a scenario in which the returning queue is empty and multiple requests are waiting at the forwarding queue. When the forwarding thread is blocked waiting for a server to receive a forwarded request, the underlying CPU will remain idle. This will result in a high thread utilization for the forwarding thread and a smaller CPU utilization at the agent. With multiple forwarding threads it is possible for a second forwarding thread to use the CPU while the first is blocked waiting for the server to receive the message. A similar argument applies to the use of multiple returning threads. The utility of using multiple forwarding and returning threads is analyzed in this section.

A representative set of results for a given set of parameter values is presented in Figs. 8–10. More data are available in [28]. We have used the smaller server execution times (10, 15 ms) in this experiment. High values of SA and SB lead to a computation bound system in which the A-ORB configuration will have a very minor impact on performance (see Fig. 3(c) for example). Many telecommunication systems are communication intensive and present a communication demand that is heavier in comparison to the computation demands of the servers. Figs. 5 and 6 demonstrate that the effect of QL on performance is also reduced at high values of L and D. As a result, we have avoided the choice of very high values for these parameters in our experiments. The chosen set of parameters leads to scenarios in which the A-ORB configuration parameters are expected to have a significant impact on performance. This is appropriate for the investigation of the effect of multithreading on performance. Analyzing the effect of multithreading on real applications forms an interesting direction for future research.

The degree of multithreading refers to the number of forwarding and returning threads in the a-agent. As shown in Fig. 8, over 60% improvement in throughput is achieved for all threshold values as the number of threads is increased from 1 to 2. Note that as shown in Fig. 8(b), the adaptive ORB (QL=1, 3) performs better than the purely handle-driven (QL=0) and the purely forwarding (QL=24) ORB on the multithreaded system as well. The impact of using a higher degree of multithreading for QL=24 is captured in Fig. 9. As shown in Fig. 9(a), as the numbers of forwarding and returning threads are increased from 1 to 4, a substantial improvement in throughput is observed. A further increase in the degree of multithreading produces a decrease in system throughput (see Fig. 9(b)). This anomalous behavior is an outcome of the excessive overhead incurred at higher numbers of threads. The overhead is a function of the total number of threads as well as the polling period associated with the forwarding and the returning queues. The polling overhead includes both the associated CPU time and the context switch time spent in performing wait and signal operations on the semaphore protecting the forwarding or the returning queue. The shorter the polling period, the greater the queue polling overhead. By increasing the polling period to 100 ms, for example, this anomalous behavior can be avoided (see Fig. 10). Note that increasing the number of threads beyond 4 for the given set of parameters does not produce any significant increase in performance. This is because with five forwarding and returning threads the software bottleneck at the agent is alleviated and the server processes become the new performance bottlenecks [28]. As a result, a further increase in the degree of multithreading at the agent does not produce any significant performance benefit.

This paper focuses primarily on an investigation of the viability of an adaptive ORB. An a-agent that uses polling was implemented because of its simplicity. It was sufficient to demonstrate the effectiveness of the adaptive architecture. More efficient ORB designs that use condition variables and avoid the polling loop are under investigation.

Fig. 8. Throughput of the A-ORB with D=125/125/125 ms, L=4800 Bytes, SA, SB=10, 15 ms: (a) degree of multithreading=1, (b) degree of multithreading=2. Each panel plots the overall system throughput X (client requests/sec) for QL=0 (x), QL=1 (o), QL=3 (*), and QL=24 (+).

Fig. 9. Throughput of the A-ORB with polling period=1 ms, D=125/125/125 ms, L=4800 Bytes, SA, SB=10, 15 ms, QL=24: (a) degree of multithreading=1–4 (curves C1 to C4), (b) degree of multithreading=5–8 (curves C5 to C8). Each panel plots the overall system throughput X (client requests/sec).

Fig. 10. Throughput of the A-ORB with polling period=100 ms, D=125/125/125 ms, L=4800 Bytes, SA, SB=10, 15 ms, QL=24: (a) degree of multithreading=1–4 (curves C1 to C4), (b) degree of multithreading=5–8 (curves C5 to C8). Each panel plots the overall system throughput X (client requests/sec).

Table 4
Comparison of the original ORB with a purely forwarding ORB

N   Throughput (A-ORB with QL=24)   Throughput (original ORB)
1   0.3085 client requests/s        0.2376 client requests/s
2
4
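The alternative mentioned at the end of this section, a queue that blocks on a condition variable instead of polling, can be sketched as follows. This is not the prototype's code: the class and function names are hypothetical, and Python's `threading.Condition` stands in for whatever primitive an ORB implementation would actually use.

```python
import threading
from collections import deque


class BlockingRequestQueue:
    """FIFO queue whose consumers sleep on a condition variable instead of polling."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, request):
        with self._cond:
            self._items.append(request)
            self._cond.notify()            # wake one waiting worker thread

    def get(self):
        with self._cond:
            while not self._items:         # re-check after every wake-up
                self._cond.wait()
            return self._items.popleft()

    def __len__(self):
        with self._cond:
            return len(self._items)


def forwarding_worker(queue, forward, stop_token=None):
    """Forward requests until the stop token is dequeued."""
    while True:
        request = queue.get()
        if request is stop_token:
            break
        forward(request)
```

A worker blocked in `get` consumes no CPU and performs no periodic wake-ups, so the overhead no longer depends on a polling period. Several workers can share one queue, which corresponds to a degree of multithreading greater than 1.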

6. Discussion and insights into system behavior

The experimental results demonstrate the utility of an adaptive middleware system. These results provide insights into system behavior and performance that we believe are important for both designers and users of distributed systems. A short discussion of these observations is presented in this section.

The superiority of the adaptive ORB: The adaptive ORB is observed to produce a significant performance benefit in most situations. However, in a computation bound system in which the servers are characterized by CPU execution times that are large in comparison to the communication delays, the architecture of the agent seems to have a small impact on performance: for the different QL values experimented with, the adaptive ORB is observed to perform comparably with a purely handle-driven or a purely forwarding ORB (see Fig. 3(c)). A handle returning operation uses an additional message in comparison to a forwarding operation, whereas the forwarding operation requires a request to wait in a centralized forwarding queue. Thus, for a low system load, a forwarding operation produces a better performance, whereas a handle-driven operation gives rise to a smaller queueing delay when N is large and the forwarding queues are long. An adaptive ORB combines the good attributes of both of these architectures by switching from a forwarding to a handle-driven mode when the number of messages in the forwarding queue reaches a pre-determined threshold.

It was interesting to observe that even without having access to the source code it was possible to outperform the original ORB product. Results of an experiment with D=500 ms and L=4800 Bytes are presented in Table 4. The performance of the original ORB, Orbix-MT, is compared with a purely forwarding A-ORB (QL=24). The comparison is biased towards the original ORB for two reasons. First, a more efficient design of the A-ORB would be possible if we had access to the source code of Orbix-MT. Second, since we did not have access to the Orbix source code, we could not simulate the inter-node delay between the client and the Orbix naming service in the same way as in the case of the A-ORB. The client process for the original ORB is forced to sleep for 500 ms after receiving the message, whereas in the case of the A-ORB the inter-node delay is simulated by forcing the sender thread to sleep. Since with QL=24 the system is always in the forwarding mode and the forwarding thread always sends the message to the server, the sleeping of the thread led to a saturation of the a-agent at higher values of N, thus biasing the comparison in favour of the original ORB. The results show that at lower load a purely forwarding architecture performs better than the original ORB, which uses a handle-driven architecture: approximately 30% improvement is observed at N=1. At N=4, the purely forwarding ORB saturates and the handle-driven ORB starts performing better.

Table 5
The impact of threshold value on performance

Threshold   System throughput
0           2.7611 client requests/sec
1
2
3
4
5
6
7
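The improvement of approximately 30% quoted for N=1 in the comparison with the original ORB follows directly from the two throughput values reported in Table 4. A short check, with a hypothetical helper name:

```python
def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


A_ORB_QL24 = 0.3085     # client requests/s, purely forwarding A-ORB, N=1 (Table 4)
ORIGINAL_ORB = 0.2376   # client requests/s, original ORB, N=1 (Table 4)

print(f"{relative_improvement(A_ORB_QL24, ORIGINAL_ORB):.1%}")  # prints 29.8%
```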

The impact of workload parameters: Although the adaptive ORB produces the best performance in most of the experiments we have conducted, the degree of improvement achieved by using the adaptive architecture depends on the workload parameters. A significant benefit accrues from using an adaptive agent for communication bound systems that are characterized by large message sizes and/or high inter-node delays with server demands that are small in comparison. Approximately a 100% improvement in performance over a purely handle-driven and a purely forwarding ORB is observed for an inter-node delay of 250 ms (see Fig. 6(c)). As mentioned earlier in this section, using an adaptive middleware does not provide any significant benefit for a computation bound system. A strong motivation for using an adaptive ORB exists for balanced systems characterized by both communication and computation workloads.

Non-equidistant nodes: The relative performances of a purely handle-driven and a purely forwarding ORB depend on the relative inter-node delays among the nodes running the client, the server, and the agent. When the client is closer to the agent in comparison to the server, a purely handle-driven ORB demonstrates a superior performance (see Fig. 7(c)), whereas a purely forwarding ORB seems to be preferable when the agent is closer to the server (see Fig. 7(b)). Irrespective of the relative inter-node delays used in our experiments, the adaptive ORB demonstrates the best performance (see Fig. 7).

Software bottlenecks: An interesting observation is that performance is often limited by a saturated agent process instead of a hardware device. Since synchronous communication is used, the forwarding or the returning thread in the agent can spend 100% of the time executing and waiting for the message it has sent to be received. As a result, the agent process demonstrates a utilization of 1.0 whereas the CPU it is running on may be underutilized. Conventional computer systems are characterized by hardware bottlenecks such as a CPU or a disk drive that saturates and limits system throughput. The software bottleneck is a relatively new phenomenon that is observed to occur on client–server systems [1,13].

Effect of multithreading: An effective tool for combating software bottlenecks is multithreading. If multiple forwarding threads exist, it is possible for two of these threads to forward two different client requests to different servers. Thus multiple forwarding operations can occur concurrently and reduce the queueing delay experienced by the client. A similar argument can be made for using multiple returning threads. By increasing the number of these threads we can expect system performance to improve. Our experiments demonstrate that a substantial benefit in performance is observed with the incorporation of multiple forwarding and returning threads in the a-agent. The multithreading overhead needs to be considered carefully when determining the number of threads.

Choice of threshold: This paper has focused on the viability of the adaptive ORB concept. The determination of appropriate threshold levels QL is beyond the scope of this paper. The threshold value is likely to be a function of the inter-node delays as well as other workload parameters. All possible threshold values QL=0 to 7 are investigated for N=8 and D=125 ms with the other parameters fixed at the levels described in Experiment Set 1. The results are presented in Table 5. Throughput is observed to increase as QL is increased from 0 to 1. A further increase in QL decreases system performance, indicating that QL=1 is the best choice for the given workload parameters. A larger QL value is better, however, for SA, SB=10, 15 ms with the other workload parameters fixed at the levels described in Experiment Set 1. The appropriate threshold level may depend on inter-node delays as well. For example, a small threshold value may be required when the client is closer to the agent in comparison to the server, whereas a higher threshold value may be appropriate when the server is closer to the agent. Saturation of the forwarding thread gives rise to a software bottleneck at the agent and is detrimental to system performance. QL values should be chosen in such a way that such bottlenecks are avoided. If multiple forwarding and returning threads are used, the threshold level is likely to depend on the multithreading level. Analytic and simulation models of systems may be used for determining the threshold levels that will give rise to high performance. QL can also be a tuning parameter that is adjusted by the system administrator to achieve a desired performance level. Such tuning parameters are well known in the context of various systems. Examples include the working set parameter for virtual memory management and the time slice duration used in a Round Robin CPU scheduling strategy. As in these cases, a model may be used to determine an approximate QL value that may be fine-tuned during system operation. Another possibility is to develop an a-agent management strategy that will periodically record system performance and use the measured values of performance indices (utilization of forwarding and returning threads, for example) to select a QL value that is appropriate for the current system state. Techniques for threshold value selection form an important direction for future research.

Table 6
Throughput with N=8, L=4800 Bytes, SA, SB=10, 15 ms, and D=0 ms

Threshold value     Throughput
QL=0                7.0702 client requests/s
...                 ...
Dynamic threshold   11.6080 client requests/s

Dynamic threshold: We have done some experimentation with the use of a dynamic QL value that is a function of the length of the returning queue. Some preliminary results for a system in which all the inter-node delays are the same are reported. Let W be the delay (including queueing delay) from agent to client, X′ the delay (including queueing delay) from agent to server, Y the delay from client to server, Z the delay from server to client, Qf the number of requests in the forwarding queue, Qr the number of requests in the returning queue, and D the inter-node delay.

The delay in completing a request depends on which queue at the agent is used. When the forwarding queue is used, the delay is X′+Z. When the returning queue is used, the delay is W+Y+Z. Noting that X′=Qf D, W=Qr D, and Y=D, it can be easily verified that if we need to select the queue with the shorter delay, the forwarding operation must take place when

\[ Q_f < Q_r + 1 \]

The handle returning operation must be used otherwise. We have conducted some preliminary experiments and the results of one such experiment are reported in Table 6. More data are available in [28]. As shown in Table 6, the dynamic threshold-based system outperforms the static threshold-based system for all QL values. The generalization of the dynamic threshold technique to systems with non-equidistant nodes is warranted. Experimentation with more system and workload parameters forms an important direction for future research.
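The dynamic threshold rule derived above can be checked numerically with a brief sketch. It uses the delay expressions from the text (forwarding delay Qf D + Z, handle returning delay Qr D + D + Z); the function and parameter names are hypothetical.

```python
def completion_delays(q_f, q_r, d, z):
    """Return (forwarding_delay, handle_delay) for a request arriving at the agent.

    q_f, q_r: current lengths of the forwarding and returning queues.
    d: inter-node delay; z: delay from server to client.
    """
    forwarding = q_f * d + z         # agent-to-server delay plus server-to-client delay
    handle = q_r * d + d + z         # agent-to-client, client-to-server, server-to-client
    return forwarding, handle


def should_forward(q_f, q_r):
    """Dynamic threshold rule: forward when Qf < Qr + 1, return a handle otherwise."""
    return q_f < q_r + 1
```

For any positive D the comparison of the two delays reduces to the queue lengths alone, which is why the rule needs no knowledge of D or Z when all inter-node delays are equal.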


7. Conclusions

The popularity of distributed object-oriented computing, which combines the advantages of OO technology with the benefits of distributed processing, is rising continuously. Distributed OO systems often consist of components that are built using diverse technologies. CORBA is a standard for middleware that is required for achieving inter-operability in such a heterogeneous environment. The client–agent–server interaction architecture employed by the middleware has a significant impact on performance. The performance of a handle-driven and a forwarding architecture is discussed in [1]. An adaptive architecture that dynamically switches its behavior between a handle-driven ORB and a forwarding ORB is presented in this paper. Based on a commercial middleware product called Orbix-MT, we have implemented a performance prototype of an adaptive ORB on a network of Sun workstations. A synthetic workload is used to investigate the impact of different workload parameters on performance. Results of the experiments indicate that a significant performance benefit can accrue from such an adaptive middleware system. The agent in the A-ORB combines the good attributes of the handle-driven and forwarding ORBs and demonstrates superior performance relative to a purely handle-driven and a purely forwarding ORB under various workload conditions. A performance benefit of approximately 100% is observed for a communication-bound system that is characterized by high inter-node delays. The paper shows that even without access to the source code of a commercial product it is possible to modify the interaction architecture and achieve a significant performance benefit. An even higher performance can be expected from a system where the source code of the agent is accessible for re-engineering. The paper demonstrates the advantages of an adaptive middleware system and provides a number of insights into the relationship between different workload parameters and performance. The agent process is observed to become the performance bottleneck at high system load on certain systems. Using multiple forwarding and returning threads is an effective way of alleviating such bottlenecks. The adaptive ORB is observed to outperform the purely handle-driven and the purely forwarding ORB on such multithreaded systems as well.

As in the case of tuning parameters used by operating systems, the determination of a suitable value for the threshold is important. As discussed in Section 6, both simulation and analytic models can be run multiple times with different values of QL, and the QL value that produces the best performance for a given system and workload can be chosen as an initial value for QL. Further tuning of the parameter may be done on the system until a desired performance level is achieved. Preliminary research on systems that use a dynamic threshold looks promising and further investigation is warranted. Another method that avoids the a priori selection of QL is described in Section 6.

This paper has focused on basic principles underlying adaptive middleware and is primarily concerned with whether or not an adaptive ORB with a hybrid client–agent–server architecture can significantly improve system performance. Our experiment-based analyses show that a substantial performance benefit can result from such an architecture for a broad range of system load. Using the insights gained from this research and focusing on a more effective implementation of the system than was possible in the performance prototype is the logical next step. Implementation of the A-ORB can be significantly improved if it is designed from scratch or if the source code of the commercial ORB is available to the a-agent designer for redesign and modification. As described in Section 4, one of the
shortcomings of the experimental prototype is that the client needs to be aware of the adaptive nature of the ORB and should be able to receive a handle from the agent or the result of the operation from the server in response to an a-bind request. A desirable improvement is to make this hybrid nature of the system transparent to the application (client) programmer. One possible approach for achieving this is to make a-bind a local "system call". The invocation of the middleware agent, the additional method invocation (when a handle is returned by the a-agent) and the reception of the reply can be handled by the code in this system call on behalf of the client. The client can simply perform an a-bind operation and remain blocked until the results of the desired operation are available. Devising such a system component that can act as an interface and hide the hybrid nature of the system from the client, which always expects the result of the method invocation after initiating the bind operation, is worthy of investigation.

Acknowledgments

This research was supported by the Natural Sciences and Engineering Research Council of Canada and Communications and Information Technology Ontario. Thanks are due to Diwakar Krishnamurthy for his help with the preparation of the final manuscript.

Appendix

Algorithm for a-agent

The listener thread (main thread):-
1. Initialize and set up the performance prototype.
2. while(1) { //top of the main event loop
    2.1. if any request arrives {
        Save the request parameters and return immediately // the client can thus be unblocked.
    }
    2.2. if (request type == ABIND) {
        if (queue length of forwarding queue >= QL) {
            put the request into the returning queue
        }
        else {
            put the request (a set of parameters) into the forwarding queue
        }
    }
    //other types of requests are possible but are not included because they are not necessary for understanding the adaptive nature of the a-agent
}//end of while loop

The forwarding thread:-
1. while(1) {
    1.1. if (forwarding queue is empty) {
        sleep for a specified duration of time // the queue is checked periodically avoiding a busy waiting loop
    }
    else {
        1.1.1. get the parameters sent by the client from the forwarding queue
        1.1.2. select the server index (e.g., A1 or A2, B1 or B2) using a Round Robin strategy //Each server X (X=A or B) has two copies: X1 and X2
        1.1.3. sleep "delayAgentServer" ms //simulation of inter-node delay.
        1.1.4. forward the request to a server selected in step 1.1.2.
    }
}//end of while loop

The returning thread:-
1. while(1) {
    1.1. if (returning queue is empty) {
        sleep for a specified duration of time // the queue is checked periodically avoiding a busy waiting loop
    }
    else {
        1.1.1. get the parameters sent by the client from the returning queue.
        1.1.2. select the server index (e.g., A1 or A2, B1 or B2) using a Round Robin strategy.
        1.1.3. unstringify ClntOBjRefStr to obtain the sending client’s handle
        1.1.4. sleep "delayClientAgent" ms //simulation of inter-node delay.
        1.1.5. return the handle of the server selected in step 1.1.2 to the client.
    }
}//end of while loop
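The queue-length test that drives the adaptive behavior of the a-agent can be illustrated with a small, self-contained Python sketch. It is not the Orbix-MT prototype: the class name AAgent, the method names, the polling interval and the in-memory log are all assumptions made for illustration. A single replicated server type stands in for the A and B server pairs, the simulated inter-node delays are omitted, and a blocking queue read with a timeout replaces the periodic sleep used in the pseudocode.

```python
import itertools
import queue
import threading

POLL_INTERVAL = 0.001  # seconds; assumed value, stands in for the periodic sleep


class AAgent:
    """Illustrative a-agent: routes a-bind requests by forwarding-queue length."""

    def __init__(self, servers, ql):
        self.ql = ql                       # threshold QL from the listener algorithm
        self.forwarding = queue.Queue()    # requests the agent forwards to a server
        self.returning = queue.Queue()     # requests answered with a server handle
        self._servers = itertools.cycle(servers)  # Round Robin over server copies
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._threads = []
        self.log = []                      # (mode, request, server) tuples

    def abind(self, request):
        """Listener step 2.2: choose the mode for one a-bind request."""
        if self.forwarding.qsize() >= self.ql:
            self.returning.put(request)    # agent is loaded: return a handle
            return "handle-driven"
        self.forwarding.put(request)       # agent is lightly loaded: forward
        return "forwarding"

    def _next_server(self):
        with self._lock:                   # both worker threads share the iterator
            return next(self._servers)

    def _worker(self, q, mode):
        while not self._stop.is_set():
            try:
                request = q.get(timeout=POLL_INTERVAL)
            except queue.Empty:
                continue                   # queue empty: wait and check again
            self.log.append((mode, request, self._next_server()))
            q.task_done()

    def start(self):
        for q, mode in ((self.forwarding, "forwarding"),
                        (self.returning, "handle-driven")):
            t = threading.Thread(target=self._worker, args=(q, mode), daemon=True)
            t.start()
            self._threads.append(t)

    def stop(self):
        self._stop.set()
        for t in self._threads:
            t.join()
```

With ql=2 and no worker threads running yet, the first two calls to abind are placed on the forwarding queue and the third goes to the returning queue, because the forwarding queue length has then reached the threshold. Calling start() afterwards drains both queues.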


References

[1] I. Abdul-Fatah, S. Majumdar, Performance comparison of architectures for client–server interactions in CORBA, Proceedings of the IEEE 18th International Conference on Distributed Computing Systems (ICDCS’98), Amsterdam, May 1998, pp. 2–11.
[2] I. Abdul-Fatah, S. Majumdar, Performance of CORBA-based client–server architectures, IEEE Trans. Parallel Distributed Systems 13 (2) (2000) 111–127.
[3] R. Adler, Distributed coordination models for client/server computing, IEEE Computer, April 1995, pp. 14–22.
[4] Borland Inprise, Visibroker: CORBA technology from Inprise, http://www.borland.com/visibroker, 1999.
[5] M.J. Fontenot, Software congestion, mobile servers, and the hyperbolic model, IEEE Trans. Software Eng. 15 (1989) 947–962.
[6] P. Haggerty, K. Seetharaman, The benefits of CORBA-based network management, Commun. ACM 41 (10) (1998) 73–80.
[7] M. Henning, S. Vinoski, Advanced CORBA Programming with C++, Addison-Wesley, Longman, Reading MA, 1999.
[8] Iona Technologies, Orbix Programmers’ Guide, Dublin, Ireland, 1997.
[9] R. Koster, T. Kramp, Structuring QoS-supporting services with smart proxies, Proceedings of the Middleware 2000, Conference, New York, April 2000, pp. 273–288.
[10] B. Li, K. Nahrstedt, QualProbes: middleware QoS profiling services for configuring adaptive applications, Proceedings of the Middleware 2000, Conference, New York, April 2000, pp. 256–272.
[11] H.W. Lockhart Jr., OSF DCE guide to developing distributed applications, McGraw-Hill, Inc., New York, NY, 1994, p. 10 020.
[12] P. Mellinar-Smith, L. Moser, V. Kalogeraski, P. Narasimhan, The Realize middleware for replication and resource management, Proceedings of the Middleware’98, Conference, The Lake District, England, September 1998, pp. 123–138.
[13] J.E. Neilson, C.M. Woodside, D.C. Petriu, S. Majumdar, Software bottlenecking in client–server systems and rendezvous networks, IEEE Trans. Software Eng. 21 (9) (1995) 776–782.
[14] OMG, Real-time CORBA—joint revised submission (request for proposal), OMG-TC Document Orbos/99-02-12, March 1999.
[15] Object Management Group, The common object request broker: architecture and specification, 2.3 Edition, June 2000.
[16] C. O’Ryan, F. Kuhns, D.C. Schmidt, O. Othman, J. Parsons, The design and performance of a pluggable protocol framework for real-time distributed object computing middleware, Proceedings of the Middleware 2000, Conference, New York, April 2000, pp. 372–395.
[17] R. Otte, P. Patrick, M. Roy, Understanding CORBA: The Common Object Request Broker Architecture, Prentice-Hall, A Simon & Schuster Company, Upper Saddle River, NJ, 1996, p. 07458.
[18] N. Parlavantzas, G. Coulson, G.S. Blair, On the design of reflective middleware platforms, Proceedings of the RM-2000, Workshop on Reflective Middleware, New York, April 2000, pp. 3–4.
[19] M. Roman, M.D. Mickunas, F. Kon, R. Campbell, LegORB and ubiquitous CORBA, Proceedings of the RM-2000, Workshop on Reflective Middleware, New York, April 2000, pp. 1–2.
[20] D.C. Schmidt, Evaluating architectures for multithreaded object request brokers, Commun. ACM 41 (10) (1998) 62–72.
[21] D.C. Schmidt, A. Gokhale, T. Harrison, G. Parulkar, A high-performance endsystem architecture for real-time CORBA, IEEE Commun. Mag. 35 (2) (1997) 72–78.
[22] D.C. Schmidt, T.H. Harrison, E. Al-Shaer, Object-oriented components for high-speed network programming, in: Proceedings of the First Conference on Object-oriented technologies, Monterey, CA, USENIX, June 1995, pp. 21–38.
[23] D.C. Schmidt, S. Vinoski, Modelling distributed object applications, C++ Report, February 1995.
[24] D.C. Schmidt, S. Vinoski, Comparing alternative client-side distributed programming techniques, SIGS, C++ Report, May 1995.
[25] D.C. Schmidt, S. Vinoski, Comparing alternative server programming techniques, SIGS C++ Report, October 1995.
[26] D.C. Schmidt, S. Vinoski, Comparing alternative programming techniques for multi-threaded servers, SIGS C++ Report, February 1996.
[27] K. Seetharaman, Introduction, Commun. ACM 41 (10) (1998) 34–36.
[28] E.-K. Shen, Adaptive middleware systems, M.Eng. Thesis, School of Computer Science, Carleton University, Ottawa, Canada, May 2000.
[29] E.K. Shen, S. Majumdar, I. Abdul-Fatah, High performance adaptive middleware for CORBA-based systems, Proceedings of the ACM Conference on Principles of Distributed Computing (PODC’00), Portland, July 2000, pp. 233–241.
[30] J. Siegel, OMG overview: CORBA and OMA in enterprise computing, Commun. ACM 41 (10) (1998) 37–43.
[31] M. Stal, The broker architectural framework, Object-Oriented Programming Systems, Languages and Applications (OOPSLA’95), Maitland, Florida, U.S.A.
[32] A.T. Van Halteren, A. Noutash, L.J.M. Nieuwenhuis, M. Wegdam, Extending CORBA with specialized protocols for QoS provisioning, Proceedings of the International Symposium on Distributed Objects and Applications (DOA’99), Edinburgh, September 1999, pp. 318–329.

Shikharesh Majumdar is a Professor and the director of the Real Time and Distributed Systems lab at the Department of Systems and Computer Engineering at Carleton University in Ottawa, Canada. He holds an M.Sc. and a Ph.D. degree in computational science from the University of Saskatchewan, Saskatoon, Canada. Before his graduate studies in Canada he did a Bachelor of Electronics and Telecom Engineering and a Post-Graduate Diploma in Computer Science (hardware) from Jadavpur University in India and completed the Corso Di Perfezionamento from Politecnico Di Torino in Italy. Dr. Majumdar has worked at the R&D Wing of Indian Telephone Industries (Bangalore) for six years. His research interests are in the areas of Web-based systems, operating systems, middleware and performance evaluation. He has received two awards related to publications in these areas. Dr. Majumdar is a member of ACM and IEEE. He is the Associate Editor of the IEEE TC on Operating Systems bulletin and was a Distinguished Visitor for the IEEE Computer Society (1998–2001).

E-Kai Shen completed his Master’s in Computer Science from the School of Computer Science at Carleton University. His Master’s thesis was concerned with adaptive middleware systems. Mr. Shen has worked for a number of years at Nortel Networks in Ottawa, Canada. Currently he is with Compal Communication in Taiwan.

Istabrak Abdul-Fatah received his B.Eng. Degree in Electrical Engineering-Computer Science Option from the University of Baghdad in Iraq in 1979, and then he worked for ten years on Nortel’s MeridianPBX systems in the Middle East. Mr. Abdul-Fatah was awarded the M.Eng. Degree in Electrical Engineering from Carleton University in Canada in 1997. He has worked as a Software Architect at Innovance Networks in Ottawa with extensive experience in CORBA and EJB/XML based distributed computing platforms. He possesses expertise in the design, implementation and optimization of OODBMS/ORDBMS and LDAP Schemas and Applications. Mr. Abdul-Fatah has received a number of academic and industrial awards for distinguished achievements and technical leadership. Before joining Innovance Networks, he worked for a number of years at Nortel Technologies in Ottawa where he played a key role in the OSS/J Trouble Ticket Reference Implementation. Mr. Abdul-Fatah is currently with Pinpoint Selling in Ottawa where he is developing CRM software.
