J. Parallel Distrib. Comput. 65 (2005) 1171–1189  www.elsevier.com/locate/jpdc

Resource control for large-scale distributed simulation system over loosely coupled domains

Azzedine Boukerche a,∗, Armin Mikkler b, Alessandro Fabri c

a SITE, University of Ottawa, Canada
b University of North Texas, USA

c Nortel, USA

Received 1 September 2003; accepted 1 April 2005

Abstract

Collaborative and distributed simulation environments that span across large interconnection networks, such as the Internet, are a promising way of supporting applications distributed over multiple computing sites. However, communication-intensive distributed applications may suffer from variable communication delays, jitter, and latency experienced over the network. We term a loosely coupled domain the combination of multiple sites connected through an internetwork.

In this paper, we focus on large-scale distributed simulations based upon local time warp deployed over several sites connected through an interconnection network (a loosely coupled domain). We propose and study a mechanism to tune elaboration according to varying communication performances, to reduce resource consumption and rollback costs that result from unpredictable network behavior. The mechanism is composed of two parts that operate in a completely distributed way: a local elaboration rate control and a message flow control. The experimental study is conducted through emulation to exert a control on injected delays. Our results show that the mechanism is able to globally tune the entire simulation, so as to reduce the escalation of allocated computational resources and the cost of rollbacks. As delays disappear, the mechanism allows elaboration to resume its normal rate.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Distributed simulation; Loosely coupled systems; Local time warp; Resource control; Flow control

1. Introduction

As the size and complexity of a simulation model grows, the resource demand becomes prohibitive, and it may be necessary to distribute the simulation across multiple hosts. To this end, a significant body of literature on parallel/distributed simulation has been proposed (see [5,9–12,19]). There are two basic approaches to parallel simulation: conservative and optimistic. While conservative synchronization techniques rely on blocking to avoid violation of dependence constraints, optimistic methods rely on detecting synchronization errors at run-time and then on recovery using a rollback mechanism. In both approaches, the simulated system is modeled as a network of logical processes (LP) which communicate only via message passing.

This work was partially supported by the Canada Research Chair Program, Canada Foundation for Innovation, OIT/Ontario Distinguished Researcher Award, and NSERC Grants.

∗ Corresponding author. Fax: +1 613 562 5664.
E-mail addresses: [email protected] (A. Boukerche), [email protected] (A. Mikkler).

0743-7315/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2005.04.006

A combination of both conservative and optimistic paradigms has also been investigated. Tropper et al. [2,6] have introduced clustered time warp (CTW), a hybrid approach which uses time warp between clusters of LPs and a sequential simulation algorithm within each cluster. They have investigated the performance of their scheme using several digital logic circuits. The results reported in [3] have shown a great improvement when compared to a pure time warp implementation. They have also developed several checkpointing algorithms for use with CTW, each of which occupies a different point in the spectrum of possible trade-offs between memory usage and execution time [1].

Another hybrid methodology, local time warp (LTW), has also been proposed by Rajaei et al. [30]. In LTW, logical processes are grouped into clusters. LTW allows conservative communication among clusters and optimistic computation within each cluster, as opposed to CTW. The local optimism allows portions of the simulation to be performed in advance of the conservative horizon imposed by inter-cluster communications. However, the larger the gap between the conservative horizon and internal elaborations, the larger the ratio of elaborations which are likely to be rolled back, i.e., undone.

Due to unpredictable delays, LTW can suffer from high resource demands as well as large rollback overheads at a cluster involved in a delayed communication, due to ongoing internal activities. Therefore, means for slowing down internal cluster computations are necessary. On the other hand, if internal activities are slowed down, the elaboration rate may not be able to keep up with the rate of messages from other clusters which do not experience delays. Therefore, means for slowing down other clusters are also necessary.

In this paper, we propose a resource control mechanism for LTW simulations over an internetwork which is composed of both an intra-cluster scheme and an inter-cluster scheme [24,25]. The intra-cluster scheme is designed to reduce the local elaboration rate when the effects of a delay are detected on the resource demand for outbound messages. The inter-cluster scheme is designed to tune the message flow at the sender to the rate of event acceptance of a destination cluster. Existing approaches to limiting the optimism of a parallel/distributed simulation (see [35] and the references therein) are not immediately applicable in our case because they do not take into account the network behavior, which is always assumed utterly reliable. In addition, most approaches are based on an expensive global view or even synchronization of the entire computation, which is infeasible in a loosely coupled environment.

An emulator of a LTW simulator has been implemented to study the proposed scheme in the presence of delays artificially injected into communication. In what follows, we first define the basics of simulation, parallel/distributed approaches to simulation, and the LTW technique in Section 2. Then, we describe the effects of delays on a LTW simulation in Section 3. We present the resource control mechanisms, i.e., our main objective in this paper, in Section 4. The experimentation and the results are reported in Section 5, and conclusions and future work in Section 7.

2. Simulation techniques

By computer-aided simulation (simply simulation hereafter) we mean the use of a program, the simulator, to generate histories of a model. The model is an abstract representation of a system which is the object of the study through simulation. The model's histories are studied as representative of possible system behaviors. Therefore, the simulator must be designed so as to reproduce the model's behavior over model time with respect to the set of state variables which are of primary concern to the study.

Simulation techniques are divided into sequential and parallel/distributed. In the next subsections, we briefly introduce the basics of such techniques and approaches by outlining the basic concepts of discrete event simulation [13,29], present a conservative scheme [14] and the time warp optimistic scheme [20], and finally present a methodology for parallel discrete event simulation called LTW [30], which combines conservative and optimistic approaches [18]. The simulation we are concerned with is based on LTW.

2.1. Discrete-event simulation

In a discrete-event simulation the model evolution is defined by instantaneous events. Each event corresponds to a transition in a portion of the model state. The transition consists in the modification of a set of state variables. Each event has a simulation time associated with it, called timestamp, which defines the occurrence time of the event. Each event may in turn generate new events. Generated events have a timestamp in the future of the generating event's timestamp.

The generation of new events and the dependency of corresponding transitions on state variables that previous events may have updated define a relation of causal order among events. For a simulation to be correct, such a relation must be satisfied by all the events. Therefore, events must be processed according to the causal order.
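To make this processing discipline concrete, the following minimal sketch (not from the paper; the Event class and the example actions are illustrative assumptions) pops events from a priority queue in timestamp order and requires every newly scheduled event to lie strictly in the future of the event that generated it:

    import heapq
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass(order=True)
    class Event:
        timestamp: float
        action: Callable = field(compare=False)  # updates state, returns new events

    def run(initial_events: List[Event], state: dict, end_time: float) -> None:
        """Sequential discrete-event loop: always elaborate the lowest-timestamp event."""
        queue = list(initial_events)
        heapq.heapify(queue)
        while queue:
            ev = heapq.heappop(queue)
            if ev.timestamp > end_time:
                break
            for new_ev in ev.action(state, ev.timestamp):
                assert new_ev.timestamp > ev.timestamp, "realizability condition"
                heapq.heappush(queue, new_ev)

    # Example: a packet arrival that schedules its own departure one time unit later.
    def arrival(state, now):
        state["in_system"] = state.get("in_system", 0) + 1
        return [Event(now + 1.0, departure)]

    def departure(state, now):
        state["in_system"] -= 1
        return []

    if __name__ == "__main__":
        s = {}
        run([Event(0.0, arrival)], s, end_time=10.0)
        print(s)  # {'in_system': 0}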

In order to formally define the causal relation among events we give the following definitions:

• Let S be the state of the model, composed of a set of n state variables, S = {v_1, v_2, …, v_n}.
• Let E be the set of events in the model, with |E| = m. Each event e_i has a timestamp t_i associated with it, which is assumed to be ordered and unique for simplicity: t_i < t_j ⇔ i < j. An event e at simulation time t defines a transition in the model state representing the simulation up to time t.
• For each event e_i, we define w(e_i) ⊂ S, the set of state variables whose values are updated by e_i; r(e_i) ⊂ S, the set of state variables which are read during the event's elaboration; and s(e_i) ⊂ E, the set of events which are scheduled by e_i. The latter all have a timestamp higher than t_i (realizability condition). In the case that w(e_i) = ∅, e_i is said to be a null event, since it has no effect on the state.

The direct causal dependency →_e among events is defined as follows: e_i →_e e_j if either or both of the following requirements hold:

• e_j ∈ s(e_i),
• i < j ∧ ∃v ∈ S · (v ∈ w(e_i) ∩ r(e_j)) ∧ (∀k · i < k < j ⇒ v ∉ w(e_k)).
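As an illustration of these definitions (a sketch, not the authors' formalism in code; the Event representation and the helper name are assumptions), the function below tests the direct dependency e_i →_e e_j:

    from dataclasses import dataclass, field
    from typing import Set, List

    @dataclass
    class Event:
        ts: int                                                  # unique, ordered timestamp t_i
        reads: Set[str] = field(default_factory=set)             # r(e_i)
        writes: Set[str] = field(default_factory=set)            # w(e_i)
        schedules: List["Event"] = field(default_factory=list)   # s(e_i)

    def direct_dep(ei: Event, ej: Event, all_events: List[Event]) -> bool:
        """True if e_i ->_e e_j according to either requirement above."""
        if ej in ei.schedules:                 # first requirement: e_j in s(e_i)
            return True
        if ei.ts >= ej.ts:
            return False
        # second requirement: e_i writes a variable that e_j reads,
        # with no intervening write by any event between them.
        for v in ei.writes & ej.reads:
            overwritten = any(ei.ts < ek.ts < ej.ts and v in ek.writes
                              for ek in all_events)
            if not overwritten:
                return True
        return False

    # Tiny usage example with three events touching variable "x".
    e1 = Event(1, writes={"x"})
    e2 = Event(2, writes={"y"})
    e3 = Event(3, reads={"x"})
    print(direct_dep(e1, e3, [e1, e2, e3]))  # True: e1 writes x, e3 reads it, nothing overwrites it in between
    print(direct_dep(e2, e3, [e1, e2, e3]))  # False: no shared variable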


Fig. 1. Network model—example of causal relation (two adjacent nodes A and B exchanging packets and load messages).

Note that the first requirement has the consequence that i < j (because generated events have larger timestamps than their generating event's), whereas the second requirement explicitly states so. Therefore, a consequence of the direct causal dependency definition is that event e_i happens before event e_j in simulation time. Because of the dependency, event e_i must be elaborated before event e_j for the simulation to be correct. While this is obviously satisfied if the dependency is due to the first requirement (e_j cannot exist before e_i is elaborated), in general it is not automatically satisfied if the dependency is due to the second requirement and the first one does not hold. In the latter case, a safe way of elaborating events is to elaborate them in timestamp order, in some sense mimicking in the simulation what happens in the model.

Given the direct causal dependency, the causality relation <_e among events is defined as follows: e_i <_e e_j if there exists a sequence of p indices {k_1, k_2, …, k_p} such that

e_i →_e e_{k_1} →_e e_{k_2} →_e ··· →_e e_{k_p} →_e e_j.

Those events e_i, e_j for which neither e_i <_e e_j nor e_j <_e e_i holds are said to be concurrent. Note that concurrent events may actually be elaborated in either order without correctness problems. On the other hand, by elaborating events in timestamp order a stronger requirement than the causal relation is actually enforced. In the following, we show an example which points out the problems due to causal violations.

Consider a model of a packet-switched communication network, and consider two adjacent nodes, see Fig. 1. Suppose node A generates packets which may be routed to a number of adjacent nodes, B included. Suppose also that routing decisions are based on load information of adjacent nodes, periodically communicated by the nodes. To fix ideas, suppose node A generates packets at a rate of one per time unit, and node B sends load information to A at the same rate and with an initial delay smaller than one time unit. Therefore, in the model, packets and load information messages strictly alternate, starting with a packet. Hence, A always has the most up-to-date information about B's load to take routing decisions, and B's load may depend on A's previous routing decision. For instance, if A routes one packet to B, then the next load message from B reflects the overhead due to that packet, and the next routing decision at A will be taken according to B's communicated load.

In such a model, if Route_n indicates the event corresponding to the routing decision for packet number n, and BLoad_n indicates the event corresponding to the receipt of B's load message number n at node A, then

BLoad_n →_e Route_{n+1}

and for all those packets i that A routes to B,

Route_i →_e BLoad_i.

If B's load is initially the lowest one, and routing decisions are taken so as to route packets through nodes with small utilization, a simulation which elaborates first all the Route events, and only afterwards all the BLoad events, would come to the erroneous conclusion that all the traffic from A is routed through B, because B's load is actually not taken into account in the simulation.

Parallel approaches to simulation exploit the concurrency among events to achieve speed-ups on the time required to perform a simulation. As previously noted, concurrent events may be elaborated in any order, therefore they may actually be elaborated in parallel even though their timestamps are not the same. A parallel simulator is composed of a set of q LPs which interact by means of messages, each carrying an event and its timestamp, thus called event messages. Each logical process P_h is responsible for managing a subset of the model state S_h ⊂ S, called local state. Local states are disjoint, the union of which is S. Each event e that P_h may receive is relative to a transition in P_h's local state S_h, i.e., r(e) ∪ w(e) ⊆ S_h. Those events which are scheduled as a consequence of e's elaboration are sent as event messages to suitable processes for elaboration. Therefore, a process P_i is connected to a process P_j through a unidirectional outgoing channel if the underlying model requires events at P_i to generate events at P_j. Hence, the communication between processes in a parallel simulation may be asymmetric.

Causality enforcement may be performed in two ways: by conservatism or by optimism. Next, we present two methodologies, one for each approach, which are then combined in the LTW scheme.

2.2. Conservative methodology

Conservative approaches enforce event causality by requiring that each process elaborates an event only if causality is guaranteed to be satisfied. The stream of event messages produced by a process must be non-decreasing in timestamp order, and event messages must be received in the order they were sent (FIFO requirement). In this way, any process P_h exactly knows what is called its horizon of causal safeness: by defining an input channel time It_hj, for each process P_j which may send messages to P_h, as the highest timestamp of messages received from P_j, P_h may safely elaborate events with timestamp less than the minimum of all It_hj, since no event message will ever reach P_h with a smaller timestamp.


Note that this is true only if the FIFO requirement is satisfied.

One problem with conservative approaches is the possibility of starvation due to cycles of processes, each waiting for its predecessor to send a new message which advances the corresponding input channel time. This starvation problem must be solved by means of either avoidance mechanisms or detection and recovery mechanisms. The null-message approach due to Chandy and Misra [14] is in the former category. In this scheme, each process sends a null-message along all of its outgoing channels whenever it advances its own local time due to elaboration of new messages. The null-messages have a timestamp which indicates, according to the horizon of causal safeness, the lowest timestamp of messages the process will ever send, including the process lookahead. Optimizing strategies have been proposed in order to reduce the number of null-messages to be used.
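A minimal sketch of a conservative process that uses null messages, under the FIFO assumption stated above (class and method names are illustrative, not from the paper):

    class ConservativeProcess:
        """Conservative LP: elaborates only events below its horizon of causal safeness."""

        def __init__(self, name, predecessors, successors, lookahead):
            self.name = name
            self.lookahead = lookahead
            self.channel_time = {p: 0.0 for p in predecessors}   # It_hj, one per predecessor
            self.successors = successors
            self.lvt = 0.0
            self.pending = []   # (timestamp, event) received but not yet elaborated

        def horizon(self):
            # No future message can carry a timestamp below this value.
            return min(self.channel_time.values(), default=float("inf"))

        def receive(self, sender, timestamp, event=None):
            # FIFO per channel, so the channel time only advances.
            self.channel_time[sender] = max(self.channel_time[sender], timestamp)
            if event is not None:
                self.pending.append((timestamp, event))

        def step(self, send):
            """Elaborate all safe events, then emit a null message on every out-channel."""
            safe = [p for p in self.pending if p[0] < self.horizon()]
            for ts, ev in sorted(safe, key=lambda p: p[0]):
                self.lvt = ts
                self.pending.remove((ts, ev))
                # ... elaborate ev here, possibly calling send(succ, new_ts, new_event) ...
            # Null message: a promise never to send anything below horizon + lookahead.
            for succ in self.successors:
                send(succ, self.horizon() + self.lookahead, None)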

2.3. Time warp methodology

Time warp is based on an optimistic approach and enforces the causal order among events as follows: events are greedily simulated in timestamp order, and rollbacks are used to restore a correct state if a violation in the timestamp order is detected. In order to perform a rollback on different processes, the concept of anti-messages is introduced. Each message has a sign; positive messages indicate ordinary event elaborations, whereas negative messages indicate the need to retract any corresponding elaboration which was possibly performed. Two messages which differ only in sign are called anti-messages.

Each process manages an input queue, where received messages are stored and elaborated in timestamp order, without being discarded. The timestamp of the latest elaborated event is referred to as the local virtual time (LVT). If a negative message is received, the message and the corresponding positive message are both annihilated, that is, they disappear from the input queue. Negative copies of positive messages which have been sent by the process are kept in the output queue. If any message is received which is a straggler with respect to the LVT, then a rollback is initiated.

Rollbacks are made possible by means of state checkpointing. The whole state of the process is checkpointed into the state queue at a pace given by the state checkpointing discipline. A rollback is performed in three phases:

• Restoration: the latest state (with respect to simulation time) valid before the straggler's timestamp replaces the current state, and successive states are discarded from the state queue.

• Cancellation: the negative copies of messages which were produced at simulation times greater than the straggler's timestamp are sent, to possibly activate rollbacks at the destinations.

• Coasting-forward: the effective state which is valid at the straggler's timestamp is computed starting from the restored state, elaborating those messages with a timestamp up to the straggler's; during this phase no message is produced (see the sketch below).
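A compact sketch of these three phases on a single process, assuming periodic state checkpoints and stored negative copies of sent messages (all identifiers are illustrative, not the authors' code):

    import copy

    class TimeWarpProcess:
        def __init__(self, initial_state):
            self.state = initial_state
            self.lvt = 0.0
            self.state_queue = [(0.0, copy.deepcopy(initial_state))]  # checkpoints (ts, snapshot)
            self.input_queue = []    # (ts, sign, event), elaborated in timestamp order
            self.output_queue = []   # negative copies of messages this process has sent

        def rollback(self, straggler_ts, send):
            # Restoration: latest checkpoint strictly before the straggler's timestamp.
            idx = max(i for i, (ts, _) in enumerate(self.state_queue) if ts < straggler_ts)
            restore_ts, snapshot = self.state_queue[idx]
            self.state = copy.deepcopy(snapshot)
            self.state_queue = self.state_queue[:idx + 1]

            # Cancellation: send the negative copies produced after the straggler's timestamp.
            for neg in [m for m in self.output_queue if m["ts"] > straggler_ts]:
                send(neg)                       # may trigger rollbacks at the destinations
                self.output_queue.remove(neg)

            # Coasting forward: rebuild the state valid at the straggler's timestamp,
            # re-elaborating intermediate events without producing any message.
            for ts, sign, event in sorted(self.input_queue, key=lambda m: m[0]):
                if restore_ts < ts < straggler_ts and sign > 0:
                    self.elaborate(event, ts, emit=False)
            self.lvt = straggler_ts

        def elaborate(self, event, ts, emit=True):
            # Apply the event's transition to self.state; send new messages only if emit is True.
            pass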

Note that the negative messages that are sent out during a cancellation have the same timestamp as the corresponding positive messages. Each of such messages will cause a rollback at its destination if the LVT at the destination was already advanced further than the message's timestamp. Since processes greedily elaborate events, the longer the time a negative message takes to be delivered, the higher the probability that the corresponding positive message has already been elaborated, hence the higher the probability that a rollback cascades. If delays are large enough, a rollback may cascade throughout the entire simulation, thus involving both a large message overhead due to negative messages, and a large computational cost due to rollbacks.

For termination purposes, and to minimize the amount of storage which is required for performing a rollback, a global measure of the progress of the simulation is needed. The Global Virtual Time (GVT) of a time warp simulator is defined as the minimum of all the local virtual times of the processes of the cluster, and of all the timestamps of messages in transit within the cluster. It indicates the minimum simulation time at which a causal violation may occur, thus it can be used to commit the safe portion of the simulation. GVT is usually computed through a mechanism which runs independently of the simulation. Many different ways of computing GVT are possible [23].

2.4. Local time warp

Fig. 2. A model to simulate (a nine-node network; the numbers are node and link delays, and the dashed circles delimit the three LTW clusters).

The LTW approach consists essentially of partitioning the model into clusters C_i, each of which is simulated through a time warp mechanism. Inter-cluster coordination is performed according to a conservative methodology [30]. Basically, communication out of a cluster is controlled by an I/O process which decides which outgoing messages are safe to leave the cluster, so that no causal violation may influence other clusters and no rollback is able to spread over different clusters. Incoming messages are instead delivered to the proper processes when they are received. For a generic cluster C_i, the decision is made according to the minimum of all the input channel times It_ij, according to the GVT of the cluster itself, GVT_i, and according to the minimum timestamp increment in the cluster, L_i. The latter substitutes the lookahead we defined for processes in the conservative scheme, and is defined as the minimum timestamp increment that any event message in the cluster may experience with respect to the timestamp of the scheduling event. GVT_i in the LTW scheme indicates the minimum simulation time at which a causal violation may occur within the cluster, provided that no messages will ever be received from other clusters with smaller timestamps. Output messages with timestamps ts which are smaller than the minimum of GVT_i and the minimum of all the input channel times, plus the minimum cluster lookahead, are safe to leave the cluster toward other clusters. Let this value be the output horizon of causal safeness, OH_i:

OH_i = min{min_j{It_ij}, GVT_i} + L_i.    (1)

Then, messages allowed to leave the cluster are those for which

ts ≤ OH_i.    (2)

According to the null-message mechanism, whenever a message leaves the cluster or OH_i advances (as in Expression (1)), null-messages are sent to all the adjacent clusters.
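A small sketch of the I/O process decision implied by Eqs. (1) and (2); the identifiers and the send/send_null callbacks are illustrative assumptions:

    def output_horizon(gvt_i, input_channel_times, lookahead_i):
        """Eq. (1): OH_i = min{ min_j It_ij, GVT_i } + L_i."""
        return min(min(input_channel_times.values()), gvt_i) + lookahead_i

    def release_safe_messages(out_queue, oh_i, send, send_null, neighbours):
        """Eq. (2): a stored message may leave the cluster only if ts <= OH_i.
        Null messages then advertise the new bound to all adjacent clusters."""
        for msg in sorted(out_queue, key=lambda m: m["ts"]):
            if msg["ts"] <= oh_i:
                send(msg)
                out_queue.remove(msg)
        for c in neighbours:
            send_null(c, oh_i)

    # Example: GVT_3 = 12, channels from clusters 1 and 2 at times 9 and 15, lookahead 3.
    print(output_horizon(12, {"C1": 9, "C2": 15}, 3))  # min(9, 12) + 3 = 12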

In the following, we present an example of computations performed by a cluster of a LTW simulator. The simulation model we consider is a communication network composed of nine nodes, connected as shown in Fig. 2. The number inside each node represents the delay of a message traversing that node. The number close to a link represents the delay of a message traveling along that link. We suppose that each node is simulated by a logical process, and processes are grouped into three LTW clusters, represented by the dashed circles in Fig. 2. New packets may be produced at each node of the network according to some stochastic rule which depends on the sequence of messages received so far. For example, this could represent the activity of a user who sends and receives e-mail messages, and the inter-production time between two messages depends on how many messages that user received, since he needs to read them. In Fig. 3, we show the internal structure of a cluster. We point out the existence of the I/O process, which is responsible for computing the GVT, for keeping the input channel times It, and for storing outgoing messages into the Out queue for future delivery. Since each cluster can receive and send messages from two other clusters, we depicted two bidirectional connections of the I/O process with other clusters in Fig. 3.

Fig. 3. The components of a cluster (three LPs with their LVTs, and the I/O process maintaining GVT, It and Out).

In the remainder of this example, we show some steps in the activities of the processes in the cluster at the top of Fig. 2, without showing the cluster's I/O process. In the pictures, event messages representing packet delivery events are black squares, null messages are white squares, and events which implement the stochastic process of message production at every node are squares with a cross inside. Null messages can only be sent to other clusters, and event messages to other clusters are retained in the output queue until the output horizon of the cluster makes them safe to be sent. Every time a production event is elaborated, a new packet delivery message may be sent and a new production event is scheduled at the same process, at a future time. Next to every message we show its timestamp; negative messages have a negative sign following their timestamp. Next to every process we show the set of messages to elaborate, and inside the process we show its own LVT. Next to every picture we show the cluster's GVT and It. We also show an advancement in the cluster's output horizon as ts(null), and the new composition of the output queue as Out. We only show timestamps of messages in the output queue, and we distinctly show messages added to the output queue as a separate set. For instance, in Fig. 5, Out = { } ∪ {3} means that the output queue was empty at the beginning of the depicted step, and that a message with timestamp 3 is added to the queue at that step. We do not show the reduction in the set of messages in the event queue due to messages being sent to other clusters at the beginning of a step. The timestamp of sent messages can be derived by subtraction between the composition of Out at the end of a step and the one at the beginning of the next step. With It = {t_1, t_2} we indicate that the two input channels have a clock equal to t_1 and t_2, respectively. We assume for simplicity that the GVT is always computed, and that any event message takes one step to reach its destination.

Fig. 4. Step 1—starting the simulation (It = {0, 0}, GVT = 0, L = 1 + 2 = 3, Out = { } ⇒ ts(null) = 3).

Fig. 5. Step 2—first event elaborations (It = {0, 0}, GVT = 0, Out = { } ∪ {3}).

Each process of the cluster starts with a packet production event at simulation time zero (Fig. 4). Thus, GVT = 0, and all input channels have a zero-time clock. Due to the assumptions on service and transmission times, the cluster lookahead is L = 1 + 2 = 3 (the service time at a node plus the transmission time). Null messages with timestamp 3 are sent to the other clusters. At the same step the other clusters do the same, hence in Step 2 (Fig. 5) null messages are received by the cluster, thus increasing both input channel clocks at Step 3. At Step 2 the processes of the cluster start their elaborations: each elaborates its own production event, thus scheduling a new production event, and sending out a packet delivery message with a timestamp given by the process's LVT plus the lookahead. The result is the time at which a packet is received at the destination, in the model.

At Step 3 (Fig. 6) all the packet delivery messages inside the cluster reach their destinations, but in the meantime every process has started elaborating its own new production event. In particular, note that this causes a causal violation in the rightmost process, which is detected only at Step 4 (Fig. 7). The elaborations at Step 3 cause the production of new packet delivery messages, and a new production event at each process. Due to the increase in GVT, the message with timestamp 3 in the output queue which was produced at Step 2 may leave the cluster, together with null messages with timestamp 5 sent to the other clusters. The message with timestamp 7 produced by the rightmost process cannot leave the cluster yet, and is retained in Out.

Fig. 6. Step 3—first message leaving the cluster (It = {3, 3}, GVT = 2, ts(null) = 5, Out = { } ∪ {7} = {7}).

Fig. 7. Step 4—performing the first rollback (It = {3, 3}, GVT = 3, ts(null) = 6, Out = {7} ∪ {6} ∪ {7−} = {6}).

Fig. 8. Step 5—second message leaving the cluster (It = {3, 3}, GVT = 3, Out = { }).

At Step 4, because of the rollback, the message with timestamp 7 is annihilated in Out; the rightmost process also retracts a portion of its computations, which includes the production event at simulation time 4 it elaborated, thus canceling the production event at simulation time 8 which causally followed. The leftmost process leads to an increase in GVT, hence the cluster is able to send new null messages. After the null message production, a message with timestamp 6 is inserted into the set Out of the cluster, and it will be sent out at the next step.

At Step 5 (Fig. 8) the rightmost process has completed its rollback and is able to continue its elaborations: the packet delivery event it elaborates at simulation time 3 does not schedule other messages, which means that the rightmost process is its final destination. Note that at Step 6 (Fig. 9) the elaboration of the production event at time 4 is not the same as the one before the rollback, since no packet delivery message is sent. At Step 5 new messages from other clusters are received, which will be elaborated only at Step 6: they increase both input channel clocks, and the contemporary increase in GVT allows the cluster to send out new null messages with timestamp 7. At Step 6 two rollbacks are initiated at the top and the leftmost processes, as a consequence of the receipt of the message at simulation time 4 from another cluster for the leftmost process, and of the receipt of the message at simulation time 8 from the leftmost process for the top one. Both rollbacks cause the production of negative messages and the retraction of a portion of the elaborations.

Fig. 9. Step 6—two more rollbacks (It = {4, 7}, GVT = 4, ts(null) = 7, Out = { }).

Fig. 10. Step 7—after the rollbacks (It = {4, 7}, GVT = 4, Out = { } ∪ {8}).

At Step 7 (Fig. 10) the previous rollbacks are completed, but the receipt of the negative message at time 8 by the top process will cause another rollback at Step 8 (Fig. 11). Both at Steps 7 and 8, a message with timestamp 8 is produced and needs to be sent to other clusters; both messages are retained in the set Out because it is not safe to send them, and only an increase in the input channel clock associated with the other cluster which lies on the left of this one will allow both messages to be sent. Note here another change due to the production discipline: the elaboration of the production event at time 5 at the leftmost process (Step 8, Fig. 11) does not send out a message to the top process as in Step 5 (Fig. 8), but to the cluster on the left.

Fig. 11. Step 8—waiting for external messages (It = {4, 7}, GVT = 5, Out = {8} ∪ {8} = {8, 8}).

Note that no rollback is ever able to spread out of a cluster, and that the logical processes of a cluster are able to continue their elaborations with minimal concern for messages from other clusters. On the other hand, as we will see in the next section, delayed messages may result in resource management problems.

3. LTW simulation across the Internet

With the emergence of collaborative environments that span across the Internet, applications will tend to be distributed over multiple computing domains. The paradigm of meta-computing is based on the fact that computational resources are ubiquitous, and the topology of the environment is transparent to any application. A simulation environment for the analysis of very large high-speed networks clearly represents a candidate application for a meta-computing environment. A simulation environment based on the LTW methodology appears to be most suitable for this task. Nevertheless, in order to distribute an LTW simulation across the Internet several potential problems must be considered.

First, in order to assure the correctness of simulation results, the simulation environment must be implemented on top of a reliable communication infrastructure. In general, a simulation is sensitive to loss, and even to out-of-order delivery of messages. In fact, out-of-order delivery may result in a causal violation in a conservative scheme (see the FIFO requirement, Section 2.2). While the Internet as such does not provide any guarantee for reliable delivery of messages, the underlying protocol must be capable of dealing with dynamic network behavior. In general, reliable message exchange must be implemented at the communicating hosts at the transport layer. Most messages in the simulation environment carry simple simulation events which are to be scheduled at a remote process. Even though these messages are likely to be small and asynchronous, and would lend themselves to be transmitted as UDP/IP packets, the need for reliable delivery suggests the use of a connection-oriented communication protocol, e.g., TCP/IP [15,33].

Second, a more serious problem is that it is generally impossible to bound the message delay in the Internet, unless QoS is implemented everywhere. Messages may traverse network areas which temporarily suffer from very high utilization. These messages may experience an above-average delay. Due to the uncertain dynamics of the network, messages exchanged between processes at different segments will experience varying delays (jitter) over the time of the experiment. In addition to message delays due to the dynamics of network utilization, messages may be unable to reach their destinations due to network faults, such as node or link failures. Fault situations in large communication environments may persist for long periods of time, during which local computations are still active. Due to continued local activity (a simulation may last some days, hence local activities as well), it will be shown that resources are quickly exhausted. A distributed simulation environment must be able to appropriately adapt to various delay scenarios without external intervention. In the following sections, we first point out a series of problems which might arise due to message delays by means of two examples, then we present our mechanism for solving such problems.

Fig. 12. Intra-cluster effect of message delays (the message T_{1-2} from C1 to C2 is delayed; each cluster commits only up to the timestamp of the last message received from the other).

3.1. The effect of delays

With respect to the LTW simulation, message delays result in the following problems, which increase in prominence as the delay increases:

• An increase in the probability of rollback, and in the complexity of actions necessary to re-synchronize the simulation.
• A growth in the amount of memory required for storing all the data necessary for performing rollbacks.
• A growth in the amount of memory required for storing output messages to other clusters.

These problems, as the following two examples will show, are due to the continuation of local cluster elaborations in the presence of delay. In Figs. 12 and 13 the notation T_{i-j} depicts the timestamp of a message from cluster C_i to C_j.

Example 1. Consider the case where the simulator is composed of two clusters C1 and C2. Let us assume that due to network conditions messages from C1 to C2 are delayed significantly. This scenario is depicted in Fig. 12. According to Eq. (2), cluster C2 sends to C1 only messages with event time ts ≤ min{It_21, GVT_2} + L_2 = OH_2. While It_21 is clamped to the time of the last message that has reached C2 from C1, in this case T_{1-2}, the global virtual time GVT_2 advances due to events that are local to C2. Let E_OH2 represent the set of events that have been executed in cluster C2 since the last time OH_2 was advanced. Further, let D_{1,2} denote the delay by which a message from C1 to C2 is affected. Clearly, E_OH2 is, among other things, a function of D_{1,2}; in particular |E_OH2| can be considered proportional to D_{1,2}. All the events in E_OH2 are susceptible to rollback when a new message is finally received by C2, and in the case that an event is rolled back, all the events which causally depend on it will be rolled back as well. By these considerations, the probability of a rollback, when the delayed message is finally received, increases as D_{1,2} increases. The corresponding rollback complexity increases as well, due to the increase in E_OH2. In addition, C2 must store the simulation history generated by the elaboration of the events in E_OH2, in accordance with the local optimistic scheme, and all the messages destined to C1 not meeting Eq. (2). Let p_out represent the probability that an event elaboration schedules an event to another cluster, and let Out_OH2 be the set of messages to C1 produced since the last time OH_2 was advanced. Then, |Out_OH2| ∼ p_out × |E_OH2|.
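As a rough numeric illustration of these proportionalities (the figures below are made up, not measured): if C2 elaborates events at some aggregate rate while the message from C1 is delayed by D_{1,2}, then roughly that rate times D_{1,2} events are exposed to rollback, and about p_out times as many messages accumulate in the output queue.

    def exposure_estimate(events_per_s, delay_s, p_out):
        """Back-of-the-envelope sizes for Example 1 (illustrative values only)."""
        e_oh2 = events_per_s * delay_s   # |E_OH2|: events elaborated past the clamped horizon
        out_oh2 = p_out * e_oh2          # |Out_OH2| ~ p_out * |E_OH2|
        return e_oh2, out_oh2

    # e.g. 1000 events/s, a 30 s delay and p_out = 0.5:
    print(exposure_estimate(1000, 30, 0.5))  # (30000, 15000.0)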

Example 2. In this example, we consider a scenario in which a cluster C3 may receive messages from clusters C1 and C2. As in Example 1, we assume that messages from C1 are delayed significantly due to network conditions or failure. This scenario is shown in Fig. 13. Again, as long as a message from C1 to C3 is delayed, min{min_i{It_3i}, GVT_3} is clamped to It_31, in this case T_{1-3}, and C3 cannot transmit messages with ts > It_31 + L_3 = OH_3 to other clusters. However, even in the case that the problems shown in Example 1 were solved, C3 is required to store the set of messages In_OH3 received from C2 since the last time OH_3 was advanced, for future elaboration or for rollback execution purposes. If D_{1-3} indicates the delay by which a message from C1 to C3 is affected, |In_OH3| in general is proportional to D_{1-3}. This affects the amount of memory needed for storing data in the event queues of cluster C3. In general, it does not influence the other metrics, which depend on the amount of elaboration performed in D_{1-3} time units.

Fig. 13. Intra-cluster effect of message delays (clusters C1 and C2 send to C3; the message T_{1-3} from C1 is delayed, and C3 commits only until min{T_{1-3}, T_{2-3}}).

As indicated above, the outlined problems arise since the elaborations local to clusters are carried on in an optimistic way even in the presence of message delays. The problems might be completely avoided if no degree of optimism is exploited, but as already pointed out this might result in an underutilization of the computing resources. We are proposing to limit the degree of optimism, and thus the degree of freedom for the individual processes involved in the simulation, as a function of the performance of the underlying network. As the resource demand grows, the activity in each cluster is locally slowed down. In turn, other clusters will eventually be slowed down because of the slower responsiveness of destination clusters, so as to globally tune the computation without any explicit control. However, it is not possible to come to a complete stop of activities, because it may lead to a deadlock, as pointed out in the following description.

4. Resource control mechanism

In Fig. 14 we present a queueing model of a cluster. We do not represent all LPs in a cluster, but we model a cluster as a unique LP with an aggregate elaboration rate μ. The timestamp of the latest elaborated event corresponds to the cluster's GVT. The time warp input queue of LPs is logically divided into an event queue, storing only internally scheduled events, and an input queue, storing only events scheduled by other clusters. The output queue stores events scheduled for other clusters, which must not be sent yet because they are not yet safe. Events are elaborated one at a time, at a rate μ, in timestamp order. Events in the event queue can be of two types: generator events and normal events. Generator events represent local activities causally independent of the rest of the simulation. At the start, each cluster has only generator events in its event queue and no event in its input or output queues. Whenever a generator event is elaborated, a new generator event is scheduled locally and a new normal event is produced. Whenever a normal event is elaborated, either a new normal event is produced, with probability p_prod, or none is, with probability 1 − p_prod. New normal events can be scheduled for other clusters with probability p_out, or locally with probability 1 − p_out. Outbound events can be routed to any of a set of destinations, each with some routing probability. The set of destinations of each cluster depends on the model, and the routing probabilities for the same cluster sum up to one.

Fig. 14. Queueing model of cluster activities (events arrive from the net into the input queue, are elaborated at rate μ together with the event queue, and leave through the output queue with probability p_out).


Whenever an event is inserted into the input queue, if there are elaborated events in the event queue with larger timestamps, a rollback is triggered. A rollback can involve each event in the event queue with a timestamp larger than the timestamp of the received event. Starting from the first such event in timestamp order, any of such events is actually (directly) involved in a rollback with probability p_rb. An event directly involved in a rollback corresponds to an event that caused a causality violation; hence the event itself is not canceled, contrary to its effects on the elaboration. All its causally dependent events are canceled from both the event and the output queues, since their existence is a direct consequence of a causal violation. Old events in the input queue and in the event queue are canceled according to the advancement of the cluster horizon. The advancement of the cluster horizon also controls the departure of messages from the output queue.

If the advancement of the cluster horizon is slowed down because of message delays, the rate at which the output queue is emptied cannot keep up with the rate at which it is filled. Therefore, the output queue grows too quickly. This problem can be solved by reducing the cluster elaboration rate μ. By doing so, the input queue starts growing larger, since the event arrival from other clusters is not generally affected by the smaller μ. The problem involving the input queue can be solved by decreasing the arrival rate λ. Rates μ and λ can be tuned by means of two mechanisms which together compose our proposed scheme. The mechanism for tuning μ is local to a cluster, thus called intra-cluster elaboration control. The mechanism for tuning λ concerns communication over the network between two clusters, thus called inter-cluster message flow control.

4.1. Intra-cluster elaboration control

A cluster's elaboration is slowed down by making its LPs elaborate null events. These are events whose elaboration does not change the state of the model, and only delays the elaboration of event messages. In principle, a cluster elaborates events from either a queue of null events or from the normal event queue, with probabilities p_null and 1 − p_null, respectively. Elaborations of null events take the same amount of time as those of normal events. p_null is computed according to the length of the output queue, q_out: it is zero as long as q_out is smaller than a yellow mark, y_out, and it is monotonically non-decreasing as the used space increases. It may asymptotically approach 1, however never reaching 1, to avoid possible starvation. In fact, without making assumptions on the simulation, it is not possible to enforce a fixed upper bound on the memory used in the input and output queues, as we will see later (Section 4.3).

A possible distribution for the null event probability is the following: a red mark r_out, a maximum null event probability P_null, and an exponent K > 0 can be defined so that

p_null = P_null × ((q_out − y_out)/(r_out − y_out))^K   if y_out ≤ q_out ≤ r_out.   (3)

The definition of the null event probability distribution is completed by requiring that p_null is zero for q_out < y_out, and that it is P_null for q_out > r_out.

Expression (3) has the following meaning: if the output queue has a length greater than the yellow mark but smaller than the red mark, then the null event probability is monotonic, strictly increasing from zero to P_null. The type of curve is determined by K. If K = 1 the curve is linear, hence the control on the elaboration rate is proportional to the growth of the output queue. For values of K smaller than one, p_null grows larger for small queue lengths, hence causing a stronger control earlier. On the contrary, for values of K larger than one, the control is loose for small output queue lengths and becomes tight closer to the red mark.
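A direct transcription of Expression (3) as a function of the output queue length; p_max stands for the maximum null event probability P_null, and the function name is ours:

    def p_null(q_out, y_out, r_out, p_max, k):
        """Null event probability: zero below the yellow mark, p_max above the red mark,
        and a power curve with exponent k in between (Expression (3))."""
        if q_out < y_out:
            return 0.0
        if q_out > r_out:
            return p_max
        return p_max * ((q_out - y_out) / (r_out - y_out)) ** k

    # With the values used later in the experiments (y_out=50, r_out=100, p_max=0.9, K=4):
    for q in (40, 60, 80, 100, 120):
        print(q, round(p_null(q, 50, 100, 0.9, 4), 4))   # 0.0, 0.0014, 0.1166, 0.9, 0.9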

4.2. Inter-cluster message flow control

The message flow between any two clusters, say from A to B, is controlled by means of a windowing scheme. Differently from sliding window schemes used in communication protocols, this scheme is used to enforce flow control at the application level.

At any time, A has a maximum window size w of messages that it can send to B. Messages are not acknowledged upon receipt, but each message is acknowledged by B upon elaboration. The windowing scheme, and the acknowledgment of messages in particular, is used by A to control the flow of messages to B according to B's elaboration capacity. The elaboration capacity is assumed to be reflected by the rate at which acknowledgement messages were received during the latest message window.

A parameter α, 0 < α < 1, is used by A to determine whether the window can slide or not, and to adjust the window size. The window size can either shrink by a factor α, or expand by a factor 1/α. Parameter α controls the maximum drift rate between the number of messages sent by A and the number of acknowledgements sent by B, per message window. It also controls the rate of window size decrease in the case that the drift rate becomes larger, and of window size increase in the case that B shows a larger elaboration capacity. Although two independent parameters could be used for controlling the drift rate and the window size, for simplicity we use only one parameter.

When A reaches its maximum window size, it checks the number of acknowledgements a received from B during the latest window. If a < αw, then A refrains from sending messages to B until a ≥ αw, and then it sets w = max{αw, 1}. If αw ≤ a ≤ w, then A does not change w and keeps sending messages to B. If a > w, then A sets w = max{(1/α)w, w + 1}, and keeps sending messages to B. An initial window size is arbitrarily chosen.
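A sketch of the sender-side window update described above (identifiers are illustrative; acks is the acknowledgement count over the latest window and alpha the adjustment parameter):

    def adjust_window(w, acks, alpha):
        """Window update applied by A once it has sent a full window of w messages to B.
        Returns (new_window, may_keep_sending)."""
        if acks < alpha * w:
            # B is falling behind: A waits until acks >= alpha*w, then shrinks the window.
            return max(int(alpha * w), 1), False
        if acks > w:
            # B elaborated more than a full window's worth: expand.
            return max(int(w / alpha), w + 1), True
        # alpha*w <= acks <= w: keep the window unchanged and keep sending.
        return w, True

    # Example with alpha = 0.5 and an initial window of 32:
    print(adjust_window(32, 10, 0.5))  # (16, False): shrink and pause
    print(adjust_window(32, 20, 0.5))  # (32, True): unchanged
    print(adjust_window(32, 40, 0.5))  # (64, True): expand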


Fig. 15. Example of starvation—logical cluster topology (clusters C1 and C2 send to C3, and C3 sends to C4).

4.3. Starvation

When the proposed mechanisms are used together, it is possible to reach a state of starvation of the computation if upper bounds to the space allocated to the event and the output queues are fixed. Note that the presence of upper bounds requires the null event probability to reach 1, to avoid overflow. In order to show the possibility of a starvation, in this subsection we assume that window sizes correspond to space that is reserved in the event queue for incoming messages. The starvation remains even when the message delay disappears.

A simple example of starvation is the following: consider a simulator composed of four clusters, C1, C2, C3 and C4, connected as depicted in Fig. 15. An arrow from cluster C_i to cluster C_j indicates that, according to the simulation model, cluster C_i can send messages, i.e., schedule events, to cluster C_j. Therefore, C1 and C2 can send messages to C3, and C3 can send messages to C4.

Here, we briefly summarize the notation we are using in the example:

• Out_i indicates the queue of messages stored by cluster C_i to be sent to other clusters.
• Ev_i indicates the queue of messages received by cluster C_i from other clusters, and to be elaborated.
• It_ij is the input channel time of C_i with respect to communication from C_j.
• OH_i is the output horizon of causal safeness for C_i.
• L_i is the lookahead for cluster C_i.
• first(Out_i) is the first message to be sent, temporarily stored in Out_i.
• p_null^i is the probability of null event elaboration at cluster C_i.
• w_{i,j} is the window size assigned to C_i to communicate to C_j.
• ts(m) is the simulation time associated with a message m.

Suppose for simplicity that the simulation has just started. In particular w_{1,3} = 1, i.e., C1 can send only one message to C3 before needing an acknowledgement, and only one space in Ev_3 is reserved for C1. It_31 is zero since no message has been sent from C1 to C3 yet. Suppose that the first message m from C1 to C3 is delayed (ts(m) > It_31). Because of the delay, OH_3 is clamped to It_31 + L_3, and Out_3 eventually fills up because of internal activity. The first mechanism is such that p_null^3 = 1 when Out_3 is full, hence Ev_3 also fills up because of messages from C2, which does not stop since its elaborations are independent of the rest of the simulation. The stream of messages from C2 is such that It_32 ≫ It_31 when Ev_3 fills up.

To summarize, at this point C3 experiences the following situation:

• Out_3 is full.
• p_null^3 = 1.
• Ev_3 has only one free slot, the one reserved for message m from C1.
• The acknowledgement to C2 is delayed since no space is available.

Suppose now that m is finally delivered, thus completely filling up Ev_3. Also suppose that m does not cause any rollback. Even though unlikely, the absence of a rollback upon receipt cannot be excluded, since we do not impose any restriction on the simulation model. The new input channel time is It'_31 = ts(m) > It_31. The new output horizon is OH'_3 = It'_31 + L_3, and it is possible that message m_3 = first(Out_3) is such that ts(m_3) > OH'_3. That means that the new advancement of the output horizon does not allow C3 to actually send a previously stored message. Therefore the reported situation for C3 is changed only in the fact that now Ev_3 is completely full, and that the acknowledgement to C1 must also be delayed until some space is available again in Ev_3. It is also possible that message m_1 = first(Out_1) is such that ts(m_1) + L_3 < ts(m_3), that is, not even a null message from C1, notifying the only possible further advance in the input channel time, can change the situation. No other null message can be sent before m_1 is sent, since the conservative protocol requires that each cluster produces a stream of messages (event messages or null messages) with monotonically non-decreasing simulation times.

C1, C2 and C3 are blocked and no action will occur to unblock them, i.e., the three clusters are starving. Note that the conditions which lead to a starvation in this example are strongly dependent on the simulation model: the presence of a very active source of messages (C2) which uses more resources than it actually needs; the absence of a rollback when the delayed message is received; the uselessness of the input channel time advancement allowed by both m and its successor in the message stream.

5. Emulation experiments

We implemented an emulator of a LTW simulator for the experimental analysis of the proposed mechanism. Each cluster of a LTW simulator is emulated following the model presented in Section 4. Emulated clusters communicate through TCP/IP connections, and delays can be injected by delaying send operations on single connections. The emulation was executed on a cluster of 300 MHz Pentium-II computers connected through Fast-Ethernet, running Linux. Each computer hosts one cluster emulator. Note that all data reported in this paper have been obtained with a 95% confidence interval.

Fig. 16. Logical cluster topology of the emulated execution (C1 and C2 can send event messages to all clusters; C3 and C4 can only send to each other).

Table 1
Parameters of the emulated execution

Parameter   C1     C2     C3     C4
p_prod      0.8    0.8    0.8    0.8
p_out       0.75   0.75   0.5    0.5
p_rb        0.1    0.1    0.1    0.1
α           0.5    0.5    0.5    0.5
Initial w   32     32     32     32
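A sketch of how such per-connection delay injection might look on the sender side (the class, its parameters, and the chosen delay value are illustrative, not the emulator's actual code):

    import socket
    import time

    class DelayedConnection:
        """Wraps a TCP connection and delays every send during a configured time window."""

        def __init__(self, host, port, delay_s, start_s, duration_s):
            self.sock = socket.create_connection((host, port))
            self.delay_s, self.start_s, self.duration_s = delay_s, start_s, duration_s
            self.t0 = time.monotonic()

        def send(self, payload: bytes):
            elapsed = time.monotonic() - self.t0
            if self.start_s <= elapsed <= self.start_s + self.duration_s:
                time.sleep(self.delay_s)     # emulate a congested or faulty path
            self.sock.sendall(payload)

    # e.g. DelayedConnection(host, port, delay_s=0.5, start_s=60, duration_s=180)
    # delays every send by 0.5 s between 60 s and 240 s of execution time.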

We considered an emulated execution of four clusters connected as shown in Fig. 16. Arrows indicate the possibility for a source cluster to schedule events to a destination cluster. Thus, clusters C1 and C2 can send event messages to all clusters, whereas clusters C3 and C4 can only send event messages to each other. We used a null event probability distribution like the one defined in Expression (3), Section 4.1. The values assumed by the cluster parameters as defined in Section 4 are listed in Table 1. The routing probabilities of events from C1 or C2 to other clusters are all 1/3. Such a model has been used for two sets of experiments. The first set of experiments was meant to study the efficacy of the resource control mechanism as opposed to the absence of any control. The second set of experiments was meant to study the behavior of the resource control mechanism with different null event probability distributions (different values of the parameter K, see Expression (3)).

5.1. Experiment set 1: Efficacy assessment

In the first set of experiments, delays on the connection from C3 to C4 are emulated starting 60 s after the elaboration's start, for a duration of 180 s. The delay per message is about 10 000 times the average event elaboration time. In order for the resource control mechanism to start affecting the emulation soon after delays start, we fixed the following parameters of the resource control mechanism:

y_out = 50,  r_out = 100,  P_null = 0.9,  K = 4.

According to the value of K, the control is loose for small output queue lengths and becomes tight close to the red mark. However, as we will see, this is of little importance since the red mark is reached soon; therefore the main parameter becomes P_null = 0.9. Figs. 17–20 show the input queue size, event queue size, output queue size, and rollback amplitude during the execution of cluster C3, for an emulated execution with and without control. Each figure contains two pictures, the one on the left relative to experiments performed in the absence of resource control and the one on the right relative to experiments with resource control. Note that pictures within the same figure have the same vertical scale, whereas different figures have different vertical scales. Such a representation has been necessary due to the different message occupations of the different queues.

As can be seen, the mechanism is capable of reducing resource consumption at Cluster 3 significantly. Such a reduction is evident in all figures, as we are going to explain. Clusters C1 and C2 clearly reduce their message rates to C3 during the delay phase (Fig. 17). Slowing down internal activities has the effect of reducing both the event queue and the output queue occupation (Figs. 18 and 19). Rollbacks are drastically reduced (Fig. 20).

In Fig. 17 the input queue size, i.e., the memory occupation due to events scheduled by external clusters, shows an approximately linear increase as long as delays persist if no resource control is employed (picture on the left). On the other hand, the resource control mechanism forces a reduction in the input queue size growth after a shorter linear growth (picture on the right). The reduction in the input queue size growth occurs while delays are still affecting the computation, starting at approximately execution time 130 s. The second phase in the input queue size growth is due to the window size at Clusters 1 and 2 for messages to Clusters 3 and 4 in the inter-cluster control shrinking from a maximum of 128 back to 1. Another feature worth noting is the behavior after the termination of message delays. Whereas the input queue decreases if the control is in place, it keeps growing without the resource control mechanism, even though at a lower pace than during delayed communication. In the latter case a decrease can be noticed starting at around execution time 400 s, i.e., when Clusters 1 and 2 terminate their execution. Though not shown, such a decrease goes down to around zero before Clusters 3 and 4 also terminate. The persistence of the input queue growth after delays terminate may be due to an unaccounted factor: elaboration delays due to overly large data structures. As the input queue grows large, the average time to access it increases, thus indirectly reducing the elaboration rate. The resulting elaboration rate is not sufficient to keep up with the rate of incoming messages, and the input queue keeps growing. In the case of a controlled elaboration, after delays terminate the input queue size slowly


Fig. 17. Input queue size, unrestrained (left) and with resource control (right).

Fig. 18. Event queue size, unrestrained (left) and with resource control (right).

Fig. 19. Output queue size, unrestrained (left) and with resource control (right).

decreases towards zero. We can identify two phases here as well, distinguished by two decrease rates. The passage from the first phase to the second occurs at approximately execution time 300 s. As can be noticed, the passage from one phase to the next occurs at around the same time at which the event queue resumes normal stability conditions (see Fig. 18, on the right). Whereas the window size of Clusters 1 and 2 for messages to Clusters 3 and 4 is kept small as long as the rate of elaboration of incoming events is small, such window sizes are brought back to their maximum when the elaboration rate returns to its maximum in the second phase.


Fig. 20. Rollback weight, unrestrained (left) and with resource control (right).

A larger window size allows larger message rates, which explains the slower decrease in the event queue size.
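The following sketch illustrates, under assumed mechanics (multiplicative shrinking on congestion, additive recovery), how such a per-destination window could contract from its maximum of 128 down to 1 and grow back once the destination elaborates at full rate again; the concrete update rules of the inter-cluster flow control are defined earlier in the paper and may differ:

class InterClusterWindow:
    """Per-destination send window for inter-cluster message flow control (assumed rules)."""

    def __init__(self, max_window=128):
        self.max_window = max_window
        self.window = max_window            # messages allowed in flight
        self.in_flight = 0

    def can_send(self):
        return self.in_flight < self.window

    def on_send(self):
        self.in_flight += 1

    def on_ack(self, destination_is_slow):
        self.in_flight -= 1
        if destination_is_slow:
            self.window = max(1, self.window // 2)                 # shrink towards 1
        else:
            self.window = min(self.max_window, self.window + 1)    # recover towards 128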

In Fig. 18 the growth of the event queue size can still be approximated by a linear curve in the absence of control mechanisms (picture on the left). In the presence of the resource control mechanism the growth may also be considered linear (picture on the right), but the slope is much smaller. The event queue reaches sizes around 1200 with the resource control mechanism, as opposed to sizes around 5000 without any control. As soon as delays terminate, the event queue size shows a sudden drop due to a few large rollbacks. The drop is larger in the case where no control mechanism is used, since rollbacks are larger (see Fig. 20). As for the input queue, in the absence of any control mechanism the reduction of the event queue after delays terminate has two phases, distinguished by the presence of a stream of events from Clusters 1 and 2 (first phase) or its absence (second phase). Even though it is much larger than the input queue, the event queue obviously does not suffer from a reduced elaboration rate caused by its own size or by another queue's large size, since the elaboration itself is the only source of events for the event queue. The simulation has been designed to be stable in the absence of delays, hence the nearly perfectly linear decrease during the first phase. When Clusters 1 and 2 terminate, elaboration in the second phase can proceed faster because it is not hindered by the need to elaborate events from another source.

Fig. 19 is very similar to Fig. 18, though on a different scale for the queue size. The main differences concern the elaboration after delays terminate. During delays, the same type of nearly linear increase can be witnessed, with maximum sizes of around 1700 without any control and around 500 with the resource control mechanism. Right after delays disappear there is a short phase of large rollbacks that dramatically reduces the output queue size. After delays terminate, there is no observable evidence of two phases during the decrease of the output queue size towards zero. In fact, the output queue size is controlled by the elaboration rate and by the rate of advancement of the horizon; since both are constant, there is no change in behavior until stability is regained.

Fig. 20 shows the amplitude of rollbacks. Each picture in Fig. 20 has been obtained by overlapping the number of canceled events and the number of undone events per rollback. Recall from Section 4 that undone events are those that caused a causality violation, whereas canceled events are those that have been (internally) scheduled as a consequence of a causality violation. As can be seen, both the amplitude of rollbacks and the frequency at which rollbacks occur are affected by the presence of the resource control mechanism. Without any control, the rollback frequency is the same as if there were no delay, and the rollback size increases on average as the delay persists (see Fig. 20, picture on the left). The control mechanism reduces both the frequency and the size of rollbacks, as shown in Fig. 20, picture on the right. Therefore, the use of the control mechanism increases what can be termed the event elaboration significance during the delay, i.e., a measure of the probability that an event elaboration will not be undone by a subsequent rollback. As soon as delays disappear, a short burst of rollbacks larger than any previous one is witnessed. Such a burst is due to the sudden arrival of several messages in a critical and precarious situation where large event queues correspond to large rollbacks. After the initial burst of rollbacks, the arrival of more messages no longer has the same effect, since the message rate is now controlled by the elaboration rate of the source, not by the sudden disappearance of delays that caused the delivery of a large number of messages in a short time. Also notice that as the delay disappears, rollbacks resume their normal rates and amplitudes in both cases.
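One simple way to quantify the event elaboration significance mentioned above, sketched here under our own definition (the paper does not give a formula for it in this section), is the fraction of elaborated events that are never undone by a later rollback:

def elaboration_significance(elaborated_events, undone_events):
    """Fraction of elaborated events that survived all rollbacks (assumed metric)."""
    if elaborated_events == 0:
        return 1.0
    return 1.0 - undone_events / elaborated_events

With such a definition, reducing rollback frequency and amplitude during the delay directly increases the significance of the work performed.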


Fig. 21. Input queue size, different K (curves for K = 0.5, 1, 5).

5.2. Experiment set 2: Comparison of distributions

In the second set of experiments we compared several null event probability distributions. We used the null event probability distribution defined by Expression (3), Section 4.1. In a first subset of experiments we changed only the parameter K and kept the others constant, and in a second subset of experiments we changed only the maximum null event probability π_null and kept the others constant. We also performed experiments with different values of a further parameter, but our results show erratic behavior without any recognizable pattern. We emulated the cluster model previously described (see Fig. 16 and Table 1), with delays injected for a period of 300 s.

5.2.1. Null event probability increase
In the first subset of experiments we used a large red mark while keeping the yellow mark relatively low, in order to show the influence of the speed of the null event probability increase. Recall that the null event probability increases when the output queue size is between the yellow and the red mark. We used a larger yellow mark than in experiment set 1 to better show what happens when the resource control mechanism actually starts. The constant parameters of the null event probability distribution defined by Expression (3), Section 4.1, are:

y_out = 200, r_out = 1500, π_null = 0.9.

Figs. 21–23 show the input queue size, event queue size and output queue size for experiments performed with the following three values of K: 0.5, 1 and 5. As in Section 5.1, different figures have different scales for the message queue occupation (vertical axis).

The first thing to notice is that, as expected, different values of K give rise to different behaviors. All figures show that a small K (K = 0.5, lowest curve in all pictures) corresponds to a stronger control that results in a smaller resource consumption. In fact, all three queues reach their smallest maximum queue size during the run for K = 0.5. Moreover, the growth in queue sizes due to delays is slower for a small K.

Fig. 22. Event queue size, different K (curves for K = 0.5, 1, 5).

Fig. 23. Output queue size, different K (curves for K = 0.5, 1, 5).

This is evident from the fact that the maximum queue size for a given value of K is reached later the smaller K is.

In Figs. 22 and 23 the passage from the region where no control is enforced (small output queue size) to the region where the control is in effect (the output queue size exceeds the yellow mark) is now visible. Fig. 23 shows a change in the steepness of the output queue size shortly after delays start, exactly when the output queue becomes as large as the yellow mark, y_out = 200. At the same execution time, the event queue size in Fig. 22 shows an even clearer change in steepness. Such changes correspond to a smaller elaboration rate due to the null event injection, which has the consequence that the cluster produces both outbound and inbound events at a slower rate. No analogous pattern is recognizable in Fig. 21 concerning the input queue size. Even though this could be due to the generally smaller increases of the input queue size, which may hide small differences, it is also explained by the fact that the input queue size depends mainly on incoming traffic from other clusters. Incoming traffic is only indirectly affected by a smaller elaboration rate of the cluster itself, as a feedback of a slower advancement of the cluster's output horizon.


According to the simulation model, only the portion of incoming traffic to Cluster 3 coming from Cluster 4 can be affected by a slower horizon advancement, whereas incoming traffic from Clusters 1 and 2 can only slow down as a consequence of an adjustment in window sizes.

As in the first set of experiments, the input queue size stabilizes after a linear increase (see Fig. 21). Different values of K correspond to different sizes at which the input queue stabilizes, and the level at which the input queue stabilizes is smaller for small K. A small level is due to smaller elaboration rates being in effect for a relatively long time. Small rates cause smaller window sizes, and the relatively long application of smaller window sizes results both in a slower increase of the input queue size and in a smaller total input queue size when stabilization occurs. Unexpectedly, and differently from the first set of experiments, a stabilization is also visible in the event and output queue sizes, most evident in the case K = 5 (see the highest curves in Figs. 22 and 23). However, the stabilization of the input queue and that of the event and output queues do not seem to be correlated, since the latter occurs approximately 50 s before the former. The reason for such a stabilization may again lie in a side effect of the size of the data structures. The combined effect of elaboration delays due to injected null events and of elaboration delays due to large data structures may be such that the rate at which events are generated becomes smaller than the transmission rate imposed by the delays. In this way, the simulation finds a second overall stability as a response to transmission delays. Note that if no resource control is applied, such a stability may be found only as a result of the side effect due to large data structures. However, the queue sizes at which this stability would be found would be much larger, since the additional elaboration delay due to null events is missing and the large queue sizes alone would have to account for the total delay that slows down the elaboration rate sufficiently. As a last remark, the fact that the level at which the output queue size stabilizes for K = 5 is close to the red mark, r_out = 1500, is simply coincidental.

5.2.2. Maximum null event probability
In the second subset of experiments we reduced the red mark again, in order to show the influence of the maximum null event probability. Recall that the maximum probability π_null is mostly in effect when the output queue size is larger than the red mark. We also used a relatively large K to minimize the effects of this parameter on the rest of the execution. In fact, from Expression (3), π_null also acts between the yellow and the red mark by providing an upper bound to the probability increase. With a large K, null event probabilities are close to each other for most of the increase period for different π_null. The constant parameters of the null event probability distribution defined by Expression (3), Section 4.1, are:

y_out = 200, r_out = 800, K = 4.

Fig. 24. Input queue size, different π_null (curves for π_null = 0.6, 0.8, 0.99).

Fig. 25. Event queue size, different π_null (curves for π_null = 0.6, 0.8, 0.99).

Fig. 26. Output queue size, different π_null (curves for π_null = 0.6, 0.8, 0.99).

Figs. 24–26 show the input queue size, event queue size and output queue size for experiments performed with the following three values of π_null: 0.6, 0.8 and 0.99. Different figures have different scales for the message queue occupation (vertical axis).


As in the previous set of experiments, different values of π_null result in different behaviors. A value of π_null very close to 1 (0.99) causes the elaboration to find an early condition of stability, right after the output queue size grows beyond the red mark. The stability is sustained as long as delays persist, as evident in all pictures. A smaller value of π_null (0.8) allows the output queue to grow larger than the red mark without apparently finding stability, since both the event and the output queue sizes keep slowly but steadily increasing as long as delays persist. The input queue also increases, though not as steadily as the event and output queues. The increase in resource consumption is much more evident for π_null = 0.6, in all the queues. For this value, no stability appears to be at hand.

6. Previous and related work

Several algorithms have been proposed to enhance the performance of both conservative and optimistic protocols. An adaptive Conservative Time Windows (CTW) scheme was proposed by Ayani et al. [4], where each logical process is assigned a time window (Wi) and events within the time windows are known to be independent. They are considered safe and can therefore be processed concurrently.

An LP based upon the CTW paradigm synchronously operates in two phases:

(i) Window identification: For every LP_i, a chronological set of events W_i is identified such that for every event e ∈ W_i, e is causally independent of any e′ ∈ W_j, j ≠ i.

(ii) Event processing: Every LP_i processes the events e ∈ W_i sequentially in chronological order.

A barrier synchronization mechanism is used to ensure the orderly execution of these two phases; a schematic sketch of one iteration is shown below.
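In the sketch, the LP interface (identify_window, process) and the event attribute names are placeholders used only to make the two phases and the barriers explicit, not the algorithm's actual API:

from threading import Barrier, Thread

def ctw_iteration(lps):
    """Run one synchronous CTW iteration over a list of logical processes."""
    barrier = Barrier(len(lps))

    def run(lp):
        # Phase (i): identify the window W_i of events that are causally
        # independent of the events in every other LP's window.
        window = lp.identify_window()
        barrier.wait()                      # all LPs complete identification first
        # Phase (ii): process the events of W_i sequentially in timestamp order.
        for event in sorted(window, key=lambda e: e.timestamp):
            lp.process(event)
        barrier.wait()                      # synchronize before the next iteration

    threads = [Thread(target=run, args=(lp,)) for lp in lps]
    for t in threads:
        t.start()
    for t in threads:
        t.join()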

In each iteration of the CTW protocol, the width of the windows is computed by the CTW algorithm. Note that the width can differ among logical processes. As opposed to other window-based algorithms [22], this produces local bounds for the time windows. While the results obtained were quite encouraging, the authors also revealed several limitations of the CTW algorithm [4].

Several schemes have been proposed to control the event computation in time warp. Steinman [32] proposed breathing time warp (BTW) to control the resources needed by time warp for its completion. In this approach, events are executed in cycles: all independent events are executed first, then BTW takes some aggressive risk by speculatively executing further events. A rollback mechanism similar to that of the time warp paradigm is used to get the simulation back on track. A similar idea has been used by Sokol et al. [31], who proposed a scheme they refer to as the moving time window (MTW). Later, Wilsey et al. [27] proposed an adaptive bounded time window protocol where the width of the time window is adjusted dynamically. Last but not least, Ferscha et al. [17] proposed another adaptive scheme, which they refer to as shock-resistant time warp, which proactively adapts its control parameters using statistical analysis.

In our future work, we plan to compare our scheme with that of Ferscha et al., since among all the proposed algorithms theirs seems the most promising approach to compare against.

7. Conclusions

With the emergence of high-speed communication infrastructures and collaborative environments, we are facing the problem of distributing a simulation across loosely coupled domains. In this paper, we considered the problems which may arise when the simulation is distributed over large interconnection networks, where the dynamic behavior of the underlying communication infrastructure may have a profound impact on the performance of the system. The unpredictable nature of the underlying communication requires the simulation (i.e., the application) to take the performance of the network into consideration in order to safely allocate computational resources and appropriately limit the complexity and the overhead of rollback. As a consequence, we have proposed, in this paper, a mechanism by which the individual processes of the simulation at different hosts may tune their resource demands as a function of the network performance. While we cannot reliably measure the network performance, elaboration rates and message flows can be tuned according to the effects of the network performance on the resource demand. We have presented the proposed mechanism, and performed experiments based on an emulator developed in a controlled environment to gauge the effects of the mechanism on a distributed simulation.

Our experimental results show that the proposed mechanism is able to control the resource consumption and the impact of rollbacks on the computation during delays.

It also allows the computation to resume full speed after the delays disappear. We studied the effects of different parameter choices on the resource demand of a cluster, thus identifying tight and loose resource controls. Projected future work includes the study of the resource control mechanism in a controlled environment with hardware-injected delays, and its application to a real distributed simulation over the Internet, possibly with the support of collaborative environments such as the web operating system (WOS) [21]. We will also investigate the use of our mechanism for other distributed applications supported by collaborative environments [7,8,16,28].

References

[1] H. Avril, C. Tropper, The dynamic load balancing of clustered time warp for logic simulation, Workshop on Parallel and Distributed Simulation 1995, 1996, pp. 68–77.

[2] H. Avril, C. Tropper, Scalable clustered time warp and logic simulation, VLSI Design 9 (3) (1999) 291–313.

[3] H. Avril, C. Tropper, CTW: clustered time warp, IEEE Trans. Parallel Distributed Systems, 2003.

[4] R. Ayani et al., Parallel simulation using conservative time windows, Proceedings of the 1992 Winter Simulation Conference, 1992, pp. 709–717.

[5] A. Boukerche, Time management in parallel simulation, in: B. Rajkumar (Ed.), Cluster Computing, Prentice-Hall, Englewood Cliffs, NJ, 1998, pp. 375–394.

[6] A. Boukerche, Conservative circuit simulation on multiprocessor machines, IEEE High Performance Computing (HiPC), Lecture Notes in Computer Science, vol. 1970, Springer, Berlin, December 2000, pp. 415–424.

[7] A. Boukerche, R.B. Araujo, D.D. Duarte, 3D web-based virtual simulation environments extensibility through interactive non-linear stories, Eighth International IEEE Symposium on Distributed Simulation and Real-Time Applications, 2004.

[8] A. Boukerche, R.B. Araujo, D.D. Duarte, VEML: a mark-up language to describe web-based virtual environments through atomic simulations, Eighth International IEEE Symposium on Distributed Simulation and Real-Time Applications, Budapest, 2004.

[9] A. Boukerche, S.K. Das, Load balancing strategies for parallel simulation on multiprocessor machines, in: G. Zobrist, J. Walrand, K. Bagchi (Eds.), The State-of-the-art in Performance Modeling and Simulation, Gordon and Breach Publishers, London, 1998, pp. 135–164.

[10] A. Boukerche, S.K. Das, A. Datta, T.E. LeMaster, Implementation of a time synchronizer in distributed database systems on a cluster of workstations, J. Parallel Distributed Comput. Practices 4 (1) (March 2001) 25–47.

[11] A. Boukerche, A. Fabbri, Partitioning parallel simulation of wireless networks, in: J.A. Joines, R. Barton, K. Kang, F. Fishwick (Eds.), Proceedings of the 2000 Winter Simulation Conference, pp. 1449–1457.

[12] A. Boukerche, C. Tropper, Local vs. global lookahead in parallel simulation, Parallel Comput. 27 (2001) 1033–1055.

[13] C.G. Cassandras, Discrete Event Systems—Modeling and Performance Analysis, Richard D. Irwin, Aksen Associates, 1993.

[14] K.M. Chandy, J. Misra, Distributed simulation: a case study in design and verification of distributed programs, IEEE Trans. Software Eng. SE-5 (September 1979) 440–452.

[15] D.E. Comer, Internetworking with TCP/IP, Principles, Protocols, and Architecture, vol. 1, Prentice-Hall, Englewood Cliffs, NJ, 1995.

[16] J.S. Dahmann, The high level architecture and beyond: technology challenges, Proceedings of the 13th Workshop on Parallel and Distributed Simulation, May 1999, pp. 64–70.

[17] A. Ferscha, J. Johnson, Shock resistant time warp, Proceedings of the First Workshop on Parallel and Distributed Simulation, 1999.

[18] R.M. Fujimoto, Parallel discrete event simulation, Comm. ACM 33 (10) (October 1990) 30–53.

[19] R.M. Fujimoto, Parallel Distributed Simulation Systems, Wiley, New York, 2000.

[20] D.R. Jefferson, Virtual Time, ACM TOPLAS 7 (3) (July 1985) 404–425.

[21] P. Kropf, Overview of the web operating system (WOS) project, Proceedings of High Performance Computing 1999, April 1999, pp. 350–355.

[22] B.D. Lubachevsky, Efficient distributed event-driven simulations of multiple loop networks, Comm. ACM 32 (January 1989) 111–123.

[23] F. Mattern, Efficient algorithms for distributed snapshots and global virtual time approximation, J. Parallel Distributed Comput. 18 (4) (August 1993) 423–434.

[24] A.R. Mikler, S.K. Das, A. Fabbri, Distributed simulation for large communication infrastructures across loosely coupled domains, Proceedings of the Sixth International Conference on Telecommunication Systems, Modeling and Analysis, Nashville, Tennessee (USA), March 1998, pp. 561–569.

[25] A.R. Mikler, A. Fabbri, Distributed discrete event simulation across loosely coupled domains, Proceedings of ASTC 2000, Advanced Simulation Technologies Conference, Washington DC (USA), April 2000, pp. 274–279.

[26] A.R. Mikler, J.S.K. Wong, V.G. Honavar, An object-oriented approach to simulating large communication networks, J. Systems Software 40 (2) (February 1998) 151–164.

[27] A.C. Palaniswamy, P.A. Wilsey, Adaptive bounded time windows in an optimistically synchronized simulator, in: The Great Lakes VLSI Conference, 1993, pp. 114–118.

[28] C.D. Pham, R. Bagrodia, HLA support in a discrete event simulation language, Proceedings of the Third Workshop on Distributed Interactive Simulation and Real-Time Applications, October 1999, pp. 93–100.

[29] U.W. Pooch, J.A. Wall, Discrete Event Simulation—A Practical Approach, CRC Press, Computer Engineering Series, New York, 1993.

[30] H. Rajaei, R. Ayani, L.-E. Thorelli, The local time warp approach to parallel simulation, Proceedings of PADS'93, May 1993, pp. 119–126.

[31] S.M. Sokol et al., MTW: a strategy for scheduling discrete simulation events for concurrent executions, Proceedings of the SCS Multiconference on Distributed Simulation, vol. 19 (3), July 1988, pp. 34–42.

[32] J. Steinman, SPEEDES: a multiple-synchronization environment for PDES simulation, Internat. J. Comput. Simulation 2 (3) (1992) 251–286.

[35] S.C. Tay, Y.M. Teo, S.T. Kong, Speculative parallel simulation with an adaptive throttle scheme, Proceedings of PADS'97, June 1997, pp. 116–123.

Azzedine Boukerche is a Canada Research Chair and a Full Professor of Computer Sciences at the School of Information Technology and Engineering (SITE) at the University of Ottawa, Canada. He is also the Founding Director of the PARADISE Research Laboratory (PARAllel, Distributed and Interactive Simulation of LargE scale Systems and Wireless & Mobile Networking). Prior to this he was a Faculty Member of the Computer Science Department at the University of North Texas, and Director of the Parallel Simulations, Distributed and Mobile Systems Research Laboratory at UNT. He also worked as a Senior Scientist at the Simulation Sciences Division, Metron Corporation, located in San Diego. He was employed as a Faculty Member at the School of Computer Science (McGill University, Canada) from 1993 to 1995. He spent the 1991–1992 academic year at the JPL-California Institute of Technology, where he contributed to a project centered on the specification and verification of the software used to control interplanetary spacecraft operated by the JPL/NASA Laboratory.
His current research interests include large-scale distributed interactive simulation, parallel simulation, wireless communication and mobile networks, distributed and mobile computing, wireless sensors and ad hoc networks, distributed systems, and performance modeling. Dr. Boukerche has published several research papers in these areas. He was the recipient of the Ontario Premier's Research Excellence Award, the recipient of the best research paper award at IEEE/ACM PADS'97, and the recipient of the Third National Award for Telecommunication Software in 1999 for his work on a distributed security system for mobile phone operations, and has been nominated for the best paper award at IEEE/ACM PADS'99 and at the ACM Modeling, Analysis and Simulation of Mobile and Wireless Systems Conference 2001. He served as General Co-Chair of the principal Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) in 1998, General Chair of the Fourth IEEE International Conference on Distributed Interactive Simulation and Real-Time Applications (DS-RT 2000), General Chair of the Third ACM Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM'2000), General Chair of the First International Conference on QoS in Heterogeneous Wired and Wireless Networks (QShine 2004) and the Second ACM International Workshop on Mobility Management and Wireless Access (MobiWac 2004), as Program Co-Chair for the Fifth IEEE International Conference on Mobile and Wireless Computing and Communication (MWCN'03), ACM/IFIP Europar 2003, IEEE Wireless Local Networks (WLN'03, 04), the 35th SCS/IEEE Annual Simulation Symposium (ANSS 2002), the 10th IEEE/ACM Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'2002), the Third International Conference on Distributed Interactive Simulation and Real Time Applications (DS-RT'99), and ACM MSWiM'2000, and as Deputy Vice Chair of the Wireless and Mobility Access Track for ACM WWW 2002, and as a Guest Editor for VLSI Design, the Journal of Parallel and Distributed Computing (JPDC), ACM Wireless Networks (WINET), ACM Mobile Networks and Applications (MONET), and the International Journal of Wireless and Mobile Computing. He was the main organizer of a Special Session on Performance Analysis of Mobile and Wireless Communication Systems at the Seventh IEEE HiPC Conference. He has been a member of the Program Committee of several international conferences, such as ICC, VTC, ICPP, MASCOTS, BioSP3, ICON, ICCI, MSWiM, PADS, WoWMoM, LWN, and Networking. Dr. Boukerche also serves as an Associate Editor for the International Journal of Parallel and Distributed Computing (JPDC), ACM/Kluwer Wireless Networks, the Wiley International Journal of Wireless Communications and Mobile Computing, and SCS Transactions, and on the IEEE Task Force on Cluster Computing (TFCC) Executive Committee. He is a member of the IEEE and the ACM.

Armin R. Mikler received his Diploma in Informatics from Fachhochschule Darmstadt, Germany, in 1988. After spending one year as a Fulbright exchange student at Iowa State University (ISU), he joined ISU as a graduate student and received his M.S. and Ph.D. in Computer Science in 1990 and 1995, respectively. From 1995 to 1997, he worked as a Postdoctoral Research Associate in the Scalable Computing Laboratory at Ames Laboratory, USDOE. In 1997, Dr. Mikler joined the faculty in Computer Science at the University of North Texas (UNT), where he holds the rank of Associate Professor in Computer Science with a joint appointment in the Department of Biological Sciences. His research interests include Intelligent Network Management, Distributed Coordination of Intelligent Mobile Agents, Distributed Decision Making, Multi-Agent Simulation, and Stochastic Cellular Automata applied to Computational Epidemiology. Dr. Mikler has established and is the director of the Network Research Laboratory (NRL), which provides the necessary computational infrastructure to conduct large-scale simulations. As a member of the Institute of Applied Science at UNT, he has been conducting collaborative and interdisciplinary research in computational science, specifically in the areas of quantitative analysis of ecological processes and Biocomplexity. Dr. Mikler is an Associate Editor of the Telecommunication Systems Journal and a member of the ACM and the IEEE Computer Society.

Alessandro Fabbri received his Ph.D. from the University of Bologna, Italy. His research interests are in distributed simulation and mobile computing.