
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2008; 20:1455–1484
Published online 15 November 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1278

Meteor: a middleware infrastructure for content-based decoupled interactions in pervasive grid environments

Nanyan Jiang∗,†, Andres Quiroz, Cristina Schmidt and Manish Parashar

The Applied Software System Laboratory, Rutgers University, 94 Brett Road, Piscataway, NJ 08854, U.S.A.

SUMMARY

Emerging pervasive information and computational environments require a content-based middleware infrastructure that is scalable, self-managing, and asynchronous. In this paper, we propose associative rendezvous (AR) as a paradigm for content-based decoupled interactions for pervasive grid applications. We also present Meteor, a content-based middleware infrastructure to support AR interactions. The design, implementation, and experimental evaluation of Meteor are presented. Evaluations include experiments using deployments on a local area network, the wireless ORBIT testbed at Rutgers University, and the PlanetLab wide-area testbed, as well as simulations. Evaluation results demonstrate the scalability, effectiveness, and performance of Meteor to support pervasive grid applications. Copyright © 2007 John Wiley & Sons, Ltd.

Received 9 March 2007; Revised 14 September 2007; Accepted 17 September 2007

KEY WORDS: pervasive grid computing; content-based middleware; associative rendezvous messaging; decoupled interaction; content-based routing; JXTA

∗Correspondence to: Nanyan Jiang, The Applied Software System Laboratory, Rutgers University, 94 Brett Road, Piscataway, NJ 08854, U.S.A.
†E-mail: [email protected]
Contract/grant sponsor: National Science Foundation; contract/grant numbers: ACI 9984357, EIA 0103674, EIA 0120934, ANI 0335244, CNS 0305495, CNS 0426354, IIS 0430826

1. INTRODUCTION

Emerging pervasive information and computational Grids are enabling a new generation of applications that are based on seamless ‘anytime–anywhere’ access to and aggregation of pervasive information, and the interactions between the information sources and distributed services and resources. These applications are context aware, and use pervasive information about the environment and user’s preferences and actions to tailor services and applications to the user’s needs, and to automate tasks in a transparent manner. Applications range from everyday activities (applications that use sensors to monitor and manage an office building or a home) to emergencies and crisis management (response to an accident or fighting a fire).

Illustrative scenarios that leverage pervasive environments integrating sensor/actuator devices with distributed services and resources include scientific/engineering applications that symbiotically and opportunistically combine computations, experiments, observations, and real-time data to manage and optimize their objectives (e.g. oil production, weather prediction), pervasive applications that leverage the pervasive/ubiquitous information grid to continuously manage, adapt, and optimize our living context (e.g. your clock estimates the drive time to your next appointment based on current traffic/weather and warns you appropriately), crisis management applications that use pervasive conventional and unconventional information for crisis prevention and response, medical applications that use in vivo and in vitro sensors and actuators for patient management, ad hoc distributed control systems for automated highway systems, manufacturing systems or unmanned airborne vehicles, and business applications that use pervasive information access to optimize profits.

A key application driving this research is the sensor-driven management of subsurface geosystems and, specifically, the dynamic data-driven management and optimization of oil reservoirs [1,2]. In this application, sensor data are used dynamically and opportunistically to detect suboptimal or anomalous behavior, by optimization-based strategies for parameter estimation, and to provide initial conditions to dynamically adaptive forward simulation models.

Another potential class of application is crisis management. For example, one can conceive of a fire management application where computational models use streaming information from sensors embedded in the building along with real-time and predicted weather information (temperature, wind speed and direction, humidity) and archived history data to predict the spread of the fire and to guide firefighters, warning of potential threats (blowback if a door is opened) and indicating the most effective options. This information can also be used to control actuators in the building to manage the fire and reduce damage.

While recent technical advances and cost dynamics in computing and communication technologies are rapidly enabling the realization of the pervasive grid computing vision, these environments and applications continue to present several significant challenges. In addition to the challenges of distribution, large scale, and system heterogeneity, pervasive computing environments are inherently dynamic. For example, sensors are often resource/energy limited and mobile, and may dynamically join, leave, or fail. As a result, pervasive applications must adapt to the unreliability and uncertainty of information and services. Further, interactions between devices, services, and resources in a pervasive application are also dynamic and typically ad hoc and opportunistic.

Supporting these applications presents significant requirements at the middleware level. Specifically, these applications require a middleware infrastructure that (1) seamlessly integrates pervasive information sources (e.g. sensor devices) with networked services, resources, and applications, (2) is scalable and self-managing, (3) is based on content rather than names/addresses, (4) supports dynamic, asynchronous, and decoupled interactions, and (5) provides some interaction guarantees.

In this paper we propose associative rendezvous (AR) as a paradigm for content-based decoupled interactions for pervasive applications. AR extends the conventional name/identifier-based rendezvous [3,4] in two ways. First, it uses flexible combinations of keywords (i.e. keywords, partial keywords, wildcards, ranges) from a semantic information space, instead of opaque identifiers that have to be globally synchronized. Second, it enables the reactive behaviors at rendezvous points (RPs) to be embedded in messages or requests. AR differs from the publish/subscribe paradigm in that individual interests (subscriptions) are not used for routing and do not have to be synchronized—they can be locally modified at a rendezvous node at any time. We also present Meteor, a content-based middleware infrastructure that supports AR interactions. The design, implementation, and evaluation of Meteor are presented. Evaluations include experiments using deployments on a local area network (LAN), the ORBIT wireless network testbed, and the wide-area PlanetLab testbed, as well as simulations of systems with thousands of nodes.

This paper builds on prior works (i.e. Squid [5]) and publications [6,7]. While these previous papers have focused on specific aspects/components of the system, the goal of this paper is to present the overall system as the integration of three key components: (1) a self-organizing content overlay, (2) a content-based routing engine and discovery service, and (3) the associative rendezvous messaging substrate (ARMS). Furthermore, the two-level overlay for addressing locality that is presented in this paper is new. Finally, this paper also presents a more comprehensive evaluation using multiple testbeds.

The rest of this paper is organized as follows. Section 2 presents the AR interaction paradigm and its semantics. Section 3 presents the design and implementation of Meteor. Section 4 presents an evaluation of Meteor using experiments on a LAN, the ORBIT testbed, and the PlanetLab testbed, as well as simulations. Section 5 presents related work. Section 6 concludes this paper.

2. ASSOCIATIVE RENDEZVOUS

AR [6] is a paradigm for content-based decoupled interactions with programmable reactive behaviors. Rendezvous-based interactions [3] provide a mechanism for decoupling senders and receivers. The RP, the node where rendezvous interactions occur, may be a broadband access point (AP), a forwarding node in a sensor network, or a server in a wired network. Senders send messages to an RP without knowledge of who or where the receivers are. Similarly, receivers receive messages from an RP without knowledge of who or where the senders are. Note that senders and receivers may be decoupled in both space and time. Such decoupled asynchronous interactions are naturally suited for large, distributed, and highly dynamic systems such as emerging pervasive grid environments.

In conventional rendezvous interactions, RPs are defined by opaque identifiers [4] that have to be globally synchronized before they can be used. This limits the scalability as well as the dynamism of the system. Associative interactions [8,9] use semantic content-based resolution, similar to that used by the naming service, to enable interactions. In associative interactions, participating clients locally maintain and export ‘profiles’ consisting of attributes specifying credentials, context, state, interests, and capabilities. Messages are similarly enhanced to include ‘semantic selectors’. The semantic selector is a propositional expression over all possible attributes and specifies the profiles that are to receive the message. Thus, the notion of a static client or client group name used by conventional interactions is subsumed by the selector, which descriptively names dynamic sets of clients of arbitrary cardinality. Associative interactions only require the existence of an information space (ontologies) that is known to all interacting entities, and eliminate the need for expensive synchronization and complex tracking protocols in pervasive grid environments.

2.1. The semantics of AR interactions

The AR interaction model consists of three elements: messages, associative selection, and reactive behaviors, which are described below.

2.1.1. AR messages

An AR message is defined as the triplet: (header, action, data). The data field may be empty or may contain the message payload. The header includes a semantic profile in addition to the credentials of the sender, a message context, and the Lifetime of the message. The profile is a set of attributes and/or attribute–value pairs, and defines the set of recipients of the message. The attributes are keywords from the application information space, such as the type of data a sensor produces (temperature or humidity) and/or its location, the type of functionality a service provides and/or its quality of service guarantees, and the capability and/or the cost of a resource, while the values field may be a keyword, partial keyword, wildcard, or range from the same space. At the RP, a profile is classified as a data profile or an interest profile depending on the action field of the message. The data profile corresponds to a message carrying data to be stored in the system. The interest profile corresponds to a query.

A sample data profile used by a sensor to publish data is shown in Figure 1(a), and a matching interest profile is shown in Figure 1(b). Note that the number or order of the attribute/attribute–value pairs in a profile is not restricted. However, our current prototype requires that the maximum possible attribute/attribute–value pairs must be predefined. The action field of the AR message defines the reactive behavior at the RP and is described below.

The AR interaction model defines a single symmetric post primitive. To send a message, the sender composes a message by appropriately defining the header, action, and data fields, and invokes the post primitive. The post primitive resolves the profile of the message and delivers the message to relevant RPs. The profile resolution guarantees that all the RPs that match the profile will be identified. However, the actual delivery relies on existing transport protocols.

(a) (location = [120, 23]) (temperature = 110) (unit = Fahrenheit) (error <= 0.01) (alarm)

(b) (location [100-130, 20-50]) (temperature > 80) (unit = Fa*) (error <= 0.1) (alarm)

Figure 1. Sample message profiles: (a) a data profile for a temperature sensor and (b) an interest profile for a client.
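To make the message format concrete, the following is a minimal Java sketch, assuming hypothetical types (ArMessage, Header, Action) and a stub post() that are not part of Meteor's published API; it builds the data and interest profiles of Figure 1 and hands them to the single symmetric post primitive.

import java.util.LinkedHashMap;
import java.util.Map;

public class ArMessageSketch {
    enum Action { STORE, RETRIEVE, NOTIFY_DATA, NOTIFY_INTEREST, DELETE_DATA, DELETE_INTEREST }

    static class Header {
        final Map<String, String> profile = new LinkedHashMap<>(); // attribute -> value or constraint
        String credentials;
        long lifetimeSeconds;          // persistence of the message at the rendezvous point
    }

    static class ArMessage {           // the (header, action, data) triplet of Section 2.1.1
        Header header = new Header();
        Action action;
        byte[] data;                   // may be empty, e.g. for pure interest messages
    }

    /** The single symmetric primitive: resolve the profile and deliver to matching RPs. */
    static void post(ArMessage m) {
        // In Meteor this resolution is done by the Squid routing layer; here we only show the call.
        System.out.println("post " + m.action + " " + m.header.profile);
    }

    public static void main(String[] args) {
        ArMessage dataMsg = new ArMessage();                 // cf. Figure 1(a)
        dataMsg.header.profile.put("location", "[120, 23]");
        dataMsg.header.profile.put("temperature", "110");
        dataMsg.header.profile.put("unit", "Fahrenheit");
        dataMsg.header.lifetimeSeconds = 3600;
        dataMsg.action = Action.STORE;
        dataMsg.data = "reading".getBytes();
        post(dataMsg);

        ArMessage interestMsg = new ArMessage();             // cf. Figure 1(b)
        interestMsg.header.profile.put("temperature", "> 80");
        interestMsg.header.profile.put("unit", "Fa*");
        interestMsg.action = Action.NOTIFY_DATA;
        post(interestMsg);
    }
}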


2.1.2. Associative selection

Associative selection is the content-based resolution and matching of profiles based on keywords (e.g. exact keywords, wildcards, partial wildcards, and ranges). A profile p is a propositional expression. The elements in the profile can be an attribute, a_i, or an attribute–value pair (a_i, v_i), where a_i is a keyword and v_i may be a keyword, partial keyword, wildcard, or range. The singleton attribute a_i evaluates to true with respect to a profile p if and only if p contains the attribute a_i. The attribute–value pair (a_i, u_i) evaluates to true with respect to a profile p if and only if p contains an attribute a_i and the corresponding value v_i satisfies u_i.

Based on the nature of profiles and the complexity of matching, associative selection is classified as exact or complex. Exact selection involves profiles composed of complete keywords, and selection consists of exactly matching these keywords. Complex selection involves profiles composed of partial keywords, wildcards, and ranges, and selection consists of approximate and/or relational matching. For example, profile (a) is associatively selected by profile (b) in Figure 1 as follows: (1) for the attribute location, the value [120, 23] satisfies the range [100–130, 20–50]; (2) for the attribute temperature, the value 110 satisfies the relation >80; (3) for the attribute unit, Fahrenheit matches the wildcard Fa∗; (4) for the attribute error, error <= 0.01 satisfies the constraint error <= 0.1; and (5) the attribute alarm matches exactly.

A key characteristic of the selection process is that it does not differentiate between interest and data profiles. This allows all messages to be symmetric, where data profiles can trigger the reactive behaviors associated with interest messages and vice versa. The matching system thus combines selective information dissemination with reactive behaviors. Further, both data and interest messages are persistent, and their persistence is defined by the Lifetime field.
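The selection rules above can be illustrated with a small sketch; the string-encoded constraints and the satisfies()/select() helpers below are illustrative assumptions rather than Meteor's matching engine.

import java.util.Map;

public class AssociativeSelectionSketch {

    /** Does a data value v_i satisfy the constraint u_i of an interest profile? */
    static boolean satisfies(String value, String constraint) {
        constraint = constraint.trim();
        if (constraint.endsWith("*"))                        // partial keyword / wildcard, e.g. "Fa*"
            return value.startsWith(constraint.substring(0, constraint.length() - 1));
        if (constraint.startsWith(">"))
            return Double.parseDouble(value) > Double.parseDouble(constraint.substring(1).trim());
        if (constraint.startsWith("<="))
            return Double.parseDouble(value) <= Double.parseDouble(constraint.substring(2).trim());
        if (constraint.matches("\\d+-\\d+")) {               // numeric range, e.g. "100-130"
            String[] b = constraint.split("-");
            double v = Double.parseDouble(value);
            return v >= Double.parseDouble(b[0]) && v <= Double.parseDouble(b[1]);
        }
        return value.equals(constraint);                     // exact keyword
    }

    /** A data profile is selected if every attribute of the interest profile is satisfied. */
    static boolean select(Map<String, String> dataProfile, Map<String, String> interestProfile) {
        for (Map.Entry<String, String> e : interestProfile.entrySet()) {
            String v = dataProfile.get(e.getKey());
            if (v == null || !satisfies(v, e.getValue())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> data = Map.of("temperature", "110", "unit", "Fahrenheit", "alarm", "");
        Map<String, String> interest = Map.of("temperature", "> 80", "unit", "Fa*", "alarm", "");
        System.out.println(select(data, interest));          // true: cf. Figure 1
    }
}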

2.1.3. Reactive behaviors

The action field of the message defines the reactive behavior at the RP. Basic reactive behaviors currently defined include store, retrieve, notify, and delete, as listed in Table I. These reactive behaviors are used by data producers and consumers to store/retrieve data, as illustrated in Figure 2. Note that a client in the system can be a data producer, a data consumer, or both. The notify and delete actions are explicitly invoked on a data or an interest profile.

The store action stores the data and data profile at the RP. It also causes the message profile to be matched against existing interest profiles with the notify data action, and the data to be sent to the data consumers that requested it in case of a positive match. The retrieve action retrieves data corresponding to each matching data profile. The notify action matches the message profile against existing interest/data profiles, and notifies the sender if there is at least one positive match. The notify action comes in two flavors: notify data and notify interest. Notify data is used by data consumers, who want to be notified when data matching their interest profile are stored in the system. Notify interest can be used by data producers, who want to be notified when there is interest in the data they produce, so that they can start sending data into the system. Finally, the delete action deletes all matching interest/data profiles.

Note that the actions will only be executed if the message header contains an appropriate credential. Also note that each message is stored at the rendezvous for a period corresponding to the Lifetime defined in its header. In case of multiple matches, the matching profiles are processed in random order.


Table I. Basic reactive behaviors.

store: Store the data profile and data in the system at the RPs. Match the data profile with existing interest profiles with the 'notify data' action. Execute the action associated with a matched profile.

retrieve: Match the interest profile with existing data profiles. Send the data associated with the matched profiles to the requester.

notify data: Store the interest profile and the action in the system at the RPs. Match the interest profile with (1) existing data profiles, and send back a notification if a match occurs, and (2) existing interest profiles with the 'notify interest' action, and send a notification to the data producer if a match occurs.

notify interest: Store the interest profile and the action in the system at the RP. Match the interest profile with existing interest profiles with the 'notify data' action. Send a notification to the data producer if a match occurs.

delete data: Match the profile with existing data profiles. Delete all matching data profiles with appropriate credentials, and the data associated with them.

delete interest: Match the interest profile with existing interest profiles. Delete all matching interest profiles with appropriate credentials.
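As a rough illustration of how a rendezvous point could dispatch these behaviors, the sketch below (hypothetical types, reusing the select() helper from the previous sketch; not Meteor's code) shows a store message triggering stored 'notify data' interests and a retrieve message returning matching data.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RendezvousDispatchSketch {
    record Stored(Map<String, String> profile, String action, byte[] data, long expiresAt) {}

    final List<Stored> dataProfiles = new ArrayList<>();
    final List<Stored> interestProfiles = new ArrayList<>();

    void store(Map<String, String> profile, byte[] data, long lifetimeMs) {
        dataProfiles.add(new Stored(profile, "store", data, System.currentTimeMillis() + lifetimeMs));
        for (Stored interest : interestProfiles)
            if (interest.action().equals("notify_data")
                    && AssociativeSelectionSketch.select(profile, interest.profile()))
                notifySender(interest, data);          // reactive behavior of the matched interest
    }

    List<byte[]> retrieve(Map<String, String> interestProfile) {
        List<byte[]> results = new ArrayList<>();
        for (Stored d : dataProfiles)
            if (AssociativeSelectionSketch.select(d.profile(), interestProfile))
                results.add(d.data());                 // data of every matching data profile
        return results;
    }

    void notifySender(Stored interest, byte[] data) {
        System.out.println("notify " + interest.profile());   // placeholder for the transport layer
    }
}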

Figure 2. Basic reactive behaviors: (a) for messages issued by data producers and (b) for messages issued by data consumers.

Illustrative examples: The operation of the model is illustrated in Figure 3. Client C1 is a data producer (e.g. a sensor), and produces data described by the profile 〈p1, p2〉. C1 publishes data in the system only if other clients need it. Client C2 is a data consumer. In Figure 3(a), client C1 sends a message with interest profile 〈p1, p2〉, requesting to be notified if there are other clients interested in the type of data it produces.

Figure 3. An example illustrating the operation of associative rendezvous (AR): (a) (1) C1 posts (〈p1, p2〉, notify_interest(C1)), (2) C2 posts (〈p1, ∗〉, notify_data(C2)), (3) notify(C1); (b) (4) C1 posts (〈p1, p2〉, store, data), (5) notify(C2, data); (c) (6) C2 posts (〈p1, ∗〉, retrieve(C2)), (7) retrieve(C2, data); (d) (8) post(〈p1, p2〉, delete_data(C1)), (9) post(〈p1, ∗〉, delete_interest(C2)).

C1's interest profile is stored in the system, and matched against existing interest profiles. Client C2 sends a message with interest profile 〈p1, ∗〉, requesting to be notified if there are data stored in the system matching the profile. C2's interest profiles are stored in the system and matched against the other profiles in the system. Since C2's profile matches C1's profile, the action field for C1's profile is executed, and a notification message is sent to C1. C1 starts publishing data in the system, as illustrated in Figure 3(b). The data published by C1 match C2's profile, whose action field is executed, resulting in a notification being sent to C2. C2 retrieves the data, as Figure 3(c) shows, by issuing a retrieve message with interest profile 〈p1, ∗〉. This example assumes that the Lifetime for data and interest profiles has not expired. The profiles will be deleted automatically when the Lifetime expires, or the clients can delete their profiles explicitly, as illustrated in Figure 3(d).

Note the symmetric behavior of the post operator. As seen in Figure 3(a), a client can subscribe to both interests and data. This is particularly useful for sensor networks, where an energy-constrained sensor may want to produce data when there is an interest for its data, allowing it to use power and bandwidth more effectively.

AR can also be used to realize different interaction semantics such as one-to-many, one-to-some, one-to-all, and mobility by appropriately setting data and interest profiles. Further, as these profiles are defined locally by a client, no synchronization is required to achieve these interaction semantics. Figure 4 illustrates a one-to-many (e.g. multicast) interaction using AR. Note that interactions and compositions among individual application elements (i.e. sensors, actuators, services) can be achieved through the cascading of AR messages without having to be explicitly programmed [6]. Our system can be used to support different interaction semantics in a pervasive system; however, the current design does not target rapid changes in sensor positions.

Figure 4. One-to-many interactions using associative rendezvous: (1) post(〈p1, ∗, ∗〉, retrieve(D1)); (2) post(〈p1, p2, p3〉, retrieve(D2)); (3) post(〈p1, p2, ∗〉, retrieve(D3)); (4) S1 posts (〈p1, p2, p3〉, store, data); (5) retrieve(D1, data), retrieve(D2, data), retrieve(D3, data). Clients D1, D2, and D3 post interest profiles to the system, requesting matching data. Next, S1 publishes data matching the interest profiles submitted by D1, D2, and D3, and, as a result, notifications are sent to these clients.

2.2. Information aggregation using AR

The aggregation service extends the retrieve reactive behavior to specify aggregation operators, i.e. the AR message for an interest profile includes the aggregation operation as follows: (〈{attr}〉, retrieve(A)), where attr is the set of content attributes of the interest profile within the AR message header, and retrieve(A) specifies aggregation using the aggregation operator A. Note that aggregations can be constrained to data elements within a specific spatial region. The location attributes of a data element for an aggregation query are specified as part of the semantic profile of the message header, similar to other content attributes, i.e. location descriptors form (typically leading) dimensions of the semantic information space, based on which data and interest profiles are defined. The data profile becomes (〈L, {attr}〉, store, data) and the corresponding interest profile becomes (〈L, {attr}〉, retrieve(A)), where L specifies the location of the data producer in the data profile and the region of interest in the interest profile. L may use latitude/longitude or any other specification of location. For example, in Figure 1, the data profile includes the location of the sensors, while the interest profile includes ranges specifying the region of interest. Aggregation queries may also be recurrent in time. In such a query, an additional parameter is required to specify the frequency of evaluation of the persistent query: (〈L, {attr}, TTL〉, retrieve(A, Ta)), where Ta is the time interval between repeated aggregates.

Semantics of aggregation: The semantics of the aggregation query are as follows. When an aggregation query is posted, the query is routed to all rendezvous peers with data profiles that match the query's interest profile. All data items at all peers that match the interest profile specification are aggregated according to the specified aggregation operator and are returned. In case of recurrent aggregation queries, the interest profile is registered at each rendezvous peer for the duration of its TTL, and is repeatedly evaluated at the specified frequency. The aggregation operation is repeated each time the query is evaluated and the aggregate is returned. Note that each aggregation operation is independent.

To illustrate the operation of the aggregation service, consider a traffic-monitoring system with deployed vehicle speed sensors. An example of an aggregate query for such a system is: find the average speed in the stretch of road specified by region L every 5 min for the next 1 h. An aggregation query would be realized using the aggregation service described above as follows: the client connects to any rendezvous peer in the system and posts an aggregate query with profile 〈L, p1〉, and retrieves information using the aggregator AVG defined over region L, with an aggregation frequency of 5 min and a TTL of 1 h. Such an aggregate query can be written as post(〈L, p1, 3600〉, retrieve(AVG, 300)), where the time is measured in seconds. This query is routed to and registered at every peer that stores data elements matching this query. In response, every matching data element in the system is aggregated and returned to the client.
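A sketch of how such a recurrent query could be represented and evaluated at a leaf peer is given below; the AggregationQuery record and helper names are assumptions for illustration, and partial averages are forwarded as (sum, count) pairs so that the exact AVG can be computed where the results converge.

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AggregationQuerySketch {
    record AggregationQuery(String region, String attribute, long ttlSeconds,
                            String operator, long intervalSeconds) {}

    /** At a leaf rendezvous peer: evaluate the registered query at the requested frequency
     *  for the duration of its TTL, forwarding a partial aggregate towards the query trie parent. */
    static void registerRecurrentQuery(AggregationQuery q, ScheduledExecutorService timer) {
        Runnable evaluate = () -> {
            List<Double> matching = lookupMatchingValues(q.region(), q.attribute());
            double sum = matching.stream().mapToDouble(Double::doubleValue).sum();
            forwardToParent(sum, matching.size());
        };
        for (long t = q.intervalSeconds(); t <= q.ttlSeconds(); t += q.intervalSeconds())
            timer.schedule(evaluate, t, TimeUnit.SECONDS);
    }

    static List<Double> lookupMatchingValues(String region, String attribute) {
        return List.of(55.0, 62.5);           // placeholder for the local data store at this peer
    }

    static void forwardToParent(double partialSum, int count) {
        System.out.println("partial aggregate: sum=" + partialSum + " count=" + count);
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // The traffic example: post(<L, p1, 3600>, retrieve(AVG, 300)).
        registerRecurrentQuery(new AggregationQuery("L", "speed", 3600, "AVG", 300), timer);
    }
}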

3. METEOR: A MIDDLEWARE INFRASTRUCTURE FOR CONTENT-BASEDDECOUPLED INTERACTIONS

Meteor is a middleware infrastructure for content-based decoupled interactions in pervasive environments. It is essentially a peer-to-peer network of RPs, where each RP is a peer and may be a broadband AP, a forwarding node in a sensor network, or a server node in a wired network. To use Meteor, applications must connect to an RP. A schematic overview of the Meteor stack is presented in Figure 5. It consists of three key components: (1) a self-organizing overlay, (2) a content-based routing infrastructure (Squid), and (3) the ARMS. The aggregation service [7] specifically builds on the Squid content-based routing (CBR) infrastructure to construct aggregation trie structures on top of the routes used by queries, and uses them to back-propagate matching data while performing aggregations at intermediate nodes in the trie. Further, it extends the ARMS layer to provide a unified content-based abstraction to specify aggregation operations as reactive behaviors of content-based queries.

3.1. Overlay network layer

The Meteor overlay network is a one-dimensional self-organizing structured overlay composed of RP nodes. Peers in the overlay can join or leave the network at any time. While the design of Meteor is not tied to any specific overlay topology, the current Meteor prototype builds on Chord [10].

Figure 5. Meteor stack—schematic overview. The Meteor stack comprises, from top to bottom: Pervasive Applications; Associative Rendezvous Messaging; Content-based Routing; Self-organizing Overlay; and the wireless/wired substrate (e.g. sensors, wireless networks, the Internet). The Aggregation service is shown alongside the stack.


Figure 6. The Meteor overlay network layer. The ith finger of a node is the successor of (node id + 2^(i-1)) mod 2^m, 1 <= i <= m. The figure shows a small ring with nodes 0, 5, 8, 10, and 11, a lookup for an identifier, and the first finger-table entries of node 5 (5 + 1 -> 8, 5 + 2 -> 8, 5 + 4 -> 10, 5 + 8 -> ...).
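The finger relation in Figure 6 can be written compactly as follows; this is generic Chord (a small ring with m = 4 is used for illustration), not Meteor-specific code.

import java.util.TreeSet;

public class ChordFingerSketch {
    /** Successor of an identifier on the ring, wrapping around past the largest node. */
    static long successor(TreeSet<Long> nodes, long id) {
        Long s = nodes.ceiling(id);
        return s != null ? s : nodes.first();
    }

    /** finger[i] = successor((n + 2^(i-1)) mod 2^m), for 1 <= i <= m. */
    static long[] fingerTable(TreeSet<Long> nodes, long n, int m) {
        long[] fingers = new long[m];
        for (int i = 1; i <= m; i++)
            fingers[i - 1] = successor(nodes, (n + (1L << (i - 1))) % (1L << m));
        return fingers;
    }

    public static void main(String[] args) {
        TreeSet<Long> nodes = new TreeSet<>(java.util.List.of(0L, 5L, 8L, 10L, 11L));
        // First fingers of node 5 in a small ring, as in Figure 6: 5+1 -> 8, 5+2 -> 8, 5+4 -> 10, ...
        for (long f : fingerTable(nodes, 5, 4)) System.out.println(f);
    }
}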

Advantages of Chord include its guaranteed performance, logarithmic in the number of messages (e.g. data lookup requires O(log N) messages, where N is the number of RP nodes in the system). However, this overlay could be replaced by other structured overlays.

Peer nodes in the Chord overlay form a ring topology. Every node in the Chord overlay is assigned a unique identifier ranging from 0 to 2^m - 1. Each data item stored in the system is associated with a key and mapped to an identifier from the same interval. The identifiers are arranged as a circle modulo 2^m. Each node stores the keys that map to the segment of the curve between itself and its predecessor node. Figure 6 shows an example of a Chord overlay network with five nodes and an identifier space from 0 to 64.

We have implemented a two-level overlay design using Chord, in which nodes are organized into independent Chord rings (groups) based on physical or logical proximity or locality. The idea is that nodes that are physically close or that are expected to frequently exchange messages can be arranged as individual groups, within which most message exchanges will take place. The less frequent and possibly more expensive inter-group communication can be realized through inter-group links at a higher level. For each group, nodes' finger tables are no different than in a single-level Chord implementation. However, each node additionally keeps a reference to its successor in each of the other groups, which can be done by querying any node in a remote group as in a regular join operation. The result of this design is a fully connected higher-level overlay at relatively small cost (assuming that the number of individual groups is much smaller than the number of nodes), where any node can be used as a link between groups. Thus, there are no specialized nodes or converging paths as potential bottlenecks or single points of failure. Figure 7 shows this two-level overlay from two perspectives. Because the successor–predecessor relationship is maintained for nodes between groups, inter-group routing is expected to take a better than average number of hops, where the average case corresponds to starting a search from an arbitrary node in the remote group [11]. This is because a query will already be closer to the destination node when it enters the remote group than if an arbitrary node were used.

The overlay network layer provides a simple abstraction to the layers above, consisting of a single operation: lookup(identifier, group). Given a data identifier, this operation locates the RP node on the given group that is responsible for storing it. If the group parameter is omitted, the lookup will take place in the local group. Two wildcard values are accepted for this parameter for inter-group routing: ANY, which is used to route to a node on any group, starting with the local group, and ALL, which is used to route to nodes on all groups. Both values first initiate routing normally to a node on the local group. For ANY, a query is propagated further only if it cannot be resolved on that node. For ALL, the query is sent in parallel to every other group.

Figure 7. Two-level overlay organization: (a) the fully connected high-level network of three groups A, B, C, and the individual lower-level groups A and B, as well as some of the successor relations between nodes in these groups that realize the high-level link, and (b) an integrated view of the network, where for each different shape there is a directly connected group.

Note that in both cases, because the AP to remote groups is the successor of the responsible rendezvous node in the local group, routing in remote groups is expected to be resolved in a better than average number of hops, as explained above.

Note that use of the two-level overlay can avoid single points of failure, improve scalability, and exploit locality. The latter can in turn favor applications for which low latency and responsiveness are important requirements. Because messages are transmitted to the local group first, the system can obtain a quick response from local nodes, while the message is propagated to other stakeholders in remote groups. In the firefighting application, alarms and sprinklers may be directly connected to the local sensor network and receive a message first for immediate response. The message will then be forwarded further along the overlay to firefighting, medical, and other response teams whose systems might be connected to the network via nodes in remote groups.
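A minimal sketch of the lookup(identifier, group) abstraction with the ANY and ALL wildcards is given below; the Ring interface and group bookkeeping are assumptions, not the Meteor implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoLevelLookupSketch {
    interface Ring {                                    // one Chord-style group
        String lookup(long identifier);                 // the responsible RP node in this group
        boolean canResolve(long identifier);            // can this group answer the request locally?
    }

    static final String ANY = "ANY", ALL = "ALL";
    final Map<String, Ring> groups = new HashMap<>();
    final String localGroup;

    TwoLevelLookupSketch(String localGroup) { this.localGroup = localGroup; }

    List<String> lookup(long identifier, String group) {
        if (!ANY.equals(group) && !ALL.equals(group))
            return List.of(groups.get(group == null ? localGroup : group).lookup(identifier));
        List<String> nodes = new ArrayList<>();
        Ring local = groups.get(localGroup);
        nodes.add(local.lookup(identifier));            // both wildcards start with the local group
        if (ANY.equals(group) && local.canResolve(identifier))
            return nodes;                               // resolved locally, no further propagation
        for (Map.Entry<String, Ring> e : groups.entrySet())
            if (!e.getKey().equals(localGroup))
                nodes.add(e.getValue().lookup(identifier));  // remote groups, in parallel in practice
        return nodes;
    }
}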

3.2. Content-based routing layer

Squid builds on top of the Chord overlay to enable flexible content-based routing. As mentioned above, the lookup operator provided by the Chord overlay requires an exact identifier. Squid effectively maps complex message profiles, consisting of keyword tuples made up of complete keywords, partial keywords, wildcards, and ranges, onto clusters of identifiers. It guarantees that all peers responsible for identifiers in these clusters will be found with bounded costs in terms of the number of messages and the number of intermediate RP nodes involved.

Tuples of d keywords, wildcards, and/or ranges represent points or regions in a d-dimensional information space. A point corresponds to a keyword tuple that contains only complete keywords, and is called simple, as shown in Figure 8(a). If a tuple contains partial keywords, wildcards, and/or ranges, it is called complex and defines a region in the information space, as shown in Figure 8(d).

Squid uses the Hilbert Space Filling Curve (SFC) [12] to map the multi-dimensional information space to the one-dimensional identifier space of the peer overlay. Figure 8(b) shows an example of the Hilbert SFC for a two-dimensional space. The Hilbert SFC is a locality-preserving continuous and recursive mapping from a k-dimensional space to a one-dimensional space. It is locality preserving in that points close on the curve are mapped from close points in the k-dimensional space. The Hilbert curve readily extends to any number of dimensions, though for practical purposes its locality-preserving property can best be exploited with under six dimensions.
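For two dimensions, the Hilbert mapping can be sketched as the standard coordinate-to-index conversion below; the discretization of keywords to integer coordinates is assumed for illustration, and Squid generalizes the mapping to d dimensions.

public class HilbertIndexSketch {
    /** Distance along the Hilbert curve of point (x, y) in a 2^order by 2^order grid. */
    static long hilbertIndex(int order, long x, long y) {
        long n = 1L << order, d = 0;
        for (long s = n / 2; s > 0; s /= 2) {
            long rx = ((x & s) > 0) ? 1 : 0;
            long ry = ((y & s) > 0) ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            if (ry == 0) {                   // rotate/flip the quadrant for the next refinement step
                if (rx == 1) { x = n - 1 - x; y = n - 1 - y; }
                long t = x; x = y; y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // A simple keyword tuple whose two keywords have been discretized to 3-bit coordinates
        // (an illustrative assumption) maps to a single point on the curve, i.e. one identifier.
        System.out.println(hilbertIndex(3, 3, 2));
    }
}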

Figure 8. Routing using simple and complex keyword tuples. Panels (a)–(c) show a simple keyword tuple (e.g. latitude, longitude) mapped to its SFC index and then to a destination node; panels (d) and (e) show a complex keyword tuple mapped to clusters of SFC indices and the corresponding destination nodes.

Further, its locality-preserving and recursive nature enables the index space to maintain content locality and efficiently resolve content-based lookups [5].

Content-based routing in Squid is achieved as follows: SFCs are used to generate the one-dimensional index space from the multi-dimensional keyword space. Applying the Hilbert mapping to this multi-dimensional space, each profile consisting of a simple keyword tuple can be mapped to a point on the SFC. Further, any complex keyword tuple can be mapped to regions in the keyword space and the corresponding clusters (segments of the curve) in the SFC (see Figure 8(d) and (e)). The one-dimensional index space generated corresponds to the one-dimensional identifier space used by the Chord overlay. Thus, using this mapping, RP nodes corresponding to any simple or complex keyword tuple can be located. The Squid layer of the Meteor stack provides a simple abstraction to the layer above, consisting of a single operation: deliver(keyword tuple, data), where data is the message payload provided by the messaging layer above. The routing process is described below.

Routing using simple keyword tuples: The routing process for a simple keyword tuple is illustrated in Figure 8(a)–(c). It consists of two steps: first, the SFC mapping is used to construct the index of the destination RP node from the simple keyword tuple, and then the overlay network lookup mechanism is used to route to the appropriate RP node in the overlay.

Routing using complex keyword tuples: The complex keyword tuple identifies a region in the keyword space, which in turn corresponds to clusters of points in the index space. For example, in Figure 8(d), the complex keyword tuple (2–3, 1–5) representing data read by a sensor (temperature between 2 and 3 units and humidity between 1 and 5 units) identifies two clusters with 6 and 4 points, respectively.


Thus, a complex keyword tuple is mapped by the Hilbert SFC to clusters of SFC indices and, correspondingly, multiple destination identifiers. Once the clusters associated with the complex keyword tuple are identified, a straightforward approach consists of using the overlay lookup mechanism to route individually to each RP node. Figure 8(e) illustrates this process. However, as the originating RP node cannot know how the clusters are distributed in the overlay network, the above approach can lead to inefficiencies and redundant messages, especially when there are a large number of clusters. The routing process can be optimized by using the recursive nature of the SFC to distribute the list of clusters to be routed. This optimization is presented in detail as trie construction and trie-based routing as follows.

Trie construction: Since SFCs are infinitely self-similar recursive data structures, their construction process can be regarded as a tree. Because of their digital causality property‡, the tree is a prefix tree, i.e. a trie. A query defines a sub-space in the multi-dimensional space and segments (called clusters) on the SFC curve. The construction of these clusters corresponds to the construction of a trie.

Figure 9(a) illustrates the recursive resolution of the query (011, 010–110) in a two-dimensional keyword space, with base-2 digits as coordinates. Figure 9(b) shows the construction of the corresponding trie. At each recursion step the discretized space is refined, resulting in a longer curve. The query defines 2 points on the first SFC refinement (prefix 0); 3 points on the second SFC refinement, grouped based on prefixes 011 and 0010; and 5 points on the third SFC refinement, with prefixes 011, 0111, and 0010.

Trie-based routing: The trie constructed by resolving the query is embedded into the overlay as follows. Each node in the trie is mapped to a peer node in the overlay based on its identifier. The leaf nodes have the SFC index as their identifiers. For intermediate nodes, the identifier is constructed by padding the prefix with zeroes until the maximum bit length is reached (i.e. the length that corresponds to the SFC index at the maximum level of refinement allowed). The peers responsible for the identifiers are then located using the lookup mechanism provided by the overlay network.

Figure 10 illustrates the process. In the figure, the overlay network uses an identifier space from 0 to 2^6, and binary node identifiers. The source node, 111000, refines the query at the first recursion level, which results in a cluster with prefix 0. The node then uses the prefix 0 to construct an identifier, by padding the prefix with zeroes, and sends a message to node 000000. At node 000000, the query is refined to generate the second level of recursion, which results in two clusters, one with prefix 011 and the other with prefix 0010. Node 000000 constructs a sub-query identifier for each cluster, and sends the messages to the appropriate nodes in the overlay. The nodes that receive the sub-queries refine them to generate the next level of recursion, and so on. Note that node 100001 does not need to refine its sub-query since the sub-query prefix, 011, is smaller than the node's prefix of the same length, i.e. 100, meaning that the entire subtree rooted at 011∗ is stored at this node. As a result, sub-trees of the trie that are stored entirely at a single node can be pruned, saving communication and computational resources.

‡SFC digital causality: Each step of recursion transforms a point on the SFC curve into multiple points by extending its identifier with d digits, where d is the dimensionality of the space mapped by the curve.


Figure 9. (a) The complex query (011, 010–110) on a two-dimensional (speed, longitude) keyword space—the first, second, and third SFC refinements—and (b) the trie associated with the query (root 0*; intermediate nodes 0010*, 011*, and 0111*; leaves 001001, 001010, 011011, 011100, and 011111).

Figure 10. Embedding the tree from Figure 9(b) into the overlay network (overlay nodes 000000, 000100, 001001, 001111, 100001, and 111000). Node 100001 is responsible for storing the subtree rooted at 011*.

The example presented above has been simplified for ease of illustration. In reality, each node performs multiple query refinements (e.g. 5), which result in a large set of sub-queries, each with a longer prefix. Also, this routing procedure is shown only on a single ring or node group. If the query is meant for multiple groups, according to the overlay's group parameter introduced in Section 3.1, then it will be propagated to these groups from each of the possibly multiple final destination nodes for the local query.
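The prefix handling described above can be sketched as follows; the identifier-space size, the zero-padding helper, and the subtree-coverage test are simplified assumptions (wrap-around of the ring is ignored), not Squid's actual routing code.

import java.util.List;

public class TrieRoutingSketch {
    static final int MAX_BITS = 6;                      // maximum SFC refinement, as in Figure 10

    /** Pad a binary cluster prefix such as "011" with zeroes to the full identifier length. */
    static long prefixToIdentifier(String prefix) {
        return Long.parseLong(prefix + "0".repeat(MAX_BITS - prefix.length()), 2);
    }

    /** True if the node responsible for (predecessorId, nodeId] covers the whole prefix subtree,
     *  in which case the sub-query does not need to be refined or forwarded further. */
    static boolean coversSubtree(String prefix, long predecessorId, long nodeId) {
        long first = prefixToIdentifier(prefix);
        long last = Long.parseLong(prefix + "1".repeat(MAX_BITS - prefix.length()), 2);
        return first > predecessorId && last <= nodeId;
    }

    public static void main(String[] args) {
        for (String prefix : List.of("011", "0010"))    // clusters of the query (011, 010-110)
            System.out.println(prefix + " -> node responsible for id " + prefixToIdentifier(prefix));
        // Node 100001 (predecessor 001111 in Figure 10) stores the whole 011* subtree:
        System.out.println(coversSubtree("011", 0b001111, 0b100001));   // true
    }
}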

3.3. Trie-based in-network aggregation

The design of the aggregation service builds on the trie structure constructed by the routing engine. An aggregation query is routed to the appropriate peer nodes with data that match the query in the usual way. Further, recurrent aggregation queries are stored at the peer node for the duration of their TTLs and are periodically evaluated based on the specified frequency. Conceptually, aggregation consists of propagating data matching the query back up the trie, with partial aggregation performed at intermediate peer nodes and the final aggregate being computed at the peer node that issued the query.

A straightforward implementation of the service, which assumes that the overlay is static and stable with no peer nodes joining, leaving, or failing, consists of maintaining state about each query (e.g. the query operation, the parent and children nodes for the associated trie) at each peer node that forms the trie. The leaf nodes of the trie evaluate the query, perform the first aggregation, and send the result to their parent peer nodes. Each intermediate node waits until all of its children report their partial aggregates, aggregates these results, and forwards them to its parent. However, such a simple implementation has two problems. First, the overlay is typically dynamic, with peers joining, leaving, and failing relatively often, and maintaining state about queries can be expensive or infeasible. Second, the number of aggregation queries can be very large and storing state for each query will require significant resources. The approach presented in this paper does not require explicitly storing query state at the peer nodes. Instead, the prefixes of the nodes in the path of the query along the trie are maintained within the query itself, and the query is only stored at the leaf nodes in case of a recurrent query. This list of prefixes can then be used to back-propagate and aggregate data to the source peer node—each peer uses the prefix that precedes its own in the list to route to its parent in the trie.

To illustrate the process, consider the query (011, 010–110) presented earlier in this section. The trie associated with this query was presented in Figure 9. Further, as shown in Figure 10, only a part of the trie is actually constructed while resolving the query. The constructed trie is shown in Figure 11(a). Figure 11(b) shows the list of prefixes accumulated by the query as it reaches different leaf nodes of the query trie. The in-network aggregation process is shown in Figure 12, and the pseudocode for the aggregation algorithm is presented in Figure 13. The leaf nodes evaluate the query and locally aggregate matching data elements. These partial aggregates are then sent to the overlay peer responsible for the parent node in the query trie. This peer is located using the prefix of the parent trie node, by padding it with zeroes to obtain a valid identifier and performing a lookup for this identifier. Note that the leaf node includes the prefix list in the result message. Each intermediate node in the query trie aggregates all partial results it receives for a query from its children in the query trie and periodically forwards these to its parent in the query trie using the same process. Every time a peer node forwards a result up the trie, it removes its prefix from the prefix list. Details about dealing with varied link latencies and managing system dynamics (e.g. joining, leaving, failures) can be found in [7].
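A minimal sketch of the prefix-list back-propagation is shown below, assuming a hypothetical Peer interface and a prefix path as accumulated in Figures 10 and 11; it is not the pseudocode of Figure 13.

import java.util.List;

public class TrieAggregationSketch {
    interface Peer { void send(long identifier, double partialAggregate, List<String> prefixes); }

    /** Forward a partial aggregate to the peer responsible for the parent prefix,
     *  removing this peer's own prefix from the carried prefix list. */
    static void forwardUp(double partialAggregate, List<String> prefixPath, Peer overlay) {
        if (prefixPath.size() <= 1) {
            System.out.println("final aggregate = " + partialAggregate);  // reached the query source
            return;
        }
        List<String> remaining = prefixPath.subList(0, prefixPath.size() - 1);
        String parentPrefix = remaining.get(remaining.size() - 1);
        long parentId = Long.parseLong(parentPrefix + "0".repeat(6 - parentPrefix.length()), 2);
        overlay.send(parentId, partialAggregate, remaining);
    }

    public static void main(String[] args) {
        // A leaf under 011* accumulated the path 0* -> 011* while the query was being resolved.
        Peer overlay = (id, agg, prefixes) ->
                System.out.println("send " + agg + " to the node responsible for id " + id);
        forwardUp(42.0, List.of("0", "011"), overlay);
    }
}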


Figure 11. (a) The trie constructed while resolving the query (011, 010–110) as presented in Figure 10 and (b) the list of prefixes accumulated by the query at the leaf peer nodes.

3.4. AR messaging substrate

The AR messaging layer builds on top of the content-based routing layer and provides the abstractions to enable decoupled interactions and reactive behaviors between peer elements using the unified interface: post(profile, action, data).

The ARMS layer implements the AR interaction model. At each RP, ARMS consists of two components: the profile manager and the matching engine. The matching engine component is essentially responsible for matching profiles. An incoming message profile is matched against existing interest and/or data profiles depending on the desired reactive behavior. If the result of the match is positive, then the action field of the incoming message is executed first, followed by the evaluation of the action field in matched profiles.

Figure 12. In-network, trie-based aggregation. Each peer node performs a partial aggregation and sends the result to the peer node corresponding to its parent in the query trie.

The profile manager manages locally stored profiles, and monitors message credentials and contexts to ensure that related constraints are satisfied. For example, a client cannot retrieve or delete data for which it is not authorized. The profile manager is also responsible for garbage collection. It maintains a local timer and purges interest and data profiles when their Lifetime fields have expired. Finally, the action dispatcher executes the action corresponding to a positive match (Figure 14).

3.5. Implementation overview

Meteor builds on Chord (or our two-level extension to Chord). Chord, Squid, the ARMS layer, and the in-network aggregation service of the Meteor stack are currently implemented as event-driven JXTA services, so that each layer registers itself as a listener for specific messages, and gets notified when a corresponding event is raised. Project JXTA (http://www.jxta.org) is a general-purpose peer-to-peer framework that provides basic peer-to-peer messaging services. Since Meteor is designed as an overlay network of rendezvous peers, it is incrementally deployable. A joining RP uses the Chord overlay protocol and becomes responsible for an interval in the identifier space. In this way, the addition of a new rendezvous node is transparent to the end-hosts.

The overall operation of the Meteor overlay consists of two phases: bootstrap and running. During the bootstrap phase (or join phase), messages are exchanged between a joining RP and the rest of the group. During this phase, the RP attempts to discover an already existing RP in the system to build its routing table. The joining RP sends a discovery message to the group. If the message is unanswered after a set duration (on the order of seconds), the RP assumes that it is the first in the system. If an RP responds to the message, the joining RP queries this bootstrapping RP according to the Chord join protocol and updates its routing tables to reflect the join.
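The bootstrap decision described above can be sketched as follows, assuming a hypothetical Discovery interface standing in for the JXTA-level discovery message exchange.

import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class BootstrapSketch {
    interface Discovery { Optional<String> waitForResponder(long timeout, TimeUnit unit); }

    static void join(Discovery discovery) {
        // Wait for an answer to the discovery message "on the order of seconds".
        Optional<String> bootstrapPeer = discovery.waitForResponder(5, TimeUnit.SECONDS);
        if (bootstrapPeer.isEmpty()) {
            System.out.println("no answer: assume this is the first RP in the system");
        } else {
            System.out.println("joining via " + bootstrapPeer.get()
                    + " and updating routing tables per the Chord join protocol");
        }
    }
}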


Figure 13. The pseudocode for in-network trie-based aggregation.

Figure 14. Profile manager and matching engine at a rendezvous point. An incoming AR message post(header, action, data), e.g. post(〈p1, p2, p3〉, store), is split into its profile, action, and data fields; the Profile Manager and Matching Engine at the RP match the profile against stored profiles, and the Action Dispatcher executes the corresponding action.

The running phase consists of stabilization and user modes. In the stabilization mode, an RP responds to queries issued by other RPs in the system. The purpose of the stabilization mode is to ensure that routing tables are up to date, and to verify that other RPs in the system have not failed or left the system. In the user mode, each RP interacts at the Squid and ARMS layers.


4. EXPERIMENTAL EVALUATION

A prototype of Meteor has been deployed on (1) a LAN of 64 Intel Pentium-4 1.70 GHz computers with 512 MB RAM, Linux 2.4.20-8 (kernel version), and a 100 Mbps Ethernet interconnect, (2) the PlanetLab [13] wide-area testbed, which is a global-scale heterogeneous distributed environment composed of interconnected sites with various resources, and (3) the ORBIT [14] wireless testbed, which is composed of hundreds of wireless nodes, and is used to test wireless environments for pervasive applications. In these deployments, each peer node serves as an RP and executes an instance of the Meteor stack. An experimental evaluation of the average performance of Meteor on up to hundreds of nodes using these deployments is presented below. Further, an evaluation of the scalability of Meteor on up to thousands of nodes using simulations is presented.

The experiments primarily measured the average runtime for the primitive provided by Meteor. For the post operation, the measured time is the time interval between when an AR message is issued and when it reaches the last destination node(s) that contains data elements that match the profile of the message. This includes the time for routing the message, matching the profile with the local repository, and executing reactive behaviors. Measurements use the native clocks at the RPs. The experiments also measured the overheads of different profile types and analyzed their impacts on scalability in different deployment environments. Finally, in the case of the simulations, wide-area network latency models for systems with thousands of RPs are used to provide insight into the behavior of Meteor in large-scale pervasive environments. Three types of messages are used in the experiments:

1. AR-1 messages are store messages with a simple profile requiring exact selection, e.g. post(〈p1, p2〉, store). The actual sample profile used is post(〈105, 72〉, store).

2. AR-2 messages are retrieve messages with a simple profile requiring exact selection, e.g. post(〈p1, p2〉, retrieve). The actual sample profile used is post(〈105, 72〉, retrieve).

3. AR-3 messages are retrieve messages with a complex profile containing partial keywords, wildcards, and ranges, and requiring complex selection, e.g. post(〈p1, ∗, p∗〉, retrieve). The actual sample profiles used are post(〈100−120, 50−80, temp∗〉, retrieve) and post(〈100−120, ∗, temp∗〉, retrieve).

The experiments performed over the different environments are based on a single Chord ring overlay unless explicitly stated. The results presented in the following section demonstrate that Meteor can effectively scale to a large number of peers while maintaining acceptable execution times for messages.

4.1. Experiments over a LAN

These experiments used three sets of messages, one for each of the three message types listed above. The median runtime was measured for each set for different system sizes and is plotted in Figure 15. The bars on the graph are min/max, and the median runtime is plotted as triangles in the figures.

As seen in Figure 15(a) and (b), the runtime for AR-1 and AR-2 messages increases as the system size increases.

Figure 15. Evaluation of AR in a local area network (LAN) environment: (a) post(<p1,p2>, store), (b) post(<p1,p2>, retrieve), and (c) post(<p1,*,p*>, retrieve); runtime (milliseconds) versus number of nodes (4, 8, 16, 32, 64).

AR-1 and AR-2 messages contain simple profiles, which means that the message is routed to a single destination. As the system grows, the number of intermediary nodes involved in routing the query grows, causing an increase in the runtime. Note, however, that for AR-1 messages (Figure 15(a)) the runtime increases by a factor of 1.6 (from 50 to 80 ms) when the system size increases by a factor of 16 (from 4 nodes to 64 nodes). A similar behavior is also observed for AR-2 messages in Figure 15(b).

Figure 15(c) plots runtimes for AR-3 messages, which contain complex profiles that are routed to multiple destinations. The AR-3 message runtimes also increase with system size, and, once again, the rate of increase of message runtime is smaller than that of the system size. Further, the magnitudes of the runtimes for AR-3 messages are higher than those for AR-1 and AR-2, as resolving and routing complex profiles require more computation at a node, involve multiple nodes, and require multiple messages being constructed and sent to these nodes.

4.2. Experiments over PlanetLab

Meteor has been deployed on over 60 nodes on the PlanetLab wide-area testbed. Since PlanetLab nodes are distributed across the globe, communication latencies can vary significantly with time and node location [15]. In each of the experiments presented below, at least one node was selected from each continent, including Asia, Europe, Australia, and North America.


Figure 16. Evaluation of AR on the PlanetLab testbed: runtime (milliseconds) versus number of nodes (4–64) for (a) PlanetLab: post(<p1,p2>,store), (b) PlanetLab: post(<p1,p2>,retrieve), and (c) PlanetLab: post(<p1,*,p*>,retrieve).

overlay in each run. The experiments were conducted at different times of the day over a 4-week period. Once again, three sets of experiments were performed, one for each message type. The average runtimes for these experiments are plotted in Figure 16.

As seen in Figure 16, the plots of runtime versus system size show trends similar to those for the experiments over a LAN. The magnitudes of the runtimes are larger in the PlanetLab case, which is expected since end-to-end network latencies are larger over wide-area networks. Once again, message runtimes increase with system size; however, the rate of increase of message runtime is smaller than that of the system size. Further, the runtimes for AR-3 messages are again larger than those for AR-1 and AR-2, as resolving and routing complex profiles require more computation at a node, involve multiple nodes, and require multiple messages to be constructed and sent to these nodes.

4.3. Experiments over ORBIT

Meteor has been deployed on up to 250 nodes on the ORBIT wireless testbed. The ORBIT large-scale radio grid emulator [14,16] consists of an array of 20 × 20 open-access programmable nodes, each with multiple 802.11a/b/g or other (Bluetooth, Zigbee, GNU) radio cards. The overlay used in this deployment was constructed as follows. The wireless nodes were divided into groups of up to 64 nodes, and each group was associated with an access point (AP). The APs were connected over a wired interconnect using a Chord ring overlay. Nodes were allowed to join or leave a group at random. Note that communication latencies varied significantly with time and location within a group.

Once again, three sets of experiments were performed, one for each message type. The average runtimes for these experiments within a group are plotted in Figure 17. As seen from the figures,


Figure 17. Evaluation of AR on the ORBIT wireless testbed: runtime (milliseconds) versus number of nodes (4–64) for (a) ORBIT: post(<p1,p2,p3>,store), (b) ORBIT: post(<p1,p2,p3>,retrieve), and (c) ORBIT: post(<p1,*,*>,retrieve).

the trends in these plots are similar to those for the experiments over the LAN and PlanetLab. The runtimes in the ORBIT case are, however, larger than in the LAN case, which is expected since end-to-end network latencies are higher for wireless radios owing to their less stable connectivity and bandwidth. Also, as expected, the runtimes in the ORBIT case are smaller than those in the PlanetLab experiments, since the PlanetLab nodes are geographically distributed. Once again, message runtimes increase with system size; however, the rate of increase of message runtime is smaller than that of the system size.

Further, the runtimes for AR-3 messages are again larger than those for AR-1 and AR-2 in these experiments, as resolving and routing complex profiles require more computation at a node, involve multiple nodes, and require multiple messages to be constructed and sent to these nodes.
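A minimal sketch of the two-level ORBIT deployment described above is given below. It is only an assumption about how such a configuration could be expressed: the AP host names, the ring size, and the SHA-1-based identifiers follow Chord's usual hashing convention and are not ORBIT or Meteor configuration code.

// Sketch of the two-level deployment: wireless nodes are grouped under access
// points (APs), and the APs form a Chord-style ring over the wired interconnect.
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

public class OrbitOverlaySketch {
    static final int GROUP_SIZE = 64;   // up to 64 wireless nodes per AP group
    static final int RING_BITS = 16;    // illustrative identifier space of 2^16

    // Chord-style identifier: hash of the AP address, truncated to the ring size.
    static BigInteger ringId(String address) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                                     .digest(address.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest).mod(BigInteger.ONE.shiftLeft(RING_BITS));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int wirelessNodes = 250;
        int groups = (wirelessNodes + GROUP_SIZE - 1) / GROUP_SIZE;   // 4 APs for 250 nodes

        // The APs form the wired Chord ring, ordered by identifier.
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (int g = 0; g < groups; g++) {
            String ap = "ap-" + g + ".orbit-lab.org";    // hypothetical AP names
            ring.put(ringId(ap), ap);
        }
        ring.forEach((id, ap) -> System.out.println(ap + " -> ring id " + id));

        // Each wireless node attaches to the AP of its group and reaches the
        // rest of the overlay through that AP.
        for (int n = 0; n < wirelessNodes; n += GROUP_SIZE) {
            System.out.println("nodes " + n + "-" + Math.min(n + GROUP_SIZE - 1, wirelessNodes - 1)
                               + " attach to ap-" + (n / GROUP_SIZE));
        }
    }
}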

4.4. Runtime at each Meteor layer

The breakdown of message runtime across the Meteor layers for the above experiments is plotted in Figure 18. For AR-1 messages, which involve exact selection, the overhead of indexing and routing based on complete keywords at the content routing layer is about 40 ms in all three environments. In the case of the LAN (Figure 18(b)) and ORBIT (Figure 18(d)) environments, this time is about 40


Figure 18. A sample content-based routing and interaction overhead for simple (AR-1) and complex (AR-3) AR messages in a 16-RP system: sample individual overheads (milliseconds) and percentage breakdowns of network overhead, content routing, and matching and reaction processing for the LAN, ORBIT, and PlanetLab deployments (panels (a)–(f)).

and 50% of the total execution time, respectively, while for the PlanetLab testbed (Figure 18(e)) this time is negligible (40 ms) compared with the overall runtime, which is of the order of seconds. The overhead of the AR layer is also small, about 10 ms, for the LAN environment, the ORBIT wireless testbed, and the PlanetLab testbed. For a 16-RP system, the average multi-hop network latency is about 50 ms for both the LAN and ORBIT wireless environments, which accounts for almost half of the total runtime, while for the PlanetLab testbed the average accumulated network latencies are of the order of hundreds of milliseconds, which dominate the total runtime and account for over 80%. These results show that the underlying network latencies have a significant influence on the performance of Meteor, especially for the wide-area deployment.
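As a worked example of how these per-layer times translate into the percentages quoted above, the following sketch uses the approximate LAN figures for AR-1 messages in the 16-RP system; the exact values are illustrative round numbers taken from the text, not raw measurements.

// Worked example: per-layer share of the total AR-1 runtime in the 16-RP LAN
// deployment, using the approximate layer times quoted in the text.
import java.util.LinkedHashMap;
import java.util.Map;

public class LayerBreakdown {
    public static void main(String[] args) {
        Map<String, Double> layerMs = new LinkedHashMap<>();
        layerMs.put("content routing (Squid)", 40.0);
        layerMs.put("AR layer (matching/reaction)", 10.0);
        layerMs.put("network latency (multi-hop)", 50.0);

        double total = layerMs.values().stream().mapToDouble(Double::doubleValue).sum();
        layerMs.forEach((layer, ms) ->
            System.out.printf("%-32s %5.1f ms  (%4.1f%%)%n", layer, ms, 100.0 * ms / total));
        System.out.printf("%-32s %5.1f ms%n", "total", total);
    }
}

With these numbers the content routing layer accounts for roughly 40%, the AR layer for roughly 10%, and the network for roughly half of the total, matching the proportions reported for the LAN and ORBIT cases.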


For AR-3 messages, which involve complex selections, the Squid overhead is about 300–400 ms for all three deployments. For the LAN case (Figure 18(b)) this accounts for over 80% of the total runtime; for the ORBIT case (Figure 18(d)), about 50%; and for the PlanetLab case (Figure 18(f)), about 25%. The overhead of the AR layer is larger than for AR-1 messages owing to the cost of complex selection, reaching up to 100 ms, which accounts for about 20% of the runtime in the LAN case and less than 10% in the ORBIT and PlanetLab cases. The network latency for AR-3 messages is slightly above 50 ms in the LAN environment, which is small compared with the overall runtime of 400–500 ms. In the ORBIT case (Figure 18(c) and (d)), the network latency is about 500 ms, which is comparable to the Squid overhead. In the PlanetLab case, network latency once again dominates the overall runtime, as seen in Figure 18(e).

Figure 19 plots the average runtime for AR-3 messages in the LAN, ORBIT, and PlanetLab environments. The average runtime over the LAN is about two to three times smaller than over ORBIT, and five times smaller than over PlanetLab, which is expected given the wireless connectivity and the higher wide-area latencies, respectively. To summarize the experiments: in the LAN environment the overall runtime is dominated by the content routing layer overheads; for the PlanetLab testbed it is dominated by latencies at the network layer; and for the ORBIT testbed it is composed mainly of both network latency and content routing layer overheads. The overhead of the AR layer is almost the same in all cases. Further, while the runtime in all three cases does increase with system size, its rate of increase is much slower, indicating the scalability of Meteor.

Figure 19. Average runtime (milliseconds) for AR-3 messages for LAN, ORBIT, and PlanetLab.


4.5. Scalability of aggregation services on the ORBIT, PlanetLab, and LAN testbeds

This experiment examines the scalability and efficiency of the aggregation service in the ORBIT, PlanetLab, and LAN environments. The aggregation queries used ranges to specify the location of interest (e.g. 15–37) and wildcards such as temp*. The aggregation operator used was COUNT. The aggregation time for different network sizes in the different deployment environments is plotted in Figure 20. As shown in the figure, the aggregation time scales well with system size in all three environments. Note that the average aggregation time in the wireless environment lies between those for the wide-area environment and the more stable LAN environment. The aggregation time includes overheads such as routing table lookups and aggregate computation. The overall latency shows that the system can provide near-real-time pervasive services for the complex queries required by scientific applications.
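To illustrate the semantics of such an aggregation query, the following self-contained sketch models the matching and COUNT step locally, in the style of post(〈15–37, temp*〉, COUNT). It is a simplified, single-node model of the query semantics, not the distributed Meteor aggregation implementation, and the sample descriptors are made up for the example.

// Local model of an aggregation query with a location range and a keyword
// wildcard: count the stored descriptors whose location falls in [15, 37]
// and whose keyword starts with "temp".
import java.util.List;

public class CountAggregationSketch {
    record Descriptor(int location, String keyword) {}

    static boolean matchesRange(int value, int lo, int hi) {
        return value >= lo && value <= hi;
    }

    static boolean matchesWildcard(String keyword, String pattern) {
        // Only trailing '*' patterns are modelled here, e.g. "temp*".
        return pattern.endsWith("*")
                ? keyword.startsWith(pattern.substring(0, pattern.length() - 1))
                : keyword.equals(pattern);
    }

    public static void main(String[] args) {
        List<Descriptor> repository = List.of(
                new Descriptor(16, "temperature"),
                new Descriptor(20, "temp-sensor-3"),
                new Descriptor(42, "temperature"),
                new Descriptor(30, "humidity"));

        long count = repository.stream()
                .filter(d -> matchesRange(d.location(), 15, 37))
                .filter(d -> matchesWildcard(d.keyword(), "temp*"))
                .count();

        System.out.println("COUNT(<15-37, temp*>) = " + count);   // 2
    }
}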

4.6. Simulation results

The performance of the Meteor infrastructure is also evaluated using a simulator. The simulator implements AR messaging, the SFC-based mapping, and the Chord-based overlay network for a network of up to about 5000 nodes. The simulator models wide-area network latencies between RPs using network delay statistics collected from the PlanetLab deployment. The resulting execution time of an AR message consists of the overheads for AR messaging and Squid content-based routing, plus the Chord overlay delay, including network delays. As the overlay network configuration and operations are based on Chord, its maintenance costs are of the same order as in Chord.

As shown in Figure 21, the average execution time for AR-1 and AR-2 messages is about 750 ms for an overlay of 1000 RPs and grows to about 850 ms for about 5000 RPs. This illustrates the scalability of the Meteor infrastructure for systems of thousands of nodes. For AR-3 and AR-4 messages, the runtime is about 3.5 s in a 1000-RP overlay and becomes about 6 s for a network with 5000 RPs. The increase in runtime is mainly due to the underlying multi-hop network delays,

Figure 20. Scalability of the aggregation service (messaging with complex queries) in terms of overlay size for the ORBIT testbed, the PlanetLab testbed, and the LAN: aggregation time (ms) versus number of peers (4–64).


Figure 21. Simulation results for a large-scale system: number of Chord lookups versus number of nodes (1000–5000) for AR-1/AR-2 messages and AR-3 messages.

Figure 22. Simulation result of the aggregation service on larger systems: fraction of nodes processing AR messages versus number of nodes (400–1600) for post(〈*, *, temp〉, retrieve), post(〈100–290, *, temp〉, retrieve), and post(〈100–290, 10–90, temp〉, retrieve). The number of nodes processing aggregates before being normalized to the system size is also shown on the curves.

which represent approximately 85% of the overall overhead for AR-1/AR-2 messages for thousands of RPs; for AR-3 messages, the network latency represents 88% of the overall overhead. From Figure 21, we can conclude that Meteor scales well to large systems with thousands of RPs.

Effect of specifying location attributes for AR aggregation: The simulator was also used to evaluate the operation of the aggregation service for systems with up to 1600 RP nodes. This simulation


used three sets of AR messages with complex profiles:

• No specification of location, e.g. 〈∗, ∗, temp〉.
• Specifying a location range in one dimension, e.g. 〈100–290, ∗, temp〉.
• Specifying a location range in two dimensions, e.g. 〈100–290, 10–90, temp〉.

The number of peer nodes that processed the aggregates was measured for each AR message. The fraction of peer nodes processing each type of aggregation message as a function of system size is plotted in Figure 22.

The figure shows that the number of nodes that process a message is a small fraction of the system size. Further, while the number of nodes processing a specific message increases with system size, it increases at a slower rate than the system size, indicating the scalability of the system. Finally, the results demonstrate that, as the profile becomes more specific, the fraction of nodes involved decreases, indicating the effectiveness of the Meteor routing mechanism.

5. RELATED WORK

Content-based decoupled interactions have been addressed by publish–subscribe–notify (PSN) models [3]. PSN-based systems include Siena [17] and Gryphon [18]. The AR model differs from PSN systems in that individual interests (subscriptions) are not used for routing and do not have to be synchronized; they can be locally modified at a rendezvous node at any time.

i3 [4] provides a similar rendezvous-based abstraction and has influenced this work. However,

an i3 identifier is opaque and must be globally known. AR uses semantic identifiers that are more expressive and only require agreement on information spaces (ontologies). In addition, AR's dynamic binding semantics enable profiles to be added, deleted, or changed on the fly.

The associative broadcast [8] paradigm has also influenced this effort. The key difference between this model and AR is that the binding of profiles takes place at intermediate nodes instead of in the broadcast medium. As a result, associative broadcast only supports transient interactions. Further, its scalability over wide areas is a concern.

Rendezvous-based communication is conceptually similar to tuple space research in distributed systems [19–21]. A tuple space is a shared space that can be associatively accessed by all nodes in the system. While the tuple space is a powerful model for interaction and coordination, efficient and large-scale implementation of pure tuple space-based systems is a challenge. AR maintains the conceptual expressiveness of tuple spaces while providing an implementation model that is scalable.

Unlike other rendezvous-based models [22], AR enables programmable reactive behaviors at RPs

using the action field within a message. In addition, AR can realize a variety of basic communication services without the need for mobile code [23] or any heavyweight protocols. Further, interactions in the AR model are symmetric, allowing participants to be information producers and consumers simultaneously.

Narada Brokering [24] is a distributed middleware framework that supports peer-to-peer systems and content-based publish/subscribe interactions. It manages a network of brokers through which end systems can interact, providing scalability, location independence, and efficient content-based querying and routing. However, Narada brokers are organized in a hierarchical structure, which


is maintained through tighter coupling and control mechanisms, focusing on persistence and reliable message delivery. In contrast, Meteor is meant to support more dynamic and opportunistic interactions in a peer-to-peer network.

Content-based publish/subscribe over DHTs is a topic of much current work. DHT functionality is usually built using some form of structured overlay network, the most popular of which are Chord [10], used here, Pastry [25], and CAN [26], because they provide scalability, search guarantees, and bounds on messaging within the network, as well as some degree of self-management and fault tolerance with respect to the addition and removal of nodes. On this foundation, designing content-based publish/subscribe systems requires an efficient mapping between content descriptors and nodes in the overlay network, as well as efficient techniques for routing and matching based on these content descriptors, which can contain wildcards and ranges for complex queries. The work in [27–30] addresses these issues to some extent. Meteor and Squid differ from these approaches mainly in the locality-preserving mapping used.

The Meteor framework has recently been used to support a Web Services-based notification broker service for content-based subscription management and notification dissemination, targeting highly dynamic pervasive grid environments that adopt the Web Services Notification (WSN) standards [31]. This service makes use of Meteor's AR messaging and reactive behaviors to provide a distributed and decentralized implementation of the operations defined by the WSN interfaces.

Other related work includes TelegraphCQ [32], which uses window-based query semantics for continuous queries and an efficient filtering mechanism corresponding to the desired end-to-end application behavior. The original design is primarily a single-node system; our system, on the other hand, distributes queries across the network.

Data aggregation is an essential functionality in sensor networks and has been addressed by a number of research efforts. The Cornell Cougar [33], TinyDB [34], and TinyAggregation (TAG) [35] systems provide high-level programming abstractions that view the sensor network as a distributed database and provide SQL-like interfaces for querying the network. Optimization techniques such as aggregation trees are used to resolve queries efficiently, but only single tasking with homogeneous aggregation operations is supported. The approach presented in this paper defines aggregation operations as programmable reactive behaviors and can associate different aggregation operators with different data scopes and properties.

6. CONCLUSION

As the scale, complexity, heterogeneity, and dynamism of pervasive grid environments increase, interaction paradigms based on static names (addresses, identifiers) and on synchronous or tightly coupled interactions are quickly becoming insufficient for effective and efficient communication. This has led researchers to consider alternative paradigms that are decoupled and content based. In this paper, we presented AR, a content-based decoupled interaction abstraction for pervasive grid environments. AR extends conventional name/identifier-based rendezvous in two ways. First, it uses flexible combinations of keywords (i.e. complete keywords, partial keywords, wildcards, and ranges) from a semantic information space instead of opaque identifiers that have to be globally known. Second, it enables the reactive behavior at RPs to be defined by the message. Messages and interactions are symmetric, allowing participants to be producers and consumers simultaneously. For example,


a sensor may produce data only when there is interest in its data, allowing it to conserve energy and bandwidth.

In this paper we also presented the design and implementation of Meteor, a scalable content-based decoupled interaction infrastructure providing rich expressiveness and programmable reactive behaviors. Meteor provides guarantees for both information lookup and performance, which are essential requirements of heterogeneous pervasive environments. An experimental evaluation of Meteor using the wide-area PlanetLab testbed, the wireless ORBIT testbed, and a campus network was presented, together with a simulation-based evaluation of the scalability of Meteor on systems with thousands of nodes. The evaluation results demonstrated the effectiveness of Meteor as an interaction infrastructure for pervasive grid environments.

REFERENCES

1. Matossian V, Bhat V, Parashar M, Peszynska M, Sen M, Stoffa P, Wheeler MF. Autonomic oil reservoir optimization on the grid. Concurrency and Computation: Practice and Experience 2005; 17(1):1–26.
2. Klie H, Bangerth W, Gai X, Wheeler M, Stoffa P, Sen M, Parashar M, Catalyurek U, Saltz J, Kurc T. Models, methods and middleware for grid-enabled multiphysics oil reservoir management. Engineering with Computers 2006; 22:349–370.
3. Eugster PT, Felber PA, Guerraoui R, Kermarrec AM. The many faces of publish/subscribe. ACM Computing Surveys 2003; 35(2):114–131.
4. Stoica I, Adkins D, Zhuang S, Shenker S, Surana S. Internet indirection infrastructure. Proceedings of ACM SIGCOMM '02, Pittsburgh, PA, 2002; 73–86.
5. Schmidt C, Parashar M. Enabling flexible queries with guarantees in P2P systems. Internet Computing Journal 2004; 8(3):19–26.
6. Jiang N, Schmidt C, Matossian V, Parashar M. Enabling applications in sensor-based pervasive environments. The First Workshop on Broadband Advanced Sensor Networks (BaseNets 2004), San Jose, CA, U.S.A., 2004.
7. Jiang N, Schmidt C, Parashar M. A decentralized content-based aggregation service for pervasive environments. International Conference on Pervasive Services (ICPS), Lyon, France, 2006; 203–212.
8. Bayerdorffer B. Distributed programming with associative broadcast. Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Volume 2: Software Technology (HICSS94-2), Wailea, HI, U.S.A., 1994; 353–362.
9. Bhandarkar P, Parashar M. Semantic communication for distributed information coordination. Proceedings of the IEEE Conference on Information Technology, Syracuse, NY. IEEE Computer Society Press: Silver Spring, MD, 1998; 149–152.
10. Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H. Chord: A scalable peer-to-peer lookup service for internet applications. Proceedings of the ACM SIGCOMM '01 Conference, San Diego, CA, 2001; 149–160.
11. Quiroz A. Two-level structured overlay design for cluster management in peer-to-peer networks. Technical Report TR-275, CAIP Center, Rutgers University, 2006.

12. Sagan H. Space-Filling Curves. Springer: Berlin, 1995.
13. PlanetLab. http://www.planet-lab.org/ [March 2007].
14. ORBIT. http://www.orbit-lab.org/ [March 2007].
15. Lee SJ, Sharma P, Banerjee S, Basu S, Fonseca R. Measuring bandwidth between PlanetLab nodes. Proceedings of the Passive and Active Measurement Workshop (PAM 2005), Boston, MA, U.S.A., 2005; 292–305.
16. Raychaudhuri D, Seskar I, Ott M, Ganu S, Ramachandran K, Kremo H, Siracusa R, Liu H, Singh M. Overview of the ORBIT radio grid testbed for evaluation of next-generation wireless network protocols. Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, LA, U.S.A., vol. 3, 2005; 1664–1669.

17. Carzaniga A, Wolf AL. Content-based networking: A new communication infrastructure. NSF Workshop on an Infrastructure for Mobile and Wireless Systems, Scottsdale, AZ, U.S.A., 2001.

18. Gryphon: publish/subscribe over public networks. http://www.research.ibm.com/gryphon/papers/Gryphon-Overview.pdf [March 2007].

19. Omicini A, Denti E. From tuple spaces to tuple centers. Science of Computer Programming 2001; 41:277–294.
20. JavaSpaces. http://www.javaspaces.homestead.com/ [March 2007].
21. Wyckoff P. T spaces. IBM Systems Journal 2001; 37(3):454–478.
22. Gao J, Steenkiste P. Rendezvous points-based scalable content discovery with load balancing. Proceedings of the Fourth International Workshop on Networked Group Communication (NGC '02), Boston, MA, 2002; 71–78.
23. Tennenhouse DL, Smith JM, Sincoskie WD, Wetherall DJ, Minden GJ. A survey of active network research. IEEE Communications Magazine 1997; 35(1):80–86.


24. Fox G, Pallickara S, Rao X. A scalable event infrastructure for peer to peer grids. Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande. ACM Press: Seattle, Washington, U.S.A., 2002; 66–75.
25. Rowstron A, Druschel P. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, 2001; 329–350.
26. Ratnasamy S, Francis P, Handley M, Karp R, Schenker S. A scalable content-addressable network. Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. ACM Press: San Diego, CA, U.S.A., 2001; 161–172.
27. Baldoni R, Marchetti C, Virgillito A, Vitenberg R. Content-based publish–subscribe over structured overlay networks. Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS '05), Columbus, OH, 2005; 437–446.
28. Tam D, Azimi R, Jacobsen HA. Building Content-based Publish/Subscribe Systems with Distributed Hash Tables (Lecture Notes in Computer Science, vol. 2944). Springer: Berlin, 2004; 138–152.
29. Aekaterinidis I, Triantafillou P. Internet scale string attribute publish/subscribe data networks. Proceedings of the ACM 14th Conference on Information and Knowledge Management (CIKM), Bremen, Germany, 2005; 44–51.
30. Gupta A, Sahin OD, Agrawal D, Abbadi AE. Meghdoot: Content-based Publish/Subscribe over P2P Networks (Lecture Notes in Computer Science, vol. 3231). Springer: Berlin, 2004; 254–273.
31. Quiroz A, Parashar M. Design and implementation of a distributed content-based notification broker for WS-Notification. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, Barcelona, Spain, 2006; 207–214.
32. Chandrasekaran S, Cooper O, Deshpande A, Franklin MJ, Hellerstein JM, Hong W, Krishnamurthy S, Madden S, Raman V, Reiss F, Shah M. TelegraphCQ: Continuous dataflow processing for an uncertain world. Proceedings of the 2003 CIDR Conference, Asilomar, CA, U.S.A., 2003.
33. Yao Y, Gehrke JE. The Cougar approach to in-network query processing in sensor networks. SIGMOD Record 2002; 31(3):9–18.
34. Madden S, Franklin MJ, Hellerstein JM, Hong W. TinyDB: An acquisitional query processing system for sensor networks. ACM Transactions on Database Systems 2005; 30(1):122–173.
35. Madden S, Franklin MJ, Hellerstein JM, Hong W. TAG: A Tiny AGgregation service for Ad-Hoc sensor networks. ACM SIGOPS Operating Systems Review 2002; 36:131–146.
