Università di Roma “La Sapienza”, Dipartimento di Informatica e Sistemistica
Middleware Laboratory (MIDLAB)

A Scalable P2P Architecture for Topic-based Event Dissemination

Roberto Baldoni¹, Roberto Beraldi¹, Vivien Quéma², Leonardo Querzoni¹, Sara Tucci Piergiovanni¹
1. Università degli Studi di Roma “La Sapienza”  2. INRIA Rhône-Alpes

19-3-2007




Index

■ The publish/subscribe communication paradigm
■ Event routing on a large scale
■ Traffic confinement
■ Current solutions
■ Evaluation of outer-cluster routing in TERA


The publish/subscribe communication paradigm

■ From provider-centric communication (client/server) to document-centric communication (publish/subscribe):
  ■ Publishers produce data in the form of events.
  ■ Subscribers declare interest in published data by issuing subscriptions.
■ Each subscription is a filter on the set of published events.
■ An Event Notification Service (ENS) notifies each subscriber of every published event that matches at least one of its subscriptions.
■ The interaction between publishers and subscribers is decoupled in space, time, and flow [Eugster et al., ACM Computing Surveys, 03].

[Figure: the ENS mediates between publishers and subscribers through the primitives publish(e), subscribe(s), unsubscribe(s), and notify(e).]
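The primitives above can be sketched as a minimal, in-process topic-based ENS. This is an illustrative sketch only; class and method names are not taken from TERA.

```python
# Minimal sketch of a topic-based Event Notification Service (illustrative,
# not TERA's implementation): subscribers register callbacks per topic, and
# publish() notifies every subscriber whose subscription matches the topic.
from collections import defaultdict

class EventNotificationService:
    def __init__(self):
        self.subs = defaultdict(set)  # topic -> set of subscriber callbacks

    def subscribe(self, topic, subscriber):
        self.subs[topic].add(subscriber)

    def unsubscribe(self, topic, subscriber):
        self.subs[topic].discard(subscriber)

    def publish(self, topic, event):
        # Each subscription acts as a filter: only matching topics are delivered.
        for subscriber in self.subs[topic]:
            subscriber(event)

received = []
ens = EventNotificationService()
ens.subscribe("news", received.append)
ens.publish("news", "e1")
ens.publish("sports", "e2")   # no subscriber for this topic: not delivered
print(received)               # → ['e1']
```

Note how publisher and subscriber never reference each other directly, illustrating the space decoupling mentioned above.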


Event selection models

■ Topic-based event selection:
  ■ Each event published in the system is tagged with a topic that completely characterizes its content.
  ■ Each subscription contains a topic the subscriber is interested in.
■ Other, more complex models exist: hierarchical topic-based, content-based, type-based, etc.
■ Topic-based publish/subscribe fits the needs of many common applications.

[Figure: a stream of published events is filtered by topic; only events tagged with topic t are delivered to the subscriber of t via notify().]


Event routing in distributed ENSs

■ In a peer-to-peer environment, peers play both the role of publishers/subscribers and that of event brokers.
■ These publish/subscribe systems are usually implemented on top of overlay networks that provide:
  ■ connectivity among participants;
  ■ an efficient and reliable substrate for information diffusion.
■ A trivial solution to the event dissemination problem:
  ■ each event is broadcast in the network;
  ■ subscription-based filtering is performed locally;
  ■ this usually implies a great waste of resources (on the network and on the nodes).
■ The semantics of the publish/subscribe paradigm can be leveraged to confine the diffusion of each event to the set of matched subscribers, without affecting the whole network (traffic confinement).


Traffic confinement

■ Traffic confinement can be realized by solving three problems:
  ■ Interest clustering: subscribers sharing similar interests should be arranged in the same cluster; ideally, given an event, all and only the subscribers interested in that event should be grouped in a single cluster.
  ■ Outer-cluster routing: events can be published anywhere in the system, so we need a mechanism that brings each event from the node where it is published to at least one interested subscriber.
  ■ Inner-cluster dissemination: once a subscriber receives an event, it can simply broadcast it in the cluster it is part of.


Current solutions

■ Scribe [Castro et al., IEEE Journal on Selected Areas in Communications n.8 v.20, 2002]
  ■ Topic-based publish/subscribe implemented on top of DHTs.
  ■ For each topic, a single node is responsible for acting as a rendez-vous point between published events and issued subscriptions.
  ■ Problems:
    ■ single points of failure;
    ■ hot spots;
    ■ partial traffic confinement.


Current solutions

■ CAN [Ratnasamy et al., Lecture Notes in Computer Science v.2233, 2001]
  ■ Based on techniques similar to Scribe.
  ■ Employs clustering: for each topic a dedicated overlay is built and maintained.
  ■ Problems:
    ■ single points of failure;
    ■ hot spots;
    ■ rendez-vous nodes are forced to subscribe to the topic.


Current solutions

■ Data Aware Multicast [Baehni et al., International Conference on Dependable Systems and Networks, 2004]
  ■ Hierarchical topic-based publish/subscribe.
  ■ Each topic is mapped to an overlay network.
  ■ Overlays are maintained and interconnected through probabilistic algorithms.
  ■ Problem:
    ■ publishers are forced to subscribe to topics.


Current solutions

■ Sub-2-Sub [Voulgaris et al., International Workshop on Peer-to-Peer Systems, 2006]
  ■ Content-based publish/subscribe.
  ■ Complex three-level infrastructure.
  ■ Employs clustering: brokers with similar interests are clustered in the same overlay.
  ■ Similarity is calculated by checking intersections among subscriptions.
  ■ Problems:
    ■ depending on the subscription distribution, a huge number of distinct overlays must be maintained;
    ■ the number of overlay networks a single node participates in is not proportional to the number of subscriptions it stores.


A two-layer infrastructure for event diffusion

■ A two-layer infrastructure:
  ■ All clients are connected by a single overlay network at the lower layer (general overlay).
  ■ Various overlay network instances at the upper layer connect clients subscribed to the same topics (topic overlays).
■ Event diffusion:
  ■ The event is routed in the general overlay toward one of the nodes subscribed to the target topic.
  ■ This node acts as an access point for the event, which is then diffused in the correct topic overlay.

[Figure 1: The TERA publish/subscribe system. (a) System overview: an event routed in the general overlay reaches a node used as access point, which diffuses it in the corresponding topic overlay. (b) Node architecture: applications sit on top of TERA (Event Management, Subscription Management, Broadcast, Partition Merging, Access Point Lookup, Peer Sampling, Size Estimation) over the Overlay Management Protocol and the network; the exposed primitives are subscribe, unsubscribe, publish, and notify.]

2 An Overview of TERA

TERA is a topic-based publish/subscribe system designed to offer an event diffusion service for very large scale peer-to-peer systems. Each published event is “tagged” with a topic and is delivered to all the subscribers that expressed their interest in the corresponding topic by issuing a subscription for it. The set of available topics is neither fixed nor predefined: applications using TERA can dynamically create or delete them.

2.1 Architecture

Nodes participating in TERA are organized in a two-layer infrastructure (see Figure 1(a)). At the lower layer, a global overlay network connects all nodes, while at the upper layer various topic overlay networks connect subsets of all the nodes; each topic overlay contains the nodes subscribed to the same topic. All these overlay networks are separated and are maintained through an overlay management protocol.

Subscription management and event diffusion in TERA are based on two simple ideas: nodes


Outer-cluster routing

■ Event routing in the general overlay is realized through a random walk.
■ The walk stops at the first broker that knows an access point for the target topic.

[Figure: a random walk for topic t traverses brokers B1–B6. Each broker holds an access point table mapping topics to access points (the example tables shown are {a → B5, f → B6}, {x → B1, a → B5}, {e → B4, h → B4}, and {t → B1, y → B6}); the walk stops at the first broker whose table contains an entry for t, and the event is forwarded to the topic overlay.]
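The random-walk lookup described above can be sketched as follows. The function and variable names, the broker topology, and the table contents are illustrative assumptions, not TERA's actual code.

```python
import random

# Illustrative sketch of the outer-cluster lookup: a bounded random walk
# over the general overlay that stops at the first node whose access point
# table (APT) holds an entry for the target topic.
def lookup(start, topic, neighbors, apt, k):
    """Visit at most k nodes starting at `start`; return an access point
    identifier for `topic`, or None if the walk's lifetime expires."""
    node = start
    for _ in range(k):
        if topic in apt[node]:          # first broker that knows an AP wins
            return apt[node][topic]
        node = random.choice(neighbors[node])  # continue the walk
    return None

# Tiny example topology and APTs (hypothetical values).
neighbors = {"B1": ["B2"], "B2": ["B3"], "B3": ["B1"]}
apt = {"B1": {"x": "B1", "a": "B5"}, "B2": {}, "B3": {"t": "B1", "y": "B6"}}
print(lookup("B2", "t", neighbors, apt, k=10))   # → B1
```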


Architecture

OMPs: Newscast, Cyclon, etc.

[Figure 2: A detailed view of the architecture of TERA. The Event Management and Subscription Management components serve publish/notify/subscribe/unsubscribe calls from applications; they rely on the Access Point Lookup component (backed by the Access Point Table and random walks), the Inner-Cluster Dissemination service, the Partition Merging component, and the Subscription Table. The Overlay Management Protocol maintains the general overlay and one overlay per topic, each with its own peer sampling and size estimation services.]

as for notifying subscribers. An event dissemination starts as soon as an application publishes some data in a topic. It is done in two steps: the event is first routed to a node subscribed to the topic (this node acts as an access point for it); then, the access point diffuses the event in the overlay associated to the topic. The first step is realized through a lookup executed on the Access Point Lookup component: if the lookup returns an empty list of node identifiers, the node discards the event.

When a node subscribed to the topic receives an event for which it must act as an access point, it uses the broadcast primitive provided by the Inner-Cluster Dissemination service to forward the event to all nodes belonging to the corresponding topic overlay. When a node subscribed to the topic receives a broadcast event, it notifies the application.

Access Point Lookup.
The Access Point Lookup component plays a central role in TERA's architecture, as it is used by both the Event Management and Subscription Management components to obtain lists of access point identifiers for specific topics. Its functioning is based on a local data structure, called the Access Point Table (APT), and a distributed search algorithm based on random walks.

Each APT is a cache containing a limited number of entries, each of the form <t, n>, where t is a topic and n the identifier of a node that can act as an access point for t. APTs are continuously updated following a simple strategy: each time a node receives a subscription advertisement for topic t from a node n, it substitutes the access point identifier for t if an entry <t, n'> exists in the APT; otherwise, it adds a new entry <t, n> with probability 1/P_t, where P_t is the popularity of topic t estimated by n and attached to the subscription advertisement. When an APT exceeds a predefined size, randomly chosen entries are removed.
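The update strategy above can be sketched in a few lines. This is an illustrative sketch under one small assumption: the randomly chosen eviction victim is taken among entries other than the freshly inserted one, so that a new entry is not immediately discarded.

```python
import random

# Sketch of the APT update rule described above (names are illustrative):
# on an advertisement <t, n> carrying the popularity estimate P_t, an
# existing entry for t is replaced; otherwise the entry is inserted with
# probability 1/P_t, and a random entry is evicted if the table overflows.
def apt_update(apt, topic, node, popularity, max_size):
    if topic in apt:
        apt[topic] = node                 # refresh with the fresher access point
    elif random.random() < 1.0 / popularity:
        apt[topic] = node
        if len(apt) > max_size:
            # Assumption: evict a random entry other than the new one.
            victim = random.choice([t for t in apt if t != topic])
            del apt[victim]

apt = {"a": "B5"}
apt_update(apt, "a", "B7", popularity=3, max_size=2)
print(apt["a"])   # → B7 (existing entries are replaced unconditionally)
```

The 1/P_t insertion probability is what biases popular topics downward, producing the uniform topic distribution evaluated in Section 4.2.1.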

As a consequence of this update strategy, APTs have the following properties:

1. APT entries tend to contain non-stale access points;
2. inactive topics (i.e. topics that are no longer subscribed by any node) tend to disappear from APTs;
3. each access point is a uniform random sample of the population of nodes subscribed to that topic;
4. the content of each APT is a uniform random sample of the set of active topics (i.e. topics subscribed by at least one node);
5. the size of each APT is limited.

The first property is a consequence of the way new entries are added to APTs; suppose, in fact, that there is only one topic t in the system, subscribed by two nodes, na and nb; suppose, moreover, that, at a certain point in time, nb unsubscribes from t. Starting from that moment, only na will advertise t; therefore, nodes containing an entry <t, nb> will eventually substitute it with the entry <t, na>, as the uniformity of node samples provided by the peer sampling service guarantees that na will eventually advertise t to the whole system population. The second property comes from the fact that inactive topics are no longer advertised. They are thus eventually replaced by active topics in APTs (assuming that the set of active topics is larger than the maximum APT size). The third property is a consequence of the fact that subscription advertisements are sent to nodes returned by the peer sampling service, which provides uniform random samples, and that each node advertises its subscriptions with


Results: topic distribution in APTs

■ We want every topic to appear with the same probability in every APT, regardless of its popularity.

4.2.1 Topic distribution in APTs

We start by presenting an experiment showing that the method used in TERA to update APT contents ensures a uniform distribution of topics in every APT. This is a fundamental property for APTs, as it allows TERA to use their content as a uniform random sample of the active topic population and to build the access point lookup mechanism on it. We ran tests over a system with 10^4 nodes, each advertising its subscriptions every 5 cycles to 5 neighbors out of 20 (the overlay management protocol view size). APT size was limited to 10 entries. We issued 5000 subscriptions distributed in various ways over 1000 distinct topics, and we measured, for each topic, the number of APTs containing an entry for it. The expected outcome of these tests is a constant value for this measure, regardless of the initial topic popularity distribution.

Figure 3(a) shows the results for an initial uniform distribution of topic popularity. The X axis represents the topic population (each topic is mapped to a number). Each black dot represents the number of times a specific topic appears in APTs, while the grey dot represents its popularity. The plot shows that each topic is present, on average, in the same number of APTs, with a very small error that is randomly distributed around the mean. This confirms that the topic distribution in APTs can be considered uniform.

Figures 3(b) and 3(c) show the results for an initial Zipf distribution of topic popularity. The two graphs report the results for differently skewed popularity distributions (distribution parameter a = 0.7 and a = 2.0). As these graphs show, TERA is always able to balance APT updates, and delivers an almost uniform distribution. Even in an extreme case (a = 2.0), the APT update mechanism is able to balance the updates coming from the small number of active topics (in this scenario only 79 topics share the whole 5000 subscriptions), maintaining their presence in APTs around the same average value with a small standard deviation (always below 5%). In the following evaluations, we only report results for the Zipf popularity distribution with a = 0.7, as results for other values of a did not exhibit significant differences.

4.2.2 Access Point Lookup

In this section, we evaluate the probability for the access point lookup mechanism to successfully return a node identifier for a lookup operation (in case such a node exists). We denote by K the lifetime of the random walk (the maximum number of visited nodes), by |APT| the size of APT tables, and by |T| the number of topics⁸. The probability p to find an access point for a specific topic in an APT is p = |APT| / |T|. Assuming that every APT contains the maximum allowed number of entries, the probability that an access point cannot be found within K steps is Pr{fail} = (1 − p)^K. Thus, the probability to find the access point visiting at most K nodes is Pr{success} = 1 − (1 − p)^K = 1 − (1 − |APT| / |T|)^K. Therefore, to ensure with probability P that an access point for a given topic will be found, it is necessary that K or |APT| be such that:

⁸ Thanks to the fact that APTs can be considered as uniform random samples of the set of active topics, each node can estimate the value of |T| at runtime [16].
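As a quick numerical check of the success-probability formula above (the parameter values below are illustrative, not taken from the evaluation):

```python
# Pr{success} = 1 - (1 - |APT|/|T|)^K, with p = |APT|/|T| the per-node
# hit probability. Values are illustrative.
def success_probability(apt_size, num_topics, k):
    p = apt_size / num_topics
    return 1.0 - (1.0 - p) ** k

# 100-entry APTs, 1000 topics, walks of at most 10 hops:
print(round(success_probability(apt_size=100, num_topics=1000, k=10), 3))  # → 0.651
```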

[Figure 3: The plots show how topics are distributed among APTs (black dots) when the topic popularity distribution (grey dots) is (a) uniform (Std.Dev = 1.11), (b) Zipf with a = 0.7 (Std.Dev = 2.16), and (c) Zipf with a = 2.0 (Std.Dev = 51.49).]



Results: outer-cluster routing

What is the probability for an event to be correctly routed in the general overlay toward an access point?

■ It depends on:
  ■ the uniform randomness of topics contained in access point tables;
  ■ the access point table size;
  ■ the random walk lifetime.

K =ln(1− P )

ln“1− |APT |

|T |

” or |APT | = |T |“1− K

√1− P

Note that, given K and P , |APT | linearly depends on |T |.In order to reduce APT size, it would be necessary to in-crease random walks length (i.e. using a large value for K)negatively affecting the time it takes to find an access point.To mitigate this problem, it is advisable to launch r multipleconcurrent random walks, each having a lifetime #K

r $. In-deed, the fact that topics are uniformly distributed amongAPTs guarantees that launching multiple concurrent ran-dom walks does not impact the lookup success rate. In thisway, access point lookup responsiveness is improved at thecost of a slightly larger overhead due to the independencyof each random walk lifetime.
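Inverting the success-rate formula gives the two sizing rules above; a small sketch with illustrative parameter values (the ceilings are an assumption, since table sizes and hop counts must be integers):

```python
import math

# Given target success probability P: solve K from ln(1-P)/ln(1-|APT|/|T|),
# or |APT| from |T|(1 - (1-P)^(1/K)). Rounded up to integers.
def required_k(P, apt_size, num_topics):
    return math.ceil(math.log(1 - P) / math.log(1 - apt_size / num_topics))

def required_apt(P, k, num_topics):
    return math.ceil(num_topics * (1 - (1 - P) ** (1 / k)))

print(required_k(0.99, 100, 1000))    # hops needed for 99% success → 44
print(required_apt(0.99, 10, 1000))   # APT entries needed with K = 10 → 370
```

The trade-off is visible directly: shrinking the APT tenfold relative to |T| pushes the walk length up sharply, which is why splitting K across r concurrent walks helps responsiveness.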

We ran experiments to check that TERA's behavior is close to the one predicted by the analytical study. Tests were run on a system with 1000 nodes, each having Cyclon views holding 20 nodes. At the beginning, 5000 subscriptions were issued, uniformly distributed over 1000 distinct topics. Lookups were started after 1000 cycles. Each lookup was conducted by starting four concurrent random walks (r = 4).

Figure 4(a) shows how the access point lookup success ratio changes when varying the lifetime of each random walk (K) for different values of |APT|. For each line, we plotted both simulation results (solid line) and values calculated using the analytical study (dashed line). The plot confirms that TERA's lookup mechanism is able to probabilistically guarantee that an access point for an active topic will be found with probability P. Note that this plot also shows that the actual memory size required by APTs is limited. Indeed, consider the biggest APT size plotted on the graph: 400 entries. Assuming that each entry in an APT is a string containing 256 characters, the memory size occupied by an APT containing 400 entries is about 100 kB.

4.2.3 Partition Merging

In this section, we analyze the probability for the partition merging mechanism to detect a very small overlay partition, and the time it takes for this to happen. Suppose that there is a topic represented by an overlay network partitioned in two clusters containing |G| and 1 nodes, respectively⁹. Let us call n this single node. The probability p to detect the partition in a cycle can be expressed as p = 1 − (pa · pb), where pa is the probability that none of the nodes in G advertises its subscriptions to n, and pb is the probability that n does not advertise its subscriptions to any of the nodes in G.

Probability pa can be expressed as

pa = (1 − Pr{a node advertises to n})^|G|

Every node in G advertises its subscriptions to n only if n is contained in its view for the general overlay, and if n is one of the D nodes selected for the advertisement. Let us suppose, for the sake of simplicity, that D is equal to the view size. In this case, Pr{a node advertises to n} = |View| / (N − 1),

⁹ Note that the case where a partition is constituted by a single node is the most difficult to solve, as the probability for nodes belonging to distinct partitions to meet is the lowest possible one.
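Putting the pieces of the analysis together (with the stated simplification D = |View|, and assuming by symmetry that pb takes the same form as pa), the per-cycle detection probability can be sketched as:

```python
# Per-cycle probability of detecting a partition {G, {n}}: with D equal to
# the view size, one node advertises to another with probability
# |View|/(N-1), so pa = pb = (1 - |View|/(N-1))^|G| and p = 1 - pa*pb.
# The symmetric form of pb is an assumption of this sketch.
def detection_probability(group_size, view_size, n_total):
    q = view_size / (n_total - 1)    # one node advertising to one given node
    pa = (1 - q) ** group_size       # no node in G advertises to n
    pb = (1 - q) ** group_size       # n advertises to no node in G
    return 1 - pa * pb

# Illustrative values: |G| = 16, views of 20 nodes, 1000 nodes in total.
print(detection_probability(group_size=16, view_size=20, n_total=1000))
```

As expected from Figure 4(b), the probability grows with |G|: larger partitions are spotted sooner.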

[Figure 4: (a) The plot shows how the success rate for access point lookups changes when varying the maximum APT size (50, 100, and 400 entries) and the random walk lifetime. Solid lines represent results from the simulator, while dashed lines plot values from the formula. (b) The plot shows how the probability to detect a topic overlay partition increases with time (cycles). Solid lines represent results from the simulator, while dashed lines plot values from the formula. The tests were run varying the number |G| of nodes subscribed to the topic (|G| = 4, 16, 64).]


Results: load distribution

■ The load imposed on nodes by our system is fairly distributed:
  ■ no hot spots or single points of failure;
  ■ “the more you ask, the more you pay”.

[Figure 5: The plots show how the load generated by TERA is distributed among nodes when the distribution of topic popularity is either uniform (a) or Zipf (b). For both popularities, the left graph shows the load distribution in the general overlay (Std.Dev = 4.04E-06 for uniform, 4.13E-06 for Zipf), and the right graph shows the global load distribution (black points, Std.Dev = 1.23E-05), together with the subscription distribution on nodes (grey points).]

Figure 5(a) shows the results for a test with uniform topic popularity, while Figure 5(b) shows the same results for an initial Zipf distribution with parameter a = 0.7. The pictures on the left show how load is distributed in the general overlay. As shown by the graphs, TERA is able to distribute load uniformly among nodes, avoiding the appearance of hot spots. This result is obtained regardless of the distribution of topic popularities. The pictures on the right show the global load experienced by nodes; in these graphs, nodes on the X axis are ordered by decreasing local subscription count (i.e. points on the left refer to nodes subscribed to more topics), in order to show how the global load is affected by the number of subscriptions maintained at each node. The number of subscriptions per node is also plotted with grey dots. The graphs show that the load distribution closely follows the distribution of subscriptions on nodes, actually implementing the pragmatic rule “the more you ask, the more you pay”, thus fairly distributing the load among participants.



Results: scalability

Experiments show how our system scales with respect to:
■ the number of subscriptions;
■ the number of topics;
■ the event publication rate;
■ the number of nodes.

(The reference figure is given by a simple event flooding approach.)

Average notification cost

1,E+01

1,E+02

1,E+03

1,E+04

1,E+05

1,E+06

1,E+01 1,E+03 1,E+05 1,E+07

Subscriptions

Messag

es p

er n

oti

ficati

on

Event flooding TERA

nodes: 10000

topics: 100

event rate: 1

(a)

Average notification cost

1,E+00

1,E+01

1,E+02

1,E+03

1,E+04

1,E+05

1,E+06

1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05

Topics

Messag

es p

er n

oti

ficati

on

Event flooding TERA

nodes: 10000

subscriptions: 10000

event rate: 1

(b)

Average notification cost

1,E+01

1,E+02

1,E+03

1,E+04

1,E+05

1,E-05 1,E-03 1,E-01 1,E+01 1,E+03 1,E+05

Event publication rate

Messag

es p

er n

oti

ficati

on

Event flooding TERA

nodes: 10000

topics: 100

subscriptions: 10000

(c)

Average notification cost

1,E+01

1,E+02

1,E+03

1,E+04

1,E+05

1,E+06

1,E+07

1,E+08

1,E+09

1,E+01 1,E+03 1,E+05 1,E+07 1,E+09

Nodes

Messag

es p

er n

oti

ficati

on

Event flooding TERA

subscriptions: 10000

topics: 100

event rate: 1

(d)

Figure 6: The plots show the average number of messages needed by TERA to notify an event
when the number of subscriptions (a), the number of topics (b), the event publication rate (c), and
the total number of nodes in the system (d) vary. For each figure, results from a simple event
flooding algorithm are reported for comparison.

4.3.2 Message cost per notification

The traffic confinement strategy implemented by TERA induces some overhead. In order to assess
the global impact of this overhead, we evaluated the average cost incurred by TERA to notify a
single event to a subscriber, namely the total number of generated messages divided by the number
of notifications. This cost includes both the messages generated to diffuse the event and the
messages generated for TERA's maintenance. To offer a reference figure, we also evaluated the cost
incurred by a simple event flooding-based approach^7 in the same settings.
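The metric can be made concrete with a small sketch. The numbers below are illustrative (100 subscribers per topic is an assumption, not a measured value); the point is that, under flooding, every event costs N messages regardless of how many subscribers it actually notifies:

```python
def avg_notification_cost(total_messages, total_notifications):
    """Average cost per notification: all generated messages
    (event diffusion plus maintenance) over delivered notifications."""
    return total_messages / total_notifications

# Flooding baseline: one event reaches all N nodes, but only the
# subscribers of its topic count as notifications.
N = 10_000                    # nodes, as in the test of Figure 6(a)
subs_per_topic = 100          # hypothetical subscriber count
cost = avg_notification_cost(total_messages=N,
                             total_notifications=subs_per_topic)
# With more subscribers per topic, the flooding cost per notification
# drops, which explains the decreasing flooding curve in Figure 6(a).
```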

Figure 6(a) reports the results when the total number of subscriptions varies between 10^2 and
10^6. The number of topics is fixed and equal to 100. The network considered in this test was
constituted by 10^4 nodes, while the event publication rate was kept constant at 1 event per
topic in each cycle. For the evaluation to be meaningful, we required each topic to be subscribed
to by at least one subscriber; therefore, each curve is limited on its left end by the number of
available topics. Moreover, we required each node to subscribe to each topic at most once;
therefore, each curve

^7 Each event is broadcast in an overlay network containing all participants. The overlay is built
and maintained through the same overlay management protocol employed by TERA (Cyclon). The
broadcast mechanism is also the same considered in TERA.





Conclusions and open problems

■ Current results (static setting):
  ■ The outer-cluster routing mechanism is able to probabilistically guarantee that an event will be delivered to the correct cluster.
  ■ The impact of this mechanism on system scalability is small when a large number of events is published.
  ■ Moreover, it is able to uniformly distribute the load on the system, avoiding the appearance of hot spots.

Conclusions and open problems

■ Open problems:
  ■ Evaluation in dynamic settings.
  ■ What happens when we insert subscription/node churn?
    ■ An unsubscription can create a stale access point in some APTs.
    ■ We must evaluate the impact of this problem on outer-cluster routing reliability.
  ■ Can we provide (probabilistically) reliable inner-cluster dissemination?
  ■ Can we exploit node heterogeneity?
    ■ Currently we try to uniformly distribute load.
    ■ This is not necessarily desirable if participants have different characteristics and available resources.

Questions?

Results: traffic confinement

■ Topic overlays can get partitioned in specific cases.
■ The Partition Merging mechanism guarantees that a partitioned topic overlay will eventually be merged.
■ The more popular the topic, the less time it takes for the partition to be detected and repaired.
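The popularity effect can be illustrated with a toy model (this is not TERA's actual partition-merging protocol, just a hedged sketch): suppose a single partitioned subscriber samples one uniformly random subscriber of the topic per cycle and detects the partition as soon as the sample is a node other than itself. Detection time is then geometric with success probability (|G| - 1)/|G|, so larger (more popular) topics are repaired faster on average:

```python
import random

def mean_cycles_to_detect(group_size, trials=5000, seed=42):
    """Toy model, not TERA's protocol: each cycle the partitioned node
    samples one uniformly random subscriber of the topic; the partition
    is detected the first time the sample is another node.
    Detection time is geometric with p = (|G| - 1) / |G|."""
    rng = random.Random(seed)
    p = (group_size - 1) / group_size
    total = 0
    for _ in range(trials):
        cycles = 1
        while rng.random() >= p:
            cycles += 1
        total += cycles
    return total / trials

# More popular topic (larger |G|) -> faster detection on average.
fast = mean_cycles_to_detect(64)
slow = mean_cycles_to_detect(4)
```

The absolute numbers of this model are much smaller than those in Figure 4(b); only the trend (detection speeds up with topic popularity) carries over.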

[Two plots: (a) "Random Walk success rate": success rate vs. random walk lifetime, for APT sizes 10, 50, 100, and 400 (simulation and theory); (b) "Cycles needed to merge a partitioned node": probability of merge vs. cycles, for |G| = 4, 16, and 64 (simulation and theory).]

Figure 4: (a) The plot shows how the success rate for access point lookups changes when varying the

maximum APT size and the random walk lifetime. Solid lines represent results from the simulator,

while dashed lines plot values from the formula. (b) The plot shows how the probability to detect a

topic overlay partition increases with time (cycles). Solid lines represent results from the simulator,

while dashed lines plot values from the formula. The tests were run varying the number |G| of nodes

subscribed to the topic.

At the beginning, 5000 subscriptions were issued, uniformly distributed over 1000 distinct topics.
Lookups were started after 1000 cycles. Each lookup was conducted by starting four concurrent
random walks (r = 4).

Figure 4(a) shows how the access point lookup success ratio changes when varying the lifetime of
each random walk (K) for different values of |APT|. For each line, we plotted both simulation
results (solid line) and values calculated using the analytical study (dashed line). The plot
confirms that TERA's lookup mechanism is able to probabilistically guarantee that an access point
for an active topic will be found with probability P.
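The probabilistic guarantee can be sanity-checked with a simplified Monte Carlo model (an assumption-laden sketch, not the paper's analytical formula): suppose each hop of a random walk lands on a uniformly random node, and a fraction f of nodes holds an access point for the target topic in its APT. With r walks of lifetime K, the success probability under this model is 1 - (1 - f)^(rK):

```python
import random

def lookup_success_rate(f_holders, K, r, trials=10_000, seed=1):
    """Monte Carlo estimate: probability that at least one of r random
    walks of lifetime K visits a node whose APT holds the access point,
    assuming each hop is an independent uniform sample of the network
    (a simplification of the real overlay topology)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < f_holders for _ in range(r * K)):
            hits += 1
    return hits / trials

est = lookup_success_rate(f_holders=0.05, K=5, r=4)
closed_form = 1 - (1 - 0.05) ** (4 * 5)   # same model, analytically
```

The fraction f_holders, K, and r values above are placeholders; the real curves in Figure 4(a) depend on the actual APT size and overlay structure.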
