Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks. Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang. Microsoft Research. DISC 2002, Toulouse, France


Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Yi-Min Wang, Lili Qiu, Dimitris Achlioptas,

Gautam Das, Paul Larson, and Helen J. Wang

Microsoft Research

DISC 2002, Toulouse, France


Motivation

Phenomenal growth in Web usage
Future trends
  Switch from polling to notifications
    Examples: stock quotes, sports scores, weather, news, …
    Yahoo! Alerts, MSN Mobile, AOL Anywhere, InfoSpace, …
    Complements the traditional polling model of the Web
Event Distribution Network (EDN)
  Distributed and scalable event distribution
    Parallels the idea of a Content Distribution Network (CDN), applied to event distribution
    Built on top of a self-configuring overlay network of servers
  Content-based publish/subscribe through in-network processing of aggregated subscription filters


Dispatcher-based model

[Figure: dispatcher-based model. Labels: Publishers (event sources), Dispatcher, Servers, Notification Routing Service, Subscribers. Arrows: event traffic, notification traffic.]


Model of Content-based Pub/Sub

Content-based filtering/routing
  Event schema with d attributes, supporting equality and range predicates
  Event: a point in the d-dimensional space
  Subscription: a rectangle in that space
  Match: a rectangle contains the point
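
The point-in-rectangle model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Rect` type, the closed-interval convention, the reading of an equality predicate as a zero-width interval, and the two-attribute (price, volume) schema are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: one closed interval [lo[i], hi[i]] per attribute."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def contains(self, point: Sequence[float]) -> bool:
        # A match requires the point to fall inside the interval on every attribute.
        return all(l <= x <= h for l, x, h in zip(self.lo, point, self.hi))


def matching_subscribers(subs: Dict[str, Rect], event: Sequence[float]) -> List[str]:
    """Return the names of all subscriptions whose rectangle contains the event."""
    return [name for name, rect in subs.items() if rect.contains(event)]


# Hypothetical schema with d = 2 attributes: (price, volume).
subs = {
    "alice": Rect(lo=(10.0, 0.0), hi=(20.0, 1000.0)),  # range predicates on both attributes
    "bob": Rect(lo=(15.0, 500.0), hi=(15.0, 500.0)),   # equality predicates: a degenerate rectangle
}

print(matching_subscribers(subs, (15.0, 500.0)))  # ['alice', 'bob']
print(matching_subscribers(subs, (12.0, 100.0)))  # ['alice']
```

A linear scan like this is only meant to pin down the semantics; a real filtering engine would use an index rather than test every rectangle.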


Subscription Partitioning

Basic idea: similarity-based clustering for reducing total event traffic
  Event Space Partitioning (ESP)
  Filter Set Partitioning (FSP)

[Figure: ESP and FSP panels, each divided into Partition 1 and Partition 2]
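
One way to read the two approaches, sketched in one dimension: under ESP each server owns a sub-space of the event space, so an event goes to exactly one server while a subscription is stored wherever it overlaps; under FSP each server owns a subset of the filters, so an event goes to every server that holds a matching filter. The function names, the half-open sub-space convention, and the sample intervals are assumptions for illustration, and the FSP routine tests filters exactly rather than through a summary.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # a 1-D "rectangle" [lo, hi]


def esp_place(regions: List[Interval], sub: Interval) -> List[int]:
    """ESP: store a subscription on every server whose sub-space [lo, hi) it overlaps."""
    return [i for i, (lo, hi) in enumerate(regions) if lo <= sub[1] and sub[0] < hi]


def esp_route(regions: List[Interval], x: float) -> int:
    """ESP: an event goes to the single server that owns its sub-space."""
    for i, (lo, hi) in enumerate(regions):
        if lo <= x < hi:
            return i
    raise ValueError("event outside the partitioned space")


def fsp_route(partitions: List[List[Interval]], x: float) -> List[int]:
    """FSP: an event goes to every server holding at least one matching filter."""
    return [i for i, subs in enumerate(partitions)
            if any(lo <= x <= hi for lo, hi in subs)]


regions = [(0.0, 50.0), (50.0, 100.0)]   # ESP: two sub-spaces of the event space
print(esp_place(regions, (40.0, 60.0)))  # [0, 1]: the subscription straddles both sub-spaces
print(esp_route(regions, 45.0))          # 0

# FSP: the same four filters, split without and with regard to similarity.
scattered = [[(10.0, 20.0), (70.0, 90.0)], [(15.0, 25.0), (40.0, 60.0)]]
clustered = [[(10.0, 20.0), (15.0, 25.0)], [(40.0, 60.0), (70.0, 90.0)]]
print(fsp_route(scattered, 18.0))        # [0, 1]: both servers receive the event
print(fsp_route(clustered, 18.0))        # [0]: similar filters share a server
```

The last two lines show why similarity-based clustering matters for FSP: grouping overlapping filters on one server halves the traffic for this event.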


Equality Predicates

Hash predicates to get uniform distribution
  Treat the hashed domain as the event space
Use Event Space Partitioning
  Subscription is a point; does not intersect multiple sub-spaces
Use over-partitioning for better load balancing
  Use offline greedy algorithm to assign buckets to servers for load balancing
  Use indirection table to dynamically map buckets to servers for load re-balancing
Use Bloom filters to further reduce traffic
  Fast detection of true negatives at the expense of a (very low) false-positive rate
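
The Bloom-filter point can be illustrated with a small sketch. The slides do not say how the filters are sized or where they sit, so the class below, its parameters (`num_bits`, `num_hashes`), the use of SHA-256, and the stock symbols are all assumptions. It shows only the property the slide relies on: a negative answer is always correct, so events for unsubscribed values can be dropped early.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size bit array probed by several hash positions per key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "present or a false positive".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


subscribed = BloomFilter()
for symbol in ("MSFT", "IBM", "ORCL"):  # hypothetical subscribed stock symbols
    subscribed.add(symbol)

print(subscribed.might_contain("MSFT"))  # True: added keys are never reported absent
```

An event whose symbol fails `might_contain` can be discarded without consulting any server; an event that passes still needs the exact match, because the filter may report a false positive.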


Simulation Results

Actual Notification Money log
  1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols
  Zipf-like distribution

[Figure: log-log plot of # subscriptions for each symbol (y-axis, 1 to 1,000,000) against stock symbol popularity ranking (x-axis, 1 to 100,000). Series: Actual; Least-squares line fit for the middle part.]


Simulation Results (Cont.)

Simulate 100M new subscriptions from 43,734 symbols
  Scaled-up Zipf-like distribution
  Perturbation and permutation
  Uniform distribution
50 servers with over-partitioning ratio = 10
Without load re-balancing
  Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case)
With imbalance threshold of 2.0
  Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
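
The mechanics behind these numbers (over-partitioning into buckets, a bucket-to-server indirection table, a max/min imbalance threshold) can be sketched as follows. The slides do not spell out the greedy algorithm or the re-balancing policy, so the heaviest-bucket-first assignment, the choice of which bucket to move, and every name and number below are assumptions for illustration only.

```python
import hashlib
from typing import Dict, List


def bucket_of(key: str, num_buckets: int) -> int:
    """Hash an equality-predicate value (e.g. a stock symbol) to a bucket."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_load: Dict[int, int], num_servers: int) -> Dict[int, int]:
    """Build the bucket -> server table: heaviest bucket first, onto the lightest server."""
    table: Dict[int, int] = {}
    load = [0] * num_servers
    for b in sorted(bucket_load, key=bucket_load.get, reverse=True):
        s = load.index(min(load))
        table[b] = s
        load[s] += bucket_load[b]
    return table


def server_loads(table: Dict[int, int], bucket_load: Dict[int, int], num_servers: int) -> List[int]:
    load = [0] * num_servers
    for b, s in table.items():
        load[s] += bucket_load[b]
    return load


def imbalance(load: List[int]) -> float:
    """Max/min load ratio; the floor of 1 only avoids division by zero."""
    return max(load) / max(min(load), 1)


def rebalance(table, bucket_load, num_servers, threshold=2.0, max_moves=3) -> int:
    """Move buckets from the most to the least loaded server while max/min > threshold."""
    moves = 0
    while moves < max_moves:
        load = server_loads(table, bucket_load, num_servers)
        if imbalance(load) <= threshold:
            break
        src, dst = load.index(max(load)), load.index(min(load))
        gap = load[src] - load[dst]
        movable = [b for b, s in table.items() if s == src and 0 < bucket_load[b] < gap]
        if not movable:
            break
        best = min(movable, key=lambda b: abs(bucket_load[b] - gap / 2))
        table[best] = dst  # only the table entry changes; the hash function stays fixed
        moves += 1
    return moves


num_servers, ratio = 2, 3            # hypothetical: over-partitioning ratio 3
num_buckets = num_servers * ratio    # 6 buckets for 2 servers
bucket_load = {0: 50, 1: 40, 2: 30, 3: 20, 4: 10, 5: 10}  # subscriptions per bucket
table = greedy_assign(bucket_load, num_servers)
print(server_loads(table, bucket_load, num_servers))              # [80, 80]

bucket_load[0] += 100                # a burst of new subscriptions lands in bucket 0
print(imbalance(server_loads(table, bucket_load, num_servers)))   # 2.25
print(rebalance(table, bucket_load, num_servers))                 # 1 (one bucket moved)
print(imbalance(server_loads(table, bucket_load, num_servers)))   # 1.6
```

Because a server holds several buckets, re-balancing moves whole buckets by rewriting table entries, so only the subscriptions in the moved buckets migrate.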


Range Predicates

Use Filter Set Partitioning
K-means clustering
  Use center point to represent a rectangle
R-tree-based clustering
  R-tree: dynamic index structure for multi-dimensional data rectangles
  Offline R-tree algorithm
    Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes
  Online R-tree algorithm
    Insert from root down the path that greedily minimizes the increase in bounding rectangle volume
Simulation results
  Offline R-tree > Online R-tree > K-means > Random
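
The greedy rule of the online algorithm can be shown with a deliberately flattened sketch: instead of descending a multi-level R-tree, each new subscription is placed in whichever of a fixed number of partitions needs the smallest increase in bounding-rectangle volume. The single-level structure, the function names, the absence of any load-balancing constraint, and the sample rectangles are simplifying assumptions, not the authors' algorithm.

```python
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def volume(r: Rect) -> float:
    v = 1.0
    for lo, hi in zip(r[0], r[1]):
        v *= hi - lo
    return v


def merge(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def choose_partition(bounds: List[Optional[Rect]], sub: Rect) -> int:
    """Pick the partition whose bounding rectangle grows least in volume."""
    def growth(i: int) -> float:
        b = bounds[i]
        return volume(sub) if b is None else volume(merge(b, sub)) - volume(b)
    return min(range(len(bounds)), key=growth)


def assign_online(subs: List[Rect], num_partitions: int) -> List[int]:
    """Assign subscriptions one at a time, updating bounding rectangles as we go."""
    bounds: List[Optional[Rect]] = [None] * num_partitions
    assignment = []
    for sub in subs:
        i = choose_partition(bounds, sub)
        bounds[i] = sub if bounds[i] is None else merge(bounds[i], sub)
        assignment.append(i)
    return assignment


subs = [
    ((0.0, 0.0), (2.0, 2.0)),
    ((8.0, 8.0), (10.0, 10.0)),
    ((1.0, 1.0), (3.0, 3.0)),
    ((7.0, 9.0), (9.0, 11.0)),
]
print(assign_online(subs, 2))  # [0, 1, 0, 1]
```

The two rectangles near the origin end up together, as do the two near (9, 9), which is the similarity-based clustering the slides aim for.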


Related Work

Pub/Sub systems
  Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, …
Clustering in pub/sub
  All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]


Summary

Proposed two subscription partitioning and routing approaches
  Event Space Partitioning
  Filter Set Partitioning
Evaluated performance via simulations
  Subscription partitioning reduces network traffic
  Over-partitioning helps to achieve good load balancing dynamically
  Bloom filter further reduces event traffic


Simulation Results

10,000 random subscriptions per server on average
Offline R-tree performs the best; reduces event traffic by 20% to 60%

[Figure: hit ratio per server (y-axis, 0 to 1) against # servers (x-axis, 0 to 20). Series: Random, Offline R-tree, Online R-tree, Offline/Online R-tree, Offline K-means, Online K-means, Offline/Online K-means.]


EDN Network Architecture

[Figure: EDN network architecture. Labels: Event Src., EDN nodes, Notification Routing Services, subscriber. Arrows numbered 1 to 6, keyed to the steps below.]

1. Submit subscriptions
2. Subscription routing
3. Content-based route updates
4. Peer exchange of route updates
5. Content-based event routing
6. Notification delivery


Backup Slides


Optimize various performance metrics, subject to load-balancing constraints
  Minimize total event traffic
    Volume of union of rectangles
  Maximize overall system throughput
  Minimize end-to-end latency

[Figure: subscription rectangles with a precise summary and an imprecise summary]
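
To make the "volume of union of rectangles" objective concrete, the sketch below computes the exact union area of a server's 2-D subscription rectangles and compares it with the area of a single bounding rectangle. Treating the exact union as the precise summary and one bounding rectangle as an imprecise summary is an assumption made here for illustration, as are the function names and the sample data. If events were uniformly distributed, the fraction of events a server receives would be proportional to the area its summary covers.

```python
from itertools import product
from typing import List, Tuple

Rect2D = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def union_area(rects: List[Rect2D]) -> float:
    """Exact area of the union, by summing the covered cells of the coordinate grid."""
    xs = sorted({x for x1, _, x2, _ in rects for x in (x1, x2)})
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    area = 0.0
    for (xa, xb), (ya, yb) in product(zip(xs, xs[1:]), zip(ys, ys[1:])):
        # A grid cell is either fully inside a rectangle or fully outside it.
        if any(x1 <= xa and xb <= x2 and y1 <= ya and yb <= y2 for x1, y1, x2, y2 in rects):
            area += (xb - xa) * (yb - ya)
    return area


def bounding_area(rects: List[Rect2D]) -> float:
    """Area of the single smallest rectangle enclosing all subscriptions."""
    x1 = min(r[0] for r in rects)
    y1 = min(r[1] for r in rects)
    x2 = max(r[2] for r in rects)
    y2 = max(r[3] for r in rects)
    return (x2 - x1) * (y2 - y1)


rects = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (6.0, 6.0, 8.0, 8.0)]
print(union_area(rects))     # 11.0
print(bounding_area(rects))  # 64.0
```

The gap between 11 and 64 is the false-positive region of the coarse summary: events landing there reach the server but match no subscription.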


The EDN Optimization Problem

[Figure: Centralized Architecture and Distributed Architecture. Labels: Event Sources, Server, Notification Routing Service, Subscribers. Step labels (numbered 1 to 5 in the figure): Partition Existing Subscriptions, Summary Reporting, Route Events, Route New Subscriptions.]


Three Research Directions

Theoretical Study
  Optimal or approximation algorithms for simplified versions
System Design and Simulation
  Subscription partitioning for reducing event traffic
  Summary-based routing for enhancing system throughput
Indigo-based Implementation
  Extensible routing & pub/sub architecture


An R-tree-based EDN pub/sub system

[Figure: system components. Labels: Summary Manager, Summary-Based Router, Single-Node Filtering Engine, Event (= Point), Subscription (= Rectangle), Subscription Rectangles, Maximal Rectangles, Summary Bounding Rectangles.]


System Design and Simulation: Summary-based Routing

Basic idea: summary precision-based load balancing for enhancing system throughput

[Figure: publishers, dispatcher, and Ns servers, annotated with Tp, Td, Tl, Ts, R, and F]


If the dispatcher is not the bottleneck, use a precise summary.

Otherwise, reduce summary precision until either the outgoing link or the servers are about to become the bottleneck. (Throughput increases over this range.)

Further reduction of summary precision would generate excessive false-positive traffic and throttle back the dispatcher. (Throughput decreases over this range.)
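
The rule above describes a trade-off that a toy model can make visible. Everything in the sketch is hypothetical: the two rate functions, their constants, and the 10% precision grid are invented purely to reproduce the shape of the argument (dispatcher capacity rises as precision falls, downstream capacity falls as false positives grow) and are not taken from the talk or its simulations.

```python
def dispatcher_rate(p: float) -> float:
    """Hypothetical: events/s the dispatcher can process; matching is cheaper at lower precision."""
    return 100.0 / p


def downstream_rate(p: float) -> float:
    """Hypothetical: events/s the link and servers can absorb; false positives grow as p drops."""
    return 300.0 / (1.0 + 2.0 * (1.0 - p))


def throughput(p: float) -> float:
    """System throughput is limited by the slowest stage."""
    return min(dispatcher_rate(p), downstream_rate(p))


def best_precision(percent_grid) -> int:
    """Scan candidate precisions (in percent) and keep the one with the highest throughput."""
    return max(percent_grid, key=lambda pct: throughput(pct / 100))


grid = range(10, 101, 10)
best = best_precision(grid)
print(best)             # 60
print(throughput(1.0))  # 100.0: with a precise summary the dispatcher is the bottleneck
```

In this toy setting throughput rises as precision drops from 100% to 60%, where the downstream stage becomes the bottleneck, and falls below that point, mirroring the increasing and decreasing regimes described above.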


Simulation results

Imprecise summaries enhance throughput

[Figure: relative throughput (y-axis, 0.5 to 3) against summary precision (x-axis, 0% to 100%). Series: 100,000 rectangles (Rp=0.75; Ro=0.97); 50,000 rectangles (Rp=0.67; Ro=0.89); 20,000 rectangles (Rp=0.54; Ro=0.82); 10,000 rectangles (Rp=0.42; Ro=0.73).]


Imprecise summaries combined with R-tree-based partitioning further enhance throughput

[Figure: relative throughput (y-axis, 0.5 to 3) against summary precision (x-axis, 0% to 100%). Series: With partitioning; Without partitioning.]


Dispatcher-to-link and dispatcher-to-server bottleneck ratios

[Figure: dispatcher bottleneck ratios (y-axis, 0 to 6) against summary precision (x-axis, 0% to 100%). Series: Ratio_s (Ns/F=20); Ratio_s (Ns/F=10); Ratio_s (Ns/F=2); Ratio_o.]


EDN on Herald

Piggyback subscription routing & summary reporting on multicast tree forming process

Need to additionally consider notification traffic (because subscribers are now part of multicast tree)

[Figure labels: Subscriber, Subscription Routing]


Indigo-based Implementation

Indigo M2 routing & pub/sub architecture was not extensible
EDN used M2 messaging and built a WS-compliant, extensible routing & pub/sub architecture on top of it
Close collaboration with Indigo
  Extensibility proposals to Indigo
    Some appeared in M3
    But most sealed for security for now
    Some being considered for M4


EDN Extensible Routing and Pub/Sub

[Figure: architecture diagram. Labels: Indigo Messaging; EDN Route Manager; MS Route Manager; WS-Routing Route Manager; EDN Subscription Manager; WS-Eventing Subscription Manager; Namespace Binding Layer; XPath Filter Matcher; EDN R-tree Matcher.]


Other XML-Messaging/Indigo interactions

State dependency management
  Design tool for new features involving “state transplant”
    E.g., System Restore (across time), Intellimirror (across space)
  Repair tool providing consistent undo
    System Restore + rollback of “atomic units”
    GoBack3 + roll-forward of “atomic units”
  Troubleshooting tool
    Trace-diff & state-diff approaches

Our automatic, bottom-up, black-box discovery approach complements their manual, top-down, logical declaration approach (TravisM)

Install-time and run-time information augments the authoring-time information

Targeted problem spaces help identify things to declare for manageability