Distributed event aggregation for content-based Publish/Subscribe systems Navneet Kumar Pandey 1 Stéphane Weiss 1 Roman Vitenberg 1 Kaiwen Zhang 2 Hans-Arno Jacobsen 2 2 University of Toronto 1 University of Oslo

Page 1

Distributed event aggregation for content-based Publish/Subscribe systems

Navneet Kumar Pandey 1

Stéphane Weiss 1

Roman Vitenberg 1

Kaiwen Zhang 2

Hans-Arno Jacobsen 2

1 University of Oslo

2 University of Toronto

Page 2

Motivation: Intelligent Transport System (ITS)

• Information providers: road sensors, crowdsourced mobile apps

• Information seekers: commuters, police, first responders, radio networks, etc.

• Aggregate subscriptions
– Count the number of cars passing a street light per hour
– Average speed of cars on a road segment per day

• Non-aggregate subscriptions
– Accident reports
– Traffic violation reports

Image source: http://www.wired.com/images_blogs/autopia/2012/08/12A914.jpg

Page 3

Aggregation in pub/sub

• Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks.

• However, pub/sub does not support aggregation, which is required in emerging applications.

• Our primary objective is to retain the traditional pub/sub focus on low communication cost, while adding support for aggregation.

Page 4

Contributions: aggregation in pub/sub

• We propose a framework and baseline approaches for aggregation in content-based pub/sub systems (CBPS).

• We show how the relative performance of the baseline approaches varies with workload properties.

• We propose a per-broker distributed adaptive approach.

Page 5

Advertisement-based pub/sub model

[Figure: brokers (B) Bp, Bq, BI and BS forming a Subscription Delivery Tree (SDT), with an advertisement A[val, >, 4], a subscription S[val, >, 3] and a publication P[val, 8].]

Page 6

Comparison with stream processing

Aggregation in stream processing | Aggregation in pub/sub
Requires a global view of the topology | Topology is not known to individual broker nodes
Requires a priori knowledge of publication sources | Publication sources and sinks are dynamic
Needs a control layer | Brokers are loosely coupled
Usually has a static query plan | SDTs are dynamic and determined by the pub/sub implementation
Optimized for continuous data streams | Publications arrive at an irregular rate

Page 7

Proposed aggregation framework

Publication filtering procedure (PFP)

Subscription: { RoadID = 101, speed > 10, op = 'avg', duration (ω) = 2 hours, shift size (δ) = 1 hour }

[Figure: timeline (Time axis 0 to 3) for one subscription, showing publications Pub1, Pub2 and Pub3 and the notification window ranges (NWR) NWR1, NWR2 and NWR3.]

A single publication can participate in several NWRs, even for the same subscription.
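To make the overlap concrete, the following sketch lists the NWRs that a publication timestamp falls into. It assumes, which the slide does not state, that windows start at time 0, that window k covers the half-open interval [k·δ, k·δ + ω), and that windows are indexed from 0. `matching_windows` is a hypothetical helper name, and times are in hours.

```python
import math


def matching_windows(t, omega, delta):
    """Return the indices k of all windows [k*delta, k*delta + omega) that contain t.

    Assumption (not from the slides): windows start at time 0 and are
    indexed from 0; omega is the duration and delta the shift size.
    """
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if t < 0:
        return []
    # Latest window that has already started at time t.
    k_max = math.floor(t / delta)
    # Earliest window that has not yet ended at time t: k*delta + omega > t.
    k_min = max(0, math.floor((t - omega) / delta) + 1)
    return list(range(k_min, k_max + 1))


# Subscription from the slide: duration 2 hours, shift size 1 hour.
print(matching_windows(1.5, omega=2, delta=1))
```

With ω = 2 and δ = 1, a publication at t = 1.5 belongs to two windows under these assumptions, so it contributes to two separate notifications of the same subscription.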

Page 8

Proposed aggregation framework

Initial computation procedure (ICP)

Publication filtering procedure (PFP)

Outgoing messages: { avg(Pub1, Pub2, Pub3), avg(Pub2, Pub3) }

Outgoing messages: { avg(Pub1, Pub2), avg(Pub2), Pub3 }

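[Figure: timeline (Time axis 0 to 3) for one subscription, showing publications Pub1, Pub2 and Pub3, the notification window ranges (NWR) NWR1, NWR2 and NWR3, and a mark (x) on the timeline.]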

Processing start time presents a trade-off between communication cost and end-to-end delay.

Page 9

Proposed aggregation framework

Initial computation procedure (ICP)

Publication filtering procedure (PFP)

Recurrent processing procedure (RPP)

[Figure: brokers Bp, Bq and BI, with partial results avg_p and avg_q, the combined result avg_pq, and the collection delay.]

Collection delay is another parameter affecting the delay-communication trade-off.
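One way a broker such as BI could combine partial results for `avg` is sketched below. It assumes that partial results travel as (sum, count) pairs, because averaging two averages directly is only correct when both cover the same number of publications. The slides do not specify the representation, and `PartialAvg` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Partial result for an 'avg' aggregate, kept as (sum, count).

    Assumption: this representation is illustrative; the slides do not
    say how brokers encode partial results.
    """
    total: float
    count: int

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(float(sum(values)), len(values))

    def merge(self, other):
        # Merging is associative and commutative, so brokers can combine
        # partial results in any order along the delivery tree.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def value(self):
        if self.count == 0:
            raise ValueError("no publications in this window")
        return self.total / self.count


avg_p = PartialAvg.from_values([10, 20])   # partial result from Bp
avg_q = PartialAvg.from_values([30])       # partial result from Bq
avg_pq = avg_p.merge(avg_q)                # combined at BI
print(avg_pq.value())
```

Waiting for the collection delay before merging gives slower partial results a chance to arrive, so that one combined message can be sent instead of several.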

Page 10

Late aggregation approach

[Figure: brokers Bp, Bq, BI and BS with stage labels PFS, ICP and RPP. Publications P[val,9], P[val,2], P[val,5] and P[val,3] are matched against the aggregate subscription Smin[val,>,2]; the resulting notification is P[valmin,3]. Messages exchanged in Late aggregation: 6.]

Late approach aggregates messages at subscriber-edge brokers.

Page 11

Early aggregation approach

[Figure: brokers Bp, Bq, BI and BS (label BA also shown) with stage labels PFS, ICP and RPP. Publications P[val,9], P[val,2], P[val,5] and P[val,3] are matched against the aggregate subscription Smin[val,>,2]; the partial results shown are P[valmin,9] and P[valmin,3]. Messages exchanged in Early aggregation: 3. Messages exchanged in Late aggregation: 6.]

Early approach aggregates messages at publisher-edge brokers.

Page 12

Early does not always outperform Late

[Figure: brokers Bp, Bq, BI and BS. Publications P[val,9], P[val,2], P[val,5] and P[val,3] are matched against three aggregate subscriptions: Smax[val,>,2], Smin[val,>,2] and Scount[val,>,2]. Partial results shown: P[valmax,9], P[valmin,9], P[valcount,1]; P[valmax,5], P[valmin,3], P[valcount,2]; and P[valmax,9], P[valmin,3], P[valcount,3]. Messages exchanged: 6 with Late aggregation, 9 with Early aggregation.]
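The message counts quoted for the Late and Early examples can be reproduced with a small counting sketch. It rests on one reading of the figures, which is an assumption: Bp receives the publication with value 9, Bq receives 2, 5 and 3, both connect to BI, BI connects to BS, and all publications fall into a single notification window. The function names are illustrative.

```python
def late_messages(pubs_by_edge, threshold, hops=2):
    """Broker-to-broker messages when matching publications are forwarded
    unchanged to the subscriber-edge broker.

    Assumption: every publisher-edge broker is `hops` links away from BS
    (publisher edge -> BI -> BS), and overlapping subscriptions share the
    same forwarded publication.
    """
    matching = sum(1 for pubs in pubs_by_edge.values()
                   for v in pubs if v > threshold)
    return matching * hops


def early_messages(pubs_by_edge, threshold, num_aggregate_subs):
    """Broker-to-broker messages when each publisher-edge broker sends one
    partial result per aggregate subscription, and BI sends one combined
    result per subscription to BS (single notification window assumed).
    """
    active_edges = sum(1 for pubs in pubs_by_edge.values()
                       if any(v > threshold for v in pubs))
    if active_edges == 0:
        return 0
    links = active_edges + 1  # each active edge -> BI, plus BI -> BS
    return links * num_aggregate_subs


pubs = {"Bp": [9], "Bq": [2, 5, 3]}
print(late_messages(pubs, threshold=2))                        # one or more subscriptions
print(early_messages(pubs, threshold=2, num_aggregate_subs=1))  # min only
print(early_messages(pubs, threshold=2, num_aggregate_subs=3))  # max, min, count
```

Under these assumptions the Late cost stays the same when more overlapping aggregate subscriptions are added, whereas the Early cost grows with the number of subscriptions, which is why Early stops winning in the three-subscription example.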

Page 13

Comparison between Early and Late

Several parameters affect the performance of our baselines:

Increasing parameter | Favors
Publication matching rate | Early
Number of matching NWRs | Late
Overlap among aggregate subscriptions | Late
Ratio between aggregate and regular subscriptions | Early

Reducing the communication cost requires an adaptive solution.

Page 14

Benefits of adaptive aggregation

[Figure: brokers Bp, Bq, BI and BS (labels BA and BF also shown). Publications P[val,9], P[val,2], P[val,5] and P[val,3] are matched against a regular subscription S[val,>,6] and an aggregate subscription Smin[val,>,2]; messages shown include P[val,9], P[valmin,9] and P[valmin,3]. Messages exchanged: Late 6, Early 5.]

Page 15

Benefits of adaptive aggregation

[Figure: brokers Bp, Bq, BI and BS (label BA also shown). Publications P[val,9], P[val,2], P[val,5] and P[val,3] are matched against a regular subscription S[val,>,6] and an aggregate subscription Smin[val,>,2]; messages shown include P[val,9] and P[valmin,3]. Messages exchanged: Late 6, Early 5, Adaptive 4.]

Per-broker adaptation reduces communication cost.

Page 16

Adaptation process (MAPE-K)

Phases: Monitor, Analyze, Plan, Execute, over shared Knowledge.

Activities across the phases:
• Matching publications within the sampling period
• Changes in the subscription set
• Compare the ratio between Pubs vs. NWRs
• Estimate the notification rate
• Choose the suitable mode
• Transition between aggregate and forward mode
• Start/stop aggregation at the broker

Knowledge (information at a broker):
• Registered subscriptions
• Current execution mode
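As a rough illustration of the comparison between publications and NWRs, the sketch below counts the publications matched during one sampling period and the number of distinct non-empty NWRs they fall into, then picks the cheaper mode. The decision rule is a simplification assumed for illustration, not the rule used by the authors. It also assumes that windows start at time 0 and that window k covers [k·δ, k·δ + ω). All names are hypothetical.

```python
import math


def distinct_results(pub_times, subs):
    """Count non-empty (subscription, window) pairs for the sampled publications.

    pub_times: timestamps of publications matched in the sampling period.
    subs: list of (omega, delta) pairs, one per aggregate subscription.
    Assumption: window k of a subscription covers [k*delta, k*delta + omega).
    """
    results = set()
    for i, (omega, delta) in enumerate(subs):
        for t in pub_times:
            k_max = math.floor(t / delta)
            k_min = max(0, math.floor((t - omega) / delta) + 1)
            for k in range(k_min, k_max + 1):
                results.add((i, k))
    return len(results)


def choose_mode(pub_times, subs):
    """Pick 'aggregate' if sending one result per non-empty NWR would take
    fewer messages than forwarding each matched publication, else 'forward'."""
    forward_cost = len(pub_times)
    aggregate_cost = distinct_results(pub_times, subs)
    return "aggregate" if aggregate_cost < forward_cost else "forward"


# Many publications, one tumbling window: aggregating is cheaper.
print(choose_mode([0.1, 0.2, 0.3, 0.4], [(1, 1)]))
# One publication, three overlapping subscriptions: forwarding is cheaper.
print(choose_mode([0.5], [(2, 1), (2, 1), (2, 1)]))
```

The two calls mirror the trend in the Early/Late comparison: a high publication matching rate favors aggregating at the broker, while many matching NWRs favor forwarding.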

Page 17

Experimental setup
• Implemented in Java over the PADRES framework
• Topology: 16 brokers
– Combination of publisher-edge only, subscriber-edge only and mixed brokers
• Real-life datasets:
– Traffic dataset from the ONE-ITS service (1)
– Yahoo! Finance Stock dataset
• Metrics:
– Number of messages exchanged
– Processing overhead
– End-to-end delay

[Figure: overlay topology of brokers (B).]

(1) http://one-its-webapp1.transport.utoronto.ca

Page 18

Results (Stock dataset)

[Plots: one panel varies publications per second, the other varies the number of subscriptions.]

Decision becomes more accurate when available information is sufficient

• Adaptive aggregation performs close to the best among Early and Late for all settings.

• Early performs better at high publication rates, whereas Late is better with a large number of subscriptions.

Page 19

Results (Traffic dataset)

[Plots: one panel varies publications per second, the other varies the number of subscriptions.]

Per-broker adaptation can cause individual brokers to make incorrect decisions.

Page 20

Processing overhead (Stock)

[Plots: predicate matching cost; aggregation-related overhead.]

Adaptation overhead dominates the aggregation overhead.

Page 21

Conclusions

• We provide an aggregation framework for CBPS with baseline solutions.

• We demonstrate that neither baseline is dominant; which one performs better depends on workload parameters.

• We provide a generic adaptive aggregation framework.

• We experimentally demonstrate that our distributed adaptive solution performs close to the best baseline across all settings.

Page 22

Thank you!

For questions and comments

Contact: [email protected]


Page 23

Motivation: stock market application

• Information providers: stock exchanges

• Information seekers: brokers, buyers

• Non-aggregate subscriptions:
– Stock value updates

• Aggregate subscriptions:
– Stock market indicators (e.g. MACD)

Image source: http://opinion-forum.com/index/wp-content/uploads/2012/08/stock_market.jpg

Page 24

Aggregation semantics

• Window parameters
– Window shift size (δ)
– Duration (ω)

• Examples
– Sliding window (ω = 2 hours, δ = 1 hour): moving average of the number of cars passing a street light per hour.
– Tumbling window (ω = δ = 2 hours): average speed of cars on a road segment.
– Hopping window (ω = 2 hours, δ = 24 hours): number of cars crossing during rush hour.
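The three window types differ only in how the shift size δ compares with the duration ω. A minimal sketch of that classification, following the terminology of this slide, is given below. `window_kind` is a hypothetical helper, and other systems may use these terms differently.

```python
def window_kind(omega, delta):
    """Classify a window by comparing duration (omega) with shift size (delta).

    Terminology follows the slide: overlapping windows are 'sliding',
    back-to-back windows are 'tumbling', and windows separated by gaps
    are 'hopping'.
    """
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if delta < omega:
        return "sliding"    # consecutive windows overlap
    if delta == omega:
        return "tumbling"   # consecutive windows touch without overlap
    return "hopping"        # gaps between consecutive windows


# The three examples from the slide, in hours.
for omega, delta in [(2, 1), (2, 2), (2, 24)]:
    print(omega, delta, window_kind(omega, delta))
```

Only sliding windows let a single publication fall into more than one NWR of the same subscription; with tumbling windows each publication falls into exactly one, and with hopping windows some fall into none.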

Page 25

Challenges of adaptive deployment

• Data flow is hard to predict:
– Irregular event rates at the publishers
– Dynamic number of subscriptions
– Coupled with dynamic content matching
– Brokers function autonomously

• Compatible solution:
– Congruent with pub/sub routing standards
– Minimal impact on QoS for regular publications

Page 26

Other experiments
• End-to-end delay
• Sensitivity to the sampling period
• Sensitivity to the collection delay

For details, please refer to our full paper.

Page 27

Sensitivity analysis: Collection delay

Increasing the collection delay reduces the number of messages but delays the delivery of the result.

Page 28

Publication process flow

[Flowchart: nodes labelled "Timestamp publication if not", "Matched for aggregation", "Is broker aggregating?" and "Any regular subscription matched?" (appearing twice), connected by Yes/No branches; actions "Enqueue for aggregation computation", "Send" and "Tag as aggregated".]

Page 29

Aggregation basics: notification window ranges

[Figure: publication matching on a time axis from 0 to 7 for three subscriptions (sub 1, sub 2, sub 3), each with its own notification window ranges (NWRs), illustrating sliding, tumbling and sampling windows.]

Page 30

Motivation
• Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks.
• Content-based pub/sub does not support time-based aggregation.

Page 31

Pub/sub systems: a popular communication paradigm

Research on pub/sub has traditionally focused on performance rather than on extending functionality.

• Business process [4]
• Workflow management [5]
• Stock-market monitoring [3]
• Social interaction [2]
• Network monitoring and management [6]
• RSS filtering [1]

Page 32

Event distribution systems such as ITS demand aggregation filters

• Moving average of the number of cars passing a street light per hour.

• Average speed of cars on a road segment.

• Number of cars crossing a highway during rush hour.


Page 33

Scope of our solution
• Acyclic overlay
• Broker-federated pub/sub
• Advertisement-based forwarding model
• Time-based aggregation