41
Content-Based Publish- Subscribe Over Structured P2P Networks Peter Triantafillou and Ioannis Aekaterinidis Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, Greece

Content-Based Publish-Subscribe Over Structured P2P Networks

  • Upload
    tomai

  • View
    44

  • Download
    0

Embed Size (px)

DESCRIPTION

Content-Based Publish-Subscribe Over Structured P2P Networks. Peter Triantafillou and Ioannis Aekaterinidis Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras , Greece. Agenda. Introduction/Goal - PowerPoint PPT Presentation

Citation preview

Page 1: Content-Based Publish-Subscribe Over Structured P2P Networks

Content-Based Publish-Subscribe Over Structured P2P Networks

Peter Triantafillou and Ioannis AekaterinidisResearch Academic Computer Technology Institute and

Department of Computer Engineering and Informatics, University of Patras, Greece

Page 2: Content-Based Publish-Subscribe Over Structured P2P Networks

Agenda

• Introduction/Goal• Publish-Subscribe Systems• Publish-Subscribe over Chord– Processing Subscriptions– Processing Events

• Improving Performance• Conclusion

Page 3: Content-Based Publish-Subscribe Over Structured P2P Networks

Introduction• Publish Subscribe systems are becoming very popular for

building large scale distributed systems and applications– Anonymity between publisher and subscriber

• Centralized:– Adv: Global image of system making matching algorithm easy

to implement– Dis: Scalability

• Decentralized:– Adv: Scalability– Challenge: development of efficient distributed matching

algorithm

Page 4: Content-Based Publish-Subscribe Over Structured P2P Networks

Goal

• Chose to use Chord because:– Simplicity– Popularity– Scalable– Self-Organizing– Well Performing

• Challenge: to develop a strategy for using DHTs to provide good support for range predicates– Which are popular when specifying subscription attributes

Page 5: Content-Based Publish-Subscribe Over Structured P2P Networks

Agenda

• Introduction/Goal• Publish-Subscribe Systems• Publish-Subscribe over Chord– Processing Subscriptions– Processing Events

• Improving Performance• Conclusion

Page 6: Content-Based Publish-Subscribe Over Structured P2P Networks

Publish-Subscribe Systems

• Asynchronous messaging paradigm• Senders (publishers) of messages are not programmed

to send their messages to specific receivers (subscribers)• Published messages are characterized into classes

(without knowledge of what subscribers there may be)• Subscribers express interest in one or more classes and

only receive messages that are of interest (without knowledge of what publishers there are)

• This Decoupling of publishers and subscribers allows for greater scalability and a more dynamic network topology

Page 7: Content-Based Publish-Subscribe Over Structured P2P Networks

Pub-Sub Message Filtering

• Subscribers receive only a subset of the total messages published

• 2 Main Types– Topic Based– Content Based

• Hybrid– Coupling of topic and content based systems

Page 8: Content-Based Publish-Subscribe Over Structured P2P Networks

Topic Based Pub/Sub Systems

• Much like newsgroups• Users join a group (topic)• All messages related to that topic are

broadcasted to all users participating in the specific group

Page 9: Content-Based Publish-Subscribe Over Structured P2P Networks

Content Based Pub/Sub Systems

• Preferable• Give users the ability to express their interest

by specifying predicates over the values of a number of well defined attributes

• Matching of publications (events) to subscriptions (interest) is done based on the content (values of attributes)

Page 10: Content-Based Publish-Subscribe Over Structured P2P Networks

Hybrid System

• Publishers post messages to a topic while subscribers register content-based subscriptions to one or more topics

• Publications and subscriptions are automatically classified in topics (using an application-specific schema)

• Drawbacks:– Design of the domain schema plays fundamental role in

the system’s performance– Likely many false positives may occur.

Page 11: Content-Based Publish-Subscribe Over Structured P2P Networks

Agenda

• Introduction/Goal• Publish-Subscribe Systems• Publish-Subscribe over Chord– Processing Subscriptions– Processing Events

• Improving Performance• Conclusion

Page 12: Content-Based Publish-Subscribe Over Structured P2P Networks

Event Schema• Set of typed attributes• Each attribute ai consists of:

– Type – belong to predefined set of primitive data types– Name - string– Value v(ai)

• Any range defined by the minimum and maximum values (vmin(ai), vmax(ai)) along with the attribute’s precision Vpr(ai)

Page 13: Content-Based Publish-Subscribe Over Structured P2P Networks

Subscription Schema

• Contains all interesting subscription-attribute data types (integers, strings, etc.) and all common operators (=, ≠, <, >, etc.)

• Event matches subscription iff all the subscription’s attribute predicates/constraints are satisfied

• Can have two or more constraints for the same attribute

Page 14: Content-Based Publish-Subscribe Over Structured P2P Networks

Subscription Identifier

• Concatenation of 3 parts:– C1: id of the node receiving the subscription• Size: m bits in a Chord ring with an m-bit address space

– C2: id of the subscription itself• Size: bits equal to the rounded-up base-2 logarithm of the

maximum number of outstanding subscriptions a node can have

– C3: number of attributes on which constraints are declared• Size: max value = total number of attributes supported by the

system

Page 15: Content-Based Publish-Subscribe Over Structured P2P Networks

Subscription ID Example

• Assume Chord ring with a 3-bit identifier address space

• Each node can support 8 outstanding subscriptions with an attribute schema including 7 attributes

• Depicts subscription 3 (C2=3), belonging to node 4 (C1=4), comprised of constraints on 5 attributes (C3=5)

Page 16: Content-Based Publish-Subscribe Over Structured P2P Networks

Storing Subscriptions

• Done using the hash function provided by Chord (SHA-1)

• Returns an identifier uniformly distributed in the address space

• k=h(v(ai))• Following the Chord API, the subID is placed at

node: successor(k)

Page 17: Content-Based Publish-Subscribe Over Structured P2P Networks

Storing Subscriptions

• Procedure:

Page 18: Content-Based Publish-Subscribe Over Structured P2P Networks

Storing Example

• Attributes are processed one at a time• Subscription ID is stored at:– Successor(h(“NYSE”)), in the list dedicated for attribute

Exchange– Successor(h(“OTE”)), in the list dedicated for attribute

Symbol• Since the Price attribute is over a range of

8.30<Price<8.70 with a precision of .01 the subscription ID is stored at:– Successor(h(Price)); for the values 8.31,8.32, …, 8.69.

Page 19: Content-Based Publish-Subscribe Over Structured P2P Networks

Updating Subscriptions

• Updating attributes of a subscription with equality only 2 nodes are affected:– Delete the Subscription ID from:• nodeID = successor(h(vstale_value(ai)))

– Add the subscription ID to node: (appropriate list)• nodeID = successor(h(vupdated_value(ai)))

Page 20: Content-Based Publish-Subscribe Over Structured P2P Networks

Updating Subscriptions

• The procedure for updating a range value depends on the new values of the range bounds (vlow_NEW(ai) and vhigh_NEW(ai)) compared to the old values

• If vlow_NEW(ai) < vlow(ai) store the subID to the nodes that cover [vlow_NEW(ai), vlow(ai)) range

• If vhigh_NEW(ai) > vhigh(ai) store the subID to the nodes that cover (vhigh(ai), vhigh_NEW(ai)] range

Page 21: Content-Based Publish-Subscribe Over Structured P2P Networks

Updating Subscriptions

• If vlow_NEW(ai) > vlow(ai) delete the subID from the nodes that cover [vlow(ai), vlow_NEW(ai)) range.

• If vhigh_NEW(ai) < vhigh(ai) delete the subID from the nodes that cover (vhigh_NEW(ai), vhigh(ai)] range

Page 22: Content-Based Publish-Subscribe Over Structured P2P Networks

Processing Events: Matching Algorithm

Page 23: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example

• Suppose we have Subscriptions 1 and 2 generated by two clients connected to a Chord node and Event 1

• First, the algorithm will collect all the subIDs lists in which the values of the event attributes satisfy the corresponding constrains of the subscriptions

Page 24: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• The algorithm starts with attribute Exchange = “NYSE” and retrieves the subID list (LExchange) from node successor(h(“NYSE”))

• This list contains only the subID1

– LExchange -> subID1

Page 25: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• Next attribute Symbol = “OTE”; subID list (LSymbol) from node successor(h(“OTE”)) is retrieved– LSymbol -> subID1, subID2

• Since both subscriptions are satisfied for the event

Page 26: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• Next attribute Price = 8.40; subID list (LPrice) from node successor(h(8.40)) is retrieved– LPrice -> subID1

• Since only subscription 1 has a price that falls within this range.

Page 27: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• Lastly attribute Low = 8.22; subID list (LLow) from node successor(h(8.22)) is retrieved– LLow -> subID2

• Since only subscription 2 has an attribute Low

Page 28: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• After this phase of the matching process the collected subscription ID lists are:• LExchange -> subID1

• LSymbol -> subID1, subID2

• LPrice -> subID1

• LLow -> subID2

• Subscription1 was found in 3 lists while subscription2 was found in 2

• By processing the subIDs of the subscriptions (c3 part) we can find out that both subscriptions have constraints over 3 attributes.

Page 29: Content-Based Publish-Subscribe Over Structured P2P Networks

Matching Events with Subscriptions Example (continued)

• Since subscription 1 was found in 3 lists, a match is implied and it’s subID is kept in order to inform the node which generated the subscription about the matched event.

• While holding metadata info for subID1 in order to locate the IP address of the client that generated the subscription

• The node storing the subscription is contacted (using nodeID equal to c1 field of the subID1) and the event is delivered to the interested client

Page 30: Content-Based Publish-Subscribe Over Structured P2P Networks

Expected Performance

• Subscription Storage Procedure:– Average number of hops needed to store a subID

depends on the type of constraints over the attributes

– Equality: ½ log(N)• subID is stored in a single node

– Range Constraint:Nodes affected which leads to r*1/2 log(N) hops on

average to store the subID

)(av)(a v- )(av

ipr

ilowihighr

Page 31: Content-Based Publish-Subscribe Over Structured P2P Networks

Expected Performance

• Update/Deletion of Subscription– Again, depends on the type of constraints over the

attributes• Equality: update performed by contacting Log(N) nodes• Ranges: number of nodes is k*log(N) on average

– K depends on whether the new range is smaller or wider than the old range

Page 32: Content-Based Publish-Subscribe Over Structured P2P Networks

Expected Performance

• Event-Processing (matching)– Involves contacting nodes to

collect the subscription id lists

• Reminder: a Chord network with N nodes and a 2m-bit address space, the average number of nodes that must be contacted to find a successor is: – ½ log(N) hops

• By design, this proposal leads to fast and scalable event matching.

)log(21 NN eventa

Page 33: Content-Based Publish-Subscribe Over Structured P2P Networks

Agenda

• Introduction/Goal• Publish-Subscribe Systems• Publish-Subscribe over Chord– Processing Subscriptions– Processing Events

• Improving Performance• Conclusion

Page 34: Content-Based Publish-Subscribe Over Structured P2P Networks

Improving Performance

• Currently storing a subscription over the Chord ring takes r* ½ * log(N) hops on average for every attribute– r depends on precision, high, and low values

• Using an order preserving hash function we can optimize to r+ ½ *log(N) hops

Page 35: Content-Based Publish-Subscribe Over Structured P2P Networks

Order Preserving Chord

• Using a 2m- order preserving hash function• Expected performance:– ½ log(N) hops to locate node storing minimum

value of the range (vlow(ai))– Then, perform r hops to store remaining values in

the range to lead to r+ ½ log(N) total hops

Page 36: Content-Based Publish-Subscribe Over Structured P2P Networks

Order Preserving Hash Function

• Suppose every attribute is characterized by– vmin(ai): minimum value ai can take– vmax(ai): maximum value ai can take– vpr(ai): precision of ai

• vj(ai) is any value in [vlow(ai), vhigh(ai)]• OPHF is:

mm

ii

iijioij avav

avavasavh 2mod)2*

)()()()(

)(())((minmax

min

))(_()( iio anameattributehashas

Page 37: Content-Based Publish-Subscribe Over Structured P2P Networks

Subscription and Event Processing with OPHF

• Example: Storing Subscription– Consider Chord ring with 3-bit ids and 8 nodes– Subscription of a single integer attribute a arriving

at node 3 with constraint 0<v(a)<4– Using Chord requires O(r*log(N)) hops to store the

subID at three nodes

Page 38: Content-Based Publish-Subscribe Over Structured P2P Networks

Subscription and Event Processing with OPHF

• Using the OPHF with Chord:– Perform O(log(N)) hops only once to reach the

first node (node 6)– Storing the subID at nodes 7 and 0 requires 2

more hops

Page 39: Content-Based Publish-Subscribe Over Structured P2P Networks

Agenda

• Introduction/Goal• Publish-Subscribe Systems• Publish-Subscribe over Chord– Processing Subscriptions– Processing Events

• Improving Performance• Conclusion

Page 40: Content-Based Publish-Subscribe Over Structured P2P Networks

Conclusion

• Not Addressed– Load Balancing– Small Domain Problem

• Able to support equality and range attributes while leveraging Chord to build a scalable, self-organizing, well performing content based publish-subscribe system.

Page 41: Content-Based Publish-Subscribe Over Structured P2P Networks

Sources Cited

• http://en.wikipedia.org/wiki/Publish/subscribe