A Recall-Based Cluster Formation Game in Peer-to-Peer Systems
Georgia Koloniari and Evaggelia Pitoura
Department of Computer Science, University of Ioannina, Greece
http://dmod.cs.uoi.gr
Peer-to-Peer Systems
A class of systems in which autonomous nodes of equal roles (peers) share their resources and exchange their data
Each peer connects with a subset of other peers thus forming logical overlay networks on top of the physical network (i.e. Internet)
Queries are routed through the overlay network to discover the peers that store relevant results
VLDB 2009
Clustered Overlays
Peers form groups based on their content/interests (query workload) so that peers with similar content or/and interests are nearby, in the same group (cluster), in the overlay network
[Figure: clustered overlay; legend: content, query workload]
Motivation for Clustering
Peers find and exchange data relevant to their interests within their cluster with less effort
Once the relevant clusters for a query are identified, the peers in them maintain relevant content that can be exploited to evaluate and refine a query
Previous work on content-based clustering:
- SONS [Stanf. Univ. '02]: clusters formed based on predefined classification hierarchies
- SETS [SIGIR '03]: peers partitioned in clusters corresponding to fixed, globally known topic segments
- Cohen et al [INFOCOM '03]: based on a learning approach that generalizes and learns the semantic categories of the data
- Triantafillou et al [CIDR '03]: fixed set of clusters formed based on predefined semantic categories; focus on fair load distribution and reducing response times
- Garbacki et al [ICDCS '07]: superpeer-based architecture in which peers with common interests are organized based on their caches
- Doulkeridis et al [JSAC '07]: clustering applied first on the documents of each peer, and then on the feature vectors describing the derived clusters
Our Contributions
We provide a novel model for cluster formation using a game-theoretic approach
We address both cluster formation and cluster evolution / maintenance and cope with peer dynamics
We exploit both content and queries and aim at maximizing the overall query recall of the system
We propose an uncoordinated protocol based on local peer decisions for playing the game with performance comparable to a corresponding coordinated protocol
Game Theory for P2P
Game-theoretic approaches have been applied to the overlay network creation problem and to link selection in P2P systems
- Fabricant et al [PODC '03]: an Internet-like network is modeled as a game in which peers establish links to reduce the shortest distance to any other peer and pay for those links
- Moscibroda et al [PODC '06]: prove that allowing peers complete freedom performs worse than collaboration, and that even static P2P systems may never reach convergence
- Laoutaris et al [INFOCOM '07]: strict bounds enforced on out-degree; directed links; peers express preferences for their neighbors
Our contribution:
We consider dynamic clustered overlays, focus on queries and aim at increasing their recall
Overview
Cluster formation as a strategic game
- Utility functions for selfish and altruistic behavior
- Global system performance criteria
- Stability and optimality
- Case studies
Playing the game
- Relocation policies
- Cluster formation and cluster evolution/maintenance
Uncoordinated cluster reformulation protocol
- Protocol variations: trigger and event-based
- Auxiliary mechanisms for controlling the overhead
Experimental Evaluation
Conclusions and Future Work
The Game
Cluster formation modeled as a strategic game
Each peer pi is modeled as a player
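A player selects a strategy si, which consists of the set of clusters the peer will join (si ⊆ C)
The goal of the game is for each peer to select the strategy that optimizes its utility function: selfish peers minimize their individual cost, altruistic peers maximize their individual contribution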
The utility function is defined based on query recall
Recall-based Clustering
Selfish peers: Join the cluster that will provide the most answers to the peer’s local query workload (maximizes its recall)
Altruistic peers: Join the cluster to which the peer can offer the most (maximizes the recall of the cluster members)
Utility function:
- Individual peer cost for selfish peers
- Individual peer contribution for altruistic peers
Individual Peer Cost

$$pcost(p_i, S) = \underbrace{\alpha \sum_{c_k \in s_i} \frac{\theta(|c_k|)}{|P|}}_{\text{membership cost}} + \underbrace{\sum_{q \in Q(p_i)} \frac{num(q, Q(p_i))}{num(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)}_{\text{recall cost}}$$

where
- $r(q, p') = \frac{result(q, p')}{\sum_{p_k \in P} result(q, p_k)}$: recall of q if evaluated solely on p'
- $num(q, Q(p_i))$: number of appearances of q in the local workload Q(pi) of pi
- $num(Q(p_i))$: number of queries in the local workload Q(pi) of pi
- $P(s_i)$: peers that are members of the clusters in pi's strategy
The recall cost is the cost of evaluating the peer's queries against the other clusters: it measures the loss in recall for evaluating queries at clusters not in pi's strategy
Individual Peer Cost (cont'd)

$$pcost(p_i, S) = \underbrace{\alpha \sum_{c_k \in s_i} \frac{\theta(|c_k|)}{|P|}}_{\text{membership cost}} + \underbrace{\sum_{q \in Q(p_i)} \frac{num(q, Q(p_i))}{num(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)}_{\text{recall cost}}$$

The membership cost is the cost of joining a cluster, measured by:
- a function θ that describes the communication cost entailed in belonging to a cluster and depends on the size of the cluster |ck| and the topology within the cluster
- a parameter α that quantifies the cost of each of the connections of a peer to the other cluster members
This cost prevents the system from forming just one cluster, which would otherwise minimize the cost function for all peers
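As a rough illustration of the individual peer cost (membership cost plus recall cost), the following minimal sketch computes it for one peer. The data layout (`clusters`, `workload`, `results`), the function names and the linear default for θ are assumptions made for this example, not part of the presentation.

```python
from collections import Counter


def recall(results, q, peer):
    """r(q, p): share of all results for q that are held by `peer`."""
    total = sum(results.get(q, {}).values())
    return results[q].get(peer, 0) / total if total else 0.0


def pcost(strategy, clusters, workload, results, all_peers,
          alpha=1.0, theta=lambda n: n):
    """Individual peer cost = membership cost + recall cost.

    strategy : set of cluster names the peer joins (s_i)
    clusters : dict cluster name -> set of member peers
    workload : Counter query -> number of appearances in Q(p_i)
    results  : dict query -> dict peer -> result count (hypothetical layout)
    theta    : membership cost function; linear here (an assumption)
    """
    membership = alpha * sum(theta(len(clusters[c])) for c in strategy) / len(all_peers)
    inside = set().union(*(clusters[c] for c in strategy))  # P(s_i)
    total_queries = sum(workload.values())
    recall_cost = sum(
        (count / total_queries)
        * sum(recall(results, q, p) for p in all_peers - inside)
        for q, count in workload.items()
    )
    return membership + recall_cost


# Toy instance: p1 shares cluster c1 with p2, while p3 sits alone in c2.
clusters = {"c1": {"p1", "p2"}, "c2": {"p3"}}
peers = {"p1", "p2", "p3"}
workload_p1 = Counter({"a": 3, "b": 1})
results = {"a": {"p2": 2, "p3": 2}, "b": {"p1": 1}}
cost_c1 = pcost({"c1"}, clusters, workload_p1, results, peers)
cost_both = pcost({"c1", "c2"}, clusters, workload_p1, results, peers)
```

In this toy instance, joining both clusters removes the recall loss but raises the membership cost, which is the trade-off the two terms are meant to capture.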
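Individual Peer Contribution

$$pcontr(p_i, S) = \sum_{p_j \in P(s_i),\, j \neq i} \; \sum_{q \in Q(p_j)} \frac{num(q, Q(p_j))}{num(Q(p_j))} \, r(q, p_i) \; - \; \alpha \sum_{c_k \in s_i} \frac{|c_k| \, \theta(|c_k|)}{|P|^2}$$

- first term: the contribution to the queries of the other peers in the cluster
- second term: the cost the other peers in a cluster pay when pi joins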
Hybrid Cost Function
$$hpcost(p_i, S) = d \cdot pcost(p_i, S) - (1 - d) \cdot pcontr(p_i, S)$$
$d \in [0,1]$: degree of selfishness
Global Cost
Social Cost: the sum of the peers' individual costs
$$SCost(S) = \sum_{p_i \in P} pcost(p_i, S)$$
Workload-based Cost: the average cost of attaining results for the global query workload
$$WCost(S) = \alpha \sum_{c_k \in C} \frac{|c_k| \, \theta(|c_k|)}{|P|} + \sum_{q \in Q} \frac{num(q, Q)}{num(Q)} \sum_{p_i:\, q \in Q(p_i)} \frac{num(q, Q(p_i))}{num(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)$$
where $num(q, Q)/num(Q)$ is the frequency of q in the global query workload and $num(q, Q(p_i))/num(Q(p_i))$ is the frequency of q in the local query workload of pi
While social cost treats all peers as equals, the workload-based cost considers more demanding peers as more important
Social vs Workload Cost
In general:
$$SCost(S) \neq WCost(S)$$
If each peer pi ∈ P gets an equal portion of the global query workload ($num(Q(p_i)) = num(Q)/|P|$), then:
$$WCost(S) = m \cdot SCost(S)$$
The social and workload costs are proportional
Global Contribution
Social Contribution: the sum of the peers' individual contributions
$$SContr(S) = \sum_{p_i \in P} pcontr(p_i, S)$$
Workload-based Contribution:
$$WContr(S) = \sum_{q \in Q} \frac{num(q, Q)}{num(Q)} \sum_{p_i:\, q \in Q(p_i)} \frac{num(q, Q(p_i))}{num(Q(p_i))} \sum_{p_j \in P(s_i)} r(q, p_j) \; - \; \alpha \sum_{c_k \in C} \frac{|c_k| \, \theta(|c_k|)}{|P|^2}$$
Social contribution favors queries popular locally to specific users, while workload contribution favors overall popular queries
If the local query distribution at each peer follows the global query distribution, then the two measures are proportional
Cost vs Contribution
If we ignore the membership cost, then the workload cost and workload contribution are complementary:
$$WCost(S) = 1 - WContr(S)$$
For a uniform query workload among the peers, the social cost and social contribution are also complementary:
$$SCost(S) = 1 - SContr(S)$$
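Stability

Nash equilibrium: No peer has an incentive to change its strategy
$$pcost(p_i, S) \le pcost(p_i, S'), \quad \forall S' = (S \setminus \{s_i\}) \cup \{s_i'\}$$

Payoff Table (entries: p1, p2)

| | p2 joins c1 | p2 joins c2 |
|---|---|---|
| p1 joins c1 | α, α | α/2+1, α/2 |
| p1 joins c2 | α/2+1, α/2 | α, α |

[Figure: two configurations of peers p1, p2 over clusters c1, c2]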
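To make the payoff table concrete, here is a small sketch that enumerates the pure Nash equilibria of the two-peer game. It assumes the table entries are costs (lower is better), listed as (p1, p2); the function names are hypothetical.

```python
from itertools import product


def cost_table(alpha):
    """Cost pairs (p1, p2) from the payoff table, read as costs (assumption)."""
    together = (alpha, alpha)
    apart = (alpha / 2 + 1, alpha / 2)
    return {
        ("c1", "c1"): together,
        ("c1", "c2"): apart,
        ("c2", "c1"): apart,
        ("c2", "c2"): together,
    }


def pure_nash_equilibria(costs, choices=("c1", "c2")):
    """Return states in which no single peer can lower its own cost."""
    stable = []
    for state in product(choices, repeat=2):
        improvable = False
        for player in range(2):
            for alternative in choices:
                deviated = list(state)
                deviated[player] = alternative
                if costs[tuple(deviated)][player] < costs[state][player]:
                    improvable = True
        if not improvable:
            stable.append(state)
    return stable


low_alpha = pure_nash_equilibria(cost_table(1))
high_alpha = pure_nash_equilibria(cost_table(4))
```

Under this reading, a small α leaves no stable state (p1 keeps joining p2 while p2 keeps leaving), whereas a larger α makes the two separated configurations stable.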
Stability Properties
Lemma:
In any stable state, there are no clusters ci, cj such that ci ⊆ cj, i ≠ j
Corollary:
When a peer forms a cluster by itself, it cannot belong to any other cluster
Optimality
Stability does not always ensure a satisfactory social cost
Price of Anarchy: the ratio of the social cost of the worst Nash equilibrium to the social optimum
The social optimum is obtained by minimizing the social cost measure over all possible configurations
To bound the social optimum:
- consider each peer separately
- select the configuration that yields the minimum individual cost
- aggregate over all peers
Case Studies
Case I: No underlying clustering
$$num(Q(p_i)) = num(Q(p_j)) = num(Q)/|P|, \quad \forall p_i, p_j \in P$$
$$r(q, p_i) = r(q, p_j) = 1/|P|, \quad \forall q \in Q,\ \forall p_i, p_j \in P$$
Case II: Symmetric clusters
For the peers in each cluster c:
$$num(Q(p_i)) = num(Q(p_j))$$
$$r(q, p_i) = r(q, p_j) = 1/|c|$$
Configurations:
- A: all peers in a single cluster
- B: each peer forms a cluster by itself
- C: k clusters
Linear θ: θ(n) = φn
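As a worked example, the Case I assumptions can be plugged into the individual peer cost (membership cost $\alpha\,\theta(|c|)/|P|$ per joined cluster plus the recall lost to peers outside the joined clusters) for configurations A and B with linear θ. This is a sketch derived from the definitions on the earlier slides, not a result quoted from the presentation.

```latex
% Case I: every peer holds a 1/|P| share of each query's results and the
% query weights of a peer sum to 1, so the recall cost of p_i equals
% (number of peers outside P(s_i)) / |P|.
\begin{align*}
\text{A (single cluster):}\quad
  pcost(p_i, S_A) &= \alpha \frac{\varphi |P|}{|P|} + 0 = \alpha\varphi,
  & SCost(S_A) &= |P|\,\alpha\varphi \\
\text{B (singletons):}\quad
  pcost(p_i, S_B) &= \alpha \frac{\varphi}{|P|} + \frac{|P| - 1}{|P|},
  & SCost(S_B) &= \alpha\varphi + |P| - 1 \\
SCost(S_A) < SCost(S_B)
  &\iff \alpha\varphi\,(|P| - 1) < |P| - 1
  \iff \alpha\varphi < 1 \quad (|P| > 1)
\end{align*}
```

Under these assumptions the single cluster has the lower social cost exactly when the per-peer membership cost αφ is below 1, that is, below the total recall a peer could otherwise lose.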
[Table: results for Cases (I.A), (I.B), (I.C) and (II.A), (II.B), (II.C), with rows "Cost-based" and "Linear θ"; the formula entries did not survive extraction]
Playing the Game
Consider an instance of the system (Scur) in which each peer pi belongs to a single cluster (si = {cl}), cl ∈ Ccur (the set of non-empty clusters)
When it is its turn to play, pi considers all possible configurations Sj that differ from Scur only in si
pi has two options:
- Move to a different existing cluster cv
- If |cl| ≠ 1, form a cluster of its own
We discern different policies according to peer behavior: selfish, altruistic and hybrid
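The options open to a peer on its turn can be sketched as follows. This is a minimal illustration assuming single-cluster membership; `candidate_moves`, `apply_move`, the `NEW_CLUSTER` sentinel and the `own_<peer>` naming are hypothetical helpers introduced for the example.

```python
NEW_CLUSTER = object()  # sentinel meaning "form a cluster of its own"


def candidate_moves(peer, clusters):
    """Return the peer's current cluster and the moves it may consider.

    clusters: dict cluster name -> set of member peers (non-empty clusters).
    """
    current = next(name for name, members in clusters.items() if peer in members)
    moves = [name for name in sorted(clusters)
             if name != current and clusters[name]]
    if len(clusters[current]) != 1:  # a lone peer gains nothing by re-forming
        moves.append(NEW_CLUSTER)
    return current, moves


def apply_move(peer, clusters, target):
    """Return the configuration that differs from `clusters` only in the peer's strategy."""
    new = {name: set(members) for name, members in clusters.items()}
    for members in new.values():
        members.discard(peer)
    key = f"own_{peer}" if target is NEW_CLUSTER else target
    new.setdefault(key, set()).add(peer)
    # Keep only non-empty clusters, mirroring the set C_cur.
    return {name: members for name, members in new.items() if members}


clusters = {"c1": {"p1", "p2"}, "c2": {"p3"}}
current, moves = candidate_moves("p1", clusters)
```

Each candidate configuration can then be scored with the cost or contribution function of the policy in use.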
Selfish Policy
A peer pi chooses to move to the cluster that maximizes its recall
$$S_{new} = \arg\min_{S_j} \, pcost(p_i, S_j)$$
[Figure legend: content, query workload]
Altruistic Policy
Peers move to the cluster whose other members they can offer the most recall
$$S_{new} = \arg\max_{S_j} \, pcontr(p_i, S_j)$$
Clustered Overlay Formation
Given an initial configuration:
- Each peer forming a cluster of its own
- All peers in a single cluster
Our game model addresses cluster formation
Clustered Overlay Evolution
Given a clustered overlay configuration, dynamic peers change their content/query workload
Our game model addresses cluster evolution/maintenance
Cluster Reformulation Protocol
Each peer:
- evaluates its cost or contribution for all clusters in the system
- selects the best strategy
  - computes the gain for the best strategy (cluster): $gain_{p_i} = pcost(p_i, S_{cur}) - pcost(p_i, S_{new})$
  - if gain > 0, pi moves to the best cluster
The protocol is uncoordinated and based on local decisions made by each peer independently
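One turn of the protocol for a selfish peer might look like the sketch below. It assumes the peer has already evaluated its cost in every candidate cluster; `play_turn` and its arguments are hypothetical names.

```python
def play_turn(current_cost, candidate_costs, threshold=0.0):
    """Pick the lowest-cost cluster and move only if the gain exceeds `threshold`.

    current_cost    : the peer's cost in the current configuration
    candidate_costs : dict cluster name -> the peer's cost if it moved there
    Returns (chosen cluster or None, gain of the best candidate).
    """
    if not candidate_costs:
        return None, 0.0
    best = min(candidate_costs, key=candidate_costs.get)
    gain = current_cost - candidate_costs[best]
    return (best, gain) if gain > threshold else (None, gain)


move = play_turn(0.75, {"c2": 0.5, "c3": 1.0})
stay = play_turn(0.5, {"c2": 0.75})
```

An altruistic peer would use the same loop with contribution in place of cost and the sign of the comparison reversed.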
When does a peer determine it is its turn to play?
Event-based: after it becomes aware of a relevant event
- evaluation of queries in their local workload for selfish peers
- providing results to a query for altruistic peers
Trigger-based: when it registers a change in its gain
- continuously monitoring gain
Batch-based: after a number (batch) of relevant events
Controlling Parameters
Relocations induce communication and processing overhead
The gain in cost or contribution does not always justify the cost a relocation entails
Mechanisms are required to control the excessive overhead:
- Stopping Condition: A peer moves only if its gain is greater than a predefined threshold ε
- Playing Probability: A peer does not play at each of its turns, but only with a probability Pr
- Quota: Each peer is assigned n moves for a time period Tq
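The three mechanisms above can be combined in a single gate, as in this minimal sketch. The class and method names are hypothetical, and measuring the quota period in events is an assumption taken from the input parameter table.

```python
import random


class MoveController:
    """Gate relocations with a stopping condition, a playing probability and a quota."""

    def __init__(self, epsilon=1e-4, play_probability=0.5,
                 quota=None, quota_period=20, rng=None):
        self.epsilon = epsilon                    # stopping-condition threshold
        self.play_probability = play_probability  # Pr
        self.quota = quota                        # n moves per period; None = unlimited
        self.quota_period = quota_period          # Tq, counted in events (assumption)
        self.rng = rng or random.Random()
        self.events_in_period = 0
        self.moves_in_period = 0

    def on_event(self):
        """Advance the quota clock; reset the move counter when a period ends."""
        self.events_in_period += 1
        if self.events_in_period >= self.quota_period:
            self.events_in_period = 0
            self.moves_in_period = 0

    def should_move(self, gain):
        """Return True if the peer plays this turn and the move is worth making."""
        if self.quota is not None and self.moves_in_period >= self.quota:
            return False  # quota exhausted for this period
        if self.rng.random() >= self.play_probability:
            return False  # the peer skips this turn
        if gain <= self.epsilon:
            return False  # gain too small to justify the relocation
        self.moves_in_period += 1
        return True


ctrl = MoveController(epsilon=0.1, play_probability=1.0, quota=1, quota_period=2)
decisions = [ctrl.should_move(0.05), ctrl.should_move(0.5), ctrl.should_move(0.5)]
```

Checking the quota and the playing probability before the gain means a peer that is not allowed to play never needs to evaluate candidate clusters at all.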
Uncoordinated Vs Coordinated
The uncoordinated protocol performs as well as the coordinated one without the additional coordination overhead
The trigger-based protocol is the most expensive variation and the batch-based the most efficient one
The trigger-based variation adjusts faster to changes
[Figure panels: Uncoordinated, Coordinated]
Coordinated protocol:
- cluster representatives gather and exchange the relocation requests from their clusters
- requests are sorted according to gain and granted in that order
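The granting step of the coordinated protocol could be sketched as below. It assumes requests are granted in descending order of gain and that only a percentage x of them is granted, as the "% of granted requests (x)" input parameter suggests; the function name and the request tuple layout are hypothetical.

```python
import math


def grant_requests(requests, granted_percent=50):
    """Sort relocation requests by gain (highest first) and grant the top share.

    requests: list of (peer, target_cluster, gain) tuples.
    """
    ordered = sorted(requests, key=lambda request: request[2], reverse=True)
    n_granted = math.floor(len(ordered) * granted_percent / 100)
    return ordered[:n_granted]


requests = [
    ("p1", "c2", 0.1),
    ("p2", "c1", 0.4),
    ("p3", "c2", 0.3),
    ("p4", "c3", 0.2),
]
granted = grant_requests(requests, granted_percent=50)
```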
Controlling Parameters
The stopping condition is the main factor determining the value of the achieved social cost
The playing probability and quota reduce the number of moves but increase the number of turns
Cluster Formation
Setting:
- Starting from different initial configurations (all peers in a single cluster, each peer in a cluster of its own, k random clusters, etc.)
- Using symmetric, asymmetric, and uniform peers
- With selfish, altruistic and mixed peer populations
Results:
- The reformulation protocol identifies the underlying clusters if they exist
- It does not require a priori definition of the target number of clusters
- Its social cost in some cases (i.e. for symmetric and uniform peers) reaches a value close to the social optimum
Cluster Adaptation
For different update scenarios our protocol copes with changes efficiently
Selfish peers react more efficiently to workload changes, while altruistic peers react more efficiently to content changes
Reclustering reduces social cost by 10%, but requires 250 turns, whereas the reformulation protocol requires only 10
[Figure panels: Workload Updates, Content Updates]
Clustering vs Caching
Consider a cache scheme:
- Peers that provided results to previous queries are cached
- Future queries are forwarded to them first
- Peers that receive a query may forward it to the peers in their own cache (transitive)
- Peers in the cache are sorted based on recall
- Cache is updated after each query
Symmetric peers favor clustering
For asymmetric peers, using an efficient cluster topology achieves results similar to caching
Clustering adapts to changes faster (replaces all links at once)
Summary
We modeled cluster formation as a strategic game
We defined utility functions based on query recall and cluster membership cost
We considered both selfish and altruistic peers
We derived theoretical results regarding the stability and optimality of our game
Summary (cont'd)
We proposed an uncoordinated protocol for playing our game
We presented two variations: an event-based and a trigger-based protocol
We combined the protocol with a set of parameters for controlling the overhead
We presented an experimental evaluation of the protocol that showed that:
1. there is no need for coordination
2. the protocol discovers the underlying clusters (no need for predefining the number)
3. it efficiently copes with dynamic updates
Future Work
Study further (theoretically and experimentally) the problem of cluster formation and evolution with multiple cluster memberships
Apply/adjust the cluster formation game for friends discovery in social networks
Use different criteria (besides recall) such as diversity, for determining the quality in clustered overlays
Compare our clustering goal (maximizing recall) to traditional goals in clustering applications (maximizing intra-cluster similarity and minimizing inter-cluster similarity)
Thank you
Input Parameters

| Parameter | Range | Default Value |
|---|---|---|
| Topology and Strategy | | |
| number of peers (\|P\|) | - | 10000 |
| parameter α | 1 - 100 | 10 |
| membership cost function (θ) | logarithmic, linear | logarithmic |
| strategy | - | self-alt-hybrid-mix |
| degree of selfishness (d) | 0.25 - 0.75 | - |
| Data - Query Distribution | | |
| number of semantic categories | - | 10 |
| interest locality degree (m) | - | 0 - 1 |
| Controlling Parameters | | |
| stopping condition (ε) | 0 - 10^-8 | 10^-4, 10^-3, 10^-6 |
| playing probability (Pr) | 0 - 1 | 0.5 |
| movement quota (n) | 1 - 15 | ∞ |
| quota period in events (Tq) | - | 20 (5*batch size) |
| % of granted requests (x) | 10 - 100 | 50 |