A Recall-Based Cluster Formation Game in Peer-to-Peer Systems


Georgia Koloniari and Evaggelia Pitoura

Department of Computer Science, University of Ioannina, Greece

http://dmod.cs.uoi.gr

Peer-to-Peer Systems

A class of systems in which autonomous nodes of equal roles (peers) share their resources and exchange their data

Each peer connects with a subset of other peers, thus forming a logical overlay network on top of the physical network (e.g., the Internet)

Queries are routed through the overlay network to discover the peers that store relevant results

VLDB 2009


Clustered Overlays

Peers form groups based on their content and/or interests (query workload), so that peers with similar content or interests are nearby, in the same group (cluster), in the overlay network


Motivation for Clustering

Peers find and exchange data relevant to their interests within their cluster with less effort

Once the relevant clusters for a query are identified, the peers in them maintain relevant content that can be exploited to evaluate and refine a query

Previous work on content-based clustering:

- SONS [Stanf. Univ. '02]: clusters formed based on predefined classification hierarchies
- SETS [SIGIR '03]: peers partitioned in clusters corresponding to fixed, globally known topic segments
- Cohen et al [INFOCOM '03]: based on a learning approach that generalizes and learns the semantic categories of the data
- Triantafillou et al [CIDR '03]: fixed set of clusters formed based on predefined semantic categories; focus on fair load distribution and reducing response times
- Garbacki et al [ICDCS '07]: superpeer-based architecture in which peers with common interests are organized based on their caches
- Doulkeridis et al [JSAC '07]: clustering applied first on the documents of each peer, and then on the feature vectors describing the derived clusters

Our Contributions

We provide a novel model for cluster formation using a game theoretic approach

We address both cluster formation and cluster evolution / maintenance and cope with peer dynamics

We exploit both content and queries and aim at maximizing the overall query recall of the system

We propose an uncoordinated protocol for playing the game, based on local peer decisions, with performance comparable to that of a corresponding coordinated protocol

Game Theory for P2P

Game theoretic approaches have been applied to the overlay network creation problem and link selection in p2p

- Fabricant et al [PODC’03]: Internet-like network modeled as a game; peers establish links to reduce the shortest distance to any other peer and pay for those links
- Moscibroda et al [PODC’06]: proves that allowing peers complete freedom performs worse than collaboration, and that even static p2p systems may never reach convergence
- Laoutaris et al [INFOCOM’07]: strict bounds enforced on out-degree; directed links; peers express preferences for their neighbors

Our contribution:

We consider dynamic clustered overlays, focus on queries and aim at increasing their recall


Overview

Cluster formation as a strategic game
- Utility functions for selfish and altruistic behavior
- Global system performance criteria
- Stability and optimality
- Case studies

Playing the game
- Relocation policies
- Cluster formation and cluster evolution/maintenance

Uncoordinated cluster reformulation protocol
- Protocol variations: trigger and event-based
- Auxiliary mechanisms for controlling the overhead

Experimental Evaluation

Conclusions and Future Work

The Game

Cluster formation modeled as a strategic game

Each peer pi is modeled as a player

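A player selects a strategy si, which consists of the set of clusters the peer will join (si ⊆ C)

The goal of the game is for each peer to select the strategy that optimizes its utility function: selfish peers minimize their individual cost, altruistic peers maximize their individual contribution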

The utility function is defined based on query recall


Recall-based Clustering

Selfish peers: join the cluster that will provide the most answers to the peer’s local query workload (maximizing the peer's own recall)

Altruistic peers: join the cluster to which they can offer the most (maximizing the recall of the cluster members)

Utility function:
- Individual peer cost for selfish peers
- Individual peer contribution for altruistic peers


Individual Peer Cost

$$\mathrm{pcost}(p_i, S) = \alpha \sum_{c_k \in s_i} \frac{\theta(|c_k|)}{|P|} + \sum_{q \in Q(p_i)} \frac{\mathrm{num}(q, Q(p_i))}{\mathrm{num}(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)$$

The first term is the membership cost; the second term is the recall cost.

Recall cost: the cost of evaluating its queries against the other clusters. It measures the loss in recall for evaluating queries at clusters not in pi’s strategy.

- $r(q, p') = \dfrac{\mathrm{result}(q, p')}{\sum_{p_k \in P} \mathrm{result}(q, p_k)}$: recall of q if evaluated solely on p’
- $\mathrm{num}(q, Q(p_i))$: number of appearances of q in the local workload Q(pi) of pi
- $\mathrm{num}(Q(p_i))$: number of queries in the local workload Q(pi) of pi
- $P(s_i)$: peers that are members of the clusters in pi’s strategy
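As a rough illustration of how the individual peer cost could be evaluated, the sketch below computes the membership term and the recall-loss term for one peer. The data layout (`results`, `workload`, `clusters`, `strategy`), the function names and the default linear `theta` are assumptions made for this example and are not taken from the slides.

```python
def recall(results, q, p):
    """r(q, p): fraction of all results for query q that peer p holds."""
    total = sum(results[q].values())
    return results[q].get(p, 0) / total if total else 0.0


def pcost(p, strategy, clusters, peers, results, workload,
          alpha=1.0, theta=lambda n: n):
    """Individual peer cost = membership cost + recall cost (hypothetical layout)."""
    # Membership cost: alpha * sum over joined clusters of theta(|c_k|) / |P|.
    membership = alpha * sum(theta(len(clusters[c])) for c in strategy[p]) / len(peers)
    # P(s_i): every peer that belongs to a cluster in p's strategy.
    inside = set().union(*(clusters[c] for c in strategy[p]))
    n_queries = sum(workload[p].values())  # num(Q(p_i))
    loss = 0.0
    for q, count in workload[p].items():
        # Recall held by peers outside P(s_i) is lost to p.
        outside = sum(recall(results, q, pj) for pj in peers if pj not in inside)
        loss += count / n_queries * outside
    return membership + loss


peers = ["a", "b", "c"]
clusters = {"c1": {"a", "b"}, "c2": {"c"}}
strategy = {"a": {"c1"}}
results = {"q1": {"a": 1, "b": 1, "c": 2}}  # result counts per peer for q1
workload = {"a": {"q1": 2}}                 # q1 appears twice in Q(a)
cost_a = pcost("a", strategy, clusters, peers, results, workload)
```

In this toy instance peer `a` pays a membership cost of 2/3 for its two-peer cluster and loses the half of the results for `q1` that are held by peer `c`.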


Individual Peer Cost (cont’d)

$$\mathrm{pcost}(p_i, S) = \underbrace{\alpha \sum_{c_k \in s_i} \frac{\theta(|c_k|)}{|P|}}_{\text{membership cost}} + \underbrace{\sum_{q \in Q(p_i)} \frac{\mathrm{num}(q, Q(p_i))}{\mathrm{num}(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)}_{\text{recall cost}}$$

Membership cost: the cost of joining a cluster, measured by:

- a function θ which is descriptive of the communication cost entailed in belonging to a cluster and depends on:
  o the size of the cluster |ck|
  o the topology within the cluster
- a parameter α that quantifies the cost of each of the connections of a peer to the other cluster members

This cost prevents the system from forming just one cluster that would otherwise minimize the cost function for all peers


Individual Peer Contribution

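$$\mathrm{pcontr}(p_i, S) = \sum_{p_j \in P(s_i)} \sum_{q \in Q(p_j)} \frac{\mathrm{num}(q, Q(p_j))}{\mathrm{num}(Q(p_j))}\, r(q, p_i) - \alpha \sum_{c_k \in s_i} \frac{|c_k|\,\theta(|c_k|)}{|P|^2}$$

- First term: the contribution to the queries of the other peers in the cluster
- Second term: the cost the other peers in a cluster pay when pi joins

Hybrid Cost Function

$$\mathrm{hpcost}(p_i, S) = d \cdot \mathrm{pcost}(p_i, S) - (1 - d) \cdot \mathrm{pcontr}(p_i, S)$$

d ∈ [0,1]: degree of selfishness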
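The following sketch shows one way the individual contribution and the hybrid cost could be computed. It assumes that the membership term is subtracted from the recall term and that the hybrid cost is d·pcost − (1−d)·pcontr; the operators are not legible in the extracted slide, so both signs are assumptions, as are the data layout and all function names.

```python
def recall(results, q, p):
    """r(q, p): fraction of all results for query q that peer p holds."""
    total = sum(results[q].values())
    return results[q].get(p, 0) / total if total else 0.0


def pcontr(p, strategy, clusters, peers, results, workload,
           alpha=1.0, theta=lambda n: n):
    """Contribution of p to its cluster members minus the cost they pay (assumed sign)."""
    inside = set().union(*(clusters[c] for c in strategy[p]))  # P(s_i)
    offered = 0.0
    for pj in inside:
        local = workload.get(pj, {})
        n_queries = sum(local.values())
        for q, count in local.items():
            # Recall that p can provide for pj's query q, weighted by q's local frequency.
            offered += count / n_queries * recall(results, q, p)
    burden = alpha * sum(len(clusters[c]) * theta(len(clusters[c]))
                         for c in strategy[p]) / len(peers) ** 2
    return offered - burden


def hpcost(d, cost, contribution):
    """Hybrid cost for degree of selfishness d in [0, 1] (assumed sign convention)."""
    return d * cost - (1 - d) * contribution


peers = ["a", "b", "c"]
clusters = {"c1": {"a", "b"}, "c2": {"c"}}
strategy = {"a": {"c1"}}
results = {"q1": {"a": 1, "b": 1, "c": 2}}
workload = {"a": {"q1": 2}, "b": {"q1": 1}}
contr_a = pcontr("a", strategy, clusters, peers, results, workload)
```

With d = 1 the hybrid cost reduces to the selfish cost, and with d = 0 it reduces to the negated contribution, so minimizing it covers both pure behaviors.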


Global Cost

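Social Cost: the sum of the peers' individual costs

$$\mathrm{SCost}(S) = \sum_{p_i \in P} \mathrm{pcost}(p_i, S)$$

Workload-based Cost: the average cost of attaining results for the global query workload

$$\mathrm{WCost}(S) = \alpha \sum_{c_k \in C} \frac{|c_k|\,\theta(|c_k|)}{|P|} + \sum_{q \in Q} \frac{\mathrm{num}(q, Q)}{\mathrm{num}(Q)} \sum_{p_i:\, q \in Q(p_i)} \frac{\mathrm{num}(q, Q(p_i))}{\mathrm{num}(Q(p_i))} \sum_{p_j \notin P(s_i)} r(q, p_j)$$

- $\mathrm{num}(q, Q) / \mathrm{num}(Q)$: frequency of q in the global query workload
- $\mathrm{num}(q, Q(p_i)) / \mathrm{num}(Q(p_i))$: frequency of q in the local query workload of pi

While social cost treats all peers as equals, the workload-based cost considers more demanding peers as more important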
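A minimal sketch of the workload-based cost follows, under the assumption that the global workload Q is the union of the local workloads and that each query is weighted first by its global frequency and then by its local frequency at every peer that issues it. The data layout and names are hypothetical.

```python
def recall(results, q, p):
    """r(q, p): fraction of all results for query q that peer p holds."""
    total = sum(results[q].values())
    return results[q].get(p, 0) / total if total else 0.0


def wcost(strategy, clusters, peers, results, workload,
          alpha=1.0, theta=lambda n: n):
    """Workload-based cost: global membership cost plus workload-weighted recall loss."""
    membership = alpha * sum(len(m) * theta(len(m)) for m in clusters.values()) / len(peers)
    # Build the global workload Q by summing the local query counts.
    global_counts = {}
    for local in workload.values():
        for q, count in local.items():
            global_counts[q] = global_counts.get(q, 0) + count
    n_global = sum(global_counts.values())  # num(Q)
    loss = 0.0
    for q, q_count in global_counts.items():
        per_query = 0.0
        for pi, local in workload.items():
            if q not in local:
                continue
            inside = set().union(*(clusters[c] for c in strategy[pi]))
            outside = sum(recall(results, q, pj) for pj in peers if pj not in inside)
            per_query += local[q] / sum(local.values()) * outside
        loss += q_count / n_global * per_query
    return membership + loss


peers = ["a", "b", "c"]
clusters = {"c1": {"a", "b"}, "c2": {"c"}}
strategy = {"a": {"c1"}, "b": {"c1"}, "c": {"c2"}}
results = {"q1": {"a": 1, "b": 1, "c": 2}, "q2": {"c": 1}}
workload = {"a": {"q1": 2}, "b": {"q1": 1, "q2": 1}, "c": {"q2": 2}}
total_wcost = wcost(strategy, clusters, peers, results, workload)
```

The social cost, by contrast, would simply add up the individual peer costs without the global frequency weights.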


Social vs Workload Cost

In general:

$$\mathrm{SCost}(S) \neq \mathrm{WCost}(S)$$

If each peer $p_i \in P$ gets an equal portion of the global query workload ($\mathrm{num}(Q(p_i)) = \mathrm{num}(Q) / |P|$) then:

$$\mathrm{WCost}(S) = m \cdot \mathrm{SCost}(S)$$

The social and workload costs are proportional

Global Contribution

Social Contribution: the sum of the peers' individual contributions

$$\mathrm{SContr}(S) = \sum_{p_i \in P} \mathrm{pcontr}(p_i, S)$$

Workload-based Contribution:

$$\mathrm{WContr}(S) = \sum_{q \in Q} \frac{\mathrm{num}(q, Q)}{\mathrm{num}(Q)} \sum_{p_i:\, q \in Q(p_i)} \frac{\mathrm{num}(q, Q(p_i))}{\mathrm{num}(Q(p_i))} \sum_{p_j \in P(s_i)} r(q, p_j) - \alpha \sum_{c_k \in C} \frac{|c_k|\,\theta(|c_k|)}{|P|^2}$$

Social contribution favors queries popular locally to specific users, while workload contribution favors overall popular queries

If the local query distribution at each peer follows the global query distribution, then the two measures are proportional

Cost vs Contribution

If we ignore the membership cost, then the workload cost and workload contribution are complementary:

$$\mathrm{WCost}(S) = 1 - \mathrm{WContr}(S)$$

For a uniform query workload among the peers, the social cost and social contribution are also complementary:

$$\mathrm{SCost}(S) = 1 - \mathrm{SContr}(S)$$

Stability

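Nash equilibrium: No peer has an incentive to change its strategy

$$\mathrm{pcost}(p_i, S) \le \mathrm{pcost}(p_i, S'), \quad \forall S' = (S \setminus \{s_i\}) \cup \{s_i'\}$$

Payoff Table (p1, p2)

                 p2 joins c1      p2 joins c2
p1 joins c1      α, α             α/2+1, α/2
p1 joins c2      α/2+1, α/2       α, α

[Figure: two configurations of peers p1 and p2 over clusters c1 and c2]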
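The payoff table can be checked for pure Nash equilibria by brute force. The sketch below assumes that the table entries are costs to be minimized, listed as (p1, p2), and that a profile is stable when neither peer can strictly lower its own cost by switching clusters alone. The function names are made up for this example.

```python
from itertools import product


def payoff_table(alpha):
    """Costs (p1, p2) for each joint choice of clusters, as read from the slide."""
    together = (alpha, alpha)
    apart = (alpha / 2 + 1, alpha / 2)
    return {
        ("c1", "c1"): together,
        ("c1", "c2"): apart,
        ("c2", "c1"): apart,
        ("c2", "c2"): together,
    }


def pure_nash_equilibria(costs, strategies=("c1", "c2")):
    """Profiles where no single peer can strictly reduce its own cost by deviating."""
    stable = []
    for profile in product(strategies, repeat=2):
        improvable = False
        for player in (0, 1):
            for alternative in strategies:
                deviated = list(profile)
                deviated[player] = alternative
                if costs[tuple(deviated)][player] < costs[profile][player]:
                    improvable = True
        if not improvable:
            stable.append(profile)
    return stable


no_equilibrium = pure_nash_equilibria(payoff_table(1))
two_equilibria = pure_nash_equilibria(payoff_table(4))
```

Under these assumptions, α = 1 leaves no stable profile (p1 keeps following p2 while p2 keeps leaving), whereas α = 4 makes the two separated profiles stable.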


Stability Properties

Lemma:

In any stable state, there are no clusters ci, cj such that ci ⊆ cj, i ≠ j

Corollary:

When a peer forms a cluster by itself, it cannot belong to any other cluster


Optimality

Stability does not always ensure a satisfactory social cost

Price of Anarchy: the ratio of the social cost of the worst Nash equilibrium to the social optimum

The social optimum is obtained by minimizing the social cost measure over all possible configurations

To bound the social optimum:
- consider each peer separately
- select the configuration that yields the minimum individual cost
- aggregate over all peers
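The bounding argument can be illustrated on a small, made-up cost table. In the sketch below, `costs[S][p]` is a hypothetical individual cost of peer `p` in configuration `S`, and the set of equilibria passed to `price_of_anarchy` is supplied by hand; none of the numbers come from the slides.

```python
def social_cost(costs, configuration):
    """Sum of the individual costs of all peers in one configuration."""
    return sum(costs[configuration].values())


def social_optimum(costs):
    """Minimum social cost over all listed configurations."""
    return min(social_cost(costs, s) for s in costs)


def optimum_lower_bound(costs):
    """Per-peer minimum cost, aggregated over all peers (never above the optimum)."""
    peers = next(iter(costs.values())).keys()
    return sum(min(costs[s][p] for s in costs) for p in peers)


def price_of_anarchy(costs, equilibria):
    """Worst equilibrium social cost divided by the social optimum."""
    worst = max(social_cost(costs, s) for s in equilibria)
    return worst / social_optimum(costs)


costs = {
    "A": {"p1": 1.0, "p2": 3.0},
    "B": {"p1": 2.0, "p2": 1.0},
    "C": {"p1": 3.0, "p2": 3.0},
}
bound = optimum_lower_bound(costs)        # 1.0 + 1.0
optimum = social_optimum(costs)           # configuration "B"
poa = price_of_anarchy(costs, ["B", "C"])
```

The per-peer bound is cheap because it avoids searching over joint configurations, but it may be unattainable when the peers' individually best configurations differ, as they do here.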


Case Studies

Case I: No underlying clustering

$$\mathrm{num}(Q(p_i)) = \mathrm{num}(Q(p_j)) = \mathrm{num}(Q) / |P|, \quad \forall p_i, p_j \in P$$

$$r(q, p_i) = r(q, p_j) = 1 / |P|, \quad \forall q \in Q,\ p_i, p_j \in P$$

Case II: Symmetric clusters. For the peers in each cluster c:

$$\mathrm{num}(Q(p_i)) = \mathrm{num}(Q(p_j))$$

$$r(q, p_i) = r(q, p_j) = 1 / |c|$$

Configurations:
- A: all peers in a single cluster
- B: each peer in a cluster by itself
- C: k clusters

Linear θ: θ(n) = φn

[Table: one column per configuration, Case (I.A), Case (I.B), Case (I.C) and Case (II.A), Case (II.B), Case (II.C); rows "Cost-based" and "Linear θ". The cell formulas are not recoverable from the extracted text.]

Playing the Game

Consider an instance of the system (Scur) in which each peer pi belongs to a single cluster (si = {cl}), cl ∈ Ccur (the set of non-empty clusters)

When it is its turn to play, pi considers all possible configurations Sj that differ from Scur only in si

pi has two options:
- Move to a different existing cluster cv
- If |cl| ≠ 1, form a cluster on its own

We discern different policies according to peer behavior: selfish, altruistic and hybrid


Selfish Policy

A peer pi moves to the cluster that maximizes its recall, i.e., to the configuration that minimizes its individual cost

$$S_{new} = \arg\min_{S_j} \mathrm{pcost}(p_i, S_j)$$

Altruistic Policy

Peers move to the cluster whose other members gain the most recall from them

$$S_{new} = \arg\max_{S_j} \mathrm{pcontr}(p_i, S_j)$$
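Both relocation policies pick the best configuration among those that differ from the current one only in the peer's own cluster. The sketch below assumes single-cluster membership, stored as a hypothetical `assignment` dict from peer to cluster id, and takes the cost or contribution as a caller-supplied `objective` function; the toy objectives are invented for the example.

```python
def candidate_clusters(peer, assignment):
    """Other existing clusters, plus a new singleton unless the peer is already alone."""
    current = assignment[peer]
    options = sorted(set(assignment.values()) - {current})
    alone = sum(1 for c in assignment.values() if c == current) == 1
    if not alone:
        options.append(f"solo-{peer}")  # hypothetical id for a newly formed cluster
    return options


def relocate(peer, assignment, objective, selfish=True):
    """Selfish: argmin of the objective (a cost). Altruistic: argmax (a contribution)."""
    def value(cluster):
        trial = dict(assignment)
        trial[peer] = cluster
        return objective(peer, trial)

    options = candidate_clusters(peer, assignment)
    if not options:
        return None
    return min(options, key=value) if selfish else max(options, key=value)


# Toy objectives: wants[p] lists the peers holding results that p is interested in.
wants = {"a": {"b", "c"}, "b": set(), "c": set(), "d": {"a"}}


def toy_cost(peer, trial):
    return sum(1 for w in wants[peer] if trial[w] != trial[peer])


def toy_contr(peer, trial):
    return sum(1 for other in trial
               if other != peer and trial[other] == trial[peer] and peer in wants[other])


assignment = {"a": "c1", "b": "c2", "c": "c2", "d": "c3"}
selfish_choice = relocate("a", assignment, toy_cost)
altruistic_choice = relocate("a", assignment, toy_contr, selfish=False)
```

In the toy data a selfish peer `a` heads for the cluster holding the peers it wants results from, while an altruistic `a` joins the peer that wants results from it.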


Clustered Overlay Formation

Given an initial configuration:
- Each peer forming a cluster on its own
- All peers in a single cluster

Our game model addresses cluster formation

Clustered Overlay Evolution

Given a clustered overlay configuration, dynamic peers change their content or query workload

Our game model addresses cluster evolution/maintenance

Cluster Reformulation Protocol

Each peer:
- evaluates its cost or contribution for all clusters in the system
- selects the best strategy
  o computes the gain for the best strategy (cluster)
  o if gain > 0, pi moves to the best cluster

$$gain_{p_i} = \mathrm{pcost}(p_i, S_{cur}) - \mathrm{pcost}(p_i, S_{new})$$

The protocol is uncoordinated and based on local decisions made by each peer independently
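The protocol steps above can be sketched as a simple simulation for selfish peers. The round-robin turn order, the `max_rounds` cap, the `solo-` cluster ids and the toy cost function are assumptions made for this example; the slides do not prescribe them.

```python
def reformulate(assignment, cost, max_rounds=100):
    """Each peer in turn moves to its best cluster whenever the gain is positive."""
    assignment = dict(assignment)
    for _ in range(max_rounds):
        moved = False
        for peer in sorted(assignment):
            current = assignment[peer]
            options = set(assignment.values()) - {current}
            if sum(1 for c in assignment.values() if c == current) != 1:
                options.add(f"solo-{peer}")  # form a cluster on its own
            if not options:
                continue

            def trial_cost(cluster, peer=peer):
                trial = dict(assignment)
                trial[peer] = cluster
                return cost(peer, trial)

            best = min(sorted(options), key=trial_cost)
            gain = cost(peer, assignment) - trial_cost(best)
            if gain > 0:
                assignment[peer] = best
                moved = True
        if not moved:  # no peer wants to move: a stable state was reached
            break
    return assignment


# Toy cost: share of same-topic peers outside my cluster plus a size-based membership cost.
topics = {"a": "x", "b": "x", "c": "y", "d": "y"}


def toy_cost(peer, assignment):
    mates = [p for p in topics if p != peer and topics[p] == topics[peer]]
    missing = sum(1 for p in mates if assignment[p] != assignment[peer])
    size = sum(1 for c in assignment.values() if c == assignment[peer])
    return missing / len(mates) + 0.1 * size / len(assignment)


start = {"a": "k1", "b": "k2", "c": "k3", "d": "k4"}  # each peer in its own cluster
final = reformulate(start, toy_cost)
```

Starting from singleton clusters, the peers in this toy run group themselves by topic without any coordinator or a preset number of clusters.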


When does a peer determine it is its turn to play?

Event-based: after it becomes aware of a relevant event

- evaluation of queries in their local workload for selfish peers

- providing results to a query for altruistic peers

Trigger-based: when it registers a change in its gain

- continuously monitoring gain

Batch-based: after a number (batch) of relevant events


Controlling Parameters

Relocations induce communication and processing overhead

The gain with respect to the cost or contribution does not always justify the entailed overhead

Mechanisms are required to control excessive overhead:
- Stopping Condition: A peer moves only if its gain is greater than a predefined threshold ε
- Playing Probability: A peer does not play at each of its turns, but with a probability Pr
- Quota: Each peer is assigned n moves for a time period Tq
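One possible way to combine the three mechanisms is a small per-peer gate, sketched below. The class name, its methods and the way a quota period is reset are assumptions for illustration; the slides only name the parameters ε, Pr, n and Tq.

```python
import random


class MoveController:
    """Per-peer gate combining stopping condition, playing probability and quota."""

    def __init__(self, epsilon=1e-4, play_probability=0.5, quota=None, seed=None):
        self.epsilon = epsilon                    # stopping condition threshold
        self.play_probability = play_probability  # Pr
        self.quota = quota                        # n moves per period; None means unlimited
        self.moves_used = 0
        self.rng = random.Random(seed)

    def plays_this_turn(self):
        """The peer takes its turn only with probability Pr."""
        return self.rng.random() < self.play_probability

    def allows(self, gain):
        """A move is allowed if the gain exceeds epsilon and quota remains."""
        if gain <= self.epsilon:
            return False
        if self.quota is not None and self.moves_used >= self.quota:
            return False
        self.moves_used += 1
        return True

    def start_period(self):
        """Call at the start of every quota period Tq to restore the quota."""
        self.moves_used = 0


controller = MoveController(epsilon=0.1, play_probability=1.0, quota=1)
decisions = [controller.allows(0.05), controller.allows(0.5), controller.allows(0.5)]
controller.start_period()
after_reset = controller.allows(0.5)
```

Here the first request is rejected by the stopping condition, the second is granted, and the third is rejected by the quota until the next period starts.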


Uncoordinated Vs Coordinated

The uncoordinated protocol performs as well as the coordinated one without the additional coordination overhead

The trigger-based protocol is the most expensive variation and the batch-based the most efficient one

The trigger-based variation adjusts faster to changes

[Figure panels: Uncoordinated, Coordinated]

Coordinated protocol:
- cluster representatives gather and exchange the relocation requests from their clusters
- requests are sorted according to gain and granted in that order
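The granting step of the coordinated protocol can be sketched as a sort followed by a cut-off. The tuple layout of a request and the `percent_granted` cut-off are assumptions for this example; the cut-off is loosely modeled on the "% of granted requests (x)" entry of the input parameters, whose exact use is not described on the slide.

```python
def grant_requests(requests, percent_granted=50):
    """Sort relocation requests by gain (highest first) and grant the top share.

    Each request is a (peer, target_cluster, gain) tuple gathered by the
    cluster representatives.
    """
    ranked = sorted(requests, key=lambda request: request[2], reverse=True)
    # Round the number of granted requests up using integer arithmetic.
    n_granted = (len(ranked) * percent_granted + 99) // 100
    return ranked[:n_granted]


requests = [
    ("a", "c2", 0.2),
    ("b", "c1", 0.9),
    ("c", "c1", 0.5),
    ("d", "c2", 0.1),
]
granted = grant_requests(requests, percent_granted=50)
```

Unlike the uncoordinated protocol, this step needs all requests to be collected in one place before any peer moves, which is the coordination overhead the comparison refers to.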


Controlling Parameters

The stopping condition is the main factor determining the value of the achieved social cost

The playing probability and quota reduce the number of moves but increase the number of turns


Cluster Formation

Setting:
- Starting from different initial configurations (all peers in a single cluster, each peer a cluster on its own, k random clusters, etc.)
- Using symmetric, asymmetric, and uniform peers
- With selfish, altruistic and mixed peer populations

Results:
- The reformulation protocol identifies the underlying clusters if they exist
- It does not require a priori definition of the target number of clusters
- Its social cost in some cases (i.e. for symmetric and uniform peers) reaches a value close to the social optimum


Cluster Adaptation

For different update scenarios our protocol copes with changes efficiently

Selfish peers react more efficiently to workload changes, while altruistic peers react more efficiently to content changes

Reclustering reduces the social cost by 10% but requires 250 turns, whereas the reformulation protocol requires only 10

[Figure panels: Workload Updates, Content Updates]


Clustering vs Caching

Consider a cache scheme:

Peers that provided results to previous queries are cached

Future queries are forwarded to them first

Peers that receive a query may forward it to the peers in their own cache (transitive)

Peers in the cache are sorted based on recall

Cache is updated after each query
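The cache scheme used as a baseline can be sketched as a small per-peer structure. The class below is an illustrative assumption: it keeps the most recently observed recall per cached peer, bounds the cache with a hypothetical `capacity`, and leaves out the transitive forwarding step.

```python
class ResultCache:
    """Per-peer cache of peers that answered earlier queries, ranked by recall."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.recall = {}  # cached peer -> recall observed on its latest answer

    def update(self, answers):
        """Called after each query with {peer: recall it provided for that query}."""
        for peer, value in answers.items():
            if value > 0:  # only peers that actually provided results are cached
                self.recall[peer] = value
        ranked = sorted(self.recall, key=lambda p: (-self.recall[p], p))
        self.recall = {p: self.recall[p] for p in ranked[:self.capacity]}

    def forwarding_order(self):
        """Peers to try first for the next query, highest recall first."""
        return sorted(self.recall, key=lambda p: (-self.recall[p], p))


cache = ResultCache(capacity=2)
cache.update({"a": 0.2, "b": 0.5, "c": 0.0})
first_order = cache.forwarding_order()
cache.update({"d": 0.9})
second_order = cache.forwarding_order()
```

Each update replaces at most a few cache entries, which is consistent with the observation that clustering, which replaces all links at once, adapts to changes faster.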

Symmetric peers favor clustering

For asymmetric peers, using an efficient cluster topology achieves results similar to caching

Clustering adapts to changes faster (replaces all links at once)


Summary

We modeled cluster formation as a strategic game

We defined utility functions based on query recall and cluster membership cost

We considered both selfish and altruistic peers

We derived theoretical results regarding the stability and optimality of our game


Summary (cont’d)

We proposed an uncoordinated protocol for playing our game

We presented two variations: an event and a trigger-based protocol

We combined the protocol with a set of parameters for controlling the overhead

We presented an experimental evaluation of the protocol that showed that:

1. there is no need for coordination
2. the protocol discovers the underlying clusters (no need for predefining the number)
3. it efficiently copes with dynamic updates

Future Work

Study further (theoretically and experimentally) the problem of cluster formation and evolution with multiple cluster memberships

Apply/adjust the cluster formation game for friend discovery in social networks

Use different criteria (besides recall) such as diversity, for determining the quality in clustered overlays

Compare our clustering goal (maximizing recall) to traditional goals in clustering applications (maximizing intra-cluster similarity and minimizing inter-cluster similarity)


Thank you


Input Parameters


Parameter                        Range                  Default Value

Topology and Strategy
number of peers (|P|)            -                      10000
parameter α                      1 - 100                10
membership cost function (θ)     logarithmic, linear    logarithmic
strategy                         -                      self-alt-hybrid-mix
degree of selfishness (d)        0.25 - 0.75            -

Data - Query Distribution
number of semantic categories    -                      10
interest locality degree (m)     -                      0 - 1

Controlling Parameters
stopping condition (ε)           0 - 10^-8              10^-4, 10^-3, 10^-6
playing probability (Pr)         0 - 1                  0.5
movement quota (n)               1 - 15                 ∞
quota period in events (Tq)      -                      20 (5*batch size)
% of granted requests (x)        10 - 100               50
