Improving Search in P2P Networks

By Shadi Lahham

Improving P2P Search 2

Purpose of This Lecture

• General understanding of P2P systems

• Appreciating the need for efficient search

• Applying different search techniques to different scenarios

Table Of Contents

• P2P Basics

– What Is P2P

– Advantages of P2P

– Types of P2P Systems

– Shortcomings

• Search Methods

– The Search Problem

– Current Methods

– Suggested Methods

• Experimental Setup

– Metrics

– Data Collection

– Calculating Costs

• Analysis of Results

• Conclusions

Introduction

P2P Basics

What is P2P

• Distributed system

• Peers (nodes) are servers and clients simultaneously

• Peers have equal roles

• Resources shared across peers

• No central server needed

• Examples of P2P systems

P2P Overview

[Figure: P2P overview with a File/Key table: file1 → f1, file2 → f2, file3 → f3]

Advantages of P2P

• P2P vs. centralized servers

– Distributes disk space / bandwidth

– Inexpensively scalable

– Self-organized (autonomous)

– Load balancing

– Adaptive / fault tolerant

– Less susceptible to attacks

– Allows for redundancy

Types of P2P Systems

• Hybrid (Napster)

• Pure (Gnutella)

• Super Peers (KaZaA)

Hybrid (Napster)

Pure (Gnutella)

Super Peers (KaZaA)

• Make use of heterogeneity

– Powerful peers serve as super peers

– Weaker peers act as clients

• Super peers index clients' files

– Requires updates on join/leave/update

• Queries handled at super-peer level

– Saves query costs

Hybrid - Shortcomings

• High cost on centralized index

• Performance & scalability bottleneck

• Needs maintenance

• Vulnerable: the central index is a highly visible target

Pure - Shortcomings

• Inefficient search (flooding)

• Heterogeneity of peers not considered

– Bottlenecks (limited peers)

– Fragmentation

Super Peers - Shortcomings

• Super peers might become bottlenecks for their clients

– Requires redundancy

• A bad selection of super peers might cause even worse problems

Search Methods

The Search Problem

• Connected graph

• Might contain cycles

• Individual node doesn’t know structure

• Only knows its neighbors

• No idea where data can be found

The Search Problem

• Goal: find as many occurrences of the data as possible using minimal time and resources

• Solution:

– BFS?

– Bounded BFS?

– (naive approaches)

Bounded BFS Search

[Figure: bounded BFS spreading from the source, with messages labeled TTL=2, TTL=1, TTL=0]

Bounded BFS Search

• Messages get a global TTL (time to live)

• Algorithm

– Source broadcasts a message to a subset of neighbors

– Neighbors search locally. Results are sent to the source if found

– TTL = TTL – 1

– As long as TTL > 0, nodes forward the message to their neighbors

• Downside: wastes bandwidth / processing
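
The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not Gnutella's actual implementation: the adjacency-dict `graph`, the `has_match` callback, the function name `bounded_bfs`, and the five-node demo network are all assumptions made for the example, and the query is sent to every neighbor rather than to a subset. The message counter includes redundant copies that arrive at nodes which already saw the query, which is where the wasted bandwidth comes from.

```python
from collections import deque


def bounded_bfs(graph, source, ttl, has_match):
    """Flood a query up to `ttl` hops; return (matching nodes, messages sent)."""
    results, messages = [], 0
    seen = {source}                       # nodes that already handled this query
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue                      # TTL exhausted: do not forward
        for neighbor in graph[node]:
            messages += 1                 # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue                  # duplicate copy is dropped (wasted message)
            seen.add(neighbor)
            if has_match(neighbor):       # neighbor searches its local data
                results.append(neighbor)
            frontier.append((neighbor, remaining - 1))
    return results, messages


# Hypothetical demo network: A, B, C form a cycle; D and E hold the data.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}


def has_match(node):
    return node in {"D", "E"}


found, sent = bounded_bfs(graph, "A", 2, has_match)
print(found, sent)
```

With TTL = 2 the query never reaches node E, which is three hops from A, and several of the messages counted are duplicates sent along the cycle edges.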

Current Methods

• Gnutella – BFS

– High cost

– Gets complete results (for depth D)

– Relatively short time

• Freenet – DFS

– Poor response time

– Minimizes BW costs

Suggested Methods

• Iterative deepening

• Directed BFS

• Local Indices

Iterative Deepening

• Idea:

– Search at a small depth and increase it if required

– Aims to minimize the cost of BFS without detracting from its ability to satisfy queries

• Notice that given enough iterations this method returns 100% of the results of BFS

Iterative Deepening (cont…)

• Elements:

– Policies P={a,b,c,..} define deepening behavior

– BFS is run to depth a and frozen

– If the source is satisfied, it stops the process

– Otherwise it asks BFS to resume to depth b

– The process is repeated until the source is satisfied or we reach the last policy item

Iterative Deepening (cont…)

• Elements:

– We can specify how long to wait between iterations

– We need a system-wide message ID to identify individual messages
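
The freeze-and-resume behavior can be illustrated with a small single-process sketch. It rests on several assumptions: the frozen frontier is kept in a local list instead of being held at remote nodes, the waiting time between iterations is omitted, and the names `iterative_deepening`, `needed`, `has_match`, and the chain-shaped demo network are hypothetical.

```python
def iterative_deepening(graph, source, policy, needed, has_match):
    """Run BFS to each depth in `policy` in turn; stop once `needed` results exist.

    Returns (results, depth at which the search stopped).
    """
    seen = {source}
    frontier = [source]           # frozen frontier, resumed by the next iteration
    depth, results = 0, []
    for limit in policy:
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        if has_match(neighbor):
                            results.append(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(results) >= needed:
            return results, limit  # source is satisfied: stop deepening
    return results, policy[-1]     # last policy item reached


# Hypothetical chain A - B - C - D - E, with matching data at B and E.
chain = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}


def chain_match(node):
    return node in {"B", "E"}


print(iterative_deepening(chain, "A", [1, 3, 4], 1, chain_match))
print(iterative_deepening(chain, "A", [1, 3, 4], 2, chain_match))
```

A query that needs one result stops after the first iteration and never touches nodes beyond depth 1; a query that needs two results deepens through every policy item.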

Example P={1,3,4} W=1

Directed BFS

• Idea:

– Choose a subset of neighbors to query

– Neighbors will BFS as usual

– Aims to provide a balance between good response time and results

– Minimize costs of full BFS

• Notice that only a subset of the possible results is returned, so we might fail to satisfy the query

Directed BFS Example

[Figure: directed BFS example, with messages labeled TTL=2, TTL=1, TTL=0]

Directed BFS (cont…)

• But which neighbors to pick?

– Maintain simple statistics on neighbors to derive heuristics

• Highest past results

• Lowest average hops (close to nodes containing useful data)

• High message count (stable, can handle large flow)

• Shortest message queue (long implies saturation)

• More to come …
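
One way to picture neighbor selection is as a ranking over per-neighbor statistics. The sketch below is an assumption-laden illustration: the `NeighborStats` fields, the `pick_neighbors` function, and the sample numbers are hypothetical, and only three of the heuristics are shown, keyed by the labels `>RES`, `<HOPS`, and `<QLEN` that the analysis slides use for them.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    name: str
    past_results: int    # results returned for recent queries
    avg_hops: float      # average hops taken by those results
    queue_length: int    # current message queue length


# Each heuristic maps to (sort key, whether larger values are preferred).
HEURISTICS = {
    ">RES": (lambda s: s.past_results, True),
    "<HOPS": (lambda s: s.avg_hops, False),
    "<QLEN": (lambda s: s.queue_length, False),
}


def pick_neighbors(stats, k, heuristic):
    """Return the names of the k neighbors ranked best by the given heuristic."""
    key, prefer_high = HEURISTICS[heuristic]
    ranked = sorted(stats, key=key, reverse=prefer_high)
    return [s.name for s in ranked[:k]]


# Hypothetical statistics for three neighbors.
stats = [
    NeighborStats("n1", past_results=12, avg_hops=3.0, queue_length=5),
    NeighborStats("n2", past_results=40, avg_hops=4.5, queue_length=9),
    NeighborStats("n3", past_results=7, avg_hops=2.0, queue_length=1),
]

print(pick_neighbors(stats, 2, ">RES"))
print(pick_neighbors(stats, 1, "<HOPS"))
```

The source then sends the query only to the selected neighbors, which continue with an ordinary BFS.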

Local Indices

• Idea:

– Nodes hold metadata of all nodes within radius r

– Can process the query at fewer nodes but get the same number of results

– Aims to balance satisfaction / costs

Local Indices

• Elements:

– Policies P={a,b,c,..} define the depths at which we search

• Example P={1,5,6}

• Nodes at depth 1 process the query

• Nodes at depth 2,3,4 forward without processing

• Policy ends at depth 6

– System-wide radius r (small ~ 50K metadata)
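
The role a node plays for a given query follows directly from its depth and the policy. A minimal sketch, assuming the policy is a sorted list of depths and using the hypothetical function name `role_at_depth`:

```python
def role_at_depth(depth, policy):
    """Decide what a node at `depth` hops from the source does with a query.

    `policy` is a sorted list of the depths at which the query is processed.
    """
    if depth > policy[-1]:
        return "drop"            # the policy ends at its last depth
    if depth in policy:
        return "process"         # search the local index, then keep forwarding
    return "forward"             # pass the query on without processing it


# The example policy from the slide: P = {1, 5, 6}.
for depth in range(1, 8):
    print(depth, role_at_depth(depth, [1, 5, 6]))
```

For P={1,5,6} this marks depth 1 as processing, depths 2 to 4 as forwarding only, depths 5 and 6 as processing, and anything deeper as outside the search.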

Example P={1,4}

[Figure: example network marking which nodes process the query and which don't; r = ?]

Local Indices (cont…)

– Notice that now there is an overhead

– On join

• Send a join message with TTL = r

• Direct exchange of metadata

– On leave / timeout

• Remove metadata of gone / dead nodes

– On update

• Send an update message with TTL = r
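
The maintenance events above map naturally onto a small data structure held by each node. The following is only a sketch under assumptions: the class name `LocalIndex`, the choice of file names as metadata, and the method names are hypothetical, and the network messages themselves (join and update messages with TTL = r) are not modeled.

```python
class LocalIndex:
    """Metadata one node keeps about the peers within radius r of itself."""

    def __init__(self):
        self.entries = {}                    # peer id -> set of file names

    def on_join(self, peer, files):
        """A peer within radius r joined and sent us its metadata."""
        self.entries[peer] = set(files)

    def on_leave(self, peer):
        """A peer left or timed out: forget its metadata."""
        self.entries.pop(peer, None)

    def on_update(self, peer, added=(), removed=()):
        """A peer changed its collection."""
        files = self.entries.setdefault(peer, set())
        files.update(added)
        files.difference_update(removed)

    def lookup(self, name):
        """Answer a query on behalf of every indexed peer."""
        return sorted(p for p, files in self.entries.items() if name in files)


index = LocalIndex()
index.on_join("p1", ["a.mp3", "b.mp3"])
index.on_join("p2", ["b.mp3"])
print(index.lookup("b.mp3"))
```

Each of these events costs bandwidth that plain BFS does not pay, which is the overhead the slide refers to.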

Experimental Setup

Metrics

• How to compare methods?

1. Costs

2. Results

3. Time

Metrics

1. Costs

– We do not base cost on a specific query but rather calculate the average cost on Qrep, a representative set of real queries submitted

– It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network)

– Therefore our two cost metrics are

• Average aggregate bandwidth

• Average aggregate processing cost

Metrics

2. Results quality

– Number of results

– Satisfaction

3. Time to satisfaction

Data Collection

• Data gathered from the Gnutella network

• Directly measured

– Iterative deepening

– Directed BFS

• Performance data & analysis

– Local indices

Data Collection

Collected data:

• Number of hops

• Response time

• Results per message

• Source IP

• Etc. …

Data Collection

Extracted data:

Symbol   | Description
M(Q,n)   | # of response messages received for query Q, from n hops away
R(Q,n)   | # of results received for query Q, from n hops away
N(Q,n)   | # of nodes n hops away that process Q
C(Q,n)   | # of redundant edges n hops away

Calculating Costs

• We’ve seen two types of costs

– Bandwidth (BW) costs

– Processing costs

• Calculations should take into account

– Costs of sending a query

– Costs of sending replies

• An example of calculating BW costs

Calculating Costs

$$BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)$$

Symbol   | Description
a(Q)     | Size of query Q
c        | Size of result record
d        | Size of response message
D        | Max TTL
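
To make the bandwidth formula concrete, it can be evaluated directly from the extracted per-hop counts. The function below is a sketch: the name `bw_bfs`, the list-based representation of N, C, R and M, and every number in the demo are hypothetical values chosen for illustration, not measurements.

```python
def bw_bfs(a_q, c, d, N, C, R, M):
    """Aggregate bandwidth of a BFS query, summed over hops n = 1..D.

    N, C, R, M are lists indexed by hop: element 0 holds the value for n = 1.
    Query copies cross N + C edges at each hop; each result record (size c)
    and each response message (size d) from hop n travels n hops back.
    """
    D = len(N)
    return sum(
        a_q * (N[n - 1] + C[n - 1]) + n * (c * R[n - 1] + d * M[n - 1])
        for n in range(1, D + 1)
    )


# Hypothetical sizes in bytes and per-hop counts for a query with D = 2.
total = bw_bfs(a_q=100, c=80, d=50, N=[4, 10], C=[0, 2], R=[1, 3], M=[1, 2])
print(total)
```

In this made-up case hop 1 contributes 100·4 + 1·(80 + 50) = 530 bytes and hop 2 contributes 100·12 + 2·(240 + 100) = 1880 bytes.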

Analysis of Results

Iterative Deepening

Symbols Used

Symbol   | Definition
D        | Maximum time-to-live of a message, in terms of hops
Z        | Number of results needed to satisfy a query
Qrep     | Representative set of queries for the Gnutella network
W        | Waiting time (in seconds) between iterations
Ng       | Number of neighbors of client (source node)

Results – Iterative Deepening

• Recall that iterative deepening policies P={a,b,c,..} define deepening behavior

• In order to have the same level of satisfaction as BFS a policy must have D as the last depth

• Also note the degenerate case, policy {D}, which is the bounded BFS we presented earlier

Results – Iterative Deepening

• Variables

– Define:

Pd = { d, d+1, …, D }

P = { Pd for d = 1,2,…,D } = { {1,2,…,D}, {2,3,…,D}, …, {D-1,D}, {D} }

– W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
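
The family of policies defined above is easy to enumerate. A small sketch, where the function name `policies` and the value D = 4 used in the demo are assumptions chosen only to keep the output short:

```python
def policies(D):
    """Return {d: Pd} where Pd = [d, d+1, ..., D] for d = 1..D."""
    return {d: list(range(d, D + 1)) for d in range(1, D + 1)}


for d, policy in policies(4).items():
    print(f"P{d} = {policy}")
```

The last entry, with d = D, is the single-depth policy {D}, i.e. the degenerate bounded-BFS case.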

Results – Iterative Deepening

• Fixed values: Z = 50, Ng = 8

– Increasing Z

• Lower probability of satisfaction

• Higher costs

• More results

– Decreasing Ng

• Slightly lower probability of satisfaction

• Significantly lower costs

Results – Iterative Deepening

• BW costs are the same for P7 for all values of W

• As d increases, costs increase: the larger d is, the more likely the policy will “overshoot”

• As W decreases, costs increase: with a small W, a premature determination that the query is unsatisfied again leads to overshooting

Results – Iterative Deepening

• Time to satisfaction is inversely proportional to cost

• Choose a policy that balances average waiting time and cost

• For example, P5 with W = 6

Analysis of Results

Directed BFS

Heuristics - Directed BFS

Symbol   | Heuristic
RAND     | (Random)
>RES     | Returned the greatest number of results*
<TIME    | Had the shortest average time to satisfaction*
<HOPS    | Had the smallest average number of hops taken by results*
>MSG     | Sent our client the greatest number of messages (all types)
<QLEN    | Had the shortest message queue
<LAT     | Had the shortest latency
>DEG     | Had the highest degree (number of neighbors)

* in the past 10 queries

Results – Directed BFS

• Costs in directed BFS are unaffected by Z

• Users are more aware of the quality of results than of BW costs

– We recommend >RES or <TIME

– Still cheaper than full BFS (~65%)

• Summary so far

– Iterative deepening: lowest costs

– Directed BFS: fastest time to satisfaction

Analysis of Results

Local Indices

Results – Local Indices

• Recall that local indices policies P={a,b,c,..} define the depths at which we search

• We choose policies that minimize the number of nodes that process the query

Results – Local Indices

• We consider the following policies

Results – Local Indices

• Also recall that joins / leaves / updates have a BW overhead

• QJR (QueryJoinRatio) gives us the ratio of queries to joins/leaves in the network

Results – Local Indices

[Figure: plot with the label P0, r=0]

Results – Local Indices

[Figure: plot with annotated values 21 MB and 71 KB]

Results – Local Indices

• Time to satisfaction

– Because most Query and Response messages have r fewer hops to travel, the time to forward messages to the outermost depth and back to the source will be shorter than for BFS

– However, because nodes have larger indices, processing the query should take more time

Results – Local Indices

• Summary

– Huge savings in costs

– Time to satisfaction comparable to BFS

– Determining r must take QJR into consideration

• For current QJR values (e.g. Gnutella = 10), r = 1 is a good choice

Relative performance

Technique            | Time to satisfy | Satisfaction probability | Number of results | Aggregate bandwidth | Aggregate processing
Bounded BFS          | 100%            | 100%                     | 100%              | 100%                | 100%
Iterative deepening  | 190%            | 100%                     | 19%               | 28%                 | 47%
Directed BFS         | 140%            | 86%                      | 37%               | 38%                 | 28%
Local indices        | ≈100%           | 100%                     | 100%              | 39%                 | 51%

Conclusions

• All 3 methods show significant bandwidth and processing savings

• Methods are simple and easy to implement in current systems

• Methods might be used in conjunction

Bibliography

Yang, Beverly; Garcia-Molina, Hector:

• Improving Search in Peer-to-Peer Systems. http://newdbpubs.stanford.edu:8090/pub/2002-28

• Improving Search in Peer-to-Peer Systems [extended]. http://newdbpubs.stanford.edu:8090/pub/2001-47

• Designing a Super-peer Network. http://newdbpubs.stanford.edu:8090/pub/2003-33

Gnutella website: http://www.gnutella.com/

Thank you
