Improving Search in P2P Networks By Shadi Lahham


Improving Search in P2P Networks

By Shadi Lahham


Purpose of This Lecture

• General understanding of P2P systems

• Appreciating the need for efficient search

• Applying different search techniques to different scenarios


Table Of Contents

• P2P Basics

– What Is P2P

– Advantages of P2P

– Types of P2P Systems

– Shortcomings

• Search Methods

– The Search Problem

– Current Methods

– Suggested Methods

• Experimental Setup

– Metrics

– Data Collection

– Calculating Costs

• Analysis of Results

• Conclusions


Introduction

P2P Basics


What is P2P

• Distributed system

• Peers (nodes) are servers and clients simultaneously

• Peers have equal roles

• Resources shared across peers

• No central server needed

• Examples of P2P systems


P2P Overview

File    Key
file1   f1
file2   f2
file3   f3


Advantages of P2P

• P2P vs. Centralized Servers

– Distributes disk space / bandwidth

– Inexpensively scalable

– Self organized (autonomous)

– Load balancing

– Adaptive / fault tolerant

– Less susceptible to attacks

– Allows for redundancy


Types of P2P Systems

• Hybrid ( napster )

• Pure ( gnutella )

• Super Peers ( kaZaA )


Hybrid ( napster )


Pure ( gnutella )


Super Peers ( kaZaA )

• Make use of heterogeneity

– Powerful peers serve as super-peers

– Weaker peers act as clients

• Super-peers index clients’ files

– Requires updates on join/leave/update

• Queries handled at super-peer level

– Saves query costs
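The indexing role of a super-peer can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `SuperPeer`, its method names, and exact file-name matching are hypothetical and are not taken from KaZaA or from the slides.

```python
from collections import defaultdict


class SuperPeer:
    """Toy super-peer that indexes the file names shared by its clients."""

    def __init__(self):
        # file name -> set of client ids currently sharing that file
        self.index = defaultdict(set)

    def join(self, client_id, files):
        """A client joins and reports the files it shares."""
        for name in files:
            self.index[name].add(client_id)

    def leave(self, client_id):
        """A client leaves: drop it from every index entry."""
        for name in list(self.index):
            self.index[name].discard(client_id)
            if not self.index[name]:
                del self.index[name]

    def update(self, client_id, files):
        """A client changed its shared files: re-register it."""
        self.leave(client_id)
        self.join(client_id, files)

    def query(self, name):
        """Answer a query from the index alone, without contacting clients."""
        return sorted(self.index.get(name, ()))
```

Every join, leave, and update touches the index, which is the maintenance cost the slide mentions; in exchange, a query is answered by one lookup at the super-peer.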


Hybrid - Shortcomings

• High cost on centralized index

• Performance & scalability bottleneck

• Needs maintenance

• Vulnerable: a highly visible target


Pure - Shortcomings

• Inefficient search (flooding)

• Heterogeneity of peers not considered

– Bottlenecks (limited peers)

– Fragmentation


Super Peers - Shortcomings

• Super-peers might become bottlenecks for clients

– Requires redundancy

• Bad selection of super-peers might cause even worse problems


Search Methods


The Search Problem

• Connected graph

• Might contain cycles

• Individual node doesn’t know structure

• Only knows its neighbors

• No idea where data can be found


The Search Problem

• Goal: find as many occurrences of the data as possible using minimal time and resources

• Solution:

– BFS?

– Bounded BFS?

– (naive approaches)


Bounded BFS Search

[Figure: bounded BFS from the source; the message carries TTL=2, TTL=1, and TTL=0 at successive hops]


Bounded BFS Search

• Messages get a global TTL (time to live)

• Algorithm

– Source broadcasts a message to all of its neighbors

– Neighbors search locally; results are sent to the source if found

– TTL = TTL – 1

– As long as TTL > 0, nodes forward the message to their neighbors

• Downside: wastes bandwidth / processing
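The steps above can be sketched as a small simulation. It assumes plain flooding (every neighbor receives the query), an adjacency-list `graph`, and a `files` map from node to the set of items it holds; these names and the example network are hypothetical. The message counter includes duplicate copies that arrive over cycles, which is where the wasted bandwidth comes from.

```python
from collections import deque


def bounded_bfs(graph, files, source, wanted, ttl):
    """Flood a query from `source`; return (results, messages_sent)."""
    results = []
    messages = 0
    seen = {source}                 # nodes that already handled this query
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue                # TTL expired: do not forward further
        for neighbor in graph[node]:
            messages += 1           # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue            # duplicate copy arriving over a cycle
            seen.add(neighbor)
            if wanted in files.get(neighbor, ()):
                results.append(neighbor)
            frontier.append((neighbor, remaining - 1))
    return results, messages


# Hypothetical five-node network containing one cycle (A-B-D-C-A).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"D": {"song"}, "E": {"song"}}
```

With `ttl=2` the query reaches D but not E; raising the TTL to 3 finds E as well, at the price of three more messages.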


Current Methods

• Gnutella - BFS

– High cost

– Gets complete results (for depth D)

– Relatively short time

• Freenet - DFS

– Poor response time

– Minimizes BW costs


Suggested Methods

• Iterative deepening

• Directed BFS

• Local Indices


Iterative Deepening

• Idea:

– Search at a small depth and increase if required

– Aims to minimize the cost of BFS without detracting from its ability to satisfy queries

• Notice that, given enough iterations, this method returns 100% of the results of BFS


Iterative Deepening (cont…)

• Elements:

– Policies P={a,b,c,..} define deepening behavior

– BFS is run to depth a and frozen

– If the source is satisfied, it stops the process

– Otherwise it asks BFS to resume to depth b

– The process is repeated until the source is satisfied or the last policy depth is reached


Iterative Deepening (cont…)

• Elements:

– We can specify how long to wait between iterations

– We need a system-wide message ID to identify individual messages
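A minimal sketch of the deepening loop follows. It is a single-process simulation, so the waiting time between iterations and the system-wide message ID are only noted in comments; the function name, the `z` parameter (results needed for satisfaction), and the example network are assumptions made for illustration.

```python
def iterative_deepening(graph, files, source, wanted, policy, z):
    """Run BFS to each depth in `policy` until at least `z` results arrive.

    Returns (results, depth_reached). The frontier is kept ("frozen")
    between iterations, so resuming never re-processes a node. A real
    system would tag the frozen query with a system-wide message ID.
    """
    results = []
    seen = {source}
    frontier = [source]             # nodes at the current frozen depth
    depth = 0
    for target in policy:
        while depth < target and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor in seen:
                        continue
                    seen.add(neighbor)
                    if wanted in files.get(neighbor, ()):
                        results.append(neighbor)
                    next_frontier.append(neighbor)
            frontier = next_frontier
            depth += 1
        # A real client would wait W seconds for responses here.
        if len(results) >= z:
            break                   # source is satisfied: stop deepening
    return results, depth


# Hypothetical five-node network; D and E hold the wanted item.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"D": {"song"}, "E": {"song"}}
```

With policy `[1, 2, 4]` and `z=1` the search stops at depth 2, whereas policy `[1, 3, 4]` jumps from depth 1 to depth 3 and processes one more node than was needed.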


Example P={1,3,4} W=1


Directed BFS

• Idea:

– Choose a subset of neighbors to query

– Neighbors will BFS as usual

– Aims to provide a balance between good response time and results

– Minimize costs of full BFS

• Notice that only a subset of the possible results is returned, so we might fail to satisfy the query


Directed BFS Example

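[Figure: directed BFS from the source through a chosen neighbor; the message carries TTL=2, TTL=1, and TTL=0 at successive hops]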


Directed BFS (cont…)

• But which neighbors to pick?

– Maintain simple statistics on neighbors to derive heuristics

  • Highest past results

  • Lowest average hops (close to nodes containing useful data)

  • High message count (stable; can handle large flow)

  • Shortest message queue (a long queue implies saturation)

  • More to come …
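One way to picture the neighbor statistics is a small record per neighbor plus a table of ranking functions. The sketch below is hypothetical throughout: the field names, the heuristic keys, and the sample numbers are invented for illustration and are not taken from any Gnutella client.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Hypothetical per-neighbor counters kept by the source node."""
    name: str
    past_results: int = 0       # results returned in recent queries
    avg_hops: float = 0.0       # average hops travelled by those results
    message_count: int = 0      # messages received from this neighbor
    queue_length: int = 0       # current length of its message queue


# Each heuristic maps stats to a sort key; smaller keys rank first.
HEURISTICS = {
    "highest_past_results": lambda s: -s.past_results,
    "lowest_avg_hops": lambda s: s.avg_hops,
    "high_message_count": lambda s: -s.message_count,
    "shortest_queue": lambda s: s.queue_length,
}


def pick_neighbors(stats, heuristic, k=1):
    """Return the names of the k best neighbors under `heuristic`."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]


# Invented sample statistics for three neighbors.
stats = [
    NeighborStats("B", past_results=5, avg_hops=3.0, message_count=40, queue_length=7),
    NeighborStats("C", past_results=9, avg_hops=4.5, message_count=10, queue_length=2),
    NeighborStats("D", past_results=1, avg_hops=1.5, message_count=90, queue_length=4),
]
```

The source would then send the query only to the neighbors returned by `pick_neighbors`, and those neighbors continue with an ordinary BFS.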


Local Indices

• Idea:

– Nodes hold metadata of all nodes within radius r

– A query can be processed at fewer nodes yet return the same number of results

– Aims to balance satisfaction / costs


Local Indices

• Elements:

– Policies P={a,b,c,..} define the depths at which we search

  • Example P={1,5,6}

  • Nodes at depth 1 process the query

  • Nodes at depth 2,3,4 forward without processing

  • Policy ends at depth 6

– System-wide radius r (small; ~50K of metadata)
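The interplay of policy and radius can be sketched as a simulation. It assumes each node's index covers every node at most r hops away (including itself) and rebuilds that index on the fly instead of storing it; the function names and the example network are hypothetical.

```python
def within_radius(graph, start, r):
    """Return every node at most r hops from `start` (including itself)."""
    reached = {start}
    frontier = [start]
    for _ in range(r):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in reached:
                    reached.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return reached


def local_indices_search(graph, files, source, wanted, policy, r):
    """Return (results, processing_nodes) for a local-indices query."""
    results = set()
    processing = []
    seen = {source}
    frontier = [source]
    for depth in range(1, max(policy) + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        if depth in policy:         # other depths only forward the query
            for node in frontier:
                processing.append(node)
                # The node answers on behalf of everything in its index.
                for indexed in within_radius(graph, node, r):
                    if wanted in files.get(indexed, ()):
                        results.add(indexed)
    return sorted(results), processing


# Hypothetical five-node network; D and E hold the wanted item.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"D": {"song"}, "E": {"song"}}
```

With policy `[1]` and `r=2`, only B and C process the query, yet both D and E are found; a plain BFS would need to process four nodes to return the same results.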


Example P={1,4}

[Figure legend: Process / Don’t process; r = ?]


Local Indices (cont…)

– Notice that now there is an overhead

– On join

  • Send join message of TTL = r

  • Direct exchange of metadata

– On leave / timeout

  • Remove metadata of gone / dead nodes

– On update

  • Send update message of TTL = r


Experimental Setup


Metrics

• How to compare methods?

1. Costs

2. Results

3. Time


Metrics

1. Costs

– We do not base cost on a specific query but rather calculate the average cost over Qrep, a representative set of real queries submitted

– It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network)

– Therefore our two cost metrics are

  • Average aggregate bandwidth

  • Average aggregate processing cost


Metrics

2. Results Quality

– Number of results

– Satisfaction

3. Time to satisfaction


Data Collection

• Data gathered from Gnutella network

• Directly measured

– Iterative deepening

– Directed BFS

• Performance data & analysis

– Local indices


Data Collection

Collected Data

• Number of hops

• Response time

• Results per message

• Source IP

• Etc …


Data Collection

Extracted Data

Symbol    Description
M(Q,n)    # of response messages received for query Q, from n hops away
R(Q,n)    # of results received for query Q, from n hops away
N(Q,n)    # of nodes n hops away that process Q
C(Q,n)    # of redundant edges n hops away


Calculating Costs

• We’ve seen two types of costs

– Bandwidth (BW) costs

– Processing costs

• Calculations should take into account

– Costs of sending a query

– Costs of sending replies

• An example of calculating BW costs


Calculating Costs

\[
BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)
\]

Symbol    Description
a(Q)      Size of query Q
c         Size of result record
d         Size of response message
D         Max TTL
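The bandwidth formula above translates directly into a loop over hop counts. In this sketch the per-hop values are passed as lists (element n-1 holds the value for n hops away), and all of the sample numbers are invented purely to show the arithmetic; they are not measurements.

```python
def bw_bfs(a_q, c, d, N, C, R, M):
    """Aggregate bandwidth of a BFS query, following the formula above.

    N, C, R, M are lists indexed by hop count: element n-1 holds the
    value for n hops away, for n = 1..D (so D == len(N)).
    """
    total = 0
    for n in range(1, len(N) + 1):
        # The query is sent once to each processing node and redundant edge.
        query_cost = a_q * (N[n - 1] + C[n - 1])
        # Each response travels n hops back to the source.
        response_cost = n * (c * R[n - 1] + d * M[n - 1])
        total += query_cost + response_cost
    return total


# Invented example with D = 2: sizes in bytes, counts per hop.
example = bw_bfs(a_q=80, c=70, d=60, N=[4, 10], C=[0, 2], R=[1, 3], M=[1, 2])
```

For these numbers the first hop contributes 320 + 130 = 450 and the second 960 + 660 = 1620, so `example` is 2070.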


Analysis of Results

Iterative Deepening


Symbols Used

Symbol Definition

D Maximum time-to-live of a message, in terms of hops

Z Number of results needed to satisfy a query

Qrep Representative set of queries for the Gnutella network

W Waiting time (in seconds) between iterations

Ng Number of neighbors of client (source node)


Results – Iterative Deepening

• Recall that iterative deepening policies P={a,b,c,..} define deepening behavior

• In order to have the same level of satisfaction as BFS, a policy must have D as the last depth

• Also note the degenerate case policy {D}, which is the bounded BFS we presented earlier


Results – Iterative Deepening

• Variables

– Define:

Pd = { d, d+1, …, D }

P = { Pd for d = 1,2,…,D } = { {1,2,…,D}, {2,3,…,D}, …, {D-1,D}, {D} }

– W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
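The policy family defined above is easy to enumerate. The helper names below are hypothetical, and the maximum depth is a parameter rather than a value taken from the experiments.

```python
def policy(d, max_depth):
    """Pd = {d, d+1, ..., D}, returned as a list of depths."""
    return list(range(d, max_depth + 1))


def all_policies(max_depth):
    """The family {Pd for d = 1..D}, keyed by d."""
    return {d: policy(d, max_depth) for d in range(1, max_depth + 1)}


# The waiting times listed above, in seconds.
WAITING_TIMES = [1, 2, 4, 6, 150]
```

For a maximum depth of 3, `all_policies(3)` yields `{1: [1, 2, 3], 2: [2, 3], 3: [3]}`; the last entry is the degenerate single-depth policy, i.e. plain bounded BFS.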


Results – Iterative Deepening

• Fixed values Z = 50 , Ng = 8

– Increasing Z

  • Lower probability of satisfaction

  • Higher costs

  • More results

– Decreasing Ng

  • Slightly lower probability of satisfaction

  • Significantly lower costs


Results – Iterative Deepening

• BW costs are the same for P7 for all values of W

• As d increases, costs increase: the larger d is, the more likely the policy will “overshoot”

• As W decreases, costs increase: with a small W, a premature decision that the query is unsatisfied again leads to overshooting


Results – Iterative Deepening

• Time to satisfaction is inversely proportional to cost

• Choose a policy that balances average waiting time and cost

• For example, policy P5 with W = 6


Analysis of Results

Directed BFS


Heuristics - Directed BFS

Symbol Heuristic

RAND (Random)

>RES Returned the greatest number of results*

<TIME Had the shortest average time to satisfaction*

<HOPS Had the smallest average number of hops taken by results*

>MSG Sent our client the greatest number of messages (all types)

<QLEN Had the shortest message queue

<LAT Had the shortest latency

>DEG Had the highest degree (number of neighbors)

*in the past 10 queries


Results – Directed BFS

• Costs in directed BFS unaffected by Z

• Users are more aware of the quality of results than of BW costs

– We recommend >RES or <TIME

– Still cheaper than full BFS (~65%)

• Summary so far

– Iterative deepening: lowest costs

– Directed BFS: fastest time to satisfaction


Analysis of Results

Local Indices


Results – Local Indices

• Recall that local indices policies P={a,b,c,..} define the depths at which we search

• We choose policies that minimize the number of nodes that process the query


Results – Local Indices

• We consider the following policies


Results – Local Indices

• Also recall that joins / leaves / updates have a BW overhead

• QJR (QueryJoinRatio) gives us the ratio of queries to joins/leaves in the network


Results – Local Indices

P0 r=0


Results – Local Indices

[Figure annotations: 21 MB; 71 KB]


Results – Local Indices

• Time to Satisfaction

– Because most Query and Response messages have r fewer hops to travel, the time to forward messages to the outermost depth and back to the source will be shorter than for BFS

– However, because nodes have larger indices, processing the query should take more time.


Results – Local Indices

• Summary

– Huge savings in costs

– Time to satisfaction comparable to BFS

– Determining r must take QJR into consideration

• For current QJR values (e.g., QJR = 10 for Gnutella), r = 1 is a good choice


Relative performance

Technique            Time to satisfy   Satisfaction probability   Number of results   Aggregate bandwidth   Aggregate processing
Bounded BFS          100%              100%                       100%                100%                  100%
Iterative deepening  190%              100%                       19%                 28%                   47%
Directed BFS         140%              86%                        37%                 38%                   28%
Local indices        ≈100%             100%                       100%                39%                   51%


Conclusions

• All 3 methods show significant bandwidth and processing savings

• Methods are simple and easy to implement in current systems

• Methods might be used in conjunction


Bibliography

Yang, Beverly; Garcia-Molina, Hector:

• Improving Search in Peer-to-Peer Systems
http://newdbpubs.stanford.edu:8090/pub/2002-28

• Improving Search in Peer-to-Peer Systems [extended]
http://newdbpubs.stanford.edu:8090/pub/2001-47

• Designing a Super-peer Network
http://newdbpubs.stanford.edu:8090/pub/2003-33

Gnutella website
http://www.gnutella.com/


Thank you