Search and Replication in Unstructured Peer-to-Peer Networks. Pei Cao, Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)


Page 1: Search and Replication in Unstructured Peer-to-Peer Networks

Search and Replication in Unstructured Peer-to-Peer Networks

Pei Cao
Cisco Systems, Inc.

(Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)

Page 2: Search and Replication in Unstructured Peer-to-Peer Networks

Disclaimer

• Results, statements, and opinions in this talk do not represent Cisco in any way

• This talk is about technical problems in networking, and does not discuss moral, legal and other issues related to P2P networks and their applications

Page 3: Search and Replication in Unstructured Peer-to-Peer Networks

Outline

• Brief survey of P2P architectures
• Evaluation methodologies
• Search methods
• Replication strategies and analysis
• Simulation results

Page 4: Search and Replication in Unstructured Peer-to-Peer Networks

Characteristics of Peer-to-Peer Networks

• Unregulated overlay network
• Current application: file swapping
• Dynamic: nodes join or leave frequently
• Example systems:
  – Napster, Gnutella
  – Freenet, FreeHaven, MojoNation, Alpine, ...
  – JXTA, Ohaha, …
  – Chord, CAN, “Past”, “Tapestry”, Oceanstore

Page 5: Search and Replication in Unstructured Peer-to-Peer Networks

Architecture Comparisons

• Napster: centralized
  – A central website holds the file directory of all participants; very efficient
  – Scales
  – Problem: single point of failure
• Gnutella: decentralized
  – No central directory; uses “flooding w/ TTL”
  – Very resilient against failure
  – Problem: doesn’t scale

Page 6: Search and Replication in Unstructured Peer-to-Peer Networks

Architecture Comparisons

• Various research projects such as CAN: decentralized, but “structured”
  – CAN: distributed hash table
  – “Structure”: all nodes participate in a precise scheme to maintain certain invariants
  – Extra work when nodes join and leave
  – Scales very well, but can be fragile

Page 7: Search and Replication in Unstructured Peer-to-Peer Networks

Architecture Comparisons

• Freenet: decentralized, but semi-structured
  – Intended for file storage
  – Files are stored along a route biased by hints
  – Queries for files follow a route biased by the same hints
  – Scales very well
  – Problem: would it really work?
    • Simulation says yes in most cases, but no proof so far

Page 8: Search and Replication in Unstructured Peer-to-Peer Networks

Our Focus: Gnutella-Style Systems

• Advantages of Gnutella:
  – Supports more flexible queries
    • Typically, precise “name” search is a small portion of all queries
  – Simplicity, high resilience against node failures
• Problems of Gnutella: scalability
  – Bottleneck: interrupt rates on individual nodes
  – Self-limiting network: nodes have to exit to get real work done!

Page 9: Search and Replication in Unstructured Peer-to-Peer Networks

Evaluation Methodologies

Simulation based:
• Network topology
• Distribution of object popularity
• Distribution of replication density of objects

Page 10: Search and Replication in Unstructured Peer-to-Peer Networks

Evaluation Methods

• Network topologies:
  – Uniform Random Graph (Random)
    • Average and median node degree is 4
  – Power-Law Random Graph (PLRG)
    • Max node degree: 1746, median: 1, average: 4.46
  – Gnutella network snapshot (Gnutella)
    • Oct 2000 snapshot
    • Max degree: 136, median: 2, average: 5.5
  – Two-dimensional grid (Grid)

Page 11: Search and Replication in Unstructured Peer-to-Peer Networks

Modeling Methods

• Object popularity distribution $p_i$
  – Uniform
  – Zipf-like
• Object replication density distribution $r_i$
  – Uniform
  – Proportional: $r_i \propto p_i$
  – Square-Root: $r_i \propto \sqrt{p_i}$
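The popularity and replication models listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the Zipf exponent `alpha`, and the choice to normalise each distribution to sum to 1 are assumptions, not details taken from the talk.

```python
import math

def zipf_popularity(m, alpha=1.0):
    """Zipf-like popularity: p_i proportional to 1 / i**alpha for ranks 1..m.

    `alpha` is an assumed parameter; the slides only say "Zipf-like".
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication_density(p, strategy):
    """Return r_i (normalised to sum to 1) for one of the three strategies."""
    if strategy == "uniform":
        weights = [1.0] * len(p)
    elif strategy == "proportional":
        weights = list(p)
    elif strategy == "square-root":
        weights = [math.sqrt(p_i) for p_i in p]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    p = zipf_popularity(100)
    for name in ("uniform", "proportional", "square-root"):
        r = replication_density(p, name)
        print(name, round(r[0], 4), round(r[-1], 4))
```

With `alpha = 1`, the most popular object is queried four times as often as the object of rank 4, but under square-root replication it gets only twice as many copies.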

Page 12: Search and Replication in Unstructured Peer-to-Peer Networks

Evaluation Metrics

• Overhead: average # of messages per node per query
• Probability of search success: Pr(success)
• Delay: # of hops till success

Page 13: Search and Replication in Unstructured Peer-to-Peer Networks

Load on Individual Nodes

• Why is a node interrupted:
  – To process a query
  – To route the query to other nodes
  – To process duplicated queries sent to it

Page 14: Search and Replication in Unstructured Peer-to-Peer Networks

Duplication in Flooding-Based Searches

[Figure: flooding diagram with nodes numbered 1 to 8]

• Duplication increases as TTL increases in flooding
• Worst case: a node A is interrupted by N * q * degree(A) messages
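A minimal sketch of TTL-limited flooding makes the source of duplicates concrete. It assumes a graph stored as a dict of adjacency lists, that a node forwards a query only the first time it sees it, and that it never sends the query back to the neighbour it came from. The function name and return format are illustrative, not part of any Gnutella implementation.

```python
def flood(graph, source, ttl):
    """Flood a query from `source` for `ttl` hops.

    graph: dict mapping node -> list of neighbours (undirected).
    Returns (nodes_reached, total_messages, duplicate_messages).
    A message is a duplicate when it arrives at a node that has
    already seen the query.
    """
    seen = {source}
    total = 0
    duplicates = 0
    frontier = [(source, None)]          # (node, neighbour it heard from)
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for neighbour in graph[node]:
                if neighbour == parent:  # do not echo back to the sender
                    continue
                total += 1
                if neighbour in seen:
                    duplicates += 1      # receiver is interrupted for nothing
                else:
                    seen.add(neighbour)
                    next_frontier.append((neighbour, node))
        frontier = next_frontier
    return len(seen), total, duplicates

if __name__ == "__main__":
    # A 4-node ring: the two branches of the flood meet on the far side.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    for ttl in (1, 2, 3):
        print(ttl, flood(ring, 0, ttl))
```

On a tree there are no duplicates at all; every cycle in the overlay adds messages that reach already-visited nodes, and more cycles are closed as the TTL grows.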

Page 15: Search and Replication in Unstructured Peer-to-Peer Networks

Duplications in Various Network Topologies

[Figure: “Flooding: % duplicate msgs vs TTL”. x-axis: TTL (2 to 9); y-axis: duplicate msgs (%), 0 to 100. Series: Random, PLRG, Gnutella, Grid.]

Page 16: Search and Replication in Unstructured Peer-to-Peer Networks

Relationship between TTL and Search Successes

[Figure: “Flooding: Pr(success) vs TTL”. x-axis: TTL (2 to 9); y-axis: Pr(success) %, 0 to 120. Series: Random, PLRG, Gnutella, Grid.]

Page 17: Search and Replication in Unstructured Peer-to-Peer Networks

Problems with Simple TTL-Based Flooding

• Hard to choose TTL:
  – For objects that are widely present in the network, small TTLs suffice
  – For objects that are rare in the network, large TTLs are necessary
• Number of query messages grows exponentially as TTL grows

Page 18: Search and Replication in Unstructured Peer-to-Peer Networks

Idea #1: Adaptively Adjust TTL

• “Expanding Ring”
  – Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds
• Success varies by network topology
  – For “Random”, 30- to 70-fold reduction in message traffic
  – For Power-law and Gnutella graphs, only 3- to 9-fold reduction
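The expanding-ring policy can be sketched as repeated floods with a growing TTL. The code below is an illustrative sketch under stated assumptions: `flood_search`, `expanding_ring`, the `holders` set of nodes that store the object, and the `max_ttl` cut-off are hypothetical names, and each flood is restarted from scratch, so earlier rings are revisited.

```python
def flood_search(graph, source, holders, ttl):
    """One TTL-limited flood. Returns (found, messages_sent).

    graph: dict node -> list of neighbours; holders: set of nodes storing the object.
    For simplicity every node forwards to all of its neighbours.
    """
    seen = {source}
    frontier = [source]
    messages = 0
    found = source in holders
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbour in graph[node]:
                messages += 1
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
                    if neighbour in holders:
                        found = True
        frontier = next_frontier
    return found, messages

def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Flood with TTL = start_ttl, start_ttl + step, ... until the object is found.

    Returns (ttl_at_success or None, total_messages over all floods).
    """
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        found, messages = flood_search(graph, source, holders, ttl)
        total += messages
        if found:
            return ttl, total
        ttl += step
    return None, total

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(expanding_ring(path, 0, {3}))
```

The saving comes from popular objects being found by the first, cheap floods; the price is that nodes near the originator process the same query once per ring.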

Page 19: Search and Replication in Unstructured Peer-to-Peer Networks

Limitations of Expanding Ring

[Figure: “Flooding: #nodes visited vs TTL”. x-axis: TTL (2 to 9); y-axis: #nodes visited, 0 to 12000. Series: Random, PLRG, Gnutella, Grid.]

Page 20: Search and Replication in Unstructured Peer-to-Peer Networks

Idea #2: Random Walk

• Simple random walk
  – Takes too long to find anything!
• Multiple-walker random walk
  – N agents each walking T steps visit as many nodes as 1 agent walking N*T steps
  – When to terminate the search: check back with the query originator once every C steps
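A multiple-walker search with periodic check-back might look like the sketch below. Several details are assumptions made for illustration rather than taken from the talk: walkers move in lock-step, each hop and each check-back costs one message, a walker that reaches a holder stops immediately, and `max_steps` bounds the walk. All names and default values are hypothetical.

```python
import random

def k_walker_search(graph, source, holders, walkers=32, check_every=4,
                    max_steps=1024, rng=None):
    """Random-walk search with several walkers.

    graph: dict node -> list of neighbours; holders: set of nodes storing the object.
    Returns (hops_until_first_success or None, total_messages).
    """
    rng = rng or random.Random(0)
    if source in holders:
        return 0, 0
    positions = [source] * walkers
    messages = 0
    found_at = None
    for step in range(1, max_steps + 1):
        # Every active walker takes one hop to a uniformly chosen neighbour.
        moved = [rng.choice(graph[node]) for node in positions]
        messages += len(moved)
        if found_at is None and any(node in holders for node in moved):
            found_at = step
        # Walkers that hit a holder stop; the rest keep walking.
        positions = [node for node in moved if node not in holders]
        if not positions:
            break
        # Every `check_every` steps the remaining walkers ask the originator
        # whether the query is already satisfied (assumed: one message each).
        if step % check_every == 0:
            messages += len(positions)
            if found_at is not None:
                break
    return found_at, messages

if __name__ == "__main__":
    grid_row = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
    print(k_walker_search(grid_row, 0, {7, 15}, walkers=8))
```

Unlike flooding, the message count here grows linearly with the number of steps, and walkers keep running for at most `check_every` hops after the first success.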

Page 21: Search and Replication in Unstructured Peer-to-Peer Networks

Search Traffic Comparison

avg. # msgs per node per query

         Random   Gnutella
Flood    1.863    2.85
Ring     0.053    0.961
Walk     0.027    0.031

Page 22: Search and Replication in Unstructured Peer-to-Peer Networks

Search Delay Comparison

# hops till success

         Random   Gnutella
Flood    2.51     2.39
Ring     4.03     3.4
Walk     9.12     7.3

Page 23: Search and Replication in Unstructured Peer-to-Peer Networks

Lessons Learnt about Search Methods

• Adaptive termination
• Minimize message duplication
• Small expansion in each step

Page 24: Search and Replication in Unstructured Peer-to-Peer Networks

Flexible Replication

• In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
• Limited node storage => what’s the optimal replication density distribution?
  – In Gnutella, only nodes who query an object store it => $r_i \propto p_i$
  – What if we have different replication strategies?

Page 25: Search and Replication in Unstructured Peer-to-Peer Networks

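Optimal $r_i$ Distribution

• Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$
• Calculation:
  – Introduce Lagrange multiplier $\lambda$; find $r_i$ and $\lambda$ that minimize:
    $\sum_i p_i / r_i + \lambda \left( \sum_i r_i - R \right)$
    => $\lambda - p_i / r_i^2 = 0$ for all $i$
    => $r_i \propto \sqrt{p_i}$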
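The result of the calculation can be checked numerically by evaluating the objective $\sum_i p_i / r_i$ for the three allocations under the same budget $R$. The sketch below uses hypothetical sizes (`m = 100` objects, `R = 1000` copies) and a Zipf-like popularity with exponent 1, and it ignores the practical constraints that every object needs at least one copy and that copy counts are integers.

```python
import math

def search_cost(p, r):
    """Objective from the derivation: sum over objects of p_i / r_i."""
    return sum(p_i / r_i for p_i, r_i in zip(p, r))

def allocate(weights, budget):
    """Scale positive weights so that the allocation sums to `budget` (R)."""
    total = sum(weights)
    return [budget * w / total for w in weights]

# Hypothetical sizes: m objects sharing R copies in total.
m, R = 100, 1000.0
raw = [1.0 / rank for rank in range(1, m + 1)]   # Zipf-like, exponent 1 assumed
raw_total = sum(raw)
p = [w / raw_total for w in raw]

costs = {
    "uniform": search_cost(p, allocate([1.0] * m, R)),
    "proportional": search_cost(p, allocate(p, R)),
    "square-root": search_cost(p, allocate([math.sqrt(x) for x in p], R)),
}

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name:13s} {cost:.5f}")
```

Substituting into the objective shows that uniform and proportional allocation both give exactly $m / R$, while the square-root allocation gives $\left( \sum_i \sqrt{p_i} \right)^2 / R$, which is never larger by the Cauchy-Schwarz inequality.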

Page 26: Search and Replication in Unstructured Peer-to-Peer Networks

Square-Root Distribution

• General principle: to minimize $\sum_i p_i / r_i$ under the constraint $\sum_i r_i = R$, make $r_i$ proportional to the square root of $p_i$
• Other application examples:
  – Bandwidth allocation to minimize expected download times
  – Server load balancing to minimize expected request latency

Page 27: Search and Replication in Unstructured Peer-to-Peer Networks

Achieving Square-Root Distribution

• Suggestions from some heuristics
  – Store an object at a number of nodes that is proportional to the number of nodes visited in order to find the object
  – Each node uses random replacement
• Two implementations:
  – Path replication: store the object along the path of a successful “walk”
  – Random replication: store the object randomly among nodes visited by the agents
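The two implementations differ only in where the copies go. The sketch below is one possible reading, with assumed details labelled as such: the number of copies equals the length of the successful walker's path, each node has a fixed-size cache with random replacement, and `store`, `replicate`, and their parameters are hypothetical names.

```python
import random

def store(cache, obj, capacity, rng):
    """Put `obj` in a node's cache, evicting a random entry if the cache is full."""
    if obj in cache:
        return
    if len(cache) >= capacity:
        cache.pop(rng.randrange(len(cache)))   # random replacement
    cache.append(obj)

def replicate(caches, obj, path, visited, strategy, capacity, rng):
    """Place copies of `obj` after a successful search.

    caches:   dict node -> list of cached objects
    path:     nodes on the successful walker's path
    visited:  all nodes visited by any walker during the search
    strategy: "path" or "random"
    Returns the list of nodes chosen to hold a copy.
    """
    copies = len(path)   # assumed: copy count equals the successful path length
    if strategy == "path":
        targets = list(path)
    elif strategy == "random":
        pool = list(visited)
        targets = rng.sample(pool, min(copies, len(pool)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for node in targets:
        store(caches[node], obj, capacity, rng)
    return targets

if __name__ == "__main__":
    rng = random.Random(1)
    caches = {node: [] for node in range(6)}
    print(replicate(caches, "song", [1, 2, 3], range(6), "random", 2, rng))
```

Both variants create the same number of copies per successful query; random replication spreads them over everything the walkers touched instead of clustering them along one path.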

Page 28: Search and Replication in Unstructured Peer-to-Peer Networks

Evaluation of Replication Methods

• Metrics
  – Overall message traffic
  – Search delay
• Dynamic simulation
  – Assume Zipf-like object query probability
  – 5 queries/sec Poisson arrival
  – Results are during 5000sec-9000sec

Page 29: Search and Replication in Unstructured Peer-to-Peer Networks

Distribution of $r_i$

[Figure: “Replication Distribution: Path Replication”. x-axis: object rank (1 to 100); y-axis: replication ratio (normalized), 0.001 to 1. Series: real result, square root.]

Page 30: Search and Replication in Unstructured Peer-to-Peer Networks

Total Search Message Comparison

• Observation: path replication is slightly inferior to random replication

[Figure: “Avg. # msgs per node (5000-9000sec)”. Bar chart; y-axis: 0 to 60000. Bars: Owner Rep, Path Rep, Random Rep.]

Page 31: Search and Replication in Unstructured Peer-to-Peer Networks

Search Delay Comparison

[Figure: “Dynamic simulation: Hop Distribution (5000~9000s)”. x-axis: #hops (1 to 256); y-axis: queries finished (%), 0 to 120. Series: Owner Replication, Path Replication, Random Replication.]

Page 32: Search and Replication in Unstructured Peer-to-Peer Networks

Summary

• Multi-walker random walk scales much better than flooding
  – It won’t scale as perfectly as a structured network, but current unstructured networks can be improved significantly
• Square-root replication distribution is desirable and can be achieved via path replication