26
GIA: Making Gnutella- like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from the authors’ original presentatio

GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Embed Size (px)

Citation preview

Page 1: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

GIA: Making Gnutella-like P2P Systems Scalable

Yatin Chawathe

Sylvia Ratnasamy, Scott Shenker, Nick Lanham,

Lee Breslau

Parts of it has been adopted from the authors’ original presentation

Page 2: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

The Peer-to-peer Phenomenon

• Internet-scale distributed system– Distributed file-sharing applications

e.g., Napster, Gnutella, KaZaA

• File sharing is the dominant P2P app

• Mass-market– Mostly music, some video, software

Page 3: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

The Problem

• Large scale system: millions of users– Wide range of heterogeneity– Large transient user population (in a system with

100,000 nodes, 1600 nodes join and leave per minute)

• Existing search solutions cannot scale– Flooding-based solutions limit capacity– Distributed Hash Tables (DHTs) not

necessarily appropriate

Page 4: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

A Solution: GIA

• Scalable Gnutella-like P2P system

• Design principles:– Explicitly account for node heterogeneity– Query load proportional to node capacity

• Results:– GIA outperforms Gnutella by 3–5 orders of

magnitude

Page 5: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Outline

• Existing approaches

• GIA: Scalable Gnutella

• Results: Simulations & Experiments

• Conclusion

Page 6: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Gnutella

• Distributed search and download• Unstructured: ad-hoc topology

– Peers connect to random nodes– Supports keyword–based

search• Random search

– Flood queries across network• Scaling problems

– As network grows, search overhead increases

P1

P2

P4

P3

who has“madonna”

P 4 has “

madonna-

american-lif

e.mp3”

P5P6

P2 has “madonna-ray-of light.mp3”

Page 7: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Distributed Hash Tables (DHTs)

• Structured solution– Given the exact filename, find its location

• Can DHTs do file sharing? – Yes, but with lots of extra work needed for

keyword searching

• Do we need DHTs?• Not necessarily: Great at finding rare files, but most

queries are for popular files

• Poor handling of churn – why?

Page 8: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Other Solutions

• Supernodes [KaZaA]

– Classify nodes as low- or high-capacity

– Only pushes the problem to a bigger scale

• Random Walks [Lv et al]

– Forwarding is blind

– Queries can get stuck in overloaded nodes

• Biased Random Walks [Adamic et al]

– Right idea, but exacerbates overloaded-node problem

Page 9: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Outline

• Existing approaches

• GIA: Scalable Gnutella

• Results: Simulations & Experiments

• Conclusion

Page 10: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

GIA: High-level view

• Unstructured, but take node capacity into account– High-capacity nodes have room for more

queries: so, send most queries to them

• Will work only if high-capacity nodes:– Have correspondingly more answers, and– Are easily reachable from other nodes

Page 11: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

• Make high-capacity nodes easily reachable– Dynamic topology adaptation converts them into high-

degree nodes

• Make high-capacity nodes have more answers– One-hop replication

• Search efficiently– Biased random walks

• Prevent overloaded nodes– Active flow control

GIA Design

Query

Page 12: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Dynamic Topology Adaptation

• Make high-capacity nodes have high degree (i.e., more neighbors), and keep low capacity nodes within short reach from them.

• Per-node level of satisfaction, S:– 0 no neighbors, 1 enough neighbors

Satisfaction S is a function of: ● Node’s capacity ● Neighbors’ capacities

● Neighbors’ degrees

When S << 1, look for neighbors aggressively

Page 13: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Dynamic Topology Adaptation

Each GIA node maintains a host cache containing a list of other GIA nodes. The host cache is populated using a variety of methods (like contacting well-known web-based hosts, and exchanging host information using PING-PONG messages.

A node X with S < 1 randomly picks a node Y from its host cache, and examines if it can be added as a neighbor.

Page 14: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Topology adaptation steps

Life of Node X: it picks node Y from its host cache

Case 1 {Y can be added as a new neighbor}(Let Ci represent capacity of node i)if num nbrsX + 1 < max nbrs then we have roomACCEPT Y ; return

Case 2 {Node X decides if to replace an existing neighbor in favor of Y}subset := i U i nbrsX : Ci ≤ CY

if no such neighbors exist then REJECT Y ; returnelse candidate Z := highest-degree neighbor from subset

If Y has higher capacity than Zor (num nbrsZ > num nbrsY + H) {Y has fewer nbrs}

then DROP Z; ACCEPT Yelse REJECT Y{Do not drop poorly connected nodes in favor of well-connected ones}

Page 15: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Active Flow Control

• Accept queries based on capacity

– Actively allocation “tokens” to neighbors

– Send query to neighbor only if we have received token from it

• Incentives for advertising true capacity

– High capacity neighbors get more tokens to send outgoing queries

– Allocate tokens with start-time fair queuing. Nodes not using their tokens are marked inactive and this capacity id redistributed among its neighbors.

Page 16: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Practical Considerations

• Query resilience: node death– Periodic keep-alive messages

– Query responses are implicit keep-alive messages

• Determining node capacity– Function of bandwidth and storage

• Each query is assigned a unique GUID by its originator. – This helps with reverse-path forwarding of responses.

Returning queries are forwarded to a different neighbor, so that they don’t traverse the same path twice

Page 17: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Outline

• Existing approaches

• GIA: Scalable Gnutella

• Results: Simulations & Experiments

• Conclusion

Page 18: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Simulation Results

• Compare four systems– FLOOD: TTL-scoped, random topologies– RWRT: Random walks, random topologies– SUPER: Supernode-based search– GIA: search using GIA protocol suite

• Metric:– Collapse point: aggregate throughput that the

system can sustain (per node query rate beyondwhich the success rate drops below 90%)

Page 19: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Questions

• What is the relative performance of the four algorithms?

• Which of the GIA components matters the most?

• How does the system behave in the face of transient nodes?

Page 20: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

System Performance

0.00001

0.001

0.1

10

1000

0.01 0.1 1Replication Rate (percentage)

Collapse Point (qps/node)

GIA: N=10,000

SUPER: N=10,000

RWRT: N=10,000

FLOOD: N=10,000

GIA outperforms SUPER, RWRT & FLOOD by many GIA outperforms SUPER, RWRT & FLOOD by many orders of magnitude in terms of aggregate query loadorders of magnitude in terms of aggregate query load

% % %

population of the object

Page 21: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Factor Analysis

Algorithm Collapse point

RWRT 0.0005

RWRT+OHR 0.005RWRT+BIAS 0.0015

RWRT+TADAPT 0.001RWRT+FLWCTL 0.0006

Algorithm Collapse point

GIA 7

GIA – OHR 0.004GIA – BIAS 6

GIA – TADAPT 0.2

GIA – FLWCTL 2

Topologyadaptation Flow control

No single component is useful by itself; the combinationof them all is what makes GIA scalable

Page 22: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Transient Behavior

0.001

0.01

0.1

1

10

100

1000

10 100 1000 10000Per-node max-lifetime (seconds)

Collapse point (qps/node)

replication rate = 1.0%

replication rate = 0.5%

replication rate = 0.1%

Static SUPER

Static RWRT (1% repl)

Even under heavy churn, GIA outperforms the otheralgorithms by many orders of magnitude

Page 23: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Deployment

• Prototype client implementation using C++

• Deployed on PlanetLab:– 100 machines spread across 4 continents

• Measured the progress of topology adaptation…

Page 24: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Progress of Topology Adaptation

0

10

20

30

40

50

0 20 40 60 80 100Time (seconds)

Number of neighbors

C=1xC=10xC=100xC=1000x

Nodes quickly discover each other and soon reach their target “satisfaction level”

Page 25: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Outline

• Existing approaches

• GIA: Scalable Gnutella

• Results: Simulations & Experiments

• Conclusion

Page 26: GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau Parts of it has been adopted from

Summary

• GIA: scalable Gnutella– 3–5 orders of magnitude improvement in

system capacity

• Unstructured approach is good enough!– DHTs may be overkill– Incremental changes to deployed systems

• Status: Prototype implementation deployed on PlanetLab