CS7701 – Fall 2004. Review of: Making Gnutella-like P2P Systems Scalable. Paper by: Yatin Chawathe (AT&T), Sylvia Ratnasamy (Intel), Lee Breslau (AT&T), Nick Lanham (UC Berkeley), Scott Shenker (ICSI). Published in: ACM SIGCOMM 2003. Reviewed by: Todd Sproull. Discussion Leader: Christoph Jechlitschek. CS7701: Research Seminar on Networking, http://arl.wustl.edu/~jst/cse/770/

CS7701 – Fall 2004

Review of: Making Gnutella-like P2P Systems Scalable

• Paper by:
  – Yatin Chawathe (AT&T)
  – Sylvia Ratnasamy (Intel)
  – Lee Breslau (AT&T)
  – Nick Lanham (UC Berkeley)
  – Scott Shenker (ICSI)
• Published in:
  – ACM SIGCOMM 2003
• Reviewed by:
  – Todd Sproull
• Discussion Leader:
  – Christoph Jechlitschek

CS7701: Research Seminar on Networking
http://arl.wustl.edu/~jst/cse/770/

Outline

• Introduction
• Problem Description
• Gia Design
• Simulation Results
• Implementation
• Conclusions

Introduction

• Peer to Peer (P2P) Networks
  – “Systems serving other Systems”
  – Potential for millions of users
  – Gained consumer popularity through Napster
• Napster
  – Started in 1999 by Shawn Fanning
  – Enabled music fans to trade songs over a P2P network
  – Clients connected to centralized Napster servers to locate music
  – 2001: Judge ruled Napster had to block all copyrighted material
  – 2002: Napster folded
• RIAA continued to go after Napster clones
• Gnutella
  – March 14, 2000: Nullsoft released first version of software
    • Created by Justin Frankel and Tom Pepper
    • Nullsoft pulled the software the next day
  – Software was reverse engineered
  – Open Source clients became available
  – Built around decentralized approach

Gnutella

• Distributed search and download
• Unstructured: ad-hoc topology
  – Peers connect to random nodes
• Random search
  – Flood queries across network
• Scaling problems
  – As network grows, search overhead increases

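Figure: peers P1 through P6 in an ad-hoc topology. The query “who has ‘madonna’” is flooded across the network; P4 answers that it has “madonna-american-life.mp3” and P2 answers that it has “madonna-ray-of-light.mp3”.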
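The flooding search on this slide can be sketched in a few lines of Python. This is only an illustration, not Gnutella's wire protocol: the six-peer topology, the choice of P1 as the querying peer, and the names `flood_query`, `example_peers`, and `ttl` are assumptions made for the example. The message counter shows why overhead grows with the network, since every peer forwards the query to every neighbor.

```python
from collections import deque

def flood_query(peers, start, keyword, ttl):
    """Flood a keyword query from `start`; return (hits, messages sent)."""
    hits, messages = [], 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        # Each peer that receives the query checks its own files.
        hits.extend((peer, f) for f in peers[peer]["files"] if keyword in f)
        if remaining == 0:
            continue
        for neighbor in peers[peer]["neighbors"]:
            messages += 1                 # every forwarded copy costs a message
            if neighbor not in seen:      # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages

# Hypothetical topology; only the two file names come from the slide.
example_peers = {
    "P1": {"neighbors": ["P2", "P3"], "files": []},
    "P2": {"neighbors": ["P1", "P4"], "files": ["madonna-ray-of-light.mp3"]},
    "P3": {"neighbors": ["P1", "P5"], "files": []},
    "P4": {"neighbors": ["P2", "P6"], "files": ["madonna-american-life.mp3"]},
    "P5": {"neighbors": ["P3", "P6"], "files": []},
    "P6": {"neighbors": ["P4", "P5"], "files": []},
}

hits, messages = flood_query(example_peers, "P1", "madonna", ttl=2)
```

In this toy network a TTL of 2 already costs six messages to find two files; raising the TTL or adding peers increases the message count further.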

Problem

• Gnutella has notoriously poor scaling
  – Cause: flooding-based search
  – Just using Distributed Hash Tables does not necessarily fix the problem
• Challenge
  – Improve scaling while maintaining Gnutella’s simplicity
• Propose new mechanisms to fix scalability issues
• Evaluate performance of these individual components and the entire network

What about DHTs?

• Distributed Hash Tables (DHTs)
  – Provide a hash table abstraction over multiple compute nodes
• How it works
  – Each DHT node can store data items
  – Data items indexed via lookup key
  – Overlay routing delivers requests for a given key to the responsible node
  – O(log N) message hops in network of N nodes
  – DHT adjusts mapping of keys and neighbor tables when node set changes
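A minimal sketch of the key-to-node mapping may help here. It assumes a consistent-hashing ring, which is one common way DHTs assign keys to nodes; the slides do not say which scheme is meant, and the names `ring_id` and `responsible_node` are invented for the example. The sketch computes the responsible node directly rather than routing hop by hop, so it does not show the O(log N) overlay routing.

```python
import hashlib
from bisect import bisect_left

def ring_id(name):
    """Map a node name or lookup key onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def responsible_node(nodes, key):
    """Return the node whose identifier is the first one at or after the key's."""
    ring = sorted((ring_id(n), n) for n in nodes)
    ids = [node_id for node_id, _ in ring]
    index = bisect_left(ids, ring_id(key)) % len(ring)  # wrap around the ring
    return ring[index][1]

nodes = ["A", "B", "C", "D", "E"]          # hypothetical node names
key = "madonna-ray-of-light.mp3"
owner = responsible_node(nodes, key)
```

With this mapping, a node leaving the network only moves the keys that node was responsible for; every other key keeps its owner, which is the "adjusts mapping when node set changes" behavior in its simplest form.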

Example

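B’s Routing Table

Key   Pointer
7     C
8     D

D’s Routing Table

Key   Pointer
6     E

Figure: nodes A, B, C, D, and E. A request for key 6 (“Key 6?”) is forwarded from node to node using the routing tables; a node without the key answers “Nope!”, and once a node reports “I have key 6” the reply “Key 6!” is passed back along the path.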

DHT only P2P network?

• Problems
  – P2P clients are transient
    • Clients join and leave at rates that cause a fair amount of “churn”
    • Route failures require O(log n) repair operations
  – Keyword searches are more prevalent, and more important, than exact-match queries
    • “Madonna Ray of Light mp3” or “Madona Ray Light mp3” ..
  – Queries are for hay, not needles
    • Most requests are for popular content
    • 50% of content requests are for files with more than 100 replicas
    • 80% of content requests are for files with more than 80 replicas

The Solution

• Design new Gnutella-like P2P system “Gia”
  – Short for gianduia, the generic name for the hazelnut spread sold as Nutella
• What’s so great about it?
  – Dynamic Topology Adaptation
    • Accounts for heterogeneity among nodes
  – Active Flow Control Scheme
    • Implements token based allocation for queries
  – One-hop replication
    • Keep small nodes next to well connected “higher capacity” nodes
      – Capacity refers to message processing capabilities of a node per unit time
  – Search Protocol based on Random Walks
    • No longer flooding the network with requests

• Make high-capacity nodes easily reachable
  – Dynamic topology adaptation
• Make high-capacity nodes have more answers
  – One-hop replication
• Search efficiently
  – Biased random walks
• Prevent overloaded nodes
  – Active flow control

Figure: example of a query.

Dynamic Topology Adaptation

• Core Component of Gia
• Goals
  – Ensure high capacity nodes are ones with high degree
  – Keep low capacity nodes within short reach of high capacity nodes
• Accomplished through satisfaction level S
  – When S=0, node is dissatisfied
  – As node accumulates more neighbors, satisfaction rises until it reaches a satisfaction level of 1

Adding new neighbors

• Adding neighbor Y to X
  – Add the new neighbor if room exists
  – If no room, check to see if an existing neighbor can be replaced
  – Goal:
    • Among existing neighbors with capacity less than or equal to the new neighbor’s, find the one with the highest degree
    • Do not drop an already poorly connected neighbor
• Assumptions:
  – Max Neighbors of X = 3
  – Capacity of all nodes the same

Figure: node X with existing neighbors A, B, and C, and candidate neighbor Y.
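The replacement rule on this slide can be sketched as a small Python function. The function name, the dictionary layout, and the `hysteresis` margin are assumptions of this sketch; the degrees in the example are made up, while the node names and the equal-capacity assumption follow the slide.

```python
def pick_neighbor_to_drop(capacity, degree, neighbors_of_x, y, hysteresis=0):
    """Return the neighbor X should drop to make room for Y, or None."""
    # Only neighbors whose capacity is no greater than Y's may be replaced.
    candidates = [n for n in neighbors_of_x if capacity[n] <= capacity[y]]
    if not candidates:
        return None
    # Prefer the candidate with the most neighbors: it loses the least.
    z = max(candidates, key=lambda n: degree[n])
    y_beats_all = capacity[y] > max(capacity[n] for n in neighbors_of_x)
    # Do not drop a neighbor that is already as poorly connected as Y.
    if y_beats_all or degree[z] > degree[y] + hysteresis:
        return z
    return None

# All capacities equal and X already has its maximum of 3 neighbors.
capacity = {"A": 1, "B": 1, "C": 1, "Y": 1}
degree = {"A": 1, "B": 4, "C": 2, "Y": 1}   # hypothetical degrees
dropped = pick_neighbor_to_drop(capacity, degree, ["A", "B", "C"], "Y")
```

With these made-up degrees, X drops B, the best-connected neighbor. If A, B, and C each had only one neighbor, the function would return None and Y would be refused.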

Token Based Flow Control

• A client may query a neighbor only if that neighbor allows it
  – Client must have a token from the neighbor
• Tokens are sent from a client to its neighbors periodically
  – Token allocation rate is based on the node’s ability to process queries
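A minimal sketch of the token idea, under simplifying assumptions: the class name `Node`, its methods, and the even split of tokens among neighbors are all choices made for this example. The slides only say that the allocation rate depends on the node's ability to process queries, so the split policy in particular should not be read as Gia's actual one.

```python
class Node:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # queries this node can process per period
        self.neighbors = []
        self.tokens = {}           # neighbor name -> tokens held for that neighbor

    def grant_tokens(self):
        """Hand out one period's worth of tokens (even split is an assumption)."""
        if not self.neighbors:
            return
        share = self.capacity // len(self.neighbors)
        for neighbor in self.neighbors:
            neighbor.tokens[self.name] = neighbor.tokens.get(self.name, 0) + share

    def try_send_query(self, neighbor):
        """Spend one token to query `neighbor`; refuse if none is held."""
        if self.tokens.get(neighbor.name, 0) == 0:
            return False
        self.tokens[neighbor.name] -= 1
        return True

a, b, c = Node("a", 4), Node("b", 1), Node("c", 1)
a.neighbors = [b, c]
a.grant_tokens()   # b and c each receive two tokens for querying a
```

Because a node never hands out more tokens per period than it can process, its neighbors cannot overload it; a sender without tokens has to wait for the next grant.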

One Hop Replication

• Gia nodes maintain index of content of neighbors
  – Improves efficiency of search process
  – Allows for neighbors to respond to search queries
• Being “close” to content is useful
  – Not necessary that you have the requested content, but instead a pointer to it
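As an illustration, a one-hop index can be modeled as a map from file name to the set of holders. The function names, the `"self"` marker, and the substring match are assumptions of this sketch, not details taken from the paper.

```python
def build_one_hop_index(own_files, neighbor_files):
    """Index a node's own files plus the files its direct neighbors hold."""
    index = {}
    for name in own_files:
        index.setdefault(name, set()).add("self")
    for neighbor, files in neighbor_files.items():
        for name in files:
            index.setdefault(name, set()).add(neighbor)
    return index

def answer_query(index, keyword):
    """Return matching files with pointers to the nodes that hold them."""
    return {name: sorted(holders) for name, holders in index.items() if keyword in name}

index = build_one_hop_index(
    own_files=["notes.txt"],
    neighbor_files={
        "P2": ["madonna-ray-of-light.mp3"],
        "P4": ["madonna-american-life.mp3", "notes.txt"],
    },
)
answers = answer_query(index, "madonna")
```

The node answers the query even though it stores neither file itself: it returns pointers to P2 and P4, which is the sense in which being "close" to content is enough.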

Search Protocol

• Based on biased random walks
  – Gia node selects highest capacity neighbor that it has tokens for and sends query
  – Queues message if no tokens available for any neighbor
• Uses two mechanisms for control
  – TTL bounds duration of walks
  – Maintains MAX_RESPONSES parameter for maximum number of answers it searches for
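The walk described above can be sketched as follows. This is a simplified, single-threaded illustration: the function name, the dictionary layout, the three-node network, and the `"queued"` status string are assumptions, and a real node would hold a queued query until tokens arrive instead of returning.

```python
def biased_walk(nodes, start, keyword, ttl, max_responses):
    """Walk toward high-capacity neighbors; return (responses, path, status)."""
    responses, path, current = [], [start], start
    while True:
        node = nodes[current]
        # The index stands for the node's own files plus its one-hop index.
        responses.extend(f for f in node["index"] if keyword in f and f not in responses)
        if len(responses) >= max_responses or ttl == 0:
            return responses[:max_responses], path, "done"
        usable = [n for n in node["neighbors"] if node["tokens"].get(n, 0) > 0]
        if not usable:
            return responses, path, "queued"   # no tokens for any neighbor
        nxt = max(usable, key=lambda n: nodes[n]["capacity"])
        node["tokens"][nxt] -= 1               # forwarding spends one token
        ttl -= 1
        current = nxt
        path.append(nxt)

# Hypothetical three-node network.
example_nodes = {
    "leaf": {"capacity": 1, "neighbors": ["mid", "hub"],
             "tokens": {"mid": 1, "hub": 1}, "index": []},
    "mid": {"capacity": 10, "neighbors": ["leaf", "hub"],
            "tokens": {}, "index": []},
    "hub": {"capacity": 1000, "neighbors": ["leaf", "mid"],
            "tokens": {"leaf": 1, "mid": 1}, "index": ["madonna-ray-of-light.mp3"]},
}
```

Calling `biased_walk(example_nodes, "leaf", "madonna", ttl=3, max_responses=1)` sends the query straight to `hub`, the highest-capacity neighbor, and stops there because MAX_RESPONSES is reached. A second identical call finds the `hub` token spent, falls back to `mid`, and ends up queued there.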

Simulations

• Four basic models
  – FLOOD
    • Gnutella model
  – RWRT
    • Random Walks over Random Topologies
    • Proposed by Lv et al.
  – SUPER
    • Classifies some nodes as “Super Nodes”, based on capacity (> 1000x)
  – GIA
    • Gia protocol suite
• Capacity
  – The number of messages (queries or add/drop requests) a node can process per unit time
  – Derived from measured bandwidth distributions from Saroiu et al.
    • A fair number of clients have dialup connections
    • Majority use cable-modem or DSL
    • Few have “high-speed” connections

Performance Metrics

• Collapse Point (CP)
  – Per-node query rate at the point beyond which the success rate drops below 90%
  – Referred to as the knee
• Hop-count before collapse (CP-HC)
  – Average hop count prior to collapse
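To make the collapse point concrete, here is a small sketch that reads it off a list of measurements. The function name and the sample numbers are invented for illustration; only the 90% threshold comes from the slide.

```python
def collapse_point(samples, threshold=0.90):
    """Return the highest query rate reached before success drops below threshold.

    `samples` is a list of (per-node query rate, success rate) pairs.
    Returns None if even the lowest rate is already below the threshold.
    """
    best = None
    for rate, success in sorted(samples):
        if success < threshold:
            break
        best = rate
    return best

# Hypothetical measurements: (queries per second per node, success rate).
samples = [(0.1, 1.00), (1, 0.99), (10, 0.95), (100, 0.60)]
knee = collapse_point(samples)
```

For these made-up samples the knee is at 10 queries per second per node, the last rate at which at least 90% of queries still succeed.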

Performance Comparison

Figure: collapse point (qps/node, log scale from 0.00001 to 1000) versus replication rate (percentage, from 0.01 to 1) for GIA, SUPER, RWRT, and FLOOD, each with N=10,000.

Factor Analysis

• Effects of individual components
  – Remove each component from Gia one at a time
  – Add each component to RWRT
  – No single component accounts for all of Gia’s success

Multiple Searches

• CP changes with MAX_RESPONSES

• Replication Factor and MAX_RESPONSES

Robustness

Figure: collapse point (qps/node, log scale from 0.001 to 1000) versus per-node max-lifetime (seconds, from 10 to 10000) for replication rates of 1.0%, 0.5%, and 0.1%, with Static SUPER and Static RWRT (1% repl) shown for reference.

Active Replication

• Allow higher capacity nodes to replicate files
  – On-demand replication when a high capacity node receives a query and a download request
• Active replication can increase the capacity of nodes serving files by a factor of 38 to 50

Implementation

• Satisfaction Level
  – Aggressiveness of Adaptation
  – Exponential relationship between satisfaction level S and adaptation interval I
  – Define:
    • I = Adaptation interval
    • S = Satisfaction level
    • T = maximum interval between adaptation iterations
    • K = aggressiveness of adaptation interval
  – Let $I = T \times K^{-(1-S)}$
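The formula can be checked with a short Python function. The function and parameter names are invented here, and the values T = 10 and K = 256 in the example are illustrative assumptions, not figures from the slides.

```python
def adaptation_interval(satisfaction, max_interval, aggressiveness):
    """Compute I = T * K^-(1 - S) for a satisfaction level S in [0, 1]."""
    if not 0.0 <= satisfaction <= 1.0:
        raise ValueError("satisfaction level must be between 0 and 1")
    return max_interval * aggressiveness ** -(1.0 - satisfaction)

T, K = 10.0, 256.0                                       # illustrative values only
fully_satisfied = adaptation_interval(1.0, T, K)         # I = T
half_satisfied = adaptation_interval(0.5, T, K)          # I = T / sqrt(K)
dissatisfied = adaptation_interval(0.0, T, K)            # I = T / K
```

A fully satisfied node waits the maximum interval T between adaptation attempts, while a dissatisfied node (S = 0) retries K times as often. Larger K therefore makes unsatisfied nodes adapt more aggressively.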

Satisfaction Level

• Calculating Satisfaction level
  – S = 0 initially and if # of neighbors is less than predefined min
  – Satisfaction Algorithm does the following
    • Adds up normalized capacity of all neighbors (each neighbor’s capacity divided by its degree)
      – A high capacity neighbor with low degree is worth more than a high capacity neighbor with high degree
    • Divides that total by your own capacity to find S
    • Returns S=1 if S > 1 or # of neighbors is greater than predefined max
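The steps above can be sketched in Python. The function signature and the example numbers are assumptions; the sketch also treats reaching the maximum neighbor count (not only exceeding it) as fully satisfied, which is a choice made here and not something the slide states.

```python
def satisfaction_level(own_capacity, neighbors, min_nbrs, max_nbrs):
    """Compute S from a list of (capacity, degree) pairs, one per neighbor."""
    if len(neighbors) < min_nbrs:
        return 0.0
    # A neighbor's capacity is shared among all of its own neighbors,
    # so a high-degree neighbor contributes less than a low-degree one.
    total = sum(capacity / degree for capacity, degree in neighbors)
    s = total / own_capacity
    # Assumption: reaching max_nbrs counts as fully satisfied.
    if s > 1.0 or len(neighbors) >= max_nbrs:
        return 1.0
    return s

# Hypothetical node of capacity 10 with two neighbors.
s = satisfaction_level(10, [(100, 50), (10, 5)], min_nbrs=1, max_nbrs=8)
```

Here the capacity-100 neighbor with 50 neighbors of its own contributes the same share (2) as the capacity-10 neighbor with 5, giving S = 4 / 10 = 0.4, so the node keeps looking for better neighbors.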

Deployment

• PlanetLab
  – Wide-area service deployment testbed in North America, Europe, Asia and the South Pacific
  – Deployed Gia on 83 clients
  – Measured time to reach “steady state”

Related Work

• KaZaA
  – At time of SIGCOMM little had been published on KaZaA
  – “Understanding KaZaA”, Liang et al., 2004
• CAP
  – Cluster based approach to handle scaling in Gnutella
    • Based on a central clustering server
    • Clusters act as directory servers
• PierSearch
  – Published in SIGCOMM 2004
  – PIER + Gnutella
    • PIER uses DHT for hard to find content and Gnutella for the more popular content
• Gnutella2
  – Aimed at fixing many of the problems with Gnutella
  – Not created by Gnutella founders, causing some controversy in the community

Conclusion

• Gia proves to be a scalable Gnutella
  – 3 to 5 orders of magnitude improvement
• Unstructured system works well for popular content
  – DHT not necessary in most cases
• Working implementation on PlanetLab
