Deconstructing the KaZaA Network Matei Ripeanu joint work with Nathaniel Leibowitz and Adam Wierzbicki

Page 1: Deconstructing the KaZaA Network

Deconstructing the KaZaA Network

Matei Ripeanu

joint work with Nathaniel Leibowitz and Adam Wierzbicki

Page 2: Deconstructing the KaZaA Network

P2P Impact: Widespread adoption

KaZaA: 200 million downloads (3.5M/week), one of the most popular applications ever!

Surveys: 25-30% of all customers at large ISPs use P2P file-sharing systems

Number of users for file-sharing applications (www.slyck.com, March '03)

Network           Users
FastTrack         4,443,120
iMesh             1,385,199
eDonkey           623,097
Overnet           528,750
Direct Connect    136,552
Blubster          97,128
Gnutella          92,678

Page 3: Deconstructing the KaZaA Network

P2P Impact (2): Huge traffic

P2P-generated traffic now dominates the Internet load

Internet2 traffic statistics

UChicago estimate (March '01): Gnutella control traffic is about 1% of all Internet traffic.

Cornell.edu (March ’02): 60% P2P

[Figure: share of traffic (0% to 100%) by category for Feb. '02, Aug. '02 and Feb. '03; categories: File sharing, Unidentified, Data transfers, Other]

Page 4: Deconstructing the KaZaA Network

Recent studies

Three recent measurement studies on Kazaa traffic:

Are File Swapping Networks Cacheable? Characterizing P2P Traffic, N. Leibowitz et al. (WCW7, Aug. 2002)

Analyzing Peer-to-Peer Traffic Across Large Networks, S. Sen, J. Wang, (IMW, Nov. 2002)

An Analysis of Internet Content Delivery Systems, S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, H. Levy (OSDI, Dec. 2002)

Page 5: Deconstructing the KaZaA Network

Data collection

Collect traces at border routers: UWashington, a Tier 1 ISP (AT&T?), a large Israeli ISP

Identify (and log) Kazaa traffic based on: port number (1214); content of HTTP request
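The two identification criteria can be sketched as a small classifier. This is only an illustration: the slide gives the port number but does not say which part of the HTTP request was inspected, so the `X-Kazaa-` header marker and the function name below are assumptions.

```python
KAZAA_PORT = 1214  # port number given on the slide

# Assumption: Kazaa HTTP requests carry headers starting with "X-Kazaa-".
# The slide only says "content of HTTP request" and names no marker.
KAZAA_HTTP_MARKER = b"x-kazaa-"


def is_kazaa_flow(src_port: int, dst_port: int, payload: bytes = b"") -> bool:
    """Return True if a TCP flow looks like Kazaa traffic."""
    # Criterion 1: either endpoint uses the well-known Kazaa port.
    if KAZAA_PORT in (src_port, dst_port):
        return True
    # Criterion 2: the HTTP request content carries the marker (case-insensitive).
    return KAZAA_HTTP_MARKER in payload.lower()
```

Checking the port first keeps the common case cheap; payload inspection only matters for flows that moved off port 1214.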

Page 6: Deconstructing the KaZaA Network

Question 1:

What is the overall bandwidth impact?

Page 7: Deconstructing the KaZaA Network

Bandwidth repartition

[Figure: bandwidth (Mbps, 0 to 800) over about nine days, Wed through the following Thu; series: WWW, P2P, Akamai, non-HTTP TCP]

UWashington measurements: Web = 14% of TCP; P2P = 43% of TCP

P2P now dominates Web in bandwidth consumed

UW data, June 2002. Source: Saroiu et al.

Page 8: Deconstructing the KaZaA Network

Inbound vs. Outbound traffic

          Inbound    Outbound
WWW       1.51 TB    3.02 TB
Kazaa     1.78 TB    13.6 TB

UW data, June 2002. Source: Saroiu et al.

UWashington acts like a huge content server: outbound (served) traffic 7.6 times larger than inbound traffic

Residential ISP: the situation is reversed as inbound traffic is more than 5 times larger than outbound
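As a quick check of the 7.6 figure, the ratio can be recomputed from the UW byte counts on this slide. The assignment of the four values to WWW/Kazaa and inbound/outbound follows the order of the slide's column headers, which is an assumption; it is consistent with the stated ratio.

```python
# Byte volumes from the slide (UW data, June 2002), in TB.
# Assumed column order: WWW inbound, WWW outbound, Kazaa inbound, Kazaa outbound.
www_inbound_tb = 1.51
www_outbound_tb = 3.02
kazaa_inbound_tb = 1.78
kazaa_outbound_tb = 13.6


def outbound_to_inbound(outbound: float, inbound: float) -> float:
    """How many times larger the served (outbound) traffic is than the inbound."""
    return outbound / inbound


kazaa_ratio = outbound_to_inbound(kazaa_outbound_tb, kazaa_inbound_tb)
www_ratio = outbound_to_inbound(www_outbound_tb, www_inbound_tb)
print(f"Kazaa: {kazaa_ratio:.1f}x, WWW: {www_ratio:.1f}x")
```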

Page 9: Deconstructing the KaZaA Network

Question 2:

What do the shared objects look like?

Page 10: Deconstructing the KaZaA Network

File size characteristics

[Figure: % of files (0% to 100%) vs. file size (bytes), 1.E+03 to 1.E+10]

Possible file ranges: 10KB-100KB pics; 1MB-5MB songs; 10-200MB apps, video clips; > 500MB movies
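The ranges above can be turned into a rough size-based classifier. This is a sketch only: decimal units (1 KB = 1,000 bytes) and inclusive boundaries are assumptions, and sizes that fall between the listed ranges are reported as unclassified rather than guessed.

```python
KB, MB = 10**3, 10**6  # assumption: decimal units

# (label, lower bound, upper bound) in bytes, taken from the slide.
# float("inf") stands in for the open-ended "> 500MB" range.
SIZE_CLASSES = [
    ("pics", 10 * KB, 100 * KB),
    ("songs", 1 * MB, 5 * MB),
    ("apps, video clips", 10 * MB, 200 * MB),
    ("movies", 500 * MB, float("inf")),
]


def size_class(size_bytes: int) -> str:
    """Map a file size to the slide's content type, or 'unclassified'."""
    for label, low, high in SIZE_CLASSES:
        if low <= size_bytes <= high:
            return label
    return "unclassified"  # falls in a gap between the listed ranges
```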

Page 11: Deconstructing the KaZaA Network

Question 3

What is the file popularity distribution?

Terminology:

Download session: downloading one chunk of the file in a single HTTP session

Download cycle: a complete download of a file
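One way to relate the two terms is to treat a download cycle as a set of sessions by the same user whose byte ranges together cover the whole file. The slides do not describe how cycles were reconstructed from the logs, so the record layout and the coverage rule below are assumptions made for illustration.

```python
from collections import defaultdict


def count_download_cycles(sessions, file_sizes):
    """Count complete downloads per file.

    sessions: iterable of (user, file_id, start_byte, end_byte), end exclusive
              (hypothetical record layout).
    file_sizes: dict mapping file_id to size in bytes.
    Returns a dict mapping file_id to the number of users whose sessions
    cover the entire file (at most one cycle per user and file).
    """
    ranges = defaultdict(list)
    for user, file_id, start, end in sessions:
        ranges[(user, file_id)].append((start, end))

    cycles = defaultdict(int)
    for (user, file_id), chunks in ranges.items():
        covered = 0  # bytes [0, covered) are known to be downloaded
        for start, end in sorted(chunks):
            if start > covered:
                break  # gap in coverage: the download is incomplete
            covered = max(covered, end)
        if covered >= file_sizes[file_id]:
            cycles[file_id] += 1
    return dict(cycles)
```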

Page 12: Deconstructing the KaZaA Network

File popularity distribution

[Figure, two panels: % of download cycles vs. accumulated % of files (sorted by popularity); left panel covers all files (0 to 100%), right panel zooms in on the top 10% of files]

10% most popular files generate 60% of the download cycles

1% (or about 3,000) most popular files generate 25% of the download cycles
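Statements of this form come from ranking files by popularity and accumulating their share of download cycles. Note that "1% (or about 3,000)" implies a trace of roughly 300,000 distinct files. A minimal sketch of the computation, with a hypothetical function name and input layout:

```python
def share_of_top(counts, top_fraction):
    """Fraction of all download cycles generated by the most popular files.

    counts: download cycles per file (any order).
    top_fraction: e.g. 0.10 for the 10% most popular files.
    """
    ranked = sorted(counts, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)
```

Applied to per-file byte counts instead of download cycles, the same function gives the share of traffic.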

Page 13: Deconstructing the KaZaA Network

Question 4:

How is bandwidth consumption distributed among objects?

Page 14: Deconstructing the KaZaA Network

Traffic distribution - files

Compare to the UWashington traces, where the 1% most popular objects are responsible for 'only' 50% of bytes transferred

[Figure, two panels: % of traffic vs. accumulated % of files; left panel covers the top 10% of files, right panel zooms in on the top 1%]

1% most popular files generate 80% of the traffic

0.1% most popular files (about 300) generate 50% of the traffic

Page 15: Deconstructing the KaZaA Network

Costs …

Cost to provide access to the most popular object for a month

Source                          Generated traffic    Cost
Israeli ISP, Kazaa inbound      68 GB                $52
UWashington, Kazaa outbound     360 GB               $277
UWashington, Kazaa inbound      25 GB                $19
UWashington, Web inbound        36 GB                $27

Assumptions: OC3 line at $40K/month; 5-day logs extrapolated to one month
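The cost column can be approximately reproduced by pricing each gigabyte at the monthly line cost divided by the line's monthly capacity. The slide does not spell out this derivation: the full-utilization model, the nominal OC-3 rate of 155.52 Mbps, the 31-day month and decimal gigabytes are all assumptions, chosen because they land within about a dollar of the slide's figures.

```python
OC3_MBPS = 155.52          # nominal OC-3 line rate (not stated on the slide)
MONTHLY_COST_USD = 40_000  # from the slide
DAYS_PER_MONTH = 31        # assumption; the slide does not say


def monthly_capacity_gb() -> float:
    """Data volume a fully utilized OC-3 can carry in one month, in GB (10^9 bytes)."""
    bytes_per_second = OC3_MBPS * 1e6 / 8
    return bytes_per_second * DAYS_PER_MONTH * 86_400 / 1e9


def cost_usd(traffic_gb: float) -> float:
    """Share of the monthly line cost attributable to traffic_gb of traffic."""
    return traffic_gb * MONTHLY_COST_USD / monthly_capacity_gb()


for label, gb in [
    ("Israeli ISP, Kazaa inbound", 68),
    ("UWashington, Kazaa outbound", 360),
    ("UWashington, Kazaa inbound", 25),
    ("UWashington, Web inbound", 36),
]:
    print(f"{label}: {gb} GB -> ${cost_usd(gb):.0f}")
```

Under these assumptions the price works out to roughly $0.77 per GB.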

Page 16: Deconstructing the KaZaA Network

Traffic distribution vs. file size

Large (movie) files account for 60% of the bytes downloaded but only 5% of download cycles

[Figure: % of activity (downloads/traffic) vs. file size (bytes), from 1.E+03 to about 1.E+09; series: % of downloads, % of traffic, % of files]

Page 17: Deconstructing the KaZaA Network

Question 6:

Content dynamics and caching performance

Page 18: Deconstructing the KaZaA Network

Content dynamics

How many new files does the system see?

[Figure, two panels: number of unique new files per hour (x-axis: date of measurement, hours 0 to 300) and number of unique files per day (x-axis: date of measurement, days 1 to 17)]
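Counting new files per period amounts to tracking which file identifiers have already been seen. A minimal sketch, assuming the trace has been grouped into one collection of file identifiers per day (the input layout is hypothetical):

```python
def new_files_per_day(daily_file_ids):
    """Count previously unseen files for each day.

    daily_file_ids: list with one iterable of file identifiers per day.
    Returns a list with the number of first-time files on each day.
    """
    seen = set()
    new_counts = []
    for day in daily_file_ids:
        today = set(day)
        new_counts.append(len(today - seen))  # files never observed before
        seen |= today
    return new_counts
```

The same function works for hourly buckets if the input is grouped per hour.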

Page 19: Deconstructing the KaZaA Network

Content dynamics (2)

How stable is the set of most popular files?

About 30% of files remain popular over a long period of time

[Figure: % of recurrent popular files (0% to 80%) vs. date of measurement (4 to 34); series: 4 Files, 50 Files, 400 Files]
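One plausible way to measure stability is to compare the top-N set of a reference period with the top-N set of a later period. The slides do not define "recurrent popular files" precisely, so the definition below (overlap of top-N sets) and the function names are assumptions.

```python
from collections import Counter


def top_n(downloads, n):
    """Set of the n most downloaded file ids in one measurement period."""
    return {file_id for file_id, _ in Counter(downloads).most_common(n)}


def recurrent_share(reference_downloads, later_downloads, n):
    """Fraction of the reference period's top-n files still in a later top-n."""
    reference = top_n(reference_downloads, n)
    later = top_n(later_downloads, n)
    return len(reference & later) / len(reference)
```

With n set to 4, 50 and 400 this would correspond to the three series in the figure.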

Page 20: Deconstructing the KaZaA Network

Ideal caching performance

Theoretical cache byte hit ratio for various traffic volumes

[Figure: byte hit rate (%, 0 to 80) vs. disk size (GB, 30 to 510); one curve per traffic volume: 300, 600, 900, 1200, 1500, 1800, 2100 and 2400 GB]

Page 21: Deconstructing the KaZaA Network

Achieved caching performance

Significant savings: file hit rates of 30-35%; byte hit rates of 50-60%

P2P traffic is more cacheable than Web traffic

But it takes a long time (weeks) to warm up caches
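The difference between file hit rate and byte hit rate can be illustrated with a small cache simulation. The slides do not state which replacement policy the measured caches used; LRU is assumed here purely for illustration, and the request layout is hypothetical.

```python
from collections import OrderedDict


def simulate_lru_cache(requests, capacity_bytes):
    """Replay requests through an LRU cache (assumed policy).

    requests: non-empty iterable of (file_id, size_bytes) in arrival order.
    Returns (file_hit_rate, byte_hit_rate).
    """
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total = total_bytes = 0
    for file_id, size in requests:
        total += 1
        total_bytes += size
        if file_id in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(file_id)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit; do not cache it
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[file_id] = size
        used += size
    return hits / total, hit_bytes / total_bytes
```

Because a few large files carry most of the bytes, a hit on one of them raises the byte hit rate far more than the file hit rate, which matches the gap between the two figures above.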

Page 22: Deconstructing the KaZaA Network

Question 7:

Virtual relationships between users

Outliers filtered out

Page 23: Deconstructing the KaZaA Network

Small world data-sharing graph

[Figure: average path length ratio (log scale, 0.1 to 10) vs. clustering coefficient ratio (log scale, 1 to 10000); reference networks: Word co-occurrences, Film actors, LANL coauthors, Internet, Web, Food web, Power grid]

Data-sharing graph: nodes == Kazaa users; link two users that have similar activities (download the same files)
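The graph construction described above can be sketched as follows. The similarity threshold (`min_shared`) is a hypothetical parameter, since the slides do not say how many common files make two users "similar". The pairwise comparison is quadratic in the number of users, so this is suitable only for small traces.

```python
from collections import defaultdict
from itertools import combinations


def build_data_sharing_graph(downloads, min_shared=1):
    """downloads: iterable of (user, file_id). Returns dict user -> set of neighbours.

    Two users are linked if they downloaded at least min_shared common files.
    """
    files_by_user = defaultdict(set)
    for user, file_id in downloads:
        files_by_user[user].add(file_id)
    graph = {user: set() for user in files_by_user}
    for u, v in combinations(files_by_user, 2):
        if len(files_by_user[u] & files_by_user[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph):
    """Average local clustering coefficient; nodes with < 2 neighbours count as 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count links among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0
```

The figure plots ratios relative to a random graph of the same size; that normalization step is not shown here.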

Page 24: Deconstructing the KaZaA Network

Future questions

What savings can be realized without caching data, only by redirecting requests to local users?

What can one say about the overall characteristics of the network (number of users, number of files, distributions) knowing only the data logged by one ISP?

Constraint: lawmakers may cause P2P traffic to vanish. However, this will lead to a new research question: how will the sudden disappearance of 60% of Internet traffic affect the Internet?

Page 25: Deconstructing the KaZaA Network

Your questions

Thank you

Page 26: Deconstructing the KaZaA Network
Page 27: Deconstructing the KaZaA Network

Goals

High-level questions:

What is the impact of these new content delivery systems on the Internet and on ISPs?

What are the characteristics of the Kazaa traffic?