Deconstructing the KaZaA Network
Matei Ripeanu, joint work with Nathaniel Leibowitz and Adam Wierzbicki
P2P Impact: Widespread adoption
KaZaA – 200 million downloads (3.5M/week), one of the most popular applications ever!
Number of users for file-sharing applications (www.slyck.com, March ’03):
FastTrack 4,443,120
iMesh 1,385,199
eDonkey 623,097
Overnet 528,750
DirectConnect 136,552
Blubster 97,128
Gnutella 92,678
Surveys: 25-30% of all customers at large ISPs use P2P file-sharing systems
P2P Impact (2): Huge traffic
P2P-generated traffic now dominates the Internet load
Internet2 traffic statistics
UChicago estimate (March ’01): Gnutella control traffic is about 1% of all Internet traffic
Cornell.edu (March ’02): 60% P2P
[Figure: share of traffic by category (0% to 100%), Feb. '02 to Feb. '03; categories: Other, Data transfers, Unidentified, File sharing]
Recent studies
Three recent measurement studies on Kazaa traffic:
Are File Swapping Networks Cacheable? Characterizing P2P Traffic, N. Leibowitz et al. (WCW7, Aug. 2002)
Analyzing Peer-to-Peer Traffic Across Large Networks, S. Sen, J. Wang, (IMW, Nov. 2002)
An Analysis of Internet Content Delivery Systems, S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, H. Levy (OSDI, Dec. 2002)
Data collection
Collect traces at border routers: UWashington, a Tier 1 ISP (AT&T?), a large Israeli ISP
Identify (and log) Kazaa traffic based on:
port number (1214)
content of HTTP request
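The two identification rules can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the tooling used in the studies: the `Flow` record is hypothetical, and matching on an `X-Kazaa-` header prefix is an assumption about what "content of HTTP request" means here.

```python
from dataclasses import dataclass

KAZAA_PORT = 1214  # port number given on the slide

# Assumption: Kazaa HTTP requests carry headers whose names start with "X-Kazaa-".
KAZAA_HEADER_PREFIX = b"x-kazaa-"


@dataclass
class Flow:
    """Hypothetical record for one observed TCP connection."""
    src_port: int
    dst_port: int
    first_payload: bytes  # first bytes sent by the client


def is_kazaa(flow: Flow) -> bool:
    """Return True if the flow matches either identification rule."""
    # Rule 1: either endpoint uses the Kazaa port.
    if KAZAA_PORT in (flow.src_port, flow.dst_port):
        return True
    # Rule 2: the request content looks like Kazaa.
    # Header names are case-insensitive, so compare in lower case.
    return KAZAA_HEADER_PREFIX in flow.first_payload.lower()
```

Checking the payload as well as the port catches transfers that run on a non-default port, at the cost of inspecting packet content.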
Question 1:
What is the overall bandwidth impact?
[Figure: bandwidth in Mbps (0 to 800) over nine days, Wed through the following Thu; series: WWW, P2P, Akamai, non-HTTP TCP]
Bandwidth repartition
UWashington measurements
Web = 14% of TCP; P2P = 43% of TCP
P2P now dominates Web in bandwidth consumed
Traffic volume (UW data, June 2002; source: Saroiu et al.):
WWW inbound 1.51 TB
WWW outbound 3.02 TB
Kazaa inbound 1.78 TB
Kazaa outbound 13.6 TB
Inbound vs. Outbound traffic
UWashington acts like a huge content server: outbound (served) traffic is 7.6 times larger than inbound traffic
Residential ISP: the situation is reversed, as inbound traffic is more than 5 times larger than outbound
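The 7.6x figure follows directly from the UW traffic volumes. A small sketch of the arithmetic, using only the numbers from the UW data (the dictionary layout and function name are illustrative):

```python
# Traffic volumes from the UW data (June 2002), in terabytes.
traffic_tb = {
    "WWW": {"inbound": 1.51, "outbound": 3.02},
    "Kazaa": {"inbound": 1.78, "outbound": 13.6},
}


def outbound_to_inbound(name: str) -> float:
    """Ratio of served (outbound) to fetched (inbound) traffic."""
    volumes = traffic_tb[name]
    return volumes["outbound"] / volumes["inbound"]


for name in traffic_tb:
    print(f"{name}: outbound/inbound = {outbound_to_inbound(name):.1f}")
```

For Kazaa this gives 13.6 / 1.78, about 7.6; for WWW the ratio is 2.0.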
Question 2:
What do the shared objects look like?
File size characteristics
[Figure: x-axis: File size (bytes), 1.E+03 to 1.E+10; y-axis: % of files, 0% to 100%]
Possible file ranges:
10KB-100KB: pics
1MB-5MB: songs
10-200MB: apps, video clips
> 500MB: movies
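The size ranges can be turned into a simple labeling function for trace analysis. This is a sketch only: decimal units and inclusive boundaries are assumptions, and sizes that fall in the gaps between the listed ranges are reported as "unclassified".

```python
KB, MB = 10**3, 10**6  # assumption: decimal units

# (lower bound, upper bound, label), taken from the ranges on the slide.
# Boundaries are treated as inclusive for simplicity.
SIZE_RANGES = [
    (10 * KB, 100 * KB, "pics"),
    (1 * MB, 5 * MB, "songs"),
    (10 * MB, 200 * MB, "apps, video clips"),
    (500 * MB, float("inf"), "movies"),
]


def classify_size(size_bytes: int) -> str:
    """Map a file size to the content type suggested by its range."""
    for low, high, label in SIZE_RANGES:
        if low <= size_bytes <= high:
            return label
    return "unclassified"  # size falls between the listed ranges
```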
Question 3
What is the file popularity distribution?
Terminology:
Download session: downloading one chunk of the file in a single HTTP session
Download cycle: a complete download of a file
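One way to derive download cycles from logged sessions is to accumulate the bytes each user fetches for a file until the total reaches the file size. The sketch below assumes exactly that rule; the `Session` fields are hypothetical, and the studies may have identified cycles differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Session(NamedTuple):
    """Hypothetical log record for one download session (one HTTP transfer)."""
    user: str
    file_id: str
    file_size: int          # full size of the file, in bytes
    bytes_transferred: int  # bytes moved in this session


def count_download_cycles(sessions: Iterable[Session]) -> Dict[str, int]:
    """Count completed downloads (cycles) per file.

    Assumption: a cycle completes once a user's accumulated bytes
    for a file reach the file size.
    """
    progress = defaultdict(int)  # (user, file_id) -> bytes so far
    cycles = defaultdict(int)    # file_id -> completed cycles
    for s in sessions:
        if s.file_size <= 0:
            continue  # skip records without a usable file size
        key = (s.user, s.file_id)
        progress[key] += s.bytes_transferred
        while progress[key] >= s.file_size:
            progress[key] -= s.file_size
            cycles[s.file_id] += 1
    return dict(cycles)
```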
File popularity distribution
[Figure: file popularity distribution, two panels; x-axis: Accumulated % of files (sorted by popularity), 0 to 100 and zoomed to 0 to 10; y-axis: % of download cycles]
10% most popular files generate 60% of the download cycles
1% (or about 3,000) most popular files generate 25% of the download cycles
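Concentration figures of this kind come from sorting files by popularity and summing the head of the list. A minimal sketch, with a hypothetical function name and no real trace data:

```python
from typing import Sequence


def share_of_top(counts: Sequence[float], top_fraction: float) -> float:
    """Fraction of all activity generated by the most popular files.

    counts: one value per file (download cycles, or bytes for traffic).
    top_fraction: e.g. 0.10 for the 10% most popular files.
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # at least one file
    return sum(ranked[:k]) / sum(ranked)
```

Passing per-file download cycle counts gives the popularity distribution; passing per-file byte counts gives the corresponding traffic distribution.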
Question 4:
How is the consumed bandwidth distributed among objects?
Traffic distribution - files
Compare to UWashington traces, where the 1% most popular objects are responsible for ‘only’ 50% of bytes transferred
[Figure: traffic distribution over files, two panels; x-axis: accumulated % of files, 0 to 10 and zoomed to 0 to 1; y-axis: % of traffic, 0 to 100]
1% most popular files generate 80% of the traffic
0.1% most popular files (about 300) generate 50% of the traffic
Costs …
Cost to provide access to the most popular object for a month
Trace: generated traffic, cost
Israeli ISP, Kazaa inbound: 68 GB, $52
UWashington, Kazaa outbound: 360 GB, $277
UWashington, Kazaa inbound: 25 GB, $19
UWashington, Web inbound: 36 GB, $27
Assumptions: OC3 line at $40K/month; 5-day logs extrapolated to one month
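The slide does not spell out how traffic is converted to dollars. One plausible reading, sketched below, divides the monthly line cost by the volume a fully used OC3 can carry in a month. The 155.52 Mbps line rate is the nominal OC-3 rate; full utilization, a 31-day month, and decimal gigabytes are assumptions. Under them, the results land within about a dollar of the listed costs.

```python
OC3_BITS_PER_SECOND = 155.52e6  # nominal OC-3 line rate
LINE_COST_PER_MONTH = 40_000.0  # dollars, from the stated assumptions
DAYS_PER_MONTH = 31             # assumption: not stated on the slide


def monthly_capacity_gb() -> float:
    """Volume (decimal GB) a fully utilized line carries in one month."""
    seconds = DAYS_PER_MONTH * 24 * 3600
    return OC3_BITS_PER_SECOND / 8 * seconds / 1e9


def transfer_cost(traffic_gb: float) -> float:
    """Share of the monthly line cost attributable to traffic_gb."""
    return traffic_gb * LINE_COST_PER_MONTH / monthly_capacity_gb()


for label, gb in [("Israeli ISP, Kazaa inbound", 68),
                  ("UWashington, Kazaa outbound", 360),
                  ("UWashington, Kazaa inbound", 25),
                  ("UWashington, Web inbound", 36)]:
    print(f"{label}: {gb} GB -> ${transfer_cost(gb):.2f}")
```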
Traffic distribution vs. file size
60% of the bytes downloaded, but only 5% of download cycles, correspond to large (movie) files
[Figure: x-axis: File size (bytes), 1.E+03 to 1.E+09; y-axis: % of activity (downloads/traffic), 0% to 100%; series: % of downloads, % of traffic, % of files]
Question 6:
Content dynamics and caching performance
Content dynamics
How many new files does the system see?
[Figure, per hour: Number of unique new files (0 to 5000) vs. Date of measurement (hours), 0 to 300]
[Figure, per day: Number of unique files (0 to 40000) vs. Date of measurement (days), 1 to 17]
Content dynamics (2)
How stable is the set of most popular files?
About 30% of files remain popular over a long period of time
[Figure: % of recurrent popular files (0% to 80%) vs. Date of Measurement (4 to 34); series: 4 Files, 50 Files, 400 Files]
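The stability of the popular set can be measured by checking how much of one day's top-N list reappears in a later day's top-N list. The sketch below assumes that definition of "recurrent popular files"; the function names and the log format (one file id per download) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def top_n(downloads: Iterable[str], n: int) -> Set[str]:
    """Set of the n most downloaded file ids in one day's log."""
    return {file_id for file_id, _ in Counter(downloads).most_common(n)}


def recurrence(reference_day: Iterable[str],
               later_day: Iterable[str], n: int) -> float:
    """Fraction of the reference day's top-n files still in the later top-n.

    Assumes the reference day contains at least one download.
    """
    reference = top_n(reference_day, n)
    return len(reference & top_n(later_day, n)) / len(reference)
```

Running this for n = 4, 50 and 400 against each later day would produce one curve per list size.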
Ideal caching performance
Theoretical cache byte hit ratio for various traffic volumes
[Figure: Byte hit rate (%), 0 to 80, vs. Disk size (GB), 30 to 510; one series per traffic volume: 300, 600, 900, 1200, 1500, 1800, 2100 and 2400 GB]
Achieved caching performance
Significant savings:
File hit rates of 30-35%
Byte hit rates of 50-60%
P2P traffic is more cacheable than Web traffic
But it takes a long time (weeks) to warm up caches
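File and byte hit rates can be estimated by replaying a request trace through a simulated cache. The sketch below uses LRU replacement, which is an assumption: the slides do not say which policy the measured caches used. The trace format, a sequence of `(file_id, size_bytes)` pairs, is also hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def simulate_lru(requests: Iterable[Tuple[str, int]],
                 capacity_bytes: int) -> Tuple[float, float]:
    """Replay requests through an LRU cache.

    Returns (file_hit_rate, byte_hit_rate).
    """
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total = total_bytes = 0
    for file_id, size in requests:
        total += 1
        total_bytes += size
        if file_id in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(file_id)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit; do not cache it
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict LRU entry
            used -= evicted_size
        cache[file_id] = size
        used += size
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes
```

Because a hit on a large file saves more bytes than a hit on a small one, the byte hit rate can exceed the file hit rate, as in the figures quoted above. A cold cache also starts with only misses, which is one reason long warm-up periods lower the measured rates.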
Question 7:
Virtual relationships between users
Outliers filtered out
[Figure: Avg. path length ratio (log scale), 0.1 to 10, vs. Clustering coefficient ratio (log scale), 1 to 10000; labeled points: Word co-occurrences, Film actors, LANL coauthors, Internet, Web, Food web, Power grid]
Small world data-sharing graph
Data-sharing graph:
Nodes == Kazaa users
Link two users that have similar activities (download the same files)
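The graph construction can be sketched in a few lines: collect the set of files per user, then link any two users whose sets overlap. The `min_shared` threshold and both function names are illustrative assumptions; the slide says only that linked users "download the same files". The clustering coefficient below is the standard average local coefficient, not the ratio against a random graph plotted in the figure.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_data_sharing_graph(downloads: Iterable[Tuple[str, str]],
                             min_shared: int = 1) -> Dict[str, Set[str]]:
    """Link users who downloaded at least min_shared files in common.

    downloads: (user, file_id) pairs. Returns user -> set of neighbours.
    """
    files_by_user = defaultdict(set)
    for user, file_id in downloads:
        files_by_user[user].add(file_id)
    graph = {user: set() for user in files_by_user}
    for u, v in combinations(files_by_user, 2):
        if len(files_by_user[u] & files_by_user[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph: Dict[str, Set[str]]) -> float:
    """Average local clustering coefficient.

    Nodes with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves linked.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0
```

The pairwise comparison is quadratic in the number of users, so a real trace would need an inverted index from file to users instead.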
Future questions
What savings can be realized without caching data, only by redirecting requests to local users?
What can one say about the overall characteristics of the network (number of users, number of files, distributions) knowing only data logged by one ISP?
Constraint: Lawmakers may cause P2P traffic to vanish
However, this will lead to a new research question: How will the sudden disappearance of 60% of Internet traffic affect the Internet?
Your questions
Thank you
Goals
High-level questions:
What is the impact of these new content delivery systems on the Internet and on ISPs?
What are the characteristics of the Kazaa traffic?