
Analyzing Peer-To-Peer Traffic Across Large Networks

Subhabrata Sen, Member, IEEE, and Jia Wang, Member, IEEE

Group members: 李英宗 (d96725004), 林慶和 (d95725005)

June 15, 2009

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 12, NO. 2, APRIL 2004


ACN 2009

Authors

Subhabrata Sen received the B.Eng. degree in computer science from Jadavpur University, India, in 1992, and the M.S. and Ph.D. degrees in computer science from the University of Massachusetts, Amherst, in 1997 and 2001, respectively.

Jia Wang received the B.S. degree in computer science from the State University of New York, Binghamton, in 1996, and the M.S. and Ph.D. degrees in computer science from Cornell University, Ithaca, NY, in 1999 and 2001, respectively.

Both are currently members of the Internet and Networking Systems Research Center at AT&T Labs–Research in Florham Park, NJ. Their research interests include network measurement, routing and topology analysis, traffic flow measurement, overlay networks and applications, network security and anomaly detection, Web performance, content distribution networks, and other Internet-related research. Dr. Sen and Dr. Wang are members of the Association for Computing Machinery (ACM).


Introduction: Motivation & Goals

P2P applications are used for distributed file sharing.
Their large and growing traffic volume has an impact on the underlying network.
Goal: characterize P2P behavior in order to understand how these systems impact the network, and to gain insights into developing P2P systems with superior performance.

Previous research focused almost exclusively on P2P signaling traffic, gathered by setting up P2P crawlers on the Internet (an "active probing" approach).

Earlier work was based on data from edge networks, which provides a view of local P2P usage.

This work provides a complementary "backbone view" from a large tier-1 ISP, gathering data at multiple border routers across the ISP.


Outline

Methodology
Characterization Metrics
View and Analysis Results
P2P vs Web


Methodology

Popular P2P applications
  Three systems: Gnutella, FastTrack, DirectConnect
  All decentralized and self-organizing
  Data and index information distributed over peers
  Transient peer membership

Measurement approach
  Large-scale passive measurement
  Flow-level data gathered from routers across a large tier-1 ISP's backbone
  Analyzes both signaling and data traffic
  Three levels of granularity: IP address, network prefix, autonomous system (AS)
  Data collected using Cisco's NetFlow
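The three levels of granularity can be illustrated with a minimal sketch. The flow records, the prefix table, and the prefix-to-AS mapping below are made-up values for illustration, not data from the study; the record layout `(source IP, bytes)` is also an assumption.

```python
import ipaddress
from collections import defaultdict

# Hypothetical routing table: prefix -> origin AS (illustrative values only).
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("198.51.0.0/16"): 64502,
}

def longest_prefix_match(ip):
    """Return the most specific prefix in PREFIX_TO_AS containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [p for p in PREFIX_TO_AS if addr in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None

def aggregate(flows):
    """Sum bytes per source IP, per network prefix, and per AS."""
    by_ip, by_prefix, by_as = defaultdict(int), defaultdict(int), defaultdict(int)
    for src_ip, nbytes in flows:
        by_ip[src_ip] += nbytes
        prefix = longest_prefix_match(src_ip)
        if prefix is not None:
            by_prefix[str(prefix)] += nbytes
            by_as[PREFIX_TO_AS[prefix]] += nbytes
    return dict(by_ip), dict(by_prefix), dict(by_as)

# Made-up flow records: (source IP, bytes).
flows = [
    ("192.0.2.10", 1000),
    ("192.0.2.20", 500),
    ("198.51.100.7", 200),
    ("198.51.7.7", 50),
]
by_ip, by_prefix, by_as = aggregate(flows)
```

Aggregating the same records three ways is what lets the later slides compare how skewed each distribution is at the IP, prefix, and AS levels.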


Methodology

Advantages
  Requires only limited knowledge of the P2P protocol: the port numbers
  Non-intrusive measurement
  Easier than running a crawler
  More complete view of P2P traffic
  Allows localized analysis

Limitations
  Flow-level data only: no application-level details
  May not capture complete flows
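Port-based identification can be sketched as follows. The port numbers are the default TCP ports commonly associated with these three systems; the slides only say that port numbers are used, so the exact table here is an assumption, as is the function name.

```python
# Default TCP ports commonly associated with each system (an assumption here;
# the slides do not list the port numbers that were actually used).
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}

def classify_flow(src_port, dst_port):
    """Return the P2P system name if either port matches, else None."""
    return P2P_PORTS.get(src_port) or P2P_PORTS.get(dst_port)

print(classify_flow(1214, 40000))  # FastTrack
print(classify_flow(80, 51000))    # None
```

A classifier of this kind sees only what the flow record carries, which is why traffic on non-default ports would go unnoticed.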


Characterization Metrics

Characterization
  Topology: host distribution, application-level overlay
  Traffic distribution: downstream & upstream
  Dynamic behavior: how frequently hosts join and leave the system, how long a host stays, …


Characterization Metrics

Metrics
  Host distribution
  Traffic volume
  Host connectivity
  Traffic pattern over time
  Connection duration and on-time

Data cleaning
  Invalid (private) IP addresses: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  No matching prefix in the routing tables
  Invalid AS number (>64512)
  Removes 4% of the flow records
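The cleaning rules can be expressed as a small filter. This is a sketch under stated assumptions: the record fields (`ip`, `prefix`, `as_number`) are hypothetical, the private ranges are the RFC 1918 blocks, and the AS threshold is the value shown on the slide.

```python
import ipaddress

# RFC 1918 private address blocks.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
MAX_VALID_AS = 64512  # threshold taken from the slide

def is_valid_record(ip, prefix, as_number):
    """Apply the three cleaning rules to one (hypothetical) flow record.

    prefix is the routing-table match for ip, or None if nothing matched.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in PRIVATE_NETS):
        return False  # private address
    if prefix is None:
        return False  # no matching prefix in the routing tables
    if as_number is None or as_number > MAX_VALID_AS:
        return False  # invalid AS number
    return True

records = [
    ("10.1.2.3", "10.0.0.0/8", 100),       # dropped: private address
    ("192.0.2.5", None, 100),              # dropped: no prefix match
    ("192.0.2.5", "192.0.2.0/24", 65000),  # dropped: invalid AS number
    ("192.0.2.5", "192.0.2.0/24", 7018),   # kept
]
clean = [r for r in records if is_valid_record(*r)]
```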


Overview of P2P traffic

Table I. NetFlow data set of P2P traffic over TCP

Total around 800 million flow records


Host distribution

Fig. 2. Host density: the distribution of the hosts participating in three P2P systems per day (y-axis is in logscale).


Traffic volume distribution

Fig. 3. Cumulative distribution of traffic volume associated with IP addresses ranked in decreasing order of volume, for September 14, 2001 (x-axis is in logscale). Aggregate traffic observed for FastTrack on this day was 960 GB.

Significant skews in traffic volume across granularities
Few entities source/receive most of the traffic
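One way to quantify this kind of skew is the share of total volume contributed by the top-ranked entities. The sketch below uses made-up volumes; the function name and the rounding rule for the cut-off are illustrative choices, not taken from the paper.

```python
def top_share(volumes, fraction):
    """Fraction of total volume contributed by the top `fraction` of entities."""
    ranked = sorted(volumes, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # always count at least one entity
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Made-up per-host volumes (arbitrary units), heavily skewed.
volumes = [900, 50, 20, 10, 5, 5, 4, 3, 2, 1]
share = top_share(volumes, 0.10)
print(share)  # 0.9: one host out of ten carries 90% of the volume
```

The same function applies unchanged to per-prefix or per-AS totals.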


Host connectivity

Fig. 5. Cumulative distribution of network connectivity at the IP and network prefix (PR) levels, for hosts participating in FastTrack on September 14, 2001.

Connectivity is very small for most hosts and very high for a few hosts
Distribution is less skewed at the prefix and AS levels
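Host connectivity can be read as the number of distinct peers a host exchanges traffic with. A minimal sketch, assuming flow records reduced to `(source, destination)` pairs and counting both directions (the slides do not spell out the exact definition):

```python
from collections import defaultdict

def connectivity(flows):
    """Map each host to the number of distinct hosts it communicates with."""
    peers = defaultdict(set)
    for src, dst in flows:
        peers[src].add(dst)
        peers[dst].add(src)
    return {host: len(p) for host, p in peers.items()}

# Made-up flows: host "a" talks to three peers, everyone else to one.
flows = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "b")]
degree = connectivity(flows)
```

Replacing the host identifiers with prefixes or AS numbers gives the coarser-grained versions of the same metric.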


Time of day effect

Fig. 6. Distribution of number of IP addresses and traffic volume across hours in FastTrack on September 14, 2001 (GMT). (a) The traffic volume transferred in each bin. (b) The number of unique IP addresses, network prefixes, and ASes that are active in each bin.
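The per-bin quantities in this figure can be sketched as an hourly aggregation. The record layout `(timestamp, source IP, bytes)` and the choice to attribute a whole flow to the hour in which it starts are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

def hourly_profile(flows):
    """Return (bytes per hour, number of unique IPs per hour), hours in GMT."""
    volume = defaultdict(int)
    hosts = defaultdict(set)
    for ts, src_ip, nbytes in flows:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        volume[hour] += nbytes
        hosts[hour].add(src_ip)
    return dict(volume), {h: len(s) for h, s in hosts.items()}

# Made-up flows starting at 00:30, 00:40, and 01:30 GMT on September 14, 2001.
t0 = datetime(2001, 9, 14, 0, 30, tzinfo=timezone.utc).timestamp()
flows = [
    (t0, "192.0.2.1", 100),
    (t0 + 600, "192.0.2.2", 200),
    (t0 + 3600, "192.0.2.1", 50),
]
volume, active = hourly_profile(flows)
```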


Host connection duration & on-time

Substantial transience: most hosts stay in the system for a short time
Distribution is less skewed at the prefix and AS levels

FastTrack (September 14, 2001), threshold = 30 min
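The slide gives only the threshold value. One plausible reading, used in the sketch below, is that a host is treated as continuously "on" when the idle gap between its consecutive flows is at most the threshold, and that on-time is the total length of the resulting periods. This interpretation, the function name, and the input layout are assumptions, not the paper's stated definition.

```python
THRESHOLD = 30 * 60  # seconds; corresponds to "thd=30min" on the slide

def on_time(intervals, threshold=THRESHOLD):
    """Total on-time in seconds from the (start, end) flow intervals of one host.

    Flows separated by an idle gap of at most `threshold` are merged into a
    single on-period (assumed interpretation of the threshold).
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start - cur_end <= threshold:
            cur_end = max(cur_end, end)  # extend the current on-period
        else:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous on-period
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Two flows 10 minutes apart merge; the third starts after a long gap.
print(on_time([(0, 600), (1200, 1800), (10000, 10600)]))  # 2400
```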


Mean bandwidth usage

Fig. 9. Cumulative distribution of the mean upstream and downstream bandwidth usage of hosts participating in FastTrack, and DirectConnect on September 14, 2001 (x-axis is in logscale). (a) FastTrack. (b) DirectConnect.

Upstream bandwidth is lower than downstream bandwidth; possible causes: ADSL, rate limiting


Traffic Characterization

P2P traffic does not fit power-law distributions well.

Relationships between measures
  Traffic volume
  Number of IP addresses
  On-times
  Mean bandwidth usage


The power laws

Fig. 10. Rank-frequency plots of the P2P metrics for FastTrack on September 14, 2001: (a) overall host connectivity; (b) host connectivity for the top 10% IP addresses; (c) traffic volume of the top 10% IP addresses; (d) on-time of the top 10% IP addresses (both x-axis and y-axis are labeled in logscale).
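A common informal check for power-law behavior is to fit a straight line to the rank-frequency data on log-log axes and inspect the fit quality. The sketch below does this with ordinary least squares; it is offered as an illustration of the idea behind such plots, not as the procedure the authors used.

```python
import math

def loglog_fit(values):
    """Least-squares line through (log rank, log value).

    Returns (slope, r_squared). Needs at least two positive values.
    """
    ranked = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, r_squared

# Synthetic data that follows an exact power law: value = 1000 / rank.
slope, r2 = loglog_fit([1000 / r for r in range(1, 101)])
```

For the synthetic input the slope is close to -1 and the fit is essentially perfect; data that bends away from a straight line on the log-log plot yields a visibly poorer fit.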


Relationships: traffic volume vs. on-time, connectivity, and bandwidth

Volume heavy hitters are likely to have long on-times; hosts with short on-times contribute small traffic volumes.

A host communicating with many others may transmit only a small amount of traffic; a host communicating with few others can still source significant traffic.

Volume heavy hitters are likely to have large bandwidths; hosts with small bandwidths contribute small traffic volumes.


Traffic volume vs. on-time, connectivity, and bandwidth

Fig. 11. FastTrack data set for September 14, 2001: top 1% IP addresses ranked by volume of data sent out. Scatter plots (log-log scale): (a) upstream volume versus upstream on-time; (b) upstream volume versus number of unique upstream IP addresses that an IP address connects to; (c) upstream volume versus average upstream bandwidth of an IP address.


Connectivity, on-time, and bandwidth

Hosts with high connectivity have long on-times; hosts with short on-times communicate with few other hosts.

Hosts with high upstream bandwidths have low connectivity counts; hosts that send traffic to many others span a range of bandwidths, but none of them has the highest bandwidths.

Hosts with low upstream bandwidths have very long on-times (possibly because of large file downloads, or because they are SuperNodes).


Connectivity, on-time, and bandwidth

Fig. 12. FastTrack data set for September 14, 2001—top 1% IP addresses ranked by volume of data sent out. Scatter plots (log-log scale): (a) number of unique upstream IP addresses that a host connects to versus total upstream on-time of the IP address; (b) number of unique upstream IP addresses versus average upstream bandwidth; (c) average upstream bandwidth versus total upstream on-time.


P2P vs Web

97% of prefixes contributing P2P traffic also contribute Web traffic

Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic

P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic

The top 0.01%, 0.1%, 1%, and 10% heavy hitters contribute 10%, 30%, 50%, and 90% of the traffic volume, respectively
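The prefix-level comparison between P2P and Web traffic can be sketched with set operations. The per-prefix volumes below are made up, and both helper names are hypothetical.

```python
def overlap_fraction(p2p_volume, web_volume):
    """Share of P2P-contributing prefixes that also contribute Web traffic."""
    if not p2p_volume:
        return 0.0
    common = set(p2p_volume) & set(web_volume)
    return len(common) / len(p2p_volume)

def top_k(volume, k):
    """Set of the k prefixes with the largest volume."""
    return set(sorted(volume, key=volume.get, reverse=True)[:k])

# Made-up per-prefix volumes (arbitrary units).
p2p = {"192.0.2.0/24": 500, "198.51.100.0/24": 300, "203.0.113.0/24": 10, "100.64.0.0/10": 5}
web = {"192.0.2.0/24": 800, "198.51.100.0/24": 50, "203.0.113.0/24": 400}

frac = overlap_fraction(p2p, web)        # 3 of 4 P2P prefixes also carry Web traffic
shared_top = top_k(p2p, 2) & top_k(web, 2)  # heavy hitters common to both
```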


P2P vs Web

Fig. 13. Cumulative distribution of the traffic volume changes for top heavy hitter prefixes. (a) Top 0.01% prefixes. (b) Top 1% prefixes.


Summary

The analysis covers both signaling and data traffic, and complements previous work on Gnutella.

Significant increase in both traffic volume and number of users.

The traffic volume generated by individual hosts is extremely variable: fewer than 10% of the IP addresses contribute 99% of the traffic volume.

Traffic distributions are extremely skewed for traffic volume, connectivity, on-time, and average bandwidth usage, but they do not strictly obey power laws.


Summary

All three P2P systems exhibit a high level of system dynamics; only a small fraction of hosts are persistent over long time periods.

P2P is a significant but stable component of Internet traffic.
  More stable than Web traffic or overall traffic

Application-specific layer-3 traffic engineering is a promising way to manage the P2P workload in an ISP's network.