SIGMETRICS'09

Inferring Undesirable Behavior from P2P Traffic Analysis

Ruben Torres*, Mohammad Hajjat*, Sanjay Rao*, Marco Mellia†, Maurizio Munafo†

* Internet Systems Lab, Department of ECE, Purdue University, USA
† Department of Electronics and Telecommunications, Politecnico di Torino, Italy



Rapid Evolution of P2P Networks

Peer-to-peer (P2P) systems are huge and complex, with millions of participants. Over 60% of network traffic is due to P2P systems.

Used for many different applications: file sharing (BitTorrent, eMule), VoIP (Skype), video streaming (PPLive).

Matured to the point that there are commercial offerings.


Undesirable Behavior in P2P Networks

Most of the research is on P2P systems design and characterization.

Shift attention to the impact P2P systems may have on the Internet.

Our focus is on identifying undesirable behavior: patterns not expected, not intended or unwanted by developers, users or network operators.

Potential for undesirable behavior due to: millions of users, completely distributed operation, software bugs, malicious clients, security vulnerabilities.


Our Contributions

One of the first works to show that undesirable behavior exists, is prevalent and significant: evidence of DDoS attacks exploiting P2P clients; significant waste of ISP resources; impact on application/user performance.

Expose problems in the context of a traffic trace of a large ISP with more than 5 million customers.

One of the first systematic approaches to uncover undesirable behavior in P2P systems.


Talk Outline

Dataset Methodology Results


Setup

Traces obtained from large European ISP.

ISP provides ADSL (20/1 Mbps) or Fiber (10/10 Mbps).

Extensive usage of NATs in the ISP: at the peering point (most clients in the ISP have private IP addresses) and at the home (Home NAT).

[Figure: network diagram with labels Home NAT (private), ISP, Full Cone NAT (private/public) and Internet, with TCP and UDP flows.]


Packet traces collected from a PoP within the ISP network.

There are more than 2000 customers in the PoP.

[Figure: network diagram with labels PoP, ISP, Full Cone NAT and Internet.]


eMule Traffic is Predominant in the PoP

eMule is a popular P2P file sharing application.

Over an entire 3-month period: 60-70% of inbound traffic to the PoP is due to eMule; 95% of outbound traffic is due to eMule.

eMule consists of two networks:

Kad: decentralized DHT-based network. UDP-based and mainly used for file search.

ED2K: centralized tracker-based network. TCP-based and used for both search and data exchange.


Systems Analyzed

1. Generic eMule, which we refer to as Kad.

2. Version of eMule customized to ISP, which we refer to as KadU.

Modified version of Kad developed by users in the ISP. Avoids performance problems caused by the NAT at the edge of the network. Difference: KadU clients only contact other clients within the ISP.

These two systems are analyzed separately because they have different characteristics.

e.g. Performance of KadU clients is much better.


High-Level Statistics of Dataset Analyzed

25-hour dataset.

478 KadU clients inside the PoP contact 229,000 KadU clients inside the ISP.

136 Kad clients inside the PoP contact more than 300,000 Kad clients in the Internet.

815,000 ED2K TCP connections.

More than 8 million Kad/KadU UDP flows.


Traffic Classification and Samples Generation

Packet trace -> per-flow classification using Tstat -> per-host aggregation of flows -> samples.

Flows are aggregated over 5-minute periods and summarized as metrics.

Tstat is a passive sniffer with Deep Packet Inspection (DPI) capabilities.


Metrics

More than 50 metrics obtained from flow records.

Consider both TCP and UDP flows. Consider if the flow initiator is inside or outside the PoP.

Examples:
Flow: average flow duration.
Data transfer: bps sent, bps received.
Destinations: number of distinct destination IP addresses.
Failures: failure ratio [TCP only].

Choice of metric:
Intuitively important.
Used in the past in the context of P2P systems.
Can capture specific behaviors of interest to us.
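
The aggregation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Flow` record and its field names are hypothetical (they are not Tstat's output format), and only the four example metrics named above are computed. Each sample is one host observed over one 5-minute window.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # aggregation period used in the talk


@dataclass
class Flow:
    """Hypothetical flow record; field names are assumptions, not Tstat's."""
    src_host: str         # host inside the PoP the sample is attributed to
    dst_ip: str
    start: float          # seconds since the start of the trace
    duration: float       # seconds
    bytes_sent: int
    bytes_received: int
    is_tcp: bool
    failed: bool = False  # only meaningful for TCP connections


def build_samples(flows):
    """Group flows by (host, 5-minute window) and compute example metrics."""
    groups = defaultdict(list)
    for f in flows:
        window = int(f.start // WINDOW_SECONDS)
        groups[(f.src_host, window)].append(f)

    samples = {}
    for key, fs in groups.items():
        tcp = [f for f in fs if f.is_tcp]
        samples[key] = {
            "avg_flow_duration": sum(f.duration for f in fs) / len(fs),
            "bps_sent": 8 * sum(f.bytes_sent for f in fs) / WINDOW_SECONDS,
            "bps_received": 8 * sum(f.bytes_received for f in fs) / WINDOW_SECONDS,
            "distinct_destinations": len({f.dst_ip for f in fs}),
            # Failure ratio is defined for TCP only; None when there is no TCP flow.
            "tcp_failure_ratio": sum(f.failed for f in tcp) / len(tcp) if tcp else None,
        }
    return samples


flows = [
    Flow("10.0.0.1", "1.1.1.1", 10.0, 2.0, 3000, 0, is_tcp=True, failed=True),
    Flow("10.0.0.1", "2.2.2.2", 20.0, 4.0, 4500, 1500, is_tcp=True),
    Flow("10.0.0.1", "1.1.1.1", 400.0, 1.0, 100, 100, is_tcp=False),
]
samples = build_samples(flows)
```

The first two flows fall in window 0 and the third in window 1, so this host contributes two samples.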


Challenges

Very little knowledge of what kinds of undesirable behavior may exist.

It is hard to clearly distinguish between normal and unwanted behavior. P2P traffic patterns are very heterogeneous across users.

Techniques relying on detecting abrupt changes may not work, since undesirable behavior can be exhibited by the majority of the samples and can last throughout the observation period (e.g. due to an implementation bug in the P2P system).


Our Approach

We use clustering techniques and manual inspection to determine undesirable behavior.

Clustering: tens of thousands of samples and more than 50 metrics. Clustering reduces the number of samples to study to a granularity of clusters.

Domain knowledge and manual inspection: select regions of interest; interpret the results.


Clustering - DBScan

DBScan is a density-based clustering technique. Dense regions of points are considered a cluster. Low-density regions are considered noise.

Parameter tuning and sensitivity discussed in the paper.

[Figure: number of samples vs. average packet size [bytes], showing Cluster1, Cluster2, Cluster3 and noise.]
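
To make the idea concrete, here is a minimal sketch of density-based clustering on a single metric, written from the textbook description of the algorithm. It is not the authors' implementation; the function name, the `eps` and `min_pts` values and the sample data are all assumptions chosen for illustration.

```python
NOISE = -1


def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or NOISE."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical "average packet size [bytes]" samples: three dense groups
# and one isolated value.
sizes = [60, 62, 61, 63, 500, 505, 502, 1400, 1402, 1401, 900]
labels = dbscan_1d(sizes, eps=10, min_pts=3)
```

With these parameters the three dense groups become three clusters and the isolated value 900 is labeled as noise. The quadratic neighbor search is fine for a sketch but would need an index for tens of thousands of samples.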


Selecting Regions of Interest - Metrics with more than One Cluster

Metrics with more than one cluster and noise. A cluster and/or noise are selected as interesting.

[Figure: number of samples vs. average packet size [bytes], showing Cluster1, Cluster2, Cluster3 and noise. Annotation: clients only send control messages.]


Selecting Regions of Interest - Metrics with One Cluster

Metrics with one cluster and noise. Noise is typically selected as interesting.

[Figure: number of samples vs. bits per second sent (x10^5). Cluster1: normal clients. Noise: very active clients.]


Correlating Interesting Samples

Once samples in interesting regions are identified, infer undesirable behavior.

Find the hosts that generate the interesting samples.

If a few hosts, anomalous behavior is a property of the hosts.

If many hosts, behavior is general to the application.

Find correlation across metrics. Rely on domain knowledge to identify this. Ongoing work exploring the use of techniques like association rule mining.
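
The "few hosts or many hosts" question can be sketched as follows. This is an illustrative sketch only: the function names, the 90% coverage target and the 5% cutoff for "a few hosts" are assumptions, not values from the talk.

```python
from collections import Counter


def hosts_needed(sample_hosts, coverage=0.9):
    """Smallest number of hosts that together generate `coverage` of the samples."""
    counts = Counter(sample_hosts)
    needed = coverage * len(sample_hosts)
    covered = 0
    for used, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= needed:
            return used
    return len(counts)


def attribute(sample_hosts, total_hosts, few_hosts_fraction=0.05):
    """Classify interesting samples as host-specific or application-wide."""
    fraction = hosts_needed(sample_hosts) / total_hosts
    return "host-specific" if fraction <= few_hosts_fraction else "application-wide"


# One entry per interesting sample, naming the host that generated it.
concentrated = ["h1"] * 47 + ["h2"] * 47 + ["h3"] * 3 + ["h4"] * 3
spread = [f"h{i}" for i in range(50) for _ in range(2)]
```

In the first list two hosts account for most of the interesting samples, so the behavior is attributed to those hosts. In the second list the samples are spread evenly over 50 hosts, which points to the application itself.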


Talk Outline

Dataset Methodology Results

Generic Observations Key Findings


Preliminary Results

For Kad: Most metrics have one cluster and noise. 8 metrics have two clusters and noise. 2 metrics have three clusters and noise.

Similar results for KadU.

Sensitivity study: night period and day period; one-week trace. Obtained very similar results.


Samples Distribution in the Interesting Region

[Figure: fraction of hosts generating samples vs. fraction of samples in the interesting region. Highlighted metric: number of destination ports in range 0-1024 that receive a Kad flow.]

A few hosts have abnormal behavior. Abnormality spread across many hosts (circled in the figure).


Talk Outline

Dataset Methodology Results

Generic Observations Key Findings


DDoS Attacks Exploiting Kad

Considered UDP flows classified as Kad with destination port in range 0-1024.

> 50% of these flows are sent to port 53 (DNS). > 90% of these flows are unanswered.

The top destinations were reported to be under attack.

[Figure: fraction of unanswered UDP flows vs. port number. Annotations: Port 53; Port 4672: default Kad port.]

Unanswered UDP flows are those in which the flow destination never replies.
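
A minimal sketch of this check, assuming each UDP flow has already been reduced to a hypothetical `(dst_port, reply_packets)` pair, where `reply_packets` counts packets sent back by the flow destination. The function name and the input format are illustrative assumptions.

```python
from collections import defaultdict


def unanswered_by_port(flows, max_port=1024):
    """Fraction of unanswered UDP flows per destination port in 0..max_port."""
    total = defaultdict(int)
    unanswered = defaultdict(int)
    for dst_port, reply_packets in flows:
        if dst_port > max_port:
            continue  # only low ports are of interest here
        total[dst_port] += 1
        if reply_packets == 0:  # the destination never replied
            unanswered[dst_port] += 1
    return {port: unanswered[port] / total[port] for port in total}


# Hypothetical flows: (destination port, packets received from destination).
udp_flows = [(53, 0), (53, 0), (53, 1), (80, 0), (4672, 5)]
fractions = unanswered_by_port(udp_flows)
```

Port 4672, the default Kad port, is above the 0-1024 range and is therefore excluded from the result.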


DDoS Attack Exploiting P2P Systems

Redirection attacks: malicious clients inject fake membership information about a victim into the system, and innocent clients then send normal protocol messages to the victim.

There has been some awareness of the problem in the research community: Bellovin [2001], Ross [2006]. They have shown the theoretical feasibility of the attack.

But our work is one of the first to show that these attacks are prevalent in the wild.


Unnecessary P2P Traffic in KadU and Kad

[Figure: samples by fraction of unanswered flows from total incoming UDP flows, showing Cluster1, Cluster2 and noise. Cluster2: most incoming UDP flows are unanswered.]

[Figure: fraction of hosts generating samples vs. fraction of samples in the interesting region.]


Large amount of wasted traffic:

28% of all UDP flows incoming to PoP are unanswered. 65% due to Kad and KadU.

30% of all TCP connections incoming to the PoP fail. 50% due to KadU.

Due to two reasons: Stale membership information. Nodes behind NAT.

Staleness can be extremely long lived (e.g. tens of hours).


Malicious P2P Trackers in the ED2K Network

Metric: average number of TCP connections per destination IP. 94% of interesting samples generated by two hosts. Many short-lived connections to two trackers reported as malicious.

Never responded to requests and closed the connections. Likely deployed by copyright agencies (e.g. RIAA, IFPI).

Similar findings by Banerjee [2008] and Siganos [2009].

[Figure: samples by average connections per destination, showing Cluster1 and noise. Noise: clients contact the same destination more than once in 5 minutes.]
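
The metric on this slide is simple enough to sketch directly. The function below is an illustration with hypothetical names and data: it takes the destination IP of every TCP connection a host opened in one 5-minute sample and divides the connection count by the number of distinct destinations.

```python
def avg_connections_per_destination(dst_ips):
    """Connections opened in one sample divided by distinct destination IPs."""
    if not dst_ips:
        return 0.0
    return len(dst_ips) / len(set(dst_ips))


# Hypothetical samples: a client that contacts each peer once, and a client
# that repeatedly reconnects to a single unresponsive tracker.
normal_sample = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]
retry_sample = ["203.0.113.9"] * 9 + ["198.51.100.1"]

normal_value = avg_connections_per_destination(normal_sample)
retry_value = avg_connections_per_destination(retry_sample)
```

A value of 1 means each destination was contacted once in the window. Values well above 1 correspond to the noise region described above, where a client contacts the same destination repeatedly.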


Generalizing to Other Systems

Findings in BitTorrent: a very significant amount of unnecessary P2P traffic is present, as in KadU.

Findings in Direct Connect: possible DDoS attack exploiting DC++. Many TCP connections sent to port 80 of real web servers.

More findings in the paper.

Ongoing work studying traces from other networks.


Summary

One of the first works to systematically study P2P traffic to identify undesirable behavior.

Shown various types of undesirable behavior of P2P systems in the wild: DDoS attacks on external servers exploiting the system; wasted resources; effects on the performance of the P2P system (e.g. malicious trackers).

Shown the potential of a systematic approach to uncover this behavior.

Our initial analysis suggests that results hold over a range of other P2P systems.


Questions?


Backup Slides


Encrypted Traffic in eMule

[Timeline: eMule 0.49a released 11.05.2008 12:29; eMule 0.49b released 1.08.2008 20:25; our trace collection.]


Why DBScan?

Does not rely on an assumption about the shape of the cluster.

There is the concept of a noise region.

You don't need to know how many clusters you want ahead of time.

But, in principle, any technique can be used. Just need a coarse way to cluster samples.


DBScan - Parameter Sensitivity

We adjust the parameters to match our intuition of where the clusters should be if we manually look at each metric. Try to keep the noise region small but not too small (at most 6% of the samples in our study).

We have an automated way to get clusters. More details in the paper.


Clustering for single metric instead of multiple metrics

Cluster interpretation may be harder. The typical metric distribution is very skewed. Metric distributions have different supports.

Single clustering still helps. Automatic way to get thresholds for interesting region. First cut observations.

But this is a first step. Ongoing work on multi-metric analysis.


Do you think you found all the undesirable behavior, or is there more?

We expect there is more (so there is more work to do). But we expect we have caught the first-order issues.

This is the first attempt in this direction. We don't have an exhaustive list of undesirable behavior.

There may be other behavior we could find when the application or network setup changes.

For example, the buddy problem, which is more related to the architecture of Kad.


How can you automate this and generalize it to different networks?

This is a first step, pointing to the importance of the problem.

Now that it is there, we could look at better ways to detect: changes over time; changes across networks. For a class of P2P systems, use the same list of undesirable behavior.