Populated IP Addresses — Classification and Applications

Chi-Yao Hong, UIUC
Fang Yu, MSR Silicon Valley
Yinglian Xie, MSR Silicon Valley

ACM CCS (October, 2012)

Outline

• Introduction
• System Design
• Implementation
• Evaluation
• Application

A Seminar at Advanced Defense Lab, 2012/9/25

Introduction

• Online services have become everyday essentials for billions of users, but attackers also abuse them heavily.
  • e.g., Web-based email
• Online service providers often rely on IP addresses to perform blacklisting and service throttling.
• IP addresses associated with a large number of user requests must be treated differently, because blocking or throttling one of them can affect many legitimate users at once.

Populated IP Addresses

• We define IP addresses that are associated with a large number of user requests as Populated IP (PIP) addresses.
• PIPs are not equivalent to the traditional concept of proxies, NATs, gateways, or other middleboxes. For example, a proxy that serves only a few users does not meet the definition.

Goal

• In this paper, we introduce PIPMiner, a fully automated method to extract PIPs and classify them, distinguishing good PIPs from abused ones.

System Design

• We take a data-driven approach using service logs, which are readily available to all service providers.
• We train a non-linear support vector machine (SVM) classifier that is highly tolerant of noise in the input data.

System Flow

• PIP Selection
  • Phase 1: keep IP addresses with at least r_L requests (r_L = 1,000).
  • Phase 2: keep IP addresses that have been used by at least u_M accounts, which together account for at least r_M requests (u_M = 10, r_M = 300).
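The two selection phases can be sketched in Python as follows. This is a minimal illustration and not PIPMiner's code. It assumes that each log record is an (ip, account) pair, that requests without an account carry None, and that only account-bearing requests count toward r_M. The function name select_pips is hypothetical.

```python
from collections import defaultdict

R_L = 1_000  # Phase 1: minimum requests per IP (from the slide)
U_M = 10     # Phase 2: minimum distinct accounts per IP (from the slide)
R_M = 300    # Phase 2: minimum requests from those accounts (from the slide)

def select_pips(records, r_l=R_L, u_m=U_M, r_m=R_M):
    """Return the IPs that pass both selection phases (hypothetical sketch).

    `records` yields (ip, account) pairs, one per request; `account` is
    None for requests not tied to an account (an assumption).
    """
    total = defaultdict(int)         # all requests per IP
    accounts = defaultdict(set)      # distinct accounts per IP
    account_reqs = defaultdict(int)  # requests that carry an account, per IP
    for ip, account in records:
        total[ip] += 1
        if account is not None:
            accounts[ip].add(account)
            account_reqs[ip] += 1

    # Phase 1: IPs with at least r_l requests.
    phase1 = {ip for ip, n in total.items() if n >= r_l}
    # Phase 2: used by at least u_m accounts that together issued >= r_m requests.
    return {ip for ip in phase1
            if len(accounts[ip]) >= u_m and account_reqs[ip] >= r_m}

if __name__ == "__main__":
    log = [("10.0.0.1", f"user{i % 20}") for i in range(1_200)]  # 20 accounts
    log += [("10.0.0.2", "alice")] * 1_500                       # one heavy account
    log += [("10.0.0.3", f"user{i}") for i in range(50)]         # too few requests
    print(select_pips(log))  # {'10.0.0.1'}
```

Phase 2 drops 10.0.0.2 even though it has many requests, because a single account produced them all.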

Features

• Population Features capture aggregated user characteristics.
• Time Series Features model the detailed request patterns.
• IP Block Level Features aggregate IP-block-level activities and help recognize proxy farms.
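The sketch below computes a few per-IP statistics to make the first two feature families concrete. The slides do not list PIPMiner's actual features, so these are hypothetical examples chosen for illustration: distinct accounts, requests per account, the entropy of the per-account request distribution, and an hour-of-day request histogram. The timestamp format (seconds since the start of the log, assumed to begin at midnight) is also an assumption.

```python
import math
from collections import Counter

def population_features(accounts):
    """Hypothetical aggregate statistics over the requests seen at one IP.

    `accounts` lists the account id of each request from that IP.
    """
    if not accounts:
        raise ValueError("an IP with no requests has no population features")
    per_account = Counter(accounts)
    n_req = len(accounts)
    # Shannon entropy (bits) of the per-account request distribution:
    # high when many accounts share the IP evenly, 0 when one account dominates.
    entropy = sum(-(c / n_req) * math.log2(c / n_req) for c in per_account.values())
    return {
        "num_accounts": len(per_account),
        "requests_per_account": n_req / len(per_account),
        "account_entropy": entropy,
    }

def hour_of_day_profile(timestamps):
    """Hypothetical time-series feature: request counts per hour of day."""
    profile = [0] * 24
    for t in timestamps:
        profile[int(t // 3600) % 24] += 1
    return profile

if __name__ == "__main__":
    print(population_features(["a", "a", "b", "b"]))
    # {'num_accounts': 2, 'requests_per_account': 2.0, 'account_entropy': 1.0}
    print(hour_of_day_profile([0, 3599, 3600, 93600])[:3])  # [2, 1, 1]
```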

[Figure: Population Features]

[Figure: Time Series Features]

IP Block Level Features

• Large proxy farms often redirect traffic to different outgoing network interfaces for load-balancing purposes, so their activity is spread across several neighboring IP addresses.
• Determining neighboring IP addresses:
  • Neighboring IPs must be announced by the same AS.
  • Neighboring IPs are contiguous in the IP address space, and each neighboring IP is itself a PIP.
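The neighbor rules above can be sketched as a single pass over the sorted PIP list. In this illustration, `asn_of` is a hypothetical dict standing in for a routing-table lookup of each address's origin AS. Only IPv4 is handled, and "contiguous" is taken to mean numerically adjacent addresses. The function block_time_series shows one possible way to form a block-level series, by summing the per-IP series.

```python
import ipaddress

def pip_blocks(pips, asn_of):
    """Group PIPs into blocks of neighboring IPv4 addresses.

    Two PIPs are neighbors when they are announced by the same AS and are
    adjacent in the address space. Because only PIPs are grouped, every
    member of a block is itself a PIP. `asn_of` maps an IP string to its
    origin AS number (hypothetical lookup).
    """
    blocks = []
    for addr in sorted(ipaddress.IPv4Address(ip) for ip in pips):
        if blocks:
            prev = blocks[-1][-1]
            if int(addr) == int(prev) + 1 and asn_of[str(addr)] == asn_of[str(prev)]:
                blocks[-1].append(addr)  # extend the current block
                continue
        blocks.append([addr])  # start a new block
    return [[str(a) for a in block] for block in blocks]

def block_time_series(block, series_of):
    """Element-wise sum of the per-IP request time series of one block."""
    return [sum(values) for values in zip(*(series_of[ip] for ip in block))]

if __name__ == "__main__":
    asn = {"1.2.3.4": 100, "1.2.3.5": 100, "1.2.3.6": 100,
           "1.2.3.7": 200, "1.2.3.8": 200, "9.9.9.9": 300}
    print(pip_blocks(set(asn), asn))
    # [['1.2.3.4', '1.2.3.5', '1.2.3.6'], ['1.2.3.7', '1.2.3.8'], ['9.9.9.9']]
```

In this example 1.2.3.7 is adjacent to 1.2.3.6 but belongs to a different AS, so it starts a new block.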

[Figure: Example of a block-level time series]

Training and Classification

• Non-linear SVM

[Figure: Kernel function k(x_i, x)]
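For reference, a kernel SVM classifies a feature vector x with the decision function f(x) = sign(sum_i alpha_i * y_i * k(x_i, x) + b). Here the x_i are support vectors with labels y_i in {-1, +1}. The slides do not name the kernel. The sketch below assumes the radial basis function (RBF) kernel, k(x_i, x) = exp(-gamma * ||x_i - x||^2), which is LIBSVM's default, and it uses made-up parameter values. Which label stands for a good PIP is also an assumption.

```python
import math

def rbf_kernel(xi, x, gamma=0.5):
    """RBF kernel k(xi, x) = exp(-gamma * ||xi - x||^2); gamma is illustrative."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, coefs, bias, kernel=rbf_kernel):
    """f(x) = sum_i coef_i * k(x_i, x) + b, where coef_i = alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias

def classify(x, support_vectors, coefs, bias, kernel=rbf_kernel):
    """Return +1 or -1 from the sign of the decision value (ties go to +1)."""
    return 1 if decision_value(x, support_vectors, coefs, bias, kernel) >= 0 else -1

if __name__ == "__main__":
    svs = [[0.0, 0.0], [3.0, 3.0]]  # made-up support vectors
    coefs = [1.0, -1.0]             # alpha_i * y_i for each support vector
    print(classify([0.2, 0.1], svs, coefs, bias=0.0))  # 1
    print(classify([2.9, 3.2], svs, coefs, bias=0.0))  # -1
```

The kernel makes the boundary non-linear. Each support vector contributes a bump of influence around itself rather than a single hyperplane term.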

Implementation

• Data Parsing and Feature Extraction (Stage 1)
  • We implement PIPMiner on top of DryadLINQ [link], a distributed programming model for large-scale computing.
  • Runs on a 240-machine cluster
• Training and Testing (Stage 2)
  • Quad-core CPU with 8 GB RAM
  • LIBSVM [link] and LIBLINEAR [link] toolkits
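Stage 1 fits a partitioned-aggregation pattern. Each partition of the log is summarized independently, and the partial results are then merged. The Python sketch below mimics that pattern, with a thread pool standing in for cluster machines. It is an analogy only, since DryadLINQ itself expresses such jobs as LINQ queries in .NET. The (ip, account) record format and the function names are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_partition(partition):
    """Partial aggregation: request count per IP within one log partition."""
    return Counter(ip for ip, _account in partition)

def requests_per_ip(partitions, workers=4):
    """Merge per-partition counts into global per-IP request counts.

    Threads stand in for the separate machines a real cluster job would use.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, partitions):
            total.update(partial)  # combine step
    return total

if __name__ == "__main__":
    parts = [[("1.1.1.1", "u1"), ("2.2.2.2", "u2")], [("1.1.1.1", "u3")]]
    print(requests_per_ip(parts))  # Counter({'1.1.1.1': 2, '2.2.2.2': 1})
```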

Evaluation

• We apply PIPMiner to a month-long Hotmail login log from August 2010 and identify 1.7 million PIP addresses (200 MB). These PIPs:
  • make up 0.5% of the observed IP addresses,
  • are the source of more than 20.1% of the total requests, and
  • are associated with 13.7% of the total accounts in our dataset.
• At Stage 1, PIPMiner processes a 296 GB dataset in only 1.5 hours.
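The evaluation figures imply a few derived quantities, all approximate because the percentages above are rounded:

• 1.7 million PIPs at 0.5% of the observed IPs implies roughly 340 million observed IPs.
• The PIPs' share of requests divided by their share of IPs (20.1% / 0.5%) means an average PIP issues about 40 times as many requests as an average observed IP. Because 20.1% is a lower bound, this ratio is a lower bound too.
• 296 GB in 1.5 hours is an aggregate rate of about 55 MB/s, taking 1 GB = 1,000 MB.

The arithmetic is spelled out below.

```python
PIPS = 1.7e6                   # PIP addresses identified
PIP_SHARE_OF_IPS = 0.005       # "0.5% of the observed IP addresses"
PIP_SHARE_OF_REQUESTS = 0.201  # "more than 20.1% of the total requests" (lower bound)
DATASET_GB = 296               # Stage 1 input size
STAGE1_HOURS = 1.5             # Stage 1 running time

# Implied number of distinct IPs observed in the log.
observed_ips = PIPS / PIP_SHARE_OF_IPS
# Average requests per PIP relative to the average observed IP.
request_concentration = PIP_SHARE_OF_REQUESTS / PIP_SHARE_OF_IPS
# Aggregate Stage 1 throughput, taking 1 GB = 1,000 MB.
throughput_mb_per_s = DATASET_GB * 1000 / (STAGE1_HOURS * 3600)

print(f"observed IPs: ~{observed_ips / 1e6:.0f} million")       # ~340 million
print(f"request concentration: ~{request_concentration:.0f}x")  # ~40x
print(f"Stage 1 throughput: ~{throughput_mb_per_s:.1f} MB/s")   # ~54.8 MB/s
```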

[Figure: PIP score distribution]

[Figure: PIP address distribution, with regions annotated "Dynamic IP"]

Accuracy Evaluation

• Of the 1.7 million PIP addresses, 973K can be labeled using the account reputation data.
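One possible labeling rule is sketched below. A PIP gets a label only when enough of its accounts have reputation data and those accounts are predominantly good or predominantly bad. Mixed or sparsely covered PIPs stay unlabeled, which is consistent with only about 57% (973K / 1.7M) of the PIPs receiving a label. The rule and all its thresholds (low, high, min_known) are hypothetical, since the slides do not give the actual criteria.

```python
def label_pip(reputations, low=0.1, high=0.9, min_known=10):
    """Label one PIP from the reputations of the accounts that used it.

    `reputations` holds 'good', 'bad', or None (no reputation data) per account.
    Returns 'good', 'bad', or None when the PIP cannot be labeled.
    All thresholds are hypothetical placeholders.
    """
    known = [r for r in reputations if r is not None]
    if len(known) < min_known:
        return None  # too few accounts with reputation data
    bad_ratio = known.count("bad") / len(known)
    if bad_ratio <= low:
        return "good"
    if bad_ratio >= high:
        return "bad"
    return None  # mixed population: left unlabeled

if __name__ == "__main__":
    print(label_pip(["good"] * 20))               # good
    print(label_pip(["good"] * 10 + ["bad"] * 10))  # None
```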

[Figure: Accuracy of individual components]

[Figure: Accuracy against data length]

Validation of Unlabeled Cases

• Future reputation: we use the reputation scores from July 2011 (11 months later) to check the PIPs that could not be labeled.

Application

• Windows Live ID sign-up abuse problem
  • We focus on sign-ups related to Hotmail and use the Hotmail reputation trace from July 2011 (11 months later) to determine whether a particular sign-up account was malicious.
• We study the sign-up behavior on two sets of IP addresses:
  • the 1.7 million derived PIPs, and
  • IP addresses that have more than 20 sign-ups from the Windows Live ID system but are not included in the 1.7 million PIPs.
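The two IP sets compared in this study can be expressed as simple set operations. The inputs in the sketch below are hypothetical:

• signups_per_ip counts Windows Live ID sign-ups per IP.
• pips is the derived PIP set.
• malicious is the set of sign-up accounts flagged by the later reputation trace.

Group 1 is restricted to PIPs that actually had sign-ups.

```python
def signup_groups(signups_per_ip, pips, min_signups=20):
    """Split sign-up IPs into the two groups compared in the study.

    group1: derived PIPs that saw at least one sign-up.
    group2: non-PIP IPs with more than `min_signups` sign-ups.
    """
    group1 = {ip for ip, n in signups_per_ip.items() if n > 0 and ip in pips}
    group2 = {ip for ip, n in signups_per_ip.items()
              if n > min_signups and ip not in pips}
    return group1, group2

def malicious_fraction(ips, accounts_by_ip, malicious):
    """Share of the sign-up accounts from `ips` that the later trace marks malicious."""
    accounts = [a for ip in ips for a in accounts_by_ip.get(ip, [])]
    if not accounts:
        return 0.0
    return sum(a in malicious for a in accounts) / len(accounts)

if __name__ == "__main__":
    g1, g2 = signup_groups({"p1": 5, "x1": 25, "x2": 3}, {"p1", "p9"})
    print(g1, g2)  # {'p1'} {'x1'}
```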

Using PIPs to Predict User Reputation

• Precision = 97%
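Precision is the fraction of positive predictions that turn out to be correct, TP / (TP + FP). Suppose the positives here are accounts predicted to be malicious (an assumption, since the slide does not say). Then 97% precision means about 3 of every 100 flagged accounts are false alarms. The counts in the example are made up for illustration.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was flagged."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0

# Made-up counts: 1,000 flagged accounts, 970 later confirmed malicious.
print(f"{precision(970, 30):.0%}")  # 97%
```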

Q & A

Thank you for listening