CyberProbe: Towards Internet-Scale Active Detection of Malicious Servers Antonio Nappa*‡, Zhaoyan Xu†, M. Zubair Rafique*, Juan Caballero*, Guofei

CyberProbe: Towards Internet-Scale Active Detection of Malicious Servers

Antonio Nappa*‡, Zhaoyan Xu†, M. Zubair Rafique*, Juan Caballero*, Guofei Gu†

*IMDEA Software Institute, ‡Universidad Politécnica de Madrid

{antonio.nappa, zubair.rafique, juan.caballero}@imdea.org

†SUCCESS Lab, Texas A&M University

{z0x0427, guofei}@cse.tamu.edu

Presented by: Shasha Wen

2

Outline

Problem

Current ways and limitations

CyberProbe approach

Fingerprint generation

Scanner

Evaluation

Discussion and conclusion

3

Problem: Cybercrime

C&C server → control the malware

Exploit server → distribute the malware

Web server → monitor the operation

Redirector → leading fake clicks

...

Cybercrime operations: clickfraud, spam, theft, ransomware

Goal: identify the servers

4

Ways to detect the server

Passive: monitoring

Monitor protected hosts

Run malware in contained environment

Active: Honey client farms

Visit URLs, crawling

Only observe the servers involved → limited coverage → how to increase it? Internet-scale?

Slow, detect asynchronously → the server may already be dead

Focus on exploit servers

Achieving coverage is expensive

5

CyberProbe: approach

Send probes to remote hosts and examine their responses to determine whether the remote hosts are malicious.

What probes to send

Adversarial fingerprint

How to send the probes

scanning

Figure: CyberProbe architecture

Network traces + benign traffic → adversarial fingerprint generator → fingerprints

Fingerprints + port + target ranges → scanning → malicious servers

6

Problem definition

Network fingerprinting

Fingerprint: the type, version, configuration of networking software

Identify software at different layers

A fingerprint → one malicious family

e.g. C&C software; exploit kit

A family → multiple fingerprints

Problem definition

Host h; target hosts H; target family: x

Fingerprint: $FG_x = \langle P, f_P \rangle$

$P(h)$: sequence of probes to send to host $h$; $R_P$: the responses received

$f_P(R_P)$: true if $h \in x$
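The fingerprint definition can be read as a small data structure: a probe construction function paired with a classification function. A minimal Python sketch, assuming probes and responses are raw byte strings; the class name `Fingerprint`, the example probe, and the token `X-Example-Token` are hypothetical and do not come from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fingerprint:
    """FG_x = <P, f_P> for one family x (illustrative structure only)."""
    family: str
    probes: Callable[[str], List[bytes]]      # P(h): probes to send to host h
    classify: Callable[[List[bytes]], bool]   # f_P(R_P): True if h belongs to x


def example_probes(host: str) -> List[bytes]:
    # Hypothetical single HTTP probe whose Host header depends on the target.
    return [f"GET /index.php HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()]


def example_classify(responses: List[bytes]) -> bool:
    # Hypothetical signature: a distinctive token appears in some response.
    return any(b"X-Example-Token" in r for r in responses)


fg = Fingerprint("example-family", example_probes, example_classify)
```

A scanner would call `fg.probes(h)` for each target, send the probes, and pass the collected responses to `fg.classify`.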

7

Fingerprint generation: Overview

Framework differs from other fingerprint generation approaches (FiG)

Minimize traffic

produce inconspicuous probes

Replay observed requests

Network signature → classification function fP

8

Fingerprint generation: RRP[1] extraction

Protocol feature

Protocol signature capture keywords in early part of a message

e.g. GET or POST in HTTP

Unknown → transport protocol

Filter

Endpoint is one of top 100,000 Alexa domains → benign

RRPs with identical requests → avoid replaying the same request

[1] RRP: request response pairs
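The extraction and filtering steps can be sketched as follows. This is an illustration under assumptions, not the tool's code: the `RRP` tuple, the keyword list, and the choice to deduplicate on the exact request bytes are all hypothetical.

```python
from typing import List, NamedTuple, Set


class RRP(NamedTuple):
    endpoint: str    # domain or IP address of the server
    transport: str   # "tcp" or "udp"
    request: bytes
    response: bytes


HTTP_KEYWORDS = (b"GET ", b"POST ")


def protocol_of(rrp: RRP) -> str:
    # Protocol signature: look for a keyword in the early part of the message.
    if rrp.request.startswith(HTTP_KEYWORDS):
        return "http"
    return rrp.transport  # unknown application protocol -> transport protocol


def filter_rrps(rrps: List[RRP], benign_domains: Set[str]) -> List[RRP]:
    seen: Set[bytes] = set()
    kept: List[RRP] = []
    for rrp in rrps:
        if rrp.endpoint in benign_domains:  # e.g. a top Alexa domain
            continue
        if rrp.request in seen:             # identical request already kept
            continue
        seen.add(rrp.request)
        kept.append(rrp)
    return kept
```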


9

Fingerprint generation: Replay

Replay request to every malicious endpoint

Identify requests that lack replay protection

Requests replayed with a distinctive response

Use Virtual Private Network

Malware managers may notice

Replayed requests arrive in an incorrect order or are invalid

Independence

Requests that generate response without prior communication

10

Fingerprint generation: Replay

Filtering benign servers

RRPs with no response or an error response

Responses from a server to the replayed request and to a random request are similar

Replay the remaining RRPs twice more

Output

Replayed RRPs, excluding the original ones

Unique endpoints → seed servers
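The benign-server filter can be sketched as a single predicate over the two responses. The sketch below rests on several assumptions: an "error" means an HTTP 4xx or 5xx status, similarity is measured with `difflib.SequenceMatcher`, and the 0.9 threshold is arbitrary; none of these details come from the slides.

```python
from difflib import SequenceMatcher
from typing import Optional


def is_http_error(response: bytes) -> bool:
    # Assumption: an error is an HTTP status code in the 4xx or 5xx range.
    parts = response.split(b" ", 2)
    return (response.startswith(b"HTTP/") and len(parts) >= 2
            and parts[1][:1] in (b"4", b"5"))


def similar(a: bytes, b: bytes, threshold: float = 0.9) -> bool:
    # latin-1 maps every byte to one character, so decoding never fails.
    ratio = SequenceMatcher(None, a.decode("latin-1"), b.decode("latin-1")).ratio()
    return ratio >= threshold


def keep_rrp(replay_response: Optional[bytes],
             random_response: Optional[bytes]) -> bool:
    if not replay_response:
        return False  # no response to the replayed request
    if is_http_error(replay_response):
        return False  # the server answered with an error
    if random_response and similar(replay_response, random_response):
        return False  # same answer to anything: not distinctive
    return True
```

An RRP that passes this check would then be replayed twice more before it is kept.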

11

Fingerprint generation: Clustering

Cluster by request similarity

For HTTP

Requests have the same method, same path, similar parameters

For other protocols

Same transport protocol, size and content and sent to the same port

Probe construction function

One for each cluster

One of the probes in the cluster with value field replaced by TARGET and SET macros
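Clustering HTTP requests and turning one cluster member into a probe template might look like the sketch below. It assumes that "similar parameters" means the same set of parameter names, and that the TARGET macro stands for the target's address; the SET macro is not modelled. Both assumptions and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import parse_qsl, urlsplit

Request = Tuple[str, str]  # (method, URL path with query string)


def cluster_key(method: str, url: str):
    parts = urlsplit(url)
    names = frozenset(n for n, _ in parse_qsl(parts.query, keep_blank_values=True))
    return (method, parts.path, names)  # same method, same path, same parameter names


def cluster_requests(requests: List[Request]) -> List[List[Request]]:
    clusters: Dict[tuple, List[Request]] = defaultdict(list)
    for method, url in requests:
        clusters[cluster_key(method, url)].append((method, url))
    return list(clusters.values())


def probe_template(request: Request) -> str:
    method, url = request
    # The Host value is replaced by the TARGET macro.
    return f"{method} {url} HTTP/1.1\r\nHost: TARGET\r\n\r\n"


def build_probe(template: str, target_ip: str) -> bytes:
    # Probe construction: expand the macro for one concrete target.
    return template.replace("TARGET", target_ip).encode()
```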

12

Fingerprint generation: Signature generation

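Find the distinctive tokens: coverage > 0.4 and fp < $10^{-9}$

$$\text{coverage} = \frac{\text{responses in the cluster that contain the token}}{\text{total responses in the cluster}}$$

$$\text{fp} = \frac{\text{responses in benign traffic that contain the token}}{\text{total responses in benign traffic}}$$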
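The two ratios and the thresholds translate directly into code. A minimal sketch, assuming tokens are byte substrings and that candidate tokens are supplied by the caller; how candidates are extracted is not covered here, and the function names are hypothetical.

```python
from typing import List


def fraction_containing(token: bytes, responses: List[bytes]) -> float:
    # Share of responses that contain the token (0.0 for an empty corpus).
    if not responses:
        return 0.0
    return sum(token in r for r in responses) / len(responses)


def distinctive_tokens(candidates: List[bytes],
                       cluster_responses: List[bytes],
                       benign_responses: List[bytes],
                       min_coverage: float = 0.4,
                       max_fp: float = 1e-9) -> List[bytes]:
    return [t for t in candidates
            if fraction_containing(t, cluster_responses) > min_coverage   # coverage
            and fraction_containing(t, benign_responses) < max_fp]        # fp
```

With a benign corpus of fewer than a billion responses, an fp threshold of 10^-9 means the token must not appear in benign traffic at all.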

13

Scanning overview

Target ranges

Internet-wide: full, unreserved, allocated, BGP

Localized-reduced: the BGP routes that contain a seed's IP address

Localized-extended: extract the route description

Scan

Scan in random order

Whitelisting: exclude certain ranges; 512MB bit array

Multiple scanners iterate over the targets
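One bit per IPv4 address gives 2^32 bits, which is 2^29 bytes or 512 MiB, matching the bit array size on the slide. The sketch below shows the idea on a small range so it stays cheap to run; the class and function names are hypothetical, and `random.shuffle` stands in for whatever ordering the real scanner uses.

```python
import ipaddress
import random
from typing import List, Optional


class BitArray:
    def __init__(self, bits: int):
        self.data = bytearray((bits + 7) // 8)  # 2**32 bits -> 512 MiB

    def set(self, i: int) -> None:
        self.data[i >> 3] |= 1 << (i & 7)

    def test(self, i: int) -> bool:
        return bool(self.data[i >> 3] & (1 << (i & 7)))


def scan_order(target: str, excluded: List[str],
               seed: Optional[int] = None) -> List[str]:
    net = ipaddress.ip_network(target)
    base = int(net.network_address)
    skip = BitArray(net.num_addresses)
    for rng in excluded:                       # mark excluded addresses
        for addr in ipaddress.ip_network(rng):
            if addr in net:
                skip.set(int(addr) - base)
    order = [base + i for i in range(net.num_addresses) if not skip.test(i)]
    random.Random(seed).shuffle(order)         # scan in random order
    return [str(ipaddress.ip_address(a)) for a in order]
```

For example, `scan_order("192.0.2.0/28", ["192.0.2.8/30"])` returns the 12 addresses outside the excluded /30 in shuffled order.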

14

Scanning: Horizontal Scanner

Scanner → target: SYN

Target → scanner: SYNACK (mark the target alive)

Scanner → target: RST

No retransmission

Sender

Raw sockets

Initialization: buffer pre-filled with the IP and TCP headers

Rate limiting: inter-probe sleeping time

Receiver

Catch SYNACK packets

Keep listening after the sender completes

Check the validity and log the target IP
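The sender's pre-filled TCP header and its rate limit can be illustrated without sending any packets. The sketch below builds a 20-byte TCP SYN header with the standard checksum over the IPv4 pseudo-header; the window size, function names, and the simple `1 / rate` sleep are assumptions, and the raw-socket send itself (which needs elevated privileges) is left out.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    # Standard 16-bit one's-complement checksum.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len))


def build_syn(src_ip: str, dst_ip: str, src_port: int, dst_port: int, seq: int) -> bytes:
    # Fields: ports, seq, ack, data offset (5 words), flags (SYN), window, checksum, urgent.
    header = struct.pack("!HHLLBBHHH", src_port, dst_port, seq, 0,
                         5 << 4, 0x02, 65535, 0, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(header)) + header)
    return header[:16] + struct.pack("!H", csum) + header[18:]


def inter_probe_sleep(probes_per_second: float) -> float:
    # Rate limiting: time to sleep between two consecutive probes.
    return 1.0 / probes_per_second
```

Only the destination address, destination port, and checksum change from one target to the next, which is why the rest of the buffer can be filled once at initialization.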

15

Scanning: AppTCP & UDP Scanner

Probe construction function

First call: initialization, builds a default probe

P: pass the target IP and get the TCP or UDP payload

AppTCP scanner

Input: the list of live hosts produced by the horizontal scanner

Maximum size for a response

UDP scanner

Raw socket

Snort

Store traces and analyze offline
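The "maximum size for a response" setting can be sketched as a capped read loop. This illustrates the idea only; the function names, the 4096-byte chunk size, and the timeout are assumptions rather than details of the AppTCP scanner.

```python
import socket


def read_capped(sock: socket.socket, max_bytes: int, timeout: float = 2.0) -> bytes:
    # Read until the peer closes, the timeout fires, or max_bytes is reached.
    sock.settimeout(timeout)
    chunks, received = [], 0
    while received < max_bytes:
        try:
            chunk = sock.recv(min(4096, max_bytes - received))
        except socket.timeout:
            break
        if not chunk:
            break
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


def app_tcp_probe(ip: str, port: int, payload: bytes, max_bytes: int = 65536) -> bytes:
    # Hypothetical single probe: connect to a live host, send the payload,
    # and keep at most max_bytes of the response.
    with socket.create_connection((ip, port), timeout=2.0) as sock:
        sock.sendall(payload)
        return read_capped(sock, max_bytes)
```

Capping the response keeps a scan over many live hosts from being stalled or flooded by a server that streams large replies.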

16

Evaluation: fingerprint generation

23 fingerprints for 13 families

3 exploit server families, 10 malware families

One uses UDP, the rest use HTTP

17

Evaluation: Horizontal scanning

Test scan infrastructure and provider locality

4.1% - 57.5%; most seeds are located at cloud hosting providers

Difference on live hosts ← BGP advertised routes

Reusing the results

67%

localized

internet-wide

18

Evaluation: HTTP scanning

14151

unique

66(34 new)

128(72 new)

19

Evaluation: UDP scanning

Fingerprint: ZeroAccess botnet

getL command: request supernodes list

Scan further

7,884 → 15,943 supernodes

6,257 (39%) found, 61% unreachable

19% supernodes alive one day after the Internet-wide scan

Speed of active probing makes IP variability a small issue

20

Evaluation: server operations

BestAV: winwebsec, Urausy, ...

winwebsec → 2 fingerprints

Internet-wide scan reveals:

16 payment servers

11 C&C servers

In 4 providers

Provider      Payment servers   C&C servers
Provider A    6                 5
Provider B    9                 4
Provider C    -                 2
Provider D    1                 -

Cybercriminals host multiple servers on the same hosting provider

21

Discussion

Ethical consideration

Their probes are not malicious

Unsolicited nature of the probes

Explaining page → 106K IP addresses

Completeness

Fingerprints cannot be generated for some families

Scanning capacity

Complex protocol semantics

Replaying request may fail

...

22

Conclusion

Novel active probing approach for detecting malicious servers

Fast, cheap, easy to deploy

Identify different server types

Implement CyberProbe

One adversarial fingerprint generation

Three scanners

Internet-wide scan and localized scan

Build fingerprints for 13 families

Find 151 malicious servers and 7,881 P2P bots through 24 scans

4 times better than existing techniques

Reveal provider locality property

23

Q&A
