Network-Level Spam Detection

1

Nick Feamster, Georgia Tech

2

Spam: More than Just a Nuisance

• 95% of all email traffic
  – Image and PDF spam (PDF spam ~12%)

• As of August 2007, one in every 87 emails constituted a phishing attack

• Targeted attacks on the rise
  – 20k-30k unique phishing attacks per month

Source: CNET (January 2008), APWG

3

Detection

• Keep unwanted traffic from reaching a user’s inbox by distinguishing spam from ham

• Question: What features best differentiate spam from legitimate mail?
  – Content-based filtering: What is in the mail?
  – IP address of sender: Who is the sender?
  – Behavioral features: How is the mail sent?

4

Content-Based Detection: Problems

• Low cost of evasion: Spammers can easily adjust and change features of an email’s content

• Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc.

• High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated

5

Another Approach: IP Addresses

• Problem: IP addresses are ephemeral

• Every day, 10% of senders are from previously unseen IP addresses

• Possible causes
  – Dynamic addressing
  – New infections
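The churn figure above can be measured directly from sender logs. The following is a minimal sketch, assuming a hypothetical log format of one list of sender IP addresses per day; the function name and the sample addresses are illustrative and not taken from the talk.

```python
def unseen_sender_fraction(daily_senders):
    """For each day, return the fraction of sender IPs not seen on any earlier day.

    daily_senders: list of (day_label, list_of_ip_strings) in chronological order.
    """
    seen = set()
    fractions = {}
    for day, ips in daily_senders:
        today = set(ips)
        new_today = today - seen
        fractions[day] = len(new_today) / len(today) if today else 0.0
        seen |= today
    return fractions


# Hypothetical two-day log using documentation-reserved addresses.
example_log = [
    ("day1", ["192.0.2.1", "192.0.2.2", "198.51.100.7"]),
    ("day2", ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"]),
]
example_fractions = unseen_sender_fraction(example_log)
```

The first day is always reported as entirely new, so a real measurement would discard a warm-up period before reading off the fraction.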

6

Idea: Network-Based Detection

• Filter email based on how it is sent, in addition to simply what is sent.

• Network-level properties are less malleable
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
  – Network location of sender and receiver
  – Set of target recipients

7

Behavioral Blacklisting

• Idea: Blacklist sending behavior (“Behavioral Blacklisting”)
  – Identify sending patterns commonly used by spammers

• Intuition: Much more difficult for a spammer to change the technique by which mail is sent than it is to change the content

8

Improving Classification

• Lower overhead

• Faster detection

• Better robustness (i.e., to evasion, dynamism)

• Use additional features and combine for more robust classification
  – Temporal: interarrival times, diurnal patterns
  – Spatial: sending patterns of groups of senders
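As an illustration of the temporal features named above, the sketch below derives interarrival times and a per-hour message count from one sender's timestamps. It assumes timestamps are Unix seconds in UTC; the function names and the returned dictionary keys are hypothetical, not part of any system described here.

```python
from statistics import mean, pstdev


def interarrival_times(timestamps):
    """Gaps, in seconds, between consecutive messages from one sender."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def hourly_counts(timestamps):
    """Messages per hour of day (UTC), a coarse view of the diurnal pattern."""
    counts = [0] * 24
    for t in timestamps:
        counts[int(t // 3600) % 24] += 1
    return counts


def temporal_features(timestamps):
    """Bundle the gap statistics and the hourly histogram for one sender."""
    gaps = interarrival_times(timestamps)
    return {
        "mean_gap": mean(gaps) if gaps else None,
        "gap_stdev": pstdev(gaps) if gaps else None,
        "hourly_counts": hourly_counts(timestamps),
    }


# Hypothetical Unix timestamps (seconds) for one sender.
example_temporal = temporal_features([0, 10, 30, 3600])
```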

9

SNARE: Automated Sender Reputation

• Goal: Sender reputation from a single packet? (or at least as little information as possible)
  – Lower overhead
  – Faster classification
  – Less malleable

• Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?

10

Sender-Receiver Geodesic Distance

90% of legitimate messages travel 2,200 miles or less
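One way to compute a sender-receiver distance like the one above is the haversine great-circle formula. This is a sketch under assumptions: that sender and receiver have already been mapped to latitude and longitude (for example by an IP geolocation database, not shown), and that a spherical Earth with a mean radius of 3958.8 miles is accurate enough. The talk does not say how the distance was actually computed.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Illustrative coordinates: Atlanta to San Francisco.
example_distance = geodesic_miles(33.749, -84.388, 37.7749, -122.4194)
```

The resulting distance would be one input feature to the classifier rather than a filtering rule on its own.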

11

Density of Senders in IP Space

For spammers, k nearest senders are much closer in IP space
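The density feature above can be sketched as the average numeric distance from a sender to its k nearest neighboring senders. The code assumes that distance in IP space means the absolute difference between IPv4 addresses treated as 32-bit integers; that definition, and the function name, are assumptions made for illustration.

```python
import ipaddress


def avg_knn_distance(sender_ip, other_ips, k):
    """Average numeric distance from sender_ip to its k nearest other senders.

    Returns None when there are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_ips
        if ip != sender_ip
    )
    nearest = distances[:k]
    if not nearest:
        return None
    return sum(nearest) / len(nearest)


# Hypothetical senders drawn from documentation-reserved ranges.
example_density = avg_knn_distance(
    "192.0.2.10", ["192.0.2.11", "192.0.2.14", "198.51.100.1"], k=2
)
```

A small value means the sender sits in a densely populated region of sending addresses, which is the pattern the slide attributes to spammers.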

12

Other Network-Level Features

• Time-of-day at sender

• Upstream AS of sender

• Message size (and variance)

• Number of recipients (and variance)

13

Combining Features

• Put features into the RuleFit classifier

• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider

• Using only network-level features

• Completely automated
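The evaluation procedure above, k-fold cross validation, can be sketched in plain Python. RuleFit itself is not reproduced here: the stand-in model is a single-feature threshold placed midway between the class means, chosen only to keep the example self-contained. All function names and the synthetic data are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (requires n >= k)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def train_threshold(xs, ys):
    """Stand-in model (not RuleFit): midpoint between the two class means.

    Assumes the training data contains both classes.
    """
    spam = [x for x, y in zip(xs, ys) if y == 1]
    ham = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic, cleanly separable data: 20 ham (label 0) and 20 spam (label 1).
example_values = ([100 + 10 * i for i in range(20)]
                  + [5000 + 10 * i for i in range(20)])
example_labels = [0] * 20 + [1] * 20
example_accuracy = cross_validate(example_values, example_labels,
                                  train_threshold, predict_threshold, k=10)
```

Because every message is held out exactly once, the averaged accuracy uses all of the labeled data without ever testing on a message the model was trained on.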

14

Cluster-Based Features

• Construct a behavioral fingerprint for each sender

• Cluster senders with similar fingerprints

• Filter new senders that map to existing clusters

15

Identifying Invariants

[Figure: A known spammer at IP address 76.17.114.xxx sends spam to domain1.com, domain2.com, and domain3.com. After a DHCP reassignment or a new infection, an unknown sender at IP address 24.99.146.xxx sends spam to the same three domains. Clustering on sending behavior shows that the two behavioral fingerprints are similar.]

16

Building the Classifier: Clustering

• Feature: Distribution of email sending volumes across recipient domains

• Clustering Approach
  – Build initial seed list of bad IP addresses
  – For each IP address, compute feature vector: volume per domain per time interval
  – Collapse into a single IP x domain matrix
  – Compute clusters
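The matrix-building steps above can be sketched as follows. The code assumes that collapsing means summing each sender's volume over all time intervals, and it adds an optional row normalization so that each row is a distribution across recipient domains. The record format and function names are hypothetical, and the clustering step itself is left out.

```python
from collections import defaultdict


def build_ip_domain_matrix(records):
    """Collapse (ip, domain, interval, count) records into an IP x domain matrix.

    Assumption: the time axis is collapsed by summing counts over intervals.
    Returns (ips, domains, matrix), where matrix[i][j] is the total volume
    sent by ips[i] to domains[j].
    """
    totals = defaultdict(lambda: defaultdict(int))
    domains = set()
    for ip, domain, _interval, count in records:
        totals[ip][domain] += count
        domains.add(domain)
    domain_order = sorted(domains)
    ips = sorted(totals)
    matrix = [[totals[ip][d] for d in domain_order] for ip in ips]
    return ips, domain_order, matrix


def normalize_rows(matrix):
    """Turn each row into a distribution over domains (all-zero rows stay zero)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


# Hypothetical records: (sender IP, recipient domain, time interval, messages).
example_records = [
    ("192.0.2.1", "domain1.com", 0, 2),
    ("192.0.2.1", "domain1.com", 1, 3),
    ("192.0.2.1", "domain2.com", 0, 5),
    ("192.0.2.2", "domain2.com", 0, 4),
]
example_ips, example_domains, example_matrix = build_ip_domain_matrix(example_records)
example_distribution = normalize_rows(example_matrix)
```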

17

Clustering: Fingerprint

• For each cluster, compute fingerprint vector:

• New IPs will be compared to this “fingerprint”

IP x IP Matrix: Intensity indicates pairwise similarity
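A minimal sketch of comparing a new IP address to cluster fingerprints follows. It assumes the fingerprint is the element-wise average of the cluster members' rows and that similarity is cosine similarity; the slide's own fingerprint formula did not survive extraction, so both choices are assumptions, as are the function names.

```python
import math


def cluster_fingerprint(rows):
    """Element-wise average of the member rows of one cluster (assumed form)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_cluster_score(vector, fingerprints):
    """Highest similarity between a new sender's vector and any fingerprint."""
    return max(cosine_similarity(vector, f) for f in fingerprints)


# Hypothetical cluster of two known senders over three recipient domains.
example_fingerprint = cluster_fingerprint([[4, 0, 2], [2, 0, 4]])
example_score = best_cluster_score([1, 0, 1], [example_fingerprint])
```

Cosine similarity ignores total volume, so a low-volume new sender can still match a cluster if its mail is spread across domains in the same proportions.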

18

Evaluation

• Emulate the performance of a system that could observe sending patterns across many domains
  – Build clusters/train on given time interval

• Evaluate classification
  – Relative to labeled logs
  – Relative to IP addresses that were eventually listed

19

Early Detection Results

• Compare SpamTracker scores on “accepted” mail to the Spamhaus database
  – About 15% of accepted mail was later determined to be spam
  – Can SpamTracker catch this?

• Of 620 emails that were accepted, but sent from IPs that were blacklisted within one month
  – 65 emails had a score larger than 5 (85th percentile)

20

Small Samples Work Well

Relatively small samples can achieve low false positive rates

21

Extensions to Phishing

• Goal: Detect phishing attacks based on behavioral properties of hosting site (vs. static properties of URL)

• Features
  – URL regular expressions
  – Registration time of domain
  – Uptime of hosting site
  – DNS TTL and redirections

• Next time: Discussion of phishing detection/integration

22

Integration with SMITE

• Sensors
  – Extract network features from traffic
  – IP addresses
  – Combine with auxiliary data (routing, time, etc.)

• Algorithms
  – Clustering algorithm to identify behavioral fingerprints
  – Learning algorithm to classify based on multiple features

• Correlation
  – Clusters formed by aggregating sending behavior observed across multiple sensors
  – Various features also require input from data collected across collections of IP addresses

23

Summary

• Spam increasing, spammers becoming agile
  – Content filters are falling behind
  – IP-based blacklists are evadable

• Up to 30% of spam not listed in common blacklists at receipt. ~20% remains unlisted after a month

• Complementary approach: behavioral blacklisting based on network-level features
  – Blacklist based on how messages are sent
  – SNARE: Automated sender reputation

• ~90% accuracy of existing with lightweight features
  – Cluster-based features to improve accuracy/reduce need for labelled data

24

26

Improvements

• Accuracy
  – Synthesizing multiple classifiers
  – Incorporating user feedback
  – Learning algorithms with bounded false positives

• Performance
  – Caching/Sharing
  – Streaming

• Security
  – Learning in adversarial environments

27

Sampling: Training Time

28

Dynamism: Accuracy over Time
