Learning Rules for Anomaly Detection of Hostile Network Traffic
Matthew V. Mahoney and Philip K. Chan
Florida Institute of Technology


Page 1: Learning Rules for Anomaly Detection of Hostile Network Traffic

Learning Rules for Anomaly Detection of Hostile Network Traffic

Matthew V. Mahoney and Philip K. Chan

Florida Institute of Technology

Page 2: Learning Rules for Anomaly Detection of Hostile Network Traffic

Problem: How to detect novel intrusions in network traffic given only a model of normal traffic

Normal web server request:
GET /index.html HTTP/1.0

Code Red II worm:
GET /default.ida?NNNNNNNNN…

Page 3: Learning Rules for Anomaly Detection of Hostile Network Traffic

What has been done

Firewalls
  Can’t block attacks on open ports (web, mail, DNS)

Signature detection (SNORT, BRO)
  Hand-coded rules (search for “default.ida?NNN”)
  Can’t detect new attacks

Anomaly detection (eBayes, ADAM, SPADE)
  Learn rules from normal traffic for low-level protocols (IP, TCP, ICMP)
  But application protocols (HTTP, mail) are too hard to model

Page 4: Learning Rules for Anomaly Detection of Hostile Network Traffic

Learning Rules for Anomaly Detection (LERAD)

Association mining (APRIORI, etc.) learns rules with high support and confidence for one value

LERAD learns rules with high support (n) and a small set of allowed values (r)

Any value seen at least once in training is allowed

If port = 80 and word1 = “GET” then word3 ∈ {“HTTP/1.0”, “HTTP/1.1”} (r = 2)
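As a minimal sketch, a LERAD-style rule pairs an antecedent (attribute = value conditions) with a set of allowed values for one consequent attribute; the class and method names below are illustrative, not from the paper's implementation:

```python
# Sketch of a LERAD-style rule: if every antecedent attribute matches, the
# consequent attribute must take one of the values seen in training.
# Names (Rule, matches, violated_by) are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Rule:
    antecedent: dict                            # e.g. {"port": "80", "word1": "GET"}
    consequent: str                             # e.g. "word3"
    allowed: set = field(default_factory=set)   # values seen in training (size r)
    n: int = 0                                  # support: training tuples matching the antecedent

    def matches(self, tup):
        return all(tup.get(a) == v for a, v in self.antecedent.items())

    def violated_by(self, tup):
        return self.matches(tup) and tup[self.consequent] not in self.allowed

rule = Rule({"port": "80", "word1": "GET"}, "word3",
            {"HTTP/1.0", "HTTP/1.1"}, n=100)
normal = {"port": "80", "word1": "GET", "word3": "HTTP/1.1"}
attack = {"port": "80", "word1": "GET", "word3": "NNNN..."}
print(rule.violated_by(normal))  # False
print(rule.violated_by(attack))  # True
```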

Page 5: Learning Rules for Anomaly Detection of Hostile Network Traffic

LERAD Steps

1. Generate candidate rules
2. Remove redundant rules
3. Remove poorly trained rules

LERAD is fast because steps 1-2 can be done on a small random sample (~100 tuples)

Page 6: Learning Rules for Anomaly Detection of Hostile Network Traffic

Step 1. Generate Candidate Rules
Suggested by matching attribute values

Sample  Port  Word1  Word2        Word3
S1      80    GET    /index.html  HTTP/1.0
S2      80    GET    /banner.gif  HTTP/1.0
S3      25    HELO   pascal       MAIL

S1 and S2 suggest:
  port = 80
  if port = 80 then word1 = “GET”
  if word3 = “HTTP/1.0” and word1 = “GET” then port = 80

S2 and S3 suggest no rules
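The pairwise suggestion step above can be sketched as follows: attributes on which two sampled tuples agree propose rules "if <some matching attributes> then <another matching attribute>". The subset enumeration here is a simplified assumption, not the paper's exact procedure:

```python
# Sketch of candidate-rule generation from two matching sample tuples.
# Each candidate is (antecedent dict, consequent attribute); the enumeration
# strategy is a simplified assumption for illustration.
import itertools

def candidate_rules(t1, t2, max_antecedent=2):
    matching = [a for a in t1 if t1[a] == t2[a]]
    rules = []
    for consequent in matching:
        others = [a for a in matching if a != consequent]
        for k in range(min(len(others), max_antecedent) + 1):
            for ante in itertools.combinations(others, k):
                rules.append((dict((a, t1[a]) for a in ante), consequent))
    return rules

s1 = {"port": "80", "word1": "GET", "word2": "/index.html", "word3": "HTTP/1.0"}
s2 = {"port": "80", "word1": "GET", "word2": "/banner.gif", "word3": "HTTP/1.0"}
s3 = {"port": "25", "word1": "HELO", "word2": "pascal", "word3": "MAIL"}

print(len(candidate_rules(s1, s2)) > 0)  # True: port, word1, word3 match
print(candidate_rules(s2, s3))           # []: no matching attributes
```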

Page 7: Learning Rules for Anomaly Detection of Hostile Network Traffic

Step 2. Remove Redundant Rules
Favor rules with higher score = n/r

Sample  Port  Word1  Word2        Word3
S1      80    GET    /index.html  HTTP/1.0
S2      80    GET    /banner.gif  HTTP/1.0
S3      25    HELO   pascal       MAIL

Rule 1: if port = 80 then word1 = “GET” (n/r = 2/1)
Rule 2: if word2 = “/index.html” then word1 = “GET” (n/r = 1/1)

Rule 2 has a lower score and covers no new values, so it is redundant
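One simplified reading of this pruning step: visit rules in decreasing n/r order and drop any rule whose allowed (attribute, value) pairs are all covered by rules already kept. The rule representation below is an assumption for illustration:

```python
# Sketch of redundancy removal: keep rules in decreasing n/r order, dropping
# any rule that covers no (attribute, value) pair beyond those already kept.
# A simplified reading of the slide, not the exact algorithm.
def remove_redundant(rules):
    # each rule: dict with keys "name", "n", and "values" (set of (attr, value))
    kept, covered = [], set()
    for rule in sorted(rules, key=lambda r: r["n"] / len(r["values"]), reverse=True):
        if not rule["values"] <= covered:   # does it cover anything new?
            kept.append(rule)
            covered |= rule["values"]
    return kept

rule1 = {"name": "if port=80 then word1", "n": 2, "values": {("word1", "GET")}}
rule2 = {"name": "if word2=/index.html then word1", "n": 1, "values": {("word1", "GET")}}
print([r["name"] for r in remove_redundant([rule1, rule2])])
# ['if port=80 then word1'] -- rule2 is redundant
```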

Page 8: Learning Rules for Anomaly Detection of Hostile Network Traffic

Step 3. Remove Poorly Trained Rules
Rules with violations in a validation set will probably generate false alarms

[Figure: r (number of allowed values) over the Train, Validate, and Test phases. A fully trained rule stops acquiring new values before validation and is kept; an incompletely trained rule is still acquiring values during validation and is removed.]
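A minimal sketch of this step, with illustrative names and a tuple-based rule encoding assumed for brevity: any rule violated on an attack-free validation set is dropped, since it is evidently still acquiring values and would likely raise false alarms in testing.

```python
# Sketch of step 3 (names and rule encoding are illustrative assumptions):
# drop any rule violated on an attack-free validation set.
def violated(rule, tup):
    ante, consequent, allowed = rule
    return all(tup.get(a) == v for a, v in ante.items()) and tup[consequent] not in allowed

def prune(rules, validation):
    return [r for r in rules if not any(violated(r, t) for t in validation)]

r1 = ({"port": "80"}, "word1", {"GET"})          # fully trained
r2 = ({"port": "80"}, "word3", {"HTTP/1.0"})     # incomplete: never saw HTTP/1.1
val = [{"port": "80", "word1": "GET", "word3": "HTTP/1.1"}]
print(prune([r1, r2], val))  # keeps only r1
```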

Page 9: Learning Rules for Anomaly Detection of Hostile Network Traffic

Attribute Sets

Inbound client packets (PKT)
  IP packet cut into 24 16-bit fields

Inbound client TCP streams (TCP)
  Date, time
  Source and destination IP addresses and ports
  Length, duration
  TCP flags
  First 8 application words

Anomaly score = tn/r summed over violated rules, where t = time since the rule’s previous violation
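The score above can be sketched directly; variable names are illustrative, and the bookkeeping of last-violation times is an assumed implementation detail:

```python
# Sketch of the anomaly score: sum of t*n/r over the rules an event violates,
# where t is the time since that rule was last violated. Rarely violated rules
# (large t) and well-supported, narrow rules (large n/r) score higher.
def anomaly_score(violations, now, last_violation):
    # violations: list of (rule_id, n, r) triples violated by this event
    # last_violation: dict rule_id -> time of the rule's previous violation
    score = 0.0
    for rule_id, n, r in violations:
        t = now - last_violation.get(rule_id, 0.0)
        score += t * n / r
        last_violation[rule_id] = now   # this violation becomes the new baseline
    return score

last = {"rule1": 90.0}
print(anomaly_score([("rule1", 100, 2)], now=100.0, last_violation=last))
# 500.0  (t = 10, n/r = 50)
```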

Page 10: Learning Rules for Anomaly Detection of Hostile Network Traffic

Experimental Evaluation

1999 DARPA/Lincoln Laboratory Intrusion Detection Evaluation (IDEVAL)
  Train on week 3 (no attacks)
  Test on inside sniffer weeks 4-5 (148 simulated probe, DoS, and R2L attacks)
  Top participants in 1999 detected 40-55% of attacks at 10 false alarms per day

2002 university departmental server traffic (UNIV)
  623 hours over 10 weeks
  Train and test on adjacent weeks (some unlabeled attacks in training data)
  6 known real attacks (some with multiple instances)

Page 11: Learning Rules for Anomaly Detection of Hostile Network Traffic

Experimental Results
Percent of attacks detected at 10 false alarms per day

[Bar chart, 0-70% scale: detection rates for IDEVAL PKT, IDEVAL TCP, UNIV PKT, and UNIV TCP]

Page 12: Learning Rules for Anomaly Detection of Hostile Network Traffic

UNIV Detection/False Alarm Tradeoff

Percent of attacks detected at 0 to 40 false alarms per day

[Line chart: percent of attacks detected (0-100%) vs. false alarms per day per detector (0-40), with curves for Comb (combined), TCP, and PKT]

Page 13: Learning Rules for Anomaly Detection of Hostile Network Traffic

Run Time Performance
(750 MHz PC, Windows Me)

Preprocess 9 GB IDEVAL traffic: 7 min.
Train + test: < 2 min. (all systems)

Page 14: Learning Rules for Anomaly Detection of Hostile Network Traffic

Anomalies are due to bugs and idiosyncrasies in hostile code

No obvious way to distinguish them from benign events

UNIV attack        How detected
Inside port scan   HEAD / HTTP\1.0 (backslash)
Code Red II worm   TCP segmentation after GET
Nimda worm         host: www
Scalper worm       host: unknown
Proxy scan         host: www.yahoo.com
DNS version probe  (not detected)

Page 15: Learning Rules for Anomaly Detection of Hostile Network Traffic

Contributions

LERAD differs from association mining in that the goal is to find rules for anomaly detection: a small set of allowed values

LERAD is fast because rules are generated from a small sample

Testing is fast (50-75 rules)

LERAD improves intrusion detection
  Models application protocols
  Detects more attacks

Page 16: Learning Rules for Anomaly Detection of Hostile Network Traffic

Thank you