Learning Rules from System Call Arguments and Sequences for Anomaly Detection

Gaurav Tandon and Philip Chan
Department of Computer Sciences
Florida Institute of Technology



Overview

• Related work in system call sequence-based systems

• Problem statement: can system call arguments, used as attributes, improve anomaly detection algorithms?

• Approach
  - LERAD (a conditional rule learning algorithm)
  - Variants of attributes

• Experimental evaluation

• Conclusions and future work


Related Work

• tide (time-delay embedding): Forrest et al., 1996

• stide (sequence time-delay embedding): Hofmeyr et al., 1999

• t-stide (stide with frequency threshold): Warrender et al., 1999

• Variable-length sequence-based techniques (Wespi et al., 1999, 2000; Jiang et al., 2001)

Common drawback: too many false alarms!
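For reference, the sequence-based baselines can be sketched in Python as below. This is a minimal illustration, not the original implementations: it assumes stide stores every length-k window of normal traces and flags any window never seen in training, while t-stide additionally flags windows that were rare in training. The function names, the default window length of 6 and the threshold value are assumptions, and the locality-frame counting used in practice is omitted.

```python
from collections import Counter


def windows(trace, k):
    """All contiguous length-k windows of a system call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def train(traces, k=6):
    """Count every length-k window seen in normal training traces."""
    counts = Counter()
    for trace in traces:
        counts.update(windows(trace, k))
    return counts


def stide_mismatches(counts, trace, k=6):
    """stide: a window never seen in training is a mismatch."""
    return [w for w in windows(trace, k) if w not in counts]


def t_stide_anomalies(counts, trace, k=6, threshold=0.001):
    """t-stide: also flag windows whose training frequency is below a
    (hypothetical) threshold; unseen windows have frequency 0."""
    total = sum(counts.values())
    return [w for w in windows(trace, k) if counts[w] / total < threshold]


normal = [["open", "read", "close", "open", "read", "close"]]
counts = train(normal, k=3)
print(stide_mismatches(counts, ["open", "read", "write"], k=3))
print(t_stide_anomalies(counts, ["read", "close", "open"], k=3, threshold=0.3))
```

In this toy run, the window (read, close, open) was seen once out of four windows, so stide accepts it while t-stide with a 0.3 threshold flags it as rare.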


Problem Statement

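Current models: system call sequences

What else can we model?

System call arguments, e.g.

open(“/etc/passwd”)

open(“/users/readme”)

A sequence-based model sees both calls only as open(); the argument reveals that the first one touches the password file.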


Approach

• Models based upon system calls

• 3 sets of attributes:
  - system call sequence
  - system call arguments
  - system call arguments + sequence

• Adopt a rule learning approach - Learning Rules for Anomaly Detection (LERAD)


Learning Rules for Anomaly Detection (LERAD) [Mahoney and Chan, 2003]

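$$A = a,\ B = b,\ \ldots \Rightarrow X \in \{x_1, x_2, \ldots\}$$

A, B, and X are attributes

a, b, $x_1$, $x_2$ are values of the corresponding attributes

$$p = \Pr(X \notin \{x_1, x_2, \ldots\} \mid A = a, B = b, \ldots) = r/n$$

p - probability of observing a value not in the consequent

r - cardinality of the set $\{x_1, x_2, \ldots\}$ in the consequent

n - number of samples that satisfy the antecedent

$$\text{AnomalyScore} = 1/p = n/r$$

Example: $\mathrm{SC} = \mathrm{close}(),\ \mathrm{Arg1} = 123 \Rightarrow \mathrm{Arg2} \in \{\mathrm{abc}, \mathrm{xyz}\}$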
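A rule of this form and its score can be sketched in Python as follows. The `Rule` class and its field names are hypothetical; the sketch only implements the estimates defined above, p = r/n, and a score of 1/p = n/r when a sample violates the rule. It trains the close() example rule on a few made-up samples.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical LERAD-style rule: antecedent conditions => target in allowed."""
    antecedent: dict                           # attribute -> required value
    target: str                                # consequent attribute X
    allowed: set = field(default_factory=set)  # {x1, x2, ...}
    n: int = 0                                 # samples satisfying the antecedent

    def applies(self, sample: dict) -> bool:
        return all(sample.get(a) == v for a, v in self.antecedent.items())

    def train(self, sample: dict) -> None:
        """Update n and the consequent set from one normal training sample."""
        if self.applies(sample):
            self.n += 1
            self.allowed.add(sample[self.target])

    @property
    def p(self) -> float:
        """Probability of a value outside the consequent, estimated as r / n."""
        return len(self.allowed) / self.n

    def score(self, sample: dict) -> float:
        """1/p = n/r if the sample violates the rule, otherwise 0."""
        if self.applies(sample) and sample.get(self.target) not in self.allowed:
            return self.n / len(self.allowed)
        return 0.0


# Example rule from the slide: SC = close(), Arg1 = 123 => Arg2 in {abc, xyz}
rule = Rule({"SC": "close()", "Arg1": "123"}, "Arg2")
for s in [
    {"SC": "close()", "Arg1": "123", "Arg2": "abc"},
    {"SC": "close()", "Arg1": "123", "Arg2": "xyz"},
    {"SC": "close()", "Arg1": "123", "Arg2": "abc"},
    {"SC": "open()", "Arg1": "7", "Arg2": "abc"},  # antecedent not satisfied
]:
    rule.train(s)
print(rule.n, sorted(rule.allowed), rule.p)
print(rule.score({"SC": "close()", "Arg1": "123", "Arg2": "qqq"}))  # 3 / 2 = 1.5
```

Here n = 3 and r = 2, so a violation scores n/r = 1.5; rules with few distinct consequent values over many samples produce larger scores.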


Overview of LERAD

Rule generation involves 4 steps:

1. Generate candidate rules from a small training sample and associate probabilities with them

2. Apply a coverage test to minimize the rule set

3. Update the rules beyond the small training sample

4. Validate the rules on a separate validation set


Step 1a: Generate Candidate Rules

• Two samples are picked at random (say S1 and S2)

• The attributes on which they match (A, B and C) are picked in random order (say B, C, A)

• These attributes form rules with 0, 1 and 2 conditions in the antecedent: the first attribute (B) is the consequent, and the following ones (C, then A) are added one at a time as conditions

Rule 1: $\ast \Rightarrow B \in \{2\}$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$

Rule 3: $A = 1, C = 3 \Rightarrow B \in \{2\}$

Training data

Set            Sample   A   B   C   D
Random sample  S1       1   2   3   4
Random sample  S2       1   2   3   5
Random sample  S3       6   7   8   4
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5
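Candidate generation can be sketched in Python as below, following our reading of the slide: the matching attributes are shuffled, the first becomes the consequent, and the rest are added one at a time as conditions. `candidate_rules`, `FixedOrder` and the `max_conditions` parameter are hypothetical names; `FixedOrder` only forces the order B, C, A so that the output reproduces Rules 1-3.

```python
import random


def candidate_rules(s1, s2, max_conditions=2, rng=random):
    """Generate candidate rules from two samples (dicts: attribute -> value).

    Attributes on which s1 and s2 agree are put in random order; the first
    becomes the consequent and the following ones are added one at a time
    as antecedent conditions. Returns (antecedent, target, values) triples.
    """
    matching = [a for a in s1 if a in s2 and s1[a] == s2[a]]
    rng.shuffle(matching)
    if not matching:
        return []
    target, rest = matching[0], matching[1:]
    return [
        ({a: s1[a] for a in rest[:k]}, target, {s1[target]})
        for k in range(min(max_conditions, len(rest)) + 1)
    ]


class FixedOrder:
    """Stand-in for `random` that imposes a chosen order (for reproducibility)."""

    def __init__(self, order):
        self.order = order

    def shuffle(self, items):
        items.sort(key=self.order.index)


S1 = {"A": 1, "B": 2, "C": 3, "D": 4}
S2 = {"A": 1, "B": 2, "C": 3, "D": 5}
for antecedent, target, values in candidate_rules(S1, S2, rng=FixedOrder(["B", "C", "A"])):
    print(antecedent, "=>", target, values)
```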


Step 1b: Generate Candidate Rules

Training data

Set            Sample   A   B   C   D
Random sample  S1       1   2   3   4
Random sample  S2       1   2   3   5
Random sample  S3       6   7   8   4
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

Rule 1: $\ast \Rightarrow B \in \{2, 7\}$, $p = 2/3$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/2$

Rule 3: $A = 1, C = 3 \Rightarrow B \in \{2\}$, $p = 1/2$

• Values are added to each consequent using a subset of the training set (say S1-S3)

• Each rule is assigned an estimate $p = r/n$ of the probability that it is violated

• Rules are sorted in increasing order of p
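The estimates above can be reproduced with a short Python sketch. `fit` and the data layout are hypothetical; exact fractions are used so that p matches the slide values.

```python
from fractions import Fraction

# Random training subset from the slides (S1-S3), attributes A-D.
SUBSET = [
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S1
    {"A": 1, "B": 2, "C": 3, "D": 5},  # S2
    {"A": 6, "B": 7, "C": 8, "D": 4},  # S3
]

# Candidate rules from Step 1a: (name, antecedent, consequent attribute).
CANDIDATES = [
    ("Rule 1", {}, "B"),
    ("Rule 2", {"C": 3}, "B"),
    ("Rule 3", {"A": 1, "C": 3}, "B"),
]


def fit(rules, samples):
    """Collect each rule's consequent values over `samples`, estimate
    p = r / n, and return (name, values, p) sorted by increasing p."""
    fitted = []
    for name, antecedent, target in rules:
        matched = [s for s in samples if all(s[a] == v for a, v in antecedent.items())]
        values = {s[target] for s in matched}
        fitted.append((name, values, Fraction(len(values), len(matched))))
    return sorted(fitted, key=lambda rule: rule[2])  # stable: ties keep input order


for name, values, p in fit(CANDIDATES, SUBSET):
    print(name, sorted(values), p)
```

Rule 1 sees B = 2, 2, 7 over three samples (r = 2, n = 3), while Rules 2 and 3 each see B = 2 over two samples (r = 1, n = 2), which gives the ordering Rule 2, Rule 3, Rule 1.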


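Step 2: Coverage Test

Training data

Set            Sample   A   B   C   D   B covered by
Random sample  S1       1   2   3   4   Rule 2
Random sample  S2       1   2   3   5   Rule 2
Random sample  S3       6   7   8   4   Rule 1
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

• Obtain a minimal set of rules: taking rules in increasing order of p, a rule is kept only if it covers (correctly predicts) some value not already covered by an earlier rule

Rule 1: $\ast \Rightarrow B \in \{2, 7\}$, $p = 2/3$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/2$

Rule 3: $A = 1, C = 3 \Rightarrow B \in \{2\}$, $p = 1/2$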


Step 2: Coverage Test

Training data

Set            Sample   A   B   C   D   B covered by
Random sample  S1       1   2   3   4   Rule 2
Random sample  S2       1   2   3   5   Rule 2
Random sample  S3       6   7   8   4   Rule 1
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

• Obtain a minimal set of rules

Rule 1: $\ast \Rightarrow B \in \{2, 7\}$, $p = 2/3$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/2$

Rule 3 is removed: the only values it covers (B in S1 and S2) are already covered by Rule 2.
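The coverage test can be sketched in Python as below. We read "covers" as: the rule's antecedent holds for a sample and the sample's consequent value lies in the rule's consequent set; a rule is kept only if it covers at least one value that no lower-p rule has covered. This is our interpretation of the slides, and `coverage_test` is a hypothetical name.

```python
from fractions import Fraction

SAMPLES = {
    "S1": {"A": 1, "B": 2, "C": 3, "D": 4},
    "S2": {"A": 1, "B": 2, "C": 3, "D": 5},
    "S3": {"A": 6, "B": 7, "C": 8, "D": 4},
}

# Rules after Step 1, sorted by increasing p:
# (name, antecedent, consequent attribute, allowed values, p)
SORTED_RULES = [
    ("Rule 2", {"C": 3}, "B", {2}, Fraction(1, 2)),
    ("Rule 3", {"A": 1, "C": 3}, "B", {2}, Fraction(1, 2)),
    ("Rule 1", {}, "B", {2, 7}, Fraction(2, 3)),
]


def coverage_test(rules, samples):
    """Keep a rule only if it covers a (sample, attribute) value that no
    earlier (lower-p) rule has already covered."""
    covered = set()
    kept = []
    for name, antecedent, target, allowed, _p in rules:
        new = {
            (sid, target)
            for sid, s in samples.items()
            if all(s[a] == v for a, v in antecedent.items()) and s[target] in allowed
        }
        if new - covered:
            kept.append(name)
            covered |= new
    return kept


print(coverage_test(SORTED_RULES, SAMPLES))
```

Rule 2 covers B in S1 and S2, Rule 3 adds nothing new and is dropped, and Rule 1 is kept because it alone covers B = 7 in S3.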


Step 3: Updating rules beyond the training samples

Training data

Set            Sample   A   B   C   D
Random sample  S1       1   2   3   4
Random sample  S2       1   2   3   5
Random sample  S3       6   7   8   4
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

• Extend the rules to the entire training set minus the validation set (samples S1-S5): new consequent values are added and p is re-estimated

Rule 1: $\ast \Rightarrow B \in \{2, 7, 0\}$, $p = 3/5$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/3$
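The update can be sketched as a pass over the remaining training samples, growing each rule's consequent set and its count n. `update` and the state layout are hypothetical; the starting state is the one left by Steps 1 and 2.

```python
from fractions import Fraction

# Rule state after Steps 1-2 on S1-S3: name -> (antecedent, target, values, n).
STATE = {
    "Rule 1": ({}, "B", {2, 7}, 3),
    "Rule 2": ({"C": 3}, "B", {2}, 2),
}

# Remaining training samples (the validation sample S6 is held out).
REMAINING = [
    {"A": 1, "B": 0, "C": 9, "D": 5},  # S4
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S5
]


def update(state, samples):
    """Stream further training samples through the rules, growing each
    consequent set and n; return name -> (values, p = r / n)."""
    values = {name: set(rule[2]) for name, rule in state.items()}  # copies
    counts = {name: rule[3] for name, rule in state.items()}
    for s in samples:
        for name, (antecedent, target, _, _) in state.items():
            if all(s[a] == v for a, v in antecedent.items()):
                values[name].add(s[target])
                counts[name] += 1
    return {name: (values[name], Fraction(len(values[name]), counts[name])) for name in state}


for name, (vals, p) in update(STATE, REMAINING).items():
    print(name, sorted(vals), p)
```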


Step 4: Validating rules

Training data

Set            Sample   A   B   C   D
Random sample  S1       1   2   3   4
Random sample  S2       1   2   3   5
Random sample  S3       6   7   8   4
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

• Test the set of rules on the validation set (S6)

• Remove rules that produce an anomaly (a false alarm) on it

Rule 1: $\ast \Rightarrow B \in \{2, 7, 0\}$, $p = 3/5$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/3$


Step 4: Validating rules

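Training data

Set            Sample   A   B   C   D
Random sample  S1       1   2   3   4
Random sample  S2       1   2   3   5
Random sample  S3       6   7   8   4
Training       S4       1   0   9   5
Training       S5       1   2   3   4
Validation     S6       6   3   8   5

• Test the set of rules on the validation set (S6)

• Remove rules that produce an anomaly (a false alarm) on it

Rule 1: $\ast \Rightarrow B \in \{2, 7, 0\}$, $p = 3/5$

Rule 2: $C = 3 \Rightarrow B \in \{2\}$, $p = 1/3$

S6 has B = 3, which is not in {2, 7, 0}, so Rule 1 is violated and removed. Rule 2 does not apply to S6 (C = 8), so it is kept.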
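The validation step can be sketched as below, assuming the validation data are attack-free, so that any alarm raised there counts as a false alarm; `validate` is a hypothetical name.

```python
# Rules after Step 3: name -> (antecedent, consequent attribute, allowed values).
RULES = {
    "Rule 1": ({}, "B", {2, 7, 0}),
    "Rule 2": ({"C": 3}, "B", {2}),
}

VALIDATION = [{"A": 6, "B": 3, "C": 8, "D": 5}]  # S6, assumed attack-free


def validate(rules, samples):
    """Drop every rule that flags an anomaly on the validation samples."""
    kept = {}
    for name, (antecedent, target, allowed) in rules.items():
        false_alarm = any(
            all(s[a] == v for a, v in antecedent.items()) and s[target] not in allowed
            for s in samples
        )
        if not false_alarm:
            kept[name] = (antecedent, target, allowed)
    return kept


print(list(validate(RULES, VALIDATION)))
```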


Learning Rules for Anomaly Detection (LERAD)

$$\text{TotalAnomalyScore} = \sum_i \frac{t_i}{p_i} = \sum_i \frac{t_i\, n_i}{r_i}$$

$t_i$ - time interval since the last anomalous event, i.e. since rule i was last violated

i - index of the violated rule

Non-stationary model

- only the last occurrence of an event is important
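The total score can be sketched in Python as follows, assuming each rule remembers when it was last violated and that a rule never violated before measures $t_i$ from time 0 (both assumptions for illustration). The function name and data layout are hypothetical.

```python
def total_anomaly_score(violations, last_violated, now):
    """Sum t_i * n_i / r_i over the rules violated by one event.

    violations: (rule_id, n_i, r_i) for each rule the event violates.
    last_violated: rule_id -> time of that rule's previous violation;
    updated in place. now: time of the current event (seconds).
    """
    score = 0.0
    for rule_id, n, r in violations:
        t = now - last_violated.get(rule_id, 0.0)  # assumption: start at time 0
        score += t * n / r
        last_violated[rule_id] = now
    return score


history = {}
first = total_anomaly_score([("r1", 5, 3)], history, now=10.0)
second = total_anomaly_score([("r1", 5, 3)], history, now=12.0)
print(first, second)  # a quick repeat of the same violation scores lower
```

The factor $t_i$ is what makes the model non-stationary: a rule violated again shortly after its last violation contributes little, whatever its history before that.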


Variants of attributes

• 3 variants

(i) S-LERAD: system call sequence

(ii) A-LERAD: system call arguments

(iii) M-LERAD: system call arguments + sequence


S-LERAD

• System call sequence-based LERAD

• Samples of 6 contiguous system call tokens are input to LERAD

$\mathrm{SC1} = \mathrm{mmap}(),\ \mathrm{SC2} = \mathrm{munmap}() \Rightarrow \mathrm{SC6} \in \{\mathrm{close}(), \mathrm{mmap}()\}$

SC1 SC2 SC3 SC4 SC5 SC6

mmap() munmap() mmap() munmap() open() close()

munmap() mmap() munmap() open() close() open()

mmap() munmap() open() close() open() mmap()
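The windowing that produces the three rows above can be sketched in Python as follows; `sequence_samples` and `conforms` are hypothetical helpers, and the trace is the one implied by the table rows.

```python
def sequence_samples(calls, width=6):
    """S-LERAD-style samples: one dict per window of `width` contiguous
    calls, with attributes SC1..SC<width>."""
    return [
        {f"SC{j + 1}": calls[i + j] for j in range(width)}
        for i in range(len(calls) - width + 1)
    ]


def conforms(sample, antecedent, target, allowed):
    """False only if the antecedent holds and the target value is not allowed."""
    if all(sample[a] == v for a, v in antecedent.items()):
        return sample[target] in allowed
    return True


trace = ["mmap()", "munmap()", "mmap()", "munmap()", "open()", "close()", "open()", "mmap()"]
rule = ({"SC1": "mmap()", "SC2": "munmap()"}, "SC6", {"close()", "mmap()"})
for sample in sequence_samples(trace):
    print(list(sample.values()), conforms(sample, *rule))
```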


A-LERAD

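• Each sample contains a system call along with its arguments

• The system call is always a condition in the antecedent of the rule

$\mathrm{SC} = \mathrm{munmap}() \Rightarrow \mathrm{Arg1} \in \{\mathrm{0x134}, \mathrm{0102}, \mathrm{0x211}, \mathrm{0x124}\}$

Sample attributes: SC, Arg1, Arg2, Arg3, Arg4, Arg5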
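An A-LERAD-style sample and a rule pivoted on the system call can be sketched as below. The helper names are hypothetical, padding missing arguments with None is an assumption, and the allowed values are taken from the example rule above.

```python
def argument_sample(call, args, max_args=5):
    """A-LERAD-style sample: the call name (SC) plus Arg1..Arg5.
    Missing arguments are filled with None (an assumption)."""
    padded = (list(args) + [None] * max_args)[:max_args]
    sample = {"SC": call}
    sample.update({f"Arg{i + 1}": a for i, a in enumerate(padded)})
    return sample


def pivot_rule(call, conditions, target, allowed):
    """Build a rule whose antecedent always includes the system call."""
    return {"SC": call, **conditions}, target, set(allowed)


antecedent, target, allowed = pivot_rule(
    "munmap()", {}, "Arg1", {"0x134", "0102", "0x211", "0x124"}
)
sample = argument_sample("munmap()", ["0x134"])
holds = all(sample[a] == v for a, v in antecedent.items())
print(sample, holds and sample[target] in allowed)
```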


M-LERAD

• Combination of system call sequences and arguments

$\mathrm{SC1} = \mathrm{close}(),\ \mathrm{Arg1} = \mathrm{0x134} \Rightarrow \mathrm{SC3} \in \{\mathrm{munmap}()\}$
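A combined sample can be sketched by merging the sequence attributes with argument attributes. This is a hypothetical layout: which call in the window the arguments come from is an assumption left to the caller (here, the first call, to match the example rule), not something the slides specify.

```python
def combined_sample(window, args, max_args=5):
    """Hypothetical M-LERAD-style sample: SC1..SCk from a window of calls
    plus Arg1..Arg5 for one call in that window (chosen by the caller)."""
    sample = {f"SC{j + 1}": call for j, call in enumerate(window)}
    padded = (list(args) + [None] * max_args)[:max_args]
    sample.update({f"Arg{i + 1}": a for i, a in enumerate(padded)})
    return sample


window = ["close()", "open()", "munmap()", "mmap()", "open()", "close()"]
sample = combined_sample(window, ["0x134"])  # arguments of the first call here
# Example rule: SC1 = close(), Arg1 = 0x134 => SC3 in {munmap()}
applies = sample["SC1"] == "close()" and sample["Arg1"] == "0x134"
print(applies, sample["SC3"] in {"munmap()"})
```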


1999 DARPA IDS Evaluation [Lippmann et al, 2000]

• Week 3 – Training data (~ 2.1 million system calls)

• Weeks 4 and 5 – Test Data (over 7 million system calls)

• Total – 51 attacks on the Solaris host


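Experimental Procedures

• Preprocessing the data: BSM audit log → applications → processes

• One model per application

• Merge all alarms

[Figure: the BSM audit log is divided by application (Application 1, Application 2, through Application N), each made up of processes such as Pi, Pj and Pk]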
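The per-application split and the alarm merge can be sketched in Python as below. The record tuples are a simplified, hypothetical stand-in for parsed BSM audit records (no real BSM parsing is shown), and both function names are hypothetical.

```python
from collections import defaultdict


def split_by_application(records):
    """Group audit records into application -> process id -> [(time, call)].

    `records` holds (timestamp, application, pid, call) tuples, a simplified,
    hypothetical stand-in for parsed BSM audit records.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for timestamp, app, pid, call in records:
        grouped[app][pid].append((timestamp, call))
    return grouped


def merge_alarms(alarms_by_app):
    """Merge per-application alarm lists of (timestamp, score, app) by time."""
    return sorted(
        (alarm for alarms in alarms_by_app.values() for alarm in alarms),
        key=lambda alarm: alarm[0],
    )


records = [
    (1, "ftpd", 10, "open()"),
    (2, "tcsh", 20, "read()"),
    (3, "ftpd", 11, "close()"),
    (4, "ftpd", 10, "close()"),
]
grouped = split_by_application(records)
print({app: dict(procs) for app, procs in grouped.items()})
```

A separate model would then be trained on each application's processes, and the alarms of all models merged into one time-ordered stream for evaluation.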


Evaluation Criteria

• An attack counts as detected if an alarm is generated within 60 seconds of its occurrence

• Number of attacks detected at 10 false alarms per day

• Time and storage requirements
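The detection count at a fixed false-alarm rate can be sketched as below. Assumptions: an alarm detects an attack if it falls 0 to 60 seconds after the attack's start, every other alarm is a false alarm, and the threshold is lowered one alarm at a time (ties in score are ignored). The function name is hypothetical.

```python
def detections_at_budget(alarms, attacks, days, fa_per_day=10, window=60.0):
    """Count attacks detected at a given false-alarm budget.

    alarms: (timestamp, score) pairs; attacks: attack start times (seconds).
    Alarms are taken in decreasing score order (the threshold is lowered
    step by step) until more than fa_per_day * days false alarms appear.
    An alarm 0..window seconds after an attack's start detects it.
    """
    budget = fa_per_day * days
    detected, false_alarms = set(), 0
    for t, _score in sorted(alarms, key=lambda a: -a[1]):
        hits = [i for i, start in enumerate(attacks) if 0 <= t - start <= window]
        if hits:
            detected.update(hits)
        else:
            false_alarms += 1
            if false_alarms > budget:
                break
    return len(detected)


alarms = [(100, 9.0), (500, 8.0), (1000, 7.0), (2000, 6.0)]
attacks = [90, 1990]
print(detections_at_budget(alarms, attacks, days=1, fa_per_day=1))
print(detections_at_budget(alarms, attacks, days=1, fa_per_day=2))
```

Allowing one false alarm per day catches only the first attack; allowing two lowers the threshold far enough to reach the alarm at t = 2000.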


Detections vs. false alarms

[Figure: attacks detected (0 to 35) vs. false alarms per day (1, 5, 10, 50, 100) for M-LERAD, A-LERAD, t-stide, stide, S-LERAD and tide]


Percentage detections per attack type

[Figure: percentage of attacks detected (0 to 100%) per attack type for tide, stide, t-stide, S-LERAD, A-LERAD and M-LERAD. Attack types (number of attacks): Probes (5), DOS (19), R2L (12), U2R (9), Data (4), Data-U2R (2)]


Comparison of CPU times (seconds)

              Training time           Testing time
              (1 week of data)        (2 weeks of data)
Application   t-stide   M-LERAD       t-stide   M-LERAD
ftpd          0.2       1.0           0.2       1.0
telnetd       1.0       7.9           1.0       9.8
ufsdump       6.8       33.3          0.4       1.8
tcsh          6.3       32.8          5.9       37.6
login         2.4       16.7          2.4       19.9
sendmail      2.7       15.1          3.2       21.6
quota         0.2       3.5           0.2       3.8
sh            0.2       3.2           0.4       5.6


Storage Requirements

• Extracting more data (system calls + arguments) requires more space

• The extra space is needed only during training, which can be done offline

• Small rule set vs. large sequence database (stide, t-stide)

• E.g., for the tcsh application:

1.5 KB file for the set of rules (M-LERAD)

5 KB for the sequence database (stide)


Summary of contributions

• Introduced system call argument information for modeling

• Enhanced LERAD to form rules with system calls as pivotal attributes

• LERAD with argument information detects more attacks than existing system call sequence-based algorithms (tide, stide, t-stide)

• The sequence + argument based system (M-LERAD) generally detected the most attacks across different false alarm rates

• Argument information alone (A-LERAD) can be used effectively to detect attacks at lower false alarm rates

• Lower memory requirements during detection than sequence-based techniques


Future Work

• More $$$$$$$$$$


Future Work

• A richer representation

More attributes - time between subsequent system calls

• Anomaly score

t-stide vs. LERAD


Thank You