
PhD Final Examination

Secure Configuration of Intrusion Detection Sensors
for Dynamic Enterprise-Class Distributed Systems

Gaspar Modelo-Howard
Dependable Computing Systems Laboratory (DCSL)
Center for Education and Research in Information Assurance and Security (CERIAS)
Purdue University

December 11, 2012

Research Collaboration

• Committee Members

– Prof. Saurabh Bagchi (advisor)

– Prof. Guy Lebanon

– Prof. Sonia Fahmy

– Prof. Vijay Pai

• Research collaborators

– Prof. Alan Qi

– Fahad Arshad

– Christopher Gutierrez

– Yu-Sung Wu

– Bingrui Foo

– NEEScomm IT Team


Our Security Scenario

• Multi-Stage Attack Example
  [Diagram: attacker moving through services S1–S4]

Our Security Scenario

• Distributed IDS, based on Bayesian Network
  [Diagram: detectors D1–D4 monitoring the services]

Our Security Scenario

• Dynamic Environment
  [Diagram: detectors D1–D6 covering services S1–S4 as the attacker
  moves through the system]

• Limitation of our approach: static view of the attack surface
  – Question: how to generalize our model?
  – Answer: generalize the signatures (vulnerabilities)

Thesis Statement

• Develop novel techniques for configuring and evaluating
  distributed intrusion detection systems in dynamic
  computing environments

• Contributions
  – Configuration of sensors based on the monitored environment
  – Re-configuration of sensors based on runtime observations
  – System for automatic generation of generalized signatures
    for individual attack steps

Presentation Outline

• Research Contributions

• pSigene: Automatic Generation and Update
  of Generalized Intrusion Detection Signatures
  – Motivation and Related Work
  – Framework Design
  – Evaluation

• Future Work

• Conclusions

Research Contributions

• Determining Configuration of Detection Sensors
  – Developed evaluation technique for configuration of IDS
  – Greedy algorithm for determining detector settings
  – RAID 2008

• Approximation Algorithms for Evaluation of Sensors
  – Implemented dynamic programming (DP) approach to evaluate configurations
  – Performed comparison between approximation algorithms
  – CERIAS Technical Report

• Configuration of Detection Sensors in a Dynamic Environment
  – Designed DIDS to dynamically reconfigure itself, given a dynamic
    environment and detection of multi-stage attacks
  – SecureComm 2011

• Creation of Generalized Intrusion Detection Signatures
  – Designed pSigene system, for automatic generation of generalized
    intrusion detection signatures
  – Submitted to IEEE S&P 2013

[Timeline: the first three contributions preceded the Prelim Exam;
the fourth falls between the Prelim and Final Exam]

pSigene: Motivation

• Misuse-based intrusion detection systems (IDS)

• New observations require an update of the signatures
  – Manual update
  – A Herculean task
  – Relatively static nature of signatures (missing 0-day attacks)

  [Diagram: the IDS matches incoming traffic containing "union+select"
  against a "union+select" entry in its signature set and raises an ALERT]

pSigene: Motivation

• Example of existing signature in detection system


(?i:(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|users?|ascii))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.(db|user))|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)|\b(?:(?:s(?:ys(?:(?:(?:process|tabl)e|filegroup|object)s|c(?:o(?:nstraint|lumn)s|at)|dba|ibm)|ubstr(?:ing)?)|user_(?:(?:(?:constrain|objec)t|tab(?:_column|le)|ind_column|user)s|password|group)|a(?:tt(?:rel|typ)id|ll_objects)|object_(?:(?:nam|typ)e|id)|pg_(?:attribute|class)|column_(?:name|id)|xtype\W+\bchar|mb_users|rownum)\b|t(?:able_name\b|extpos\W+\())) Reference: OWASP ModSecurity Core Rule Set, v.2.2.4


pSigene: Motivation

• Example of existing signature in detection system


(?i:(?:(?:s(?:t(?:d(?:dev(_pop|_samp)?)?|r(?:_to_date|cmp))|u(?:b(?:str(?:ing(_index)?)?|(?:dat|tim)e)|m)|e(?:c(?:_to_time|ond)|ssion_user)|ys(?:tem_user|date)|ha(1|2)?|oundex|chema|ig?n|pace|qrt)|i(?:s(null|_(free_lock|ipv4_compat|ipv4_mapped|ipv4|ipv6|not_null|not|null|used_lock))?|n(?:et6?_(aton|ntoa)|s(?:ert|tr)|terval)?|f(null)?)|u(?:n(?:compress(?:ed_length)?|ix_timestamp|hex)|tc_(date|time|timestamp)|p(?:datexml|per)|uid(_short)?|case|ser)|l(?:o(?:ca(?:l(timestamp)?|te)|g(2|10)?|ad_file|wer)|ast(_day|_insert_id)?|e(?:(?:as|f)t|ngth)|case|trim|pad|n)|t(?:ime(stamp|stampadd|stampdiff|diff|_format|_to_sec)?|o_(base64|days|seconds|n?char)|r(?:uncate|im)|an)|m(?:a(?:ke(?:_set|date)|ster_pos_wait|x)|i(?:(?:crosecon)?d|n(?:ute)?)|o(?:nth(name)?|d)|d5)|r(?:e(?:p(?:lace|eat)|lease_lock|verse)|o(?:w_count|und)|a(?:dians|nd)|ight|trim|pad)|f(?:i(?:eld(_in_set)?|nd_in_set)|rom_(base64|days|unixtime)|o(?:und_rows|rmat)|loor)|a(?:es_(?:de|en)crypt|s(?:cii(str)?|in)|dd(?:dat|tim)e|(?:co|b)s|tan2?|vg)|p(?:o(?:sition|w(er)?)|eriod_(add|diff)|rocedure_analyse|assword|i)|b(?:i(?:t_(?:length|count|x?or|and)|n(_to_num)?)|enchmark)|e(?:x(?:p(?:ort_set)?|tract(value)?)|nc(?:rypt|ode)|lt)|v(?:a(?:r(?:_(?:sam|po)p|iance)|lues)|ersion)|g(?:r(?:oup_conca|eates)t|et_(format|lock))|o(?:(?:ld_passwo)?rd|ct(et_length)?)|we(?:ek(day|ofyear)?|ight_string)|n(?:o(?:t_in|w)|ame_const|ullif)|(rawton?)?hex(toraw)?|qu(?:arter|ote)|(pg_)?sleep|year(week)?|d?count|xmltype|hour)\W*\(|\b(?:(?:s(?:elect\b(?:.{1,100}?\b(?:(?:length|count|top)\b.{1,100}?\bfrom|from\b.{1,100}?\bwhere)|.*?\b(?:d(?:ump\b.*\bfrom|ata_type)|(?:to_(?:numbe|cha)|inst)r))|p_(?:sqlexec|sp_replwritetovarbin|sp_help|addextendedproc|is_srvrolemember|prepare|sp_password|execute(?:sql)?|makewebtask|oacreate)|ql_(?:longvarchar|variant))|xp_(?:reg(?:re(?:movemultistring|ad)|delete(?:value|key)|enum(?:value|key)s|addmultistring|write)|terminate|xp_servicecontrol|xp_ntsec_enumdomains|xp_terminate_process|e(?:xecresultset|numdsn)|available
media|loginconfig|cmdshell|filelist|dirtree|makecab|ntsec)|u(?:nion\b.{1,100}?\bselect|tl_(?:file|http))|d(?:b(?:a_users|ms_java)|elete\b\W*?\bfrom)|group\b.*\bby\b.{1,100}?\bhaving|open(?:rowset|owa_util|query)|load\b\W*?\bdata\b.*\binfile|(?:n?varcha|tbcreato)r|autonomous_transaction)\b|i(?:n(?:to\b\W*?\b(?:dump|out)file|sert\b\W*?\binto|ner\b\W*?\bjoin)\b|(?:f(?:\b\W*?\(\W*?\bbenchmark|null\b)|snull\b)\W*?\()|print\b\W*?\@\@|cast\b\W*?\()|c(?:(?:ur(?:rent_(?:time(?:stamp)?|date|user)|(?:dat|tim)e)|h(?:ar(?:(?:acter)?_length|set)?|r)|iel(?:ing)?|ast|r32)\W*\(|o(?:(?:n(?:v(?:ert(?:_tz)?)?|cat(?:_ws)?|nection_id)|(?:mpres)?s|ercibility|alesce|t)\W*\(|llation\W*\(a))|d(?:(?:a(?:t(?:e(?:(_(add|format|sub))?|diff)|abase)|y(name|ofmonth|ofweek|ofyear)?)|e(?:(?:s_(de|en)cryp|faul)t|grees|code)|ump)\W*\(|bms_pipe\.receive_message\b)|(?:;\W*?\b(?:shutdown|drop)|\@\@version)\b|'(?:s(?:qloledb|a)|msdasql|dbo)'))\b(?i:having)\b\s+(\d{1,10}|'[^=]{1,10}')\s*[=<>]|(?i:\bexecute(\s{1,5}[\w\.$]{1,5}\s{0,3})?\()|\bhaving\b ?(?:\d{1,10}|[\'\"][^=]{1,10}[\'\"]) ?[=<>]+|(?i:\bcreate\s+?table.{0,20}?\()|(?i:\blike\W*?char\W*?\()|(?i:(?:(select(.*)case|from(.*)limit|order\sby)))|exists\s(\sselect|select\Sif(null)?\s\(|select\Stop|select\Sconcat|system\s\(|\b(?i:having)\b\s+(\d{1,10})|'[^=]{1,10}')

Signature with regular expression of 2,917 characters

Related Work

• Automatic Signature Creation

– [Kreibi04], [Newsom05], [Yegnes06], [Li06]

– Work aimed at malware case (not our case)

• Signature Generalization

– [Yegnes06], [Robert06], [Aickel08]

– Deterministic approach

• Protocol knowledge-based detection

– [Vigna09], [Robert10], [Chandr11]

– Different protocols, similar assumption


Contributions

• An automatic approach to generate and update signatures
  for misuse-based NIDS

• A framework to generalize existing signatures

• Rigorously benchmarked our solution with a large set of attack
  samples and compared our performance to popular misuse-based NIDSes

Presentation Outline

• Research Contributions

• pSigene: Automatic Generation and Update

of Generalized Intrusion Detection Signatures

– Motivation and Related Work

– Framework Design

– Evaluation

• Future Work

• Conclusions


Framework Design

• pSigene: probabilistic Signature Generation


1. Crawls multiple public cybersecurity portals to collect attack samples

2. Extracts a rich set of features from the attack samples

3. Applies a clustering technique to the samples, giving the distinctive features for each cluster

4. A generalized signature is created for each cluster, using logistic regression modeling

Motivation for Focusing on SQLi

• To develop our system, we consider the prevalent class
  of SQL injection (SQLi) attacks
  – SQLi: insertion of a SQL query via input data to a web app
  – Very relevant attack vector, given the popularity of 3-tier systems

• Notable victims: Sony, Sega, UK Royal Navy, US CIA, Oracle, Adobe

Source: IBM X-Force 2011 Trend and Risk Report

Phase 1: Webcrawling for Attack Samples

• Proactively collect samples from multiple cybersecurity web portals

• Portal: a public repository of computer security tools, exploits,
  and security advisories
  – Examples: SecurityFocus, Exploit Database, PacketStorm Security,
    Open Source Vulnerability Database
  – Availability of search APIs (or use of Google's)
  – Open forums and mailing lists also available

Phase 1: Cybersecurity Portal Example

• www.securityfocus.com
  [Screenshot: SecurityFocus advisory page, with an attack sample highlighted]

Phase 1: Webcrawling for Attack Samples

• To collect the attack samples for our experiments, we crawled
  different portals between April and June 2012
  – Collected over 30,000 SQLi attack samples, using them as our
    dataset to generate the generalized signatures

• Reasons for webcrawling
  – Works well when collecting samples for slow-moving attacks
    (such as SQLi)
  – Greater diversity than typically handled by honeypots
  – Speeds up the data collection process (logistics)

Phase 1: Webcrawling for Attack Samples

• Sample collection should be as comprehensive as possible
  – Manually inspected high/medium-risk SQLi vulnerabilities
    published in July 2012 by NVD
  – In each case, found working examples of the attacks in our dataset

  VULNERABILITY                                     CVE ID
  Joomla 1.5.x RSGallery 2.3.20 component           CVE-2012-3554
  Drupal 6.x-4.2 Addressbook module                 CVE-2012-2306
  Moodle 2.0.x mod/feedback/complete.php 2.0.10     CVE-2012-3395
  RTG 0.7.4 and RTG2 0.9.2 95/view/rtg.php          CVE-2012-3881

Phase 2: Feature Selection

• We characterize each sample using a set of features

• Three sources used to create the set of features

  FEATURE SOURCE             EXAMPLES                  DESCRIPTION
  MySQL Reserved Words       create, insert, delete    Words reserved in MySQL that require
                                                       special treatment for use as identifiers,
                                                       plus built-in functions
  NIDS/WAF Signatures        in\s*?\(+\s*?select       SQLi signatures from popular open-source
                             \)?;                      detection systems, deconstructed into
                             [^a-zA-Z&]+=              their components
  SQLi Reference Documents   \' ORDER BY [0-9]-- -     Common strings found in SQLi attacks,
                             /\*/    \"                shared by subject matter experts

Phase 2: Feature Selection

• The resulting feature set consisted of a series of regular
  expressions representing
  – Relevant characters
  – SQL tokens
  – Popular strings found in SQLi attacks

• All features in the set were of numeric type
  – Each one measures the number of times a feature is found
    in an attack sample

  Attack Sample:
  ?artist=0+div+1+union#foo*/*bar select # foo 1,2,current_user

  FEATURE          VALUE
  +                3
  union.+select    1
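The counting scheme above can be sketched as follows. The three feature regexes are illustrative stand-ins; the actual 159-entry pSigene feature set is not reproduced in the slides.

```python
import re

# Minimal sketch of Phase 2 feature extraction (illustrative features,
# not the actual pSigene set): each feature is a regular expression,
# and its value is the number of matches found in the attack sample.
FEATURES = [
    r"\+",             # occurrences of the '+' character
    r"union.+select",  # UNION ... SELECT pattern
    r"=[-0-9%]*",      # '=' followed by digits, '-' or '%'
]

def feature_vector(sample):
    """Return per-feature match counts (case-insensitive) for one sample."""
    return [len(re.findall(f, sample, flags=re.IGNORECASE)) for f in FEATURES]

sample = "?artist=0+div+1+union#foo*/*bar select # foo 1,2,current_user"
print(feature_vector(sample))  # first two entries match the slide: 3 and 1
```

Applied to the slide's attack sample, the first two counts reproduce the values shown in the table above.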

Phase 2: Feature Selection

• The resulting feature set used in the experiments had 159
  numerical entries
  – Started with 477 features

• The feature set also considers the relative position of tokens
  – Example: =[0-9%]+

• The webcrawled dataset was organized into a 30,000-by-159 matrix
  – Sparse matrix: 85% of cells had zero values, 6% were ones
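As a side note on representation: with 85% zeros, the samples-by-features matrix is naturally stored sparse. A toy-sized sketch (the SciPy CSR format is my choice here; the slides do not name a representation):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-in for the 30,000-by-159 count matrix: rows are samples,
# columns are regex-match counts. CSR stores only the nonzero cells.
counts = np.array([
    [3, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 0, 0],
])
M = csr_matrix(counts)

# Fraction of zero cells, analogous to the 85% reported in the slides.
sparsity = 1.0 - M.nnz / (M.shape[0] * M.shape[1])
print(M.shape, round(sparsity, 2))
```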

What is Biclustering?

[Diagram: a samples-by-features matrix, with a bicluster shown as a
submatrix selecting a subset of samples and a subset of features]

Phase 3: Creating Clusters for Similar Attack Samples

• Use of a biclustering technique to analyze the matrix,
  creating multiple biclusters

• Objective
  – Identify subsets of attack samples which share similar values
    for a subset of features

• We create a signature for each bicluster
  – Biclusters are non-overlapping and non-exclusive

• We performed a 2-way hierarchical agglomerative clustering (HAC)
  algorithm, using
  – Similarity metric: Euclidean pairwise distance
  – Linkage criteria: Unweighted Pair Group Method with Arithmetic
    Mean (UPGMA)
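The clustering step with the settings named above can be sketched with SciPy. The toy matrix and the two-cluster cut are illustrative assumptions; pSigene runs the same procedure on both the sample axis and the feature axis to obtain biclusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist

# Toy stand-in for the (samples x features) count matrix.
X = np.array([
    [3, 1, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 5, 1],
], dtype=float)

def cluster_axis(M):
    """HAC over the rows of M with Euclidean distance + UPGMA linkage.
    Returns flat cluster labels and the cophenetic correlation
    coefficient used to validate the resulting tree."""
    d = pdist(M, metric="euclidean")
    Z = linkage(d, method="average")  # 'average' == UPGMA
    c, _ = cophenet(Z, d)
    return fcluster(Z, t=2, criterion="maxclust"), c

row_labels, row_coph = cluster_axis(X)    # cluster samples
col_labels, col_coph = cluster_axis(X.T)  # cluster features (2-way step)
print(row_labels, round(row_coph, 3))
```

A cophenetic coefficient near 1 indicates the dendrogram faithfully preserves the original pairwise distances, which is the validation criterion mentioned on the next slides.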

Phase 3: Creating Clusters for Similar Attack Samples

• Heatmap representation of the biclustering algorithm applied
  to the matrix representing the samples set
  [Figure: heatmap of the samples-by-features matrix]

Phase 3: Creating Clusters for Similar Attack Samples

• We tested different pairwise distance metrics and linkage
  criteria for the HAC algorithm
  – Euclidean + UPGMA produced the "cleanest" biclusters

• Validation of the tree with the cophenetic correlation coefficient
  – Linear correlation coefficient between the distances obtained
    from the tree and the original distances used to construct it

• Ultimately, exploration of the design space required visual
  inspection of multiple heatmaps
  – Rather than the use of multiple security experts with a zen
    master-like grasp of regular expressions

Phase 4: Creation of Generalized Signatures

• A generalized signature is created from each bicluster
  – A signature is a logistic regression (LR) model
    of the corresponding bicluster
  – A signature predicts whether an SQL query is an attack
    similar to the samples in the bicluster

• Output values are interpreted as the estimated probability
  that a sample belongs to a class

  sig_bj = h(Θj, Fj) = 1 / (1 + e^(−Θj^T Fj))     (sigmoid function)

  where Fj is the feature vector for bicluster bj and Θj are
  the learned LR parameters
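A hedged sketch of how one such signature could be trained and applied. The count data is synthetic, scikit-learn's `LogisticRegression` stands in for whatever LR implementation pSigene uses, and the 0.5 decision threshold is illustrative (a later slide notes each signature's threshold is tunable):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One signature = one LR model per bicluster: label 1 for the
# bicluster's attack samples, 0 for benign traffic; features are
# regex-match counts. Data below is synthetic and illustrative.
rng = np.random.default_rng(0)
attacks = rng.poisson(lam=3.0, size=(200, 6))  # count-like attack features
benign = rng.poisson(lam=0.3, size=(200, 6))   # mostly-zero benign features
X = np.vstack([attacks, benign]).astype(float)
y = np.array([1] * 200 + [0] * 200)

signature = LogisticRegression(max_iter=1000).fit(X, y)  # learns Θj

def signature_prob(model, x):
    """Sigmoid output: estimated probability that feature vector x
    belongs to the attack class modeled by this signature."""
    return model.predict_proba(np.atleast_2d(x))[0, 1]

p_attack = signature_prob(signature, attacks[0])
p_benign = signature_prob(signature, benign[0])
fires = p_attack > 0.5  # tunable per-signature threshold
```

The probability outputs correspond to the PROB. column in the testing-sample table on the next slides.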

pSigene: Example of a Generalized Signature

• Bicluster b6 has a set S6 of 2,741 samples and a set of features F6:

  FEATURE NUMBER    FEATURE (REGULAR EXPRESSION)
  25                =
  37                =[-0-9%]*
  53                <=>|r?like|sounds+like|regex
  36                ([^a-zA-Z&]+)?&|exists
  28                [\?&][^\s\t\x00-\x37\|]+?
  32                \)?;

pSigene: Example of a Generalized Signature

• Trained the logistic regression model with the set S6
  (attack class) and one day of non-malicious traffic
  (benign class)

  [Learned parameter vector Θ6, with one weight per feature in F6]

  TESTING SAMPLE                                                TYPE    PROB.
  ?option=com\_simplefaq\&amp;task=answer\&amp;Itemid=9999
  \&amp;catid=9999\&amp;aid=-1+union+select+1,concat\_ws(0x3,
  username,password,email),3,4,5,6,7,8,9,10,11,12,13,14,15,
  16,17,18,19,20+from+jos\_users                                Attack  0.9926
  /mod/resource/view.php?id=21154                               Benign  0.0694
  /blocks/mle/dwn/index.php?vendor=samsung\&device=X830         Benign  0.1928

Presentation Outline

• Research Contributions

• pSigene: Automatic Generation and Update

of Generalized Intrusion Detection Signatures

– Motivation and Related Work

– Framework Design

– Evaluation

• Future Work

• Conclusions


Evaluation

• Evaluated pSigene and the signatures in 3 other IDSes, using
  SQLi attack samples and benign web traffic

• Experiment 1: Accuracy and Precision Comparison
  – Determine True and False Positive Rates (TPR, FPR), using
    network traces collected from real systems

• Experiment 2: Incremental Learning
  – Increment the number of attack samples given during the
    learning step of the signature creation phase

• Experiment 3: Performance Evaluation
  – Determine the performance impact of matching the generated
    signatures

• Used the Bro NIDS to run the experiments

Evaluation: SQLi Signature Sets

• Analyzed 4 different SQLi signature sets, taken from popular
  open-source security projects

  SIGNATURE SET     VERSION  SQLi    SQLi RULES  REGEX   SIGNATURE LENGTH
                             RULES   ENABLED     USAGE   AVG    MIN  MAX
  Bro               2.0      6       100%        100%    247.7  27   429
  Snort Rules       2920     79      61%         82%     44.1   13   123
  Emerging Threats  7098     4,160   0%          99%     14.0   9    75
  ModSecurity       2.2.4    34      100%        100%    390.2  28   2,917

• Observations
  – Heavy regex usage motivated us to use regular expressions
    to build features
  – Signature lengths suggest an ad-hoc approach to creating signatures
  – Rule counts suggest overlap between signatures, reflecting
    the open, collaborative nature of the projects

Evaluation: SQLi Test Datasets

• Benign traffic: captured all HTTP traffic to several web servers
  (institutional, registration, payment, webmail) at a university
  – Training: 2-day network trace, 1.32 GB / 0.4 million HTTP GET requests
  – Testing: 1-week network trace, 4.53 GB / 1.4 million HTTP GET requests

• Malicious traffic
  – Training: webcrawled SQLi samples; biclusters from 30,000 SQLi attacks
  – Testing: 7,258 HTTP GET requests, generated by running a standard
    SQLi scanning tool against a vulnerable (Java-based) web app

Experiment 1: Accuracy and Precision Comparison

• pSigene signatures had a higher detection rate than Snort and Bro,
  lower than ModSecurity
• pSigene had the second-lowest FPR, behind only Bro
• Important to note the absolute number of false positives
• ModSecurity has a robust signature set
  – Popular open-source WAF, developed and tested over many years

  RULES                     TPR (%)  FPR (%)
  Bro                       73.23    0.00
  Snort – Emerging Threats  79.55    0.174 (2,463)
  ModSecurity               96.07    0.052 (730)
  pSigene (9)               86.53    0.037 (523)
  pSigene (7)               82.72    0.016 (226)

Experiment 1: Accuracy and Precision of Individual Signatures

• Wide variability in the quality and coverage of the signatures
• Each signature can be tuned, using its threshold value
• Some signatures (examples: 1, 2, and 3) are quite insensitive
  to threshold settings
• Signatures 6 and 8 produce false positives faster than the other
  signatures (they share the same set of features)
  [Figures: per-signature accuracy/precision curves across thresholds]

Experiment 1: Coverage of Individual Signatures

• Largest bicluster has 44% of the samples; smallest has 5.5%

• 2-step filtering process for features
  – Features resulting from biclustering have equivalence classes
  – Some features output by biclustering are too noisy for accurate
    classification

  BICLUSTER      NUMBER OF  NO. OF FEATURES  NO. OF FEATURES
  NUMBER         SAMPLES    (BICLUSTERING)   (SIGNATURE)
  Bicluster 1    13,272     90               33
  Bicluster 2     5,477     90               13
  Bicluster 3     2,629     90               11
  Bicluster 4     6,947     12                8
  Bicluster 5     4,245      8                5
  Bicluster 6     2,741      6                6
  Bicluster 7     3,928     10                5
  Bicluster 8     1,676      8                6
  Bicluster 11    1,671     15               14

Experiment 2: Incremental Learning

• Incremented the number of attack samples used to learn the Θ parameters
  – Used the set of 10 signatures

• TPR showed an increase of >2% in each round
  – pSigene is getting similar attack samples in each round

• FPR also increased slightly in each round
  – We added more malicious samples only

  TEST DATASET USED FOR TRAINING  TPR (%)  FPR (%)
  0%                              86.53    0.037
  20%                             89.13    0.039
  40%                             91.15    0.044

Experiment 3: Performance Evaluation

• Measured processing time per HTTP request for each signature
  – pSigene gives a slowdown of 11x (vs. ModSecurity) and 19x (vs. Bro)
  – The large number of count_all function invocations leads to a large
    processing time

  [Bar chart: "Performance of Signatures in Bro" — min/avg/max latency
  per signature in microseconds (0–2,000 scale) for Bro, pSigene,
  and ModSecurity]

Experiment 3: Performance Evaluation

• Signature matching in pSigene is not a bottleneck
  – Worst-case processing time < 2 ms
  – Measurements made on a resource-starved machine
    (700 MHz CPU, 512 MB RAM)

• The signature matching process is parallelizable

  [Bar chart: same per-signature latency measurements as the previous slide]

Future Work

• Further evaluation of pSigene
  – LibInjection project
  – Bro & Emerging Threats development teams

• Implementation of an incremental update operation
  – Incrementally update biclusters and logistic regression models

• Performance improvement
  – Parallelization of traffic and multi-threading

• Extend pSigene to other attack types
  – Candidates include Cross-Site Scripting (XSS) and botnet activity

Conclusions

• Presented pSigene, a system for the automated generation
  and update of intrusion signatures
  – Tested the architecture on the prevalent class of SQLi attacks
    and found the signatures to perform very well

• Framework to generalize existing signatures and detect
  new variations
  – Feature filtering process with biclustering + logistic regression

• Rigorously benchmarked the system with a large set of real
  attack samples
  – Compared performance to popular misuse-based IDSes

THANK YOU!


Publications

1. Modelo-Howard, G., Arshad, F., Gutierrez, C., Bagchi, S., Qi, Y.: Webcrawling to Generalize SQL Injection Signatures. Submitted to IEEE S&P 2013.

2. Arshad, F., Modelo-Howard, G., Bagchi, S.: To Cloud or Not to Cloud: A Study of Trade-offs between In-house and Outsourced Virtual Private Network. NPSec 2012.

3. Modelo-Howard, G., Sweval, J., Bagchi, S.: Secure Configuration of Intrusion Detection Sensors for Changing Enterprise Systems. SecureComm 2011.

4. Modelo-Howard, G., Bagchi, S., Lebanon, G.: Approximation Algorithms for Determining Placement of Intrusion Detectors in a Distributed System. CERIAS Technical Report 2011-01, Purdue University.

5. Modelo-Howard, G., Bagchi, S., Lebanon, G.: Determining Placement of Intrusion Detectors for a Distributed Application through Bayesian Network Modeling. RAID 2008.

6. Wu, Y., Modelo-Howard, G., Foo, B., Bagchi, S., Spafford, E.: The Search for Efficiency in Automated Intrusion Response for Distributed Applications. SRDS 2008.

7. Herbert, D., Modelo-Howard, G., Perez-Toro, C., Bagchi, S.: Fault Tolerant ARIMA-based Aggregation of Data in Sensor Networks (Fast Abstracts). DSN 2007.

8. Foo, B., Glause, M., Modelo-Howard, G., Wu, Y., Bagchi, S., Spafford, E.: Intrusion Response Systems: A Survey. Information Assurance: Dependability and Security in Networked Systems. Morgan Kaufmann, San Francisco (2007).


Lessons Learned

• Computer security rules!
• Ideas are (very) hard to come by
• Understand the problem and the tools: details matter
• Talk to industry
• Don't overcommit: manage your time
• Research Philosophy
  – Attack practical security problems with sound, principled methods

ACK: R. Arpaci