A Data Mining Framework for Building Intrusion Detection Models


Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok
Computer Science Department, Columbia University
500 West 120th Street, New York, NY 10027
{wenke, sal, mok}@cs.columbia.edu

    Abstract

There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual encoding of expert knowledge, changes to IDSs are expensive and slow. In this paper, we describe a data mining framework for adaptively building Intrusion Detection (ID) models. The central idea is to utilize auditing programs to extract an extensive set of features that describe each network connection or host session, and apply data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities. These rules can then be used for misuse detection and anomaly detection. New detection models are incorporated into an existing IDS through a meta-learning (or co-operative learning) process, which produces a meta detection model that combines evidence from multiple models. We discuss the strengths of our data mining programs, namely, classification, meta-learning, association rules, and frequent episodes. We report our results of applying these programs to the extensively gathered network audit data for the 1998 DARPA Intrusion Detection Evaluation Program.

    1 Introduction

As network-based computer systems play increasingly vital roles in modern society, they have become the target of intrusions by our enemies and criminals. In addition to intrusion prevention techniques, such as user authentication (e.g., using passwords or biometrics), avoiding programming errors, and information protection (e.g., encryption), intrusion detection is often used as another wall to protect computer systems.

    This research is supported in part by grants from DARPA (F30602-96-1-0311) and NSF (IRI-96-32225 and CDA-96-25374).

Intrusion detection techniques can be categorized into anomaly detection and misuse detection. Anomaly detection systems, for example, IDES [14], flag observed activities that deviate significantly from the established normal usage profiles as anomalies (i.e., possible intrusions). Misuse detection systems, for example, IDIOT [9] and STAT [5], use patterns of well-known attacks or weak spots of the system to match and identify known intrusion patterns or signatures.

While accuracy is the essential requirement of an IDS, its extensibility and adaptability are also critical in today's network computing environment. There are multiple penetration points for intrusions to take place in a network system. For example, at the network level, carefully crafted malicious IP packets can crash a victim host; at the host level, vulnerabilities in system software can be exploited to yield an illegal root shell. Since activities at different penetration points are normally recorded in different audit data sources, an IDS often needs to be extended to incorporate additional modules that specialize on certain components (e.g., hosts, subnets, etc.) of the network systems. The large traffic volume in security related mailing lists and Web sites suggests that new system security holes and intrusion methods are continuously being discovered. Therefore it is imperative that IDSs be updated frequently and in a timely fashion.

Currently, building an effective IDS is an enormous knowledge engineering task. System builders rely on their intuition and experience to select the statistical measures for anomaly detection [13]. Experts first analyze and categorize attack scenarios and system vulnerabilities, and hand-code the corresponding rules and patterns for misuse detection. Because of the manual and ad hoc nature of the development process, current IDSs have limited extensibility and adaptability. Many IDSs only handle one particular audit data source, and their updates are expensive and slow.

Some of the recent research and commercial IDSs have started to provide built-in mechanisms for customization and extension. For example, both Bro [18] and NFR [6] filter network traffic streams into a series of events, and execute scripts, e.g., Bro policy scripts and NFR's N-Code, that contain site-specific event handlers, i.e., intrusion detection and handling rules. The system administration personnel at each installation site must now assume the roles of both security experts and IDS builders because they are responsible for writing the correct event handling functions. Our first-hand experience with both Bro and NFR shows that while these systems provide great flexibility, writing the scripts involves a lot of effort, in addition to learning the scripting languages. For example, there is no means to debug the scripts. These systems also handle a fixed set of network traffic event types. On a few occasions we were forced to make changes to the source code of the original IDS to handle new event types.

Our research aims to develop a more systematic and automated approach to building IDSs. We are developing a set of tools that can be applied to a variety of audit data sources to generate intrusion detection models. We take a data-centric point of view and consider intrusion detection as a data analysis process. Anomaly detection is about finding the normal usage patterns from the audit data, whereas misuse detection is about encoding and matching the intrusion patterns using the audit data. The central theme of our approach is to apply data mining programs to the extensively gathered audit data to compute models that accurately capture the actual behavior (i.e., patterns) of intrusions and normal activities. This automatic approach eliminates the need to manually analyze and encode intrusion patterns, as well as the guesswork in selecting statistical measures for normal usage profiles. More importantly, the same data mining tools can be applied to multiple streams of evidence, each from a detection module that specializes on a specific type of intrusion or a specific component (e.g., a mission-critical host) of the network system, to learn the combined detection model that considers all the available evidence. Thus our framework facilitates the construction of adaptive IDSs.

The rest of the paper is organized as follows. Section 2 outlines the main components of our framework. Section 3 briefly describes several data mining programs, and discusses how they can be applied to discover frequent intrusion and normal activity patterns, which are the basis for building misuse detection models and user anomaly detection models. Section 4 describes how to construct temporal and statistical features using the frequent patterns mined from audit data. Section 5 reports the results of our experiments on building intrusion detection models using the audit data from the DARPA evaluation program. Section 6 discusses related research projects. Section 7 outlines our future research plans.

    2 A Systematic Framework

A basic premise for intrusion detection is that when audit mechanisms are enabled to record system events, distinct evidence of legitimate activities and intrusions will be manifested in the audit data. Because of the sheer volume of audit data, both in the amount of audit records and in the number of system features (fields of the audit records), efficient and intelligent data analysis tools are required to discover the behavior of system activities.

Data mining generally refers to the process of extracting descriptive models from large stores of data [4]. The recent rapid development in data mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning, and databases. Several types of algorithms are particularly useful for mining audit data:

Classification: maps a data item into one of several predefined categories. These algorithms normally output classifiers, for example, in the form of decision trees or rules. An ideal application in intrusion detection is to gather sufficient normal and abnormal audit data for a user or a program, then apply a classification algorithm to learn a classifier that can label or predict new unseen audit data as belonging to the normal class or the abnormal class;

Link analysis: determines relations between fields in the database records. Correlations of system features in audit data, for example, the correlation between command and argument in the shell command history data of a user, can serve as the basis for constructing normal usage profiles. A programmer, for example, may have emacs highly associated with C files;

Sequence analysis: models sequential patterns. These algorithms can discover what time-based sequences of audit events frequently occur together. These frequent event patterns provide guidelines for incorporating temporal statistical measures into intrusion detection models. For example, patterns from audit data containing network-based denial-of-service (DOS) attacks suggest that several per-host and per-service measures should be included.

We are developing a framework, first proposed in [11], of applying data mining techniques to build intrusion detection models. This framework consists of programs for learning classifiers and meta-classification [2], association rules [1] for link analysis, frequent episodes [16] for sequence analysis, and a support environment that enables system builders to interactively and iteratively drive the process of constructing and evaluating detection models. The end products are concise and intuitive rules that can detect intrusions, and can be easily inspected and edited by security experts when needed.

In our approach, the learned rules replace the manually encoded intrusion patterns and profiles, and system features and measures are selected by considering the statistical patterns computed from the audit data. Meta-learning is used to learn the correlation of intrusion evidence from multiple detection models, and to produce a combined detection model.

It is very important to point out that our framework does not eliminate the need to pre-process and analyze raw audit data, e.g., tcpdump [7] and BSM [22] output. In fact, to build intrusion detection models for network systems, our data mining programs use pre-processed audit data where each record corresponds to a high level event, e.g., a network connection or host session. Each record normally includes an extensive set of features that describe the characteristics of the event, for example, the duration of a connection, the number of bytes transferred, etc.

While analyzing and summarizing raw audit data is an essential task for an IDS, we argue that generic utilities should first be developed by network and operating system experts, and made available to all IDSs as the lowest level building blocks. Bro and NFR can be regarded as examples of such robust and free utilities, as they both perform IP packet filtering and reassembling, and allow event handlers to output summarized connection records. Our framework assumes such building blocks are available when constructing IDSs.

    3 Mining Audit Data

In this section we describe our data mining algorithms, and illustrate how to apply these algorithms to generate detection models from audit data. Here audit data refers to pre-processed timestamped audit records, each with a number of features (i.e., fields).

3.1 Classification

Intrusion detection can be thought of as a classification problem: we wish to classify each audit record into one of a discrete set of possible categories, normal or a particular kind of intrusion.

Given a set of records, where one of the features is the class label (i.e., the concept), classification algorithms can compute a model that uses the most discriminating feature values to describe each concept. For example, consider the telnet connection records shown in Table 1. Here, hot is the count of accesses to system directories, creations and executions of programs, etc.; compromised is the count of "file/path not found" errors, "Jump to" instructions, etc. RIPPER [3], a classification rule learning program, generates rules for classifying the telnet connections; example rules are displayed in Table 2.

Here we see that RIPPER indeed selects the unique feature values in identifying the intrusions. These rules can first be inspected and edited by security experts, and then be incorporated into misuse detection systems.
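To make this step concrete, the following is a minimal sketch of learning such a classifier from labeled connection records. Since a RIPPER implementation may not be at hand, it substitutes a decision tree learner from scikit-learn (an assumed dependency); the toy records mirror Table 1, and the printed tree plays the role of the RIPPER rule set.

```python
# Sketch: learning an inspectable classifier over labeled connection
# records, in the spirit of Tables 1 and 2. A decision tree stands in
# for RIPPER; the records below are illustrative, not real audit data.
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["hot", "failed_logins", "compromised", "root_shell", "su", "duration"]

# (label, hot, failed_logins, compromised, root_shell, su, duration)
records = [
    ("normal",   0, 0, 0, 0, 0, 10.2),
    ("normal",   0, 0, 0, 3, 1, 2.1),
    ("guess",    0, 6, 0, 0, 0, 26.2),
    ("normal",   0, 0, 0, 0, 0, 126.2),
    ("overflow", 3, 0, 2, 1, 0, 92.5),
    ("normal",   0, 0, 0, 0, 0, 2.1),
    ("guess",    0, 5, 0, 0, 0, 13.9),
    ("overflow", 3, 0, 2, 1, 0, 92.5),
    ("normal",   0, 0, 0, 0, 0, 1248.0),
]
X = [list(r[1:]) for r in records]
y = [r[0] for r in records]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
# Like the RIPPER rules, the induced conditions over the most
# discriminating features can be inspected and edited by an expert.
print(export_text(clf, feature_names=FEATURES))
```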

The accuracy of a classification model depends directly on the set of features provided in the training data. For example, if the features hot, compromised and root shell were removed from the records in Table 1, RIPPER would not be able to produce accurate rules to identify buffer overflow connections. In [11] we showed that due to the temporal nature of network events, especially certain intrusions such as probing (e.g., port-scan, ping-sweep, etc.) and denial-of-service (e.g., ping-of-death, teardrop, etc.), adding per-host and per-service temporal statistics resulted in significant improvement in the accuracy of the classification models. Thus, selecting the right set of system features is a critical step when formulating the classification tasks. Our strategy is to first mine the frequent sequential patterns from the network audit data, and then use these patterns as guidelines to select and construct temporal statistical features. Section 3.3 discusses this process in greater detail.

3.1.1 Meta-Classification

There are several important reasons for combining multiple detection models. First, in order to avoid becoming a performance bottleneck and an easy target of subversion, an IDS should consist of multiple cooperative lightweight subsystems that each monitor a separate part (e.g., access point) of the entire network environment. For example, an IDS that inspects the full data contents of each IP packet and keeps track of all opened connections may run out of memory (i.e., buffers) during a DOS attack (e.g., SYN flood) and cease to function. On the other hand, a more lightweight IDS that only inspects the header of each IP packet can detect only those intrusions that are targeted at the network protocols, and not those that try to exploit the hosts (e.g., guess password, buffer overflow, etc.). A solution is to have one relatively lightweight system on the gateway that checks only the packet headers, and several host-based systems that monitor the activities on the mission-critical hosts. A global detection system can then combine the evidence from these subsystems and take appropriate actions.

We believe that the best way to make intrusion detection models adaptive is by combining existing models with new models that are trained on new intrusion data or new normal data (e.g., audit data from a new network component), instead of restarting the entire model building process using the aggregate of the archived audit data and new data.

label     service  flag  hot  failed logins  compromised  root shell  su  duration  ...
normal    telnet   SF    0    0              0            0           0   10.2      ...
normal    telnet   SF    0    0              0            3           1   2.1       ...
guess     telnet   SF    0    6              0            0           0   26.2      ...
normal    telnet   SF    0    0              0            0           0   126.2     ...
overflow  telnet   SF    3    0              2            1           0   92.5      ...
normal    telnet   SF    0    0              0            0           0   2.1       ...
guess     telnet   SF    0    5              0            0           0   13.9      ...
overflow  telnet   SF    3    0              2            1           0   92.5      ...
normal    telnet   SF    0    0              0            0           0   1248      ...
...       ...      ...   ...  ...            ...          ...         ... ...       ...

Table 1. Telnet Records

RIPPER rule                              Meaning
guess :- failed logins >= 5.             If the number of failed logins is at least 5, then this telnet
                                         connection is "guess", a password guessing attack.
overflow :- hot = 3, compromised = 2,    If the number of hot indicators is 3, the number of compromised
  root shell = 1.                        conditions is 2, and a root shell is obtained, then this telnet
                                         connection is a buffer overflow attack.
...                                      ...
normal :- true.                          If none of the above, then this connection is "normal".

Table 2. Example RIPPER Rules from Telnet Records

We use meta-learning [2] as a mechanism to inductively learn the correlation of predictions by a number of base detection models (i.e., classifiers). The resulting meta-classifier thus combines the detection power of all the base detection models. This approach has been extensively studied [21] and empirically evaluated in the related domain of fraud detection, where it has been shown to be an effective and scalable approach.
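As a concrete illustration of this mechanism (a sketch, not the authors' JAM implementation), the following stacks the predictions of three hypothetical base detectors into meta-level training records and learns a combining classifier; all model names and predictions are invented for the example.

```python
# Sketch of meta-learning by stacking: the base models' predictions on
# each connection become the features of a meta-level record, and a
# classifier learns how to combine them with the true label as target.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical base model predictions on five connections.
traffic_pred = ["smurf", "normal", "normal", "normal", "smurf"]
host_pred    = ["normal", "probe", "normal", "normal", "normal"]
content_pred = ["normal", "normal", "guess", "normal", "normal"]
true_label   = ["smurf", "probe", "guess", "normal", "smurf"]

# Encode categorical predictions as integers for the learner.
names = sorted(set(traffic_pred + host_pred + content_pred + true_label))
code = {name: i for i, name in enumerate(names)}

# Each meta-level record: (traffic, host, content predictions) -> label.
X = [[code[t], code[h], code[c]]
     for t, h, c in zip(traffic_pred, host_pred, content_pred)]
meta = DecisionTreeClassifier(random_state=0).fit(X, true_label)

print(meta.predict([[code["smurf"], code["normal"], code["normal"]]]))
```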

    3.2 Association Rules

There is empirical evidence that program executions and user activities exhibit frequent correlations among system features. For example, certain privileged programs only access certain system files in specific directories [8], programmers edit and compile C files frequently, etc. These consistent behavior patterns should be included in normal usage profiles.

The goal of mining association rules is to derive multi-feature (attribute) correlations from a database table. Given a set of records, where each record is a set of items, support(X) is defined as the percentage of records that contain item set X. An association rule is an expression X → Y, [c, s] [1]. Here X and Y are item sets, and X ∩ Y = ∅; s = support(X ∪ Y) is the support of the rule, and c = support(X ∪ Y)/support(X) is the confidence.
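A brute-force sketch of these definitions may make them concrete (an illustration only, not an Apriori-style miner); records are modeled as Python sets of attribute=value items, mirroring Table 3.

```python
# Sketch: support and confidence of a candidate rule X -> Y, computed
# directly from the definitions above by scanning all records.
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)

def rule_stats(records, X, Y):
    """Return (confidence, support) of the association rule X -> Y."""
    s = support(records, X | Y)
    c = s / support(records, X)
    return c, s

records = [
    {"command=vi", "time=am", "hostname=pascal", "arg1=tex"},
    {"command=mail", "time=am", "hostname=pascal", "arg1=fredd"},
    {"command=vi", "time=am", "hostname=pascal", "arg1=tex"},
]
c, s = rule_stats(records, {"command=vi"},
                  {"time=am", "hostname=pascal", "arg1=tex"})
print(f"command=vi -> time=am, hostname=pascal, arg1=tex [{c:.2f}, {s:.2f}]")
```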

Table 3 shows the shell input commands during one telnet session by a secretary. Here we keep only the filename extensions, remove the (input) contents of mail bodies and files, and use "am" to represent all the morning timestamps.

time  hostname  command  arg1      arg2
am    pascal    mkdir    dir1
am    pascal    cd       dir1
am    pascal    vi       tex
am    pascal    tex      vi
am    pascal    mail     fredd
am    pascal    subject  progress
am    pascal    vi       tex
am    pascal    vi       tex
am    pascal    mail     williamf
am    pascal    subject  progress
...   ...       ...      ...
am    pascal    vi       tex
am    pascal    latex    tex
am    pascal    dvips    dvi       -o
...   ...       ...      ...
am    pascal    logout

Table 3. Shell Command Records

Not all associations are relevant in analyzing program or user behavior (for example, hostname = pascal → arg1 = tex).

We utilized the schema-level information about the audit records to direct the pattern mining process. Observe that certain features are essential in describing the data, while others provide only auxiliary information. For example, in shell command data, since the combination of the exact time and command uniquely identifies each record, command is an essential feature; likewise, in network connection data, the combination of the features timestamp, source and destination hosts, source port, and service (destination port) uniquely identifies a connection record. Thus, these can be the essential features. We argue that the relevant association rules should describe patterns related to the essential features.

We call these essential feature(s) axis features when they are used as a form of item constraint. We restrict the association rule mining to only output rules that include axis attribute values. In practice, we need not designate all essential features as the axis features. For example, some network analysis tasks require statistics about various network services while others may require the patterns related to the hosts. We can use service as the axis feature to compute the association rules that describe the patterns related to the services of the connections. In the case of shell command records, we use command as the axis feature.
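A minimal sketch of the axis constraint, treated here as a post-filter on candidate rules (the actual miner pushes the constraint into candidate generation; the axis attribute name is illustrative):

```python
# Sketch: keep only rules whose items instantiate the axis attribute.
def has_axis(rule, axis="command"):
    X, Y = rule  # a rule is a pair of item sets (lhs, rhs)
    return any(item.startswith(axis + "=") for item in X | Y)

rules = [
    ({"command=vi"}, {"arg1=tex"}),        # kept: involves the axis
    ({"hostname=pascal"}, {"arg1=tex"}),   # dropped: auxiliary features only
]
print([r for r in rules if has_axis(r)])
```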

Table 4 shows some example association rules from the shell command data in Table 3. Each of these association rules conveys information about the user's behavior. The rules mined from each telnet/login session (of the same user) can be merged into an aggregate rule set to form the user's normal profile. Section 5.3 details our experiments in using association rules for anomaly detection.

    3.3 Frequent Episodes

There is often the need to study the frequent sequential patterns of network events in order to understand the nature of many attacks. We use frequent episodes to represent the sequential audit record patterns.

Given a set of timestamped event records, where each record is a set of items, an interval [t1, t2] is the sequence of event records that starts at timestamp t1 and ends at t2. The width of the interval is defined as t2 − t1. Let X be a set of items; an interval is a minimal occurrence of X if it contains X and none of its proper sub-intervals contains X. Define support(X) as the ratio between the number of minimal occurrences that contain X and the total number of event records. A frequent episode rule is the expression X, Y → Z, [c, s, w] [15]. Here X, Y and Z are item sets, and together they form an episode. s = support(X ∪ Y ∪ Z) is the support of the rule, and c = support(X ∪ Y ∪ Z)/support(X ∪ Y) is the confidence. The width of each of the occurrences must be less than w.
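The following is a small brute-force sketch of these definitions (not the optimized algorithm of [15, 16]): it enumerates the minimal occurrences of an itemset X with width below w over a list of (timestamp, itemset) records.

```python
# Sketch: minimal occurrences of an itemset X in timestamped records.
def covers(events, i, j, X):
    """True if records i..j together contain every item in X."""
    items = set().union(*(r for _, r in events[i:j + 1]))
    return X <= items

def minimal_occurrences(events, X, w):
    """Intervals of width < w containing X such that no proper
    sub-interval contains X. `events` is a list of (time, itemset)."""
    result = []
    for i in range(len(events)):
        for j in range(i, len(events)):
            if events[j][0] - events[i][0] >= w:
                break
            if covers(events, i, j, X):
                # j is the first covering endpoint for this i (so the
                # right end cannot shrink); the interval is minimal
                # iff dropping record i also loses X.
                if not covers(events, i + 1, j, X):
                    result.append((events[i][0], events[j][0]))
                break
    return result

events = [(1.1, {"service=http", "flag=S0"}),
          (1.1, {"service=http", "flag=S0"}),
          (10.1, {"service=ftp", "flag=SF"})]
mo = minimal_occurrences(events, {"service=http", "flag=S0"}, w=2)
print(len(mo) / len(events))  # support(X) per the definition above
```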

We introduced several extensions to the original frequent episodes algorithm. Our extended algorithm computes frequent sequential patterns in two phases. First, it finds the frequent associations using the axis feature(s) as previously described. Then it generates the frequent serial patterns from these associations. Thus, our approach combines the associations among features and the sequential patterns among the records into a single rule.

Another interesting schema-level fact about audit records is that some essential features can be references of other features. These reference features normally carry information about some "subject", and other features describe the "actions" that refer to the same "subject". For example, if we want to study the sequential patterns of connections to the same destination host, then dst host is the "subject" and service is the "action". In this case, we can designate dst host as the reference feature. When forming an episode, our program tests the condition that, within the episode's minimal occurrences, the event records covered by its constituent item sets have the same reference feature value.

    4 Feature Construction

We use the mined frequent episodes from network connection records as guidelines to construct temporal statistical features for building classification models.

Raw tcpdump output is first summarized into network connection records using pre-processing programs, where each record has a set of intrinsic features. For example, the duration, service, src host and dst host (source and destination hosts), src port (source port), src bytes and dst bytes (numbers of data bytes), a flag indicating normal or error status according to the protocols, etc., are intrinsic features of a single connection. Table 5 shows examples of connection records. Note that these intrinsic features are for general network analysis purposes, and not specific to intrusion detection.

We apply the frequent episodes program to both normal connection data and intrusion data, and compare the resulting patterns to find the "intrusion only" patterns. The details of the pattern comparison algorithm are described in [12]. Briefly, since the number of patterns may be very large and there are rarely exactly matched patterns from the two data sets, this heuristic algorithm considers two episodes related to two different sets of axis features as more different, and outputs the (user-specified) top percentage of the most "intrusion only" patterns.

As an example, consider the syn flood attack records shown in Table 5. Here the attacker used many spoofed source addresses to send a lot of S0 connections (i.e., only the first SYN packet is sent) to a port (e.g., http) of the victim host in a very short time span (e.g., all in timestamp 1.1). Table 6 shows the top "intrusion only" pattern, produced using service as the axis feature and dst host as the reference feature.

Each of these intrusion patterns is used as a guideline for adding additional features into the connection records to build better classification models.

Association rule                         Meaning
command = vi → time = am,                When using vi to edit a file, the user is always (i.e., 100% of the
hostname = pascal, arg1 = tex,           time) editing a tex file, in the morning, and at host pascal; and
[1.0, 0.28]                              28% of the command data has this pattern.
command = subject → time = am,           The subject of the user's email is always (i.e., 100% of the time)
hostname = pascal, arg1 = progress,      about "progress", in the morning, and at host pascal; and 11% of
[1.0, 0.11]                              the command data has this pattern.

Table 4. Example Association Rules from Shell Command Data

timestamp  duration  service  src host   dst host  src bytes  dst bytes  flag  ...
1.1        0         http     spoofed 1  victim    0          0          S0    ...
1.1        0         http     spoofed 2  victim    0          0          S0    ...
1.1        0         http     spoofed 3  victim    0          0          S0    ...
1.1        0         http     spoofed 4  victim    0          0          S0    ...
1.1        0         http     spoofed 5  victim    0          0          S0    ...
1.1        0         http     spoofed 6  victim    0          0          S0    ...
1.1        0         http     spoofed 7  victim    0          0          S0    ...
...        ...       ...      ...        ...       ...        ...        ...   ...
10.1       2         ftp      A          B         200        300        SF    ...
12.3       1         smtp     B          D         250        300        SF    ...
13.4       60        telnet   A          D         200        12100      SF    ...
13.7       1         smtp     B          C         200        300        SF    ...
15.2       1         http     D          A         200        0          REJ   ...
...        ...       ...      ...        ...       ...        ...        ...   ...

Table 5. Network Connection Records

We use the following automatic procedure for parsing a frequent episode and constructing features:

Assume F0 (e.g., dst host) is used as the reference feature, and the width of the episode is w seconds. Add the following features that examine only the connections in the past w seconds that share the same value in F0 as the current connection:

- a count of these connections;

- let F1 be service, src dst or dst host other than F0; if the same F1 value (e.g., http) is in all the item sets of the episode, add a percentage of connections that share the same F1 value as the current connection; otherwise, add a percentage of different values of F1;

- let V2 be a value (e.g., S0) of a feature F2 (e.g., flag) other than F0 and F1; if V2 is in all the item sets of the episode, add a percentage of connections that have the same V2; otherwise, if F2 is a numerical feature, add an average of the F2 values.

This syn flood pattern results in the following additional features: a count of connections to the same dst host in the past 2 seconds, and, among these connections, the percentage of those that have the same service, and the percentage of those that have the S0 flag.
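A minimal sketch of constructing exactly these three features over a sliding 2-second window follows; the record layout mirrors Table 5, but the field and feature names are our own shorthand, not the system's actual schema.

```python
# Sketch: for each connection, examine connections in the past w
# seconds with the same dst_host, then compute a count, the fraction
# with the same service, and the fraction with the S0 flag.
def add_traffic_features(conns, w=2.0):
    out = []
    for i, c in enumerate(conns):
        window = [p for p in conns[:i + 1]
                  if c["timestamp"] - p["timestamp"] <= w
                  and p["dst_host"] == c["dst_host"]]
        n = len(window)  # always >= 1: the window includes c itself
        feats = dict(c)
        feats["host_count"] = n
        feats["host_same_srv_pct"] = sum(p["service"] == c["service"]
                                         for p in window) / n
        feats["host_S0_pct"] = sum(p["flag"] == "S0" for p in window) / n
        out.append(feats)
    return out

conns = [
    {"timestamp": 1.1, "service": "http", "dst_host": "victim", "flag": "S0"},
    {"timestamp": 1.1, "service": "http", "dst_host": "victim", "flag": "S0"},
    {"timestamp": 1.1, "service": "http", "dst_host": "victim", "flag": "S0"},
    {"timestamp": 10.1, "service": "ftp", "dst_host": "B", "flag": "SF"},
]
for f in add_traffic_features(conns):
    print(f["host_count"], f["host_same_srv_pct"], f["host_S0_pct"])
```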

An open problem here is how to decide the right time window value w. We mine sequential patterns using different w values, for example, from 0.1 to 20 with an increment of 1, and plot the number of patterns generated at each run. Our experience shows that this plot tends to stabilize after an initial sharp jump. We call the smallest w in the stable region w0. In [11], we reported experiments of using different w values to calculate temporal statistical features for classification models. Our results showed that the plot of the accuracy of the classifier also stabilizes after w ≥ w0 and tends to taper off. Intuitively, a requirement for a good window size is that its set of sequential patterns is stable, that is, sufficient patterns are captured and noise is small. We therefore use w0 for adding temporal statistical features.

In order to construct the appropriate features to detect an intrusion, we need to select the right axis and reference features to generate the most distinguishing and useful intrusion patterns. This could be a bottleneck in building effective models since it requires empirical evaluation. We have implemented a feature construction system, described in [12], to support the iterative procedure of pattern mining and comparison, feature construction from patterns, and model building and evaluation.

Frequent episode                                   Meaning
(service = http, flag = S0, dst host = victim),    93% of the time, after two http connections with the S0 flag
(service = http, flag = S0, dst host = victim)     are made to host victim, within 2 seconds from the first of
→ (service = http, flag = S0,                      these two, the third similar connection is made; and this
dst host = victim) [0.93, 0.03, 2]                 pattern occurs in 3% of the data.

Table 6. Example Intrusion Pattern

In each iteration, we choose a different data mining strategy regarding the choices of axis and reference features. These choices are limited among the essential features (see Section 3.2), that is, service, dst host, src dst, or src port. Since intrusions are generally targeted at some victim host(s) in the network, the system starts with service and dst host.

    5 Experiments

We participated in the DARPA Intrusion Detection Evaluation Program, prepared and managed by MIT Lincoln Labs. The objective of this study was to survey and evaluate the state of the art in research in intrusion detection. A standard set of extensively gathered audit data, which includes a wide variety of intrusions simulated in a military network environment, was provided by DARPA. Each participating site was required to build intrusion detection models or tweak their existing system parameters using the training data, and send the results (i.e., detected intrusions) on the test data back to DARPA for performance evaluation. We report our experience here.

    5.1 The DARPA data

We were provided with about 4 gigabytes of compressed tcpdump data of 7 weeks of network traffic. This data can be processed into about 5 million connection records of about 100 bytes each. The data contains the content (i.e., the data portion) of every packet transmitted between hosts inside and outside a simulated military base. BSM audit data from one UNIX Solaris host for some network sessions were also provided.

Four main categories of attacks were simulated:

- DOS, denial-of-service, for example, ping-of-death, teardrop, smurf, syn flood, etc.;

- R2L, unauthorized access from a remote machine, for example, guessing password;

- U2R, unauthorized access to local superuser privileges by a local unprivileged user, for example, various buffer overflow attacks;

- PROBING, surveillance and probing, for example, port-scan, ping-sweep, etc.

In addition, there was anomalous user behavior, such as a manager becoming (i.e., behaving like) a system administrator.

    5.1.1 Data Pre-processing

We used Bro as the packet filtering and reassembling engine. We extended Bro to handle ICMP packets, and made changes to its packet fragment inspection modules since it crashed when processing data that contains teardrop or ping-of-death attacks.

We implemented a Bro "connection finished" event handler to output a summarized record for each connection. Each connection record includes these intrinsic features: its time, duration, service, src host, dst host, src port, wrong fragment (fragmentation error, e.g., a fragment size that is not a multiple of 8 bytes, or overlapping offsets, etc.), and flag (how the connection is established and terminated).

We used Bro event handlers to examine the telnet sessions, and extract the shell commands of the users. We further pre-processed the shell commands by replacing timestamps with am, pm, and nt (for night), eliminating the input (i.e., contents) of edit and sendmail commands, and keeping only the filename extensions. Table 3 shows examples of the processed command data. These shell command records were used for user anomaly detection, to be discussed in Section 5.3.

    5.2 Misuse Detection

The training data from DARPA includes "list" files that identify the timestamp, source host and port, destination host and port, and the name of each attack. We used this information to select intrusion data to perform pattern mining and feature construction, and to label each connection record with "normal" or an attack type to create training data for building classification models.

Since the amount of audit data is huge, for example, some days have several million connection records due to the nasty DOS attacks, we did not aggregate all the connection records into a single training data set. Instead, we extracted all the connection records that fall within a surrounding time window of plus and minus 5 minutes of the whole duration of each attack to create a data set for each attack type. We also randomly extracted sequences of normal connection records to create the normal data set.

    5.2.1 Manual and Automatic Feature Construction

Following the feature construction approach described in Section 4, for each attack type, e.g., syn flood, port-scan, etc., we performed pattern mining and comparison using its intrusion data set and the normal data set. We constructed appropriate features according to the "intrusion only" patterns. Here we summarize the temporal and statistical features automatically constructed by our system:

- The "same host" features, which examine only the connections in the past 2 seconds that have the same destination host as the current connection: the count of such connections, the percentage of connections that have the same service as the current one, the percentage of different services, the percentage of the S0 flag, and the percentage of the REJ (i.e., rejected connection) flag;

- The "same service" features, which examine only the connections in the past 2 seconds that have the same service as the current connection: the count of such connections, the percentage of different destination hosts, the percentage of the S0 flag, and the percentage of the REJ flag.

We call these the (time-based) "traffic" features for connection records. There are several "slow" PROBING attacks that scan the hosts (or ports) using a much larger time interval than 2 seconds, for example, one in every minute. As a result, these attacks did not produce "intrusion only" patterns with the time window of 2 seconds. We sorted these connection records by the destination hosts, and applied the same pattern mining and feature construction process. Instead of using a time window of 2 seconds, we now used a "connection" window of 100 connections, and constructed a mirror set of "host-based traffic" features analogous to the (time-based) "traffic" features.

We discovered that unlike most of the DOS and PROBING attacks, the R2L and U2R attacks don't have any "intrusion only" frequent patterns. This is because most of the DOS and PROBING attacks involve sending a lot of connections to some host(s) in a very short period of time, and therefore can have frequent sequential patterns that are different from the normal traffic. The R2L and U2R attacks are embedded in the data portions of the packets and normally involve only a single connection. Therefore, it is unlikely that they can have any unique frequent traffic patterns. In other words, our automatic feature construction process failed to produce any features for these attacks.

After studying the outcome of this mining process, we focused our attention on the content of the connections. In the Bro event handlers, we added functions that inspect the data exchanges of interactive TCP connections (e.g., telnet, ftp, smtp, etc.). These functions assign values to a set of "content" features to indicate whether the data contents suggest suspicious behavior. These features are: number of failed logins, successfully logged in or not, whether logged in as root, whether a root shell is obtained, whether a su command is attempted and succeeded, number of accesses to access control files (e.g., /etc/passwd, .rhosts), number of compromised states on the destination host (e.g., "file/path not found" errors, and "Jump to" instructions, etc.), number of hot indicators (e.g., access to system directories, creation and execution of programs, etc.), and number of outbound connections during an ftp session. Our approach here is to include an extensive set of indicators, and then let classification programs decide which minimal set of discriminating features should be used to identify intrusions.

    5.2.2 Detection Models

It is evident that different categories of intrusions require different sets of constructed features in order to be detected. We therefore built classification models using different feature sets:

- The "traffic" model: each connection record contains the "intrinsic" and the "traffic" features. Table 7 shows some example labeled connection records. The resultant RIPPER classifier detects the DOS and PROBING attacks. Table 8 shows some example RIPPER rules;

- The host-based "traffic" model: each connection record contains the "intrinsic" and the host-based "traffic" features. The resultant RIPPER classifier detects the slow PROBING attacks;

- The "content" model: each connection record contains the "intrinsic" and the "content" features. Table 1 shows some example labeled connection records. The resultant RIPPER classifier detects the R2L and U2R attacks. Table 2 shows some example RIPPER rules.

These classification models each specialize in a certain type of intrusion. We then constructed a meta-level classifier to combine these detection models. Each meta-level training record consists of four features: the three predictions, each from one of the base models, plus the true class label (i.e., "normal" or an attack type).

label   service     flag  host count  srv count  host REJ %  host diff srv %  duration  ...
normal  ecr i       SF    1           1          0           1                0         ...
smurf   ecr i       SF    350         350        0           0                0         ...
satan   user-level  REJ   231         1          85%         89%              0         ...
normal  http        SF    1           0          0           1                3         ...
...     ...         ...   ...         ...        ...         ...              ...       ...

Table 7. Example Traffic Connection Records

RIPPER rule                               Meaning
smurf :- service = ecr i,                 If the service is icmp echo request, and for the past 2 seconds
  host count >= 5, host srv count >= 5.   the number of connections that have the same destination host as
                                          the current one is at least 5, and the number of connections that
                                          have the same service as the current one is at least 5, then this
                                          is a smurf attack (a DOS attack).
satan :- host REJ % >= 83%,               If, for the connections in the past 2 seconds that have the same
  host diff srv % >= 87%.                 destination host as the current connection, the percentage of
                                          rejected connections is at least 83%, and the percentage of
                                          different services is at least 87%, then this is a satan attack
                                          (a PROBING attack).

Table 8. Example RIPPER Rules for DOS and PROBING Attacks

RIPPER was then applied to learn the rules that combine the evidence from the "traffic", host-based "traffic", and "content" classifiers to make a (final) prediction on a connection. The resulting meta-level rules basically use the predictions from the "content" model to detect R2L and U2R attacks, and the combination of the "traffic" and host-based "traffic" models to detect the DOS and (fast and slow) PROBING attacks. That is, the meta-classifier predicts a connection as an attack of R2L or U2R whenever the "content" model does so, and as an attack of DOS or PROBING whenever the "traffic" model does so, or whenever the "traffic" model predicts "normal" but the host-based model predicts a PROBING attack.

Model         # of features in records  # of rules  # of features in rules
content       22                        55          11
traffic       20                        26          4 + 9
host traffic  14                        8           1 + 5

Table 9. Model Complexities

Table 9 summarizes the complexity of the base models in terms of the number of features in a connection record, the number of RIPPER rules produced, and the number of distinct features actually used in the rules. The second numbers in the sums, for example, 9 in "4 + 9", indicate the number of automatically constructed temporal and statistical features used in the RIPPER rules. We see that for both the "traffic" and host-based "traffic" models, our feature construction process contributes the majority of the features. We should point out that not all features in the connection records were selected by RIPPER. This is because RIPPER, like most classification algorithms, has a built-in feature selection process that selects the most discriminating and generalizable features according to their statistical significance and performance (e.g., on a hold-out test dataset that simulates the unseen/future data). Because of the large amount of audit data, a human expert is not able to manually gather and test various statistics, and thus tends to do a poor job of selecting the features. As a result, hand-crafted signature rules tend to be very specific to a small intrusion data set. Alternative classification algorithms that compute underlying probability distributions may indeed require that all features be evaluated in their resultant models. A crucial issue here is the tradeoff between model accuracy and model cost. The RIPPER output indicates that some features are irrelevant and hence need not be computed at run-time, thus reducing the cost of detection. This is the subject matter of our ongoing research.

    5.2.3 Results

We report the performance of our detection models as evaluated by MIT Lincoln Labs. We trained our intrusion detection models, i.e., the base models and the meta-level classifier, using the 7 weeks of labeled data, and used them to make predictions on the 2 weeks of unlabeled test data (i.e., we were not told which connections are attacks). The test data contains a total of 38 attack types, with 14 types appearing in the test data only (i.e., our models were not trained with instances of these attack types).

[Figure 1 comprises four ROC plots, (a) DOS, (b) PROBING, (c) U2R and R2L, and (d) Overall, each plotting Detection Rate against False Alarm Rate (0 to 0.2) for Columbia and Groups 1-3.]

Figure 1. ROC Curves on Detection Rates and False Alarm Rates

Figure 1 shows the ROC curves of the detection models by attack categories as well as on all intrusions. In each of these ROC plots, the x-axis is the false alarm rate, calculated as the percentage of normal connections classified as an intrusion; the y-axis is the detection rate, calculated as the percentage of intrusions detected. A data point in the upper left corner corresponds to optimal performance, i.e., a high detection rate with a low false alarm rate. We compare here our models with the other participants (denoted as Groups 1 through 3) in the DARPA evaluation program. (The tested systems produced binary output; hence, the ROCs are not continuous. These plots are duplicated from the presentation slides of a report given by Lincoln Labs at a DARPA PI meeting. The slides can be viewed online at http://www.cs.columbia.edu/~sal/JAM/PROJECT/MIT/mit-index.html.) These participating groups used knowledge engineering approaches to build their intrusion detection models. We can see from the figure that our detection model has the best overall performance, and in all but one attack category, our model is one of the best two. However, it is also clear that all models performed very poorly on R2L attacks. For all intrusions, an overall detection rate of below 70% is hardly satisfactory in a mission critical environment.

Category  Old   New
DOS       79.9  24.3
PROBING   97.0  96.7
U2R       75.0  81.8
R2L       60.0  5.9
Overall   80.2  37.7

Table 10. Comparing Detection Rates (in %) on Old and New Attacks

Although our models were intended for misuse detection, we had hoped that the features we constructed would be general enough so that the models can detect new variations of the known intrusions. Table 10 compares the detection rates on old intrusions and new intrusions. Here new intrusions refer to those that did not have corresponding instances in the training data. We see that our models were able to detect a large percentage of new PROBING and U2R attacks, but were not as effective for new DOS and R2L attacks.

    5.2.4 Discussion

PROBING attacks have relatively limited variance because they all involve making connections to a large number of hosts or ports in a given time frame. Likewise, the outcome of all U2R attacks is that a root shell is obtained without legitimate means, e.g., login as root, su to root, etc. Thus, for these two categories of attacks, given some representative instances in the training data, our data mining system was able to construct features that capture their general behavior patterns. As a result, our detection models can detect a high percentage of old and new PROBING and U2R attacks. On the other hand, DOS and R2L attacks have a wide variety of behaviors because they exploit the weaknesses of a large number of different network or system services. The features constructed based on the available attack instances are very specialized to the known attack types. Our detection models therefore missed a large number of new DOS and R2L attacks.

The results here are not entirely surprising since our models are misuse detection models. We need to use anomaly detection models on network traffic or system programs to guard against new and diversified attacks. Anomaly detection is much more challenging than misuse detection. For example, we need to first decide whether we should build a normal profile for each network service or a group of services, and for each host or a group of hosts. The feature construction process will likely be more complex since, unlike the relatively small number of "intrusion only" patterns, normal network traffic can have a large number of variations. Network anomaly detection is an important problem and an active area of research.

    5.3 User Anomaly Detection

Insiders misusing their privileges can be hard to detect since they don't normally need to break in, and IDSs and security personnel tend to focus on guarding against outside attacks. Insider problems are some of the most vexing problems for security personnel. (Indeed, who checks the checkers, i.e., the person to whom the IDS reports?)

It is often very difficult to classify a single event by a user as normal or abnormal because of the unpredictable nature of most people. A user's actions during a login session need to be studied as a whole to determine whether he or she is behaving normally. Our initial exploratory approach is to mine the frequent patterns from user command data, and merge or add the patterns into an aggregate set to form the normal usage profile of a user. A new pattern can be merged with an old pattern if they have the same left-hand-sides and right-hand-sides, their support values are within 5% of each other, and their confidence values are also within 5% of each other.

To analyze a user login session, we mine the frequent patterns from the sequence of commands during this session. This new pattern set is compared with the profile pattern set and a similarity score is assigned. Assume that the new set has n patterns and among them, there are m patterns that have "matches" (i.e., rules that they can be merged with) in the profile pattern set; then the similarity score is simply m/n. Obviously, a higher similarity score means a higher likelihood that the user's behavior agrees with his or her historical profile.
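A minimal sketch of this scoring, under the assumption that "within 5%" means an absolute difference of at most 0.05 in support and in confidence; the pattern encoding is our own simplification.

```python
# Sketch: similarity m/n between a session's patterns and a profile.
# A pattern is (lhs, rhs, support, confidence); two patterns match if
# lhs and rhs are equal and support and confidence are each within 5%.
def matches(p, q, tol=0.05):
    (l1, r1, s1, c1), (l2, r2, s2, c2) = p, q
    return l1 == l2 and r1 == r2 and abs(s1 - s2) <= tol and abs(c1 - c2) <= tol

def similarity(session, profile):
    """m/n: fraction of session patterns with a match in the profile."""
    m = sum(any(matches(p, q) for q in profile) for p in session)
    return m / len(session)

profile = [(frozenset({"command=vi"}),
            frozenset({"time=am", "arg1=tex"}), 0.28, 1.0)]
session = [(frozenset({"command=vi"}),
            frozenset({"time=am", "arg1=tex"}), 0.30, 1.0),
           (frozenset({"command=cat"}),
            frozenset({"arg1=passwd"}), 0.40, 1.0)]
print(similarity(session, profile))  # 0.5: one of two patterns matches
```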

User         Normal Activities
sysadm       logs in as root, cats the password file, and runs commands such as top.
programmer1  writes public domain C code, uses a vi editor, compiles the C code, reads and sends mail, and executes unix commands.
programmer2  a similar user profile, but works in afternoons and evenings.
secretary    edits latex files, runs latex, reads mail, and sends mail.
manager1     reads and sends mail.
manager2     reads mail.

Table 11. User Descriptions

The DARPA data also includes user anomaly data to evaluate anomaly detection systems. Table 11 describes the consistent behavior of the 6 users used for anomaly analysis. Note that since we were the only group that performed anomaly detection on the test data, Lincoln Labs did not evaluate our results. We report our experiments on the training data here.

We apply our frequent episode algorithms to the command data from each login session (of the same user), with command as the axis feature and w = 5 (i.e., we look for patterns within the range of 5 consecutive commands), to mine the frequent sequential patterns on the associations among user commands, their arguments, time segments, and hosts. We treat the first 4 weeks as a data gathering period, during which we simply merge the patterns into each user's profiles. Each user has 3 profiles, one for the activities of each time segment (am, pm, and nt).


    8 Acknowledgments

    We wish to thank our colleagues at Columbia University,Chris Park, Wei Fan, and Andreas Prodromidis, for theirhelp and encouragement.

    References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993.

[2] P. K. Chan and S. J. Stolfo. Toward parallel and distributed learning by meta-learning. In AAAI Workshop on Knowledge Discovery in Databases, pages 227-240, 1993.

[3] W. W. Cohen. Fast effective rule induction. In Machine Learning: the 12th International Conference, Lake Tahoe, CA, 1995. Morgan Kaufmann.

[4] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. The KDD process of extracting useful knowledge from volumes of data. Communications of the ACM, 39(11):27-34, November 1996.

[5] K. Ilgun, R. A. Kemmerer, and P. A. Porras. State transition analysis: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.

[6] Network Flight Recorder Inc. Network flight recorder. http://www.nfr.com, 1997.

[7] V. Jacobson, C. Leres, and S. McCanne. tcpdump. Available via anonymous ftp to ftp.ee.lbl.gov, June 1989.

[8] C. Ko, G. Fink, and K. Levitt. Automated detection of vulnerabilities in privileged programs by execution monitoring. In Proceedings of the 10th Annual Computer Security Applications Conference, pages 134-144, December 1994.

[9] S. Kumar and E. H. Spafford. A software architecture to support misuse intrusion detection. In Proceedings of the 18th National Information Security Conference, pages 194-204, 1995.

[10] T. Lane and C. E. Brodley. Sequence matching and learning in anomaly detection for computer security. In AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, pages 43-49. AAAI Press, July 1997.

[11] W. Lee and S. J. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, January 1998.

[12] W. Lee, S. J. Stolfo, and K. W. Mok. Mining in a data-flow environment: Experience in intrusion detection. Submitted for publication, March 1999.

[13] T. Lunt. Detecting intruders in computer systems. In Proceedings of the 1993 Conference on Auditing and Computer Technology, 1993.

[14] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, H. Javitz, A. Valdes, and T. Garvey. A real-time intrusion detection expert system (IDES) - final technical report. Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, February 1992.

[15] H. Mannila and H. Toivonen. Discovering generalized episodes using minimal occurrences. In Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, August 1996.

[16] H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering frequent episodes in sequences. In Proceedings of the 1st International Conference on Knowledge Discovery in Databases and Data Mining, Montreal, Canada, August 1995.

[17] B. Mukherjee, L. T. Heberlein, and K. N. Levitt. Network intrusion detection. IEEE Network, May/June 1994.

[18] V. Paxson. Bro: A system for detecting network intruders in real-time. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, 1998.

[19] P. A. Porras and P. G. Neumann. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In National Information Systems Security Conference, Baltimore, MD, October 1997.

[20] S. Staniford-Chen. Common intrusion detection framework. http://seclab.cs.ucdavis.edu/cidf.

[21] S. J. Stolfo, A. L. Prodromidis, S. Tselepis, W. Lee, D. W. Fan, and P. K. Chan. JAM: Java agents for meta-learning over distributed databases. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pages 74-81, Newport Beach, CA, August 1997. AAAI Press.

[22] SunSoft, Mountain View, CA. SunSHIELD Basic Security Module Guide.
