Statistical intrusion detection in modern networksconf.nsc.ru/files/conferences/opcs2017/415961/LectionPagano.pdf · Intrusion Detection Expert System (Denning, An intrusion Detection

Statistical intrusion detection in modernnetworks

Michele Pagano

Department of Information EngineeringUniversity of Pisa

XIII International Asian School-seminarProblems of complex systems’ optimization

September 18-22, 2017Akademgorodok

Outline

1 Theoretical background

2 How to evaluate an IDS

3 Anomaly Detection: Overview

4 Anomaly Detection: Examples

M. Pagano Statistical intrusion detection in modern networks 2 / 50

Theoretical background

Outline

1 Theoretical backgroundOverviewTaxonomy of the Intrusion Detection SystemsExamples of the Intrusion Detection Systems





Theoretical background Overview

Why an intrusion detection system?

Network security mainly means PREVENTIONPhysical protection for hardwarePasswords, access tokens, etc. for authenticationAccess control list for authorizationCryptography for secrecyBackups and redundancy for authenticity. . . and so on

BUT . . .. . . Absolute security cannot be guaranteed!



What is an Intrusion Detection System?

Prevention is suitable whenInternal users are trustedLimited interaction with other networks

Need for a system which acts when prevention fails

Intrusion Detection SystemAn intrusion detection system or IDS is a software/hardwaretool used to detect unauthorized access to a computer systemor a network



A bit of History

The history of IDSs can be split in three main blocks1 First Generation IDSs (end of the 1970s)

The concept of IDS first appears in the 1970s and early1980s (Anderson, Computer Security Monitoring andSurveillance, Tech Rep 1980)Focus on audit data of a single machinePost processing of data

2 Second Generation IDSs (1987)Intrusion Detection Expert System (Denning, An intrusionDetection Model, IEEE Trans. on Soft. Eng., 1987)Statistical analysis of data

3 Third Generation IDSs (to come)Focus on the networkReal-time detectionReal-time reactionIntrusion Prevention System



A taxonomy of the intruders

Intruders can be classified asMasquerader: an individual who is not authorized to usethe computer and who penetrates a system’s accesscontrol to exploit a legitimate user’s accountMisfeasor: a legitimate user who accesses data,programs, or resources for which such access is notauthorized, or who is authorized for such access, butmisuses his/her privilegesClandestine User: an individual who seizes supervisorycontrol of the system and uses the control to evadeauditing and access controls or to suppress audit collection

Anderson, Computer Security Monitoring and Surveillance, Tech Rep 1980



A taxonomy of the intrusions

Eavesdropping and Packet Sniffing: passive interception ofnetwork traffic

Snooping and Downloading

Tampering and Data Diddling: unauthorized changes to dataor records

Spoofing: impersonating other users

Jamming or Flooding: overwhelming a system’s resources

Injecting Malicious Code

Probing

Exploiting Design or Implementation Flaws: as bufferoverflow

Cracking Passwords and Keys

Denning, Cyberspace Attacks and Countermeasures, New York,1997


Theoretical background IDS Taxonomy

Host based vs. Network based

Host based IDSAimed at detecting attacks related to a specific hostArchitecture/Operating system dependentProcessing of high level information (e.g. system calls)Effective in detecting insider misuse

Network based IDSAimed at detecting attacks towards hosts connected to aLANArchitecture/Operating system independentProcessing data at lower level of granularity (packets)Effective in detecting attacks from the “outside”



Misuse based IDS vs. Anomaly based IDS

Misuse based (Signature based, Rule based) IDSIdentifies intrusion by looking for patterns of traffic or ofapplication data presumed to be maliciousPattern of misuses are stored in a databaseEffective in detecting only “known” attacksEncrypted traffic ???

Anomaly based IDSIdentifies intrusions by classifying activity as eitheranomalous or normalNeed a training phase to recognize normal activityAble to detect “new” attacksGenerates more false alarms than a misuse based IDS



Stateless IDS VS Stateful IDS

Stateless IDSTreats each event independently of othersSimple system designHigh processing speed

Stateful IDSMaintains information about past eventsThe effect of a certain event depends on its position in theevents streamMore complex system designMore effective in detecting distributed attacks



Centralized IDS VS Distributed IDS

Centralized IDSAll the operations are performed by the same machineMore simple to realizeOnly one point of failure

Distributed IDSComposed of several components

Sensors which generate security eventsConsole to monitor events and alerts and control the sensorsCentral Engine that records events and generate alarms

May need to deal with different data formatsNeed of a secure communication protocol (IPFIX)


Theoretical background IDS Examples

SNORT

SnortThe most famous IDSOpen source software toolNetwork basedSignature basedCentralized architecture

Spade, the anomaly detection plug-in for Snort... is notsupported any longer

The rules database, as well as the system code, is available fordownload at the web site http://www.snort.org



Attacks State of the Art

Howard Lipson, CMU Software Engineering Institute CERT R©

(Computer Emergency Response Team – by DARPA in 1988)



IDS State of the Art

Focus is on Network based IDSs (The only ones effectivein detecting Distributed Denial of Service - DDoS)State of the art IDSs are Misuse Based

Most attacks are realized by means of software toolsavailable on the InternetMost attacks are “well-known” attacks

BUT . . .. . . The most dangerous attacks are those written

ad hoc by the intruder!



The best choice?

Combined use of bothHIDS (for insider attacks) & NIDS (for outsider attacks)Misuse IDS (low False Alarm rate) & Anomaly IDS (for“new” attacks)Stateless IDS (fast data process) & Stateful IDS (for“complex” attacks)

Distributed IDSNot a single point of failureMore effective in monitoring large networks



The best choice?


Evaluation

Outline


2 How to evaluate an IDSReceiver Operating Characteristics (ROC) CurveEvaluation Datasets




Evaluation ROC

Basic definitions

A = alarm−A = not an alarmI = attack (intrusion)−I = not an attackFalse Positive (FP): the error of rejecting a null hypothesiswhen it is actually true. In our case it implies the creation ofan alarm in correspondence of normal activities

P(A| − I) = False positive (alarm) probability

False Negative (FN): the error of failing to reject a nullhypothesis when it is in fact not true. In our case itcorresponds to a missed detection

P(−A|I) = False negative probability


Evaluation ROC

ROC (Receiver Operating Characteristics) Curve

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False Alarm Rate

De

tectio

n R

ate

Non Stationary ECDF

Non Homogeneous MC

Homogeneous MC

P(A

|I)

P(A|−I)=P(A|I)

P(A|−I)


Evaluation ROC

Base Rate Fallacy

Let’s suppose:P(A|I) = 0.99P(−A| − I) = 0.99we have 2 attacks over 106 pkts (base rate = 1/500000)

Applying the Bayes theorem:

P(I|A) = P(I) · P(A|I)P(A|I) · P(I) + P(A| − I) · P(−I)

=

=1/500000 · 0.99

1/500000 · 0.99 + (1− 1/500000) · 0.01

Thus P(I|A) = 0.0002


Evaluation Evaluation Datasets

DARPA Evaluation Program

The 1998/1999 DARPA/MIT IDS evaluation program is themost comprehensive evaluation performed to dateIt provides a corpus of data for the development,improvement, and evaluation of IDSsDifferent kind of data are available:

Operating systems logsNetwork traffic

Collected by an “inside” snifferCollected by an “outside” sniffer

The data model the network traffic measured between aUS Air Force base and the Internet



The DARPA Network



The DARPA Dataset

5 weeks dataData from weeks 1 and 3 are attack free and can be usedto train the systemData from week 2 contains labeled attacks and can be usedto realize the signatures databaseData from weeks 4 and 5 contains several attacks and canbe used for the detection phase

An Attack Truth list is providedAttacks are categorized as

Denial of Service (DoS)User to Root (U2R)Remote to Local (R2L)DataProbe

177 instances of 59 different types of attacks



KDD99

Database of features extracted from the 1998 DARPADatasetUsed for The third International Knowledge Discovery andData Mining Tools CompetitionUseful for evaluating IDSsSome example features

@IP source and destFlagsNumber of bytes sent/received by a hostDuration of flowsProtocol. . . and so on

Can be downloaded athttp://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html



Other Datasets

The DARPA dataset has many drawbacks:simulated environmentnot up-to-date trafficthe methodology used for generating the traffic has beenshown to be inappropriate for simulating actual networks

Other Datasets:several publicly available traffic tracese.g. CAIDA, Abilene (Internet2), GEANT, . . .no ground truth is provided!



MAWI-Lab Traffic Traces

MAWI (Measurement and Analysis on the WIDE Internet)archive (sample-points B and F)Anomaly labelling is obtained combining the output of fouranomaly detectors

Hough transformGamma distribution modellingKullback-Leibler divergencePrincipal Component Analysis

MAWILab : Combining Diverse Anomaly Detectors forAutomated Anomaly Labeling and Performance Benchmarking

R. Fontugne, P. Borgnat, P. Abry, and K. Fukuda, ACMCoNEXT 2010



MAWI-Lab Traffic Traces

Traffic taxonomyanomalous: anomalous with high probabilitysuspicious: probably anomalous, but not clearly identifiedby the MAWI classification methodsnotice: non anomalous, but reported by at least oneanomaly detectorbenign: normal

Some information about the kind of anomalyattack: anomalies representing a well known attackspecial: anomalies involving well known portsunknown: unknown kinds of anomalies


Anomaly Detection: Overview

Outline



3 Anomaly Detection: OverviewStatistical modelsTraffic Descriptors



Anomaly Detection: Overview Statistical models

Profiles

An activity profile characterizes the behavior of a givensubject (or set of subjects) with respect to a given object,thereby serving as a signature or description of normalactivity for its respective subject and object.Observed behavior is characterized in terms of a statisticalmetric and modelA metric is a random variable x representing a quantitativemeasure accumulated over a periodObservations xi of x obtained from the audit records areused together with a statistical model to determine whethera new observation is abnormalThe statistical models make no assumptions about theunderlying distribution of x ; all knowledge about x isobtained from the observations xi


Anomaly Detection: Overview Statistical models

Metrics and Models

MetricsEvent counter

Interval timer

Resource measure

Statistical modelsOperational model : abnormality is decided by comparison of xn with a fixedthreshold

Mean and standard deviation model : abnormality is decided by checking if xnfalls inside the confidence interval

Multivariate model : based on the correlations between two or more metrics

Markov process model : based on the transition probabilities

Time series model : takes into account order and inter-arrival time of the

observations


Anomaly Detection: Overview Traffic Descriptors

Statistical Approach: Traffic Descriptors

To identify some traffic parameters, which can be used todescribe the network traffic and that vary significantly from thenormal behavior to the anomalous one

Some examplesPacket lengthInter-arrival timeFlow sizeNumber of packets per flow. . . and so on


Anomaly Detection: Overview Traffic Descriptors

Choice of the Traffic Descriptors

For each parameter we can considerMean ValueVariance and higher order momentsDistribution functionQuantiles. . . and so on

The number of potential traffic descriptors is huge (somepapers identify up to 200 descriptors)

GOALTo identify as few descriptors as possible to classify traffic with

an acceptable error rate


Anomaly Detection: Examples

Outline




4 Anomaly Detection: ExamplesUse of Markov ChainsPrincipal Component Analysis


Anomaly Detection: Examples Use of Markov Chains

State Transition Analysis

The approach was first proposed by Denning anddeveloped in the 1990s.Mainly used in two distinct environment

HIDS: to model the sequence of system commands usedby a userNIDS: to model the sequence of some specific fields of thepacket (e.g. the sequence of the flags values in a TCPconnection)

The most classical approach: Markov chains



Markov Chains and TCP

Idea: Model TCP connections by means of Markov chainsThe IP addresses and the TCP port numbers are used toidentify a connectionState space is defined by the possible values of the TCPflagsThe value of the flags is used to identify the chaintransitionsA value Sp is associated to each packet according to therule

Sp = syn + 2 · ack + 4 · psh + 8 · rst + 16 · urg + 32 · fin



Markov Chain and TCP - Training phase

Calculate the transitionprobabilities

aij = P[qt+1 = j |qt = i] =P[qt = i ,qt+1 = j]

P[qt = i]

Server side3-way handshakepsh flagclosing

SSH Markov Chain



Markov Chain and TCP - Training phase

Calculate the transitionprobabilities

aij = P[qt+1 = j |qt = i] =P[qt = i ,qt+1 = j]

P[qt = i]

Client side3-way handshakeack flagclosing

FTP Markov Chain



Markov Chain and TCP - Training phaseCalculate the transition probabilities

aij = P[qt+1 = j |qt = i] =

P[qt = i , qt+1 = j]P[qt = i]

SSH Markov Chain

3-WayHandshake

Syn FloodAttack



Markov Chain and TCP - Detection phase

Given the observation (SR+1,SR+2, · · · ,SR+T )

The system has to decide between two hypothesis

H0 : normal behaviourH1 : anomaly

A possible statistic is given by the logarithm of theLikelihood Function

LogLF (t) =T+R∑

t=R+1

Log(aSt St+1)

or by its temporal “derivative”

Dw (t) =∣∣∣∣LogLF (t)− 1

W

W∑i=1

LogLF (t − i)∣∣∣∣



Markov Chain and TCP - Detection phase



Non-Homogeneous Markov Chain

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False Alarm Rate

Det

ectio

n R

ate

Non Stationary ECDFNon Homogeneous MCHomogeneous MC



High Order Markov Chain



Homogeneous Markov chain — Number of packets

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False Alarm Rate

Dete

ction R

ate

5 packets

10 packets

15 packets

20 packets


Anomaly Detection: Examples PCA

Principal Component Analysis

Principal Component Analysis (or discreteKarhunen–Loeve transform) is the most commonly usedtechniques to analyze high dimensional data structuresPCA is an orthogonal linear transformation that maps themeasured data onto a new set of axes

These axes are called Principal ComponentsEach principal component points in the direction ofmaximum variation (or energy) remaining in the data, giventhe energy already accounted for in the precedingcomponentsThe principal axes are ordered by the amount of energy inthe data they capture



Geometric illustration



Linear algebraic formulation

For a matrix X = {xi j}I,J (i.e., raws of X are points in RJ ), calculatingthe principal components is equivalent to solving the symmetriceigenvalue problem for the matrix X T X

Each principal component vi is the i-th eigenvector computed from thespectral decomposition of X T X :

X T X vi = λi vi i = 1, . . . J

where λi is the eigenvalue corresponding to vi

k -th principal component

v1 = argmax‖v‖=1

‖X v‖

vk = argmax‖v‖=1

∥∥∥∥∥(

X −k−1∑i=1

X vi vTi

)v

∥∥∥∥∥where || · || denotes the L2 norm



Subspace Method

Subspace MethodThis method is based on a separation of the high-dimensionalspace into disjoint subspaces corresponding to normal andanomalous behavior

This separation can be performed effectively by PCA,mapping the dataset onto the new axes

Normal subspace: SAnomalous Subspace: S

x ∈ RJ can be written as

x = x + x = P PT x +(

I − P PT)

x

where P ={

pi j}

J,R = (v1 v2 . . . vR)



Example of Scree Plot

5 10 15 20 25 300

5

10

15

20

25

30

35

40

PC

% c

aptu

red

varia

nce


Conclusions


Documents

Statistical intrusion detection in modern networksconf.nsc.ru/files/conferences/opcs2017/415961/LectionPagano.pdf · Intrusion Detection Expert System (Denning, An intrusion Detection