Upload
others
View
32
Download
0
Embed Size (px)
Citation preview
Statistical intrusion detection in modernnetworks
Michele Pagano
Department of Information EngineeringUniversity of Pisa
XIII International Asian School-seminarProblems of complex systems’ optimization
September 18-22, 2017Akademgorodok
Outline
1 Theoretical background
2 How to evaluate an IDS
3 Anomaly Detection: Overview
4 Anomaly Detection: Examples
M. Pagano Statistical intrusion detection in modern networks 2 / 50
Theoretical background
Outline
1 Theoretical backgroundOverviewTaxonomy of the Intrusion Detection SystemsExamples of the Intrusion Detection Systems
2 How to evaluate an IDS
3 Anomaly Detection: Overview
4 Anomaly Detection: Examples
M. Pagano Statistical intrusion detection in modern networks 3 / 50
Theoretical background Overview
Why an intrusion detection system?
Network security mainly means PREVENTIONPhysical protection for hardwarePasswords, access tokens, etc. for authenticationAccess control list for authorizationCryptography for secrecyBackups and redundancy for authenticity. . . and so on
BUT . . .. . . Absolute security cannot be guaranteed!
M. Pagano Statistical intrusion detection in modern networks 4 / 50
Theoretical background Overview
What is an Intrusion Detection System?
Prevention is suitable whenInternal users are trustedLimited interaction with other networks
Need for a system which acts when prevention fails
Intrusion Detection SystemAn intrusion detection system or IDS is a software/hardwaretool used to detect unauthorized access to a computer systemor a network
M. Pagano Statistical intrusion detection in modern networks 5 / 50
Theoretical background Overview
A bit of History
The history of IDSs can be split in three main blocks1 First Generation IDSs (end of the 1970s)
The concept of IDS first appears in the 1970s and early1980s (Anderson, Computer Security Monitoring andSurveillance, Tech Rep 1980)Focus on audit data of a single machinePost processing of data
2 Second Generation IDSs (1987)Intrusion Detection Expert System (Denning, An intrusionDetection Model, IEEE Trans. on Soft. Eng., 1987)Statistical analysis of data
3 Third Generation IDSs (to come)Focus on the networkReal-time detectionReal-time reactionIntrusion Prevention System
M. Pagano Statistical intrusion detection in modern networks 6 / 50
Theoretical background Overview
A taxonomy of the intruders
Intruders can be classified asMasquerader: an individual who is not authorized to usethe computer and who penetrates a system’s accesscontrol to exploit a legitimate user’s accountMisfeasor: a legitimate user who accesses data,programs, or resources for which such access is notauthorized, or who is authorized for such access, butmisuses his/her privilegesClandestine User: an individual who seizes supervisorycontrol of the system and uses the control to evadeauditing and access controls or to suppress audit collection
Anderson, Computer Security Monitoring and Surveillance, Tech Rep 1980
M. Pagano Statistical intrusion detection in modern networks 7 / 50
Theoretical background Overview
A taxonomy of the intrusions
Eavesdropping and Packet Sniffing: passive interception ofnetwork traffic
Snooping and Downloading
Tampering and Data Diddling: unauthorized changes to dataor records
Spoofing: impersonating other users
Jamming or Flooding: overwhelming a system’s resources
Injecting Malicious Code
Probing
Exploiting Design or Implementation Flaws: as bufferoverflow
Cracking Passwords and Keys
Denning, Cyberspace Attacks and Countermeasures, New York,1997
M. Pagano Statistical intrusion detection in modern networks 8 / 50
Theoretical background IDS Taxonomy
Host based vs. Network based
Host based IDSAimed at detecting attacks related to a specific hostArchitecture/Operating system dependentProcessing of high level information (e.g. system calls)Effective in detecting insider misuse
Network based IDSAimed at detecting attacks towards hosts connected to aLANArchitecture/Operating system independentProcessing data at lower level of granularity (packets)Effective in detecting attacks from the “outside”
M. Pagano Statistical intrusion detection in modern networks 9 / 50
Theoretical background IDS Taxonomy
Misuse based IDS vs. Anomaly based IDS
Misuse based (Signature based, Rule based) IDSIdentifies intrusion by looking for patterns of traffic or ofapplication data presumed to be maliciousPattern of misuses are stored in a databaseEffective in detecting only “known” attacksEncrypted traffic ???
Anomaly based IDSIdentifies intrusions by classifying activity as eitheranomalous or normalNeed a training phase to recognize normal activityAble to detect “new” attacksGenerates more false alarms than a misuse based IDS
M. Pagano Statistical intrusion detection in modern networks 10 / 50
Theoretical background IDS Taxonomy
Stateless IDS VS Stateful IDS
Stateless IDSTreats each event independently of othersSimple system designHigh processing speed
Stateful IDSMaintains information about past eventsThe effect of a certain event depends on its position in theevents streamMore complex system designMore effective in detecting distributed attacks
M. Pagano Statistical intrusion detection in modern networks 11 / 50
Theoretical background IDS Taxonomy
Centralized IDS VS Distributed IDS
Centralized IDSAll the operations are performed by the same machineMore simple to realizeOnly one point of failure
Distributed IDSComposed of several components
Sensors which generate security eventsConsole to monitor events and alerts and control the sensorsCentral Engine that records events and generate alarms
May need to deal with different data formatsNeed of a secure communication protocol (IPFIX)
M. Pagano Statistical intrusion detection in modern networks 12 / 50
Theoretical background IDS Examples
SNORT
SnortThe most famous IDSOpen source software toolNetwork basedSignature basedCentralized architecture
Spade, the anomaly detection plug-in for Snort... is notsupported any longer
The rules database, as well as the system code, is available fordownload at the web site http://www.snort.org
M. Pagano Statistical intrusion detection in modern networks 13 / 50
Theoretical background IDS Examples
Attacks State of the Art
Howard Lipson, CMU Software Engineering Institute CERT R©
(Computer Emergency Response Team – by DARPA in 1988)
M. Pagano Statistical intrusion detection in modern networks 14 / 50
Theoretical background IDS Examples
IDS State of the Art
Focus is on Network based IDSs (The only ones effectivein detecting Distributed Denial of Service - DDoS)State of the art IDSs are Misuse Based
Most attacks are realized by means of software toolsavailable on the InternetMost attacks are “well-known” attacks
BUT . . .. . . The most dangerous attacks are those written
ad hoc by the intruder!
M. Pagano Statistical intrusion detection in modern networks 15 / 50
Theoretical background IDS Examples
The best choice?
Combined use of bothHIDS (for insider attacks) & NIDS (for outsider attacks)Misuse IDS (low False Alarm rate) & Anomaly IDS (for“new” attacks)Stateless IDS (fast data process) & Stateful IDS (for“complex” attacks)
Distributed IDSNot a single point of failureMore effective in monitoring large networks
M. Pagano Statistical intrusion detection in modern networks 16 / 50
Theoretical background IDS Examples
The best choice?
M. Pagano Statistical intrusion detection in modern networks 17 / 50
Evaluation
Outline
1 Theoretical background
2 How to evaluate an IDSReceiver Operating Characteristics (ROC) CurveEvaluation Datasets
3 Anomaly Detection: Overview
4 Anomaly Detection: Examples
M. Pagano Statistical intrusion detection in modern networks 18 / 50
Evaluation ROC
Basic definitions
A = alarm−A = not an alarmI = attack (intrusion)−I = not an attackFalse Positive (FP): the error of rejecting a null hypothesiswhen it is actually true. In our case it implies the creation ofan alarm in correspondence of normal activities
P(A| − I) = False positive (alarm) probability
False Negative (FN): the error of failing to reject a nullhypothesis when it is in fact not true. In our case itcorresponds to a missed detection
P(−A|I) = False negative probability
M. Pagano Statistical intrusion detection in modern networks 19 / 50
Evaluation ROC
ROC (Receiver Operating Characteristics) Curve
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
False Alarm Rate
De
tectio
n R
ate
Non Stationary ECDF
Non Homogeneous MC
Homogeneous MC
P(A
|I)
P(A|−I)=P(A|I)
P(A|−I)
M. Pagano Statistical intrusion detection in modern networks 20 / 50
Evaluation ROC
Base Rate Fallacy
Let’s suppose:P(A|I) = 0.99P(−A| − I) = 0.99we have 2 attacks over 106 pkts (base rate = 1/500000)
Applying the Bayes theorem:
P(I|A) = P(I) · P(A|I)P(A|I) · P(I) + P(A| − I) · P(−I)
=
=1/500000 · 0.99
1/500000 · 0.99 + (1− 1/500000) · 0.01
Thus P(I|A) = 0.0002
M. Pagano Statistical intrusion detection in modern networks 21 / 50
Evaluation Evaluation Datasets
DARPA Evaluation Program
The 1998/1999 DARPA/MIT IDS evaluation program is themost comprehensive evaluation performed to dateIt provides a corpus of data for the development,improvement, and evaluation of IDSsDifferent kind of data are available:
Operating systems logsNetwork traffic
Collected by an “inside” snifferCollected by an “outside” sniffer
The data model the network traffic measured between aUS Air Force base and the Internet
M. Pagano Statistical intrusion detection in modern networks 22 / 50
Evaluation Evaluation Datasets
The DARPA Network
M. Pagano Statistical intrusion detection in modern networks 23 / 50
Evaluation Evaluation Datasets
The DARPA Dataset
5 weeks dataData from weeks 1 and 3 are attack free and can be usedto train the systemData from week 2 contains labeled attacks and can be usedto realize the signatures databaseData from weeks 4 and 5 contains several attacks and canbe used for the detection phase
An Attack Truth list is providedAttacks are categorized as
Denial of Service (DoS)User to Root (U2R)Remote to Local (R2L)DataProbe
177 instances of 59 different types of attacks
M. Pagano Statistical intrusion detection in modern networks 24 / 50
Evaluation Evaluation Datasets
KDD99
Database of features extracted from the 1998 DARPADatasetUsed for The third International Knowledge Discovery andData Mining Tools CompetitionUseful for evaluating IDSsSome example features
@IP source and destFlagsNumber of bytes sent/received by a hostDuration of flowsProtocol. . . and so on
Can be downloaded athttp://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
M. Pagano Statistical intrusion detection in modern networks 25 / 50
Evaluation Evaluation Datasets
Other Datasets
The DARPA dataset has many drawbacks:simulated environmentnot up-to-date trafficthe methodology used for generating the traffic has beenshown to be inappropriate for simulating actual networks
Other Datasets:several publicly available traffic tracese.g. CAIDA, Abilene (Internet2), GEANT, . . .no ground truth is provided!
M. Pagano Statistical intrusion detection in modern networks 26 / 50
Evaluation Evaluation Datasets
MAWI-Lab Traffic Traces
MAWI (Measurement and Analysis on the WIDE Internet)archive (sample-points B and F)Anomaly labelling is obtained combining the output of fouranomaly detectors
Hough transformGamma distribution modellingKullback-Leibler divergencePrincipal Component Analysis
MAWILab : Combining Diverse Anomaly Detectors forAutomated Anomaly Labeling and Performance Benchmarking
R. Fontugne, P. Borgnat, P. Abry, and K. Fukuda, ACMCoNEXT 2010
M. Pagano Statistical intrusion detection in modern networks 27 / 50
Evaluation Evaluation Datasets
MAWI-Lab Traffic Traces
Traffic taxonomyanomalous: anomalous with high probabilitysuspicious: probably anomalous, but not clearly identifiedby the MAWI classification methodsnotice: non anomalous, but reported by at least oneanomaly detectorbenign: normal
Some information about the kind of anomalyattack: anomalies representing a well known attackspecial: anomalies involving well known portsunknown: unknown kinds of anomalies
M. Pagano Statistical intrusion detection in modern networks 28 / 50
Anomaly Detection: Overview
Outline
1 Theoretical background
2 How to evaluate an IDS
3 Anomaly Detection: OverviewStatistical modelsTraffic Descriptors
4 Anomaly Detection: Examples
M. Pagano Statistical intrusion detection in modern networks 29 / 50
Anomaly Detection: Overview Statistical models
Profiles
An activity profile characterizes the behavior of a givensubject (or set of subjects) with respect to a given object,thereby serving as a signature or description of normalactivity for its respective subject and object.Observed behavior is characterized in terms of a statisticalmetric and modelA metric is a random variable x representing a quantitativemeasure accumulated over a periodObservations xi of x obtained from the audit records areused together with a statistical model to determine whethera new observation is abnormalThe statistical models make no assumptions about theunderlying distribution of x ; all knowledge about x isobtained from the observations xi
M. Pagano Statistical intrusion detection in modern networks 30 / 50
Anomaly Detection: Overview Statistical models
Metrics and Models
MetricsEvent counter
Interval timer
Resource measure
Statistical modelsOperational model : abnormality is decided by comparison of xn with a fixedthreshold
Mean and standard deviation model : abnormality is decided by checking if xnfalls inside the confidence interval
Multivariate model : based on the correlations between two or more metrics
Markov process model : based on the transition probabilities
Time series model : takes into account order and inter-arrival time of the
observations
M. Pagano Statistical intrusion detection in modern networks 31 / 50
Anomaly Detection: Overview Traffic Descriptors
Statistical Approach: Traffic Descriptors
To identify some traffic parameters, which can be used todescribe the network traffic and that vary significantly from thenormal behavior to the anomalous one
Some examplesPacket lengthInter-arrival timeFlow sizeNumber of packets per flow. . . and so on
M. Pagano Statistical intrusion detection in modern networks 32 / 50
Anomaly Detection: Overview Traffic Descriptors
Choice of the Traffic Descriptors
For each parameter we can considerMean ValueVariance and higher order momentsDistribution functionQuantiles. . . and so on
The number of potential traffic descriptors is huge (somepapers identify up to 200 descriptors)
GOALTo identify as few descriptors as possible to classify traffic with
an acceptable error rate
M. Pagano Statistical intrusion detection in modern networks 33 / 50
Anomaly Detection: Examples
Outline
1 Theoretical background
2 How to evaluate an IDS
3 Anomaly Detection: Overview
4 Anomaly Detection: ExamplesUse of Markov ChainsPrincipal Component Analysis
M. Pagano Statistical intrusion detection in modern networks 34 / 50
Anomaly Detection: Examples Use of Markov Chains
State Transition Analysis
The approach was first proposed by Denning anddeveloped in the 1990s.Mainly used in two distinct environment
HIDS: to model the sequence of system commands usedby a userNIDS: to model the sequence of some specific fields of thepacket (e.g. the sequence of the flags values in a TCPconnection)
The most classical approach: Markov chains
M. Pagano Statistical intrusion detection in modern networks 35 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chains and TCP
Idea: Model TCP connections by means of Markov chainsThe IP addresses and the TCP port numbers are used toidentify a connectionState space is defined by the possible values of the TCPflagsThe value of the flags is used to identify the chaintransitionsA value Sp is associated to each packet according to therule
Sp = syn + 2 · ack + 4 · psh + 8 · rst + 16 · urg + 32 · fin
M. Pagano Statistical intrusion detection in modern networks 36 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chain and TCP - Training phase
Calculate the transitionprobabilities
aij = P[qt+1 = j |qt = i] =P[qt = i ,qt+1 = j]
P[qt = i]
Server side3-way handshakepsh flagclosing
SSH Markov Chain
M. Pagano Statistical intrusion detection in modern networks 37 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chain and TCP - Training phase
Calculate the transitionprobabilities
aij = P[qt+1 = j |qt = i] =P[qt = i ,qt+1 = j]
P[qt = i]
Client side3-way handshakeack flagclosing
FTP Markov Chain
M. Pagano Statistical intrusion detection in modern networks 38 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chain and TCP - Training phaseCalculate the transition probabilities
aij = P[qt+1 = j |qt = i] =
P[qt = i , qt+1 = j]P[qt = i]
SSH Markov Chain
3-WayHandshake
Syn FloodAttack
M. Pagano Statistical intrusion detection in modern networks 39 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chain and TCP - Detection phase
Given the observation (SR+1,SR+2, · · · ,SR+T )
The system has to decide between two hypothesis
H0 : normal behaviourH1 : anomaly
A possible statistic is given by the logarithm of theLikelihood Function
LogLF (t) =T+R∑
t=R+1
Log(aSt St+1)
or by its temporal “derivative”
Dw (t) =∣∣∣∣LogLF (t)− 1
W
W∑i=1
LogLF (t − i)∣∣∣∣
M. Pagano Statistical intrusion detection in modern networks 40 / 50
Anomaly Detection: Examples Use of Markov Chains
Markov Chain and TCP - Detection phase
M. Pagano Statistical intrusion detection in modern networks 41 / 50
Anomaly Detection: Examples Use of Markov Chains
Non-Homogeneous Markov Chain
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
False Alarm Rate
Det
ectio
n R
ate
Non Stationary ECDFNon Homogeneous MCHomogeneous MC
M. Pagano Statistical intrusion detection in modern networks 42 / 50
Anomaly Detection: Examples Use of Markov Chains
High Order Markov Chain
M. Pagano Statistical intrusion detection in modern networks 43 / 50
Anomaly Detection: Examples Use of Markov Chains
Homogeneous Markov chain — Number of packets
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
False Alarm Rate
Dete
ction R
ate
5 packets
10 packets
15 packets
20 packets
M. Pagano Statistical intrusion detection in modern networks 44 / 50
Anomaly Detection: Examples PCA
Principal Component Analysis
Principal Component Analysis (or discreteKarhunen–Loeve transform) is the most commonly usedtechniques to analyze high dimensional data structuresPCA is an orthogonal linear transformation that maps themeasured data onto a new set of axes
These axes are called Principal ComponentsEach principal component points in the direction ofmaximum variation (or energy) remaining in the data, giventhe energy already accounted for in the precedingcomponentsThe principal axes are ordered by the amount of energy inthe data they capture
M. Pagano Statistical intrusion detection in modern networks 45 / 50
Anomaly Detection: Examples PCA
Geometric illustration
M. Pagano Statistical intrusion detection in modern networks 46 / 50
Anomaly Detection: Examples PCA
Linear algebraic formulation
For a matrix X = {xi j}I,J (i.e., raws of X are points in RJ ), calculatingthe principal components is equivalent to solving the symmetriceigenvalue problem for the matrix X T X
Each principal component vi is the i-th eigenvector computed from thespectral decomposition of X T X :
X T X vi = λi vi i = 1, . . . J
where λi is the eigenvalue corresponding to vi
k -th principal component
v1 = argmax‖v‖=1
‖X v‖
vk = argmax‖v‖=1
∥∥∥∥∥(
X −k−1∑i=1
X vi vTi
)v
∥∥∥∥∥where || · || denotes the L2 norm
M. Pagano Statistical intrusion detection in modern networks 47 / 50
Anomaly Detection: Examples PCA
Subspace Method
Subspace MethodThis method is based on a separation of the high-dimensionalspace into disjoint subspaces corresponding to normal andanomalous behavior
This separation can be performed effectively by PCA,mapping the dataset onto the new axes
Normal subspace: SAnomalous Subspace: S
x ∈ RJ can be written as
x = x + x = P PT x +(
I − P PT)
x
where P ={
pi j}
J,R = (v1 v2 . . . vR)
M. Pagano Statistical intrusion detection in modern networks 48 / 50
Anomaly Detection: Examples PCA
Example of Scree Plot
5 10 15 20 25 300
5
10
15
20
25
30
35
40
PC
% c
aptu
red
varia
nce
M. Pagano Statistical intrusion detection in modern networks 49 / 50
Conclusions
M. Pagano Statistical intrusion detection in modern networks 50 / 50