Anomaly detection in VoIP and Ethernet traffic under presence of
daily patterns
Piotr Żuraniewski (UvA/TNO/AGH)
Felipe Mata (UAM), Michel Mandjes (UvA), Marco Mellia (POLITO)
Changepoint detection
• Changepoint detection: establishing that the current statistical description of a data sample is no longer valid
• The problem can be formulated in the language of statistical hypothesis testing
Benefits of changepoint detection
• Deviation from normal system state can be detected (anomaly detection)
  – attack on ICT infrastructure (excessive number of TCP SYN packets)
  – failure (excessive/too low traffic volume)
  – Service Level Agreement not met (delay out of acceptable range)
• Human experts empowered with additional tool
Benefits of statistics-based approach
• Manual and on-line analysis of large data volumes may be infeasible
• Visual inspection may be insufficient due to some hidden structures in data
• An objective and unbiased human opinion is not always available
• Possibility to control false alarm ratio/detection ratio
Problems
• Changepoint detection procedures often assume independent observations
• Real life: dependency is present
  – stochastic (mind ‘fractal’ models)
  – deterministic (e.g., diurnal trends)
• High dependency may ruin changepoint detection test
Possible solution
• Estimate and remove trend from traffic
  – for VoIP traffic: try to exploit possible local Poissonian behavior
  – exploit periodicity
• Only then apply changepoint detection procedure(s) to residuals
  – residuals should be (approx.) standard normal
  – anomaly: change from N(0,1) to N(m,s)
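The two steps above can be sketched in a few lines. This is a minimal illustration, not the authors' estimator: it assumes call counts sampled at a fixed number of slots per period, estimates the periodic pattern as the per-slot average, and standardises with the Poisson property that the variance equals the mean. The names `periodic_pattern` and `poisson_residuals` are hypothetical.

```python
import math

def periodic_pattern(counts, period):
    """Average all observations that share the same slot within the period."""
    sums = [0.0] * period
    hits = [0] * period
    for i, x in enumerate(counts):
        sums[i % period] += x
        hits[i % period] += 1
    return [s / h if h else 0.0 for s, h in zip(sums, hits)]

def poisson_residuals(counts, pattern):
    """Subtract the pattern and divide by sqrt(mean), using variance == mean
    for a (locally) Poisson count. Slots with zero mean yield residual 0."""
    period = len(pattern)
    residuals = []
    for i, x in enumerate(counts):
        m = pattern[i % period]
        residuals.append((x - m) / math.sqrt(m) if m > 0 else 0.0)
    return residuals

# Toy trace with two slots per period (assumed data, for illustration only).
counts = [90, 410, 110, 390]
pattern = periodic_pattern(counts, period=2)
residuals = poisson_residuals(counts, pattern)
```

If the Poisson assumption holds and the mean is large, these residuals are approximately N(0,1), which is the input the changepoint test expects.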
Traffic, trend, residuals (no nights)
Contribution
• We have developed changepoint detection test able to detect simultaneous change in mean and variance for Gaussian input
• We have numerically assessed sensitivity to deviation from independence assumption
  – our simple trend removal method may still leave some dependency in residuals
Synthetic Gaussian trace
• Windows of 50 observations presented to the detector in a sequential manner; delta: relative position of changepoint within the window
• True change from N(0,1) to N(3.07,1.082) from window 152 on (Erlang: it would give 0.1% blocking prob.)
• 500 experiments, good performance
[Figure, upper panel: true delta and the 2.5%, 25%, 50%, 75% and 97.5% quantiles of detected deltas; x-axis: window number]
[Figure, lower panel: detection ratio; x-axis: window number]
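To make the windowed experiment concrete, here is a generic generalized likelihood ratio (GLR) scan for a change from N(0,1) to a Gaussian with unknown mean and variance. It is a textbook-style sketch, not the authors' test: the statistic, the `min_tail` guard, and the function name `glr_scan` are assumptions, and no threshold is given because the slides do not state one.

```python
import math
import random

def glr_scan(window, min_tail=5):
    """Scan candidate changepoints k and return (max log-likelihood ratio, delta).

    For each k the tail window[k:] is fitted with its own mean and variance
    (maximum likelihood) and compared with N(0,1). delta = k / len(window)
    is the relative position of the best candidate.
    """
    n = len(window)
    best_stat, best_k = float("-inf"), 0
    for k in range(0, n - min_tail + 1):
        tail = window[k:]
        size = len(tail)
        mean = sum(tail) / size
        var = max(sum((x - mean) ** 2 for x in tail) / size, 1e-12)
        # log-likelihood under N(mean, var) minus log-likelihood under N(0,1)
        stat = 0.5 * sum(x * x for x in tail) - 0.5 * size * math.log(var) - 0.5 * size
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k / n

# Example: a window of 50 observations with a change in the middle.
rng = random.Random(0)
window = [rng.gauss(0.0, 1.0) for _ in range(25)] + \
         [rng.gauss(3.07, 1.08) for _ in range(25)]
stat, delta = glr_scan(window)
```

An alarm would be raised when `stat` exceeds a threshold chosen for the desired false alarm ratio. How to pick that threshold is exactly the part a proper test has to supply.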
Dependent input
• What if input to detection procedure is correlated?
• Verification with generated AR(1) traces
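• Recall: {X_i} is an AR(1) process if it follows
  $X_i - \mu = \phi\,(X_{i-1} - \mu) + \varepsilon_i$, where $\mu$ is the mean and $\varepsilon_i$ is white noise
• The AR(1) autocorrelation function (a measure of linear dependency) is:
  $\rho_k = \phi^k, \quad k = 0, 1, \ldots$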
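A short sketch of how such test traces can be generated and checked. It assumes the noise variance is scaled to 1 - phi^2 so that the marginal distribution stays N(mu, 1); the slides do not say which scaling was used. The names `ar1_trace` and `sample_autocorr` are hypothetical.

```python
import random

def ar1_trace(n, phi, mu=0.0, seed=None):
    """Generate n observations of a stationary Gaussian AR(1) process.

    Assumption: noise standard deviation sqrt(1 - phi^2), so that every
    X_i is marginally N(mu, 1), matching the N(0,1) null hypothesis.
    """
    rng = random.Random(seed)
    noise_sd = (1.0 - phi * phi) ** 0.5
    x = mu + rng.gauss(0.0, 1.0)  # start in the stationary distribution
    trace = []
    for _ in range(n):
        trace.append(x)
        x = mu + phi * (x - mu) + noise_sd * rng.gauss(0.0, 1.0)
    return trace

def sample_autocorr(xs, lag):
    """Empirical autocorrelation at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

trace = ar1_trace(20000, phi=0.6, seed=7)
lag1 = sample_autocorr(trace, 1)  # should be close to phi
lag2 = sample_autocorr(trace, 2)  # should be close to phi ** 2
```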
Correlated input – results
phi    mean false alarm ratio    detection ratio for window no. 152    false alarm ratio (regen.)
0      5.7%                      76.6%                                 5.7%
0.2    10.1%                     77.9%                                 5.3%
0.4    17.7%                     80.8%                                 10.4%
0.6    27.2%                     85.9%                                 17.9%
0.8    36.8%                     90.3%                                 24.0%
[Figure: detection ratio; x-axis: window number]
• Correlation results in performance degradation
• Due to dependency, the false alarm (FA) ratio in window k influences the FA prob. in window k+1
• To assess this effect, FA is also calculated for a fully regenerated sample
Real data example
[Figure: data, pattern, detected anomalies (week 5); x-axis: time, y-axis: calls; legend: calls, pattern]
Ethernet traffic
• Poissonian assumption may be problematic
• Mean and variance to be estimated
• Less regularity
• Periodic moving average and simple moving average?
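The two candidate estimators named in the last bullet can be sketched as follows. This is one plausible reading, stated as an assumption: the simple moving average looks at the most recent `width` samples, while the periodic moving average looks at the same slot in each of the previous `cycles` periods. Function and parameter names are hypothetical.

```python
def simple_moving_average(xs, width):
    """Mean of the `width` observations preceding index i.
    Returns None where there is not enough history yet."""
    out = []
    for i in range(len(xs)):
        out.append(sum(xs[i - width:i]) / width if i >= width else None)
    return out

def periodic_moving_average(xs, period, cycles):
    """Mean of the observations at the same slot in the previous `cycles`
    periods (e.g. same time of day on earlier days).
    Returns None where there is not enough history yet."""
    out = []
    for i in range(len(xs)):
        if i >= period * cycles:
            past = [xs[i - j * period] for j in range(1, cycles + 1)]
            out.append(sum(past) / cycles)
        else:
            out.append(None)
    return out

xs = [1, 2, 3, 4, 5, 6]
sma = simple_moving_average(xs, width=2)
pma = periodic_moving_average(xs, period=2, cycles=2)
```

The periodic version follows a diurnal shape but reacts slowly to level shifts. The simple version tracks the level but lags behind a periodic shape, which is presumably why both are considered.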
Ethernet traffic (NREN)
• Some traces show some regular patterns
[Figure: NREN Ethernet trace; x-axis: time (UNIX stamp, ×10^9), y-axis: Bps (10 min. avg, ×10^7)]
Trends
[Figure: original trace, estimated pattern, estimated periodic pattern, estimated MA pattern; x-axis: time, y-axis: Bps (10 min. avg, ×10^6)]
Trends
[Figure: original trace, estimated pattern, estimated periodic pattern, estimated MA pattern, later time range; x-axis: time, y-axis: Bps (10 min. avg, ×10^6)]
Residuals
[Figure: residuals = trace - pattern; x-axis: time, y-axis: Bps (10 min. avg, ×10^7)]
Busy hour
• The same model for day and night, working day and weekend may not be optimal in all cases
• Now we focus on busy hour (8-15), no weekends
[Figure: original trace with busy hour (8-15, no weekends) highlighted; x-axis: time (×10^9), y-axis: Bps (10 min. avg, ×10^7)]
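Selecting the busy-hour samples from a trace indexed by UNIX timestamps might look like the sketch below. Two assumptions are made because the slides do not specify them: timestamps are interpreted in UTC, and "8-15" is read as 08:00 inclusive to 15:00 exclusive. The function names are hypothetical.

```python
from datetime import datetime, timezone

def in_busy_hour(ts, start_hour=8, end_hour=15, tz=timezone.utc):
    """True if the UNIX timestamp falls on Monday-Friday within
    [start_hour, end_hour) in the given time zone (UTC assumed)."""
    t = datetime.fromtimestamp(ts, tz=tz)
    return t.weekday() < 5 and start_hour <= t.hour < end_hour

def busy_hour_samples(samples, **kwargs):
    """Keep only (timestamp, value) pairs that fall in the busy hour."""
    return [(ts, v) for ts, v in samples if in_busy_hour(ts, **kwargs)]

# Example with two made-up samples: a Monday morning and a Saturday morning.
monday = datetime(2010, 7, 26, 9, 0, tzinfo=timezone.utc).timestamp()
saturday = datetime(2010, 7, 24, 10, 0, tzinfo=timezone.utc).timestamp()
kept = busy_hour_samples([(monday, 2.1e7), (saturday, 0.4e7)])
```

For a real trace the `tz` argument should be set to the network's local time zone, since the diurnal pattern follows local working hours.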
Residuals
[Figure: busy-hour residuals; x-axis: time, y-axis: Bps (10 min. avg, ×10^6)]
Residuals – 1st part
[Figure: QQ plot of sample data versus standard normal; x-axis: standard normal quantiles, y-axis: quantiles of input sample (×10^6)]
Residuals – 2nd part
[Figure: QQ plot of sample data versus standard normal; x-axis: standard normal quantiles, y-axis: quantiles of input sample (×10^7)]
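The coordinates behind a QQ plot like the ones above can be computed with the standard library alone. The sketch assumes the plotting positions (i + 0.5)/n, which is one common convention; plotting tools differ in this detail. The name `qq_points` is hypothetical.

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair each ordered observation with the matching standard normal
    quantile, using plotting positions (i + 0.5) / n (an assumed convention)."""
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(sample)
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

points = qq_points([3.0, 1.0, 2.0])
```

If the residuals are roughly Gaussian the points lie close to a straight line. Curvature in the tails, as heavy-tailed traffic residuals often show, indicates that the normality assumption behind the test is only approximate.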
Summary
• We have extended an anomaly-detection method developed for stationary VoIP traffic
• Diurnal trends taken into consideration
• Statistical framework as a basis but…
• …practitioner’s perspective – simplifications – also considered
• Other types of traffic – more challenges