Network Tomography and Anomaly Detection
Mark Coates
Tarem Ahmed
Network map from www.opte.org
Brain mapping (opening it up can disturb the system)
Internet mapping (opening it up can disturb the system)
too complex to measure everywhere, all the time
traffic measurements expensive (hardware, bandwidth)
[Figure: timeline 1969, 1993, 2005; Internet Boom]
[Figure: Brain Tomography. Unknown object → measurements (physics: counting & projection; data: Poisson) → statistical model with prior knowledge (MRF model) → maximize likelihood → maximum likelihood estimate.]
[Figure: Link-level Network Tomography. Unknown object: topology / connectivity, link-level loss probability and delay distribution. Measurements: end-to-end measurements. Prior knowledge: queuing behavior. Maximize likelihood → maximum likelihood estimate.]
Solely from edge-based traffic measurements, infer internal network characteristics (topology, link-level loss and delay).
Link-level Network Tomography
Challenges:
• 12% never respond, 15% have multiple interfaces (Barford et al., 2000)
• detect layer-2 topology “invisible” to the IP layer (e.g., switches)
Application: Topology Discovery
Application: Overlay Voice-over-IP
Multiple paths to choose from: select paths with minimal delay or delay variance
Send a small number of critical packets (vocal transitions) along multiple paths
Use these packets to estimate the path delays (and the extent of path diversity)
[Figure: access network, autonomous system(s), service gateway, overlay link]
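A minimal sketch of the path-selection step, assuming a handful of delay measurements per candidate path are already available (the path names and numbers are made up for illustration). It ranks paths by either the sample mean or the sample variance of their delays, the two criteria named on the slide.

```python
import statistics

def select_path(delay_samples, criterion="mean"):
    """Return the path whose measured delays have the smallest mean or variance.

    delay_samples maps a (hypothetical) path name to a list of delays in ms.
    """
    if criterion == "mean":
        score = statistics.fmean
    elif criterion == "variance":
        score = statistics.pvariance
    else:
        raise ValueError("criterion must be 'mean' or 'variance'")
    return min(delay_samples, key=lambda path: score(delay_samples[path]))


# Made-up delay samples for two candidate paths.
samples = {
    "direct": [40.0, 42.0, 60.0, 41.0],         # lower mean, high jitter
    "via_gateway_A": [50.0, 51.0, 49.0, 50.0],  # higher mean, low jitter
}
print(select_path(samples, "mean"))      # direct
print(select_path(samples, "variance"))  # via_gateway_A
```

For voice traffic the variance criterion is often the more relevant one, since jitter forces larger playout buffers.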
Network Monitoring
Challenges
• Restricted measurement
• High volumes and high rates of data (sampling of traffic on Gb/s routers)
• High-dimensional data (source/destination IP addresses, port numbers)
Goals
• Supply networking protocols with relevant performance information.
• Identify anomalous behaviour and operational transitions.
• Provide network administrators with appropriate notification or visualization.
Outline
Inference about network performance based on passive measurements or active probing
Two components to the talk:
• Network tomography
• Network anomaly detection
Focus on online, sequential approaches:
• Account for non-stationary behaviour
• Don’t repeat work that has already been done
$A$ = routing matrix (graph)
$\theta$ = packet loss probabilities or queuing delays for each link
$y$ = packet losses or delays measured at the edge
$\varepsilon$ = randomness inherent in traffic measurements
$y = A\theta + \varepsilon$
Statistical likelihood function: $l(\theta, A) = f(y \mid \theta, A)$
Network Tomography: Likelihood Formulation
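$y = A\theta + \varepsilon$
Solve the linear system.
Interesting if $A$, $\theta$, or $\varepsilon$ have special structures.
Maximize the likelihood function $l(\theta) = f(y \mid \theta, A)$ (routing matrix $A$ known)
or: $l(\theta, A) = f(y \mid \theta, A)$ (joint estimation when $A$ is also unknown)
Classical Problem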
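To make the classical problem concrete, here is a sketch for a hypothetical 3-link network probed along 4 paths. It assumes i.i.d. Gaussian $\varepsilon$ with equal variance, in which case maximizing $f(y \mid \theta, A)$ reduces to ordinary least squares. Other noise models (e.g., Poisson counts) would need a different likelihood.

```python
import numpy as np

# Hypothetical routing matrix: 4 probe paths (rows) over 3 links (columns).
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
theta_true = np.array([2.0, 5.0, 1.0])  # assumed per-link mean delays

rng = np.random.default_rng(0)
y_clean = A @ theta_true                          # y = A theta
y_noisy = y_clean + rng.normal(0.0, 0.1, size=4)  # y = A theta + eps

# With i.i.d. Gaussian eps, maximizing f(y | theta, A) is least squares.
theta_exact, *_ = np.linalg.lstsq(A, y_clean, rcond=None)
theta_hat, *_ = np.linalg.lstsq(A, y_noisy, rcond=None)
print(theta_exact)
print(theta_hat)
```

The system is only identifiable because $A$ has full column rank here. With a plain two-receiver tree (two paths, three links), unicast end-to-end delays alone would not pin down all three links.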
[Figure: sender and receivers]
Network Tomography: The Basic Idea
[Figure: a measurement packet pair, packet (1) and packet (2), traversing shared links with cross-traffic]
Packet (1) and packet (2) experience (nearly) identical losses and/or delays on shared links.
Packet-pair measurements
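One way packet pairs make a shared link identifiable is through second-order statistics. If both packets see the same delay on the shared link and independent delays on their branches, the covariance of their end-to-end delays equals the shared link's delay variance. The sketch below checks this on simulated data for a hypothetical two-receiver tree with exponential link delays. It illustrates the covariance idea and is not necessarily the estimator used in this talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 200_000

# Hypothetical two-receiver tree: link 0 is shared, links 1 and 2 lead to
# receivers 1 and 2. Link delays are drawn independently (assumption).
x0 = rng.exponential(scale=3.0, size=n_pairs)  # shared link, variance 9
x1 = rng.exponential(scale=1.0, size=n_pairs)  # branch to receiver 1
x2 = rng.exponential(scale=2.0, size=n_pairs)  # branch to receiver 2

# Packet (1) goes to receiver 1 and packet (2) to receiver 2; both see the
# same shared-link delay x0 (the packet-pair assumption).
y1 = x0 + x1
y2 = x0 + x2

shared_var_est = np.cov(y1, y2)[0, 1]  # estimates Var(x0) = 9
print(shared_var_est)
```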
Modelling time-variations
Nonstationary cross-traffic induces time-variation
Directly model the dynamics (but not the traffic!)
Goal is to perform online tracking and prediction of network link characteristics
Introduce time-dependence in parameters
$y_t = A\theta_t + \varepsilon_t$
Filtering exercise (track θt ):
(1) Describe the dynamic behaviour of $\theta_t$.
(2) Form the estimate (MMSE): $\hat{\theta}_t = E_{p(\theta_t \mid y_{1:t})}[\theta_t]$
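In the special case where $\theta_t$ follows a Gaussian random walk and $\varepsilon_t$ is Gaussian, the MMSE estimate $E[\theta_t \mid y_{1:t}]$ is computed exactly by a Kalman filter. The sketch below assumes that linear-Gaussian model (the noise levels and the 4-path routing matrix are hypothetical). Particle filtering is needed when these assumptions fail.

```python
import numpy as np

def kalman_track(ys, A, Q, R, theta0, P0):
    """Return E[theta_t | y_{1:t}] for each t under an assumed linear-Gaussian model:

        theta_t = theta_{t-1} + eta_t,  eta_t ~ N(0, Q)   (random-walk dynamics)
        y_t     = A theta_t + eps_t,    eps_t ~ N(0, R)   (edge measurements)
    """
    theta = np.array(theta0, dtype=float)
    P = np.array(P0, dtype=float)
    I = np.eye(len(theta))
    estimates = []
    for y in ys:
        P = P + Q                       # predict: mean unchanged, uncertainty grows
        S = A @ P @ A.T + R             # innovation covariance
        K = P @ A.T @ np.linalg.inv(S)  # Kalman gain
        theta = theta + K @ (y - A @ theta)
        P = (I - K @ A) @ P
        estimates.append(theta.copy())
    return np.array(estimates)

# Hypothetical 4-path, 3-link network with slowly drifting link delays.
rng = np.random.default_rng(2)
A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
T = 300
theta_path = np.array([2.0, 5.0, 1.0]) + np.cumsum(rng.normal(0.0, 0.05, size=(T, 3)), axis=0)
ys = theta_path @ A.T + rng.normal(0.0, 0.2, size=(T, 4))
est = kalman_track(ys, A, Q=0.05**2 * np.eye(3), R=0.2**2 * np.eye(4),
                   theta0=np.zeros(3), P0=10.0 * np.eye(3))
print(est[-1], theta_path[-1])
```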
[Figure: Non-stationary behaviour. Link-level quantities for links $i = 1, \ldots, L$ evolving from time $m$ to time $m+1$]
Particle Filtering
• Time-varying delay distribution $T_{m,R}(k)$ of window size $R$ at time $m$
• In each window, $R$ probe measurements.
• Form estimates of average delay and jitter over short time intervals.
[Figure: delay distribution (in delay units) over time]
Delay Distribution Tracking
• Queue/traffic model: reflected random walk on [0, max_del]
$\log \lambda_{j,m} = \log \lambda_{j,m-1} + \mathcal{N}(0, \sigma^2)$
$p_{j,m}(k) \propto \exp(-\lambda_{j,m} k)$
[Figure: probability vs. delay units]
Dynamic Model
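A sketch of the dynamic model above. It assumes delays are discretized to $k = 0, \ldots, \text{max\_del}$ and that the pmf is the normalized $\exp(-\lambda k)$. The function names and the choice $\sigma = 0.05$ are illustrative.

```python
import numpy as np

def delay_pmf(lam, max_del):
    """Truncated exponential pmf p(k) proportional to exp(-lam * k), k = 0..max_del."""
    k = np.arange(max_del + 1)
    weights = np.exp(-lam * k)
    return weights / weights.sum()

def step_rate(lam, sigma, rng):
    """One step of the random walk log lam_m = log lam_{m-1} + N(0, sigma^2)."""
    return lam * np.exp(rng.normal(0.0, sigma))

rng = np.random.default_rng(3)
lam = 0.5
for m in range(100):  # evolve the rate parameter over 100 steps
    lam = step_rate(lam, sigma=0.05, rng=rng)
pmf = delay_pmf(lam, max_del=20)
print(lam, pmf[:5])
```

Working on $\log \lambda$ keeps the rate positive without any explicit constraint.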
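• Measurements:
$x_{j,m} \sim p_{j,m}(k)$ (delay on link $j$ for probe pair $m$)
Observe the end-to-end delays of packet (1) and packet (2) of pair $m$:
$y^{(i)}(m) = \sum_{j \in \mathrm{Paths}_{s_m}^{(i)}} x_{j,m}, \quad i = 1, 2$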
Observations
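• Sequential Monte Carlo approximation to the posterior mean estimate:
$\hat{p}_{j,m}(k) = \sum_{i=1}^{N} w_m^{(i)}\, p\big(x_{j,m} = k \mid \lambda_m^{(i)}, y_m\big)$
with particle weights $w_m^{(i)}$ (conditional term computed via a message-passing algorithm)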
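The following bootstrap particle filter is a simplified sketch. It tracks $\lambda$ for a single link whose delays are observed directly, so the message-passing step over shared paths is not needed. Each particle carries a value of $\log \lambda$ that evolves by the random walk of the dynamic model. Weights come from the likelihood of the observed delay, and the output at each step is $\sum_i w_m^{(i)} p(k \mid \lambda_m^{(i)})$. The prior on $\log \lambda$, the value of $\sigma$, and multinomial resampling at every step are all assumptions.

```python
import numpy as np

def delay_pmfs(lams, max_del):
    """Row i is the pmf proportional to exp(-lams[i] * k), k = 0..max_del."""
    k = np.arange(max_del + 1)
    weights = np.exp(-np.outer(lams, k))
    return weights / weights.sum(axis=1, keepdims=True)

def bootstrap_pf(delays, max_del, n_particles=500, sigma=0.03, seed=0):
    """Track a single link's delay pmf from directly observed delays."""
    rng = np.random.default_rng(seed)
    log_lam = rng.normal(0.0, 1.0, size=n_particles)  # assumed prior on log lambda
    pmf_estimates = []
    for x in delays:
        # Propagate particles through the random-walk dynamics.
        log_lam = log_lam + rng.normal(0.0, sigma, size=n_particles)
        pmfs = delay_pmfs(np.exp(log_lam), max_del)  # shape (N, max_del + 1)
        # Weight each particle by the likelihood of the observed delay.
        w = pmfs[:, x]
        w = w / w.sum()
        # Posterior-mean estimate: sum_i w_i * p(k | lambda_i).
        pmf_estimates.append(w @ pmfs)
        # Multinomial resampling; weights are uniform again afterwards.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        log_lam = log_lam[idx]
    return np.array(pmf_estimates), np.exp(log_lam)

# Synthetic data: a link whose true rate is 0.5 (hypothetical).
max_del = 20
true_pmf = delay_pmfs(np.array([0.5]), max_del)[0]
data_rng = np.random.default_rng(42)
delays = data_rng.choice(max_del + 1, size=300, p=true_pmf)
pmf_hat, lam_particles = bootstrap_pf(delays, max_del)
print(lam_particles.mean(), pmf_hat[-1][:5])
```

Each step costs $O(NK)$ here. The multi-link packet-pair version is more expensive because the conditional for each link must be computed from path sums.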
• Estimate of time-varying delay distribution:
$\hat{T}_{m,R}(k) = \frac{1}{R} \sum_{l=m-R+1}^{m} \hat{p}(x_{j,l} = k \mid y_{1:l})$
Estimation of Delay Distributions
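The window average is a direct transcription of the formula for $\hat{T}_{m,R}(k)$. It assumes the per-step estimates for one link are stored as rows of an array (the function name is hypothetical).

```python
import numpy as np

def window_average(pmf_history, m, R):
    """T_hat_{m,R}(k) = (1/R) * sum_{l=m-R+1}^{m} p_hat(x_{j,l} = k | y_{1:l}).

    pmf_history[l] holds the per-step pmf estimate for one link j (0-based l).
    """
    if R < 1 or m - R + 1 < 0:
        raise ValueError("window must contain 1..m+1 past estimates")
    return pmf_history[m - R + 1 : m + 1].mean(axis=0)

history = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.5, 0.5]])
print(window_average(history, m=2, R=2))  # [0.25 0.75]
```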
• Complexity: $O(NLK^2)$ per measurement, where
  N = number of particles
  L = average number of unique links
  K = maximum delay units per link
• The convergence analysis of [Crisan, Doucet 01; Le Gland, Oudjane 02] applies.
• The approximation to the posterior mean estimate converges to the true estimate as $N \to \infty$.
Analysis
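For a sense of scale of the $O(NLK^2)$ cost, here is the arithmetic for purely hypothetical values (not taken from the simulations): $N = 1000$ particles, $L = 5$ unique links on average, and $K = 50$ delay units per link.

```latex
N L K^2 = 1000 \times 5 \times 50^2 = 5000 \times 2500 = 1.25 \times 10^{7}
```

That is on the order of $10^7$ basic operations per measurement, up to the constant hidden by the $O(\cdot)$ notation. The quadratic dependence on $K$ makes the delay discretization the first thing to coarsen if computation is tight.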
[Figure: mean delay and delay distributions over time, true vs. tracking]
Simulation Results – ns2
Comments
• Dynamic models allow us to account for non-stationarity, but realistic models are hard to derive and incorporate.
• Particle filtering is only appropriate when analytical techniques fail (non-Gaussian or non-linear dynamics or observations).
• The sequential structure allows on-line implementation.
• Care must be taken to reduce computation at each step.
Network Anomaly Detection
In tomography, a primary challenge is the restriction on available measurements.
In anomaly detection, a primary challenge is the abundance of measurements.
How can we process data at a sufficient rate?
How should we extract relevant information?
NetFlow Data
Records of flows.
A flow is defined by: (source IP, dest. IP, source port #, dest. port #)
Packets are sampled at configurable rates.
Exported at 1-minute or 5-minute intervals.
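A sketch of turning sampled packet headers into flow records keyed by the 4-tuple above. The packet dictionaries and the systematic 1-in-n sampling are assumptions for illustration. Real NetFlow records carry more fields (e.g., byte counts, timestamps, protocol).

```python
from collections import Counter
from typing import NamedTuple

class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int

def aggregate_flows(packets, sample_every=1):
    """Count sampled packets per (src IP, dst IP, src port, dst port) flow.

    Systematic 1-in-`sample_every` sampling is an assumption for illustration.
    """
    counts = Counter()
    for i, pkt in enumerate(packets):
        if i % sample_every != 0:
            continue  # packet not sampled
        key = FlowKey(pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])
        counts[key] += 1
    return counts

web = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234, "dst_port": 80}
dns = {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 5555, "dst_port": 53}
packets = [web, web, dns, web]
print(aggregate_flows(packets))
print(aggregate_flows(packets, sample_every=2))
```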
Dataset – Abilene Network
Abilene Weathermap – Indiana University
Thanks to Rick Summerhill and Mark Fullmer at Abilene for providing access to the data.
Principal Component Analysis (PCA)
Goal: Identify a low-dimensional subspace that captures the key components of the feature set
Idea: If (most of) a measurement does not lie in this subspace, then it is anomalous
PCA: conduct a linear transformation to choose a new coordinate system.
• Projection onto the first principal component has greater variance than any other projection (maximum energy).
• Subsequent principal components capture the greatest remaining energy.
PCA (2)
Reduce dimensionality by eliminating principal components that do not contribute significantly to variance in the dataset (small singular values).
Not optimized for class separability (unlike linear discriminant analysis).
Minimizes reconstruction error under the L2 norm.
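A sketch of PCA via the singular value decomposition. It keeps the smallest number of principal components whose squared singular values account for a chosen fraction of the total energy. The 0.99 energy target and the synthetic data are illustrative assumptions.

```python
import numpy as np

def pca_subspace(X, energy=0.99):
    """Return (mean, V) where rows of V are the leading principal directions.

    Keeps the fewest components whose squared singular values reach `energy`
    as a fraction of the total (the 0.99 default is an assumption).
    """
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(frac, energy)) + 1
    return mu, Vt[:k]

# Synthetic 10-dimensional data that really lives in a 2-D subspace.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = np.zeros((500, 10))
X[:, 0] = 3.0 * Z[:, 0]
X[:, 1] = 2.0 * Z[:, 1]
X += 1e-3 * rng.normal(size=X.shape)
mu, V = pca_subspace(X)
print(V.shape)  # (2, 10)

# Reconstruction from the retained components (minimum L2 error for rank k).
X_hat = mu + (X - mu) @ V.T @ V
```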
“Eigenflow” Analysis
Lakhina et al. (2004, 2004b).
PCA analysis of Origin-Destination (OD) Flows
Eigenflow: set of flows mapped onto a single principal component
Intrinsic Dimensionality: Empirical studies for Sprint and Abilene networks indicated that 5-10 principal components sufficed to capture most of the energy.
PCA-based Anomaly Detection
Perform PCA on block of OD flow measurements
Project each measurement onto primary principal components
Test whether the residual energy exceeds a threshold.
The squared prediction error (SPE, or Q-statistic) is used to test for unusual flow types.
Prone to Type-I errors (false positives) when applied to transient operations.
In these cases, the assumption that the source data are normally distributed is violated.
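A sketch of subspace-based detection. It fits a $k$-dimensional normal subspace, then scores each measurement by its squared prediction error (the energy left after projection). For simplicity the threshold is an empirical quantile of the training SPE values. This stands in for the Q-statistic threshold, which is derived under a Gaussian assumption.

```python
import numpy as np

def fit_normal_subspace(X, k):
    """Mean and top-k principal directions of training data X (rows = intervals)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def spe(x, mu, V):
    """Squared prediction error: energy of x left outside the normal subspace."""
    centered = x - mu
    residual = centered - V.T @ (V @ centered)
    return float(residual @ residual)

# Synthetic "normal" traffic: 4-D measurements that mostly vary in 2 directions.
rng = np.random.default_rng(4)
mixing = np.array([[3.0, 0.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0, 0.0]])
train = rng.normal(size=(1000, 2)) @ mixing + 0.01 * rng.normal(size=(1000, 4))
mu, V = fit_normal_subspace(train, k=2)

# Empirical 99.9% quantile of training SPE (stand-in for the Q-statistic limit).
threshold = np.quantile([spe(x, mu, V) for x in train], 0.999)

normal_point = np.array([1.0, -1.0, 0.0, 0.0])     # lies in the normal subspace
anomalous_point = np.array([1.0, -1.0, 5.0, 0.0])  # large off-subspace component
print(spe(normal_point, mu, V) < threshold, spe(anomalous_point, mu, V) > threshold)
```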
Online Method
Don’t need to relearn from scratch when new data arrive
Computational cost per time step should be bounded by constant independent of time
Block-based PCA is unattractive
Alternative method: Kernel Recursive Least Squares (KRLS)
KRLS
Represent the function as $\hat{f}(\mathbf{x}) = \sum_{i=1}^{t} \alpha_i k(\mathbf{x}_i, \mathbf{x})$,
where $\{\mathbf{x}_i\}$ are training points.
Desire a sparse solution (storage and time savings + generalization ability).
Effective dimensionality of the manifold spanned by the training feature vectors may be much smaller than the feature space dimension.
Identify linearly independent feature vectors that approximately span this manifold.
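The kernel expansion $\hat{f}(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{x}_i, \mathbf{x})$ can be evaluated as below. The Gaussian kernel and its width are assumptions (the application slides also use a linear kernel). Learning the $\alpha_i$ recursively is the job of KRLS itself and is not shown.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 width^2)); the width is an assumption."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * width ** 2)))

def kernel_expansion(x, centers, alphas, kernel=gaussian_kernel):
    """Evaluate f_hat(x) = sum_i alpha_i * k(x_i, x) over the stored points x_i."""
    return sum(alpha * kernel(c, x) for alpha, c in zip(alphas, centers))

centers = [np.zeros(2), np.ones(2)]
alphas = [1.0, 2.0]
print(kernel_expansion(np.zeros(2), centers, alphas))  # 1 + 2 * exp(-1)
```

Evaluation cost grows with the number of stored points, which is why a sparse dictionary matters for an online method.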
KRLS
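Sequentially sample a stream of input/output pairs $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots$, with $\mathbf{x}_i \in \mathcal{X}$, $y_i \in \mathbb{R}$.
At time step $t$, assume we have collected a dictionary of samples $\mathcal{D}_{t-1} = \{\tilde{\mathbf{x}}_j\}_{j=1}^{m_{t-1}}$,
where by construction $\{\phi(\tilde{\mathbf{x}}_j)\}_{j=1}^{m_{t-1}}$ are linearly independent feature vectors.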
KRLS
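We encounter a new sample $\mathbf{x}_t$.
Test whether $\phi(\mathbf{x}_t)$ is approximately linearly dependent on the dictionary feature vectors:
$\delta_t = \min_{\mathbf{a}} \left\| \sum_{j=1}^{m_{t-1}} a_j \phi(\tilde{\mathbf{x}}_j) - \phi(\mathbf{x}_t) \right\|^2 \le \nu$
(dictionary approximation error $\delta_t$, threshold $\nu$).
If not ($\delta_t > \nu$), add $\mathbf{x}_t$ to the dictionary.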
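The minimization defining $\delta_t$ has a closed form. With $K$ the kernel matrix of the dictionary and $\mathbf{k}_t$ the vector of kernel values between the dictionary and $\mathbf{x}_t$, $\delta_t = k(\mathbf{x}_t, \mathbf{x}_t) - \mathbf{k}_t^\top K^{-1} \mathbf{k}_t$. The sketch below applies this test to build a dictionary from a stream. It re-solves the linear system at every step for clarity, so it costs $O(m^3)$ per sample rather than the $O(m^2)$ of the recursive KRLS update. The Gaussian kernel and the threshold are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 width^2)); the width is an assumption."""
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * width ** 2)))

def ald_delta(x, dictionary, kernel=gaussian_kernel):
    """delta_t = min_a ||sum_j a_j phi(d_j) - phi(x)||^2 = k(x, x) - k^T K^{-1} k."""
    if not dictionary:
        return kernel(x, x)
    K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    k = np.array([kernel(d, x) for d in dictionary])
    a_opt = np.linalg.solve(K, k)  # optimal coefficients a_t
    return kernel(x, x) - float(k @ a_opt)

def build_dictionary(stream, nu, kernel=gaussian_kernel):
    """Add x_t to the dictionary only if delta_t exceeds the threshold nu."""
    dictionary = []
    for x in stream:
        if ald_delta(x, dictionary, kernel) > nu:
            dictionary.append(x)
    return dictionary

stream = [np.array([0.0, 0.0]), np.array([0.0, 0.0]),
          np.array([3.0, 0.0]), np.array([0.0, 0.01])]
dictionary = build_dictionary(stream, nu=0.1)
print(len(dictionary))  # 2: the repeat and the near-duplicate are rejected
```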
KRLS Properties
• If the input set $\mathcal{X}$ is compact, the number of dictionary elements is finite.
• Approximate version of kernel PCA: eigenvectors with eigenvalues significantly larger than $\nu$ are projected almost entirely onto the dictionary set.
• $O(m^2)$ memory and $O(tm^2)$ time.
• Compare exact kernel PCA: $O(t^2)$ memory and $O(t^2 p)$ time.
Application in Networks
The data set is the origin-destination flows (an 11×11 matrix, i.e., a 121-dimensional vector per measurement interval).
Normalized, these comprise the features.
We use the total traffic per measurement interval as the associated value $y$.
[Figure: total traffic vs. measurement interval]
Data: 0000 hrs on Aug 10, 2005 to 2359 hrs on Aug 21, 2005, at the Chicago router. This gives 3456 5-minute intervals over the 12-day period.
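A sketch of the feature construction, assuming the 11×11 OD matrix for one interval is available as an array. The slides say only that the flows are normalized, so scaling to unit $L_2$ norm is an assumption. The last computation checks the interval count: 12 days of 5-minute intervals.

```python
import numpy as np

def od_features(od_matrix):
    """Flatten an OD matrix into a feature vector plus its total traffic y.

    Scaling to unit L2 norm is an assumed normalization.
    """
    v = np.asarray(od_matrix, dtype=float).reshape(-1)  # 11 x 11 -> 121
    total = v.sum()                                      # associated value y
    norm = np.linalg.norm(v)
    features = v / norm if norm > 0 else v
    return features, total

od = np.arange(121).reshape(11, 11)    # placeholder counts for one interval
features, y = od_features(od)
n_intervals = 12 * 24 * 60 // 5        # 12 days of 5-minute intervals
print(features.shape, y, n_intervals)  # (121,) 7260.0 3456
```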
[Figure: Origin-Destination Flows. Number of packets per OD flow at t = 1, 100, 1300, and 3000]
Building the Dictionary
[Figure: number of dictionary elements vs. measurement interval, for Gaussian and linear kernels, with δ = 0.1 and δ = 0.2]
[Figure: Dictionary Components, elements 5, 6, 20, and 22]
KRLS Anomaly Detection Algorithm
1. Based on $\mathbf{x}_t$, evaluate $\delta_t$.
2. If $\delta_t < \nu_1$, green-light the traffic.
3. If $\delta_t > \nu_2$, raise a red alarm.
4. If $\nu_1 < \delta_t < \nu_2$, raise an orange alarm:
   a. Test the usefulness of $\mathbf{x}_t$ (does $\phi(\mathbf{x}_t)$ provide good support for ensuing vectors?).
   b. If yes, add $\mathbf{x}_t$ to the dictionary.
   c. If no, raise a red alarm.
5. Remove any obsolete dictionary elements.
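The decision logic of the algorithm can be sketched as below. The usefulness test is passed in as a hypothetical callback because the slides do not specify it, and step 5 (removing obsolete elements) is omitted.

```python
def classify(delta, nu1, nu2, is_useful):
    """Return (alarm level, add x_t to dictionary?) for residual delta_t.

    Assumes nu1 < nu2. `is_useful` is a hypothetical zero-argument callback
    standing in for the unspecified usefulness test.
    """
    if delta < nu1:
        return "green", False   # well explained by the dictionary
    if delta > nu2:
        return "red", False     # far from anything seen before
    # nu1 <= delta <= nu2: orange alarm, then decide via the usefulness test.
    if is_useful():
        return "orange", True   # new normal behaviour: grow the dictionary
    return "red", False

print(classify(0.3, nu1=0.1, nu2=0.5, is_useful=lambda: False))  # ('red', False)
```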
Evaluating Usefulness
[Figure: kernel value vs. timestep; normal, obsolete, and anomalous cases]
Anomaly Detection
[Figure: KRLS, PCA, and OCNM detection statistics (Euclidean distance, magnitude of residual) vs. timestep]
PCA versus KRLS: Anomaly 1
[Figure: number of IP flows vs. timestep]
PCA versus KRLS: Anomaly 2
[Figure: number of IP flows and magnitude of projection vs. timestep]
Summary and Challenges
Network monitoring presents challenges on different fronts:
• Constraints on available measurements (reconstruction based on partial views)
• High-rate, high-dimensional, distributed data
(Some of the many) open questions:
• Tomography: network models, spatial + temporal correlations, optimal sampling, multiple sources.
• Anomaly detection: thresholds, dictionary control, feature space, dataset.
[Fig 3: detection rate (%) vs. false alarm rate (%)]
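Objective: Estimate expectations $\int h_t(\theta_{0:t})\, \pi_t(d\theta_{0:t})$ with respect to a sequence of distributions $\{\pi_t\}_{t \ge 0}$ known up to a normalizing constant, i.e.
$\pi_t(d\theta_{0:t}) = \pi_t(\theta_{0:t})\, d\theta_{0:t}$
Monte Carlo: Obtain $N$ weighted samples $\{\theta_{0:t}^{(i)}, w_t^{(i)}\}_{i=1,\ldots,N}$, where $w_t^{(i)} \ge 0$ and $\sum_{i=1}^{N} w_t^{(i)} = 1$, such that
$\sum_{i=1}^{N} w_t^{(i)} h_t(\theta_{0:t}^{(i)}) \xrightarrow{N \to \infty} \int h_t(\theta_{0:t})\, \pi_t(d\theta_{0:t})$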
Particle filtering
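A sketch of the weighted-sample estimator in its simplest, non-sequential form (self-normalized importance sampling). The target is a standard normal known only up to its constant, samples come from a wider normal proposal, and the normalized weights estimate $E_\pi[\theta^2] = 1$. Target, proposal, and sample size are illustrative choices.

```python
import numpy as np

def unnormalized_target(theta):
    """pi(theta) up to a constant: a standard normal without 1/sqrt(2 pi)."""
    return np.exp(-0.5 * theta ** 2)

def proposal_pdf(theta, scale=2.0):
    """Density of the N(0, scale^2) proposal used to draw samples."""
    return np.exp(-0.5 * (theta / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(5)
N = 100_000
theta = rng.normal(0.0, 2.0, size=N)                  # samples theta^(i)
w = unnormalized_target(theta) / proposal_pdf(theta)  # unnormalized weights
w = w / w.sum()                                       # w^(i) >= 0, sum = 1
estimate = np.sum(w * theta ** 2)                     # approximates E_pi[theta^2] = 1
print(estimate)
```

Normalizing the weights is what lets the unknown constant of $\pi_t$ cancel. A particle filter applies the same idea sequentially, adding resampling to keep the weights from degenerating.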