Anomaly Detection Using Projective Markov Models in a Distributed Sensor Network

Sean Meyn
Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois

Joint work with Amit Surana, Yiqing Lin, and Satish Narayanan, United Technologies Research Center

Acknowledgements: Research supported by United Technologies Research Center and the National Science Foundation, CCF 07-29031


Presented at the 2009 CDC, Shanghai: "Anomaly Detection Using Projective Markov Models in a Distributed Sensor Network," Sean Meyn, Amit Surana, Yiqing Lin, and Satish Narayanan. https://netfiles.uiuc.edu/meyn/www/spm_files/Mismatch/Mismatch.html



Page 2: Anomaly Detection Using Projective Markov Models

Outline

• Detection in a Sensor Network

• Multiple Models for Distributed Detection

• Application to Building Security

Page 3: Anomaly Detection Using Projective Markov Models

I. Detection in a Sensor Network

Page 4: Anomaly Detection Using Projective Markov Models

Problem Statement

Detect anomalous behavior based on

• A large number of heterogeneous sensors

• A large heterogeneous region

• Partial information regarding anomalous behavior

• Complex behavior with or without anomaly

Interest at UTRC: building monitoring for security and energy efficiency

Page 5: Anomaly Detection Using Projective Markov Models

Challenges and Resolution

1. Model for anomalous behavior

2. Complexity of normal behavior

3. Lack of coordinated action, or communication constraints

4. High variance associated with detection

Page 6: Anomaly Detection Using Projective Markov Models

Challenges and Resolution

Approach: Projective Markov models address challenges 1-3. Parameterized models address challenge 4.

Page 7: Anomaly Detection Using Projective Markov Models

II. Multiple Models for Distributed Detection

[Figure: divergence neighborhoods $Q_\eta(\pi^0)$ and $Q_{\beta^*}(\pi^1)$.]

Page 8: Anomaly Detection Using Projective Markov Models

Binary Hypothesis Testing - Geometric View

For starters: Binary hypothesis testing

For ease of explanation only: The classical i.i.d. setting

$Z$ is an i.i.d. sequence on a finite state space $\mathsf{Z}$

• $\pi^0$: marginal under model of normal behavior

• $\pi^1$: marginal under model of anomalous behavior

Page 9: Anomaly Detection Using Projective Markov Models

Binary Hypothesis Testing - Geometric View

Optimal test: Under Neyman-Pearson or Bayesian criteria, the log-likelihood ratio test

Log-likelihood ratio:

\[ L = \log\big(d\pi^1 / d\pi^0\big) \]

Log-likelihood ratio test:

\[ \phi(Z_1^T) = \mathbb{I}\Big\{ \frac{1}{T} \sum_{t=1}^{T} L(Z(t)) \ge \tau \Big\} \]
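
The log-likelihood ratio test above can be sketched in a few lines of Python. This is a minimal illustration, assuming the marginals are given as dictionaries over a finite alphabet with full support; the function names `log_likelihood_ratio` and `llr_test` are hypothetical and do not come from the talk.

```python
import math

def log_likelihood_ratio(pi0, pi1):
    """Return L(z) = log(pi1(z) / pi0(z)) for every symbol z.

    Assumes pi0 and pi1 are dicts over the same finite alphabet,
    with strictly positive entries (an assumption of this sketch).
    """
    return {z: math.log(pi1[z] / pi0[z]) for z in pi0}

def llr_test(samples, pi0, pi1, tau):
    """Declare an anomaly (True) when the time average of L(Z(t)) is >= tau."""
    L = log_likelihood_ratio(pi0, pi1)
    average = sum(L[z] for z in samples) / len(samples)
    return average >= tau

# Illustrative marginals on the alphabet {0, 1}.
pi0 = {0: 0.8, 1: 0.2}   # normal behavior
pi1 = {0: 0.4, 1: 0.6}   # anomalous behavior
print(llr_test([1, 1, 0, 1, 1], pi0, pi1, tau=0.0))
```

The threshold `tau` trades off false alarms against missed detections; the value used here is arbitrary.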

Page 10: Anomaly Detection Using Projective Markov Models

Binary Hypothesis Testing - Geometric View

Geometry:

[Figure: divergence neighborhoods $Q_\eta(\pi^0)$ and $Q_{\beta^*}(\pi^1)$, with a separating hyperplane.]

Page 11: Anomaly Detection Using Projective Markov Models

Binary Hypothesis Testing - Geometric View

Separating hyperplane:

\[ \Big\{ \mu : \int L(z)\,\mu(dz) = \tau \Big\} \]

Divergence neighborhood:

\[ Q_\eta(\pi^0) = \big\{ \mu : D(\mu \,\|\, \pi^0) < \eta \big\} \]

Page 12: Anomaly Detection Using Projective Markov Models

Binary Hypothesis Testing - Geometric View

LLR test: Declare an anomaly if the empirical distribution lies outside of the lower half space

Empirical distribution:

\[ \Gamma_T(z) := \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}\{ Z(t) = z \}, \qquad z \in \mathsf{Z} \]

Page 13: Anomaly Detection Using Projective Markov Models

Universal Detection

Anomalous behavior is not modeled

Alarm is sounded if the empirical distribution lies outside a divergence neighborhood of $\pi^0$:

\[ \phi(Z_1^T) = \mathbb{I}\{ \Gamma_T \notin Q_\eta(\pi^0) \} = \mathbb{I}\{ D(\Gamma_T \,\|\, \pi^0) \ge \eta \} \]

[Hoeffding, 1965]
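
As a rough sketch of the universal test above, the following Python computes the empirical distribution of a sample and compares its KL divergence from the nominal marginal with a threshold. The helper names (`empirical_distribution`, `kl_divergence`, `hoeffding_test`) and the example values are assumptions made for illustration, not taken from the talk.

```python
import math
from collections import Counter

def empirical_distribution(samples, alphabet):
    """Gamma_T(z): fraction of samples equal to z, for each z in the alphabet.

    Assumes every sample belongs to the alphabet.
    """
    T = len(samples)
    counts = Counter(samples)
    return {z: counts.get(z, 0) / T for z in alphabet}

def kl_divergence(mu, pi):
    """D(mu || pi) in nats, with the convention 0 * log 0 = 0."""
    total = 0.0
    for z, m in mu.items():
        if m > 0:
            if pi[z] == 0:
                return math.inf  # mu puts mass where pi has none
            total += m * math.log(m / pi[z])
    return total

def hoeffding_test(samples, pi0, eta):
    """Sound the alarm (True) when D(Gamma_T || pi0) >= eta."""
    gamma = empirical_distribution(samples, pi0.keys())
    return kl_divergence(gamma, pi0) >= eta

pi0 = {"a": 0.5, "b": 0.5}                      # nominal model (illustrative)
print(hoeffding_test(list("ab" * 50), pi0, eta=0.05))
print(hoeffding_test(list("a" * 100), pi0, eta=0.05))
```

No model of the anomaly is needed: only the nominal marginal and the threshold enter the test.
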
Page 14: Anomaly Detection Using Projective Markov Models

Universal Detection

Good news: For large $T$, performance approaches the optimality of the LLR test

Page 15: Anomaly Detection Using Projective Markov Models

Universal Detection

Bad news: For finite $T$, the variance of the statistic $D(\Gamma_T \,\|\, \pi^0)$ grows linearly with the size of the observation alphabet $\mathsf{Z}$

[Unnikrishnan, Huang, Meyn, Surana, Veeravalli, 2009] [Clarke and Barron, 1990]
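
The growth of the variance with alphabet size can be observed in a small Monte Carlo experiment. The sketch below assumes a uniform nominal marginal and uses only the Python standard library; the function names, sample sizes, and seeds are illustrative choices, not values from the talk. Under the usual chi-square approximation one would expect a variance of roughly $(N-1)/(2T^2)$ for an alphabet of size $N$, so the ratio printed at the end should be well above 1.

```python
import math
import random
from collections import Counter

def kl_to_uniform(samples, n_symbols):
    """D(Gamma_T || pi0) in nats, for pi0 uniform on n_symbols symbols."""
    T = len(samples)
    counts = Counter(samples)
    return sum((c / T) * math.log((c / T) * n_symbols) for c in counts.values())

def simulate_variance(n_symbols, T, runs, rng):
    """Sample mean and variance of the statistic when Z is i.i.d. uniform."""
    values = []
    for _ in range(runs):
        samples = [rng.randrange(n_symbols) for _ in range(T)]
        values.append(kl_to_uniform(samples, n_symbols))
    mean = sum(values) / runs
    variance = sum((v - mean) ** 2 for v in values) / (runs - 1)
    return mean, variance

# Same window length T, two alphabet sizes (illustrative values).
_, var_small = simulate_variance(5, 500, 200, random.Random(0))
_, var_large = simulate_variance(20, 500, 200, random.Random(1))
print(var_large / var_small)
```
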
Page 16: Anomaly Detection Using Projective Markov Models

Universal Detection - Multiple Models

Suppose we extract features of the observations,

\[ Z_i(t) = \varphi_i(Z(t)), \quad 1 \le i \le n, \qquad Z_i(t) \sim \pi^0_i \]

Selected based on

• Constraints, such as sensor locations

• Prior knowledge regarding anomalous behavior

• Variance reduction

Page 17: Anomaly Detection Using Projective Markov Models

Universal Detection - Multiple Models

Optimal combination of features for optimal detection:

\[ \phi(Z_1^T) = \mathbb{I}\big\{ \Gamma_i(T) \notin Q_\eta(\pi^0_i) \ \text{for some } i \big\} = \mathbb{I}\Big\{ \Gamma_T \notin \bigcap_i Q_\eta(\pi^0_i) \Big\} \]

Page 18: Anomaly Detection Using Projective Markov Models

Universal Detection - Multiple Models

Geometry:

[Figure: divergence neighborhoods $Q_\eta(\pi^0_1)$ and $Q_\eta(\pi^0_2)$.]

Page 19: Anomaly Detection Using Projective Markov Models

Universal Detection - Multiple Models

Geometry:

[Figure: divergence neighborhoods $Q_\eta(\pi^0_1)$ and $Q_\eta(\pi^0_2)$, with the Safe Region marked.]
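
A minimal sketch of the multiple-model test: each feature sequence is tested against its own nominal marginal, and the alarm is sounded if any single feature leaves its divergence neighborhood. The feature maps, marginals, and function names below are hypothetical examples chosen for illustration.

```python
import math
from collections import Counter

def kl_from_samples(samples, pi):
    """D(Gamma_T || pi) in nats, where Gamma_T is the empirical distribution."""
    T = len(samples)
    total = 0.0
    for z, c in Counter(samples).items():
        if pi.get(z, 0.0) == 0.0:
            return math.inf  # observed a symbol the nominal model excludes
        total += (c / T) * math.log((c / T) / pi[z])
    return total

def multi_model_test(observations, features, marginals, eta):
    """Alarm (True) if some feature's empirical distribution is outside Q_eta."""
    return any(
        kl_from_samples([phi(z) for z in observations], pi) >= eta
        for phi, pi in zip(features, marginals)
    )

# Observations are pairs; each feature reads one coordinate (illustrative).
features = [lambda z: z[0], lambda z: z[1]]
marginals = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]
normal = [(0, 0), (1, 1), (0, 1), (1, 0)] * 25
anomalous = [(0, 0), (1, 0)] * 50   # second coordinate stuck at 0
print(multi_model_test(normal, features, marginals, eta=0.05))
print(multi_model_test(anomalous, features, marginals, eta=0.05))
```

Each feature lives on a smaller alphabet than the full observation, which is where the variance reduction mentioned above comes from.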

Page 20: Anomaly Detection Using Projective Markov Models

Markov Models

KL divergence is replaced by relative entropy rate

\[ J(Q \,\|\, P) = \lim_{n \to \infty} \frac{1}{n} D(\gamma^{(n)} \,\|\, \pi^{(n)}) = D(\gamma^{(2)} \,\|\, \pi^{(2)}) - D(\gamma \,\|\, \pi) \]

where $\gamma^{(n)}$ and $\pi^{(n)}$ are the distributions of $(Z(1), \dots, Z(n))$, assumed Markovian, with transition matrices $Q$ and $P$. The two expressions are equal under the Markov assumption.
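
The identity for the relative entropy rate can be checked numerically. The sketch below assumes two small ergodic transition matrices with strictly positive entries and computes $J(Q \| P)$ two ways: directly as $\sum_x \gamma(x) \sum_y Q(x,y) \log(Q(x,y)/P(x,y))$, and as the difference of bivariate and univariate divergences. The matrices and function names are illustrative assumptions.

```python
import math

def stationary_distribution(P, iters=1000):
    """Power iteration; assumes P is an irreducible, aperiodic stochastic matrix."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return pi

def kl(mu, nu):
    """D(mu || nu) in nats for two probability vectors of equal length."""
    return sum(m * math.log(m / v) for m, v in zip(mu, nu) if m > 0)

def entropy_rate_direct(Q, P):
    """J(Q || P) = sum_x gamma(x) sum_y Q(x,y) log(Q(x,y) / P(x,y))."""
    gamma = stationary_distribution(Q)
    n = len(Q)
    return sum(
        gamma[x] * Q[x][y] * math.log(Q[x][y] / P[x][y])
        for x in range(n) for y in range(n) if Q[x][y] > 0
    )

def entropy_rate_bivariate(Q, P):
    """J(Q || P) = D(gamma2 || pi2) - D(gamma || pi)."""
    gamma, pi = stationary_distribution(Q), stationary_distribution(P)
    n = len(Q)
    gamma2 = [gamma[x] * Q[x][y] for x in range(n) for y in range(n)]
    pi2 = [pi[x] * P[x][y] for x in range(n) for y in range(n)]
    return kl(gamma2, pi2) - kl(gamma, pi)

Q = [[0.9, 0.1], [0.2, 0.8]]   # illustrative transition matrices
P = [[0.5, 0.5], [0.5, 0.5]]
print(entropy_rate_direct(Q, P), entropy_rate_bivariate(Q, P))
```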

Page 21: Anomaly Detection Using Projective Markov Models

Markov Models

Local models? Shannon-Mori-Zwanzig projection:

\[ P(x, y) := \frac{\pi^{(2)}(x, y)}{\pi(x)} \]
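
The projection formula can be illustrated by building a transition matrix from a bivariate distribution. In this sketch the bivariate distribution is estimated from an observed path, which is an assumption made for the example; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def bivariate_empirical(path):
    """Empirical distribution of consecutive pairs (Z(k), Z(k+1)) along a path."""
    pairs = list(zip(path, path[1:]))
    n = len(pairs)
    return {xy: c / n for xy, c in Counter(pairs).items()}

def projected_transition_matrix(pi2):
    """P(x, y) = pi2(x, y) / pi(x), where pi is the first marginal of pi2."""
    marginal = defaultdict(float)
    for (x, _), p in pi2.items():
        marginal[x] += p
    return {(x, y): p / marginal[x] for (x, y), p in pi2.items()}

path = [0, 1, 0, 1, 0, 0]            # illustrative observations
pi2 = bivariate_empirical(path)
P_hat = projected_transition_matrix(pi2)
print(P_hat)
```

Only transitions that actually appear in the path receive an entry; unseen pairs are implicitly assigned probability zero in this sketch.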

Page 22: Anomaly Detection Using Projective Markov Models

Markov Models

Local models? Two approaches:

Option A: Shannon-Mori-Zwanzig projection:

\[ P(x, y) := \frac{\pi^{(2)}(x, y)}{\pi(x)} \]

Option B: Parameterized models:

\[ \pi^{(2)}_\theta(x, y) = e^{\theta^T \psi(x, y)}, \qquad \theta \in \mathbb{R}^m \]

$\theta$ chosen using ML estimation

Page 23: Anomaly Detection Using Projective Markov Models

Markov Models

Advantage of option B: Variance grows with dimension $m$, not the cardinality of the observation space

Page 24: Anomaly Detection Using Projective Markov Models

III. Application to Building Security

Page 25: Anomaly Detection Using Projective Markov Models

Building Testbed at UTRC

Eleven Markov models for occupancy based on eleven zones

Option A: Empirical Markov model

Option B: Queueing model [Smith and Towsley, 1981]

[Figure: floor plan of the testbed, with numbered video cameras and zones.]

Page 26: Anomaly Detection Using Projective Markov Models

Experiment Architecture

Scenarios: Capture a range of unusual traffic patterns in a building

1. Convergence: Numerous occupants converge to a single zone

2. Divergence: Numerous occupants leave a single zone

3. Idleness: Numerous occupants converge to a single zone

4. Loitering: Numerous occupants converge to a single zone

5. High occupancy: Higher than normal occupancy in combined zones

Page 27: Anomaly Detection Using Projective Markov Models

Typical ROC Curves

Test statistic: Based on a moving window of length $\delta_0$

\[
R(t) =
\begin{cases}
J\big(\Gamma^{(2)}_{\delta_0,t} \,\big\|\, \pi^{(2)}\big) & \text{Empirical: } \Gamma^{(2)}_{\delta_0,t} \text{ is the bivariate empirical distribution} \\[6pt]
\dfrac{1}{\delta_0} \displaystyle\sum_{k=t-\delta_0+1}^{t} \log \ell_{t,\delta_0} & \text{Semi-empirical: } \ell_{t,\delta_0} \text{ is the likelihood ratio using the ML estimate}
\end{cases}
\]

[Figure: ROC curves for the empirical model and the semi-empirical model; both axes run from 0 to 1, and the curves are labeled by delay.]

Page 28: Anomaly Detection Using Projective Markov Models

Typical ROC Curves

Likelihood ratio used in the semi-empirical statistic:

\[ \ell_{t,\delta_0} := \frac{P_{t,\delta_0}(Z(k), Z(k+1))}{P(Z(k), Z(k+1))} \]
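
The empirical version of the moving-window statistic can be sketched as follows. The code forms the bivariate empirical distribution over the last $\delta_0$ transitions and evaluates the relative entropy rate of the resulting empirical Markov model with respect to a nominal transition matrix. The nominal matrix, the paths, and the function name `window_statistic` are assumptions made for this illustration, and the sketch requires the nominal matrix to be positive on every observed transition.

```python
import math
from collections import Counter, defaultdict

def window_statistic(path, t, delta0, P):
    """Empirical statistic R(t) over the transitions k = t-delta0+1, ..., t.

    path   : list of states, indexed from 0, with len(path) > t + 1
    P      : nominal transition matrix, P[x][y] > 0 for observed transitions
    Returns sum_{x,y} G2(x,y) * log((G2(x,y) / G(x)) / P[x][y]), where G2 is
    the windowed bivariate empirical distribution and G its first marginal.
    Assumes t - delta0 + 1 >= 0.
    """
    pairs = [(path[k], path[k + 1]) for k in range(t - delta0 + 1, t + 1)]
    gamma2 = {xy: c / delta0 for xy, c in Counter(pairs).items()}
    gamma = defaultdict(float)
    for (x, _), p in gamma2.items():
        gamma[x] += p
    return sum(
        p * math.log((p / gamma[x]) / P[x][y]) for (x, y), p in gamma2.items()
    )

P_nominal = [[0.9, 0.1], [0.1, 0.9]]   # "sticky" nominal chain (illustrative)
steady = [0] * 11                      # stays put, typical under P_nominal
alternating = [0, 1] * 6               # switches every step, atypical
print(window_statistic(steady, 9, 10, P_nominal))
print(window_statistic(alternating, 9, 10, P_nominal))
```

A short window reduces detection delay but leaves fewer transitions per estimate, so the statistic becomes noisier.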

Page 29: Anomaly Detection Using Projective Markov Models

Centralized Detection

Anomalous episode: Convergence to zone 6

• Delay is similar using either test

• Semi-empirical statistic is far more discriminating

[Figure: maximum of statistics, shown for the empirical statistic and the semi-empirical statistic.]

Page 30: Anomaly Detection Using Projective Markov Models

Decentralized Detection: Divergence

Anomalous episode: Divergence from zone 5

• Delay is similar using either test

• Many false alarms from empirical statistic

[Figure: empirical statistic and semi-empirical statistic for zones Z4, Z5, and Z6.]

Page 31: Anomaly Detection Using Projective Markov Models

Decentralized Detection: Occupancy

Anomalous episode: 10% higher occupancy in zones 5 and 6

• Empirical statistic clairvoyant?

• Missed detection using empirical statistic

[Figure: empirical statistic and semi-empirical statistic for zones Z5 and Z6.]

Page 32: Anomaly Detection Using Projective Markov Models

Conclusions

Contributions:

• Feasibility of an anomaly detection framework using projected Markov models.

• Advantages of semi-empirical Markov models

Page 33: Anomaly Detection Using Projective Markov Models

Conclusions

Current research:

• Feature selection for distributed detection

• Active learning, e.g., query for additional data

• Diagnosis

• Response

Page 34: Anomaly Detection Using Projective Markov Models

References

[1,3,4,5,6,8] Geometry

[2,3,4,6,8] Universal detection

[4,7] Variance in detection and parameter estimation

[1] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3:146–158, 1975.

[2] O. Zeitouni and M. Gutman. On universal hypotheses testing via large deviations. IEEE Trans. Inform. Theory, 37(2):285–290, 1991.

[3] C. Pandit and S. P. Meyn. Worst-case large-deviations with application to queueing and information theory. Stoch. Proc. Applns., 116(5):724–756, May 2006.

[4] J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. Veeravalli. Universal and composite hypothesis testing via mismatched divergence. CoRR and submitted for publication, IEEE Trans. IT., abs/0909.2234, 2009.

[5] S. Borade and L. Zheng. I-projection and the geometry of error exponents. In Proceedings of the Forty-Fourth Annual Allerton Conference on Communication, Control, and Computing, Sept 27-29, 2006, UIUC, Illinois, USA, 2006.

[6] E. Abbe, M. Medard, S. Meyn, and L. Zheng. Finding the best mismatched detector for channel coding and hypothesis testing. Information Theory and Applications Workshop, 2007, pages 284–288, 29 2007-Feb. 2 2007.

[7] B. S. Clarke and A. R. Barron. Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory, 36(3):453–471, 1990.

[8] D. Huang, J. Unnikrishnan, S. Meyn, V. Veeravalli, and A. Surana. Statistical SVMs for robust detection, supervised learning, and universal classification. In Proceedings of the Information Theory Workshop on Networking and Information Theory, Volos, Greece, 2009.