17
Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab School of Computer Science Carnegie Mellon University This work was partly funded by NSF grant IIS-0325581 and CDC award R01-PH000028 presented by Robin Sabhnani from the Auton Lab

Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Embed Size (px)

Citation preview

Page 1: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Learning Stable Multivariate Baseline Models for Outbreak

Detection Sajid M. Siddiqi, Byron Boots,

Geoffrey J. Gordon, Artur W. DubrawskiThe Auton Lab

School of Computer Science Carnegie Mellon University

This work was partly funded by NSF grant IIS-0325581 and CDC award R01-PH000028

presented by Robin Sabhnani from the Auton Lab

Page 2: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Motivation

• Lots of health-related data available

• Much of this data is temporal

• Many data sources are also multivariate

Page 3: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Motivation

• When detecting anomalies, the crucial information could be hidden in the dynamics of the data as well as the interaction between different data streams

• Our goal: Learn good models for simulating baseline data for use in training algorithms as well as detecting anomalies

• Linear Dynamical Systems are a good choice

Page 4: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Outline

• Linear Dynamical Systems

• Learning Stable Models

• Experimental Setup

• Results

• Conclusion

Page 5: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Linear Dynamical Systems (LDS)

X1

Y1

X2

Y2

. . . Xt

Yt

Xt+1

Yt+1

hidden variables (low-dimensional)

observed data (high-dimensional)

Page 6: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Linear Dynamical Systems (LDS)

X1

Y1

X2

Y2

. . . Xt

Yt

Xt+1

Yt+1

hidden variables (low-dimensional)

observed data (high-dimensional)

Page 7: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

hidden variables (low-dimensional)

observed data (high-dimensional)

Linear Dynamical Systems (LDS)

X1

Y1

X2

Y2

. . . Xt

Yt

Xt+1

Yt+1

• Dynamics matrix A models temporal evolution

Page 8: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

hidden variables (low-dimensional)

observed data (high-dimensional)

Linear Dynamical Systems (LDS)

X1

Y1

X2

Y2

. . . Xt

Yt

Xt+1

Yt+1

• Dynamics matrix A models temporal evolution

• Multivariate Gaussian noise vt , wt models

interaction between streams

Page 9: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Linear Dynamical Systems

• The Good:– Linear Dynamical Systems (aka State-Space

models, aka Kalman Filters) are a generalization of ARMA models and can represent a wide range of time series

– LDS parameters can be learned from data

• The Bad:– LDSs learned from data are often unstable– Simulation from an unstable LDS degenerates

Page 10: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Stability

• Stability of an LDS depends on its dynamics matrix A

• Let {1,…,n} be the eigenvalues of A in decreasing order of magnitude

• A is stable if |1 <= 1• Constraining |1during learning is hard

• We devise an iterative optimization method* that beats previous approaches in efficiency and accuracy

* “A Constraint Generation Approach to Learning Stable Linear Dynamical Systems”, S. Siddiqi,

B. Boots, G. Gordon, NIPS 2007

Page 11: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Stability

• Learning stable LDS models allows us to:– Compress large temporal multivariate datasets– Generate realistic data sequences– Predict the future given some data

• Deviations from predicted data indicate anomalies

Page 12: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Experimental Setup• Data:

– OTC drug sales data for 22 categories in 29 Pittsburgh zip codes over 60 days

• track all zipcodes for cough/cold category (multi-zipcode data)• track all drug-categories for city of pittsburgh (multi-drug data)

• Experiments: – Learn a LDS model using first 15 days, and

• Simulate a sequence (qualitative task)• Reconstruct state sequence (quantitative task)• Predict future occurrences (quantitative task)

• Algorithms:– Constraint Generation (our method), – LB-1* (state of the art stability algorithm), – Least Squares (naïve, no stability guarantees)

* “Subspace Identification with guaranteed stability using subspace identification”, S. Lacy and D.

Bernstein, ACC 2002

Page 13: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Data Simulations

• Instability causes Least-Squares simulations to diverge

• Constraint Generation yields most realistic simulations that are also stable

Page 14: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

State Reconstruction• Obtained by computing the residual

t ||Axt – xt+1||2 , where {xt} are the estimated states

• Least squares has the best score by definition, since it is learned by regression on xtxt+1, but at the cost of instability

squared error Multi-drug data Multi-zip data

Constraint Generation

57,338 26,171

LB-1 60,669 26,431

Least Squares 56,203 16,918

• Stable methods trade off reconstruction error vs. stability

• Constraint Generation learns the most accurate models that are also stable

Page 15: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Prediction (preliminary results)

• Average prediction error obtained by tracking (filtering) up to time t, then simulating upto time t’ and calculating the sum of squared error, and averaging this over all t and t’ > t

avg sqd err (std dev) Multi-drug data Multi-zip data

Constraint Generation

59,845 (310) 45,465 (317)

LB-1 53,494 (364) 44,677 (266)

Least Squares 79,649 (648) n/a

• Stable methods yield superior results to least squares

Page 16: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Conclusion• Linear Dynamical Systems effective at modeling

multivariate time series data

• Stability crucial for accurate performance

• Superior performance of stable methods in baseline generation and prediction on OTC data

• Constraint Generation learns a more accurate model with more realistic simulations, most efficiently*. Further work needed on prediction accuracy metric.

* “A Constraint Generation Approach to Learning Stable Linear Dynamical Systems”, S. Siddiqi, B. Boots, G. Gordon, NIPS 2007

Page 17: Learning Stable Multivariate Baseline Models for Outbreak Detection Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski The Auton Lab

Thank You! Questions?

further questions to [email protected]