High-dimensional Time Series Models
George Michailidis
University of Florida
Transdisciplinary Foundations of Data Science, IMA, September 2016
Slide 1 / 34
Introduction
Learning Tasks with Temporally Dependent Data
Predictive inference, forecasting, segmentation, covariance estimation/graphical modeling

Regression models: y_t = X_t β + ε_t, where the p-dimensional predictors X_t and the error term ε_t are generated by a stationary process

Autoregressive models: X_t = A X_{t−1} + E_t, where the p-dimensional error process E_t is white noise
Related control problem: X_t = A X_{t−1} + B U_t + E_t, together with a cost/performance function
Learning Tasks with Temporally Dependent Data
Predictive inference, forecasting, segmentation, covariance estimation/graphical modeling (ctd.)

Factor models: X_t = Λ F_t + E_t, where X_t is a p-dimensional process, F_t a k-dimensional latent/factor process, and E_t a noise process
A popular model in the economics/finance literature lets the factors change dynamically over time, e.g. F_t = Φ F_{t−1} + U_t

Given a multivariate time series X_t, identify structural breaks, i.e. identify points in time at which the structure of the model changes,
e.g. X_t = A_1 X_{t−1} I(t ≤ τ) + A_2 X_{t−1} I(t > τ) + E_t, for some τ ∈ [0, T]
There is an online version of the problem for streaming data
Estimate covariance matrix of temporally dependent data
Application areas
Macroeconomics/Finance
Functional Genomics
Neuroscience
Control of large networks
Application areas: Economics
testing relationship between money and income (Sims, 1972, 1980)
understanding stock price-volume relation (Hiemstra et al., 1994)
dynamic effect of government spending and taxes on output (Blanchard and Jones, 2002)

identify and measure the effects of monetary policy innovations on macroeconomic variables (Bernanke et al., 2005)
Forecasting models in Economics
[Figure: monthly series, February 1960 – August 1974, of Employment, the Federal Funds Rate, and the Consumer Price Index]
Application areas: Functional Genomics
Identify regulatory mechanisms from time course data (panel data structure)
HeLa gene expression regulatory network [From: Fujita et al., 2007]
Application Areas: Neuroscience
Identify brain connectivity regions
Need for high-dimensional models
Economics: forecasting with many predictors (De Mol et al., 2008) or understanding structural relationships (Christiano et al., 1999)
Finance: build large scale systemic risk models
Functional Genomics: reconstruct gene regulatory networks based on limited experimental data

Neuroscience: build detailed connectivity maps on temporal data exhibiting multiple structural changes
Network control: for large sparse systems (Liu, Slotine, Barabasi, 2011)
Key issues:
Nature of the data measurements (numerical, count, binary) (see Raskutti et al., 2016, for models for count data)

Capture the correct dynamics (see Chen and Shojaie, 2016, for models for self-exciting processes)

How does the temporal dependence impact estimation and prediction accuracy?
Illustration of estimation accuracy
Fig. 1. Estimation error of lasso in stochastic regression. Top panel: Example 1, VAR(1) process of predictors with cross-sectional dependence. Bottom panel: Example 2, VAR(2) process of predictors with no cross-sectional dependence.

… affect the convergence rates of lasso estimates in a more intricate manner, not completely captured by ρ(A). Further, several authors [Loh and Wainwright (2012), Negahban and Wainwright (2011), Han and Liu (2013)] conducted nonasymptotic analysis of high-dimensional VAR(1) models, assuming ‖A‖ < 1. In Appendix E (supplementary material [Basu and Michailidis (2015)]) (see Figure 1 and Lemma E.2), we show that this assumption is restrictive and is violated by many stable VAR(1) models. More importantly, such an assumption does not generalize beyond VAR(1).

Example 1. We generate data from the stochastic regression model (1.1) with p = 200 predictors and i.i.d. errors ε_t. The process of predictors comes from a Gaussian VAR(1) model X_t = A X_{t−1} + ξ_t, where A is an upper-triangular matrix with α = 0.2 on the diagonal and γ on the two upper off-diagonal bands. We generate processes with different levels of cross-correlation among the predictors by changing γ and plot the average estimation error of lasso (over multiple iterates) against different sample sizes n in Figure 1.

The spectral radius is common (α = 0.2) across all models. Consistently with the classical low-dimensional asymptotics, the lasso errors for different processes seem to converge as n goes to infinity. However, for small to moderate n, as is common in high-dimensional regimes, lasso errors are …
Modeling Framework
Vector Autoregression
Canonical model for understanding lead-lag cross-dependencies
Successful for forecasting purposes and for intervention analysis (impulseresponse)
Exhibits a number of technical challenges in high dimensions
The VAR Model
p-dimensional, discrete-time, stationary process X^t = (X^t_1, …, X^t_p)

X^t = A_1 X^{t−1} + … + A_d X^{t−d} + ε^t,   ε^t i.i.d. ∼ N(0, Σ_ε)   (1)

A_1, …, A_d: p × p transition matrices (solid, directed edges)

Σ_ε^{−1}: contemporaneous dependence (dotted, undirected edges)

stability: all roots of det A(z), with A(z) := I_p − Σ_{t=1}^d A_t z^t, lie outside the closed unit disc {z ∈ C : |z| ≤ 1}
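As a quick numerical illustration, for a VAR(1) the stability condition is equivalent to the eigenvalues of A_1 lying strictly inside the unit circle. A minimal pure-Python check for the 2 × 2 case; `var1_stable` is an illustrative helper name and the matrix below is a hypothetical example, not from the slides:

```python
import cmath

def var1_stable(A):
    """Check stability of a 2x2 VAR(1) X_t = A X_{t-1} + e_t.
    Stable iff both eigenvalues of A lie strictly inside the unit circle,
    equivalently all roots of det(I - A z) lie outside |z| <= 1."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)          # works for complex pairs too
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
    return max(abs(lam1), abs(lam2)) < 1

A = [[0.5, 0.3],
     [0.0, 0.5]]                 # upper triangular: eigenvalues 0.5, 0.5
print(var1_stable(A))            # True
```

For d > 1 lags, the same check applies to the dp × dp companion matrix of the VAR(d).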
Detour: VARs and Granger Causality
Concept introduced by Granger (1969)
A time series X is said to Granger-cause Y if it can be shown, usually through a series of F-tests on lagged values of X (and with lagged values of Y also known), that those X values provide statistically significant information about future values of Y.
In the context of a high-dimensional VAR model, X^{T−t}_j is Granger-causal for X^T_i if A^t_{i,j} ≠ 0.
Granger-causality does not imply true causality; it is built on correlations
Also, related to estimating a Directed Acyclic Graph (DAG) with (d + 1) × p variables, with a known ordering of the variables
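Reading the Granger-causal network off the transition matrices is then just a support check on their entries. A small sketch; `granger_edges` is a hypothetical helper and the matrices are illustrative:

```python
def granger_edges(A_list, tol=1e-8):
    """Edges j -> i of the Granger-causal network implied by VAR transition
    matrices A_1, ..., A_d: series j is Granger-causal for series i iff some
    lag-t coefficient A_t[i][j] is nonzero (up to a numerical tolerance)."""
    p = len(A_list[0])
    return sorted({(j, i)
                   for A in A_list
                   for i in range(p)
                   for j in range(p)
                   if abs(A[i][j]) > tol})

A1 = [[0.5, 0.0], [0.4, 0.3]]    # hypothetical sparse lag-1 matrix
A2 = [[0.0, 0.2], [0.0, 0.0]]    # hypothetical lag-2 matrix
print(granger_edges([A1, A2]))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In practice the A_t are replaced by their sparse estimates, so the recovered edge set inherits the estimator's support.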
Estimating high-dimensional VARs through regression
data: X^0, X^1, …, X^T (one replicate, observed at T + 1 time points)
construct the autoregression Y = 𝒳 B* + E, stacking the observations row-wise:

Y = [(X^T)′; (X^{T−1})′; …; (X^d)′]   (N × p response)

𝒳 = [(X^{T−1})′ (X^{T−2})′ ⋯ (X^{T−d})′; (X^{T−2})′ (X^{T−3})′ ⋯ (X^{T−1−d})′; ⋯; (X^{d−1})′ (X^{d−2})′ ⋯ (X^0)′]   (N × dp design)

B* = [A′_1; …; A′_d]   (dp × p coefficients),   E = [(ε^T)′; (ε^{T−1})′; …; (ε^d)′]   (N × p errors)

Vectorizing,

vec(Y) = vec(𝒳 B*) + vec(E) = (I ⊗ 𝒳) vec(B*) + vec(E)

vec(Y) (Np × 1) = Z (Np × q) β* (q × 1) + vec(E) (Np × 1),   vec(E) ∼ N(0, Σ_ε ⊗ I)

N = T − d + 1,   q = dp²

Key assumption: the A_t are sparse, Σ_{t=1}^d ‖A_t‖_0 ≤ k
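The stacking above can be sketched in a few lines; `build_autoregression` is a hypothetical helper name and the toy replicate is purely illustrative:

```python
def build_autoregression(X, d):
    """Stack one observed replicate X^0, ..., X^T (a list of p-vectors) into
    the response Y (N x p) and design matrix Xmat (N x dp) of the VAR(d)
    regression, N = T - d + 1. The row for time t is [X^{t-1}', ..., X^{t-d}'],
    with rows ordered t = T, T-1, ..., d as on the slide."""
    T = len(X) - 1
    Y, Xmat = [], []
    for t in range(T, d - 1, -1):           # t = T, T-1, ..., d
        Y.append(list(X[t]))
        row = []
        for lag in range(1, d + 1):
            row.extend(X[t - lag])
        Xmat.append(row)
    return Y, Xmat

# toy replicate with p = 2, T = 4, d = 2  ->  N = 3 rows
X = [[0, 0], [1, 2], [2, 1], [3, 3], [4, 2]]
Y, Xmat = build_autoregression(X, d=2)
print(len(Y), len(Xmat[0]))     # 3 4
print(Xmat[0])                  # [3, 3, 2, 1]  i.e. (X^3)', (X^2)'
```

With p = 2, d = 2 this gives q = dp² = 8 regression coefficients once the two columns of B* are vectorized.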
Estimation Methods
ℓ1-penalized least squares (ℓ1-LS):

argmin_{β ∈ R^q} (1/N) ‖Y − Zβ‖² + λ_N ‖β‖_1

ℓ1-penalized log-likelihood (ℓ1-LL) (Davis et al., 2012):

argmin_{β ∈ R^q} (1/N) (Y − Zβ)′ (Σ_ε^{−1} ⊗ I) (Y − Zβ) + λ_N ‖β‖_1
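A minimal sketch of the ℓ1-LS estimator for a single response column, via cyclic coordinate descent with soft-thresholding (pure Python for clarity; `lasso_cd` and `soft` are illustrative names, and in practice an optimized solver would be used):

```python
def soft(z, g):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    return (z - g) if z > g else (z + g) if z < -g else 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """argmin_b (1/N)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate
    descent: each coordinate update is a soft-thresholded least-squares
    fit against the partial residual."""
    N, q = len(X), len(X[0])
    b = [0.0] * q
    for _ in range(n_iter):
        for j in range(q):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(q) if k != j)
                 for i in range(N)]
            zj = sum(X[i][j] * r[i] for i in range(N))
            nj = sum(X[i][j] ** 2 for i in range(N))
            b[j] = soft(zj, N * lam / 2.0) / nj if nj > 0 else 0.0
    return b

# tiny check: with lam = 0 this is ordinary least squares
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print([round(v, 6) for v in lasso_cd(X, y, lam=0.0)])   # [1.0, 2.0]
```

Applied column by column to the VAR regression, this recovers one row block of B* at a time; a sufficiently large λ_N shrinks all coefficients to zero.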
`1-LL Algorithm
The objective function is jointly non-convex, but convex w.r.t. the B's for fixed Σ_ε^{−1}, and vice versa

The algorithm converges to a stationary point near the truth with high probability under high-dimensional scaling, provided it is initialized at a good point (details in Lin et al., 2016)
Theoretical Considerations
Detour: Probabilistic Consistency of Lasso
For regression models, the quality of the estimates of the regression parameters relies crucially on two regularity conditions:

1. Restricted Eigenvalue (RE): the null space of the normalized design matrix X/√N avoids the cone C(S, 3) := {v ∈ R^p : ‖v_{S^c}‖_1 ≤ 3 ‖v_S‖_1}:

α_RE := min_{v ∈ C(S,3), ‖v‖ ≤ 1} (1/N) ‖Xv‖² > 0

2. Deviation condition: ‖X′E/N‖_max ≤ Q(X, σ) √(log p / N)

Under the above conditions,

Estimation error: ‖β̂ − β*‖_2 ≤ (Q(X, σ)/α_RE) √(k log p / N) with high probability
Lasso Regression for Time Series Data
It is unknown whether the above conditions are satisfied for high-dimensional time series data

Verifying RE-type assumptions for a fixed design is NP-hard

For a random design matrix X, existing results only provide guarantees when the samples are independent

Even for a stationary process, the data share complicated dependence patterns:
- the rows are dependent
- the columns are dependent
- the error term E and the design matrix X are dependent
Vector Autoregression
Random design matrix 𝒳, correlated with the error matrix E:

vec(Y) = vec(𝒳 B*) + vec(E) = (I ⊗ 𝒳) vec(B*) + vec(E)

vec(Y) (Np × 1) = Z (Np × q) β* (q × 1) + vec(E) (Np × 1),   vec(E) ∼ N(0, Σ_ε ⊗ I)

N = T − d + 1,   q = dp²

Here Y, 𝒳, B* and E are the stacked response, design, coefficient, and error matrices of the VAR regression constructed earlier

Question: How often does RE hold? How small is α_RE? How does the cross-correlation affect the convergence rates?
Quantifying Dependence in high-dimensional VAR: Existing approaches
One can try to proceed analogously to regression for i.i.d. data; e.g. Negahban and Wainwright (2011): for VAR(1) models, assume ‖A_1‖ < 1, where ‖A‖ := √(Λ_max(A′A))

For a univariate autoregression X^t = ρ X^{t−1} + ε^t, this reduces to |ρ| < 1, equivalent to the stability assumption

It turns out that this is a very restrictive assumption for most realistic VAR models
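To see why ‖A_1‖ < 1 is restrictive, consider a stable upper-triangular VAR(1) transition matrix with spectral radius 0.5 whose operator norm nonetheless exceeds 1. A pure-Python check in the spirit of Example 1 earlier (the particular entries are a hypothetical choice):

```python
import math

# Upper-triangular 2x2 transition matrix: eigenvalues are the diagonal
# entries (0.5, 0.5), so the VAR(1) is stable (spectral radius 0.5 < 1),
# yet the operator norm ||A|| = sqrt(lambda_max(A'A)) exceeds 1.
a, g = 0.5, 2.0
A = [[a, g],
     [0.0, a]]
# A'A = [[a^2, a*g], [a*g, g^2 + a^2]]; largest eigenvalue via the
# quadratic formula for a symmetric 2x2 matrix
tr = a * a + (g * g + a * a)
det = (a * a) * (g * g + a * a) - (a * g) ** 2
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2
op_norm = math.sqrt(lam_max)
print(a < 1)            # True  (stable: spectral radius 0.5)
print(op_norm > 1)      # True  (||A|| is about 2.12)
```

Increasing the off-diagonal entry g makes ‖A_1‖ arbitrarily large while the spectral radius, and hence stability, is unchanged.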
Quantifying Dependence via the Spectral Density
Spectral density function of a covariance-stationary process X^t:

f_X(θ) = (1/2π) Σ_{l=−∞}^{∞} Γ_X(l) e^{−ilθ},   θ ∈ [−π, π]

Γ_X(l) = E[X^t (X^{t+l})′], the autocovariance matrix of order l

If the VAR process is stable, the spectral density has a closed form (cf. equation (9.4.23), Priestley (1981)):

f_X(θ) = (1/2π) (A(e^{−iθ}))^{−1} Σ_ε (A*(e^{−iθ}))^{−1}

The two sources of dependence factorize in the frequency domain
Quantifying Dependence via the Spectral Density
For univariate processes, the “peak” of the spectral density measures the stability of the process (sharper peak = less stable)

[Figure: autocovariance Γ(h) and spectral density f(θ) of an AR(1) process for ρ = 0.1, 0.5, 0.7]
For multivariate processes, a similar role is played by the maximum eigenvalue ofthe (matrix-valued) spectral density
Quantifying Dependence via the Spectral Density
For a covariance-stationary process X^t with continuous spectral density f_X(θ), the maximum eigenvalue of the spectral density captures its stability:

M(f_X) = max_{θ ∈ [−π,π]} Λ_max(f_X(θ))

The minimum eigenvalue of the spectral density captures dependence among its components:

m(f_X) = min_{θ ∈ [−π,π]} Λ_min(f_X(θ))

For stable VAR(1) processes, M(f_X) scales with (1 − ρ(A_1))^{−2}, where ρ(A_1) is the spectral radius of A_1

m(f_X) scales with the capacity (maximum incoming + outgoing effect at a node) of the underlying graph

We can similarly measure the stability of subprocesses:

M(f_X, k) := max_{J ⊂ {1,…,p}, |J| = k} M(f_{X(J)}),   M(f_X, 1) ≤ M(f_X, 2) ≤ ⋯ ≤ M(f_X, p) = M(f_X)

This allows us to derive concentration inequalities for dependent random variables
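For a univariate AR(1), M(f_X) can be computed directly and matches the (1 − ρ)^{−2} scaling noted above. A small numerical check (the grid evaluation and `ar1_spec` helper are illustrative choices):

```python
import cmath
import math

def ar1_spec(rho, sigma2=1.0, grid=400):
    """Peak of the spectral density of a univariate AR(1)
    x_t = rho x_{t-1} + e_t:  f(theta) = sigma2 / (2 pi |1 - rho e^{-i theta}|^2).
    For rho > 0 the maximum over [-pi, pi] sits at theta = 0 and equals
    sigma2 / (2 pi (1 - rho)^2)."""
    thetas = [-math.pi + 2 * math.pi * k / grid for k in range(grid + 1)]
    f = [sigma2 / (2 * math.pi * abs(1 - rho * cmath.exp(-1j * th)) ** 2)
         for th in thetas]
    return max(f)

for rho in (0.1, 0.5, 0.7):
    peak = ar1_spec(rho)
    # rescaling by 2 pi (1 - rho)^2 should give ~1.0 for every rho
    print(round(peak * 2 * math.pi * (1 - rho) ** 2, 3))
```

The rescaled peak is constant across ρ, confirming that the peak itself grows like (1 − ρ)^{−2} as the process approaches instability.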
Consistency of sparse VAR estimates
It is established (see Basu and Michailidis, 2015) that

Σ_{h=1}^d ‖Â_h − A_h‖ ≤ φ(A_t, Σ_ε) √(k log(dp²)/N)

Consistency in high dimensions: even if d, p = O(N²), k log(dp²)/N → 0 as long as k = o(N)

The error has two components:
1. φ(A_t, Σ_ε): large ⇔ M(f_X) large, m(f_X) small
2. √(k log(dp²)/N): the estimation error for independent data

The estimation error is the same as for i.i.d. data, modulo a price for temporal dependence
Recap of the Main Theoretical Results
Assuming the RE and deviation conditions, we can establish the consistency of sparse estimates of high-dimensional VAR models

For stable VAR models, the RE and deviation conditions hold with high probability

The convergence rate has two components: (i) a component governed by the structural parameters of the problem, identical to the i.i.d. case, and (ii) a component governed by the temporal dependence
Beyond VAR and Sparsity: High-Dimensional Models for Time Series Data
The concentration bounds obtained can be used to prove estimation consistency for other regularized methods with high-dimensional, Gaussian time series:

Regression with Lasso; non-convex penalties (SCAD, MCP)

Generalized linear models (regression with non-continuous outcome variables) [Raskutti et al., 2016]

Sparse covariance estimation with time series data

Regression / VAR with Group Lasso [Basu et al., 2015]

Low rank and low rank + sparse VAR [Basu, 2014]

Tensor regression with dependent data [Raskutti and Yuan, 2015]

Time series with local dependence [Schweinberger et al., 2015]

VAR models with grouped structure on the transition matrices [Matteson et al., 2015]

The results have a common theme:

estimation error for dependent data ≈ (measure of narrowness of the spectrum) × (estimation error for i.i.d. data)
Segmentation problems
Models with Structural Breaks
Increasing interest in using time series models and/or graphical models as network models derived from high-dimensional data
Numerous applications both for the offline and online versions
A Canonical Statistical Problem: Change Point Detection
Simplest Setup:
Random vector observed over the time interval 1, …, T (offline version):
X_t = (X_{1t}, X_{2t}, …, X_{pt}) ∼ N(0, Σ_1), t ≤ τ
X_t = (X_{1t}, X_{2t}, …, X_{pt}) ∼ N(0, Σ_2), t > τ

Objectives:
1. Estimate the change point τ
2. Estimate the Gaussian graphical models Ω_1 ≡ Σ_1^{−1}, Ω_2 ≡ Σ_2^{−1}

The i.i.d. assumption is simplifying, but can easily be relaxed through neighborhood selection techniques leveraging lasso regressions with temporally dependent errors (Basu and Michailidis, 2015)
Some Background on Low Dimensional Change Point Problems
Assume a stump model:

y_i = α I(i ≤ τ) + β I(i > τ) + ε_i,   i = 1, …, T,

where ε_i ∼ N(0, σ²) and I(·) denotes the indicator function

Then, under a condition on the signal-to-noise ratio, |α − β|/σ ≥ C > 0, one can establish the following:

1. α, β can be estimated at rate √T (the usual parametric rate)
2. |τ̂ − τ|/T = O(1/T)
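The least-squares change-point estimate behind these rates scans over candidate splits and minimizes the within-segment residual sum of squares. A minimal sketch on simulated data; `stump_changepoint` is a hypothetical helper name and the simulation parameters are illustrative:

```python
import random

def stump_changepoint(y):
    """Least-squares change-point estimate for the stump model
    y_i = alpha*I(i <= tau) + beta*I(i > tau) + noise: choose the split
    minimizing the total within-segment sum of squared residuals."""
    T = len(y)
    best_tau, best_rss = None, float("inf")
    for tau in range(1, T):                  # tau = size of the left segment
        left, right = y[:tau], y[tau:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        rss = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau

random.seed(0)
T, tau = 200, 120
# mean shift 1.0 -> 3.0 at tau, noise sd 0.5: signal-to-noise ratio 4
y = [(1.0 if i < tau else 3.0) + random.gauss(0, 0.5) for i in range(T)]
est = stump_changepoint(y)
print(est)
```

With this signal-to-noise ratio the estimate lands within a few observations of the true τ, consistent with the O(1/T) rate for |τ̂ − τ|/T.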
Naive Algorithm
1. For each t = 1 + d, …, T − d, for some d > 0, calculate the joint Gaussian likelihood assuming τ_candidate = t, which is given by

L(Ω_1; t = 1, …, τ_candidate) + L(Ω_2; t = τ_candidate + 1, …, T)

2. Set τ̂ = argmax_{τ_candidate} L(Ω_1, Ω_2, τ_candidate)

Technical challenges: note that at any solution τ̂ ≠ τ, one term in the likelihood is misspecified

Hence, much more careful handling of the technical issues is needed to establish the results
Main Results
Under a Restricted Eigenvalue condition, it can be established that (Roy, Atchade and Michailidis, 2016)

1. ‖Ω̂_k − Ω_k‖_F = O(√(s log p / T))

2. |τ̂ − τ| / T = O(log(pT) / T)
Extension to Multiple Change Points
In a recent paper, Leonardi and Bühlmann (arXiv, 2016) look at the same problem, but allow multiple change points

To identify the change points, they propose a dynamic programming algorithm, as well as a computationally faster binary search approximation

Further, they look at estimation consistency properties of the τ_k, k = 1, …, K and the corresponding Ω_k's in a slow regime, where change points are sparse and far apart, and in a fast regime, where the number of change points grows as a function of T

The rates for the Ω_k's are the usual ones, but even in the slow regime the obtained rate for τ_k is worse than the one previously obtained

A related problem was studied in Kolar and Xing (Electronic J. of Statistics, 2012), where each node can experience multiple change points, and in Soh and Chandrasekaran (arXiv, 2014)
Concluding Remarks
Temporal data are present in a diverse set of applied areas
Time series models pose a number of subtle technical challenges in high dimensions

A number of open questions:
1. Going beyond Gaussian data (heavy-tailed distributions, mixed types of data)
2. Incorporation of prior information / Bayesian modeling
3. An inference framework for assessing both parameter and model significance
4. Better models for capturing intricate temporal dynamics
5. Intervention/control problems