High-dimensional Time Series Models

George Michailidis

University of Florida

Transdisciplinary Foundations of Data Science
IMA, September 2016


Introduction

Learning Tasks with Temporally Dependent Data

Predictive inference, forecasting, segmentation, covariance estimation / graphical modeling

Regression models: y_t = X_t β + ε_t, where the p-dimensional predictors X_t and the error term ε_t are generated by a stationary process

Autoregressive models: X_t = A X_{t−1} + E_t, where the p-dimensional error process E_t is white noise. Related control problem: X_t = A X_{t−1} + B U_t + E_t, together with a cost/performance function
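These dynamics are easy to simulate. The sketch below (the dimension, transition matrix, and noise scale are illustrative choices, not from the slides) generates a path of a stable VAR(1) process X_t = A X_{t−1} + E_t:

```python
import numpy as np

def simulate_var1(A, T, sigma=1.0, seed=0):
    """Simulate X_t = A X_{t-1} + E_t with Gaussian white-noise errors E_t."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    X = np.zeros((T + 1, p))
    for t in range(1, T + 1):
        X[t] = A @ X[t - 1] + sigma * rng.standard_normal(p)
    return X

# A sparse, stable transition matrix: spectral radius < 1 gives stationarity
p = 5
A = 0.4 * np.eye(p) + 0.2 * np.eye(p, k=1)
assert np.max(np.abs(np.linalg.eigvals(A))) < 1
X = simulate_var1(A, T=200)
print(X.shape)  # (201, 5)
```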


Factor models: X_t = Λ F_t + E_t, where X_t is a p-dimensional process, F_t a k-dimensional latent/factor process, and E_t a noise process. A popular model in the economics/finance literature lets the factors evolve dynamically over time, e.g. F_t = Φ F_{t−1} + U_t

Given a multivariate time series X_t, identify structural breaks, i.e. points in time at which the structure of the model changes; e.g. X_t = A_1 X_{t−1} I(t ≤ τ) + A_2 X_{t−1} I(t > τ) + E_t, for some τ ∈ [0, T]. There is an online version of the problem for streaming data

Estimate covariance matrix of temporally dependent data


Application areas

Macroeconomics/Finance

Functional Genomics

Neuroscience

Control of large networks


Application areas: Economics

testing the relationship between money and income (Sims, 1972, 1980)

understanding the stock price-volume relation (Hiemstra et al., 1994)

estimating the dynamic effect of government spending and taxes on output (Blanchard and Jones, 2002)

identifying and measuring the effects of monetary policy innovations on macroeconomic variables (Bernanke et al., 2005)


Forecasting models in Economics

[Figure: monthly U.S. macroeconomic series, Feb 1960 to Aug 1974: Employment, Federal Funds Rate, Consumer Price Index]


Application areas: Functional Genomics

Identify regulatory mechanisms from time course data (panel data structure)

HeLa gene expression regulatory network [From: Fujita et al., 2007]


Application Areas: Neuroscience

Identify brain connectivity regions


Need for high-dimensional models

Economics: forecasting with many predictors (De Mol et al., 2008) or understanding structural relationships (Christiano et al., 1999)

Finance: build large-scale systemic risk models

Functional Genomics: reconstruct gene regulatory networks based on limited experimental data

Neuroscience: build detailed connectivity maps from temporal data exhibiting multiple structural changes

Network control: for large sparse systems (Liu, Slotine, Barabasi, 2011)


Key issues:

Nature of the data measurements (numerical, count, binary) (see Raskutti et al., 2016, for models for count data)

Capture the correct dynamics (see Chen and Shojaie, 2016, for models for self-exciting processes)

How does the temporal dependence impact estimation and prediction accuracy?


Illustration of estimation accuracy


Fig. 1. Estimation error of lasso in stochastic regression. Top panel: Example 1, VAR(1) process of predictors with cross-sectional dependence. Bottom panel: Example 2, VAR(2) process of predictors with no cross-sectional dependence.

affect the convergence rates of lasso estimates in a more intricate manner, not completely captured by ρ(A). Further, several authors [Loh and Wainwright (2012), Negahban and Wainwright (2011), Han and Liu (2013)] conducted nonasymptotic analyses of high-dimensional VAR(1) models, assuming ‖A‖ < 1. In Appendix E (supplementary material [Basu and Michailidis (2015)]) (see Figure 1 and Lemma E.2), we show that this assumption is restrictive and is violated by many stable VAR(1) models. More importantly, such an assumption does not generalize beyond VAR(1).

Example 1. We generate data from the stochastic regression model (1.1) with p = 200 predictors and i.i.d. errors ε_t. The process of predictors comes from a Gaussian VAR(1) model X_t = A X_{t−1} + ξ_t, where A is an upper-triangular matrix with α = 0.2 on the diagonal and γ on the two upper off-diagonal bands. We generate processes with different levels of cross-correlation among the predictors by changing γ and plot the average estimation error of lasso (over multiple iterates) against different sample sizes n in Figure 1.

The spectral radius is common (α = 0.2) across all models. Consistently with the classical low-dimensional asymptotics, the lasso errors for different processes seem to converge as n goes to infinity. However, for small to moderate n, as is common in high-dimensional regimes, lasso errors are …


Modeling Framework

Vector Autoregression

Canonical model for understanding lead-lag cross-dependencies

Successful for forecasting purposes and for intervention analysis (impulse response)

Exhibits a number of technical challenges in high dimensions


The VAR Model

p-dimensional, discrete-time, stationary process X^t = (X^t_1, …, X^t_p)′

X^t = A_1 X^{t−1} + … + A_d X^{t−d} + ε^t,   ε^t i.i.d. ∼ N(0, Σ_ε)   (1)

A_1, …, A_d: p × p transition matrices (solid, directed edges)

Σ_ε^{−1}: contemporaneous dependence (dotted, undirected edges)

Stability: A(z) := I_p − Σ_{t=1}^d A_t z^t is nonsingular for all z ∈ C with |z| ≤ 1
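The stability condition can be checked numerically: a VAR(d) is stable iff the spectral radius of its companion matrix is below one, which is equivalent to A(z) being nonsingular on the closed unit disc. A minimal sketch (helper names and example matrices are illustrative, not from the slides):

```python
import numpy as np

def companion(A_list):
    """Stack VAR(d) transition matrices A_1,...,A_d into the dp x dp companion matrix."""
    d = len(A_list)
    p = A_list[0].shape[0]
    top = np.hstack(A_list)              # first block row: [A_1 ... A_d]
    bottom = np.eye((d - 1) * p, d * p)  # [I 0]: shifts the lagged states down
    return np.vstack([top, bottom])

def is_stable(A_list):
    """VAR is stable iff the companion matrix has spectral radius < 1,
    equivalently det(I - sum_t A_t z^t) != 0 for all |z| <= 1."""
    return np.max(np.abs(np.linalg.eigvals(companion(A_list)))) < 1

A1 = np.array([[0.5, 0.2], [0.0, 0.4]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
print(is_stable([A1, A2]))  # True
```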


Detour: VARs and Granger Causality

Concept introduced by Granger (1969)

A time series X is said to Granger-cause Y if it can be shown, usually through a series of F-tests on lagged values of X (and with lagged values of Y also known), that those X values provide statistically significant information about future values of Y.

In the context of a high-dimensional VAR model, X^{T−t}_j is Granger-causal for X^T_i if (A_t)_{i,j} ≠ 0.

Granger causality does not imply true causality; it is built on correlations.

It is also related to estimating a Directed Acyclic Graph (DAG) with (d + 1) × p variables, with a known ordering of the variables.


Estimating high-dimensional VARs through regression

data: X^0, X^1, …, X^T (one replicate, observed at T + 1 time points)

Construct the autoregression Y = X B* + E (rows stacked, separated by ';'):

Y = [(X^T)′ ; (X^{T−1})′ ; … ; (X^d)′]   (N × p)

X = [(X^{T−1})′ (X^{T−2})′ … (X^{T−d})′ ;
     (X^{T−2})′ (X^{T−3})′ … (X^{T−1−d})′ ;
     … ;
     (X^{d−1})′ (X^{d−2})′ … (X^0)′]   (N × dp)

B* = [A′_1 ; … ; A′_d]   (dp × p),   E = [(ε^T)′ ; (ε^{T−1})′ ; … ; (ε^d)′]   (N × p)

Vectorizing:

vec(Y) = vec(X B*) + vec(E) = (I ⊗ X) vec(B*) + vec(E)

Y (Np × 1) = Z (Np × q) β* (q × 1) + vec(E) (Np × 1),   vec(E) ∼ N(0, Σ_ε ⊗ I)

N = T − d + 1,   q = dp²

Key assumption: the A_t are sparse, Σ_{t=1}^d ‖A_t‖_0 ≤ k
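Assuming the single realization is stored as an array whose row t holds (X^t)′, the stacked response Y and lagged design X can be assembled as below (the function name and example values are illustrative; the noiseless check verifies the stacking against Y = X B*):

```python
import numpy as np

def build_autoregression(X_series, d):
    """Stack one realization X^0,...,X^T (rows of X_series) into the
    response Y (N x p) and lagged design X (N x dp), N = T - d + 1.
    Row i of Y is (X^{T-i})'; the lag-t block of row i is (X^{T-t-i})'."""
    T = X_series.shape[0] - 1
    Y = X_series[d:][::-1]
    X = np.hstack([X_series[d - t: T + 1 - t][::-1] for t in range(1, d + 1)])
    return Y, X

# Noiseless VAR(2) check: with E = 0, the identity Y = X B* holds exactly
p, d, T = 3, 2, 12
A1, A2 = 0.5 * np.eye(p), 0.2 * np.eye(p)
Xs = np.zeros((T + 1, p))
Xs[0], Xs[1] = 1.0, 0.5
for t in range(2, T + 1):
    Xs[t] = A1 @ Xs[t - 1] + A2 @ Xs[t - 2]
Y, X = build_autoregression(Xs, d)
B = np.vstack([A1.T, A2.T])            # B* = [A_1' ; A_2']
assert np.allclose(Y, X @ B)
```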


Estimation Methods

ℓ1-penalized least squares (ℓ1-LS):

argmin_{β ∈ R^q} (1/N) ‖Y − Zβ‖² + λ_N ‖β‖₁

ℓ1-penalized log-likelihood (ℓ1-LL) (Davis et al., 2012):

argmin_{β ∈ R^q} (1/N) (Y − Zβ)′ (Σ_ε^{−1} ⊗ I) (Y − Zβ) + λ_N ‖β‖₁
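Because the squared-error term separates across the p columns of Y, the ℓ1-LS problem can be solved as p ordinary lasso regressions. A sketch using scikit-learn's Lasso (scikit-learn minimizes (1/2N)‖y − Xw‖² + α‖w‖₁, so α plays the role of λ_N/2; the data-generating choices below are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_ls_var(Y, X, lam):
    """l1-penalized least squares for B*: p separate lasso regressions,
    one per column of Y. sklearn's alpha corresponds to lam / 2 here."""
    p = Y.shape[1]
    B = np.zeros((X.shape[1], p))
    for j in range(p):
        fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, Y[:, j])
        B[:, j] = fit.coef_
    return B  # stacked estimate of [A_1' ; ... ; A_d']

# Hypothetical small example: sparse VAR(1), so d = 1 and B* = A'
rng = np.random.default_rng(0)
p, T = 4, 400
A = 0.5 * np.eye(p)
Xs = np.zeros((T + 1, p))
for t in range(1, T + 1):
    Xs[t] = Xs[t - 1] @ A.T + rng.standard_normal(p)
Y, X = Xs[1:], Xs[:-1]                 # lag-1 design
B_hat = l1_ls_var(Y, X, lam=0.1)
```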


ℓ1-LL Algorithm

The objective function is jointly non-convex, but convex w.r.t. the B's and Σ_ε^{−1}

The algorithm converges to a stationary point near the truth with high probability under high-dimensional scaling, provided it is initialized at a good point (details in Lin et al., 2016)


Theoretical Considerations

Detour: Probabilistic Consistency of Lasso

For regression models, the quality of the estimates of the regression parameters relies crucially on two regularity conditions:

1. Restricted Eigenvalue (RE): the null space of the normalized design matrix X/√N avoids the cone C(S, 3) := {v ∈ R^p : ‖v_{S^c}‖₁ ≤ 3 ‖v_S‖₁}:

α_RE := min_{v ∈ R^p, ‖v‖ ≤ 1, v ∈ C(S,3)} (1/N) ‖Xv‖² > 0

2. Deviation condition: ‖X′E/N‖_max ≤ Q(X, σ) √(log p / N)

Under the above conditions, with high probability:

Estimation error: ‖β̂ − β*‖₂ ≤ (Q(X, σ) / α_RE) √(k log p / N)


Lasso Regression for Time Series Data

It is unknown whether the above conditions are satisfied for high-dimensional time series data

Verifying RE-type assumptions for a fixed design is NP-hard

For a random design matrix X, existing results only provide guarantees when the samples are independent

Even for a stationary process, the data exhibit complicated dependence patterns:
- the rows of X are dependent
- the columns of X are dependent
- the error term E and the design matrix X are dependent


Vector Autoregression

Random design matrix X, correlated with the error matrix E; recall the stacked form

vec(Y) = vec(X B*) + vec(E) = (I ⊗ X) vec(B*) + vec(E)

Y (Np × 1) = Z (Np × q) β* (q × 1) + vec(E) (Np × 1),   vec(E) ∼ N(0, Σ_ε ⊗ I)

N = T − d + 1,   q = dp²

Question: How often does RE hold? How small is α_RE? How does the cross-correlation affect convergence rates?


Quantifying Dependence in high-dimensional VAR: Existing approaches

One can try to proceed analogously to regression with i.i.d. data; e.g. Negahban and Wainwright, 2011: for VAR(1) models, assume ‖A₁‖ < 1, where ‖A‖ := √(Λ_max(A′A))

For a univariate autoregression X^t = ρ X^{t−1} + ε^t, this reduces to |ρ| < 1, which is equivalent to the stability assumption

It turns out this is a very restrictive assumption for most realistic VAR models


Quantifying Dependence via the Spectral Density

Spectral density function of a covariance-stationary process X^t:

f_X(θ) = (1/2π) Σ_{l=−∞}^{∞} Γ_X(l) e^{−ilθ},   θ ∈ [−π, π]

Γ_X(l) = E[X^t (X^{t+l})′], the autocovariance matrix of order l

If the VAR process is stable, the spectral density has a closed form (cf. equation (9.4.23), Priestley (1981)):

f_X(θ) = (1/2π) (A(e^{−iθ}))^{−1} Σ_ε (A*(e^{−iθ}))^{−1}

The two sources of dependence factorize in the frequency domain
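For a VAR(1), the closed form above is straightforward to evaluate numerically (a minimal sketch; the example transition matrix is an arbitrary stable choice, not from the slides):

```python
import numpy as np

def var1_spectral_density(A, Sigma_eps, theta):
    """Closed-form spectral density of a stable VAR(1):
    f(theta) = (1/2pi) A(e^{-i theta})^{-1} Sigma_eps A*(e^{-i theta})^{-1},
    where A(z) = I - A z and * denotes the conjugate transpose."""
    p = A.shape[0]
    Ainv = np.linalg.inv(np.eye(p) - A * np.exp(-1j * theta))
    return (Ainv @ Sigma_eps @ Ainv.conj().T) / (2 * np.pi)

A = np.array([[0.5, 0.2], [0.0, 0.4]])
f0 = var1_spectral_density(A, np.eye(2), 0.0)
# f(theta) is Hermitian positive semi-definite at every frequency
assert np.allclose(f0, f0.conj().T)
```

For a univariate AR(1) with coefficient ρ = 0.5 this recovers the textbook value f(0) = (1/2π)(1 − ρ)^{−2}.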


For univariate processes, the "peak" of the spectral density measures the stability of the process (sharper peak = less stable).

[Figure: autocovariance Γ(h) and spectral density f(θ) of AR(1) processes with ρ = 0.1, 0.5, 0.7]

For multivariate processes, a similar role is played by the maximum eigenvalue of the (matrix-valued) spectral density.


For a covariance-stationary process X^t with continuous spectral density f_X(θ), the maximum eigenvalue of the spectral density captures its stability:

M(f_X) = max_{θ ∈ [−π,π]} Λ_max(f_X(θ))

The minimum eigenvalue of the spectral density captures the dependence among its components:

m(f_X) = min_{θ ∈ [−π,π]} Λ_min(f_X(θ))

For stable VAR(1) processes, M(f_X) scales with (1 − ρ(A₁))^{−2}, where ρ(A₁) is the spectral radius of A₁

m(f_X) scales with the capacity (maximum incoming + outgoing effect at a node) of the underlying graph

We can similarly measure the stability of subprocesses:

M(f_X, k) := max_{J ⊂ {1,…,p}, |J| = k} M(f_{X(J)}),   M(f_X, 1) ≤ M(f_X, 2) ≤ … ≤ M(f_X, p) = M(f_X)

This allows us to derive concentration inequalities for dependent random variables
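A grid search over θ gives usable approximations of M(f_X) and m(f_X). For a univariate AR(1) the maximum (1/2π)(1 − ρ)^{−2} is attained at θ = 0 and the minimum (1/2π)(1 + ρ)^{−2} at θ = ±π, which provides a sanity check (a sketch with illustrative inputs, restricted to VAR(1) for brevity):

```python
import numpy as np

def spectral_extremes(A, Sigma_eps, n_grid=401):
    """Grid approximations of M(f_X) and m(f_X) for a stable VAR(1):
    extreme eigenvalues of f_X(theta) over theta in [-pi, pi]."""
    p = A.shape[0]
    Ms, ms = [], []
    for theta in np.linspace(-np.pi, np.pi, n_grid):  # odd grid includes 0
        Ainv = np.linalg.inv(np.eye(p) - A * np.exp(-1j * theta))
        f = (Ainv @ Sigma_eps @ Ainv.conj().T) / (2 * np.pi)
        w = np.linalg.eigvalsh(f)          # Hermitian => real eigenvalues
        Ms.append(w[-1])
        ms.append(w[0])
    return max(Ms), min(ms)

# Univariate check: rho = 0.7, M = (1/2pi)(1 - 0.7)^(-2), m = (1/2pi)(1 + 0.7)^(-2)
M, m = spectral_extremes(np.array([[0.7]]), np.eye(1))
```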


Consistency of sparse VAR estimates

It is established (see Basu and Michailidis, 2015) that

Σ_{h=1}^d ‖Â_h − A_h‖ ≤ φ(A_t, Σ_ε) √(k log(dp²) / N)

Consistency in high dimensions: even if d, p = O(N²), k log(dp²)/N → 0 as long as k = o(N/log N)

The error has two components:
1. φ(A_t, Σ_ε): large ⇔ M(f_X) large, m(f_X) small
2. √(k log(dp²)/N): the estimation error for independent data

The estimation error is the same as for i.i.d. data, modulo a price for the temporal dependence


Recap of the Main Theoretical Results

Assuming the RE and deviation conditions, we can establish the consistency of sparse estimates of high-dimensional VAR models

For stable VAR models, the RE and deviation conditions hold with high probability

The convergence rate has two components: (i) a component governed by the structural parameters of the problem, identical to the i.i.d. case, and (ii) a component governed by the temporal dependence


Beyond VAR and Sparsity: High-Dimensional Models for Time Series Data

The concentration bounds obtained can be used to prove estimation consistency for other regularized methods with high-dimensional Gaussian time series

Regression with lasso; non-convex penalties (SCAD, MCP)

Generalized linear models (regression with non-continuous outcome variables) [Raskutti et al., 2016]

Sparse covariance estimation with time series data

Regression / VAR with group lasso [Basu et al., 2015]

Low-rank and low-rank + sparse VAR [Basu, 2014]

Tensor regression with dependent data [Raskutti and Yuan, 2015]

Time series with local dependence [Schweinberger et al., 2015]

VAR models with grouped structure on the transition matrices [Mattesson et al., 2015]

The results have a common theme:

estimation error for dependent data ≲ (measure of narrowness of the spectrum) × (estimation error for i.i.d. data)


Segmentation problems

Models with Structural Breaks

Increasing interest in using time series models and/or graphical models as network models derived from high-dimensional data

Numerous applications both for the offline and online versions


A Canonical Statistical Problem: Change Point Detection

Simplest Setup:

Random vector observed over a time interval 1, …, T (offline version):

X_t = (X_{1t}, X_{2t}, …, X_{pt}) ∼ N(0, Σ₁), t ≤ τ
X_t = (X_{1t}, X_{2t}, …, X_{pt}) ∼ N(0, Σ₂), t > τ

Objectives:
1. Estimate the change point τ
2. Estimate the Gaussian graphical models Ω₁ ≡ Σ₁^{−1}, Ω₂ ≡ Σ₂^{−1}

The i.i.d. assumption is simplifying, but can easily be mitigated through neighborhood selection techniques leveraging lasso regressions with temporally dependent errors (Basu and Michailidis, 2015)


Some Background on Low Dimensional Change Point Problems

Assume a stump model:

y_i = α I(i ≤ τ) + β I(i > τ) + ε_i,   i = 1, …, T,

where ε_i ∼ N(0, σ²) and I(·) denotes the indicator function

Then, under a condition on the signal-to-noise ratio, |α − β|/σ ≥ C > 0, one can establish the following:

1. α, β can be estimated at rate √T (the usual parametric rate)
2. |τ̂ − τ|/T = O(1/T)
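The least-squares change-point estimate for this stump model simply scans all splits and keeps the one minimizing the residual sum of squares (a sketch; the jump size, noise level, and sample size below are illustrative choices, not from the slides):

```python
import numpy as np

def stump_changepoint(y):
    """Least-squares change-point estimate for y_i = a*I(i<=tau) + b*I(i>tau) + noise:
    return the split tau minimizing the total residual sum of squares."""
    T = len(y)
    best_tau, best_rss = None, np.inf
    for tau in range(1, T):               # left segment holds observations 1..tau
        left, right = y[:tau], y[tau:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_tau = rss, tau
    return best_tau

rng = np.random.default_rng(1)
T, tau_true = 200, 120
y = np.where(np.arange(1, T + 1) <= tau_true, 0.0, 2.0) + 0.5 * rng.standard_normal(T)
tau_hat = stump_changepoint(y)
```

With this signal-to-noise ratio (|α − β|/σ = 4) the estimate lands within a handful of points of the true τ.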


Naive Algorithm

1. For each t = 1 + d, …, T − d, for some d > 0, calculate the joint Gaussian likelihood assuming that τ_candidate = t, which is given by

L(Ω₁; t = 1, …, τ_candidate) + L(Ω₂; t = τ_candidate + 1, …, T)

2. Set

τ̂ = argmax_{τ_candidate} L(Ω₁, Ω₂, τ_candidate)

Technical challenges: note that at any candidate value other than the true τ, one term in the likelihood is misspecified. Hence, a much more careful handling of the technical issues is needed to establish the results.
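This likelihood scan can be sketched as follows, using the unpenalized precision-matrix MLE on each side for simplicity; in the high-dimensional setting one would substitute a regularized (e.g. graphical-lasso) estimate. All example values are illustrative, not from the slides:

```python
import numpy as np

def gaussian_loglik(X, Omega):
    """Mean-zero Gaussian log-likelihood (up to an additive constant)
    of the rows of X under precision matrix Omega."""
    n = X.shape[0]
    S = X.T @ X / n
    sign, logdet = np.linalg.slogdet(Omega)
    return 0.5 * n * (logdet - np.trace(S @ Omega))

def naive_changepoint(X, d=20):
    """Scan candidate change points, fitting a precision matrix on each side
    (unpenalized MLE here) and maximizing the joint Gaussian likelihood."""
    T = X.shape[0]
    best_t, best_ll = None, -np.inf
    for t in range(d, T - d):             # margin d keeps both segments estimable
        O1 = np.linalg.inv(X[:t].T @ X[:t] / t)
        O2 = np.linalg.inv(X[t:].T @ X[t:] / (T - t))
        ll = gaussian_loglik(X[:t], O1) + gaussian_loglik(X[t:], O2)
        if ll > best_ll:
            best_ll, best_t = ll, t
    return best_t

# Covariance changes at tau: identity before, strong correlation after
rng = np.random.default_rng(2)
p, T, tau = 3, 300, 150
S2 = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
X = np.vstack([rng.standard_normal((tau, p)),
               rng.multivariate_normal(np.zeros(p), S2, size=T - tau)])
tau_hat = naive_changepoint(X)
```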


Main Results

Under a Restricted Eigenvalue condition, it can be established that (Roy, Atchade and Michailidis, 2016):

1. ‖Ω̂_k − Ω_k‖_F = O(√(s log p / T))

2. |τ̂ − τ| / T = O(log(pT) / T)


Extension to Multiple Change Points

In a recent paper, Leonardi and Buhlmann (arXiv, 2016) look at the same problem, but allow multiple change points

To identify the change points, they propose a dynamic programming algorithm, as well as a computationally faster binary search approximation

Further, they examine the estimation consistency of τ̂_k, k = 1, …, K and of the corresponding Ω_k's in a slow regime, where change points are sparse and far apart, and in a fast regime, where the number of change points grows as a function of T

The rates for the Ω_k's are the usual ones, but even in the slow regime the obtained rate for τ̂_k is worse than the one previously obtained

A related problem, where each node can experience multiple change points, was studied in Kolar and Xing (Electronic J. of Statistics, 2012) and in Soh and Chandrasekaran (arXiv, 2014)


Concluding Remarks

Temporal data are present in a diverse set of applied areas

Time series models pose a number of subtle technical challenges in high dimensions

A number of open questions:
1. Going beyond Gaussian data (heavy-tailed distributions, mixed types of data)
2. Incorporation of prior information / Bayesian modeling
3. Inference framework for assessing both parameter and model significance
4. Better models for capturing intricate temporal dynamics
5. Intervention / control problems
