Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection



Michael Lie

Prof. Suzuki Taiji Lab., Faculty of Science, Department of Information Science, Tokyo Institute of Technology, Japan

February 12, 2015


Agenda

To propose a new statistical model: Sparse Multiperiodic Covariance Selection (M-CovSel)

To propose an optimization method for this model based on ADMM




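Sparse Covariance Selection

Let $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} N_p(\mu, \Sigma)$, and let $S$ be the empirical covariance matrix. The sparse estimate of the inverse covariance is

$$\operatorname*{argmin}_{X \succ 0} \; -\ln\det X + \operatorname{trace}(SX) + \lambda \|X\|_1$$

Original idea: Dempster (1972)

Application to sparse and high-dimensional matrices: Meinshausen and Bühlmann (2006)

Problem formulation: Banerjee, El Ghaoui and d'Aspremont (2008)

Solution through the graphical lasso: Friedman, Hastie and Tibshirani (2008)

Solution by ADMM: Boyd et al. (2011)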


Application: Markowitz's Portfolio Selection

Portfolio Selection (Markowitz, 1952)

$$\min_{w} \; \sigma_{p,w}^2 = w^\top S w \quad \text{s.t.} \quad w^\top \mathbf{1} = 1 \qquad \therefore \quad w = \frac{S^{-1}\mathbf{1}}{\mathbf{1}^\top S^{-1}\mathbf{1}}.$$

Here, the inverse of the empirical covariance, $S^{-1}$, is needed!

Existing covariance selection works at a single fixed time ⇒ covariance selection over time series is needed!
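The closed-form weights above can be checked numerically. The following is a minimal sketch, assuming an estimate of the inverse covariance is already available; the function name `min_variance_weights` is illustrative and not part of the original slides.

```python
import numpy as np

def min_variance_weights(precision):
    """Minimum-variance weights w = X 1 / (1' X 1) for a precision matrix X."""
    ones = np.ones(precision.shape[0])
    raw = precision @ ones          # X 1
    return raw / (ones @ raw)       # normalise so the weights sum to one

# Toy example: two uncorrelated assets with variances 1 and 4.
S = np.diag([1.0, 4.0])
w = min_variance_weights(np.linalg.inv(S))
```

In this toy case the lower-variance asset receives the larger weight, which is the behaviour the formula is meant to produce.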


Intuition

Figure: Existing Model

By estimating X, we can construct the portfolio.


Figure: Our Model

$$S_{ij} := \frac{1}{n} \sum_{k,l} (y_{k,i} - \mu_i)(y_{l,j} - \mu_j)^\top.$$


Problem Formulation

Consider a stationary time process whose multiperiodic inverse covariance matrix $X$ can be expressed as

$$X = \begin{pmatrix}
X_{11} & X_{12} & X_{13} & \cdots & X_{1,T} \\
X_{12}^\top & X_{22} & X_{23} & \cdots & X_{2,T} \\
X_{13}^\top & X_{23}^\top & X_{33} & \cdots & X_{3,T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_{1,T}^\top & X_{2,T}^\top & X_{3,T}^\top & \cdots & X_{T,T}
\end{pmatrix} \quad (Tp \text{ rows},\ Tp \text{ columns}).$$

Assumption: $X$ is stationary in time, i.e. $X_{i,i+h} = X_{j,j+h}$ for all $i, j$.
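Under the stationarity assumption, $X$ is block Toeplitz: every block depends only on the lag $h = j - i$. The sketch below builds such a matrix from one block per lag. The helper name `block_toeplitz` and the convention that `lag_blocks[h]` holds $X_{i,i+h}$ are assumptions made for illustration.

```python
import numpy as np

def block_toeplitz(lag_blocks):
    """Assemble a Tp x Tp matrix whose (i, j) block depends only on j - i.

    lag_blocks[h] is the p x p block X_{i,i+h}; blocks below the diagonal
    are the transposes of the matching blocks above it.
    """
    T = len(lag_blocks)
    p = lag_blocks[0].shape[0]
    X = np.zeros((T * p, T * p))
    for i in range(T):
        for j in range(T):
            h = j - i
            blk = lag_blocks[h] if h >= 0 else lag_blocks[-h].T
            X[i * p:(i + 1) * p, j * p:(j + 1) * p] = blk
    return X

lag0 = 2.0 * np.eye(2)                      # symmetric diagonal block
lag1 = np.array([[0.1, 0.2], [0.3, 0.4]])
lag2 = np.array([[0.0, 0.05], [0.0, 0.0]])
X = block_toeplitz([lag0, lag1, lag2])
```

With a symmetric lag-0 block the result is symmetric, and only $T$ distinct blocks need to be estimated instead of $T(T+1)/2$.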


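Sparse Multiperiodic Covariance Selection (M-CovSel):

$$\operatorname*{argmin}_{X \succ 0} f(X), \qquad f(X) := -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right) + \lambda_1 \sum_{i,j} \|X_{ij}\|_1 + \lambda_2 \sum_{i,j} \sum_{k>i,\, l>j} \|X_{ij} - X_{kl}\|_F^2$$

subject to $X_{i,i+h} = X_{j,j+h}$ for all $i, j$.

$\ell_1$: $\|w\|_1 = \sum_i |w_i|$, $\qquad \ell_2$: $\|w\|_F^2 = \sum_i |w_i|^2$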


We separate our model into two parts:

$$f(X) \equiv g(X) + h(X)$$

$$g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right),$$

$$h(X) = \lambda_1 \sum_{i,j} \|X_{ij}\|_1 + \lambda_2 \sum_{i,j} \sum_{k>i,\, l>j} \|X_{ij} - X_{kl}\|_F^2.$$

$g(X)$: twice differentiable and strictly convex

$h(X)$: convex but non-differentiable
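To make the split concrete, the two parts can be evaluated separately. This is a minimal sketch that follows the index ranges of the formulas literally; the function names `g_smooth` and `h_penalty` and the helper `get_block` are illustrative, and the quadruple loop is only practical for small $T$.

```python
import numpy as np

def get_block(M, i, j, p):
    """Return the p x p block (i, j) of M (0-based block indices)."""
    return M[i * p:(i + 1) * p, j * p:(j + 1) * p]

def g_smooth(X, S):
    """-ln det X + sum_ij trace(S_ij' X_ij); the trace sum equals sum(S * X)."""
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        return np.inf               # outside the positive definite cone
    return -logdet + np.sum(S * X)

def h_penalty(X, p, lam1, lam2):
    """l1 penalty on all blocks plus squared Frobenius block differences."""
    T = X.shape[0] // p
    l1 = np.abs(X).sum()
    fused = 0.0
    for i in range(T):
        for j in range(T):
            for k in range(i + 1, T):
                for l in range(j + 1, T):
                    diff = get_block(X, i, j, p) - get_block(X, k, l, p)
                    fused += np.sum(diff ** 2)
    return lam1 * l1 + lam2 * fused

X = np.diag([1.0, 1.0, 2.0, 2.0])   # T = 2 periods, p = 2 variables
S = np.eye(4)
g_val = g_smooth(X, S)
h_val = h_penalty(X, p=2, lam1=0.5, lam2=0.25)
```

Here the only block pair in the difference penalty is $(X_{11}, X_{22})$, so the second term of $h$ is $\lambda_2 \|I - 2I\|_F^2 = 2\lambda_2$.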


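Auxiliary Variables

$$X = \begin{pmatrix}
X_{11} & X_{12} & X_{13} & \cdots & X_{1,T} \\
X_{12}^\top & X_{22} & X_{23} & \cdots & X_{2,T} \\
X_{13}^\top & X_{23}^\top & X_{33} & \cdots & X_{3,T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_{1,T}^\top & X_{2,T}^\top & X_{3,T}^\top & \cdots & X_{T,T}
\end{pmatrix}
\;\xrightarrow{\ \mathrm{bvec}\ }\;
X' = \begin{pmatrix}
X_{11} \\ \vdots \\ X_{1,T} \\ X_{22} \\ \vdots \\ X_{2,T} \\ \vdots \\ X_{T,T}
\end{pmatrix} \quad (\mathrm{numX} \times p \text{ rows},\ p \text{ columns}).$$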
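The block vectorization can be sketched as follows, assuming that bvec stacks the upper-triangular blocks row by row as shown in the display above. The function name `bvec` mirrors the slide; its signature is an assumption.

```python
import numpy as np

def bvec(X, p):
    """Stack the upper-triangular p x p blocks of X row by row."""
    T = X.shape[0] // p
    blocks = [X[i * p:(i + 1) * p, j * p:(j + 1) * p]
              for i in range(T) for j in range(i, T)]
    return np.vstack(blocks)

X = np.arange(16.0).reshape(4, 4)   # T = 2, p = 2, so 3 stacked blocks
X_prime = bvec(X, p=2)
```

For $T$ periods this yields $T(T+1)/2$ stacked blocks, each with $p$ rows and $p$ columns.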


H: stationary time matrix

All D: time-difference matrix

Simplified D: time-difference matrix


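minimize $g(X) + h(Z)$

subject to

$$X' = Z_1, \qquad DX' = Z_2, \qquad HX' = 0 \quad\Longleftrightarrow\quad \mathbf{X} = Z,$$

where

$$g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right), \qquad h(Z) = \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2,$$

$$\mathbf{X} = \begin{pmatrix} X' \\ DX' \\ HX' \end{pmatrix}, \qquad Z = \begin{pmatrix} Z_1 \\ Z_2 \\ 0 \end{pmatrix}.$$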


Alternating Direction Method of Multipliers (ADMM)

Solving Through ADMM

Algorithm 1 Overview of ADMM
1: for $k = 0, 1, \dots$ do
2:   X-update:
3:   Compute $W^{(0)} = (X^{(0)})^{-1}$.
4:   for $t = 1, 2, \dots$ do
5:     Compute the steepest-descent direction $d^{(t)} = -\nabla G(X^{(t)})$.
6:     Use Armijo-rule step-size selection to get $\alpha$ such that $X^{(t+1)} = X^{(t)} + \alpha d^{(t)}$ is positive definite and the objective value sufficiently decreases.
7:     Update $X$.
8:   end for
9:   Z-update:
10:  Update $Z_1$: $Z_1^{(k+1)} = S_{\lambda_1/\rho}\left((X')^{(k+1)} + Y_1^{(k)}/\rho\right)$
11:  Update $Z_2$: $Z_2^{(k+1)} = \dfrac{\rho D (X')^{(k+1)} + Y_2^{(k)}}{2\lambda_2 + \rho}$
12:  Y-update: $Y^{(k+1)} = Y^{(k)} + \rho\left(\mathbf{X}^{(k+1)} - Z^{(k+1)}\right)$
13: end for



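minimize $g(X) + h(Z)$

subject to

$$X' = Z_1, \qquad DX' = Z_2, \qquad HX' = 0 \quad\Longleftrightarrow\quad \mathbf{X} = Z,$$

with the stacked variables $\mathbf{X} = (X', DX', HX')$ and $Z = (Z_1, Z_2, 0)$.

Its augmented Lagrangian is

$$L_\rho(\mathbf{X}, Z, Y) = g(X) + h(Z) + \frac{\rho}{2} \left\| \mathbf{X} - Z + \frac{Y}{\rho} \right\|_F^2,$$

$$g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right),$$

$$h(Z) = \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2.$$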



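1. X-update:

$$X^{(k+1)} := \operatorname*{argmin}_{X} \left( -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right) + \frac{\rho}{2} \left\| \mathbf{X} - Z^{(k)} + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right),$$

2. Z-update:

$$Z^{(k+1)} := \operatorname*{argmin}_{Z} \left( \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| \mathbf{X}^{(k+1)} - Z + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right),$$

3. Y-update:

$$Y^{(k+1)} := Y^{(k)} + \rho \left( \mathbf{X}^{(k+1)} - Z^{(k+1)} \right).$$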
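The three-step loop is easiest to see on the simpler single-period problem $-\ln\det X + \operatorname{trace}(SX) + \lambda\|X\|_1$ from the Sparse Covariance Selection slide. The sketch below is not the M-CovSel algorithm: it omits $D$ and $H$, and it uses the closed-form eigendecomposition X-update described by Boyd et al. (2011) instead of gradient descent. The function names, the starting point and the fixed iteration count are illustrative choices.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding operator S_kappa."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def glasso_admm(S, lam, rho=1.0, n_iter=200):
    """ADMM for single-period sparse covariance selection (illustrative)."""
    p = S.shape[0]
    Z = np.eye(p)
    Y = np.zeros((p, p))
    for _ in range(n_iter):
        # X-update: solve rho*X - inv(X) = rho*Z - Y - S via eigenvalues.
        eigval, Q = np.linalg.eigh(rho * Z - Y - S)
        x = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T
        # Z-update: soft-thresholding of X + Y / rho.
        Z = soft_threshold(X + Y / rho, lam / rho)
        # Y-update: dual ascent on the constraint X = Z.
        Y = Y + rho * (X - Z)
    return X, Z

X_hat, Z_hat = glasso_admm(np.eye(3), lam=0.1)
```

With $S = I$ the minimizer is known in closed form, $X = I/(1+\lambda)$, which gives a simple sanity check of the loop.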



X Update

The subproblem

$$X^{(k+1)} := \operatorname*{argmin}_{X} \left( -\ln\det X + \sum_{i,j} \operatorname{trace}\left(S_{ij}^\top X_{ij}\right) + \frac{\rho}{2} \left\| \mathbf{X} - Z^{(k)} + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right)$$

is solved by steepest gradient descent, as given in lines 2-8 of Algorithm 1.

Algorithm 2 X Update
1: Compute $W^{(0)} = (X^{(0)})^{-1}$.
2: for $t = 1, 2, \dots$ do
3:   Compute the steepest-descent direction $d^{(t)} = -\nabla G(X^{(t)})$.
4:   Use Armijo-rule step-size selection to get $\alpha$ such that $X^{(t+1)} = X^{(t)} + \alpha d^{(t)}$ is positive definite and the objective value sufficiently decreases.
5:   Update $X$.
6: end for



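Z Update

$$Z^{(k+1)} := \operatorname*{argmin}_{Z} \left( \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| \mathbf{X}^{(k+1)} - Z + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right).$$

This problem separates into two subproblems:

$$Z_1^{(k+1)} := \operatorname*{argmin}_{Z_1} \left( \lambda_1 \|Z_1\|_1 + \frac{\rho}{2} \left\| (X')^{(k+1)} - Z_1 + \frac{Y_1^{(k)}}{\rho} \right\|_F^2 \right)$$

$$Z_2^{(k+1)} := \operatorname*{argmin}_{Z_2} \left( \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| D(X')^{(k+1)} - Z_2 + \frac{Y_2^{(k)}}{\rho} \right\|_F^2 \right)$$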



Solution of Z Update

The solution of the $Z_1$ subproblem is simply the soft-thresholding function

$$Z_1^{(k+1)} = S_{\lambda_1/\rho}\left( (X')^{(k+1)} + \frac{Y_1^{(k)}}{\rho} \right)$$

and the solution of the $Z_2$ subproblem is

$$Z_2^{(k+1)} = \frac{\rho D (X')^{(k+1)} + Y_2^{(k)}}{2\lambda_2 + \rho}.$$
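Both closed forms are elementwise operations. The sketch below assumes the product $DX'$ has already been computed and is passed in as `DXp`, since the matrix $D$ itself is only shown as a figure; the function names are illustrative.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding operator S_kappa."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def z1_update(Xp, Y1, lam1, rho):
    """Z1 = S_{lam1/rho}(X' + Y1 / rho)."""
    return soft_threshold(Xp + Y1 / rho, lam1 / rho)

def z2_update(DXp, Y2, lam2, rho):
    """Z2 = (rho * D X' + Y2) / (2 * lam2 + rho)."""
    return (rho * DXp + Y2) / (2.0 * lam2 + rho)

rng = np.random.default_rng(0)
DXp = rng.standard_normal((4, 2))
Y2 = rng.standard_normal((4, 2))
lam2, rho = 0.3, 1.5
Z2 = z2_update(DXp, Y2, lam2, rho)
# Stationarity of the Z2 subproblem: 2*lam2*Z2 - rho*(DXp + Y2/rho - Z2) = 0.
residual = 2.0 * lam2 * Z2 - rho * (DXp + Y2 / rho - Z2)
Z1 = z1_update(np.array([[-2.0, -0.5], [0.5, 2.0]]), np.zeros((2, 2)), lam1=1.0, rho=1.0)
```

The residual check follows from setting the gradient of the $Z_2$ subproblem to zero, which is how the quotient $(\rho D X' + Y_2)/(2\lambda_2 + \rho)$ arises.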


Numerical Results

Execution environment:
Intel Core i7-4770 CPU @ 3.40GHz (8 CPUs)
8GB RAM
R ver. 3.3.65126.0
OS Windows 7 Professional 64 bit (6.1, build 7601)

Verifying:
Convergence speed
Sparsity of the estimates

using random data sets and real data.


Figure: All D and Simplified D.


Figure: Runtime for n = 10, λ1 = 0.01, λ2 = 0.01.

Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) Dual Residuals for n = 10, T = 5, λ1 = 0.01, λ2 = 0.01.

Figure: The sparsity pattern of estimates from the model with n = 10, T = 5, λ1 = 0.01, λ2 = 0.01.


Analysis on real data

Stock data of 50 randomly selected companies from NASDAQ

Period: 4 January 2011 to 31 December 2014

| Tick | Name | Sector |
|------|------|--------|
| PDCO | Patterson Companies, Inc. | Health Care |
| OMER | Omeros Corporation | Health Care |
| HEAR | Turtle Beach Corporation | Consumer Durables |
| QBAK | Qualstar Corporation | Technology |
| UTHR | United Therapeutics Corporation | Health Care |
| PLCE | The Children's Place Retail Stores, Inc. | Consumer Services |
| SUSQ | Susquehanna Bancshares, Inc. | Finance |
| IDCC | InterDigital, Inc. | Miscellaneous |
| ELON | Echelon Corporation | Technology |
| BGCP | BGC Partners, Inc. | Finance |
| MRGE | Merge Healthcare Incorporated. | Technology |
| TISA | Top Image Systems, Ltd. | Technology |
| IPXL | Impax Laboratories, Inc. | Health Care |
| ROVI | Rovi Corporation | Miscellaneous |
| IBCP | Independent Bank Corporation | Finance |
| BABY | Natus Medical Incorporated | Health Care |
| HFFC | HF Financial Corp. | Finance |
| ISLE | Isle of Capri Casinos, Inc. | Consumer Services |
| ITIC | Investors Title Company | Finance |
| SLGN | Silgan Holdings Inc. | Consumer Durables |
| ZIOP | ZIOPHARM Oncology Inc | Health Care |
| MXIM | Maxim Integrated Products, Inc. | Technology |
| NEPT | Neptune Technologies & Bioresources Inc | Health Care |
| UTMD | Utah Medical Products, Inc. | Health Care |

...


Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) Dual Residuals for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Figure: The sparsity pattern of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Figure: The covariance matrix plot of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Figure: Negative covariance values of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Figure: Negative covariance values of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).

Figure: The weak positivity of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Figure: The weak positivity of estimates from the model with T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).


Conclusion and Discussion

Conclusions:
The ADMM algorithm, with steepest gradient descent for the X-update, minimized our objective function f(X).
Computation time grows substantially as T increases.

Discussions:
Use the Newton direction instead of steepest gradient descent (cf. QUIC).
Use block coordinate descent as in BIG & QUIC.
Introduce a decay constant in D.


References

[De72] Dempster, A. P. (1972). Covariance Selection. Biometrics 28 157-175.

[MB06] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34 1436-1462.

[BG08] Banerjee, O., El Ghaoui, L. and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9 485-516.

[Ti08] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics 9 432-441.

[Ma52] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7 77-91.

[Ti96] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58 267-288.

[Bo11] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 1-122.

[Hs13] Hsieh, C. J., Sustik, M. A., Dhillon, I., Ravikumar, P. and Poldrack, R. (2013). BIG & QUIC: Sparse inverse covariance estimation for a million variables. In Advances in Neural Information Processing Systems 3165-3173.

[Bv11] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag, Berlin.

[WB12] Wahlberg, B., Boyd, S., Annergren, M. and Wang, Y. (2012). An ADMM algorithm for a class of total variation regularized estimation problems. arXiv:1203.1828.