Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Introduction Problem Setup Optimization Method Numerical Results Conclusion and Discussion References

Time-Series Analysis on MultiperiodicConditional Correlation by Sparse

Covariance Selection

Michael Lie1

1Prof. Suzuki Taiji Lab.,Faculty of Science,

Department of Information Science,Tokyo Institute of Technology, Japan

February 12, 2015

Agenda

To propose of the new statistical model:Sparse Multiperiodic Covariance Selection (M-CovSel)To propose of optimization method through ADMM

Covariance Selection

Sparse Covariance SelectionY1, · · · ,Yn ∼

i.i.d.Np(µ,Σ).

argminX�0

− ln det X + trace(SX ) + λ‖X‖1

Original idea: Dempster (1972)Application to Sparse and High-dimensional Matrices:Meinshausen and Bühlmann (2006)

Problem Formulation: Banerjee, Ghaoui and d’Aspremont(2008)Solution through graphical lasso model: Friedman, Hastieand Tibshirani (2008)Solution by ADMM method: Boyd (2011)

Application

Application: Markowitz’s Portfolio Selection

Portfolio Selection (Markowitz, 1952)

minwσ2

p,w = w>Sw s.t. w>1 = 1 ∴ w =S−11

1>S−11.

Here, the inverse of empirical covariance S−1 is needed!

The existing Covariance Selection: fixed time⇒ Covariance Selection analysis over time series is needed!

Intuition

Figure: Existing Model

By estimating X, we can construct the portfolio.

Intuition

Figure: Our Model

Sij :=1n

∑k ,l

(yk ,i − µi)(yl,j − µj)>,

Problem Formulation

Consider a stationary-time process such that the multiperiodicinverse covariance matrix X can be expressed as

X11 X12 X13 · · · X1,TX>12 X22 X23 · · · X2,TX>13 X>23 X33 · · · X3,T

......

.... . .

...X>1,T X>2,T X>3,T · · · XT ,T

︸︷︷︸

Tp columns

Assumption: X is stationary time-process, such thatXi,i+h = Xj,j+h for all i , j .

Problem Formulation

Sparse Multiperiodic Covariance Selection (M-CovSel):

argminX�0

f (X) := argminX�0

{− ln det X +

∑i,j

trace(

S>ij Xij

λ1∑i,j

∥∥Xij∥∥

1 + λ2∑i,j

∑k>i,l>j

∥∥Xij − Xkl∥∥2

}subject to Xi,i+h = Xj,j+h, ∀i , j .

`1 : ‖w‖1 =∑

|wi | `2 : ‖w‖2F =

|wi |2

Problem Formulation

We separate our model into two parts:

f (X) ≡ g(X) + h(X)

g(X) = − ln det X +∑i,j

trace(

S>ij Xij

h(X) = λ1∑i,j

∥∥Xij∥∥

1 + λ2∑i,j

∑k>i,l>j

∥∥Xij − Xkl∥∥2

g(X): twice differentiable and strictly convexh(X): convex but non-differentiable

Problem Formulation

Auxiliary Variables

X11 X12 X13 · · · X1,TX>12 X22 X23 · · · X2,TX>13 X>23 X33 · · · X3,T

......

.... . .

...X>1,T X>2,T X>3,T · · · XT ,T

bvec−→ X′ =

X11...

X1,TX22

...X2,T

...XT ,T

︸︷︷︸

numX×

Problem Formulation

H: stationary time matrix

Problem Formulation

All D: time-difference matrix

Problem Formulation

Simplified D: time-difference matrix

Problem Formulation

minimize g(X) + h(Z)

subject to

X′ = ZDX′ = ZHX′ = 0

⇐⇒ X = Z

whereg(X) = − ln det X +

∑i,j

trace(

S>ij Xij

h(Z) = λ1∑i,j

‖Z1‖1 + λ2∑i,j

‖Z2‖2F ,

Alternating Direction Method of Multiplier (ADMM)

Solving Through ADMM

Algorithm 1 Overview of ADMM1: for k = 0, 1, · · · do2: X-update:3: Compute W(0) = (X(0))−1.4: for t = 1, 2, · · · do5: Compute the direction using steepest gradient descent d = −∇G(X).6: Use an Armijo’s rule based step-size selection to get α such that

X(t+1) = X(t) + αd (t) is positive definite and the objective value suffi-ciently decreases.

7: Update X.8: end for9: Z-update:

10: Update Z1 : Z(k+1)1 = Sλ1/ρ((X

′)(k+1) + Y(k)

11: Update Z2:Z(k+1)

2 =ρD(X′)(k+1) + Y(k)

2λ2 + ρ

12: Y-update: Y(k+1) = Y(k) + ρ(

X(k+1) − Z(k+1))

13: end for

minimize g(X) + h(Z)

subject to

X = ZDX = ZHX = 0

⇐⇒ X = Z

Its augmented Lagrangian is

Lρ(X, Z,Y) = g(X) + h(Z) + (ρ/2)

∥∥∥∥X− Z +Yρ

∥∥∥∥2

g(X) = − ln det X +∑i,j

trace(

S>ij Xij

h(Z) = λ1∑i,j

‖Z1‖1 + λ2∑i,j

‖Z2‖2F .

1 X-update:

X(k+1) := argminX

(− ln det X +

∑i,j

trace(

S>ij Xij

∥∥∥∥∥X− Z(k) +Y(k)

∥∥∥∥∥2

2 Z-update:

Z(k+1) := argminZ

(λ1 ‖Z1‖1 + λ2 ‖Z2‖2F

∥∥∥∥∥X(k+1) − Z +Y(k)

∥∥∥∥∥2

3 Y-update:

Y(k+1) := Y(k) + ρ(

X(k+1) − Z(k+1)).

X Update

The solution of

X(k+1) := argminX

(− ln det X +

∑i,j

trace(

S>ij Xij

∥∥∥∥X− Z(k) +Y(k)

∥∥∥∥2

)is solved through steepest gradient descent and the algorithmis as given in Algorithm 1 of line 2-8.

Algorithm 2 X Update1: Compute W(0) = (X(0))−1.2: for t = 1, 2, · · · do3: Compute the direction using steepest gradient descent d = −∇G(X).4: Use an Armijo’s rule based step-size selection to get α such that

X(t+1) = X(t) + αd (t) is positive definite and the objective value suffi-ciently decreases.

5: Update X.6: end for

Z Update

Zk+1 := argminZ

(λ1‖Z1‖1 + λ2‖Z2‖2F

+ (ρ/2)

∥∥∥∥∥X(k+1) − Z +Y(k)

∥∥∥∥∥2

The equation above can be separated as two equations asbelow:

Z(k+1)1 := argmin

(λ1‖Z1‖1 + (ρ/2)‖(X′)(k+1) − Z1 + Yk

1/ρ‖2F)

Z(k+1)2 := argmin

(λ2‖Z2‖2F + (ρ/2)‖D(X′)(k+1) − Z2 + Yk

2/ρ‖2F)

Solution of Z Update

Z(k+1)1 := argmin

(λ1‖Z1‖1 + (ρ/2)‖(X′)(k+1) − Z1 + Yk

1/ρ‖2F)

Z(k+1)2 := argmin

(λ2‖Z2‖2F + (ρ/2)‖D(X′)(k+1) − Z2 + Yk

2/ρ‖2F)

The solution of first solution is simply the soft-thresholdingfunction of

Z(k+1)1 = Sλ1/ρ

((X′)(k+1) +

and the solution of second solution is

Z(k+1)2 =

ρD(X′)(k+1) + Y(k)

2λ2 + ρ.

Numerical Results

Execution environment:Intel Core i7-4770 CPU @ 3.40GHz (8 CPUs)8GB RAMR ver. 3.3.65126.0OS Windows 7 Professional 64 bit (6.1. build 7601)

Verifying:Convergence SpeedSparsity of the estimates

using random data sets and real data.

Numerical Results

Simplified D

Numerical Results

Figure: Runtime of n = 10, λ1 = 0.01, λ2 = 0.01.

Numerical Results

Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) DualResiduals of n = 10,T = 5, λ1 = 0.01, λ2 = 0.01.

Numerical Results

Figure: The sparsity pattern of estimates from the model ofn = 10,T = 5, λ1 = 0.01, λ2 = 0.01.

Numerical Results

Analysis on real dataStock data of 50 randomly selected companies from NASDAQPeriod: 4 January 2011 to 31 December 2014

Tick Name SectorPDCO Patterson Companies, Inc. Health CareOMER Omeros Corporation Health CareHEAR Turtle Beach Corporation Consumer DurablesQBAK Qualstar Corporation TechnologyUTHR United Therapeutics Corporation Health CarePLCE The Children&39;s Place Retail Stores, Inc. Consumer ServicesSUSQ Susquehanna Bancshares, Inc. FinanceIDCC InterDigital, Inc. MiscellaneousELON Echelon Corporation TechnologyBGCP BGC Partners, Inc. FinanceMRGE Merge Healthcare Incorporated. TechnologyTISA Top Image Systems, Ltd. TechnologyIPXL Impax Laboratories, Inc. Health CareROVI Rovi Corporation MiscellaneousIBCP Independent Bank Corporation FinanceBABY Natus Medical Incorporated Health CareHFFC HF Financial Corp. FinanceISLE Isle of Capri Casinos, Inc. Consumer ServicesITIC Investors Title Company FinanceSLGN Silgan Holdings Inc. Consumer DurablesZIOP ZIOPHARM Oncology Inc Health CareMXIM Maxim Integrated Products, Inc. TechnologyNEPT Neptune Technologies & Bioresources Inc Health CareUTMD Utah Medical Products, Inc. Health Care

Numerical Results

Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) DualResiduals of T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Numerical Results

Figure: The sparsity pattern of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Numerical Results

Figure: The covariance matrix plot of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Numerical Results

Figure: Negative covariance value of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Numerical Results

Figure: Negative covariance value of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).

Numerical Results

Figure: The weak positivity of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Numerical Results

Figure: The weak positivity of estimates from the model ofT = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).

Conclusion and Discussion

Conclusions:ADMM algorithm with steepest gradient descent for Xupdate minimized our objective function f (X).Computation time took a lot of time as T increases.

Discussions:Instead of steepest gradient descent, Newton direction. cf.QUIC.Use Block Coordinate Descent as in BIG & QUIC.Introduce the decay constant in D.

References I

[De72] Dempster, A. P. (1972). Covariance Selection. Biometrics 28 157-175.

[MB06] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs andvariable selection with the Lasso. Annals of Statistics 34 1436-1462.

[BG08] Banerjee, O., Ghaoui, E. L. and d’Aspremont, A. (2008). Model selectionthrough sparse maximum likelihood estimation for multivariate Gaussianor binary data. Journal of Machine Learning Research 9 485-516.

[Ti08] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inversecovariance estimation with the graphical Lasso. Biostatistics 9 432-441.

[Ma52] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7 77-91.

[Ti96] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society: Series B 58 267-288.

[Bo11] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011).Distributed optimization and statistical learning via the alternatingdirection method of multipliers. Foundations and Trends in MachineLearning 3 1-122.

References II

[Hs13] Hsieh, C. J., Sustik, M. A., Dhillon, I., Ravikumar, P. and Poldrack, R.(2013). BIG & QUIC: Sparse inverse covariance estimation for a millionvariables. In Advances in Neural Information Processing Systems3165-3173.

[Bv11] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-DimensionalData: Methods, Theory and Applications. Springer-Verlag, Berlin.

[WB12] Wahlberg, B., Boyd, S., Annergren, M. and Wang, Y. (2012). An ADMMalgorithm for a class of total variation regularized estimation problems.ArXiv:1203.1828.

Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Science

CZ Lacertae A Blazhko RR Lyrae star with multiperiodic modulation

Sparse Inverse Covariance Estimation for a Million Variablescjhsieh/covselect_icml14/slides/Cho.pdfSocial Network Analysis / Web data: Model co-authorship networks (Goldenberg & Moore,

Enhancement of the Purcell factor in multiperiodic ... - DTU

Robust Covariance Matrix Estimators for Sparse Data Using

The graphical lasso: New insights and alternativeshastie/Papers/glassoinsights.pdfKeywords and phrases: Graphical lasso, sparse inverse covariance selec-tion, precision matrix, convex

The Local Marchenko-Pastur Law for Sparse …adlam/thesis.pdfThe Local Marchenko-Pastur Law for Sparse Covariance Matrices ... We review the resolvent method for proving local laws

0.15in ECE 18-898G: Special Topics in Signal Processing: Sparsity…yuejiec/ece18898G_notes/ece... · 2018-04-02 · [1]”Sparse inverse covariance estimation with the graphical

Sparse Covariance Selection using Semideﬁnite Programmingaspremon/PDF/SIAM06am.pdfSparse Covariance Selection using Semideﬁnite Programming A. d’Aspremont ORFE, Princeton University

1 SPICE : a sparse covariance-based estimation method for ... · PDF file1 SPICE : a sparse covariance-based estimation method for array processing Petre Stoica, Fellow, IEEE, Prabhu

Sparse PCA via Covariance Thresholdingyash/PAPERS/sparse.pdfYash Deshpande and Andrea Montanari. Sparse pca via covariance thresholding. In Ad-vances in Neural Information Processing

Ensemble Estimation of Large Sparse Covariance …Ensemble Estimation of Large Sparse Covariance Matrix Based on Modi ed Cholesky Decomposition Xiaoning Kangyand Xinwei Dengz yInternational

Big & Quic: Sparse Inverse Covariance Estimation for a ...cjhsieh/bigquic_nips13.pdf · Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables Cho-Jui Hsieh The

Sparse Inverse Covariance Estimation with the Graphical Lassoswoh.web.engr.illinois.edu/.../handout/fall2016_slide22.pdf · 2016. 12. 3. · Sparse Inverse Covariance Estimation with

Sparse Covariance Estimation, Variable Selection and Testing for … · 2018. 9. 27. · Bickel and Levina (2008): when zt is an iid Gaussian process and Σz is sparse For su¢ ciently

Gaussian process covariance functionsmlg.eng.cam.ac.uk/teaching/4f13/1718/covariance functions.pdf · Matérn covariance functions Stationary covariance functions can be based on

Sparse Inverse Covariance Estimation - IBM … The optimization problem Newton LASSO Methods Orthant Wise Optimization Results Sparse Inverse Covariance Estimation Peder Olsen, Figen

High Dimensional Sparse Covariance Estimation: Accurate

Mining Brain Region Connectivity for Alzheimer's Disease Study via Sparse Inverse Covariance Estimation Jieping Ye Arizona State University

Sparse multivariate regression with covariance estimationusers.stat.umn.edu/~arothman/mrce.pdf · Sparse multivariate regression with covariance estimation Adam J. Rothman, Elizaveta

Operator norm consistent estimation of large dimensional ...Operator norm consistent estimation of large dimensional sparse covariance matrices ... We also propose a notion of sparsity