Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Michael Lie
Prof. Suzuki Taiji Lab., Faculty of Science, Department of Information Science, Tokyo Institute of Technology, Japan

February 12, 2015

Outline: Introduction, Problem Setup, Optimization Method, Numerical Results, Conclusion and Discussion, References


Page 2: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Agenda

- To propose a new statistical model: Sparse Multiperiodic Covariance Selection (M-CovSel)
- To propose an optimization method for it based on ADMM

Page 3: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Covariance Selection

Covariance Selection

Sparse Covariance Selection: let $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} N_p(\mu, \Sigma)$ and solve

\[
\operatorname*{argmin}_{X \succ 0} \; -\ln\det X + \operatorname{trace}(SX) + \lambda \|X\|_1 .
\]

- Original idea: Dempster (1972)
- Application to sparse and high-dimensional matrices: Meinshausen and Bühlmann (2006)
- Problem formulation: Banerjee, El Ghaoui and d'Aspremont (2008)
- Solution through the graphical lasso: Friedman, Hastie and Tibshirani (2008)
- Solution by ADMM: Boyd et al. (2011)
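As a small illustration of the sparse covariance selection objective, the Python/NumPy sketch below evaluates it for a candidate X. The function name `covsel_objective` is hypothetical, and the sketch assumes that the l1 norm is the elementwise sum of absolute values over all entries (some formulations exclude the diagonal).

```python
import numpy as np

def covsel_objective(X, S, lam):
    """Evaluate -ln det X + trace(S X) + lam * ||X||_1 (elementwise l1, assumed).

    Returns +inf when X is not positive definite, so the value can be
    used directly inside a line search.
    """
    try:
        L = np.linalg.cholesky(X)  # fails unless X is positive definite
    except np.linalg.LinAlgError:
        return np.inf
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -logdet + np.trace(S @ X) + lam * np.abs(X).sum()
```

The Cholesky factorization doubles as the positive-definiteness test and as a stable way to obtain ln det X.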

Page 4: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Application

Application: Markowitz’s Portfolio Selection

Portfolio Selection (Markowitz, 1952)

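\[
\min_{w} \ \sigma_{p,w}^2 = w^\top S w \quad \text{s.t.} \quad w^\top \mathbf{1} = 1
\qquad \therefore \quad w = \frac{S^{-1}\mathbf{1}}{\mathbf{1}^\top S^{-1}\mathbf{1}}.
\]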

Here, the inverse of the empirical covariance, $S^{-1}$, is needed.

Existing covariance selection works at a fixed time ⇒ covariance selection over a time series is needed.
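The closed-form minimum-variance weights can be sketched in a few lines of Python/NumPy. The function names are hypothetical; the second variant assumes an estimate X of the inverse covariance is already available, so no inversion is needed.

```python
import numpy as np

def min_variance_weights(S):
    """w = S^{-1} 1 / (1^T S^{-1} 1), computed with a linear solve."""
    ones = np.ones(S.shape[0])
    x = np.linalg.solve(S, ones)  # x = S^{-1} 1 without forming S^{-1}
    return x / (ones @ x)

def weights_from_precision(X):
    """Same weights when the inverse covariance X is given directly."""
    ones = np.ones(X.shape[0])
    x = X @ ones
    return x / (ones @ x)
```

For example, with S = diag(1, 2, 4) both functions return (4/7, 2/7, 1/7), which sums to one as the constraint requires.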

Page 5: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Intuition

Intuition

Figure: Existing Model

By estimating X, we can construct the portfolio.

Page 6: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Intuition

Figure: Our Model

\[
S_{ij} := \frac{1}{n} \sum_{k,l} (y_{k,i} - \mu_i)(y_{l,j} - \mu_j)^\top
\]

Page 7: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

Problem Formulation

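Consider a stationary time process whose multiperiodic inverse covariance matrix $X$ can be expressed as

\[
X =
\begin{pmatrix}
X_{11} & X_{12} & X_{13} & \cdots & X_{1,T} \\
X_{12}^\top & X_{22} & X_{23} & \cdots & X_{2,T} \\
X_{13}^\top & X_{23}^\top & X_{33} & \cdots & X_{3,T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_{1,T}^\top & X_{2,T}^\top & X_{3,T}^\top & \cdots & X_{T,T}
\end{pmatrix}
\in \mathbb{R}^{Tp \times Tp}.
\]

Assumption: the process is stationary in time, so that $X_{i,i+h} = X_{j,j+h}$ for all $i, j$.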
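Under the stationarity assumption, every block depends only on the lag h, so X is block Toeplitz. The Python/NumPy sketch below builds such a matrix from T lag blocks. The function name `block_toeplitz` and the argument layout (a list whose h-th entry is the lag-h block) are assumptions made for illustration.

```python
import numpy as np

def block_toeplitz(lag_blocks):
    """Assemble a Tp x Tp matrix with X[i, i+h] = lag_blocks[h].

    Blocks below the diagonal are the transposes of their mirror images,
    so the result is symmetric whenever lag_blocks[0] is symmetric.
    """
    T = len(lag_blocks)
    p = lag_blocks[0].shape[0]
    X = np.zeros((T * p, T * p))
    for i in range(T):
        for j in range(T):
            h = j - i
            block = lag_blocks[h] if h >= 0 else lag_blocks[-h].T
            X[i * p:(i + 1) * p, j * p:(j + 1) * p] = block
    return X
```

With this layout the constraint holds by construction, and only T blocks of size p x p need to be stored instead of T(T+1)/2.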

Page 8: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

Sparse Multiperiodic Covariance Selection (M-CovSel):

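\[
\operatorname*{argmin}_{X \succ 0} f(X), \qquad
f(X) := -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right)
+ \lambda_1 \sum_{i,j} \left\|X_{ij}\right\|_1
+ \lambda_2 \sum_{i,j} \sum_{k>i,\, l>j} \left\|X_{ij} - X_{kl}\right\|_F^2
\]
\[
\text{subject to } X_{i,i+h} = X_{j,j+h}, \quad \forall i, j.
\]

\[
\ell_1:\ \|w\|_1 = \sum_i |w_i|, \qquad \ell_2:\ \|w\|_F^2 = \sum_i |w_i|^2
\]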

Page 9: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

We separate our model into two parts:

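\[
f(X) \equiv g(X) + h(X)
\]
\[
g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right),
\]
\[
h(X) = \lambda_1 \sum_{i,j} \left\|X_{ij}\right\|_1 + \lambda_2 \sum_{i,j} \sum_{k>i,\, l>j} \left\|X_{ij} - X_{kl}\right\|_F^2 .
\]

- g(X): twice differentiable and strictly convex
- h(X): convex but non-differentiable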
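The split into g and h can be made concrete with a short Python/NumPy sketch. The helper names are hypothetical, blocks are indexed from 0, and the double sum in h is read literally as running over all block pairs with k > i and l > j, which is an assumption about the intended index range.

```python
import numpy as np

def block(M, i, j, p):
    """Return the p x p block (i, j) of a Tp x Tp matrix (0-based)."""
    return M[i * p:(i + 1) * p, j * p:(j + 1) * p]

def g(X, S):
    """Smooth part: -ln det X + sum_ij trace(S_ij^T X_ij)."""
    _, logdet = np.linalg.slogdet(X)
    # The block-wise trace sum equals the elementwise inner product <S, X>.
    return -logdet + np.sum(S * X)

def h(X, T, p, lam1, lam2):
    """Non-smooth part: l1 penalty plus squared block differences."""
    l1 = np.abs(X).sum()  # sum of the blockwise l1 norms
    diff = 0.0
    for i in range(T):
        for j in range(T):
            for k in range(i + 1, T):
                for l in range(j + 1, T):
                    diff += np.sum((block(X, i, j, p) - block(X, k, l, p)) ** 2)
    return lam1 * l1 + lam2 * diff
```

The quadruple loop costs on the order of T^4 block comparisons, which is only practical for small T.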

Page 10: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

Auxiliary Variables

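\[
X =
\begin{pmatrix}
X_{11} & X_{12} & X_{13} & \cdots & X_{1,T} \\
X_{12}^\top & X_{22} & X_{23} & \cdots & X_{2,T} \\
X_{13}^\top & X_{23}^\top & X_{33} & \cdots & X_{3,T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_{1,T}^\top & X_{2,T}^\top & X_{3,T}^\top & \cdots & X_{T,T}
\end{pmatrix}
\ \xrightarrow{\ \mathrm{bvec}\ }\
X' =
\begin{pmatrix}
X_{11} \\ \vdots \\ X_{1,T} \\ X_{22} \\ \vdots \\ X_{2,T} \\ \vdots \\ X_{T,T}
\end{pmatrix},
\]

where $X'$ has $\mathrm{numX} \times p$ rows and $p$ columns.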
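A minimal Python/NumPy sketch of the bvec operation follows. It assumes that the blocks on and above the block diagonal are stacked row by row, in the order X_11, ..., X_1T, X_22, ..., X_TT, so that the number of stacked blocks is T(T+1)/2; the function name is hypothetical.

```python
import numpy as np

def bvec(X, T, p):
    """Stack the upper-triangular p x p blocks of X into one tall matrix.

    Returns an array of shape (T*(T+1)//2 * p, p).
    """
    blocks = [
        X[i * p:(i + 1) * p, j * p:(j + 1) * p]
        for i in range(T)
        for j in range(i, T)  # only blocks on or above the block diagonal
    ]
    return np.vstack(blocks)
```

For T = 3 and p = 2 the result has 12 rows and 2 columns, and the fourth stacked block is X_22.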

Page 11: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

H: stationary time matrix

Page 12: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

All D: time-difference matrix

Page 13: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

Simplified D: time-difference matrix

Page 14: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Problem Formulation

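\[
\text{minimize } g(X) + h(Z)
\quad \text{subject to} \quad
\begin{cases}
X' = Z_1 \\
DX' = Z_2 \\
HX' = 0
\end{cases}
\iff X = Z
\]

where

\[
g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right),
\qquad
h(Z) = \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 ,
\]

and, in the constraint $X = Z$, the two sides denote the stacked variables

\[
X = \begin{pmatrix} X' \\ DX' \\ HX' \end{pmatrix},
\qquad
Z = \begin{pmatrix} Z_1 \\ Z_2 \\ 0 \end{pmatrix}.
\]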

Page 15: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

Solving Through ADMM

Algorithm 1: Overview of ADMM
 1: for k = 0, 1, ... do
 2:   X-update:
 3:   Compute $W^{(0)} = (X^{(0)})^{-1}$.
 4:   for t = 1, 2, ... do
 5:     Compute the steepest-descent direction $d^{(t)} = -\nabla G(X^{(t)})$.
 6:     Use Armijo-rule step-size selection to find $\alpha$ such that $X^{(t+1)} = X^{(t)} + \alpha d^{(t)}$ is positive definite and the objective value decreases sufficiently.
 7:     Update X.
 8:   end for
 9:   Z-update:
10:   Update $Z_1$: $Z_1^{(k+1)} = S_{\lambda_1/\rho}\left((X')^{(k+1)} + Y_1^{(k)}/\rho\right)$
11:   Update $Z_2$: $Z_2^{(k+1)} = \dfrac{\rho D (X')^{(k+1)} + Y_2^{(k)}}{2\lambda_2 + \rho}$
12:   Y-update: $Y^{(k+1)} = Y^{(k)} + \rho\left(X^{(k+1)} - Z^{(k+1)}\right)$
13: end for

Page 16: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

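\[
\text{minimize } g(X) + h(Z)
\quad \text{subject to} \quad
\begin{cases}
X' = Z_1 \\
DX' = Z_2 \\
HX' = 0
\end{cases}
\iff X = Z
\]

Its augmented Lagrangian is

\[
L_\rho(X, Z, Y) = g(X) + h(Z) + \frac{\rho}{2} \left\| X - Z + \frac{Y}{\rho} \right\|_F^2 ,
\]
\[
g(X) = -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right),
\qquad
h(Z) = \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 .
\]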
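The squared-norm form of the augmented Lagrangian can be related to the usual form with an explicit multiplier term by completing the square. The identity below is standard and is written with the slide's symbols; $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product, which is notation introduced here.

```latex
\[
g(X) + h(Z) + \langle Y, X - Z \rangle + \frac{\rho}{2}\|X - Z\|_F^2
= g(X) + h(Z) + \frac{\rho}{2}\left\|X - Z + \frac{Y}{\rho}\right\|_F^2
- \frac{1}{2\rho}\|Y\|_F^2
\]
```

The last term does not depend on X or Z, so dropping it leaves the X- and Z-updates unchanged.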

Page 17: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

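1. X-update:

\[
X^{(k+1)} := \operatorname*{argmin}_{X} \left( -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right) + \frac{\rho}{2} \left\| X - Z^{(k)} + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right),
\]

2. Z-update:

\[
Z^{(k+1)} := \operatorname*{argmin}_{Z} \left( \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| X^{(k+1)} - Z + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right),
\]

3. Y-update:

\[
Y^{(k+1)} := Y^{(k)} + \rho \left( X^{(k+1)} - Z^{(k+1)} \right).
\]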
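To make the three-step iteration concrete, the Python/NumPy sketch below runs it on the plain sparse covariance selection problem, that is, with the single constraint X = Z and without the D and H blocks. It is a simplification and not the M-CovSel solver: the X-update uses the closed-form eigendecomposition solution described in Boyd et al. (2011) instead of a steepest-descent inner loop, and the function name `admm_covsel` and its default parameters are hypothetical.

```python
import numpy as np

def admm_covsel(S, lam, rho=1.0, n_iter=300):
    """ADMM for min -ln det X + trace(S X) + lam * ||Z||_1 subject to X = Z."""
    p = S.shape[0]
    Z = np.eye(p)
    Y = np.zeros((p, p))
    for _ in range(n_iter):
        # X-update: the optimality condition rho*X - X^{-1} = rho*Z - Y - S
        # is solved eigenvalue by eigenvalue.
        eigval, Q = np.linalg.eigh(rho * Z - Y - S)
        x = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T  # Q diag(x) Q^T, positive definite since x > 0
        # Z-update: elementwise soft-thresholding at level lam / rho.
        A = X + Y / rho
        Z = np.sign(A) * np.maximum(np.abs(A) - lam / rho, 0.0)
        # Y-update: dual ascent on the constraint X = Z.
        Y = Y + rho * (X - Z)
    return X, Z, Y
```

A fixed iteration count keeps the sketch short; a practical implementation would stop on the primal and dual residuals instead.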

Page 18: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

X Update

The subproblem

\[
X^{(k+1)} := \operatorname*{argmin}_{X} \left( -\ln\det X + \sum_{i,j} \operatorname{trace}\!\left(S_{ij}^\top X_{ij}\right) + \frac{\rho}{2} \left\| X - Z^{(k)} + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right)
\]

is solved by steepest gradient descent. The procedure corresponds to lines 2-8 of Algorithm 1 and is restated here.

Algorithm 2: X Update
1: Compute $W^{(0)} = (X^{(0)})^{-1}$.
2: for t = 1, 2, ... do
3:   Compute the steepest-descent direction $d^{(t)} = -\nabla G(X^{(t)})$.
4:   Use Armijo-rule step-size selection to find $\alpha$ such that $X^{(t+1)} = X^{(t)} + \alpha d^{(t)}$ is positive definite and the objective value decreases sufficiently.
5:   Update X.
6: end for
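The steps of the X update can be sketched in Python/NumPy as follows. This is a simplified illustration under stated assumptions, not the author's implementation: it treats the un-stacked case in which G(X) = -ln det X + trace(SX) + (rho/2)||X - V||_F^2 with V = Z - Y/rho, and the function names, the Armijo constant `sigma`, the shrink factor `beta` and the tolerances are all hypothetical choices.

```python
import numpy as np

def G(X, S, V, rho):
    """Assumed X-update objective for the un-stacked case."""
    _, logdet = np.linalg.slogdet(X)
    return -logdet + np.trace(S @ X) + 0.5 * rho * np.sum((X - V) ** 2)

def is_positive_definite(X):
    try:
        np.linalg.cholesky(X)
        return True
    except np.linalg.LinAlgError:
        return False

def x_update(X0, S, V, rho, max_iter=500, sigma=1e-4, beta=0.5, tol=1e-8):
    X = X0.copy()
    for _ in range(max_iter):
        W = np.linalg.inv(X)               # W = X^{-1}
        grad = -W + S + rho * (X - V)      # gradient of G at X
        grad_sq = np.sum(grad ** 2)
        if np.sqrt(grad_sq) < tol:
            break
        d = -grad                          # steepest-descent direction
        G_old = G(X, S, V, rho)
        alpha, accepted = 1.0, False
        while alpha > 1e-16:
            X_new = X + alpha * d
            # Accept only positive definite iterates with sufficient decrease.
            if (is_positive_definite(X_new)
                    and G(X_new, S, V, rho) <= G_old - sigma * alpha * grad_sq):
                accepted = True
                break
            alpha *= beta                  # backtrack
        if not accepted:
            break                          # no acceptable step: stop
        X = X_new
    return X
```

Because the Cholesky test runs before the objective is evaluated, G is only ever computed at positive definite points.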

Page 19: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

Z Update

Z Update

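\[
Z^{(k+1)} := \operatorname*{argmin}_{Z} \left( \lambda_1 \|Z_1\|_1 + \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| X^{(k+1)} - Z + \frac{Y^{(k)}}{\rho} \right\|_F^2 \right).
\]

The problem above separates into two subproblems:

\[
Z_1^{(k+1)} := \operatorname*{argmin}_{Z_1} \left( \lambda_1 \|Z_1\|_1 + \frac{\rho}{2} \left\| (X')^{(k+1)} - Z_1 + \frac{Y_1^{(k)}}{\rho} \right\|_F^2 \right),
\]
\[
Z_2^{(k+1)} := \operatorname*{argmin}_{Z_2} \left( \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| D (X')^{(k+1)} - Z_2 + \frac{Y_2^{(k)}}{\rho} \right\|_F^2 \right).
\]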

Page 20: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Alternating Direction Method of Multiplier (ADMM)

Solution of Z Update

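\[
Z_1^{(k+1)} := \operatorname*{argmin}_{Z_1} \left( \lambda_1 \|Z_1\|_1 + \frac{\rho}{2} \left\| (X')^{(k+1)} - Z_1 + \frac{Y_1^{(k)}}{\rho} \right\|_F^2 \right),
\]
\[
Z_2^{(k+1)} := \operatorname*{argmin}_{Z_2} \left( \lambda_2 \|Z_2\|_F^2 + \frac{\rho}{2} \left\| D (X')^{(k+1)} - Z_2 + \frac{Y_2^{(k)}}{\rho} \right\|_F^2 \right).
\]

The solution of the first subproblem is given by the soft-thresholding operator $S_{\lambda_1/\rho}$, applied elementwise:

\[
Z_1^{(k+1)} = S_{\lambda_1/\rho}\!\left( (X')^{(k+1)} + \frac{Y_1^{(k)}}{\rho} \right),
\]

and the solution of the second subproblem is

\[
Z_2^{(k+1)} = \frac{\rho D (X')^{(k+1)} + Y_2^{(k)}}{2\lambda_2 + \rho}.
\]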
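Both closed-form Z updates fit in a few lines of Python/NumPy. The second formula follows from setting the gradient of the quadratic subproblem to zero, 2*lam2*Z2 - rho*(D X' - Z2 + Y2/rho) = 0. The function and argument names are hypothetical, and `DX_prime` stands for the already computed product D X'.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding: sign(a) * max(|a| - kappa, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def z1_update(X_prime, Y1, lam1, rho):
    """Minimizer of lam1*||Z1||_1 + (rho/2)*||X' - Z1 + Y1/rho||_F^2."""
    return soft_threshold(X_prime + Y1 / rho, lam1 / rho)

def z2_update(DX_prime, Y2, lam2, rho):
    """Minimizer of lam2*||Z2||_F^2 + (rho/2)*||DX' - Z2 + Y2/rho||_F^2."""
    return (rho * DX_prime + Y2) / (2.0 * lam2 + rho)
```

The soft-thresholding step is what produces exact zeros in the estimate, while the second update only shrinks the block differences toward zero.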

Page 21: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Numerical Results

Execution environment:
- CPU: Intel Core i7-4770 @ 3.40GHz (8 CPUs)
- Memory: 8GB RAM
- Software: R ver. 3.3.65126.0
- OS: Windows 7 Professional 64 bit (6.1, build 7601)

Verifying:
- Convergence speed
- Sparsity of the estimates

using random data sets and real data.

Page 22: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

All D

Simplified D

Page 23: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: Runtime of n = 10, λ1 = 0.01, λ2 = 0.01.

Page 24: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) Dual Residuals for n = 10, T = 5, λ1 = 0.01, λ2 = 0.01.
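For reference, primal and dual residuals of the kind plotted in such convergence figures are commonly computed as in Boyd et al. (2011). The Python/NumPy sketch below assumes the constraint has the form X = Z, so the primal residual is X - Z and the dual residual is rho times the change in Z between iterations; the function name is hypothetical and the slides do not state which definition was used.

```python
import numpy as np

def admm_residuals(X, Z, Z_prev, rho):
    """Frobenius norms of the primal and dual residuals for X = Z."""
    primal = np.linalg.norm(X - Z)          # ||X^{k+1} - Z^{k+1}||_F
    dual = rho * np.linalg.norm(Z - Z_prev) # rho * ||Z^{k+1} - Z^k||_F
    return primal, dual
```

Both quantities tend to zero as the iteration converges, which makes them natural stopping criteria.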

Page 25: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: The sparsity pattern of estimates from the model for n = 10, T = 5, λ1 = 0.01, λ2 = 0.01.

Page 26: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Analysis on real data
- Stock data of 50 randomly selected companies from NASDAQ
- Period: 4 January 2011 to 31 December 2014

| Ticker | Name | Sector |
|---|---|---|
| PDCO | Patterson Companies, Inc. | Health Care |
| OMER | Omeros Corporation | Health Care |
| HEAR | Turtle Beach Corporation | Consumer Durables |
| QBAK | Qualstar Corporation | Technology |
| UTHR | United Therapeutics Corporation | Health Care |
| PLCE | The Children's Place Retail Stores, Inc. | Consumer Services |
| SUSQ | Susquehanna Bancshares, Inc. | Finance |
| IDCC | InterDigital, Inc. | Miscellaneous |
| ELON | Echelon Corporation | Technology |
| BGCP | BGC Partners, Inc. | Finance |
| MRGE | Merge Healthcare Incorporated. | Technology |
| TISA | Top Image Systems, Ltd. | Technology |
| IPXL | Impax Laboratories, Inc. | Health Care |
| ROVI | Rovi Corporation | Miscellaneous |
| IBCP | Independent Bank Corporation | Finance |
| BABY | Natus Medical Incorporated | Health Care |
| HFFC | HF Financial Corp. | Finance |
| ISLE | Isle of Capri Casinos, Inc. | Consumer Services |
| ITIC | Investors Title Company | Finance |
| SLGN | Silgan Holdings Inc. | Consumer Durables |
| ZIOP | ZIOPHARM Oncology Inc | Health Care |
| MXIM | Maxim Integrated Products, Inc. | Technology |
| NEPT | Neptune Technologies & Bioresources Inc | Health Care |
| UTMD | Utah Medical Products, Inc. | Health Care |
| ... | ... | ... |

Page 27: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: (i) Objective Values, (ii) Primal Residuals, and (iii) Dual Residuals for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Page 28: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: The sparsity pattern of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Page 29: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: The covariance matrix plot of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Page 30: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: Negative covariance values of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Page 31: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: Negative covariance values of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).

Page 32: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: The weak positivity of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data.

Page 33: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Numerical Results

Figure: The weak positivity of estimates from the model for T = 5, λ1 = 0.01, λ2 = 0.01 from real stock data (zoom on T = 1).

Page 34: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

Conclusion and Discussion

Conclusions:
- The ADMM algorithm, with steepest gradient descent for the X update, minimized our objective function f(X).
- Computation time grows substantially as T increases.

Discussions:
- Use the Newton direction instead of steepest gradient descent (cf. QUIC).
- Use block coordinate descent as in BIG & QUIC.
- Introduce a decay constant in D.

Page 35: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

References I

[De72] Dempster, A. P. (1972). Covariance Selection. Biometrics 28 157-175.

[MB06] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34 1436-1462.

[BG08] Banerjee, O., El Ghaoui, L. and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9 485-516.

[Ti08] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics 9 432-441.

[Ma52] Markowitz, H. (1952). Portfolio Selection. The Journal of Finance 7 77-91.

[Ti96] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58 267-288.

[Bo11] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 1-122.

Page 36: Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covariance Selection

References II

[Hs13] Hsieh, C. J., Sustik, M. A., Dhillon, I., Ravikumar, P. and Poldrack, R. (2013). BIG & QUIC: Sparse inverse covariance estimation for a million variables. In Advances in Neural Information Processing Systems 3165-3173.

[Bv11] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag, Berlin.

[WB12] Wahlberg, B., Boyd, S., Annergren, M. and Wang, Y. (2012). An ADMM algorithm for a class of total variation regularized estimation problems. arXiv:1203.1828.