Sparse and robust normal and t-portfolios by penalized Lq ... · Sparse and robust normal and t-portfolios by penalized Lq-likelihood minimization Davide Ferrari1, Margherita Giuzio2,

Sparse and robust normal and t-portfolios bypenalized Lq-likelihood minimization

Davide Ferrari1, Margherita Giuzio2, Sandra Paterlini2

1University of Melbourne, AU2EBS Universitat fur Wirtschaft und Recht, GER

Universita degli Studi di Bergamo

Seminar 07.04.2014

Portfolio Selection

Task: Select the ”optimal” assets and their ”optimal” weights

• Ideal Characteristics

• Good Out-of-sample Performance• Stable weights• Robustness• Cheap to implement and maintain

• Main Issues• Normality assumption• Highly correlated returns• Sensitivity of Weights to Estimation Errors and Model Misspecification• High-dimensionality

Ferrari, Giuzio, Paterlini (AU & GER) Penalized Lq-likelihood minimization Seminar 07.04.2014 2 / 44

Main Issue

State-of-art Methods

• Robust and Shrinkage Estimators =⇒ Estimation ErrorsHuber (1981)Ledoit and Wolf (2004)

• Penalized Least Squares =⇒ Model Misspecifications and SparsityDeMiguel, Garlappi, Nogales, Uppal (2007)Fan, Zhang, Yu (2012)Fastrich, Paterlini, Winker (2012)

• Minimum Divergence Methods =⇒ Model MisspecificationsFerrari, Yang (2010)Ferrari, Paterlini (2010)Ferrari, La Vecchia (2012)


Our Proposal

Targets:• Robustness to both Estimation Errors and Model Misspecification• Sparsity: automatically select a small number of relevant active

positions

How can we achieve that?Generalized Description Length Approach

• based on q-Entropy• using priors as sparsifying operator


Maximum Likelihood

• If observations are i.i.d. from a probability distribution f (x ; θ0),maximizing

log L(θ0) = logn∏

i=1f (Xi | θ) =

n∑i=1

log f (Xi ; θ)

we obtain the maximum log-likelihood estimator of θ0.

n∑i=1

log f (Xi ; θ)→ E [log f (Xi ; θ)],

as n→∞• Properties

• Consistency, Efficiency• Asymptotic normality• Easy to implement

Shannon Entropy

• Information Theory (1940s): how to measure uncertainty with respectto a probability distribution f (x)

H(X ) = −E [log f (x)]

• − log f (x) is the information included in the observation x• H(X ) represents the average uncertainty removed after observing the

outcome of the variable X .

RelationshipGiven n i.i.d. observations, the Maximum Likelihood Estimator can beseen as a minimization of the Shannon entropy

maxn∑

i=1log f (Xi ; θ) = min−E [log f (Xi ; θ)]

q-Entropy

• Havrda and Charvat (1967) proposed a generalized measure ofentropy of order q, then used in physics, biology and finance

H(X ) =E [1− f (x)q−1]

1− q , q > 0

• Tsallis (1988) arrived to the following specification:

H(X ) = −E [Lqf (x)],

where

Lq =

{(u1−q − 1)/(1− q), q 6= 1,

log(u), q = 1,

Maximum Lq Likelihood

• Given n i.i.d. observations from from a probability distributionf (x ; θ0), maximizing

n∑i=1

Lq[f (Xi ; θ)]

we obtain the Maximum Lq-Likelihood Estimator of θ0.

• Properties• Consistency and Fisher-consistency (Ferrari, La Vecchia, 2009)• Asymptotic normality• Changing q we balance the trade-off between bias and variance,

robustness and efficiency• When q → 1, MLqE → MLE

Asset Returns

Generalized Description Length criterion (GDL)

Let:

• X = (X1, . . . ,Xp)T a p-dimensional random vector such thatE (X) = µ and Var(X) = Σ

• Y = βT X a portfolio where βT is the weights vector

• the portfolio mean, µ∗ is a fixed target


Generalized Description Length criterion (GDL)

AssumptionsDistribution of the Data: f (y ;µ, σ2) = f (xT

i β;µ, σ2)Prior Distribution of Portfolio Weights: π(βj ;λ) for the jth weight βj

β = argminβ

−n∑

i=1Lq

{f(

xTi β − µ∗

σ

)}−

p∑j=1

Lq {π(βj ;λ)}

, (1)

with q-entropy function

Lq =

{(u1−q − 1)/(1− q), q 6= 1,

log(u), q = 1

• Information from the data given the model• Information from the model itself


Penalized q-entropy minimizationComputing the first derivative

0 =n∑

i=1wi (xi ,β, σ)∇ log f (σ−1(xT

i β − µ∗)) +p∑

j=1vj(βj , λ)∇ log π(βj ;λ).

(2)

Define wi and vj as

• Weights on obs y wi (xi ,β, σ) = f (σ−1(xTi β − µ∗))1−q

• Weights on β vj(βj , λ) = π(βj ;λ)1−q

Role of q- Changing q, we modify the role of unusual observations xi and sparsifythe parameters βj- When q < 1 we gain robustness to models misspecification


GDL Penalty Function

−p∑

j=1Lq {π(βj ;λ)}


GDL vs Other Penalties

−2 −1 0 1 20

0.2

0.4

0.6

0.8

1GDL Penalty

Parameter β

−L q π

(β)

q = 0.1q = 0.3q = 0.5q = 0.7q = 0.9

−2 −1 0 1 20

0.2

0.4

0.6

0.8

1

1.2

1.4Lasso Penalty

Parameter β

−L q π

(β)

−2 −1 0 1 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8SCAD Penalty

Parameter β

Pen

alty

g(β

)

−2 −1 0 1 20

2

4

6

8

10Logarithm Penalty

Parameter β

−L q π

(β)


Model 1: Normal portfolios with Laplace prior

Model SpecificationY ∼ N1(µ, σ2)

π(βj ;λ) = λ exp {−λ|βj |}/2

• GDL N

β(s)

= argminβ

n∑

i=1w (s−1)

i12

(µ∗ − xT

i β

σ(s−1)

)2

+ λp∑

j=1v (s−1)

j |βj |

(3)

• Lasso

β(s)

= argminβ

n∑

i=1(µ∗ − xT

i β)2 + λp∑

j=1|βj |

(4)


Model 2: t-portfolios with Laplace prior

Model SpecificationY ∼ t-Student fν(y ;µ, σ2) = N(µ, σ2Z−1

i ), Zi ∼ Γ(ν/2, ν/2)

π(βj ;λ) = λ exp {−λ|βj |}/2

• GDL t

argminβ,σ

{−(ν + 1

2

) n∑i=1

w (s−1)i log

{1 +

(xiβT − µ∗)2

νσ2

}+ λ

p∑j=1

v (s−1)j |βj |

}(5)

where νq = qν + (q − 1).


Behaviour of wi – Observations Weights

w (s)i = f ((xi β

(s−1)− µ∗)/σ(s−1))1−q (6)

−10 −5 0 5 100

0.2

0.4

0.6

0.8

1

1.2

x 10−3

Xβ

w

−10 −5 0 5 100

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

−3

Xβw

q = 0.5 q = 0.9


Behaviour of vj – Parameters Weights

v (s)j = π(β

(s−1)j ;λ)1−q. (7)

−2 −1.5 −1 −0.5 0 0.5 1 1.5 20

0.2

0.4

0.6

0.8

1

β

v

−2 −1.5 −1 −0.5 0 0.5 1 1.5 20

0.2

0.4

0.6

0.8

1

β

vq = 0.5 q = 0.9

Minimizing the GDL criterion• Modify the role of extreme observations and parameters by varying q• Minimize the divergence between the hypothetical and the tranformed version of

the density and prior distributionsFerrari, Giuzio, Paterlini (AU & GER) Penalized Lq-likelihood minimization Seminar 07.04.2014 19 / 44

Selection of λ

Select λ (for a given q) by minimizing

• ”Standard” AIC and BIC• Robust AIC and BIC (Ronchetti, 1997 and Machado, 1993)

AICq = −2n∑

i=1Lq

{f(

xTi βq,λ − µ∗

σq,λ

)}+ 2k, (8)

BICq = −2n∑

i=1Lq

{f(

xTi βq,λ − µ∗

σq,λ

)}+ log(n)k, (9)

where k is the number of active portfolio positions


Geometric Interpretation of GDL

ExampleLet’s assume two asset returns have bivariate normal distribution(

X1X2

)∼ N2

((01

),

(2 1/

√2

1/√

2 1

)),

When comparing Sharpe Ratios, the second asset is preferable

• Compare the optimal portfolios obtained with GDL N and LASSO• Contaminated vs not contaminated data






Simulation Study

Empirical Setup• n = 200 observations, p = 50 assets, k = 10 optimal positions• Four settings: ρ = 0.2, 0.4, 0.6, 0.8, 50 simulations• q = 0.9, target return µ∗ = k = 10

Model 1: X ∼ Np(µ,Σ) Model 2: X ∼ tp(µ,Σ, ν)

µj =

{1, j ≤ k,0, j > k,

Σij =

{1, i = j ,ρ, i 6= j ,

ν = 6.


Simulation Study

Goals:• Portfolio with low risk and high return, compared to a target• Sparsity

In each simulation:

• Given a grid of λ, compute estimates of β with GDL N, GDL t,Lasso, Zhang, Scad, Log, Lq

• Choose the model with the lowest BIC• Compare β to the optimal vector of β∗


Simulation Study

• Assess the whole performance by the Monte Carlo mean squared error

MSE =1

50

50∑b=1

µT βb − µ?√β

Tb Σβb

2

, (10)

where:

- µ?/σ? → target Sharpe ratio with σ? =

√β

Tb Σβb

- µT βb/

√β

Tb Σβb → portfolio Sharpe ratio

• Verify the selection of the k active assets by the F-measure:

F-measure = 2 |supp(β∗)| ∩ |supp(β)||supp(β∗)|+ |supp(β)|

, (11)

where supp(β) = {j : |βj | ≥ τ}.


Simulation Results – Sparsity

0.2 0.4 0.6 0.8

5

10

15

20

25

30

ρ

Num

ber

Act

ive

Pos

ition

s

GDL NGDL tLassoZhangLogLq

0.2 0.4 0.6 0.8

5

10

15

20

25

30

ρN

umbe

r A

ctiv

e P

ositi

ons

Model 1 Model 2


Simulation Results – F-measure

0.2 0.4 0.6 0.8

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

ρ

F−

mea

sure


0.2 0.4 0.6 0.8

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

ρF

−m

easu

re

Model 1 Model 2


Simulation Results – MSE

0.2 0.4 0.6 0.80

0.5

1

1.5

2

2.5

ρ

Mea

n S

quar

ed E

rror


0.2 0.4 0.6 0.80

1

2

3

4

5

ρM

ean

Squ

ared

Err

or

Model 1 Model 2


Financial Data

Index Trackingy returns vector of a Financial IndexX returns matrix of the Index Componentsβ vector of asset weights to be estimated

Data n p r σ S KF&F 100 1401 100 13.97 18.24 -0.03 5.27S&P 200 1401 200 10.70 26.94 -0.23 10.57S&P 500 1401 500 9.80 28.33 -0.30 13.09

Period from 23.08.2002 to 27.03.2008

Goal: build sparse portfolios able to track as closely as possible the index


Financial Data

−0.05 0 0.050

50

100

150

200

250

300

350

Log−returns

Abs

. Fre

quen

cyF&F 100

F&F 100Normalt−Student

−0.05 0 0.050

50

100

150

200

250

300

350

Log−returns

Abs

. Fre

quen

cy

S&P 200

S&P 200Normalt−Student

−0.05 0 0.050

50

100

150

200

250

300

350

Log−returns

Abs

. Fre

quen

cy

S&P 500

S&P 500Normalt−Student


Performance Evaluation

• Rolling Window of 250 obs, stepsize = 1 obs• Compare GDL N and GDL t with q = 0.5, 0.9,

Lasso and Eq. Weighted (1/N) Portfolios

Performance measures• Risk-return performance:

Information Ratio IR = ERTEV

• Sparsity and stability:k, average turnover (TO)

• Tracking ability:OOS correlation and beta wrt Index



Index GDL N GDL N GDL t GDL t Lasso 1/Nq = 0.9 q = 0.5 q = 0.9 q = 0.5

F&F 100ER (%) 0.338 0.302 0.170 0.186 1.030 0.929

TEV (%) 0.624 0.659 0.492 0.527 2.117 3.854IR 0.542 0.458 0.346 0.352 0.486 0.241

S&P 200ER (%) 0.319 0.639 -2.421 -0.196 4.760 5.937

TEV (%) 4.500 4.961 4.897 4.614 7.267 2.518IR 0.071 0.129 -0.494 -0.042 0.655 2.357

S&P 500ER (%) 2.906 -1.989 1.192 2.287 2.986 5.113

TEV (%) 6.966 8.100 9.018 8.859 10.315 3.107IR 0.417 -0.245 0.132 0.258 0.289 1.646



Index GDL N GDL N GDL t GDL t Lasso 1/Nq = 0.9 q = 0.5 q = 0.9 q = 0.5

F&F 100k 37.749 38.244 32.241 32.121 65.939 98

TO 0.068 0.060 0.066 0.064 0.017 -S&P 200

k 36.950 43.627 28.431 28.011 66.532 200TO 0.399 0.346 0.520 0.481 0.037 -

S&P 500k 44.564 45.736 27.770 26.944 66.407 500

TO 0.605 0.322 0.811 0.801 0.053 -


Tracking ability - Normalized trends

0 200 400 600 800 1000 120080

100

120

140

160

180

200

Time Window

Out

−O

f−S

ampl

e R

etur

n

F&F 1001/NLassoGDL t q=0.5

0 200 400 600 800 1000 120080

100

120

140

160

180

200

Time Window

Out

−O

f−S

ampl

e R

etur

n

S&P 2001/NLassoGDL N q=0.5

0 200 400 600 800 1000 120080

100

120

140

160

180

200

Time Window

Out

−O

f−S

ampl

e R

etur

n

S&P 5001/NLassoGDL N q=0.9


Conclusion

Penalized q-entropy minimization helps to

• select a sparse portfolio• handle highly-correlated variables• handle outliers and control for estimation errors

On-going work...• test on other financial data• extend to mean-variance framework• develop a method to optimally choose parameters q and λ


Appendix


Other Penalties

• LASSO

λp∑

i=1ρ(βi ) = λ‖β‖1.

• SCAD

λp∑

i=1ρ(βi ) =

p∑i=1

{λ|βi |, |βi | ≤ λ,

−β2i +2aλ|βi |−λ2

2(a−1) , λ < |βi | ≤ aλ,(a+1)λ2

2 aλ < |βi |


Other Penalties

• Zhang

λp∑

i=1ρ(βi ) =

p∑i=1

{λ|βi |, |βi | ≤ η,λη, η < |βi |

• Lq

λp∑

i=1ρ(βi ) = λ‖β‖qq, 0 < q < 1.

• Log

λp∑

i=1ρ(βi ) = λ

p∑i=1

(log(|βi |+ Φ)− log(Φ)).


General Re-weighting Algorithm

Given q, λ and µ∗

0. At Step s = 0, compute initial parameter values β(s) and σ(s).

1. Set s = s + 1, and update the data weightsw (s)

i = f ((xi β(s−1) − µ∗)/σ(s−1))1−q, i = 1, ..., n, and the penalty

weights v (s)j = π(β

(s−1)j ;λ)1−q, j = 1, ..., p.

2. Find the parameter values β and σ by minimizing

n∑i=1

wi log f ((xTi β − µ∗)/σ) +

p∑j=1

vj log π(βj ;λ), (12)

3. Compute β(s) and σ(s) by solving

f ((xTi β − µ∗)/σ) ∝ f (xT

i β − µ∗)/, σ)q for β and σ.4. Repeat Steps 1 and 2 until a stopping criterion is satisfied.


Model 1: Normal portfolios with Laplace prior

The weights wi and vj are updated using estimates obtained in Step s − 1as follows:

w (s−1)i =

1√2πσ2(s−1)

exp

−(µ∗ − xT

i β(s−1)

)2

2σ2(s−1)

1−q

, (13)

v (s−1)j =

[λ

2 exp{−λ|β(s−1)

j |}]1−q

. (14)

The portfolio variance is also updated using estimates from Step s − 1 as

σ2(s) =

∑ni=1 w (s−1)

i

(µ∗ − xT

i β(s−1)

)2

q∑n

i=1 w (s−1)i

. (15)


Model 2: t-portfolios with Laplace priorGiven w (s−1)

i and v (s−1)i , set the initial mixture weights zi = 1/n.

Then, obtain the updated estimates β(s) and σ(s) by iterating the

following expectation-maximization steps

• M-Step:

β′ = argminβ

{ n∑i=1

w (s−1)i z (s−1)

i12

(xT

i β − µσ(s−1)

)2

+ λ

p∑j=1

v (s−1)j |βj |

}, (16)

σ′2

=

∑ni=1 w (s−1)

i z (s−1)i

(xT

i β − µ)2

∑ni=1 w (s−1)

i z (s−1)i

× ν

(ν + 1)q − 1 . (17)

• E-Step:

zi =(νq + 1)σ′

2

νqσ′2 + w (s−1)

i (xTi β′ − µ)2

, i = 1, . . . , n, (18)

where νq = (ν + 1)q − 1.Ferrari, Giuzio, Paterlini (AU & GER) Penalized Lq-likelihood minimization Seminar 07.04.2014 42 / 44

References I

• Huber, P. J., 1981. Robust Statistics. John Wiley & Sons.• Ledoit, O. and Wolf, M., 2004. Honey, I shrunk the sample covariance

matrix. Journal of Portfolio Management 30, Volume 4, 110-119• Fan, J., Zhang, J., Yu, K., 2012. Vast Portfolio Selection With

Gross-Exposure Constraints. Journal of the American StatisticalAssociation, 107 (498), 592-606

• DeMiguel, V., Garlappi, L., Nogales, J., Uppal, R., 2009. AGeneralized Approach to Portfolio Optimization: ImprovingPerformance By Constraining Portfolio Norms. Management Science,Volume 55 (5), 798-812

• Ferrari, D., Yang, Y., 2010. Maximum Lq-likelihood estimation.Annals of Statistics, 38 (2), 753-783.


References II

• Ferrari, D., La Vecchia, D., 2012. On robust estimation viapseudo-additive information. Biometrika, 99 (1), 238-244.

• Fastrich, B., Paterlini, S., Winker, P., 2012. Constructing OptimalSparse Portfolios Using Regularization Methods. Available at SSRN:http://ssrn.com/abstract=2169062 orhttp://dx.doi.org/10.2139/ssrn.2169062

• Ferrari, D., Paterlini, S., 2010. Efficient and robust estimation forfinancial returns: an approach based on q-entropy. Available atSSRN: http://ssrn.com/abstract=1906819 orhttp://dx.doi.org/10.2139/ssrn.1906819

• Ronchetti, E.,1997. Robustness aspects of model choice. StatisticaSinica, 7, 327-338

• Machado, J., 1993. Robust model selection and m-estimation.Econometric Theory, 9, 478-478


Documents

Sparse and robust normal and t-portfolios by penalized Lq ... · Sparse and robust normal and t-portfolios by penalized Lq-likelihood minimization Davide Ferrari1, Margherita Giuzio2,