Project on Forecasting Equity Premium and Optimal Portfolios

Department of Mathematics (Matematiska institutionen)

Master’s Thesis

Forecasting the Equity Premium and Optimal Portfolios

Johan Bjurgert and Marcus Edstrand

Reg Nr: LITH-MAT-EX--2008/04--SE

Linköping 2008

Matematiska institutionen, Linköpings universitet

581 83 Linköping


Supervisors: Dr Jörgen Blomvall, MAI, Linköpings universitet

Dr Wolfgang Mader, risklab GmbH

Examiner: Dr Jörgen Blomvall, MAI, Linköpings universitet

Linköping, 15 April, 2008

Division, Department: Division of Mathematics, Department of Mathematics, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2008-04-15

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11795

ISBN: -

ISRN: LITH-MAT-EX--2008/04--SE

Title of series, numbering: -

ISSN: -

Title: Forecasting the Equity Premium and Optimal Portfolios

Authors: Johan Bjurgert and Marcus Edstrand

Abstract

The expected equity premium is an important parameter in many financial models, especially within portfolio optimization. A good forecast of the future equity premium is therefore of great interest. In this thesis we seek to forecast the equity premium, use it in portfolio optimization and then give evidence on how sensitive the results are to estimation errors and how the impact of these can be minimized.

Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Choosing only the model that performs best in-sample for forecasting does not take model uncertainty into account. Our approach is to still use linear prediction models, while also taking model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed by using a Monte Carlo based heuristic called portfolio resampling.

The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather too low predictive ability. However, we find that our approach gives better forecasts than just using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short term spread and the volatility for the five years to come. Portfolio resampling proves to be useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.

Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization


Acknowledgments

First of all we would like to thank risklab GmbH for giving us the opportunity to write this thesis. It has been a truly rewarding experience. We are grateful for the many inspirational discussions with Wolfgang Mader, our supervisor at risklab. He has also provided us with valuable comments and suggestions. We thank our supervisor at LiTH, Jörgen Blomvall, for his continuous support and feedback. Finally we would like to acknowledge our opponent Tobias Törnfeldt for his helpful comments.

Johan Bjurgert
Marcus Edstrand

Munich, April 2008

Contents

1 Introduction
  1.1 Objectives
  1.2 Problem definition
  1.3 Limitations
  1.4 Contributions
  1.5 Outline

I Equity Premium Forecasting using Bayesian Statistics

2 The Equity Premium
  2.1 What is the equity premium?
  2.2 Historical models
  2.3 Implied models
  2.4 Conditional models
  2.5 Multi factor models
  2.6 A short summary of the models
  2.7 What is a good model?
  2.8 Chosen model

3 Linear Regression Models
  3.1 Basic definitions
  3.2 The classical regression assumptions
  3.3 Robustness of OLS estimates
  3.4 Testing the regression assumptions

4 Bayesian Statistics
  4.1 Basic definitions
  4.2 Sufficient statistics
  4.3 Choice of prior
  4.4 Marginalization
  4.5 Bayesian model averaging
  4.6 Using BMA on linear regression models

5 The Data Set and Linear Prediction
  5.1 Chosen series
  5.2 The historical equity premium
  5.3 Factors explaining the equity premium
  5.4 Testing the assumptions of linear regression
  5.5 Forecasting by linear regression

6 Implementation
  6.1 Overview
  6.2 Linear prediction
  6.3 Bayesian model averaging
  6.4 Backtesting

7 Results
  7.1 Univariate forecasting
  7.2 Multivariate forecasting
  7.3 Results from the backtest

8 Discussion of the Forecasting

II Using the Equity Premium in Asset Allocation

9 Portfolio Optimization
  9.1 Solution of the Markowitz problem
  9.2 Estimation error in Markowitz portfolios
  9.3 The method of portfolio resampling
  9.4 An example of portfolio resampling
  9.5 Discussion of portfolio resampling

10 Backtesting Portfolio Performance
  10.1 Backtesting setup and results

11 Conclusions

Bibliography

A Mathematical Preliminaries
  A.1 Statistical definitions
  A.2 Statistical distributions

B Code
  B.1 Univariate predictions
  B.2 Multivariate predictions
  B.3 Merge time series
  B.4 Load data into Matlab from Excel
  B.5 Permutations
  B.6 Removal of outliers and linear prediction
  B.7 setSubColumn
  B.8 Portfolio resampling
  B.9 Quadratic optimization

List of Figures

3.1 OLS by means of projection
3.2 The effect of outliers
3.3 Example of a Q-Q plot
4.1 Bayesian revising of probabilities
5.1 The historical equity premium over time
5.2 Shapes of the yield curve
5.3 QQ-Plot of the one step lagged residuals for factors 1-9
5.4 QQ-Plot of the one step lagged residuals for factors 10-18
5.5 Lagged factors 1-9 versus returns on the equity premium
5.6 Lagged factors 10-18 versus returns on the equity premium
6.1 Flowchart
6.2 User interface
7.1 The equity premium from the univariate forecasts
7.2 Likelihood function values for different g-values
7.3 The equity premium from the multivariate forecasts
7.4 Backtest of univariate models
7.5 Backtest of multivariate models
9.1 Comparison of efficient and resampled frontier
9.2 Resampled portfolio allocation when shorting allowed
9.3 Resampled portfolio allocation when no shorting allowed
9.4 Comparison of estimation error in mean and covariance
10.1 Portfolio value over time using different strategies

List of Tables

2.1 Advantages and disadvantages of discussed models
3.1 Critical values for the Durbin-Watson test
5.1 The data set and sources
5.2 Basic statistics for the factors
5.3 Outliers identified by the leverage measure
5.4 Jarque-Bera test of normality
5.5 Durbin-Watson test of autocorrelation
5.6 Principle of lagging time series for forecasting
5.7 Lagged $R^2$ for univariate regression
7.1 Forecasting statistics in percent
7.2 The univariate model with highest probability over time
7.3 Out of sample $R^2_{os,uni}$ and hit ratios $HR_{uni}$
7.4 Forecasting statistics in percent
7.5 The multivariate model with highest probability over time
7.6 Forecasts for different g-values
7.7 Out of sample $R^2_{os,mv}$ and hit ratios $HR_{mv}$
9.1 Input parameters for portfolio resampling
10.1 Portfolio returns over time
10.2 Terminal portfolio value

Nomenclature

The most frequently used symbols and abbreviations are described here.

Symbols

$\bar{\mu}$  Demanded portfolio return
$\beta_{i,t}$  Beta for asset $i$ at time $t$
$\beta_t$  True least squares parameter at time $t$
$\mu$  Asset return vector
$\Omega_t$  Information set at time $t$
$\Sigma$  Estimated covariance matrix
$\operatorname{cov}[X]$  Covariance of the random variable $X$
$\hat{\beta}_t$  Least squares estimate at time $t$
$\hat{\Sigma}$  Sampled covariance matrix
$\hat{u}_t$  Least squares sample residual at time $t$
$\lambda_{m,t}$  Market $m$ price of risk at time $t$
$C$  Covariance matrix
$I_n$  The identity matrix of size $n \times n$
$w$  Weights of assets
$\operatorname{tr}[X]$  The trace of the matrix $X$
$\operatorname{var}[X]$  Variance of the random variable $X$
$D_{i,t}$  Dividend for asset $i$ at time $t$
$E[X]$  Expected value of the random variable $X$
$r_{f,t}$  Riskfree rate at time $t$ to $t+1$
$r_{m,t}$  Return from asset $m$ at time $t$
$u_t$  Population residual in the least squares model at time $t$

Abbreviations

aHEP  Average historical equity premium
BMA  Bayesian model averaging
DJIA  Dow Jones industrial average
EEP  Expected equity premium
GDP  Gross domestic product
HEP  Historical equity premium
IEP  Implied equity premium
OLS  Ordinary least squares
REP  Required equity premium

Chapter 1

Introduction

The expected equity risk premium is one of the single most important economic variables. A meaningful estimate of the premium is critical to valuing companies and stocks and for planning future investments. However, the only premium that can be observed is the historical premium.

Since the equity premium is shaped by overall market conditions, factors influencing market conditions can be used to explain the equity premium. Although predictive power is usually low, the factors can also be used for forecasting. Many of the investigations undertaken typically set out to determine a best model, consisting of a set of economic predictors, and then proceed as if the selected model had generated the equity premium. Such an approach ignores the uncertainty in model selection, leading to overconfident inferences that are riskier than they appear. In our thesis we will forecast the equity premium by computing a weighted average of a large number of linear prediction models using Bayesian model averaging (BMA), so that model uncertainty is taken into account.

Having forecasted the equity premium, the key input for asset allocation optimization models, we conclude by highlighting the main pitfalls in the mean-variance optimization framework and present portfolio resampling as a way to arrive at suitable allocation decisions when the input parameters are very uncertain.

1.1 Objectives

The objective of this thesis is to build a framework for forecasting the equity premium and then implement it to produce a functional tool for practical use. Further, the impact of uncertain input parameters in mean-variance optimization shall be investigated.

1.2 Problem definition

By means of BMA and linear prediction, what is the expected equity premium for the years to come and how is it best used as an input in a mean-variance optimization problem?

1.3 Limitations

The practical part of this thesis is limited to the use of US time series only. However, the theoretical framework is valid for all economies.

1.4 Contributions

To the best knowledge of the authors, this is the first attempt to forecast the equity premium using Bayesian model averaging with the priors specified later in the thesis.

1.5 Outline

The first part of the thesis is about forecasting the equity premium whereas the second part discusses the importance of parameter uncertainty in portfolio optimization.

In chapter 2 we present the concept of the equity premium, usual assumptions thereof and associated models. Chapter 3 describes the fundamental ideas of linear regression and its limitations. In chapter 4 we first present basic concepts of Bayesian statistics and then use them to combine the properties of linear prediction with Bayesian model averaging. Having defined the forecasting approach, we turn in chapter 5 to the factors explaining the equity premium. Chapter 6 addresses the implementation of the theory. Finally, chapter 7 presents our results and a discussion thereof is found in chapter 8. In chapter 9 we investigate the impact of estimation error on portfolio optimization. In chapter 10 we evaluate the performance of a portfolio when using the forecasted equity premium and portfolio resampling. With chapter 11 we conclude our thesis and make suggestions for future investigations and work.

Part I

Equity Premium Forecasting using Bayesian Statistics

Chapter 2

The Equity Premium

In this chapter we define the concept of the equity premium and present some models that have been used for estimating the premium. At the end of the chapter, a table summing up advantages and disadvantages of the different models is provided. The chapter concludes with a motivation for why we have chosen to work with multi factor models and a summary of criteria for a good model.

2.1 What is the equity premium?

As defined by Fernández [32], the equity premium can be split up into four different concepts. These concepts hold for single stocks as well as for stock indices. In our thesis the emphasis is on stock indices.

• historical equity premium (HEP): historical return of the stock market over the riskfree asset

• expected equity premium (EEP): expected return of the stock market over the riskfree asset

• required equity premium (REP): incremental return of the market portfolio over the riskfree rate required by an investor in order to hold the market portfolio, or the extra return that the overall stock market must provide over the riskfree asset to compensate for the extra risk

• implied equity premium (IEP): the required equity premium that arises from a pricing model and from assuming that the market price is correct.

The HEP is observable on the financial market and is equal for all investors.¹ It is calculated by

\[ \mathrm{HEP}_t = r_{m,t} - r_{f,t-1} = \left( \frac{P_t}{P_{t-1}} - 1 \right) - r_{f,t-1} \tag{2.1} \]

where $r_{m,t}$ is the return on the stock market, $r_{f,t-1}$ is the rate on a riskfree asset from $t-1$ to $t$, and $P_t$ is the stock index level.

¹ This is true as long as they use the same instruments and the same time resolution.
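As a small illustration of (2.1), the following Python sketch computes an HEP series from an index level series and a riskfree rate series. The function name `historical_equity_premium`, the list-based inputs and the toy numbers are assumptions made for this example only and are not taken from the thesis data set. The alignment convention (the rate stored at position t−1 covers the period from t−1 to t) simply mirrors the definition above.

```python
def historical_equity_premium(index_levels, riskfree_rates):
    """Return HEP_t = (P_t / P_{t-1} - 1) - r_{f,t-1} for t = 1, ..., n - 1.

    index_levels[t] is the index level P_t; riskfree_rates[t] is the
    riskfree rate covering the period from t to t + 1 (hypothetical layout).
    """
    if len(riskfree_rates) < len(index_levels) - 1:
        raise ValueError("need one riskfree rate per return period")
    premiums = []
    for t in range(1, len(index_levels)):
        market_return = index_levels[t] / index_levels[t - 1] - 1.0
        premiums.append(market_return - riskfree_rates[t - 1])
    return premiums


# Toy data (made up): three index levels and two period rates.
levels = [100.0, 110.0, 104.5]
rates = [0.04, 0.03]
hep = historical_equity_premium(levels, rates)
print(hep)
```

Which index and which riskfree instrument enter the calculation remains the open modelling choice discussed in the text; the sketch only fixes the arithmetic.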

A widely used measure for $r_{m,t}$ is the return on a large stock index. For the second asset $r_{f,t-1}$ in (2.1), the return on government securities is usually used. Some practitioners use the return on short-term treasury bills; some use the returns on long-term government bonds. Yields on bonds instead of returns have also been used to some extent. Despite the indisputable importance of the equity premium, a general consensus on exactly which assets should enter expression (2.1) does not exist. Questions like "Which stock index should be used?" and "Which riskfree instrument should be used and which maturity should it have?" remain unanswered.

The EEP is made up of the market's expectations of future returns over a riskfree asset and is therefore not observable in the financial market. Its magnitude and the most appropriate way to produce estimates thereof are intensively debated topics among economists. The market expectations shaping the premium are based on, at least, a non-negative premium and to some extent also average realizations of the HEP. This would mean that there is a relation between the EEP and the HEP. Some authors (e.g. [9], [21], [37] and [42]) even argue that there is a strict equality between the two, whereas others claim that the EEP is smaller than the HEP (e.g. [45], [6] and [22]). Although investors have different opinions on the correct level of the expected equity premium, many basic financial books recommend using 5-8%.²

The required equity premium (REP) is important in valuation since it is the key to determining the company's required return on equity.

If one believes that prices on the financial markets are correct, then the implied equity premium (IEP) would be an estimate of the expected equity premium (EEP).

We now turn to presenting models being used to produce estimates of the different concepts.

2.2 Historical models

Probably the method most used by practitioners is to take the historical realized equity premium as a proxy for the expected equity premium [64]. They thereby implicitly assume the relationship HEP = EEP.

The assumption that the historical equity premium equals the expected equity premium can be formulated as

\[ r_{m,t} = E_{t-1}[r_{m,t}] + e_{m,t} \tag{2.2} \]

where $e_{m,t}$ is the error term, the unexpected return. The expectation is often computed as the arithmetic average of all available values for the HEP. In equation (2.2), it is assumed that the errors are independent and have a mean of zero. The model then implies that investors are rational and the random error term corresponds to their mistakes. It is also possible to model more advanced errors. For example, an autoregressive error term might be motivated since market returns sometimes exhibit positive autocorrelation. An AR(1) model then implies that investors need one time step to learn about their mistakes. [64]

The model has the advantages of being intuitive and easy to use. The drawbacks, on the other hand, are several. Apart from the usual problems with time series, such as the choice of sample length and outliers, the model has difficulty with longer periods in which the riskfree asset has a higher average return than equity. Clearly, this is not plausible since an investor expects a positive return in order to invest.

² See for instance [8].

2.3 Implied models

Implied models for the equity premium make use of the assumption EEP = IEP and are used in much the same way as investors use the Black and Scholes formula backwards to solve for implied volatility. The advantage of implied models is that they provide time-varying estimates for the expected market returns since prices and expectations change over time. The main drawback is that the validity is bounded by the validity of the model used. Lately, the inverse Black-Litterman model has attracted interest, see for instance [67]. Another, more widely used model is the Gordon dividend growth model, which is further discussed in [11]. Under certain assumptions it can be written as

\[ P_{i,t} = \frac{E[D_{i,t+1}]}{E[r_{i,t+1}] - E[g_{i,t+1}]} \tag{2.3} \]

where $E[D_{i,t+1}]$ is next year's expected dividend, $E[r_{i,t+1}]$ the required rate of return and $E[g_{i,t+1}]$ the company's expected growth rate of dividends from today until infinity.

Assuming that the CAPM³ holds, the required rate of return for stock $i$ can be written as

\[ E[r_{i,t}] = r_{f,t} + \beta_{i,t} E[r_{m,t} - r_{f,t}] \tag{2.4} \]

By combining the two equations, where dividends are approximated as $E[D_{i,t+1}] = (1 + E[g_{i,t+1}]) D_{i,t}$, under the assumption that $E[r_{f,t+1}] = r_{f,t+1}$, and by aggregating over all assets, we can now solve for the expected market risk premium

\[ E[r_{m,t+1}] = (1 + E[g_{m,t+1}]) \frac{D_{m,t}}{P_{m,t}} + E[g_{m,t+1}] = (1 + E[g_{m,t+1}]) \, \mathrm{DivYield}_{m,t} + E[g_{m,t+1}] \tag{2.5} \]

where $E[r_{m,t+1}]$ is the expected market risk premium, $D_{m,t}$ is the sum of dividends from all companies, $E[g_{m,t+1}]$ is the expected growth rate of the dividends from today to infinity⁴, and $\mathrm{DivYield}_{m,t}$ is the current market price dividend yield. [64]

³ Capital asset pricing model, see [7].

One criticism of the Gordon dividend growth model is that the result depends heavily on the number used for the expected dividend growth rate, so that the problem is merely shifted to forecasting the expected dividend growth rate.
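The sensitivity to the growth assumption can be seen by evaluating (2.5) for a few growth rates. The sketch below is a minimal Python illustration; the function name `implied_market_return`, the 2% dividend yield and the three growth scenarios are hypothetical values chosen for this example and do not come from the thesis.

```python
def implied_market_return(dividend_yield, expected_growth):
    """Evaluate (2.5): (1 + E[g]) * DivYield + E[g]."""
    return (1.0 + expected_growth) * dividend_yield + expected_growth


# Hypothetical inputs: a 2% current dividend yield and three growth scenarios.
dividend_yield = 0.02
scenarios = [0.03, 0.05, 0.07]
implied = {g: implied_market_return(dividend_yield, g) for g in scenarios}

for g in scenarios:
    print(f"growth {g:.2%} -> implied estimate {implied[g]:.2%}")

# Consistency check against (2.3): with D_t = 2 and P_t = 100 the price is
# recovered from next year's dividend, the implied return and the growth rate.
g = 0.05
recovered_price = (1.0 + g) * 2.0 / (implied[g] - g)
```

In this toy setting the implied figure moves almost one for one with the assumed growth rate, which is the dependence the criticism above points at.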

2.4 Conditional models

Conditional models condition on the information investors use to estimate the risk premium and thereby allow for time-varying estimates. On the other hand, the information set $\Omega_t$ used by investors is not observable on the market, and it is not clear how to specify the method that investors use to form their expectations from the data set.

As an example of such a model, the conditional version of the CAPM implies the following restriction for the excess returns

\[ E[r_{i,t} \mid \Omega_{t-1}] = \beta_{i,t} E[r_{m,t} \mid \Omega_{t-1}] \tag{2.6} \]

where the market beta is

\[ \beta_{i,t} = \frac{\operatorname{cov}[r_{i,t}, r_{m,t} \mid \Omega_{t-1}]}{\operatorname{var}[r_{m,t} \mid \Omega_{t-1}]} \tag{2.7} \]

and $E[r_{i,t} \mid \Omega_{t-1}]$ and $E[r_{m,t} \mid \Omega_{t-1}]$ are expected returns on asset $i$ and the market portfolio conditional on investors' information set $\Omega_{t-1}$.⁵

Observing that the ratio $E[r_{m,t} \mid \Omega_{t-1}] / \operatorname{var}[r_{m,t} \mid \Omega_{t-1}]$ is the market price of risk $\lambda_{m,t}$, measuring the compensation an investor must receive for a unit increase in the market return variance [55], yields the following expression for the market portfolio's expected excess returns

\[ E[r_{m,t} \mid \Omega_{t-1}] = \lambda_{m,t}(\Omega_{t-1}) \operatorname{var}[r_{m,t} \mid \Omega_{t-1}]. \tag{2.8} \]

By specifying a model for the conditional variance process, the equity premium can be estimated.
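To make this last step concrete, the sketch below combines (2.8) with one possible variance model. The exponentially weighted moving average (EWMA) recursion, the decay value, the constant price of risk and the toy excess returns are all assumptions chosen for illustration; the text does not prescribe a particular variance process, and any other conditional variance model could take its place.

```python
def ewma_variance(excess_returns, decay):
    """Conditional variance from an EWMA recursion (an assumed model).

    The recursion starts from the first squared excess return and then
    updates var_t = decay * var_{t-1} + (1 - decay) * r_t ** 2.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    variance = excess_returns[0] ** 2
    for r in excess_returns[1:]:
        variance = decay * variance + (1.0 - decay) * r ** 2
    return variance


def conditional_premium(price_of_risk, conditional_variance):
    """Evaluate (2.8): expected excess market return = lambda * variance."""
    return price_of_risk * conditional_variance


# Toy inputs (made up): three excess returns and a constant price of risk.
excess_returns = [0.02, -0.01, 0.03]
variance = ewma_variance(excess_returns, decay=0.5)
premium = conditional_premium(price_of_risk=2.5, conditional_variance=variance)
print(variance, premium)
```

Treating the price of risk as a known constant is a strong simplification; in practice it would itself have to be estimated from data.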

⁴ $E[r_{m,t+1}] > E[g_{m,t+1}]$.

⁵ Both returns are in excess of the riskless rate of return $r_{f,t-1}$ and all returns are measured in one numeraire currency.

2.5 Multi factor models

Multi factor models make use of correlation between equity returns and returns from other economic factors. By choosing a set of economic factors and by determining the coefficients, the equity premium can be estimated as

\[ r_{m,t} = \alpha_t + \sum_j \beta_{j,t} X_{j,t} + \varepsilon_t \tag{2.9} \]

where the coefficients $\alpha$ and $\beta$ are usually calculated using the least squares method (OLS), $X$ contains the factors and $\varepsilon$ is the error.

The most prominent candidates of economic factors used as explanatory variables are the dividend to price ratio and the dividend yield (e.g. [60], [12], [28], [40] and [51]), the earnings to price ratio (e.g. [13], [14] and [48]), the book to market ratio (e.g. [46] and [58]), short term interest rates (e.g. [40] and [1]), yield spreads (e.g. [43], [15] and [29]), and more recently the consumption-wealth ratio (e.g. [50]). Other candidates are dividend payout ratios, corporate or net issuing ratios and beta premia (e.g. [37]), the term spread and the default spread (e.g. [2], [15], [29] and [43]), the inflation rate (e.g. [30], [27] and [19]), value of high and low beta stocks (e.g. [57]) and aggregate financing activity (e.g. [3]).

Goyal and Welch [37] showed that most of the mentioned predictors performed worse out-of-sample than just assuming that the equity premium had been constant. They also found that the predictors were not stable, that is, their importance changes over time. Campbell and Thompson [16], on the other hand, found that some of the predictors with significant forecasting power in-sample generally have better out-of-sample forecasting power than a forecast based on the historical average.


2.6 A short summary of the models

| Model type | Advantages | Disadvantages |
|---|---|---|
| Historical | Intuitive and easy to use | Might have problems with longer periods of negative equity premium; doubtful whether the past is an indicator for the future |
| Implied | Relatively simple to use; provides time-varying estimates for the premium | The validity of the estimates is bounded by the validity of the model used; assumes market prices are correct |
| Conditional | Provides time-varying estimates for the premium | The information used by investors is not visible on the market; models for determining how investors form their expectations from the information are not unambiguous |
| Multi factor | High model transparency and results are easy to interpret | It is doubtful whether the past is an indicator for the future; forecasts are only possible for a short time horizon, due to lagging |

Table 2.1. Advantages and disadvantages of the discussed models

2.7 What is a good model?

These are model criteria that the authors, inspired by Vaihekoski [64], consider important for a good estimate of the equity premium:

Economic reasoning criteria

• The premium estimate should be positive most of the time

• Model inputs should be visible on the financial markets

• The estimated premium should be rather smooth over time because investor preferences presumably do not change much over time

• The model should provide different premium estimates for different time horizons, that is, take investors' "time structure" into account

Technical reasoning criteria

• The model should allow for time variation in the premium

• The model should make use of the latest time $t$ observation

• The model should provide a measure of the precision of the estimated premium

• It should be possible to use different time resolutions in the data input

2.8 Chosen model

All model categories previously stated are likely to be useful in estimating the equity premium. In our thesis we have chosen to work with multi factor models because they are intuitively more straightforward than both implied and conditional models; all model inputs are visible on the market and it is perfectly clear from the model how different factors add up to the equity premium. Furthermore, it is easy to add constraints to the model, which enables the use of economic reasoning as a complement to pure statistical analysis.

Chapter 3

Linear Regression Models

First we summarize the mechanics of linear regression and present some formulas that hold regardless of which statistical assumptions are made. Then we discuss different statistical assumptions about the properties of the model and the robustness of the estimates.

3.1 Basic definitions

Suppose that a scalar $y_t$ is related to a vector $x_t \in \mathbb{R}^{k \times 1}$ and a noise term $u_t$ according to the regression model

\[ y_t = x_t^\top \beta + u_t. \tag{3.1} \]

Definition 3.1 (Ordinary least squares, OLS) Given an observed sample $(y_1, y_2, \ldots, y_T)$, the ordinary least squares estimate of $\beta$ (denoted $\hat{\beta}$) is the value that minimizes the residual sum of squares

\[ V(\beta) = \sum_{t=1}^{T} \varepsilon_t^2(\beta) = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} (y_t - x_t^\top \beta)^2 \]

(see [38]).

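Theorem 3.1 (Ordinary least squares estimate) The OLS estimate is given by

\[ \hat{\beta} = \left[ \sum_{t=1}^{T} x_t x_t^\top \right]^{-1} \left[ \sum_{t=1}^{T} x_t y_t \right] \tag{3.2} \]

assuming that the matrix $\sum_{t=1}^{T} x_t x_t^\top \in \mathbb{R}^{k \times k}$ is nonsingular (see [38]).

Proof: The result is found by differentiation,

\[ \frac{dV(\beta)}{d\beta} = -2 \sum_{t=1}^{T} x_t (y_t - x_t^\top \beta) = 0, \]

and the minimizing argument is thus

\[ \hat{\beta} = \left[ \sum_{t=1}^{T} x_t x_t^\top \right]^{-1} \left[ \sum_{t=1}^{T} x_t y_t \right]. \]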

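Often, the regression model is written in matrix notation as

\[ y = X\beta + u, \tag{3.3} \]

where

\[ y \equiv \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X \equiv \begin{pmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{pmatrix}, \qquad u \equiv \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}. \]

A perhaps more intuitive way to arrive at equation (3.2) is to project $y$ on the column space of $X$.

Figure 3.1. OLS by means of projection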

The vector of the OLS sample residuals, u can then be written as u = y −Xβ.Consequently the loss function V (β) for the least squares problem can be written

V (β) = minβ(u>u).

Since y, the projection of y on the column space of X, is orthogonal to u

u>y = y>u = 0. (3.4)

In the same way, the OLS sample residuals are orthogonal to the explanatoryvariables in X

u>X = 0. (3.5)

Now, substituting $\hat y = X\hat\beta$ into (3.4) yields

$$(X\hat\beta)^\top (y - X\hat\beta) = 0 \;\Leftrightarrow\; \hat\beta^\top (X^\top y - X^\top X\hat\beta) = 0.$$

We choose the nontrivial solution for $\hat\beta$. If $X$ is of full rank, then the matrix $X^\top X$ is also of full rank, and we can compute the least squares estimator by inverting $X^\top X$:

$$\hat\beta = (X^\top X)^{-1}X^\top y. \tag{3.6}$$

The OLS sample residual $\hat u$ shall not be confused with the population residual $u$. The vector of OLS sample residuals can be written as

$$\hat u = y - X\hat\beta = y - X(X^\top X)^{-1}X^\top y = [I_n - X(X^\top X)^{-1}X^\top]\,y = M_X y. \tag{3.7}$$

The relationship between the two errors can now be found by substituting equation (3.3) into equation (3.7)

$$\hat u = M_X(X\beta + u) = M_X u. \tag{3.8}$$

The difference between the OLS estimate $\hat\beta$ and the true parameter $\beta$ is found by substituting equation (3.3) into (3.6)

$$\hat\beta = (X^\top X)^{-1}X^\top [X\beta + u] = \beta + (X^\top X)^{-1}X^\top u. \tag{3.9}$$
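The normal equations (3.6) and the orthogonality condition (3.5) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `solve` and `ols` and the toy data are assumptions made for illustration and are not part of the thesis. In practice a numerical library routine such as `numpy.linalg.lstsq` would normally be preferred over a hand-written solver.

```python
def solve(a, b):
    """Solve the square system a z = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Return (beta_hat, residuals) from the normal equations X'X beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta_hat = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta_hat, residuals


# Toy data (made up): an intercept column plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 4.0, 7.0]
beta_hat, u_hat = ols(X, y)

# Orthogonality (3.5): every column of X is orthogonal to the residual vector.
xtu = [sum(row[j] * u for row, u in zip(X, u_hat)) for j in range(len(X[0]))]
print(beta_hat, xtu)
```

Because the first column of `X` is the intercept, the first entry of `xtu` being zero is the statement that the fitted residuals sum to zero.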

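Definition 3.2 (Coefficient of determination) The coefficient of determination, $R^2$, is defined as the fraction of variance that is explained by the model

$$R^2 = \frac{\operatorname{var}[\hat y]}{\operatorname{var}[y]}.$$

If we let $X$ include an intercept, then (3.5) also implies that the fitted residuals have a zero mean, $\frac{1}{n}\sum_{i=1}^{n} \hat u_i = 0$. Now we can decompose the variance of $y$ into the variance of $\hat y$ and $\hat u$

$$\operatorname{var}[y] = \operatorname{var}[\hat y + \hat u] = \operatorname{var}[\hat y] + \operatorname{var}[\hat u] + 2\operatorname{cov}[\hat y, \hat u].$$

Rewriting the covariance as

$$\operatorname{cov}[\hat y, \hat u] = E[\hat y \hat u] - E[\hat y]E[\hat u]$$

and by using $\hat y \perp \hat u$ and $E[\hat u] = 0$, the covariance term vanishes and we can write $R^2$ as

$$R^2 = \frac{\operatorname{var}[\hat y]}{\operatorname{var}[y]} = 1 - \frac{\operatorname{var}[\hat u]}{\operatorname{var}[y]}.$$

Since OLS minimizes the sum of squared fitted errors, which is proportional to $\operatorname{var}[\hat u]$, it also maximizes $R^2$.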

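By substituting the estimated variances, $R^2$ can be written as

$$\frac{\operatorname{var}[\hat y]}{\operatorname{var}[y]} = \frac{\frac{1}{n}\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\sum_{i=1}^{n} \hat y_i^2 - n\bar y^2}{\sum_{i=1}^{n} y_i^2 - n\bar y^2} = \frac{(X\hat\beta)^\top (X\hat\beta) - n\bar y^2}{y^\top y - n\bar y^2} = \frac{y^\top X(X^\top X)^{-1}X^\top y - n\bar y^2}{y^\top y - n\bar y^2}$$

where the identity used is calculated as

$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar x)^2 &= \sum_{i=1}^{n} \Big[x_i^2 - \frac{2}{n}\,x_i \sum_{j=1}^{n} x_j + \frac{1}{n^2}\Big(\sum_{j=1}^{n} x_j\Big)^2\Big] \\
&= \sum_{i=1}^{n} x_i^2 - \frac{2}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2 + \frac{n}{n^2}\Big(\sum_{i=1}^{n} x_i\Big)^2 \\
&= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar x^2.
\end{aligned}$$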

3.2 The classical regression assumptions

The following assumptions¹ are used for later calculations

1. $x_t$ is a vector of deterministic variables

2. $u_t$ is i.i.d. with mean 0 and variance $\sigma^2$ ($E[u] = 0$ and $E[uu^\top] = \sigma^2 I_n$)

3. $u_t$ is Gaussian $(0, \sigma^2)$

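Substituting equation (3.3) into equation (3.6) and taking expectations using assumptions 1 and 2 establishes that $\hat\beta$ is unbiased,

$$\hat\beta = (X^\top X)^{-1}X^\top [X\beta + u] = \beta + (X^\top X)^{-1}X^\top u \tag{3.10}$$

$$E[\hat\beta] = \beta + (X^\top X)^{-1}X^\top E[u] = \beta \tag{3.11}$$

with covariance matrix given by

$$\begin{aligned}
E[(\hat\beta - \beta)(\hat\beta - \beta)^\top] &= E[(X^\top X)^{-1}X^\top u u^\top X(X^\top X)^{-1}] \\
&= (X^\top X)^{-1}X^\top E[uu^\top] X(X^\top X)^{-1} \\
&= \sigma^2 (X^\top X)^{-1}X^\top X(X^\top X)^{-1} \\
&= \sigma^2 (X^\top X)^{-1}.
\end{aligned} \tag{3.12}$$

When $u$ is Gaussian, the above calculations imply that $\hat\beta$ is Gaussian. Hence, the preceding results imply

$$\hat\beta \sim N(\beta, \sigma^2 (X^\top X)^{-1}).$$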

It can further be shown that under assumptions 1, 2 and 3, $\hat\beta$ is BLUE², that is, no unbiased estimator of $\beta$ is more efficient than the OLS estimator $\hat\beta$.

¹ As treated in [38].
² BLUE, best linear unbiased estimator; see the Gauss-Markov theorem.


3.3 Robustness of OLS estimates

The most serious problem with OLS is non-robustness to outliers. One single bad point will have a strong influence on the solution. To remedy this one can discard the worst fitting data point and recompute the OLS fit. In figure 3.2, the black line illustrates the result of discarding an outlier. Deleting an extreme point can be justified by arguing that outliers occur seldom, which makes them practically unpredictable, and that the deletion would therefore make the predictive power stronger. Sometimes extreme points correspond to extraordinary changes in economies, and depending on context it might be more or less justified to discard them.

Figure 3.2. The effect of outliers

Because outliers do not necessarily get a large residual, they might be easy to overlook. A good measure for the influence of a data point is its leverage.

Definition 3.3 (Leverage) To compute leverage in ordinary least squares, the hat matrix $H$ is given by $H = X(X^\top X)^{-1}X^\top$, where $X \in \mathbb{R}^{n\times p}$ and $n \ge p$.

Since $\hat y = X\hat\beta = X(X^\top X)^{-1}X^\top y = Hy$, the leverage measures how strongly an observation influences its own predicted value. The diagonal elements $h_{ii}$ of $H$ contain the leverage measures and are not influenced by $y$. A rule of thumb [39] for detecting outliers is that $h_{ii} > 2(p+1)/n$ signals a high leverage point, where $p$ is the number of columns in the predictor matrix $X$ aside from the intercept and $n$ is the number of observations.
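The rule of thumb can be sketched for the special case of one regressor plus an intercept, where the diagonal of the hat matrix has the standard closed form $h_{ii} = 1/n + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$. This closed form is not stated in the text; it is used here only to avoid matrix inversion. The function names and the data are made up for illustration.

```python
def leverages(x):
    """Diagonal of the hat matrix for a regression on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage_points(x, p=1):
    """Indices whose leverage exceeds the rule-of-thumb threshold 2(p+1)/n."""
    threshold = 2.0 * (p + 1) / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > threshold]


# Made-up regressor values; the last observation lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)
flagged = high_leverage_points(x)
print(h, flagged)
```

Note that the leverages depend only on `x`, not on the dependent variable, and that they sum to the number of fitted coefficients (here two), which is the trace of $H$.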

3.4 Testing the regression assumptions

Unfortunately assumption 2 can easily be violated for time series data since many time series exhibit autocorrelation, resulting in the OLS estimates being inefficient, that is, they have higher variability than they should.

Definition 3.4 (Autocorrelation function) The jth autocorrelation of a covariance stationary process³, denoted $\rho_j$, is defined as its jth autocovariance divided by the variance

$$\rho_j \equiv \frac{\gamma_j}{\gamma_0}, \quad \text{where } \gamma_j = E[(Y_t - \mu)(Y_{t-j} - \mu)]. \tag{3.13}$$

Since $\rho_j$ is a correlation, $|\rho_j| \le 1$ for all $j$. Note also that $\rho_0$ equals unity for all covariance stationary processes.

A natural estimate of the autocorrelation $\rho_j$ is provided by the corresponding sample moments

$$\hat\rho_j \equiv \frac{\hat\gamma_j}{\hat\gamma_0}, \quad \text{where } \hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T} (Y_t - \bar y)(Y_{t-j} - \bar y), \quad j = 0, 1, 2, \dots, T-1,$$

$$\bar y = \frac{1}{T}\sum_{t=1}^{T} Y_t.$$
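The sample autocorrelation defined above translates directly into code. The following is a minimal sketch; the function name and the short example series are assumptions made for illustration.

```python
def sample_autocorrelation(series, j):
    """Sample autocorrelation at lag j, using the 1/T-normalised autocovariances."""
    T = len(series)
    mean = sum(series) / T

    def autocovariance(lag):
        return sum((series[t] - mean) * (series[t - lag] - mean)
                   for t in range(lag, T)) / T

    return autocovariance(j) / autocovariance(0)


# Made-up trending series: neighbouring values are positively related.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_0 = sample_autocorrelation(series, 0)
rho_1 = sample_autocorrelation(series, 1)
print(rho_0, rho_1)
```

Both autocovariances are divided by the same $T$, so the normalisation cancels in the ratio and only the numerator's shorter summation range distinguishes lag $j$ from lag 0.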

Definition 3.5 (Durbin-Watson test) The Durbin-Watson test statistic is used to detect the presence of autocorrelation in the residuals from a regression analysis and is defined by

$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \tag{3.14}$$

where the $e_t$, $t = 1, 2, \dots, T$, are the regression analysis residuals.

The null hypothesis of the statistic is that there is no autocorrelation, that is $\rho = 0$, and the opposite hypothesis is that there is autocorrelation, $\rho \ne 0$. Durbin and Watson [23] derive lower and upper bounds for the critical values, see table 3.1.

ρ = 0    →  DW ≈ 2    No Correlation
ρ = 1    →  DW ≈ 0    Positive Correlation
ρ = −1   →  DW ≈ 4    Negative Correlation

Table 3.1. Critical values for the Durbin-Watson test.
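The statistic in (3.14) can be computed in a few lines. This is a sketch with made-up residual vectors chosen to show the two extremes in table 3.1; the function name is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals: perfectly persistent versus perfectly alternating.
dw_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(dw_persistent, dw_alternating)
```

For the alternating series the value is 3 rather than 4 because the numerator has only $T - 1$ terms; the limit 4 is approached as $T$ grows.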

³ For a definition of a covariance stationary process, see appendix A.1.


One way to check assumption 3 is to plot the quantiles of the sample distribution against those of the theoretical distribution. Such a plot, as in figure 3.3, is called a Q-Q plot.

Figure 3.3. Example of a Q-Q plot

For a more detailed analysis the Jarque-Bera test, a goodness of fit measure of departure from normality based on skewness and kurtosis, can be employed.

Definition 3.6 (Jarque-Bera test) The test statistic JB is defined as

$$JB = \frac{n}{6}\Big(S^2 + \frac{(K - 3)^2}{4}\Big) \tag{3.15}$$

where $n$ is the number of observations, $S$ is the sample skewness and $K$ is the sample kurtosis, defined as

$$S = \frac{\frac{1}{n}\sum_{k=1}^{n} (x_k - \bar x)^3}{\big(\frac{1}{n}\sum_{k=1}^{n} (x_k - \bar x)^2\big)^{3/2}}, \qquad K = \frac{\frac{1}{n}\sum_{k=1}^{n} (x_k - \bar x)^4}{\big(\frac{1}{n}\sum_{k=1}^{n} (x_k - \bar x)^2\big)^{2}}$$

where $\bar x$ is the sample mean.

Asymptotically $JB \sim \chi^2(2)$, which can be used to test the null hypothesis that data are from a normal distribution. The null hypothesis is a joint hypothesis of the skewness being 0 and the excess kurtosis being 0, since samples from a normal distribution have an expected skewness of 0 and an expected kurtosis of 3. The definition shows that any deviation from the expectations increases the JB statistic.
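A small sketch of the Jarque-Bera statistic follows. The function name and the sample are made up. The asymptotic p-value uses the fact that a $\chi^2(2)$ variable is exponentially distributed with mean 2, so its upper tail probability is $\exp(-x/2)$; with a sample as small as the one below the asymptotic approximation is of course not reliable and serves only to show the mechanics.

```python
import math


def jarque_bera(x):
    """Return (JB, skewness S, kurtosis K, asymptotic chi-square(2) p-value)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)              # upper tail of chi-square(2)
    return jb, S, K, p_value


# Made-up symmetric sample: zero skewness, kurtosis below 3.
jb, S, K, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, S, K, p_value)
```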

Chapter 4

Bayesian Statistics

First, we introduce fundamental concepts of Bayesian statistics and then we provide tools for calculating posterior densities which are crucial to our forecasting.

4.1 Basic definitions

Definition 4.1 (Prior and posterior) If $M_j$, $j \in J$, are considered models, then for any data $D$,
$p(M_j)$, $j \in J$, are called the prior probabilities of the $M_j$, $j \in J$,
$p(M_j|D)$, $j \in J$, are called the posterior probabilities of the $M_j$, $j \in J$,
where $p$ denotes probability distribution functions (see [5]).

Definition 4.2 (The likelihood function) Let $x = (x_1, \dots, x_n)$ be a random sample from a distribution $p(x; \theta)$ depending on an unknown parameter $\theta$ in the parameter space $A$. The function $l_x(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$ is called the likelihood function.

The likelihood function is then the probability that the values $x_1, \dots, x_n$ are in the random sample. Mind that the probability density is written as $p(x; \theta)$. This is to emphasize that $\theta$ is the underlying parameter and will not be written out explicitly in the sequel. Depending on context we will also refer to the likelihood function as $p(x|\theta)$ instead of $l_x(\theta)$.

Theorem 4.1 (Bayes's theorem) Let $p(y, \theta)$ denote the joint probability density function (pdf) for a random observation vector $y$ and a parameter vector $\theta$, also considered random. Then according to usual operations with pdf's, we have

$$p(y, \theta) = p(y|\theta)p(\theta) = p(\theta|y)p(y)$$

and thus

$$p(\theta|y) = \frac{p(\theta)p(y|\theta)}{p(y)} = \frac{p(\theta)p(y|\theta)}{\int_A p(y|\theta)p(\theta)\,d\theta} \tag{4.1}$$

with $p(y) \ne 0$. In the discrete case, the theorem is written as

$$p(\theta|y) = \frac{p(\theta)p(y|\theta)}{p(y)} = \frac{p(\theta)p(y|\theta)}{\sum_{i \in A} p(y|\theta_i)p(\theta_i)}. \tag{4.2}$$

The last expression can be written as follows

$$p(\theta|y) \propto p(\theta)p(y|\theta), \qquad \text{posterior pdf} \propto \text{prior pdf} \times \text{likelihood function}, \tag{4.3}$$

here, $p(y)$, the normalizing constant needed to obtain a proper distribution in $\theta$, is discarded and $\propto$ denotes proportionality. The use of the symbol $\propto$ is explained in the next section.

Figure 4.1 highlights the importance of Bayes's theorem and shows how the prior information enters the posterior pdf via the prior pdf, whereas all the sample information enters the posterior pdf via the likelihood function.

Figure 4.1. Bayesian revising of probabilities

Note that an important difference between Bayesian statistics and classical Fisher statistics is that the parameter vector $\theta$ is considered to be a stochastic variable rather than an unknown parameter.
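The discrete form (4.2) of Bayes's theorem can be illustrated with a short sketch. The setting (three candidate values for the success probability of a coin, a uniform prior, and three heads in four tosses) is a made-up example, and the function name is an assumption.

```python
def discrete_posterior(prior, likelihood):
    """Posterior over a finite parameter set: prior times likelihood, normalised."""
    unnormalised = {theta: prior[theta] * likelihood(theta) for theta in prior}
    evidence = sum(unnormalised.values())          # p(y), the normalising constant
    return {theta: w / evidence for theta, w in unnormalised.items()}


# Hypothetical example: theta is the probability of heads.
prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}
heads, tails = 3, 1
posterior = discrete_posterior(
    prior, lambda theta: theta ** heads * (1 - theta) ** tails)
print(posterior)
```

The binomial coefficient is omitted from the likelihood because it does not depend on $\theta$ and cancels in the normalisation, which is exactly the sense in which (4.3) only needs proportionality.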

4.2 Sufficient statistics

A sufficient statistic can be seen as a summary of the information in data, where redundant and uninteresting information has been removed.

Definition 4.3 (Sufficient statistics) A statistic $t(x)$ is sufficient for an underlying parameter $\theta$ precisely if the conditional probability distribution of the data $x$, given the statistic $t(x)$, is independent of the parameter $\theta$ (see [17]).

In short, the definition states that $\theta$ cannot give any further information about $x$ if $t(x)$ is sufficient for $\theta$, that is, $p(x|t, \theta) = p(x|t)$.

Neyman's factorization theorem provides a convenient characterization of a sufficient statistic.

Theorem 4.2 (Neyman's factorization theorem) A statistic $t$ is sufficient for $\theta$ given $y$ if and only if there are functions $f$ and $g$ such that

$$p(y|\theta) = f(t, \theta)\,g(y)$$

where $t = t(y)$ (see [49]).

Proof: For a proof see [49].

Here, $t(y)$ is the sufficient statistic and the function $f(t, \theta)$ relates the sufficient statistic to the parameter $\theta$, while $g(y)$ is a $\theta$-independent normalization factor of the pdf.

It turns out that many of the common statistical distributions have a similar form. This leads to the definition of the exponential family.

Definition 4.4 (The exponential family) A distribution is from the one-parameter exponential family if it can be put into the form

$$p(y|\theta) = g(y)\,h(\theta) \exp[t(y)\Psi(\theta)].$$

Equivalently, the likelihood of $n$ independent observations $y = (y_1, y_2, \dots, y_n)$ from this distribution is of the form

$$l_y(\theta) \propto h(\theta)^n \exp\Big[\sum_{i=1}^{n} t(y_i)\,\Psi(\theta)\Big],$$

and it then follows immediately from theorem 4.2 that $\sum t(y_i)$ is sufficient for $\theta$ given $y$.

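Example 4.1: Sufficient statistics for a Gaussian

For a sequence of independent Gaussian variables with unknown mean $\mu$

$$y_t = \mu + e_t \sim N(\mu, \sigma^2), \quad t = 1, 2, \dots, N$$

$$\begin{aligned}
p(y|\mu) &= \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big[-\frac{1}{2\sigma^2}(y_t - \mu)^2\Big] \\
&= \underbrace{\exp\Big[-\frac{1}{2\sigma^2}\Big(N\mu^2 - 2\mu\sum_{t=1}^{N} y_t\Big)\Big]}_{=f(t,\mu)} \; \underbrace{(2\pi\sigma^2)^{-N/2} \exp\Big[-\frac{1}{2\sigma^2}\sum_{t=1}^{N} y_t^2\Big]}_{=g(y)}
\end{aligned}$$

the sufficient statistic $t(y)$ is given by $t(y) = \sum y_t$.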
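The factorization in example 4.1 implies that two samples of equal length and equal sum have likelihoods that differ only by a factor that does not involve $\mu$. The sketch below checks this numerically; the samples, the fixed variance of one and the function name are assumptions made for illustration.

```python
import math


def gaussian_log_likelihood(sample, mu, sigma2=1.0):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample with known variance."""
    n = len(sample)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((y - mu) ** 2 for y in sample) / (2 * sigma2))


a = [1.0, 2.0, 3.0]
b = [0.0, 2.0, 4.0]   # same length and same sum as a, different values

# The difference of log-likelihoods is log g(a) - log g(b), free of mu.
differences = [gaussian_log_likelihood(a, mu) - gaussian_log_likelihood(b, mu)
               for mu in (-1.0, 0.0, 2.5, 5.0)]
print(differences)
```

Every entry equals $-(\sum a_t^2 - \sum b_t^2)/(2\sigma^2) = 3$, so observing the sum alone carries all the information about $\mu$.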


4.3 Choice of prior

Suppose our model $M$ of a set of data $y$ is parameterized by $\theta$. Our knowledge about $\theta$ before $y$ is measured (given) is quantified by the prior pdf, $p(\theta)$. After measuring $y$ the posterior pdf is available as $p(\theta|y) \propto p(y|\theta)p(\theta)$. It is clear that different assumptions about $p(\theta)$ lead to different inferences $p(\theta|y)$.

A good rule of thumb for prior selection is that your prior should represent the best knowledge available about the parameters before looking at data. For example, the number of scores in a football game cannot be less than zero and is less than 1000, which justifies setting your prior equal to zero outside this interval. In the case that one does not have any information, a good idea might be to use an uninformative prior.

Definition 4.5 (Jeffreys prior) Jeffreys prior $p_J(\theta)$ is defined as proportional to the square root of the determinant of the Fisher information matrix of $p(y|\theta)$

$$p_J(\theta) \propto |J(\theta|y)|^{\frac{1}{2}} \tag{4.4}$$

where

$$J(\theta|y)_{i,j} = -E_y\Big[\frac{\partial^2 \ln p(y|\theta)}{\partial\theta_i\,\partial\theta_j}\Big]. \tag{4.5}$$

The Fisher information is a way of measuring the amount of information that an observable random variable $y = (y_1, \dots, y_n)$ carries about a set of unknown parameters $\theta = (\theta_1, \dots, \theta_n)$. The notation $J(\theta|y)$ is used to make clear that the parameter vector $\theta$ is associated with the random variable $y$ and should not be thought of as conditioning. A perhaps more intuitive way¹ to write (4.5) is

$$J(\theta|y)_{i,j} = \operatorname{cov}_\theta\Big[\frac{\partial}{\partial\theta_i} \ln p(y|\theta),\; \frac{\partial}{\partial\theta_j} \ln p(y|\theta)\Big]. \tag{4.6}$$

Mind that the Fisher information is only defined under certain regularity conditions, which are further discussed in [24]. One might wonder why Jeffreys made his prior proportional to the square root of the determinant of the Fisher information matrix. There is a perfectly good reason for this: consider a transformation of the unknown parameters $\theta$ to $\psi(\theta)$; then if $K$ is the matrix $K_{ij} = \partial\theta_i/\partial\psi_j$,

$$J(\psi|y) = K J(\theta|y) K^\top$$

and hence the determinant of the information satisfies

$$|J(\psi|y)| = |J(\theta|y)|\,|K|^2.$$

Because $|K|$ is the Jacobian, and thus does not depend on $y$, it follows that

$$p_J(\theta) \propto |J(\theta|y)|^{\frac{1}{2}}$$

provides a scale-invariant prior, which is a highly desirable property for a reference prior. In Jeffreys' own words "any arbitrariness in the choice of parameters could make no difference to the results".

¹ Remember that $\operatorname{cov}[x, y] = E[(x - \mu_x)(y - \mu_y)]$.

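Example 4.2

Consider a random sample $y = (y_1, \dots, y_n) \sim N(\theta, \varphi)$, with mean $\theta$ known and variance $\varphi$ unknown. The Jeffreys prior $p_J(\varphi)$ for $\varphi$ is then computed as follows

$$\begin{aligned}
L(\varphi|y) = \ln p(y|\varphi) &= \ln\Big(\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\varphi}} \exp\Big[-\frac{(y_i - \theta)^2}{2\varphi}\Big]\Big) \\
&= \ln\Big(\Big(\frac{1}{\sqrt{2\pi\varphi}}\Big)^{n} \exp\Big[-\frac{1}{2\varphi}\sum_{i=1}^{n} (y_i - \theta)^2\Big]\Big) \\
&= -\frac{1}{2\varphi}\sum_{i=1}^{n} (y_i - \theta)^2 - \frac{n}{2}\ln\varphi + c
\end{aligned}$$

$$\Rightarrow\; \frac{\partial^2 L}{\partial\varphi^2} = -\frac{1}{\varphi^3}\sum_{i=1}^{n} (y_i - \theta)^2 + \frac{n}{2\varphi^2}$$

$$\Rightarrow\; -E\Big[\frac{\partial^2 L}{\partial\varphi^2}\Big] = \frac{1}{\varphi^3}E\Big[\sum_{i=1}^{n} (y_i - \theta)^2\Big] - \frac{n}{2\varphi^2} = \frac{1}{\varphi^3}(n\varphi) - \frac{n}{2\varphi^2} = \frac{n}{2\varphi^2}$$

$$\Rightarrow\; p_J(\varphi) \propto |J(\varphi|y)|^{\frac{1}{2}} \propto \frac{1}{\varphi}.$$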

A natural question that arises is what choices of priors generate analytical expressions for the posterior distribution. This question leads to the notion of conjugate priors.

Definition 4.6 (Conjugate prior) Let $l_y(\theta)$ be a likelihood function. A class $\Pi$ of prior distributions is said to form a conjugate family if the posterior density

$$p(\theta|y) \propto p(\theta)\,l_y(\theta)$$

is in the class $\Pi$ for all $y$ whenever the prior density is in $\Pi$ (see [49]).

There is a minor complication with the definition and a more rigorous definition is presented in [5]. However, the definition states the key principle in a clear enough manner.


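Example 4.3

Let $x = (x_1, \dots, x_n)$ have independent Poisson distributions with the same mean $\lambda$, then the likelihood function $l_x(\lambda)$ equals

$$l_x(\lambda) = \prod_{i=1}^{n} \Big(\frac{\lambda^{x_i}}{x_i!} e^{-\lambda}\Big) = \frac{\lambda^{t} e^{-n\lambda}}{\prod_{i=1}^{n} x_i!} \propto \lambda^{t} e^{-n\lambda}$$

where $t = \sum_{i=1}^{n} x_i$ and by theorem 4.2 is sufficient for $\lambda$ given $x$.

If we let the prior of $\lambda$ be in the family $\Pi$ of constant multiples of chi-squared random variables, $p(\lambda) \propto \lambda^{v/2-1} e^{-S_0\lambda/2}$, then the posterior is also in $\Pi$:

$$p(\lambda|x) \propto p(\lambda)\,l_x(\lambda) = \lambda^{t+v/2-1} e^{-\frac{1}{2}(S_0+2n)\lambda}.$$

The distribution of $p(\lambda)$ is explained in appendix A.2.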
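The conjugate update in example 4.3 amounts to simple bookkeeping on the two prior parameters: comparing exponents gives $v \to v + 2t$ and $S_0 \to S_0 + 2n$. The following sketch makes this explicit; the function names and the numbers are hypothetical. The posterior mean uses the fact that a density proportional to $\lambda^{v/2-1}e^{-S_0\lambda/2}$ is a gamma density with shape $v/2$ and rate $S_0/2$, hence mean $v/S_0$.

```python
def poisson_conjugate_update(v, s0, counts):
    """Update the prior lambda^(v/2 - 1) exp(-s0*lambda/2) with Poisson counts."""
    t = sum(counts)          # sufficient statistic
    n = len(counts)
    return v + 2 * t, s0 + 2 * n


def posterior_mean(v, s0):
    """Mean of the gamma density with shape v/2 and rate s0/2."""
    return v / s0


# Hypothetical prior parameters and made-up count data.
v_post, s0_post = poisson_conjugate_update(v=2, s0=2, counts=[3, 1, 2])
mean_post = posterior_mean(v_post, s0_post)
print(v_post, s0_post, mean_post)
```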

Conjugate priors are useful in computing posterior densities. Although there are not that many priors that are conjugate, there might be a risk of overuse since data might be better described by another distribution that is not conjugate.

4.4 Marginalization

A useful property of conditional probabilities is the possibility to integrate out undesired variables. According to usual operations of pdf's we have

$$\int p(a, b)\,db = p(a).$$

Analogously, for any likelihood function of two or more variables, marginal likelihoods with respect to any subset of the variables can be defined. Given the likelihood $l_y(\theta, M)$ the marginal likelihood $l_y(M)$ for model $M$ is

$$l_y(M) = p(y|M) = \int p(y|\theta, M)\,p(\theta|M)\,d\theta.$$

Unfortunately marginal likelihoods are often very difficult to calculate and numerical integration techniques might have to be employed.
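To make the marginalization integral concrete, the sketch below evaluates a marginal likelihood numerically and compares it with a closed form. It assumes a Poisson likelihood with a gamma prior (shape `a`, rate `b`), a standard conjugate pair chosen here only because the exact answer is available; the function names, the midpoint rule, the integration range and the data are all illustrative assumptions, not the thesis's method.

```python
import math


def poisson_likelihood(lam, counts):
    """Likelihood of i.i.d. Poisson counts at rate lam."""
    return math.prod(lam ** x * math.exp(-lam) / math.factorial(x) for x in counts)


def gamma_pdf(lam, a, b):
    """Gamma density with shape a and rate b, evaluated via logarithms."""
    return math.exp(a * math.log(b) - math.lgamma(a)
                    + (a - 1) * math.log(lam) - b * lam)


def marginal_numeric(counts, a, b, upper=40.0, steps=20000):
    """Midpoint-rule approximation of the integral of likelihood times prior."""
    h = upper / steps
    return h * sum(poisson_likelihood((i + 0.5) * h, counts)
                   * gamma_pdf((i + 0.5) * h, a, b) for i in range(steps))


def marginal_exact(counts, a, b):
    """Closed-form marginal likelihood for the Poisson-gamma pair."""
    t, n = sum(counts), len(counts)
    log_m = (a * math.log(b) - math.lgamma(a) + math.lgamma(a + t)
             - (a + t) * math.log(b + n)
             - sum(math.lgamma(x + 1) for x in counts))
    return math.exp(log_m)


counts = [3, 1, 2]   # made-up data
numeric = marginal_numeric(counts, a=2.0, b=1.0)
exact = marginal_exact(counts, a=2.0, b=1.0)
print(numeric, exact)
```

The midpoint rule is used so that the integrand is never evaluated at $\lambda = 0$; for a one-dimensional, smooth integrand like this one a simple rule suffices, whereas higher-dimensional parameter vectors typically call for other techniques.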

4.5 Bayesian model averaging

To explain the powerful idea of Bayesian model averaging (BMA) we start with an example.

Example 4.4

Suppose we are analyzing data and believe that it arises from a set of probability distributions or models $\{M_i\}_{i=1}^{k}$. For example, the data might consist of a normally distributed outcome $y$ of which we wish to predict future values. We also have two other outcomes, $x_1$ and $x_2$, that covary with $y$. Using the two covariates as predictors of $y$ offers two models, $M_1$ and $M_2$, as explanation for what values $y$ is likely to take on in the future. A naive approach to deciding what future value of $y$ should be used might be to simply average the two estimates. But, if one of the models suffers from bad predictive ability, then the average of the two estimates is not likely to be especially good. Bayesian model averaging solves this issue by weighting the estimates $\hat y_1$ and $\hat y_2$ by how likely the models are,

$$\hat y = p(M_1|\text{Data})\,\hat y_1 + p(M_2|\text{Data})\,\hat y_2.$$

Using theory from the previous sections it is possible to compute the probability $p(M_i|\text{Data})$ for each model.

We now treat the averaging more mathematically.

Let $\Delta$ be a quantity of interest, then its posterior distribution given data $D$ is

$$p(\Delta|D) = \sum_{k=1}^{K} p(\Delta|M_k, D)\,p(M_k|D). \tag{4.7}$$

This is a weighted average of the posterior distributions, where each model $M_k$ is considered. The posterior probability for model $M_k$ is

$$p(M_k|D) = \frac{p(D|M_k)\,p(M_k)}{\sum_{l=1}^{K} p(D|M_l)\,p(M_l)}, \tag{4.8}$$

where

$$p(D|M_k) = \int p(D|\theta_k, M_k)\,p(\theta_k|M_k)\,d\theta_k \tag{4.9}$$

is the marginalized likelihood of the model $M_k$ with parameter vector $\theta_k$ as defined in section 4.4. All probabilities are implicitly conditional on $\mathcal{M}$, the set of models being considered. The posterior mean and variance of $\Delta$ are given by

$$\xi = E[\Delta|D] = \sum_{k=1}^{K} \hat\Delta_k\,p(M_k|D) \tag{4.10}$$

$$\varphi = \operatorname{var}[\Delta|D] = E[\Delta^2|D] - E[\Delta|D]^2 = \sum_{k=1}^{K} \big(\operatorname{var}[\Delta|D, M_k] + \hat\Delta_k^2\big)\,p(M_k|D) - E[\Delta|D]^2 \tag{4.11}$$

where $\hat\Delta_k = E[\Delta|D, M_k]$ (see [41]).
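Equations (4.10) and (4.11) combine per-model means and variances into a single posterior mean and variance. A minimal sketch for a scalar quantity of interest is given below; the function name and the numbers are made up.

```python
def bma_moments(model_means, model_variances, posterior_probs):
    """Posterior mean and variance of a scalar quantity under model averaging."""
    mean = sum(m * p for m, p in zip(model_means, posterior_probs))
    second_moment = sum((v + m * m) * p
                        for m, v, p in zip(model_means, model_variances,
                                           posterior_probs))
    return mean, second_moment - mean ** 2


# Two hypothetical models with equal posterior probability.
xi, phi = bma_moments(model_means=[1.0, 3.0],
                      model_variances=[1.0, 1.0],
                      posterior_probs=[0.5, 0.5])
print(xi, phi)
```

In this example each model has variance 1, yet the averaged variance is 2: the extra unit comes from the disagreement between the model means, which is how model uncertainty enters the forecast.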


4.6 Using BMA on linear regression models

Here, the key issue is the uncertainty about the choice of regressors, that is, the model uncertainty. Each model $M_j$ is of the previously discussed form $y = X_j\beta_j + u \sim N(X_j\beta_j, \sigma^2 I_n)$, where the regressors $X_j \in \mathbb{R}^{n\times p}\ \forall j$, with the intercept included, correspond to the regressor set, $j \in J$, specified in chapter 5. The quantity $y$ is the given data and we are interested in the quantity $\Delta$, the regression line.

$$p(y|\beta_j, \sigma^2) = l_y(\beta_j, \sigma^2) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{\frac{n}{2}} \exp\Big[-\frac{1}{2\sigma^2}(y - X_j\beta_j)^\top (y - X_j\beta_j)\Big]$$

By completing the square in the exponent, the sum of squares can be written as

$$(y - X\beta)^\top (y - X\beta) = (\beta - \hat\beta)^\top X^\top X(\beta - \hat\beta) + (y - X\hat\beta)^\top (y - X\hat\beta),$$

where $\hat\beta = (X^\top X)^{-1}X^\top y$ is the OLS estimate. That the equality holds is proved by multiplying out the right hand side and checking that it equals the left hand side.

As pointed out in section 3.1, $(y - X\hat\beta)$ is the residual vector $\hat u$ and its sum of squares divided by the number of observations less the number of covariates is known as the residual mean square, denoted by $s^2$.

$$s^2 = \frac{\hat u^\top \hat u}{n - p} = \frac{\hat u^\top \hat u}{v} \;\Rightarrow\; \hat u^\top \hat u = v s^2$$

It is convenient to denote $n - p$ as $v$, known as the degrees of freedom of the model.

Now we can write the likelihood as

$$l_y(\beta_j, \sigma^2) \propto (\sigma^2)^{-\frac{p_j}{2}} \exp\Big[-\frac{1}{2\sigma^2}(\beta_j - \hat\beta_j)^\top (X_j^\top X_j)(\beta_j - \hat\beta_j)\Big] \times (\sigma^2)^{-\frac{v_j}{2}} \exp\Big[-\frac{v_j s_j^2}{2\sigma^2}\Big].$$

The BMA analysis requires the specification of prior distributions for the parameters $\beta_j$ and $\sigma^2$. For $\sigma^2$ we choose an uninformative prior

$$p(\sigma^2) \propto 1/\sigma^2, \tag{4.12}$$

which is the Jeffreys prior as calculated in example 4.2. For $\beta_j$ the g-prior, as introduced by Zellner [68], is applied

$$p(\beta_j|\sigma^2, M_j) \sim f_N(\beta_j\,|\,0, \sigma^2 g (X_j^\top X_j)^{-1}), \tag{4.13}$$

where $f_N(w|m, V)$ denotes a normal density on $w$ with mean $m$ and covariance matrix $V$. The expression $\sigma^2(X^\top X)^{-1}$ is recognized as the covariance matrix of the OLS estimate, and the prior covariance matrix is then assumed to be proportional to the sample covariance with a factor $g$ which is used as a design parameter. An increase of $g$ makes the distribution flatter and therefore gives higher posterior weights to large absolute values of $\beta_j$.

As shown by Fernandez, Ley and Steel [33] the following three theoretical values of $g$ lead to consistency, in the sense of asymptotically selecting the correct model.

• $g = 1/n$
The prior information is roughly equal to the information available from one data observation

• $g = k/n$
Here, more information is assigned to the prior as the number of predictors $k$ grows

• $g = k^{(1/k)}/n$
Now, less information is assigned to the prior as the number of predictors grows

To arrive at a posterior probability of the models given data we also need to specify the prior distribution for each model $M_j$ over $\mathcal{M}$, the space of all $K = 2^p - 1$ models.

$$\forall M_j \in \mathcal{M}: \quad p(M_j) = p_j, \; j = 1, \dots, K, \qquad p_j > 0, \qquad \sum_{j=1}^{K} p_j = 1$$

In our application, we chose $p_j = 1/K$ so that we have a uniform distribution over the model space, since we at this point have no reason to favor one model over another. Now, the priors chosen have the tractable property of an analytical expression for $l_y(M_j)$, the marginal likelihood.

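Theorem 4.3 (Derivation of the marginal likelihood) Using the above specified priors, the marginalized likelihood function is given by

$$l_y(M_j) = \int\!\!\int p(y|\beta_j, \sigma^2, M_j)\,p(\sigma^2)\,p(\beta_j|\sigma^2, M_j)\,d\beta_j\,d\sigma^2 = \frac{\Gamma(n/2)}{\pi^{n/2}(g + 1)^{p/2}} \Big(y^\top y - \frac{g}{1 + g}\, y^\top X_j (X_j^\top X_j)^{-1} X_j^\top y\Big)^{-\frac{n}{2}}.$$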

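Proof:

$$\begin{aligned}
l_y(M_j, \beta_j, \sigma^2) &= p(y|\beta_j, \sigma^2, M_j)\,p(\beta_j|\sigma^2, M_j)\,p(\sigma^2) \\
&= (2\pi\sigma^2)^{-n/2} \exp\Big[-\frac{1}{2\sigma^2}\big(v_j s_j^2 + (\beta_j - \hat\beta_j)^\top (X_j^\top X_j)(\beta_j - \hat\beta_j)\big)\Big] \\
&\quad \times (2\pi\sigma^2)^{-p/2} |Z_0|^{1/2} \exp\Big[-\frac{1}{2\sigma^2}(\beta_j - \bar\beta_j)^\top Z_0 (\beta_j - \bar\beta_j)\Big] \times 1/\sigma^2
\end{aligned}$$

Here $\hat\beta_j$ is the OLS estimate and $\bar\beta_j$ denotes the prior mean of $\beta_j$, which is zero for the g-prior (4.13). To integrate the expression we start by completing the square of the exponents. Here, we do not write out the index on the variables. Mind that $Z_0$ is used instead of writing out the g-prior.

$$\begin{aligned}
&(\beta - \hat\beta)^\top X^\top X(\beta - \hat\beta) + (\beta - \bar\beta)^\top Z_0(\beta - \bar\beta) \\
&= \beta^\top (X^\top X + Z_0)\beta - \beta^\top (X^\top X\hat\beta + Z_0\bar\beta) - (\hat\beta^\top X^\top X + \bar\beta^\top Z_0)\beta + \hat\beta^\top X^\top X\hat\beta + \bar\beta^\top Z_0\bar\beta \\
&= \beta^\top (X^\top X + Z_0)\beta - \beta^\top (X^\top X + Z_0)\underbrace{(X^\top X + Z_0)^{-1}(X^\top X\hat\beta + Z_0\bar\beta)}_{=B_1} \\
&\quad - \underbrace{(\hat\beta^\top X^\top X + \bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}}_{=B_1^\top}(X^\top X + Z_0)\beta + \hat\beta^\top X^\top X\hat\beta + \bar\beta^\top Z_0\bar\beta \\
&= \beta^\top (X^\top X + Z_0)\beta - \beta^\top (X^\top X + Z_0)B_1 - B_1^\top (X^\top X + Z_0)\beta + B_1^\top (X^\top X + Z_0)B_1 \\
&\quad - B_1^\top (X^\top X + Z_0)B_1 + \hat\beta^\top X^\top X\hat\beta + \bar\beta^\top Z_0\bar\beta \\
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) - B_1^\top (X^\top X + Z_0)B_1 + \hat\beta^\top X^\top X\hat\beta + \bar\beta^\top Z_0\bar\beta \\
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) - (\hat\beta^\top X^\top X + \bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}(X^\top X\hat\beta + Z_0\bar\beta) \\
&\quad + \hat\beta^\top X^\top X\hat\beta + \bar\beta^\top Z_0\bar\beta \\
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) - (\hat\beta^\top X^\top X)(X^\top X + Z_0)^{-1}(X^\top X\hat\beta) \\
&\quad - (\hat\beta^\top X^\top X)(X^\top X + Z_0)^{-1}Z_0\bar\beta - (\bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}(X^\top X\hat\beta) \\
&\quad - (\bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}(Z_0\bar\beta) + (\hat\beta^\top X^\top X)(X^\top X + Z_0)^{-1}(X^\top X + Z_0)\hat\beta \\
&\quad + \bar\beta^\top Z_0(X^\top X + Z_0)^{-1}(X^\top X + Z_0)\bar\beta \\
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) - \big[(\hat\beta^\top X^\top X)(X^\top X + Z_0)^{-1}(Z_0\bar\beta) + (\bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}(X^\top X\hat\beta) \\
&\quad - (\hat\beta^\top X^\top X)(X^\top X + Z_0)^{-1}(Z_0\hat\beta) - (\bar\beta^\top Z_0)(X^\top X + Z_0)^{-1}(X^\top X\bar\beta)\big]
\end{aligned}$$

Using the identity $X^\top X(X^\top X + Z_0)^{-1}Z_0 = ((X^\top X)^{-1} + Z_0^{-1})^{-1}$, this becomes

$$\begin{aligned}
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) - \big[\hat\beta^\top ((X^\top X)^{-1} + Z_0^{-1})^{-1}\bar\beta + \bar\beta^\top ((X^\top X)^{-1} + Z_0^{-1})^{-1}\hat\beta \\
&\quad - \hat\beta^\top ((X^\top X)^{-1} + Z_0^{-1})^{-1}\hat\beta - \bar\beta^\top ((X^\top X)^{-1} + Z_0^{-1})^{-1}\bar\beta\big] \\
&= (\beta - B_1)^\top (X^\top X + Z_0)(\beta - B_1) + (\hat\beta - \bar\beta)^\top ((X^\top X)^{-1} + Z_0^{-1})^{-1}(\hat\beta - \bar\beta).
\end{aligned}$$

Now we can write $l_y(M_j, \beta_j, \sigma^2)$ as

$$1/\sigma^2 \times (2\pi\sigma^2)^{-(n+p)/2}\,|Z_0|^{1/2} \times \exp\Big[-\frac{1}{2\sigma^2} S_1\Big] \times \exp\Big[-\frac{1}{2\sigma^2}(\beta_j - B_1)^\top A_1 (\beta_j - B_1)\Big]$$

where

$$S_1 = v_j s_j^2 + (\hat\beta_j - \bar\beta_j)^\top ((X_j^\top X_j)^{-1} + Z_0^{-1})^{-1}(\hat\beta_j - \bar\beta_j), \qquad A_1 = Z_0 + X_j^\top X_j.$$

The second exponent is the kernel of a multivariate normal density² and integrating with respect to $\beta$ yields

$$1/\sigma^2 \times (2\pi\sigma^2)^{-n/2}\,|Z_0|^{1/2}\,|A_1|^{-1/2} \times \exp\Big[-\frac{1}{2\sigma^2} S_1\Big]$$

which in turn is the kernel of an inverted Wishart density³.

² For a definition, see Appendix A.
³ For a definition, see Appendix A.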

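We now integrate with respect to $\sigma^2$, resulting in

$$l_y(M_j) = (2\pi)^{-n/2}\,|Z_0|^{1/2}\,|A_1|^{-1/2}\,|S_1|^{-n/2}\,c_0(n' = n + 2, p' = 1) \times k$$

where $k$ is a proportionality constant canceling in the posterior expression. To obtain the marginal likelihood we substitute $Z_0$ with the inverse of the g-prior covariance factor, $Z_0 = \frac{1}{g}(X_j^\top X_j)$, where $\sigma^2$ is integrated out, and use that the prior mean is zero. Here $\hat\beta_j$ denotes the OLS estimate.

$$\begin{aligned}
|S_1|^{-n/2} = S_1^{-n/2} &= \big(v_j s_j^2 + \hat\beta_j^\top ((1 + g)(X_j^\top X_j)^{-1})^{-1}\hat\beta_j\big)^{-n/2} \\
&= \big(v_j s_j^2 + \tfrac{1}{1 + g}\,\hat\beta_j^\top (X_j^\top X_j)\hat\beta_j\big)^{-n/2} \\
&= \big((y - X_j\hat\beta_j)^\top (y - X_j\hat\beta_j) + \tfrac{1}{1 + g}\,\hat\beta_j^\top (X_j^\top X_j)\hat\beta_j\big)^{-n/2} \\
&= \Big(y^\top y - \frac{g}{1 + g}\, y^\top X_j (X_j^\top X_j)^{-1} X_j^\top y\Big)^{-n/2}
\end{aligned}$$

$$|Z_0|^{1/2} = \Big|\frac{1}{g} X_j^\top X_j\Big|^{1/2} = (1/g)^{p/2}\,|X_j^\top X_j|^{1/2}$$

$$|A_1|^{-1/2} = \frac{1}{|A_1|^{1/2}} = \frac{1}{(1 + (1/g))^{p/2}\,|X_j^\top X_j|^{1/2}}$$

$$c_0(n' = n + 2, p' = 1) = 2^{n/2}\,\Gamma(n/2)$$

And finally we arrive at

$$l_y(M_j) = \frac{\Gamma(n/2)}{\pi^{n/2}(g + 1)^{p/2}} \Big(y^\top y - \frac{g}{1 + g}\, y^\top X_j (X_j^\top X_j)^{-1} X_j^\top y\Big)^{-n/2}.$$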


Now, applying Bayes' rule yields the posterior model probabilities

$$p(M_j|y) = \frac{p(y|M_j)\,p_j}{\sum_{k=1}^{K} p(y|M_k)\,p_k}.$$
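The closed-form marginal likelihood and the posterior model probabilities can be sketched as follows, using only the Python standard library. The function names, the toy data and the value of $g$ are assumptions made for illustration; the code enumerates all non-empty subsets of the candidate regressors (always keeping the intercept) with a uniform model prior, and works with logarithms because the marginal likelihoods can differ by many orders of magnitude.

```python
import itertools
import math


def solve(a, b):
    """Solve a z = b for a small square system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def log_marginal_likelihood(X, y, g):
    """Log of the closed-form marginal likelihood under the g-prior."""
    n, p = len(y), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta_hat = solve(xtx, xty)
    explained = sum(b * c for b, c in zip(beta_hat, xty))  # y'X(X'X)^{-1}X'y
    yty = sum(yi * yi for yi in y)
    return (math.lgamma(n / 2) - n / 2 * math.log(math.pi)
            - p / 2 * math.log(1 + g)
            - n / 2 * math.log(yty - g / (1 + g) * explained))


def posterior_model_probabilities(columns, y, g):
    """Uniform prior over all non-empty regressor subsets; intercept always kept."""
    names = sorted(columns)
    models = [c for r in range(1, len(names) + 1)
              for c in itertools.combinations(names, r)]
    logs = []
    for model in models:
        X = [[1.0] + [columns[name][i] for name in model] for i in range(len(y))]
        logs.append(log_marginal_likelihood(X, y, g))
    top = max(logs)                          # subtract the maximum for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {model: w / total for model, w in zip(models, weights)}


# Made-up data: y follows x1 closely, x2 is unrelated noise.
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
columns = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
           "x2": [0.5, -0.2, 0.1, 0.4, -0.3, 0.0]}
probs = posterior_model_probabilities(columns, y, g=6.0)
print(probs)
```

With two candidate regressors there are $2^2 - 1 = 3$ models. Full enumeration grows exponentially in the number of candidates, so for larger regressor sets the cost of this loop should be estimated before running it.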

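Meanwhile, the mean and variance of the predicted values, $\Delta$, are given by

$$\xi = E(\Delta|y) = \sum_{j=1}^{K} X_j\hat\beta_j\,p(M_j|y) \tag{4.14}$$

$$\varphi = \operatorname{var}[\Delta|y] = \sum_{j=1}^{K} \big[\sigma_u^2 X_j(X_j^\top X_j)^{-1}X_j^\top + (X_j\hat\beta_j)^2\big]\,p(M_j|y) - \Big[\sum_{j=1}^{K} X_j\hat\beta_j\,p(M_j|y)\Big]^2 \tag{4.16}$$

where the expression $\operatorname{var}[\Delta|y, M_k]$ from equation (4.11) is calculated as

$$\begin{aligned}
\operatorname{var}[\Delta|y, M_k] = \operatorname{var}[X_k\hat\beta_k] &= E[X_k(\hat\beta - \beta)(\hat\beta - \beta)^\top X_k^\top] \\
&= X_k E[(\hat\beta - \beta)(\hat\beta - \beta)^\top] X_k^\top \\
&= \sigma_u^2 X_k(X_k^\top X_k)^{-1}X_k^\top.
\end{aligned} \tag{4.17}$$

The estimation error is calculated as

$$S_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \varphi_{ii}}. \tag{4.18}$$

Finally the confidence interval for our BMA estimate for the equity premium is calculated as

$$I_{1-\alpha}(\xi_k) = \xi_k \pm \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big)\frac{S_k}{\sqrt{n}}, \tag{4.19}$$

where $\Phi(x) = p(X \le x)$ when $X$ is $N(0, 1)$. This interval results from the central limit theorem, stating that for a set of $n$ i.i.d. random variables with finite mean $\mu$ and variance $\sigma^2$, the sample average approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$ as $n$ increases. This holds irrespective of the shape of the original distribution. It then follows that, for each time step, the sample mean of the $2^{18}$ estimates of the equity premium is approximately normally distributed.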
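The interval (4.19) can be computed with the standard library's `statistics.NormalDist`, whose `inv_cdf` method gives $\Phi^{-1}$. The function name and the input values below are hypothetical and serve only to show the mechanics.

```python
import math
from statistics import NormalDist


def bma_confidence_interval(xi, s, n, alpha=0.05):
    """Two-sided interval xi +/- Phi^{-1}(1 - alpha/2) * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / math.sqrt(n)
    return xi - half_width, xi + half_width


# Hypothetical inputs: point estimate 5 %, estimation error 20 %, n = 100.
lower, upper = bma_confidence_interval(xi=0.05, s=0.2, n=100)
print(lower, upper)
```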

Chapter 5

The Data Set and Linear Prediction

In this chapter we first describe the data set used and then explain and motivate the predictors we have chosen to forecast the expected equity premium. We also check that our statistical assumptions hold and explain how the predictions are carried out.

5.1 Chosen series

The data set consists of information from three different sources: Bloomberg®, FRED® and ValueLine®, see table 5.1. In total the set consists of 18 different time series, which can be divided into three different groups: data on a large stock index, interest rates and macroeconomic factors. The data set has yearly data from 1959 to 2007 on each series. The time series from ValueLine end in 2003 and have been prolonged with data from Bloomberg, while data from FRED cover the whole time span.

5.2 The historical equity premium

The historical yearly realized equity premium can be seen in figure 5.1, where the premium is calculated as in expression (2.1) with $P_t$ as the index level of Dow Jones Industrial Average (DJIA)¹ and $r_{f,t-1}$ being the US 1-year treasury bill rate. It is this historical time series that will be used as dependent variable in the regression models.

¹ DJIA is a price-weighted average of 30 significant stocks traded on the New York Stock Exchange and the Nasdaq. In contrast, most stock indices are market capitalization weighted, meaning that larger companies account for larger proportions of the index.

| Time series | Bloomberg Ticker | FRED Id | Value Line |
|---|---|---|---|
| Dow Jones Industrial Average (DJIA) | INDU Index.Px Last | - | X |
| DJIA Dividend Yield | .Eqy Dvd Yld 12m | - | X |
| DJIA Price-Earnings Ratio | .Pe Ratio | - | X |
| DJIA Book Value per share | .Indx Weighted Book Val | - | X |
| DJIA Price-Dividend Ratio | .Eqy Dvd Yld 12m | - | X |
| DJIA Earnings per share | .Indx General Earn | - | X |
| Consumer Price Index | - | CPIAUCNS | - |
| Effective Federal Funds Rate | - | FEDFUNDS | - |
| 3-month Treasury Bill | - | TB3MS | - |
| 1-Year Treasury Rate | - | GS1 | - |
| 10-Year Treasury Rate | - | GS10 | - |
| Moody's Aaa Corp Bond Yield | - | AAA | - |
| Moody's Baa Corp Bond Yield | - | BAA | - |
| Producer Price Index | - | PPIFGS | - |
| Industrial Production Index | - | INDPRO | - |
| Personal Income | - | PI | - |
| Gross Domestic Product | - | GDPA | - |
| Consumer Sentiment | - | UMCSENT | - |

Table 5.1. The data set and sources

Figure 5.1. The historical equity premium over time

5.3 Factors explaining the equity premium

From the time series in table 5.1 we have constructed 18 predictors, which should account for changes in the stock index as well as changes in the general economy.

1. Dividend yield is the dividend yield on the Dow Jones Industrial Average Index (DJIA).

2. Price-earnings ratio is the price-earnings ratio on DJIA.

3. Book value per share is the book value per share on DJIA.

4. Price-dividend ratio is the price-dividend ratio on DJIA.

5. Earnings per share is the earnings per share on DJIA.

6. Inflation is measured by the consumer price index for all urban consumers and all items.

7. Fed funds rate is the US effective federal funds rate.

8. Short term interest rate is the 3-month US treasury bill secondary market rate.

9. Term spread short is the US 1-year treasury with constant maturity rate less the 3-month US treasury bill secondary market rate.

10. Term spread long is the US 10-year treasury with constant maturity rate less the US 1-year treasury with constant maturity rate.

11. Credit spread is Moody's Baa corporate bond yield returns less the Aaa corporate bond yield.

12. Producer price is the US producer price index for finished goods.

13. Industrial production is the US industrial production index.

14. Personal income is the US personal income.

15. GDP is the gross US domestic product.

16. Consumer sentiment is the University of Michigan time series for consumer sentiment.

17. Volatility is the standard deviation of the returns on DJIA.

18. Earnings-book ratio is earnings per share divided by book value per share for DJIA.

For all 18 factors above we have used the fractional change defined as

$$r_{i,t} = \frac{I_t}{I_{t-1}} - 1 \tag{5.1}$$

where $r_{i,t}$ is the return on factor $i$ at time $t$ and $I_t$ is the factor level at time $t$. The basic statistics for the 18 factors are found in table 5.2.
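The transformation (5.1) is a one-liner in code. The sketch below uses a made-up level series; the function name is an assumption.

```python
def fractional_change(levels):
    """Fractional change I_t / I_{t-1} - 1 for t = 1, ..., len(levels) - 1."""
    return [levels[t] / levels[t - 1] - 1.0 for t in range(1, len(levels))]


# Made-up factor levels for three consecutive years.
changes = fractional_change([100.0, 110.0, 99.0])
print(changes)
```

The output has one element fewer than the input, since the first observation has no predecessor. The transformation is undefined when a level equals zero and can become very large when a level is close to zero.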


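| Factor | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.00 | 0.07 | 0.06 | 0.02 | 0.07 | 0.04 | 0.09 | 0.07 | -0.02 |
| Std | 0.14 | 0.37 | 0.15 | 0.13 | 0.23 | 0.03 | 0.40 | 0.40 | 0.11 |
| Median | 0.00 | -0.01 | 0.05 | 0.01 | 0.10 | 0.04 | 0.06 | 0.01 | -0.02 |
| Min | -0.30 | -0.38 | -0.20 | -0.23 | -0.61 | 0.01 | -0.71 | -0.68 | -0.34 |
| Max | 0.32 | 1.73 | 0.87 | 0.29 | 0.64 | 0.14 | 1.28 | 1.65 | 0.20 |

| Factor | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|
| Mean | -0.04 | 0.00 | 0.04 | 0.03 | 0.07 | 0.07 | 0.00 | 0.04 | 1.50 |
| Std | 0.27 | 0.04 | 0.04 | 0.05 | 0.03 | 0.03 | 0.14 | 0.01 | 11.84 |
| Median | 0.01 | -0.01 | 0.02 | 0.03 | 0.07 | 0.06 | 0.00 | 0.04 | 0.79 |
| Min | -1.29 | -0.10 | -0.03 | -0.09 | 0.01 | 0.00 | -0.28 | 0.01 | -52.24 |
| Max | 0.53 | 0.15 | 0.16 | 0.11 | 0.13 | 0.13 | 0.42 | 0.08 | 48.60 |

Table 5.2. Basic statistics for the factors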

Dividend yield

The main reason for the supposed predictive power of the dividend yield is the positive relation between expected high dividend yields and high returns. This is a result from using a discounted cash flow framework under the assumption that the expected stock return is equal to a constant. For instance Campbell [11] has shown that the current stock price is equal to the expected present value of future dividends out to the infinite future. Assuming that the current dividend yields will remain the same in the future, the positive relation follows. This relationship can also be observed in the Gordon dividend growth model. In the absence of capital gains, the dividend yield is also the return on the stock and measures how much cash flow you are getting for each unit of cash invested.

Price-earnings ratio

Price-earnings ratio, price per share divided by earnings per share, measures how much an investor is willing to pay per unit of earnings. A high price-earnings ratio then suggests that investors think the firm has good growth opportunities or that the earnings are safe and therefore more valuable [7].

Book value per share

Book value per share, value of equity divided by the number of outstanding shares, is the raw value of the stock and should be compared with the market value of the stock. These two figures are rarely the same. Most of the time a stock trades at a multiple of the book value. A low book value per share in comparison with the market value per share suggests that the stock is highly valued or perhaps even overvalued; the converse also holds.

Price-dividend ratioThe price-dividend ratio, price per share divided by annual dividend per share, isthe reciprocal of the dividend yield. A low ratio might mean that investors requirea high rate of return or that they are not expecting dividend growths in the future


[7]. As a consequence, a low ratio could be a forecast of less profitable times. Alow ratio can also indicate either a fast growing company or a company with pooroutlooks. A high ratio could either point to a mature company with few growthopportunities or just a mature stable company with temporarily low market value.

Earnings per share
Earnings per share, profit divided by the number of outstanding shares, is more interesting if you calculate and view the incremental change over a period of time. A steady rate of increasing earnings per share could suggest good performance, and decreasing earnings per share figures would suggest poor performance.

Inflation
Inflation is defined as the increase in the price of some set of goods and services in a given economy over a period of time [10]. Inflation is usually measured through a consumer price index, which measures nominal consumer prices for a basket of items bought by a typical consumer. The prices are weighted by the fraction the typical consumer spends on each item. [20]

Many different theories for the role and impact of inflation in an economy have been proposed, but they all have some basic implications in common. High inflation makes people more interested in investing their savings in assets that are inflation protected, e.g. real estate, instead of holding fixed income assets such as bonds. By moving away from fixed income and investing in other assets, the hope is that the returns will exceed inflation. As a result, high inflation leads to reduced purchasing power as individuals reduce money holdings. High inflation is unpredictable and volatile. This creates uncertainty in the business community, reducing investment activity and thus economic growth. After a period of high inflation, a prolonged period of high unemployment must be endured to reduce inflation to modest levels again. This is the main reason for fearing high inflation. [44]

Low inflation usually implies that price levels are expected to increase over time, and therefore it is beneficial to spend and borrow in the short run. Low inflation is thus the starting point for a higher rate of inflation.

Central banks try to keep the rate of inflation within a predetermined interval, usually 2-3 %, in order to maintain a stable price level and currency value. Their means of doing so is changing the discount rate: increasing the rate usually dampens inflation, and vice versa.

Generally, no producer is keen on lowering their prices, just as no employee accepts a decrease in their nominal salary. As a result, a small level of inflation has to be allowed in order for the pricing system to work efficiently. Inflation levels above this threshold are considered negative, mainly because inflation creates further inflation expectations. [44]


Besides being linked to the general state of the economy, inflation also has great impact on interest rates. If inflation rises, so will the nominal interest rates, which in turn influence business conditions. [44]

Federal funds rate
The federal funds rate is one of the most important money market instruments. It is the rate that banks in the US charge each other for lending overnight. Federal funds are tradable reserves that commercial banks are required to maintain with the Fed. The Fed does not pay interest on these reserves, so banks maintain the minimum reserve position possible and sell the excess to other banks short of cash to meet their reserve deposit needs. The federal funds rate is therefore roughly analogous to the London Interbank Offered Rate (LIBOR). [4]

A bank that wishes to finance a client venture but does not have the means to do so can borrow capital from another bank at the federal funds rate. As a result, the federal funds rate sets the threshold for how willing banks are to finance new ventures. As the rate increases, banks become more reluctant to take out these inter-bank loans. A low rate will on the other hand encourage banks to borrow money and hence increase the possibilities for businesses to finance new ventures. Therefore, this rate somewhat controls the US business climate.

Short term interest rate
The short term interest rate (3-month T-bills) is an important rate which many use as a proxy for the risk-free rate, and hence it enters many different valuation models used by practitioners. As a result, changes in the short term rate influence market prices. For instance, an increase in the short term rate makes the present value of cash flows to the firm take on a smaller value, and a discounted cash flow model for a firm’s stock would as a result imply a lower stock price. Another simple implication is that an increase also makes it more expensive for firms to finance themselves in the short run. In general, an increase in the short term rate tends to slow economic growth and dampen inflation. The short term interest rate is also linked, in its movements, to the federal funds rate.

Term spread
A yield curve can take on many different shapes and there are several different theories trying to explain the shape. When talking about the shape of the yield curve one refers to the slope of the curve. Is it flat, upward sloping, downward sloping or humped? Upward and downward sloping curves are also referred to as normal and inverted yield curves. A yield curve constructed from prices in the bond market can be used to calculate different term spreads, i.e. differences in rates for two different maturities. For this reason the term spread is related to the slope of the yield curve. Here we have defined the short term spread as the difference in rates between the maturities one year and three months, and the long term spread as the difference between ten years and one year maturities. Positive short and long term spreads could imply an upward sloping yield curve, and the opposite could imply a downward sloping curve. A positive short term spread and a negative long term spread could correspond to a humped yield curve.

Yield curves almost always slope upwards, figure 5.2 a. One reason for this is the expectation of future increases in inflation; investors therefore require a premium for locking in their money at an interest rate that is not inflation protected. [44] As mentioned earlier, an increase in inflation comes with economic growth, which makes an upward sloping yield curve a sign of good times. The growth itself can also be partly explained by the lower short term rate, which makes it cheaper for companies to borrow for expansion. Furthermore, central banks are expected to fend off the expected rise in inflation with higher rates, decreasing the price of long-term bonds and thus increasing their yields. A downward sloping yield curve, figure 5.2 b, occurs when the expectation is that future inflation will be lower than current inflation and thus also that the economy will slow down in the future [44]. A low long term bond yield is acceptable since inflation is low. In fact, each of the last six recessions in the US has been preceded by an inverted yield curve [25]. This shape can also develop as the Federal Reserve raises its nominal federal funds rate.

Figure 5.2. Shapes of the yield curve: (a) Normal, (b) Inverted, (c) Flat, (d) Humped

A flat yield curve, figure 5.2 c, signals uncertainty in the economy and should not be visible for any longer time periods. Investors should in theory not have any incentive to hold long-dated bonds over shorter-dated bonds when there is no yield premium. Instead they would sell off long-dated bonds, resulting in higher yields in the long end and an upward sloping yield curve. A humped yield curve, figure 5.2 d, arises when investors expect interest rates to rise over the next several periods and then decline. It could also signal the beginning of a recession or just be the result of a shortage in the supply of long or short-dated bonds. [18]

Credit spread
Yields on corporate bonds are almost always higher than on treasuries with the same maturity. This is mainly a result of the higher default risk in corporate bonds, even if other factors have been suggested as well. The corporate spread, also known as the credit spread, is usually the difference between the yields on a Baa rated corporate bond and a government bond with the same time to maturity. Research [47] has shown that only around 20-50 percent of the credit spread can be accounted for by default risk when the credit spread is calculated with government bonds as the reference instrument. If one instead uses Aaa rated corporate bonds as the reference, this share can be expected to increase. Above all, the main reason for using the credit spread as an explanatory/forecasting variable at all is that the credit spread seems to widen in recessions and to shrink in expansions during the business cycle [47]. It can also change as other bad news hits the market. Our corporate bond series have bonds with a maturity as close as possible to 30 years, and are averages of daily data.

Producer price
The producer price measures the average change over time in selling prices received by domestic producers of goods and services. It is measured from the perspective of the seller, in contrast with the consumer price index, which measures from the purchaser’s perspective. These two may differ due to government subsidies, sales and excise taxes and distribution costs. [63]

Industrial production and personal income
Industrial production measures the output from the US industrial sector, which is defined as comprising manufacturing, mining, and electric and gas utilities [31]. Personal income measures the sum of wages and salaries in dollars for the US.

Gross domestic product
The gross domestic product (GDP) is considered a good measure of the size of an economy and how well it is performing. This statistic is defined as the market value of all goods and services produced within a country in a given time period and is computed every three months by the Bureau of Economic Analysis. More specifically, the GDP is the sum of spending divided into four broad categories: consumption, investment, government purchases and net exports. The change in GDP describes how the economy varies and is therefore an indicator of the business cycle. [53]

Consumer sentiment
The consumer sentiment index is based on household interviews and gives an indication of the future business climate, personal finance and spending in the US and therefore has implications for stocks, bonds and cash markets. [62]

Volatility
Volatility is the standard deviation of the change in value of a financial instrument. The volatility is here calculated on monthly observations for each year. The basic idea behind volatility as an explanatory variable is that volatility is synonymous with risk. High volatility should imply a higher demand for risk compensation, i.e. a higher equity premium.


Earnings-book ratio
The earnings-book ratio relates the earnings per share to the book value per share and measures a firm’s efficiency at generating profits. The ratio is also called ROE, return on equity. It is likely that a high ROE yields a high equity premium because general business conditions have to be good in order to generate a good ROE.

5.4 Testing the assumptions of linear regression
As discussed in chapter 3.3, the estimated coefficients in the OLS solution are very sensitive to outliers. By applying the leverage measure from definition 3.3, the outliers in table 5.3 have been found. Elements in y deviating more than three standard deviations from the mean of y have been removed and replaced by linearly interpolated values. This has been repeated three times for each factor time series. In total, an average of one outlier per factor time series per time step has been removed and interpolated.

Step    Outliers (total)
1       19
2       18
3       18
4       14
5       16

Table 5.3. Outliers identified by the leverage measure for univariate predictions
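The three-standard-deviation rule with linear interpolation described above can be sketched as follows. This is only an illustration in Python, not the Matlab code of appendix B: the function name `replace_outliers`, the use of the sample standard deviation, and the handling of outliers at the ends of the series are assumptions, and the leverage measure from definition 3.3 is not reproduced here.

```python
from statistics import mean, stdev


def replace_outliers(y, n_sigma=3.0, passes=3):
    """Replace points far from the mean by linear interpolation.

    A point counts as an outlier if it deviates more than n_sigma sample
    standard deviations from the mean of the series (an assumption; the
    exact definition used in the thesis code may differ).
    """
    out = list(y)
    for _ in range(passes):
        mu, sd = mean(out), stdev(out)
        keep = [abs(v - mu) <= n_sigma * sd for v in out]
        if all(keep):
            break
        cleaned = list(out)
        for i, ok in enumerate(keep):
            if ok:
                continue
            # Nearest retained neighbours on each side (None at the edges).
            left = next((j for j in range(i - 1, -1, -1) if keep[j]), None)
            right = next((j for j in range(i + 1, len(out)) if keep[j]), None)
            if left is None and right is None:
                continue
            if left is None:
                cleaned[i] = out[right]
            elif right is None:
                cleaned[i] = out[left]
            else:
                w = (i - left) / (right - left)
                cleaned[i] = out[left] + w * (out[right] - out[left])
        out = cleaned
    return out


# Hypothetical series with one obvious outlier at position 7.
series = [0.0, 1.0] * 10
series[7] = 50.0
cleaned = replace_outliers(series)
```

Repeating the pass matters because removing a large outlier shrinks the standard deviation, which can expose further points in the next pass.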

The assumptions that must hold for a linear regression model were presented in chapter 3.2 and the means for testing these assumptions were given in chapter 3.4. After having removed outliers, it is motivated to check for violations of the classical regression assumptions.

The QQ-plots for all factors are presented in figures 5.3 and 5.4. By visual inspection of each subplot, it is seen that for some factors the points on the plot fall close to the diagonal line, so the error distribution is likely to be Gaussian. Other factors show signs of excess kurtosis, visible as an S-shaped form. A Jarque-Bera test at the significance level 0.05 has been performed to settle the uncertainty about departures from the normal distribution. From the results in table 5.4 it is found that we cannot reject the null hypothesis that the residuals are Gaussian at significance level 0.05 for any factor. The critical value is the upper limit of the test statistic for which the null hypothesis is not rejected. The P-value is the probability of observing a test statistic at least as extreme as the one obtained, given that the null hypothesis is true; put another way, if the P-value is above the significance level we cannot reject the null hypothesis.


Factor       1     2     3     4     5     6     7     8     9
JB-Value     2.39  1.79  1.35  2.24  1.69  1.27  0.96  1.14  2.00
Crit-Value   4.84  4.88  4.95  4.92  4.95  4.89  4.95  4.93  4.93
P-Value      0.16  0.26  0.39  0.18  0.29  0.41  0.53  0.46  0.22
H0 or H1     H0    H0    H0    H0    H0    H0    H0    H0    H0

Factor       10    11    12    13    14    15    16    17    18
JB-Value     1.62  2.14  0.85  1.77  0.96  0.82  1.72  2.18  1.62
Crit-Value   4.94  4.98  4.93  4.92  4.91  4.90  4.91  4.88  4.94
P-Value      0.30  0.20  0.58  0.26  0.53  0.59  0.28  0.19  0.30
H0 or H1     H0    H0    H0    H0    H0    H0    H0    H0    H0

Table 5.4. Jarque-Bera test of normality at α = 0.05 for univariate residuals for lagged factors
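For reference, the Jarque-Bera statistic itself is a simple function of the sample skewness S and kurtosis K of the residuals, JB = n/6 · (S² + (K − 3)²/4). The following Python sketch computes it from scratch; the function name `jarque_bera` and the example residuals are hypothetical, and the critical values and P-values in table 5.4 come from Matlab's jbtest and are not reproduced by this code.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4).

    S and K are the sample skewness and kurtosis computed from central
    moments with divisor n.
    """
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Hypothetical residuals: symmetric, so the skewness term vanishes.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

The null hypothesis of normality is kept when the statistic stays below the critical value, which is the comparison reported row by row in table 5.4.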

To investigate the presence of autocorrelation in the residuals, a Durbin-Watson test is performed. If the Durbin-Watson test statistic is close to 2, it indicates that there is no autocorrelation in the residuals. As can be seen in table 5.5, all test statistics group around 2 and it can be assumed that autocorrelation is not present. It can be concluded from these two tests, and from checking that the errors indeed have an average of zero, that the classical regression assumptions in chapter 3.2 are fulfilled for the univariate models. For the multivariate models, it has not been verified that the assumptions hold, due to the large number of models. Even if the assumptions are not fulfilled, OLS can still be used, but it is not guaranteed to be the best linear unbiased estimator.

Factor       1     2     3     4     5     6     7     8     9
DW-Value     1.83  2.10  2.02  1.88  2.10  2.19  2.09  2.09  2.16
P-Value      0.46  0.85  0.97  0.58  0.83  0.67  0.89  0.89  0.64

Factor       10    11    12    13    14    15    16    17    18
DW-Value     2.08  1.97  2.23  1.92  2.23  2.08  2.11  2.02  2.05
P-Value      0.92  0.82  0.57  0.67  0.56  0.95  0.81  0.91  0.98

Table 5.5. Durbin-Watson test of autocorrelation for univariate residuals for lagged factors
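The Durbin-Watson statistic reported in table 5.5 is the ratio of the sum of squared successive differences of the residuals to their sum of squares. A minimal Python sketch is given below; the function name `durbin_watson` and the example inputs are hypothetical, and the P-values in the table are produced by Matlab's dwtest, not by this code.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2.

    Values near 2 indicate no first-order autocorrelation, values near 0
    positive autocorrelation and values near 4 negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical extremes: perfectly alternating and perfectly persistent residuals.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
dw_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])
```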


Figure 5.3. QQ-Plot of the one step lagged residuals for factors 1-9 versus standard normal pdf: (a) Dividend yield, (b) Price-earnings ratio, (c) Book value per share, (d) Price-dividend ratio, (e) Earnings per share, (f) Inflation, (g) Fed funds rate, (h) Short term interest rate, (i) Term spread short


Figure 5.4. QQ-Plot of the one step lagged residuals for factors 10-18 versus standard normal pdf: (a) Term spread long, (b) Credit spread, (c) Producer price, (d) Industrial production, (e) Personal income, (f) Gross domestic product, (g) Consumer sentiment, (h) Volatility, (i) Earnings-book ratio


Figure 5.5. One step lagged factors 1-9 versus returns on the equity premium, outliers marked with a circle: (a) Dividend yield, (b) Price-earnings ratio, (c) Book value per share, (d) Price-dividend ratio, (e) Earnings per share, (f) Inflation, (g) Fed funds rate, (h) Short term interest rate, (i) Term spread short


Figure 5.6. One step lagged factors 10-18 versus returns on the equity premium, outliers marked with a circle: (a) Term spread long, (b) Credit spread, (c) Producer price, (d) Industrial production, (e) Personal income, (f) Gross domestic product, (g) Consumer sentiment, (h) Volatility, (i) Earnings-book ratio


5.5 Forecasting by linear regression
When forecasting time series data by using regression there are two different approaches. The first possibility would be to estimate the regression equation using all values of the dependent and the independent variables. When one wants to take a step ahead in time, forecasted values for the independent variables have to be inserted into the regression equation. In order to do this one must clearly be able to forecast the independent variables, e.g. by assuming an underlying process, and one has merely shifted the problem of forecasting the dependent variable to forecasting the independent variables.

The second possibility is to estimate the regression equation using lagged independent variables. If one wants to take one step ahead in time, then one would lag the independent variables one step. This is illustrated in table 5.6, where τ is the number of time lag steps. By inserting the most recent, unused, observations of the independent variables in the regression equation, one gets a one step forecasted value for the dependent variable. In fact, one could insert any of the unused observations of the independent variables, since it is already assumed that the regression equation holds over time. However, economically, it is common practice to use the most recent values since they probably contain more information about the future.² It is this second approach that has been used in this thesis. Plots for the univariate one step lagged regressions are found in figure 5.5 and figure 5.6.

Y               X_i
y_t        ↔    x_{i,t−τ}
y_{t−1}    ↔    x_{i,t−τ−1}
y_{t−2}    ↔    x_{i,t−τ−2}
...             ...
y_{t−N}    ↔    x_{i,t−τ−N}

Table 5.6. Principle of lagging time series for forecasting
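The pairing in table 5.6 can be illustrated with a small univariate sketch in Python. The names `fit_line` and `lagged_forecast` are hypothetical, the regression is assumed to include an intercept, and the data are made up so that the relation y_t = 2 + 3·x_{t−1} holds exactly; the thesis itself performs the regressions in Matlab.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def lagged_forecast(y, x, lag=1):
    """Regress y[t] on x[t - lag], then forecast `lag` steps ahead.

    The last `lag` observations of x are unused in the fit; the most
    recent one, x[-1], is inserted into the fitted equation.
    """
    alpha, beta = fit_line(x[:-lag], y[lag:])
    return alpha + beta * x[-1]


# Hypothetical data constructed so that y[t] = 2 + 3 * x[t - 1].
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [0.0, 5.0, 8.0, 14.0, 11.0]
forecast = lagged_forecast(y, x, lag=1)
```

With a lag of τ steps the same call yields a forecast τ steps beyond the last observation of y, which is how a multi-year horizon can be built from separate regressions.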

² This follows from the efficient market hypothesis.


When a time series is regressed on other time series that are lagged, information is generally lost, resulting in smaller values of $R^2$, see table 5.7. This does not need to be the case; sometimes lagged predictors provide a better $R^2$. This can be explained by, and observed in table 5.7, that it takes time for these predictors to have impact on the dependent variable. For instance, a higher in-sample $R^2$ would have been obtained for factor 15, GDP, if its time series had been lagged one step. The realized change in GDP does a better job in forecasting than in explaining that year’s equity premium.

          Time lag
Factor    0      1      2      3      4      5
1         0.440  0.038  0.008  0.000  0.086  0.000
2         0.075  0.000  0.009  0.000  0.033  0.010
3         0.001  0.032  0.108  0.010  0.028  0.010
4         0.416  0.024  0.014  0.000  0.075  0.001
5         0.001  0.000  0.042  0.009  0.008  0.008
6         0.180  0.013  0.006  0.016  0.027  0.001
7         0.001  0.076  0.004  0.022  0.008  0.119
8         0.000  0.045  0.004  0.010  0.004  0.065
9         0.001  0.037  0.004  0.129  0.034  0.128
10        0.003  0.008  0.011  0.008  0.000  0.022
11        0.138  0.087  0.003  0.000  0.127  0.014
12        0.180  0.020  0.012  0.006  0.032  0.019
13        0.159  0.059  0.000  0.001  0.003  0.060
14        0.030  0.096  0.052  0.035  0.058  0.042
15        0.008  0.113  0.084  0.018  0.049  0.030
16        0.305  0.000  0.010  0.001  0.030  0.008
17        0.112  0.025  0.017  0.095  0.062  0.059
18        0.000  0.005  0.117  0.003  0.002  0.002

Table 5.7. Lagged $R^2$ for univariate regression with the equity premium as dependent variable

Chapter 6

Implementation

In this chapter it is explained how the theory from the previous chapter is implemented, and techniques and solutions are highlighted. All code is presented in appendix B.

6.1 Overview
The theory covered in the previous chapters is implemented using Matlab. To make the program easy to use, a user interface in Excel is constructed. Figure 6.1 describes the communication between Excel, VBA and Matlab.

Figure 6.1. Flowchart


Figure 6.2. User interface

6.2 Linear prediction
The predictions are implemented using Matlab’s backslash operator, which solves equation systems of the form y = xβ through the call β = x\y. Depending on the properties of the matrix x, different factorizations are used. If x is not square, the call is executed by first performing a factorization, after which the least squares estimate of β is calculated. If x is square, β = x\y is computed by Gaussian elimination. The backslash operator never computes explicit inverses.

The Jarque-Bera test, the Durbin-Watson test and the QQ-plots are generated using the following Matlab calls: jbtest, dwtest and qqplot.

In the multivariate prediction, subsets of the 18 factors are selected using binary numbers from 1 to $2^{18}$, where the ones symbolize factors included and the zeros symbolize factors not included in the different models.
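The binary encoding of models can be sketched in Python as follows. The function name `factors_in_model` is hypothetical, and the convention that the lowest bit corresponds to factor 1 is an assumption made for illustration only; the Matlab implementation may order the bits differently.

```python
def factors_in_model(model_id, n_factors=18):
    """Decode a model number into the list of included factors.

    Bit k of model_id (counting from 0 at the lowest bit) is assumed to
    switch factor k + 1 on or off.
    """
    return [k + 1 for k in range(n_factors) if (model_id >> k) & 1]


# Model 5 is binary 101, i.e. factors 1 and 3 under the assumed convention.
example = factors_in_model(5)

# The all-ones number includes every factor.
full_model = factors_in_model(2 ** 18 - 1)
```

Looping `model_id` over the integers then visits every combination of factors exactly once, without having to store the list of subsets.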

Surveys on the equity premium have shown that the large majority of professionals believe that the premium is confined to 2-13% [65]. Therefore, models yielding a negative value of the premium, or a value exceeding the historical mean of the premium by more than 1.28σ, which corresponds to a one-sided 90% confidence level, are not used in the Bayesian model averaging and therefore do not influence the final premium estimate at all. Setting the upper bound to 1.28σ above the mean rules out premia larger than around 30%.


6.3 Bayesian model averaging
The Bayesian model averaging is implemented straightforwardly from the theoretical expression for the likelihood given in section 4.6, where g is set to be the reciprocal of the number of samples. As can be seen in table 7.6, the three different choices of g lead to almost the same results. The difficulties with the implementation lie in dealing with the large number of models, $2^{18} \approx 262000$, in a time efficient manner. This problem has been solved by implementing a routine in C, called setSubColumn, that handles memory allocation more efficiently when working with matrices close to the maximal allowed matrix size in Matlab. The code is supplied in appendix B.
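The averaging step itself can be illustrated with a short Python sketch. It assumes that a likelihood value and a point forecast are already available for each model and that all admissible models share the same prior probability, so that the posterior weights are the normalized likelihoods. The function name `bma_estimate`, its parameters and the numbers are hypothetical; the likelihood expression from section 4.6 is not reproduced here.

```python
def bma_estimate(forecasts, likelihoods, lower=0.0, upper=float("inf")):
    """Posterior-weighted average of model forecasts under a uniform prior.

    Models whose forecast falls outside [lower, upper] get zero weight,
    mirroring the exclusion of inadmissible premium values.
    """
    weights = [
        lik if lower <= f <= upper else 0.0
        for f, lik in zip(forecasts, likelihoods)
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no admissible model")
    posterior = [w / total for w in weights]
    estimate = sum(p * f for p, f in zip(posterior, forecasts))
    return estimate, posterior


# Hypothetical forecasts in percent; the first model is excluded as negative.
estimate, posterior = bma_estimate([-1.0, 2.0, 6.0], [5.0, 1.0, 3.0])
```

In this example the excluded model has the largest likelihood, which shows that the admissibility filter and the likelihood weighting act independently of each other.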

6.4 Backtesting
Since the HEP is sometimes negative while we do not allow for negative values of the premium, traditional backtesting would not be a fair benchmark for the performance of our prediction model. Instead we evaluate how well excess returns are estimated by allowing for negative values. To further investigate the predictive ability of our forecasting, an out-of-sample $R^2$ statistic is employed. The statistic is defined as

$$R^2_{os} = 1 - \frac{\sum_{t=1}^{n} (r_t - \hat{r}_t)^2}{\sum_{t=1}^{n} (r_t - \bar{r}_t)^2}, \tag{6.1}$$

where $\hat{r}_t$ is the fitted value from the predictive regression estimated through t−1 and $\bar{r}_t$ is the historical average return, also measured through t−1. If the statistic is positive, then the predictive regression has lower mean squared error than the historical average.¹ Therefore, the statistic can be used to determine if a model has better predictive performance than applying the historical average.
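Equation (6.1) translates directly into code. The Python sketch below assumes that the realized returns, the model forecasts and the historical-average benchmark forecasts have already been aligned period by period; the function name `r2_out_of_sample` and the numbers are hypothetical.

```python
def r2_out_of_sample(realized, predicted, benchmark):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(historical average).

    All three sequences are assumed to be aligned, with predicted[t] and
    benchmark[t] formed using information through t - 1 only.
    """
    sse_model = sum((r - p) ** 2 for r, p in zip(realized, predicted))
    sse_bench = sum((r - b) ** 2 for r, b in zip(realized, benchmark))
    return 1.0 - sse_model / sse_bench


# Hypothetical values: the model halves the benchmark's squared error.
r2_os = r2_out_of_sample(
    realized=[1.0, 2.0, 3.0],
    predicted=[1.0, 2.0, 2.0],
    benchmark=[2.0, 2.0, 2.0],
)
```

A negative value means that the model's squared forecast error exceeds that of the historical average over the evaluation period.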

A measure called hit ratio (HR) can be used as an indication of how good the forecast is at predicting the sign of the realized premium. It is simply the ratio of the number of times the forecast has the right sign to the length of the investigated time period. For an investor this is of interest since the hit ratio can be used as a buy-sell signal on the underlying asset. In the case of the equity premium, this is a biased measure since the long-term average of the HEP is positive.
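A minimal Python sketch of the hit ratio is given below. The function name `hit_ratio` is hypothetical, and treating a value of exactly zero as non-positive is an assumption, since the text does not specify that case.

```python
def hit_ratio(forecasts, realized):
    """Share of periods in which forecast and outcome have the same sign.

    A value of exactly zero is treated as non-positive (an assumption).
    """
    hits = sum(1 for f, r in zip(forecasts, realized) if (f > 0) == (r > 0))
    return hits / len(realized)


# Hypothetical five-period backtest: three of five signs are right.
hr = hit_ratio([1.0, -1.0, 2.0, 3.0, -2.0], [2.0, 1.0, 1.0, -1.0, -3.0])
```

The bias mentioned above is easy to see in this form: a forecast that is always positive scores a hit in every period with a positive realized premium, regardless of its accuracy.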

An interesting question is whether next year’s predicted value will be realized within the coming business cycle, here approximated as five years and called forward average. This value is calculated as a benchmark along with a five-year rolling average, here called backward average. The results from the backtest are presented in the results section.

¹ This statistic is further investigated by Campbell and Thompson [16].

Chapter 7

Results

In this chapter we present our forecasts of the equity premium along with the results from the backtest.

7.1 Univariate forecasting
In figure 7.1 the historical equity premium is extended with the estimated equity premia for five years ahead and plotted over time. The models used are univariate and hence each model consists of only one factor, giving 18 models in total.

The figures for the forecasted premia are displayed in table 7.1. Models not belonging to the set specified in chapter 6 are not taken into consideration. In table 7.1 the labels Prediction Range and Mean refer to the range of the predicted values and to the mean of these predicted values. Note that the Mean corresponds to the prior beliefs. ξk is the estimate of the premium using Bayesian model averaging. The variance and a confidence interval for this estimate are also presented.

Time step   Prediction Range   Mean   ξk     Sk      I0.90
Dec-08      0.00 - 16.0        3.69   4.20   15.27   0.58 - 7.83
Dec-09      0.00 - 14.4        2.36   3.07   15.29   -0.60 - 6.74
Dec-10      0.00 - 14.0        2.54   3.54   15.28   -0.17 - 7.24
Dec-11      0.00 - 15.1        2.94   4.84   15.30   1.08 - 8.59
Dec-12      0.00 - 8.9         3.36   4.05   15.34   0.25 - 7.85

Table 7.1. Forecasting statistics in percent


Figure 7.1. The equity premium from the univariate forecasts

In table 7.2 the factors constituting the univariate model with highest probability over time are presented. The factors are further explained in chapter 5. Note that the prior assumption about model probabilities is 1/18 ≈ 5.6 percent for each model.

Time step   Factor                   Pr(Mi)
1           Gross Domestic Product   6.47
2           Gross Domestic Product   7.38
3           Term Spread Short        8.19
4           Volatility               9.23
5           Term Spread Short        6.96

Table 7.2. The univariate model with highest probability over time

Figure 7.2 shows how the likelihood function changes for different g-values for each one step lagged predictor. Table 7.3 shows results from the backtest. The $R^2_{os}$ statistic shows that the univariate prediction model has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the univariate predictions have the right sign, that is, whether the premium is positive or negative. Note that we allow for negative premium values when applying the HR statistic.

Figure 7.2. Likelihood function values for different g-values

Pred. step        1      2      3      4      5
$R^2_{os,uni}$    0.21   0.26   0.23   0.05   0.14
$HR_{uni}$        0.6    0.2    0      0.6    0.2

Table 7.3. Out-of-sample $R^2_{os,uni}$ and hit ratios $HR_{uni}$


7.2 Multivariate forecasting
The corresponding results from multivariate predictions are presented below in figure 7.3. As in the univariate case, no negative values are allowed and the upper limit from chapter 6 is used. In table 7.4 the labels Prediction Range and Mean refer to the range of the predicted values and to the mean of these predicted values. Note that the Mean corresponds to the prior beliefs. ξk is the estimate of the premium using Bayesian model averaging.

Figure 7.3. The equity premium from the multivariate forecasts

Time step   Prediction Range   Mean   ξk     Sk     I0.90
Dec-08      0.00 - 21.4        3.18   7.72   16.6   3.79 - 11.7
Dec-09      0.00 - 21.7        1.48   7.97   16.7   4.01 - 11.9
Dec-10      0.00 - 21.4        5.07   10.4   16.6   6.45 - 14.3
Dec-11      0.00 - 21.7        4.26   10.2   16.7   6.30 - 14.2
Dec-12      0.00 - 16.0        0.58   3.74   17.7   -0.21 - 7.70

Table 7.4. Forecasting statistics in percent


Time step   Factor (columns 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18)   Pr(Mi)
1           • • • • •                                                        0.001
2           • • • • • • • •                                                  0.002
3           • • • • • • •                                                    0.0009
4           • • • • • •                                                      0.001
5           • •                                                              0.003

Table 7.5. The multivariate model with highest probability over time

In table 7.5 the factors constituting the multivariate models with highest probabilities over time are presented. The factors are discussed in chapter 5. Note that the prior assumption about the model probabilities is $1/2^{18} \approx 0.00038$ percent for each model.

Time horizon   g = 1/n   g = k^(1/(1+k))/n   g = k/n
Dec-08         7.7236    7.7274              7.8047
Dec-09         7.9769    7.9786              7.9509
Dec-10         10.384    10.340              10.568
Dec-11         10.248    10.251              10.344
Dec-12         3.7434    3.7433              3.7688

Table 7.6. Forecasts for different g-values

Table 7.6 depicts how the predicted values are influenced by the three choices for g. In the univariate case, the three choices coincide. Table 7.7 shows results from the backtest. The $R^2_{os}$ statistic shows that the multivariate prediction model also has better predictive performance than applying the historical average for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the multivariate predictions have the right sign, that is, whether the premium is positive or negative. Once again, we allow for negative premium values when applying the HR statistic.

Pred. step       1      2       3      4      5
$R^2_{os,mv}$    0.23   -0.10   0.20   0.47   0.60
$HR_{mv}$        0.6    0.4     0.6    0.8    0.6

Table 7.7. Out-of-sample $R^2_{os,mv}$ and hit ratios $HR_{mv}$


7.3 Results from the backtest
In figures 7.4 and 7.5 our forecasts are compared with a backward average, a forward average and the HEP. An average of the forecasts is also compared to a forward average. The backtest is explained in chapter 6.4 and further discussed in the next chapter.

Figure 7.4. Backtest of univariate models: (a) Univariate backtest 1 year, (b) Univariate backtest 2 year, (c) Univariate backtest 3 year, (d) Univariate backtest 4 year, (e) Univariate backtest 5 year, (f) 1991:1995 compared forward


Figure 7.5. Backtest of multivariate models: (a) Multivariate backtest 1 year, (b) Multivariate backtest 2 year, (c) Multivariate backtest 3 year, (d) Multivariate backtest 4 year, (e) Multivariate backtest 5 year, (f) 1991:1995 compared forward

Chapter 8

Discussion of the Forecasting

In chapter 6.3 we specified the value of g to be used in this thesis as the reciprocal of the number of samples. For the sake of completeness, we have presented the outcome of the two other values of g in table 7.6. Apparently, the chosen value of g has most impact on the 1-year horizon forecast and a decreasing impact on the other horizons. This can be explained by the rapidly decreasing forecasting performance of the covariance matrix for time lags above one, which in turn can be motivated by table 5.7 showing decreasing $R^2$-values over time. In figure 7.2 the principal appearance of the likelihood function for the factors and different g-values can be seen. As explained earlier, increasing the value of g gives models with good adaptation to data a higher likelihood, while setting g to zero yields the same likelihood for all models. For large g-values, only models with a high degree of explanation will have impact in the BMA, reflecting great confidence in the data. On the other hand, a decrease of g allows for more uncertainty to be taken into account.

Turning to the model criteria formulated in chapter 2.7, it is found that most of the criteria are fulfilled. The equity premium over the five-year horizon is positive, due to our added constraints; however, the confidence interval for the premium includes zero at some times.

The time variation criterion is not fulfilled, in the sense that the regression line does not change considerably as new data points become available. The amount of data used is a tradeoff between stability and incorporating the latest trend. The conflict lies in the confidence of predictors. Using many data samples improves the precision of the predictors, but the greater the difference between the time to be predicted and that of the oldest samples, the more doubtful are the implications of old samples.

The smoothness of the estimates over time is questionable: our five-year predictions in the univariate case are rather smooth, whereas the multivariate forecasts exhibit greater fluctuations. Given the appearance of the realized equity premium until December 2007, which is strongly volatile, and that a multivariate model can explain more variance, it is reasonable that a multivariate model would generate results more similar to the input data, just as can be observed in the multivariate case, figure 7.3.

The time structure of the equity premium is not taken into consideration because the one-year yield, serving as the risk-free asset, does not alone account for the term structure.

Since all predictions suffer from an error, it is important to be aware of the quality of the predictions. Our precision estimate takes the misfit of the models into account and therefore it says something about the uncertainty in our predictions. However, this precision does not say anything about the relevance of using old data to forecast future values.

From the $R^2$-values in table 5.7 it can be seen that there is some predictive ability at hand, even though it is small. Further evidence of predictability is the deviation of the posterior probabilities from the prior probabilities. If there were no predictability at hand, why would the prior probability then be different from the posterior probability? The mean in table 7.1 and table 7.4 corresponds to using the prior belief that all models have the same probability; the BMA estimate is never equal to the mean.

The univariate predictors with the highest probability in each time step, table 7.2, also enter the models with the highest probability in table 7.5, except for GDP, which is not a member of the multivariate model for the first time step. In summary, the factors GDP, term spread short and volatility are important in the forecast for the next five years.

Having seen evidence of predictive ability, the question is now to what extent it can be used to produce accurate forecasts.

Backtesting our approach is not trivial, mainly because we cannot access the historical expected premium. Nevertheless, backtesting has been performed by doing a full five-year horizon forecast starting in each of the years 1991 to 1995 and then comparing the point forecasts with the realized historical equity premium for each year. Here, no restrictions are imposed on the forecasts, i.e. negative excess returns are allowed. The results are presented in figure 7.4 and figure 7.5, where each plot corresponds to a time step (1, 2, 3, 4 or 5 years). These plots have also been complemented with the realized excess returns, as well as the five-year backward and the five-year forward average. In figure 7.4 f and figure 7.5 f, the arithmetic average of the full five-year horizon forecast is compared to the five-year forward average.

The univariate backtest shows that the forecast intervals capture at most 2 out of 5 HEPs, at the one and two-year horizons. Otherwise, the forecasts tend to be far too low in comparison with the HEP. The number of times the HEP intersects with the forecasted intervals is at most 2, at the two-year horizon, figure 7.4 b. In general, the univariate forecasts do not seem to be flexible enough to fit the sometimes vast changes in the HEP and are far too low. The backtest has not provided us with any evidence of forecasting ability. However, when the forecast constraint is imposed, the predictive ability from 1991-1995 is superior to using the historical average. This can be seen from the $R^2$-statistics in table 7.3. The four and five-year horizon forecasts, figure 7.4 d-e, capture 2 out of 5 forward averages, whereas the one-year horizon captures 3 backward averages. In figure 7.4 f it can be seen that averaging the forecasts does not give a better estimate of the forward average. From table 7.3 it can be seen that the hit-ratios for the one and four-year horizons stand out, with both scoring 60 %. The results from the univariate backtest have shown that the best forecasts were obtained for the one and four-year horizons, neither of which has a good forecast quality.

The multivariate backtest shows little sign of forecasting ability for our model. The number of times the HEP intersects with the forecasted interval is at most 3 out of 5. This happens at the three and four-year horizons, figure 7.5 c and d; these are also the forecasts following the evolution of the HEP most closely. The four-year forecast depicts the change of the HEP the best, being correct 3 out of 4 times, however never getting the actual figures correct. The two and four-year forecasts capture the forward average the best: 2 out of 5 forecasted intervals are realized on average over the next 5 years. From figure 7.5 f, the only conclusion that can be drawn is that averaging our forecast for each time step does not provide a better estimate of the forward average. The $R^2$-values in table 7.7 show signs of forecasting ability in comparison with the historical average at all time steps except for the two-year horizon, with the four and five-year horizon forecasts standing out. The most significant hit-ratio is 80 %, at the four-year horizon. In conclusion, the backtesting in the multivariate case has shown that for the test period the best results in all respects have been obtained for the four and five-year horizons, in particular the four-year horizon.

Summing up the results from the univariate and multivariate backtests, it cannot be said that the quality of the multivariate forecasts exceeds that of the univariate estimates when looking at the $R^2$-values and hit-ratios. However, the multivariate forecasts as such depict the evolution of the true excess returns in a better way. Contrary to what one might expect, the one-year horizon forecasts are not better than those for the other horizons. In fact, the best estimates are provided by the four-year forecasts, both in the univariate and the multivariate case. Still, we recommend using the one-year horizon forecasts because they have the smallest time lag and therefore use more recent data. Furthermore, the result that the forecast power of multifactor models is better than that of a forecast based on the historical average is in line with Campbell and Thompson's findings [16].


Part II

Using the Equity Premium in Asset Allocation


Chapter 9

Portfolio Optimization

In modern portfolio theory it is assumed that expected returns and covariances are known with certainty. Naturally, this is not the case in practice: the inputs have to be estimated, and with this follow estimation errors. Errors in the estimates have great impact on the optimal allocation weights in a portfolio; it is therefore of great interest to have as accurate forecasts of the input parameters as possible, which has been dealt with in part I of this thesis. Even with good estimates of the input parameters, estimation errors will still be present; they are just smaller. In this chapter we discuss and present the impact of estimation errors in portfolio optimization.

9.1 Solution of the Markowitz problem

The Markowitz problem is the foundation for single-period investment theory and relates the trade-off between expected rate of return and variance of the rate of return in a portfolio of risky assets. [52]

The model of Markowitz assumes that investors are only concerned about the mean, the variance and the correlation of the portfolio assets. A portfolio is said to be “efficient” if there is no other portfolio with the same expected return but with a lower risk, or if there is no other portfolio with the same risk but with a higher expected return. [54] An investor who seeks to minimize risk (standard deviation) always chooses the portfolio with the smallest standard deviation for a given mean, i.e. he is risk averse. An investor who, for a given standard deviation, wants to maximize the expected return is said to have the property of nonsatiation. An investor being risk averse and nonsatiated at the same time will always choose a portfolio on the efficient frontier, which is made up of the set of efficient portfolios. [52] The portfolio on the efficient frontier with the lowest standard deviation is called the minimum variance portfolio (MVP).

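Given the number of assets $n$ in the portfolio, the other statistical properties of the Markowitz problem can be described by its average return $\boldsymbol{\mu} \in \mathbb{R}^{n \times 1}$, the covariance matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ and the asset weights $\mathbf{w} \in \mathbb{R}^{n \times 1}$. The mathematical formulation of the Markowitz problem is now given as

$$
\begin{aligned}
\min_{\mathbf{w}} \quad & \mathbf{w}^\top \mathbf{C} \mathbf{w} \\
\text{s.t.} \quad & \boldsymbol{\mu}^\top \mathbf{w} = \mu \\
& \mathbf{1}^\top \mathbf{w} = 1,
\end{aligned}
\tag{9.1}
$$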

where $\mathbf{1}$ is a column vector of ones. The first constraint says that the weights and their corresponding returns have to equal the desired return level $\mu$. The second constraint means that the weights have to add up to one. Note that in this formulation the signs of the weights are not restricted, i.e. short selling is allowed.

Following Zagst [66] the solution to problem (9.1) is given in theorem 9.1.

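Theorem 9.1 (Solution of the Markowitz problem) If $\mathbf{C}$ is positive definite, then according to theorem A.1, $\mathbf{C}$ is invertible and its inverse is also positive definite. Further, denote

• $a = \mathbf{1}^\top \mathbf{C}^{-1} \boldsymbol{\mu}$

• $b = \boldsymbol{\mu}^\top \mathbf{C}^{-1} \boldsymbol{\mu}$

• $c = \mathbf{1}^\top \mathbf{C}^{-1} \mathbf{1}$

• $d = bc - a^2$.

The optimal solution of problem (9.1) is given as

$$\mathbf{w}^* = \frac{1}{d}\left((c\mu - a)\,\mathbf{C}^{-1}\boldsymbol{\mu} + (b - a\mu)\,\mathbf{C}^{-1}\mathbf{1}\right) \tag{9.2}$$

with

$$\sigma^2(\mu) = \mathbf{w}^{*\top}\mathbf{C}\mathbf{w}^* = \frac{c\mu^2 - 2a\mu + b}{d}. \tag{9.3}$$

The minimum variance portfolio, denoted $\mathbf{w}_{MVP}$, is given as

$$\mathbf{w}_{MVP} = \frac{1}{c}\,\mathbf{C}^{-1}\mathbf{1} \tag{9.4}$$

and is located at

$$(\mu_{MVP}, \sigma_{MVP}) = \left(\frac{a}{c}, \sqrt{\frac{1}{c}}\right). \tag{9.5}$$

Finally, the minimum variance set is given as

$$\mu = \mu_{MVP} \pm \sqrt{\frac{d}{c}\left(\sigma^2 - \sigma^2_{MVP}\right)} \tag{9.6}$$

where the positive case corresponds to the efficient frontier, since it dominates the negative case. $\sigma^2_{MVP}$ sets the lower bound for possible values of $\sigma^2$.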
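As a numerical sanity check of theorem 9.1, the closed-form expressions can be evaluated directly. The following is a minimal Python sketch using only the standard library. The three-asset mean vector and covariance matrix are made-up illustration values, not data from this thesis, and the helper names (`solve`, `dot`, `markowitz`) are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def markowitz(mu_vec, C, target):
    """Evaluate (9.2)-(9.5) for the desired return level `target`."""
    ones = [1.0] * len(mu_vec)
    Cinv_mu = solve(C, mu_vec)   # C^{-1} mu
    Cinv_1 = solve(C, ones)      # C^{-1} 1
    a, b, c = dot(ones, Cinv_mu), dot(mu_vec, Cinv_mu), dot(ones, Cinv_1)
    d = b * c - a * a
    w = [((c * target - a) * m + (b - a * target) * o) / d
         for m, o in zip(Cinv_mu, Cinv_1)]                   # (9.2)
    variance = (c * target ** 2 - 2 * a * target + b) / d    # (9.3)
    w_mvp = [o / c for o in Cinv_1]                          # (9.4)
    mvp_location = (a / c, (1.0 / c) ** 0.5)                 # (9.5)
    return w, variance, w_mvp, mvp_location


# Hypothetical inputs, for illustration only.
mu_vec = [0.05, 0.08, 0.12]
C = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
target = 0.09

w, variance, w_mvp, (mu_mvp, sigma_mvp) = markowitz(mu_vec, C, target)
quad_form = dot(w, [dot(row, w) for row in C])  # w' C w, should equal `variance`
print(w, variance, quad_form, w_mvp, mu_mvp, sigma_mvp)
```

The weights sum to one and reproduce the target return, and the quadratic form agrees with (9.3), which is what the two constraints of problem (9.1) and the theorem require.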


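Proof:¹ Since $\mathbf{C}^{-1}$ is positive definite it holds that

$$b = \boldsymbol{\mu}^\top \mathbf{C}^{-1} \boldsymbol{\mu} > 0 \tag{9.7}$$

and also that

$$c = \mathbf{1}^\top \mathbf{C}^{-1} \mathbf{1} > 0. \tag{9.8}$$

With the scalar product² $\langle \mathbf{1}, \boldsymbol{\mu} \rangle \equiv \mathbf{1}^\top \mathbf{C}^{-1} \boldsymbol{\mu}$ and the Cauchy-Schwarz inequality it follows that

$$\langle \mathbf{1}, \boldsymbol{\mu} \rangle^2 = (\mathbf{1}^\top \mathbf{C}^{-1} \boldsymbol{\mu})^2 = a^2 \le \langle \mathbf{1}, \mathbf{1} \rangle \langle \boldsymbol{\mu}, \boldsymbol{\mu} \rangle = (\mathbf{1}^\top \mathbf{C}^{-1} \mathbf{1})(\boldsymbol{\mu}^\top \mathbf{C}^{-1} \boldsymbol{\mu}) = bc$$

and for $\boldsymbol{\mu} \neq k \cdot \mathbf{1}$ it follows that

$$d = bc - a^2 > 0. \tag{9.9}$$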

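Furthermore, the Lagrangian for problem (9.1) is given as

$$L(\mathbf{w}, \mathbf{u}) = \frac{1}{2}\mathbf{w}^\top \mathbf{C} \mathbf{w} + u_1(\mu - \boldsymbol{\mu}^\top \mathbf{w}) + u_2(1 - \mathbf{1}^\top \mathbf{w}) \tag{9.10}$$

where the objective function has been multiplied by the factor $\frac{1}{2}$ for convenience only. $\mathbf{w}^*$ is optimal if there exists a $\mathbf{u} = (u_1, u_2)^\top \in \mathbb{R}^2$ that satisfies the Kuhn-Tucker conditions

$$\frac{\partial L}{\partial w_i}(\mathbf{w}^*, \mathbf{u}) = \sum_{j=1}^{n} c_{i,j} w_j^* - u_1 \mu_i - u_2 = 0, \quad \forall i \tag{9.11}$$

$$\frac{\partial L}{\partial u_1}(\mathbf{w}^*, \mathbf{u}) = \mu - \boldsymbol{\mu}^\top \mathbf{w}^* = 0 \tag{9.12}$$

$$\frac{\partial L}{\partial u_2}(\mathbf{w}^*, \mathbf{u}) = 1 - \mathbf{1}^\top \mathbf{w}^* = 0. \tag{9.13}$$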

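$$(9.11) \Leftrightarrow \mathbf{C}\mathbf{w}^* = u_1 \boldsymbol{\mu} + u_2 \mathbf{1} \Leftrightarrow \mathbf{w}^* = u_1 \mathbf{C}^{-1}\boldsymbol{\mu} + u_2 \mathbf{C}^{-1}\mathbf{1} \tag{9.14}$$

$$(9.13)\,\&\,(9.14) \Rightarrow \mathbf{1}^\top \mathbf{w}^* = u_1 \mathbf{1}^\top \mathbf{C}^{-1}\boldsymbol{\mu} + u_2 \mathbf{1}^\top \mathbf{C}^{-1}\mathbf{1} = a u_1 + c u_2 = 1 \tag{9.15}$$

$$(9.12)\,\&\,(9.14) \Rightarrow \boldsymbol{\mu}^\top \mathbf{w}^* = u_1 \boldsymbol{\mu}^\top \mathbf{C}^{-1}\boldsymbol{\mu} + u_2 \boldsymbol{\mu}^\top \mathbf{C}^{-1}\mathbf{1} = b u_1 + a u_2 = \mu \tag{9.16}$$

$$(9.15)\,\&\,(9.16) \Leftrightarrow \underbrace{\begin{pmatrix} a & c \\ b & a \end{pmatrix}}_{\equiv \mathbf{A}} \underbrace{\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}}_{\equiv \mathbf{u}} = \begin{pmatrix} 1 \\ \mu \end{pmatrix}. \tag{9.17}$$

¹ Following [66]
² See theorem A.1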


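Calculate the inverse of $\mathbf{A}$, noting that $\det(\mathbf{A}) = a^2 - bc = -d$, as

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{a^2 - bc}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{d}\begin{pmatrix} -a & c \\ b & -a \end{pmatrix}, \tag{9.18}$$

where $d$ is greater than zero, see (9.9). Using (9.17) and (9.18) yields

$$\mathbf{u} = \mathbf{A}^{-1}\begin{pmatrix} 1 \\ \mu \end{pmatrix} = \frac{1}{d}\begin{pmatrix} c\mu - a \\ b - a\mu \end{pmatrix}. \tag{9.19}$$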

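Inserting (9.19) into (9.14) yields equation (9.2), the optimal weights:

$$\mathbf{w}^* = u_1 \mathbf{C}^{-1}\boldsymbol{\mu} + u_2 \mathbf{C}^{-1}\mathbf{1} = \frac{1}{d}\left((c\mu - a)\,\mathbf{C}^{-1}\boldsymbol{\mu} + (b - a\mu)\,\mathbf{C}^{-1}\mathbf{1}\right). \tag{9.20}$$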

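Equation (9.3) follows by

$$\sigma^2(\mu) = \mathbf{w}^{*\top}\mathbf{C}\mathbf{w}^* \overset{(9.11)}{=} u_1 \boldsymbol{\mu}^\top \mathbf{w}^* + u_2 \mathbf{1}^\top \mathbf{w}^* \overset{(9.15)\,\&\,(9.16)}{=} u_1 \mu + u_2 \tag{9.21}$$

$$\overset{(9.19)}{=} \frac{1}{d}\left((c\mu - a)\mu + (b - a\mu)\right) = \frac{c\mu^2 - 2a\mu + b}{d} \tag{9.22}$$

which has its minimum for

$$\frac{\partial \sigma^2(\mu)}{\partial \mu} = \frac{1}{d}(2c\mu - 2a) = 0 \;\Rightarrow\; \mu_{MVP} = \frac{a}{c} \tag{9.23}$$

since the second partial derivative is positive

$$\frac{\partial^2 \sigma^2(\mu)}{\partial \mu^2} = \frac{2c}{d} \overset{(9.8)\,\&\,(9.9)}{>} 0. \tag{9.24}$$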


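(9.23) and (9.3) result in

$$\sigma_{MVP} = \sqrt{\sigma^2(\mu_{MVP})} = \sqrt{\frac{c\mu_{MVP}^2 - 2a\mu_{MVP} + b}{d}} = \sqrt{\frac{c\left(\frac{a}{c}\right)^2 - 2a\left(\frac{a}{c}\right) + b}{d}} = \sqrt{\frac{1}{c}}, \tag{9.25}$$

where $c$ is positive, see (9.8). Together with (9.23) this gives equation (9.5), the location of the minimum variance portfolio:

$$(\mu_{MVP}, \sigma_{MVP}) = \left(\frac{a}{c}, \sqrt{\frac{1}{c}}\right)$$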

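The weights of the minimum variance portfolio, equation (9.4), are found as follows:

$$\mathbf{w}_{MVP} \overset{(9.20)}{=} \frac{1}{d}\left((c\mu_{MVP} - a)\,\mathbf{C}^{-1}\boldsymbol{\mu} + (b - a\mu_{MVP})\,\mathbf{C}^{-1}\mathbf{1}\right) \overset{(9.23)}{=} \frac{1}{d}\left(\left(c\,\frac{a}{c} - a\right)\mathbf{C}^{-1}\boldsymbol{\mu} + \left(b - a\,\frac{a}{c}\right)\mathbf{C}^{-1}\mathbf{1}\right) \overset{(9.9)}{=} \frac{1}{c}\,\mathbf{C}^{-1}\mathbf{1}. \tag{9.26}$$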

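Finally, the efficient frontier in equation (9.6) is found by defining $\sigma \equiv \sigma(\mu)$:

$$
\begin{aligned}
\sigma^2 &\overset{(9.22)}{=} \frac{c\mu^2 - 2a\mu + b}{d} \\
\Leftrightarrow\quad \frac{d}{c}\,\sigma^2 &= \mu^2 - \frac{2a}{c}\,\mu + \frac{b}{c} \\
&= \left(\mu - \frac{a}{c}\right)^2 - \frac{a^2}{c^2} + \frac{b}{c} \\
&\overset{(9.9)\,\&\,(9.23)}{=} (\mu - \mu_{MVP})^2 + \frac{d}{c}\cdot\frac{1}{c} \\
&\overset{(9.25)}{=} (\mu - \mu_{MVP})^2 + \frac{d}{c}\,\sigma^2_{MVP} \\
\Leftrightarrow\quad (\mu - \mu_{MVP})^2 &= \frac{d}{c}\left(\sigma^2 - \sigma^2_{MVP}\right) \\
\Leftrightarrow\quad \mu &= \mu_{MVP} \pm \sqrt{\frac{d}{c}\left(\sigma^2 - \sigma^2_{MVP}\right)}
\end{aligned}
$$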


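If shorting were not allowed, the constraint for non-negative portfolio weights would have to be added to problem (9.1). The problem formulation would then be

$$
\begin{aligned}
\min_{\mathbf{w}} \quad & \mathbf{w}^\top \mathbf{C} \mathbf{w} \\
\text{s.t.} \quad & \boldsymbol{\mu}^\top \mathbf{w} = \mu \\
& \mathbf{1}^\top \mathbf{w} = 1 \\
& \mathbf{w} \ge \mathbf{0}.
\end{aligned}
\tag{9.27}
$$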

This optimization problem is quadratic just as problem (9.1), but in contrast it cannot be reduced to a set of linear equations, due to the added inequality constraint. Instead, an iterative optimization method has to be used for finding the optimal weights. The problem is solved by calling quadprog in Matlab. The function solves quadratic optimization problems by using active set methods³.
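Small instances of problem (9.27) can also be checked without a QP solver. The Python sketch below is not the active set method used by quadprog. It is a brute-force substitute that enumerates every subset of assets, applies the closed-form solution (9.2) on each subset and keeps the lowest-variance candidate with non-negative weights. It assumes distinct asset means, a positive definite covariance matrix and only a handful of assets, since the number of subsets grows as $2^n$. All input values and function names are hypothetical.

```python
from itertools import combinations


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def subset_solution(mu, C, target, idx):
    """Closed-form solution (9.2) restricted to the assets in idx (shorting allowed)."""
    if len(idx) == 1:  # a single asset is feasible only if its mean hits the target
        return [1.0] if abs(mu[idx[0]] - target) < 1e-12 else None
    m = [mu[i] for i in idx]
    S = [[C[i][j] for j in idx] for i in idx]
    x, y = solve(S, m), solve(S, [1.0] * len(idx))
    a, b, c = sum(x), dot(m, x), sum(y)
    d = b * c - a * a
    if abs(d) < 1e-14:  # equal means on this subset: not covered by this sketch
        return None
    return [((c * target - a) * xi + (b - a * target) * yi) / d
            for xi, yi in zip(x, y)]


def long_only(mu, C, target, tol=1e-10):
    """Minimum-variance long-only weights for `target`, or (None, inf) if infeasible."""
    n = len(mu)
    best_w, best_var = None, float("inf")
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = subset_solution(mu, C, target, idx)
            if sub is None or min(sub) < -tol:
                continue  # subset infeasible or it would require short selling
            w = [0.0] * n
            for i, wi in zip(idx, sub):
                w[i] = max(wi, 0.0)
            var = dot(w, [dot(row, w) for row in C])
            if var < best_var:
                best_w, best_var = w, var
    return best_w, best_var


# Hypothetical inputs, for illustration only.
mu_vec = [0.05, 0.08, 0.12]
C = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
w_lo, var_lo = long_only(mu_vec, C, 0.11)
print(w_lo, var_lo)
```

The enumeration works because the long-only optimum, restricted to the assets it actually holds, coincides with the unconstrained optimum on that subset. For realistic portfolio sizes a proper QP solver remains the appropriate tool.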

9.2 Estimation error in Markowitz portfolios

The estimated parameters, mean and covariance, used in Markowitz-based portfolio construction are often based on calculations on just one sample set from the return history. Input parameters derived from this sample set can only be expected to equal the parameters of the true distribution if the sample is very large and the distribution is stationary. If the distribution is non-stationary, it could be advisable to instead use a smaller sample for estimating the parameters. We can now distinguish between two sources of estimation error: a stationary but too short data set, or non-stationary data. [61] In this part of the thesis we will focus on estimation error originating from stationary but too short data sets.

Solving problem (9.27) for a given data set, where the means and covariances have been estimated on historical data, would generate portfolios that exhibit very different allocation weights. Some assets also tend never to enter the solution. This is a natural result of solving the optimization problem: the assets with very attractive features dominate the solution. It is also here that the estimation errors are likely to be large, which means that the impact of estimation errors on portfolio weights is maximized. [61] This is an undesired property of portfolio optimization that has been known for a long time [56]. Since the input parameters are treated as if they were known with certainty, even very small changes in them will trace out a new efficient frontier. The problem gets even worse as the number of assets increases, because this increases the probability of outliers. [61]

³ This is further explained in [35]


9.3 The method of portfolio resampling

Section 9.2 presented the problems with estimation errors in portfolio optimization due to treating input parameters as certain. A Monte Carlo approach called “Portfolio Resampling” has been introduced by Michaud [56] to deal with this. The basic idea is to allow for uncertainty in the input parameters by sampling from a distribution with parameters specified by estimates on historical data. Fabozzi [26] has summarized the procedure and it is described below.

Algorithm 9.1 (Portfolio resampling)

1. Estimate the mean vector, $\boldsymbol{\mu}$, and covariance matrix, $\boldsymbol{\Sigma}$, from historical data.

2. Draw $T$ random samples from the multivariate distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ to estimate $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$.

3. Calculate an efficient frontier from the input parameters from step 2 over the interval $[\sigma_{MVP,i}, \sigma_{MAX}]$, which is partitioned into $M$ equally spaced points. Record the weights $\mathbf{w}_{1,i}, \ldots, \mathbf{w}_{M,i}$.

4. Repeat steps 2 and 3 a total of $I$ times.

5. Calculate the resampled portfolio weights as $\mathbf{w}_M = \frac{1}{I}\sum_{i=1}^{I} \mathbf{w}_{M,i}$ and evaluate the resampled frontier with the mean vector and covariance matrix from step 1.
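The steps of algorithm 9.1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this thesis. It covers only the case where shorting is allowed, so that step 3 can use the closed-form frontier of theorem 9.1 instead of a QP solver. The input values, the choice of $\sigma_{MAX}$ and all function names are hypothetical, and only the standard library is used.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def cholesky(S):
    """Lower-triangular L with L L' = S, for a positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    return L


def draw(mu, L, rng):
    """One draw from N(mu, L L')."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]


def estimate(samples):
    """Sample mean vector and (unbiased) sample covariance matrix."""
    T, n = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / T for i in range(n)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (T - 1)
            for j in range(n)] for i in range(n)]
    return mean, cov


def frontier_weights(mu, C, sigma_max, M):
    """Step 3: weights at M (>= 2) equally spaced volatilities, shorting allowed."""
    x, y = solve(C, mu), solve(C, [1.0] * len(mu))
    a, b, c = sum(x), sum(m * xi for m, xi in zip(mu, x)), sum(y)
    d = b * c - a * a
    s_mvp = (1.0 / c) ** 0.5
    out = []
    for k in range(M):
        sigma = s_mvp + k * (sigma_max - s_mvp) / (M - 1)
        # Efficient branch of (9.6); clamped to the MVP if sigma is below sigma_MVP.
        target = a / c + (d / c * max(sigma ** 2 - 1.0 / c, 0.0)) ** 0.5
        out.append([((c * target - a) * xi + (b - a * target) * yi) / d
                    for xi, yi in zip(x, y)])  # (9.2)
    return out


def resample(mu, C, T, I, M, sigma_max, seed=0):
    """Steps 2-5: average the frontier weights over I resampled inputs."""
    rng, L, n = random.Random(seed), cholesky(C), len(mu)
    avg = [[0.0] * n for _ in range(M)]
    for _ in range(I):
        mu_i, C_i = estimate([draw(mu, L, rng) for _ in range(T)])
        for m, w in enumerate(frontier_weights(mu_i, C_i, sigma_max, M)):
            for j in range(n):
                avg[m][j] += w[j] / I
    return avg


# Step 1 stand-in: hypothetical estimates, for illustration only.
mu_hat = [0.05, 0.08, 0.12]
C_hat = [[0.04, 0.01, 0.00], [0.01, 0.09, 0.02], [0.00, 0.02, 0.16]]
weights = resample(mu_hat, C_hat, T=60, I=20, M=5, sigma_max=0.4)
for w in weights:  # evaluate the resampled frontier with the original inputs
    ret = sum(m * wi for m, wi in zip(mu_hat, w))
    sd = sum(w[i] * C_hat[i][j] * w[j] for i in range(3) for j in range(3)) ** 0.5
    print(round(ret, 4), round(sd, 4))
```

Because every recorded weight vector sums to one, so does each averaged vector. For the long-only case the call to `frontier_weights` would have to be replaced by a constrained optimizer.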

The number of draws $T$ corresponds to the uncertainty in the inputs you are using. As the number of draws increases, the dispersion decreases and the estimation error, i.e. the difference between the original estimated input parameters and the sampled input parameters, becomes smaller. [61] Typically, the value of $T$ is set to the length of the historical data set [61] and the value of $I$ is set between 100 and 500 [26]. The number of portfolios $M$ can be chosen freely according to how well the efficient frontier should be depicted.
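The effect of $T$ on the dispersion can be made explicit. Assuming, as in step 2 of algorithm 9.1, that the $T$ draws are independent samples from $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the standard result for the sample mean gives

```latex
\boldsymbol{\mu}_i \sim N\!\left(\boldsymbol{\mu},\ \tfrac{1}{T}\,\boldsymbol{\Sigma}\right)
\quad\Longrightarrow\quad
\operatorname{sd}\!\left(\mu_{i,j}\right) = \frac{\sigma_j}{\sqrt{T}},
\qquad \sigma_j^2 = \Sigma_{jj}.
```

Under this assumption, quadrupling $T$ halves the standard deviation of each resampled mean around the original estimate. For example, an asset with $\sigma_j = 0.20$ has a resampled mean with standard deviation of about $0.026$ for $T = 60$ and about $0.013$ for $T = 240$.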

The new resampled frontier will appear below the original one. This follows from the weights $\mathbf{w}_{1,i}, \ldots, \mathbf{w}_{M,i}$ being optimal relative to $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$ but inefficient relative to the original estimates $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. Therefore, the resampled portfolio weights are also inefficient relative to $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. By the sampling and re-estimation that occurs at each step in the portfolio resampling process, the effect of estimation error is incorporated in the determination of the resampled portfolio weights. [26]


9.4 An example of portfolio resampling

A portfolio consisting of 8 different assets has been constructed. The assets are: a world commodity index; equity in the emerging markets, the US and Germany; bonds in the emerging markets, the US and Germany; and finally a real estate index. Their mean vector and covariance matrix have been estimated on data from 2002-2006 and can be found in table 9.1.

                          Covariance (lower triangle)
Bloomberg ticker  Asset     Cmdty   EQEM   EQUS   EQDE   BDEM   BDUS   BDDE Estate   Mean
SPGCCITR          Cmdty      0.57                                                    0.21
NDLEEGF           EQEM       0.08   0.32                                             0.21
INDU              EQUS      -0.05   0.17   0.18                                      0.05
DAX               EQDE      -0.08   0.31   0.30   0.64                               0.07
JGENGLOG          BDEM      -0.01   0.01   0.00  -0.01   0.01                        0.09
JPMTUS            BDUS       0.01  -0.03  -0.03  -0.08   0.01   0.03                 0.06
JPMTWG            BDDE       0.01  -0.02  -0.02  -0.05   0.01   0.02   0.01          0.05
G250PGLL          Estate    -0.05   0.12   0.10   0.13   0.01  -0.01   0.00   0.19   0.10

Table 9.1. Input parameters for portfolio resampling

With the input parameters from table 9.1 a portfolio resampling has been carried out, with and without shorting allowed and always with errors in both the means and the covariances. In figure 9.1 the resampled efficient frontiers are depicted. In figures 9.2 and 9.3 the portfolio allocations are found. Finally, the impact of errors in the mean and in the covariances respectively is displayed in figure 9.4.


9.5 Discussion of portfolio resampling

As discussed earlier, the resampled frontier will plot below the efficient frontier, just as in figure 9.1 b. However, when shorting is allowed the resampled frontier will coincide with the efficient frontier. Why is that? Estimation errors should result in an increase in portfolio risk, showing up as an increase in volatility for each return level. Instead it can only be seen that the estimation errors result in a shortening of the frontier. The explanation given by Scherer [61] is that highly positive returns will be offset by highly negative returns when drawing from the original distribution. The quadratic programming optimizer will invest heavily in the asset with highly positive returns and short the asset with highly negative returns, and this will be offset on average. When the long-only constraint is added, this will no longer be the case and the resampled frontier will plot below the efficient frontier, figure 9.1 b.

As a result of the above, the resampled portfolio weights when shorting is allowed will be much the same as those in the efficient portfolios. Most of the assets enter the solution in the same way, as depicted in figure 9.2 b. When shorting is no longer allowed, the resulting allocations are very concentrated to only some assets in the efficient portfolios, and a small shift in desired return level can lead to rather different allocations, e.g. going from portfolio 6 to 7 in figure 9.3. The resampled portfolios, on the other hand, exhibit a much smoother transition between different return levels and a greater diversification.

In the resampling, estimation errors have been assumed both in the means and covariances. In figure 9.4 the effect of estimation errors in only the means or only the covariances can be observed. It is found that estimation errors in the mean have a much greater impact than estimation errors in covariances. A good forecast of the mean will improve the resulting allocations a great deal.

The averaging in the portfolio resampling method makes the weights still sum to one, which is important. But averaging can sometimes prove to be misleading. For instance, there is always a risk that the allocation weights for a given portfolio are heavily influenced by a few lucky draws, making the asset look more attractive than is justifiable. Averaging is indeed the main idea behind portfolio resampling, but it is not reasonable for the final averaged portfolio weights to depend on a few extreme outcomes. This criticism is discussed by Scherer [61]. However, the most important criticism, also presented by Scherer [61], is that all resamplings are derived from the same mean vector and covariance matrix. Because the true distribution is unknown, all resampled portfolios suffer from the same deviation from the true parameters in much the same way. Averaging will not help much in this case. Therefore it is fair to say that all portfolios inherit the same estimation error.

It is found by Michaud [56] that resampled portfolios beat Markowitz portfolios out-of-sample. This follows from the fact that well diversified portfolios tend to always beat Markowitz portfolios out-of-sample, and can therefore not be ascribed solely to the portfolio resampling method itself being outstanding. Although the resampling heuristic has some major drawbacks, it remains interesting since it is a first step towards addressing estimation errors in portfolio optimization.


(a) shorting allowed

(b) no shorting allowed

Figure 9.1. Comparison of efficient and resampled frontier


(a) Resampled weights

(b) Mean-variance weights

Figure 9.2. Resampled portfolio allocation when shorting allowed


(a) Resampled weights

(b) Mean-variance weights

Figure 9.3. Resampled portfolio allocation when no shorting allowed


(a) Errors in mean

(b) Errors in covariance

Figure 9.4. Comparison of estimation error in mean and covariance

Chapter 10

Backtesting Portfolio Performance

In the first part of this thesis we developed a method for forecasting the equity premium that took model uncertainty into account. It was found that our forecast outperformed the use of a historical average but was associated with estimation errors. In the previous chapter we presented portfolio resampling as a method for dealing with these errors. In this chapter we will evaluate whether portfolio resampling can be used to improve our forecasting results.

10.1 Backtesting setup and results

We benchmark the performance of a portfolio consisting of all the assets found in table 9.1, except for equity and bonds from emerging markets, using our forecasted equity premium and portfolio resampling. For the two assets in emerging markets the available time series were too short.

Starting at the end of 1998 and going to the end of 2007, we solve problem (9.27) and rebalance the portfolio at the end of each year. We do not allow for short-selling, since it was previously found that portfolio resampling only has an effect under the long-only constraint. Transaction costs are not taken into account, since our concern is the relative performance of the methods. The returns vector, $\boldsymbol{\mu}$, is forecasted using the arithmetic average of the returns up to time $t$ for asset $i$, except for equity US, where we make use of our one-year multivariate forecasted equity premium for time $t$. The parameter $\mu$ is set so that each portfolio has a volatility of $\sqrt{0.02} \approx 14\%$ when rebalanced. The covariance matrix is always estimated on all returns available up to time $t$. The resulting portfolio value over time is found in figure 10.1 and the corresponding returns in table 10.1. In table 10.2 the exact portfolio values on the end date for ten resampling simulations are presented.


Figure 10.1. Portfolio value over time using different strategies

It is found that using our premium forecasts as input yields better performance than just employing the historical average¹. Our forecast consistently generates the highest portfolio value. As explained earlier, using accurate inputs in portfolio optimization is very important.

Date        EEP   EEP&PR     aHEP  aHEP&PR
Dec-99     33.4     32.8     24.4     27.2
Dec-00     -3.6     -2.6     -2.8     -1.2
Dec-01    -17.1    -18.3    -17.1    -18.9
Dec-02    -16.8    -16.8    -23.3    -19.8
Dec-03     22.0     24.3     19.0     23.0
Dec-04      3.4      7.6      7.0      9.8
Dec-05     18.9     20.2     20.8     21.0
Dec-06      6.9      5.9      6.7      6.3
Dec-07     20.7     17.6     20.3     19.4

Table 10.1. Portfolio returns in percent over time. PR is the acronym for portfolio resampling.

¹ For the asset equity US, the historical arithmetic average is referred to as aHEP.


              EEP   EEP&PR     aHEP  aHEP&PR
            1.716    1.701    1.520    1.731
                     1.765             1.671
                     1.750             1.713
                     1.700             1.717
                     1.785             1.728
                     1.768             1.672
                     1.750             1.755
                     1.790             1.730
                     1.767             1.675
                     1.766             1.736
Average:             1.754             1.713

Table 10.2. Terminal portfolio value. PR is the acronym for portfolio resampling.

Portfolio resampling seems to improve performance if the input is very uncertain, such as the aHEP. On average, resampling increases the terminal portfolio value by almost 20 percentage points for the aHEP, but only by about 4 percentage points for the EEP. As seen in table 10.2, resampling generated a higher terminal value ten out of ten times for the aHEP, whilst for the EEP resampling sometimes generated a lower terminal portfolio value. This suggests that resampling is indeed useful when the input parameters are uncertain, since the portfolio weights are smoothed and more assets enter the solution, creating a more diversified portfolio. According to Michaud [56], well diversified portfolios, e.g. obtained by resampling, should outperform Markowitz portfolios out-of-sample, just as found here. On average, the pure EEP and aHEP portfolios are both outperformed by their resampled counterparts. The rather small increase in portfolio return when resampling using the EEP as input, compared to using the aHEP, points to the EEP containing smaller estimation errors than the aHEP. This is also supported by the positive $R^2_{os,mv}$ found in section 7.2.
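The figures quoted above can be reproduced from tables 10.1 and 10.2. The short Python sketch below compounds the annual returns of table 10.1 into a terminal value and averages the ten resampled terminal values of table 10.2. The starting portfolio value of 1 is an assumption, chosen because it matches table 10.2 up to rounding, and the variable names are illustrative.

```python
from math import prod

# Annual returns in percent from table 10.1 (Dec-99 to Dec-07).
eep = [33.4, -3.6, -17.1, -16.8, 22.0, 3.4, 18.9, 6.9, 20.7]
ahep = [24.4, -2.8, -17.1, -23.3, 19.0, 7.0, 20.8, 6.7, 20.3]

# Terminal values of the ten resampling simulations from table 10.2.
eep_pr = [1.701, 1.765, 1.750, 1.700, 1.785, 1.768, 1.750, 1.790, 1.767, 1.766]
ahep_pr = [1.731, 1.671, 1.713, 1.717, 1.728, 1.672, 1.755, 1.730, 1.675, 1.736]


def terminal_value(returns_pct):
    """Compound yearly percentage returns, starting from a portfolio value of 1."""
    return prod(1.0 + r / 100.0 for r in returns_pct)


eep_terminal = terminal_value(eep)      # close to the 1.716 reported in table 10.2
ahep_terminal = terminal_value(ahep)    # close to the 1.520 reported in table 10.2

# Average uplift from resampling, relative to the values reported in table 10.2.
eep_gain = sum(eep_pr) / len(eep_pr) - 1.716
ahep_gain = sum(ahep_pr) / len(ahep_pr) - 1.520
print(eep_terminal, ahep_terminal, eep_gain, ahep_gain)
```

The gains come out at roughly 0.04 for the EEP and 0.19 for the aHEP, i.e. about 4 and almost 20 percentage points of the initial portfolio value.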

In this backtest we find evidence that our multivariate forecast performs better than the arithmetic average when used as input in a mean-variance asset allocation problem. Portfolio resampling is also found to provide a good way of arriving at meaningful asset allocations when the input parameters are very noisy.

Chapter 11

Conclusions

In this thesis we incorporate model uncertainty in the forecasting of the expected equity premium by creating a large number of linear prediction models on which we apply Bayesian model averaging. We also investigate the general impact of input estimation errors in mean-variance optimization and evaluate the performance of a Monte Carlo based heuristic called portfolio resampling.

It is found that the forecasting ability of multifactor models is not substantially improved by our approach. Our interpretation thereof is that the largest problem with multifactor models is not model uncertainty, but rather too low predictive ability.

Further, our investigation provides evidence that the GDP, the short term spread and the volatility are useful in forecasting the expected equity premium for the five years to come. Our investigations also show that multivariate models are to some extent better than univariate models, but it cannot be said that either of them is accurate in predicting the expected equity premium. Nevertheless, it is likely that both provide better forecasts than using the arithmetic average of the historical equity premium.

We have also found that portfolio resampling provides a good way to arrive at meaningful allocation decisions when the optimization inputs are very noisy.

Our proposal for further work is to investigate whether a Bayesian analysis, not involving linear regression, with carefully selected priors calibrated to reflect meaningful economic information, provides better predictions for the expected equity premium than the approach used in this thesis.

Bibliography

[1] Ang A. & Bekaert G., (2003), Stock return predictability: is it there?, Working Paper, University of Columbia.

[2] Avramov D., (2002), Stock return predictability and model uncertainty, Journal of Financial Economics, vol. 64, pp. 423-458.

[3] Baker M. & Wurgler J., (2000), The Equity Share in New Issues and Aggregate Stock Returns, Journal of Finance, American Finance Association, vol. 55(5), pp. 2219-2257.

[4] Benning J. F., (2007), Trading Strategies for Capital Markets, McGraw-Hill, New York.

[5] Bernardo J. M. & Smith A., (1994), Bayesian Theory, John Wiley & Sons Ltd.

[6] Bostock P., (2004), The Equity Premium, Journal of Portfolio Management, vol. 30(2), pp. 104-111.

[7] Brealey R. A., Myers S. C. & Allen F., (2006), Corporate Finance, McGraw-Hill, New York.



[10] Burda M. & Wyplosz C., (1997), Macroeconomics: A European text, Oxford University Press, New York.

[11] Campbell J. Y., Lo A. & MacKinlay A., (1997), The Econometrics of Financial Markets, Princeton University Press.

[12] Campbell J. Y. & Shiller R. J., (1988), The dividend-price ratio and expectations of future dividends and discount factors, Review of Financial Studies, vol. 1, pp. 195-228.

[13] Campbell J. Y. & Shiller R. J., (1988), Stock prices, earnings, and expected dividends, Journal of Finance, vol. 43, pp. 661-676.


[14] Campbell J. Y. & Shiller R. J., (1998), Valuation ratios and the long-run stock market outlook, Journal of Portfolio Management, vol. 24, pp. 11-26.

[15] Campbell J. Y., (1987), Stock returns and the term structure, Journal of Financial Economics, vol. 18, pp. 373-399.

[16] Campbell J. & Thompson S., (2005), Predicting the Equity Premium Out of Sample: Can Anything Beat the Historical Average?, NBER Working Papers 11468, National Bureau of Economic Research.

[17] Casella G. & Berger R. L., (2002), Statistical Inference, 2nd ed., Duxbury Press.

[18] Choudhry M., (2006), Bonds - A concise guide for investors, Palgrave Macmillan, New York.

[19] Cohen R. B., Polk C. & Vuolteenaho T., (2005), Inflation Illusion in the Stock Market: The Modigliani-Cohn Hypothesis, Quarterly Journal of Economics, vol. 120, pp. 639-668.

[20] Dalén J., (2001), The Swedish Consumer Price Index - A Handbook of Methods, Statistiska Centralbyrån, SCB-Tryck, Örebro.

[21] Damodaran A., (2006), Damodaran on Valuation, John Wiley & Sons, New York.

[22] Dimson E., Marsh P. & Staunton M., (2006), The Worldwide Equity Premium: A Smaller Puzzle, SSRN Working Paper No. 891620.

[23] Durbin J. & Watson G. S., (1950), Testing for Serial Correlation in Least Squares Regression I, Biometrika, vol. 37, pp. 409-428.

[24] Escobar L. A. & Meeker W. Q., (2000), The Asymptotic Equivalence of the Fisher Information Matrices for Type I and Type II Censored Data from Location-Scale Families, Working Paper.

[25] Estrella A. & Trubin M. R., (2006), The Yield Curve as a Leading Indicator: Some Practical Issues, Current Issues in Economics and Finance - Federal Reserve Bank of New York, vol. 12(5).

[26] Fabozzi F. J., Focardi S. M. & Kolm P. N., (2006), Financial Modeling of the Equity Market, John Wiley & Sons, New Jersey.

[27] Fama E. F., (1981), Stock returns, real activity, inflation and money, American Economic Review, pp. 545-565.

[28] Fama E. F. & French K. R., (1988), Dividend yields and expected stock returns, Journal of Financial Economics, vol. 22, pp. 3-25.

[29] Fama E. F. & French K. R., (1989), Business conditions and expected returns on stocks and bonds, Journal of Financial Economics, vol. 25, pp. 23-49.


[30] Fama E. F. & Schwert G. W., (1977), Asset Returns and Inflation, Journal of Financial Economics, vol. 5(2), pp. 115-46.

[31] The Federal Reserve, (2007), Industrial production and capacity utilization, Retrieved February 12, 2008 from http://www.federalreserve.gov/releases/g17/20071214/

[32] Fernández P., (2006), Equity Premium: Historical, Expected, Required and Implied, IESE Business School, Madrid.

[33] Fernández C., Ley E. & Steel M., (1998), Benchmark priors for Bayesian Model Averaging, Working Paper.

[34] Franke J., Härdle W. K. & Hafner C. M., (2008), Statistics of Financial Markets: An Introduction, Springer-Verlag, Berlin Heidelberg.

[35] Gill P. E. & Murray W., (1981), Practical Optimization, Academic Press, London.

[36] Golub G. & Van Loan C., (1996), Matrix Computations, The Johns Hopkins University Press, Baltimore.

[37] Goyal A. & Welch I., (2006), A Comprehensive Look at the Empirical Performance of Equity Premium Prediction, Review of Financial Studies, forthcoming.

[38] Hamilton J. D., (1994), Time Series Analysis, Princeton University Press.

[39] Harrell F. E., (2001), Regression Modeling Strategies, Springer-Verlag, New York.

[40] Hodrick R. J., (1992), Dividend yields and expected stock returns: alternative procedures for inference and measurement, Review of Financial Studies, vol. 5(3), pp. 257-286.

[41] Hoeting J. A., Madigan D., Raftery A. E. & Volinsky C. T., (1999), Bayesian Model Averaging: A Tutorial, Statistical Science, vol. 14(4), pp. 382-417.

[42] Ibbotson Associates, (2006), Stocks, Bonds, Bills and Inflation, Valuation Edition, 2006 Yearbook.

[43] Keim D. B. & Stambaugh R. F., (1986), Predicting returns in the stock and bond markets, Journal of Financial Economics, vol. 17(2), pp. 357-390.

[44] Kennedy P. E., (2000), Macroeconomic Essentials - Understanding Economics in the News, The MIT Press, Cambridge.

[45] Koller T., Goedhart M. & Wessels D., (2005), Valuation: Measuring and Managing the Value of Companies, McKinsey & Company, Inc., Wiley.

[46] Kothari S. P. & Shanken J., (1997), Book-to-market, dividend yield, and expected market returns: a time series analysis, Journal of Financial Economics, vol. 44, pp. 169-203.

[47] Krainer J., (2004), What Determines the Credit Spread?, FRBSF Economic Letter, Nr 2004-36.

[48] Lamont O., (1998), Earnings and expected returns, Journal of Finance, vol. 53, pp. 1563-1587.

[49] Lee P. M., (2004), Bayesian Statistics: An Introduction, Oxford University Press.

[50] Lettau M. & Ludvigson S., (2001), Consumption, aggregate wealth and expected stock returns, Journal of Finance, vol. 56(3), pp. 815-849.

[51] Lewellen J., (2004), Predicting returns with financial ratios, Working Paper.

[52] Luenberger D. G., (1998), Investment Science, Oxford University Press, New York.

[53] Mankiw G. N., (2002), Macroeconomics, Worth Publishers, New York.

[54] Mayer B., (2007), Credit as an Asset Class, Master's Thesis, TU Munich.

[55] Merton R. C., (1980), On Estimating the Expected Return on the Market: An Exploratory Investigation, Journal of Financial Economics, vol. 8, pp. 323-361.

[56] Michaud R., (1998), Efficient Asset Management: A Practical Guide to Stock Portfolio Optimization and Asset Allocation, Oxford University Press, New York.

[57] Polk C., Thompson S. & Vuolteenaho T., (2005), Cross-sectional forecasts of the equity premium, Journal of Financial Economics, vol. 81(1), pp. 101-141.

[58] Pontiff J. & Schall L. D., (1998), Book-to-market ratios as predictors of market returns, Journal of Financial Economics, vol. 49, pp. 141-160.

[59] Press J. S., (1972), Applied Multivariate Analysis, Holt, Rinehart & Winston Inc, University of Chicago.

[60] Rozeff M., (1984), Dividend yields are equity risk premiums, Journal of Portfolio Management, vol. 11, pp. 68-75.

[61] Scherer B., (2004), Portfolio Construction and Risk Budgeting, Risk Books, Incisive Financial Publishing Ltd.

[62] University of Michigan, Surveys of consumers, Retrieved February 9, 2008 from http://www.sca.isr.umich.edu/

[63] U.S. Department of Labor, Glossary, Retrieved February 5, 2008 from http://www.bls.gov/bls/glossary.htm#P

[64] Vaihekoski M., (2005), Estimating Equity Risk Premium: Case Finland, Lappeenranta University of Technology, Working Paper.

[65] Welch I., (2000), Views of Financial Economists on the Equity Premium and on Professional Controversies, Journal of Business, vol. 73(4), pp. 501-537.

[66] Zagst R., (2004), Lecture Notes - Asset Pricing, TU Munich.

[67] Zagst R. & Pöschik M., (2007), Inverse Portfolio Optimization under Constraints, Working Paper.

[68] Zellner A., (1986), On assessing prior distributions and Bayesian regression analysis with g-prior distributions, in Essays in Honor of Bruno de Finetti, eds P. K. Goel and A. Zellner, Amsterdam: North-Holland, pp. 233-243.

Appendix A

Mathematical Preliminaries

A.1 Statistical definitions

Definition A.1 (Bias) Let $\hat{\theta}$ be a sample estimate of a vector of parameters $\theta$. For example, $\hat{\theta}$ could be the sample mean $\bar{x}$. The estimate is then said to be unbiased if $E[\hat{\theta}] = \theta$ (see [38]).

Definition A.2 (Stochastic process) A stochastic process $X_t$, $t \in \mathbb{Z}$, is a family of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$.

At a specific time point $t$, $X_t$ is a random variable with a specific density function. Given a specific $\omega \in \Omega$, $X(\omega) = \{X_t(\omega),\ t \in \mathbb{Z}\}$ is a realization, or a path, of the process (see [34]).

Definition A.3 (Autocovariance function) The autocovariance function of a stochastic process $X_t$ is defined as

$$\gamma(t, \tau) = E[(X_t - \mu_t)(X_{t-\tau} - \mu_{t-\tau})], \quad \forall \tau \in \mathbb{Z}.$$

The autocovariance function is symmetric, that is, $\gamma(t-\tau, -\tau) = \gamma(t, \tau)$. In general, $\gamma(t, \tau)$ depends on $t$ as well as on $\tau$. Below we define the important concept of stationarity, which often simplifies autocovariance functions (see [34]).

Definition A.4 (Stationarity) A stochastic process $X_t$ is covariance stationary if

$$E[X_t] = \mu \quad \text{and} \quad \gamma(t, \tau) = \gamma(\tau), \quad \forall t.$$

A stochastic process $X_t$ is strictly stationary if, for any $t_1, \ldots, t_n$ and for all $n, s \in \mathbb{Z}$, the joint distribution satisfies

$$F_{t_1,\ldots,t_n}(x_1, \ldots, x_n) = F_{t_1+s,\ldots,t_n+s}(x_1, \ldots, x_n).$$

For covariance stationary processes, the term weakly stationary is often used (see [34]).
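Under covariance stationarity a single mean and a single function $\gamma(\tau)$ describe the whole process, so both can be estimated from one observed path. The following is a minimal sketch of such an estimate; the data vector, the variable names and the choice of the estimator that divides by $n$ are illustrative assumptions and are not taken from the thesis.

```matlab
% Sample autocovariance of a short series, assuming covariance stationarity.
x = [1 2 3 4 5]';                 % illustrative data (assumption, not thesis data)
n = length(x);
mu = mean(x);                     % one common mean under stationarity
maxLag = 2;
gammaHat = zeros(maxLag+1, 1);
for tau = 0:maxLag
    % sum of (X_t - mu)(X_{t-tau} - mu) over the available pairs, divided by n
    gammaHat(tau+1) = sum((x(1+tau:n) - mu) .* (x(1:n-tau) - mu)) / n;
end
disp(gammaHat')                   % estimates of gamma(0), gamma(1), gamma(2)
```

The entry for lag 0 is the (biased) sample variance; without stationarity the same averages would mix values of $\gamma(t, \tau)$ belonging to different $t$.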

Definition A.5 (Trace of a matrix) The trace of a matrix $A \in \mathbb{R}^{n \times n}$ is defined as the sum of the elements along the diagonal (see [59]):

$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$

Definition A.6 (The gamma function) The gamma function can be defined as the definite integral

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$

where $x \in \mathbb{R}$ and $x > 0$ (see [59]).
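As a quick numerical check of the definition, the integral can be evaluated with a quadrature routine and compared with the built-in gamma function. This is only a sketch: the argument 4.5 and the variable names are arbitrary illustrative choices.

```matlab
% Compare the integral definition of the gamma function with MATLAB's gamma().
x = 4.5;                                      % illustrative argument, x > 0
integrand = @(t) t.^(x-1) .* exp(-t);         % t^(x-1) e^(-t)
numericValue = integral(integrand, 0, Inf);   % numerical quadrature on [0, Inf)
closedForm = gamma(x);                        % built-in gamma function
fprintf('integral: %.6f, gamma(x): %.6f\n', numericValue, closedForm);

% For a positive integer n the gamma function reduces to a factorial,
% gamma(n) = (n-1)!, for example gamma(5) = 4! = 24.
gammaFive = gamma(5);
```

The same built-in function is what the listings in Appendix B call when they evaluate `gamma(n/2)`.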

Definition A.7 (Positive definite matrix) A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite if

$$x^\top A x > 0, \quad \forall x \neq 0 \in \mathbb{R}^n \quad \text{(see [34])}.$$

Theorem A.1 (Properties of positive definite matrices) If $A$ is positive definite, it defines an inner product on $\mathbb{R}^n$ as

$$\langle x, y \rangle = x^\top A y.$$

In particular, the standard inner product on $\mathbb{R}^n$ is obtained by setting $A = I$. Furthermore, $A$ has only positive eigenvalues $\lambda_i$, is invertible, and its inverse is also positive definite.

Proof: see [36], [59].
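The properties in Theorem A.1 are easy to check numerically for a concrete matrix. The sketch below uses an arbitrary 2x2 example (an assumption for illustration only) and tests positive definiteness through the eigenvalues and through a Cholesky factorization.

```matlab
% Numerical check of Theorem A.1 for one illustrative symmetric matrix.
A = [2 1; 1 2];                  % illustrative matrix (assumption)
lambda = eig(A);                 % eigenvalues of A, here 1 and 3
[~, flag] = chol(A);             % flag is 0 when the Cholesky factorization succeeds
lambdaInv = eig(inv(A));         % eigenvalues of the inverse, here 1/3 and 1

x = [1; -1];
y = [2; 3];
ipXY = x' * A * y;               % inner product <x,y> = x'Ay
ipYX = y' * A * x;               % symmetry: <y,x> equals <x,y>
normSq = x' * A * x;             % <x,x> > 0 for x ~= 0
fprintf('<x,y> = %g, <y,x> = %g, <x,x> = %g\n', ipXY, ipYX, normSq);
```

In practice the Cholesky test is usually preferred over computing eigenvalues, since it is cheaper and fails cleanly when the matrix is not positive definite.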

A.2 Statistical distributions

Definition A.8 (The normal distribution) The variable $Y$ has a Gaussian, or normal, distribution with mean $\mu$ and variance $\sigma^2$ if

$$f_Y(y_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y_t - \mu)^2}{2\sigma^2}\right].$$

Definition A.9 (The Chi-Squared distribution) The probability density for the $\chi^2$-distribution with $v$ degrees of freedom is given by

$$p_v(x) = \frac{x^{v/2-1} \exp[-x/2]}{\Gamma(v/2)\, 2^{v/2}}.$$

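Definition A.10 (The multivariate normal distribution) Let $x \in \mathbb{R}^{p \times 1}$ be a random vector with density function $f(x)$. $x$ is said to follow a multivariate normal distribution with mean vector $\theta \in \mathbb{R}^{p \times 1}$ and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$ if

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \theta)^\top \Sigma^{-1} (x - \theta)\right].$$

If $|\Sigma| = 0$ the distribution of $x$ is called degenerate and the density above does not exist.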
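The density in Definition A.10 can be evaluated directly from the formula. The following sketch does so with an anonymous function; the name `mvnDensity` and the numerical inputs are hypothetical and chosen only for illustration.

```matlab
% Evaluate the multivariate normal density of Definition A.10 at a point.
% mvnDensity is a hypothetical helper name used only in this sketch.
mvnDensity = @(x, theta, Sigma) ...
    exp(-0.5 * (x - theta)' * (Sigma \ (x - theta))) ...
    / ((2*pi)^(length(theta)/2) * sqrt(det(Sigma)));

% With p = 2, Sigma = I and x = theta the density equals 1/(2*pi).
f0 = mvnDensity([0; 0], [0; 0], eye(2));

% A correlated example (illustrative numbers).
f1 = mvnDensity([1; -1], [0; 0], [2 0.5; 0.5 1]);
fprintf('f0 = %.6f, f1 = %.6f\n', f0, f1);
```

The backslash solve `Sigma \ (x - theta)` is used instead of an explicit inverse; when $|\Sigma| = 0$ the normalizing constant divides by zero, which mirrors the degenerate case noted above.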

The inverted Wishart distribution is the multivariate generalization of the univariate inverted gamma distribution. It is the distribution of the inverse of a random matrix following the Wishart distribution, and it is the natural conjugate prior for the covariance matrix in a normal distribution.

Definition A.11 (The inverted Wishart distribution) Let $U \in \mathbb{R}^{p \times p}$ be a random matrix following the inverted Wishart distribution with positive definite matrix $G$ and $n$ degrees of freedom. Then for $n > 2p$, the density of $U$ is given by

$$p(U) = \frac{c_0\, |G|^{(n-p-1)/2}}{|U|^{n/2}} \exp\left[-\frac{1}{2}\, \mathrm{tr}\left[U^{-1} G\right]\right]$$

and $p(U) = 0$ otherwise. The constant $c_0$ is given by

$$c_0^{-1} = 2^{(n-p-1)p/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(\frac{n-p-j}{2}\right).$$

Appendix B

Code

B.1 Univariate predictions

%input
[dates,values]=loadThesisData_LongDataSet(false);
[dates, returns,differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(1:end,1); %this is the equity premium
returns=returns(1:end,2:end);

muci=[]; predRng=[]; allEst=[]; prob_model=[]; outliersStep=[];

%prediction horizon
horizon=5;

for k=1:horizon

    y_bma=[];x_bma=[];res=[];est=[];
    removedModels=[];usedModels=[];outliers=0;

    for j=1:length(returns(1,:))
        [x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
            ,returns(1:end-k,j),returns(end,j));

        res = [res resVec];est = [est est_tmp];

        y_bma=[y_bma y];x_bma=[x_bma x];

        n=length(x(:,1));p=length(x(1,:));g=1/n;

        if (est(j) > 0.0) && est(j)<mean(eqp(k+1:end))+1.28*rlstd(eqp(k+1:end))
            P=x*inv(x'*x)*x';
            likelihood(j)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
                *(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
            usedModels = [usedModels j];
        else
            likelihood(j)=0;
            removedModels = [removedModels j];
            est(j)=0;
            usedModels = [usedModels j];
        end
        outliers = outliers + outliersTmp;
    end

    outliersStep=[outliersStep outliers];
    usedModelsBMA = usedModels*2-1;
    p_model=likelihood./sum(likelihood);
    weightedAvg =p_model*est';
    prob_model=[prob_model p_model'];
    predRng = [predRng; 100*min(est) 100*max(est) 100*mean(est)];
    allEst = [allEst est'];
    VARyhat_data=zeros(length(res(:,1)),length(res(:,1)));

    for i = 1:length(returns(1,:))
        VARyhat_data = VARyhat_data +(diag(res(:,i))*x_bma(:,i*2-1:i*2)...
            *inv(x_bma(:,i*2-1:i*2)'*x_bma(:,i*2-1:i*2))*x_bma(:,i*2-1:i*2)'...
            +y_bma(:,i)*y_bma(:,i)')*prob_model(i)-(y_bma(:,i)*prob_model(i))...
            *(y_bma(:,i)*prob_model(i))';
    end

    STD_step(k) = sqrt(sum(diag(VARyhat_data))/length(diag(VARyhat_data)));
    z=norminv([0.05 0.95],0,1);
    muci=[muci; weightedAvg+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
        weightedAvg weightedAvg+z(2)*STD_step(k)/sqrt(length(res(:,1)))];

end
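For readers who prefer a formula to code, the quantity that the listing above stores in `likelihood(j)` and the way it is turned into model weights can be written out as follows. This is read directly off the code; the symbols $L_j$, $w_j$, $\hat{y}_j$ and $\hat{y}_{\mathrm{BMA}}$ are notation introduced here for illustration, and no claim is made that they match the notation of the main text.

```latex
% Per-model quantity computed in the listing (X_j is the regressor matrix
% with intercept, n its number of rows, p its number of columns, g = 1/n):
\[
  L_j \;=\; \frac{\Gamma(n/2)}{(2\pi)^{n/2}\,(1+g)^{p/2}}
            \left( y^\top y \;-\; \frac{g}{1+g}\, y^\top P_j\, y \right)^{-n/2},
  \qquad
  P_j \;=\; X_j \left( X_j^\top X_j \right)^{-1} X_j^\top .
\]
% Models whose forecast falls outside the accepted range get L_j = 0.
% The weights and the averaged forecast are then
\[
  w_j \;=\; \frac{L_j}{\sum_i L_i},
  \qquad
  \hat{y}_{\mathrm{BMA}} \;=\; \sum_j w_j\, \hat{y}_j .
\]
```

In the code, $w_j$ corresponds to `p_model(j)`, $\hat{y}_j$ to `est(j)` and $\hat{y}_{\mathrm{BMA}}$ to `weightedAvg`.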

B.2 Multivariate predictions

[dates,values]=loadThesisData_LongDataSet(false);

%input
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(:,1); regressor=returns(:,2:end);
numFactor=length(regressor(1,:)); numOfModel=2^numFactor;

horizon=5; %prediction horizon
comb=combinations(numFactor);
prob_model=zeros(numOfModel-1,horizon);
likelihood=zeros(numOfModel-1,1); tmp=zeros(numOfModel-1,1);
usedModels=zeros(1,horizon); predRng=zeros(3,horizon);
y_bma=zeros(length(returns),horizon);
res=zeros(length(eqp)-1,numOfModel-1); toto = ones(length(eqp),1);
r=zeros(1,horizon); allMag=[]; muci=[]; VARyhat_data=[];

for k=1:horizon
    for i=1:numOfModel-1

        %pick a model: row i of comb repeated for every observation
        %(the original listing spells out all 18 factor columns)
        L=length(regressor(:,1));
        out=repmat(comb(i,:),L,1);

        output=out.*regressor;
        modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));

        %predictions
        [x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
            ,modRegr(1:end-k,:),modRegr(end,:));

        if (est_tmp>0)&&(est_tmp<(mean(eqp(k+1:end))+1.28*sqrt(var(eqp(k+1:end)))))
            tmp(i)=est_tmp;
            %calculate likelihood
            n=length(x(:,1));p=length(x(1,:));g=p^(1/(1+p))/n;
            P=x*inv(x'*x)*x';
            likelihood(i)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
                *(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
        else
            likelihood(i)=0;
            tmp(i)=0;
            r(k)=r(k)+1;
        end
        setsubColumn(k+1,size(res,1),i,resVec,res);

    end

    %bma
    p_model=likelihood./sum(likelihood);
    magnitude=p_model'*tmp;
    prob_model(:,k)=p_model;
    predRng(:,k)=[min(tmp); max(tmp); mean(tmp)];
    allMag=[allMag magnitude];
    y_bma(k+1:end,k)=y;

    %Compute variance and confidence interval
    %Instead of storing all models, create them again
    VARyhat_data=zeros(length(y_bma(k+1:end,k)));
    for i=1:numOfModel-1

        %pick a model
        L=length(regressor(:,1));
        out=repmat(comb(i,:),L,1);

        output=out.*regressor;
        modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
        modRegr = [modRegr(1:end-k,:) ones(length(modRegr(1:end-k,:)),1)]; %intercept added

        VARyhat_data = VARyhat_data + (diag(res(k:end,i))*modRegr*inv(modRegr'...
            *modRegr)*modRegr'+y_bma(k+1:end,k)*y_bma(k+1:end,k)')...
            *prob_model(i)-(y_bma(k+1:end,k)*prob_model(i))...
            *(y_bma(k+1:end,k)*prob_model(i))';

    end

    STD_step(k) = sqrt(sum(diag(VARyhat_data))/(length(diag(VARyhat_data))));

    z=norminv([0.05 0.95],0,1);
    muci=[muci; allMag(k)+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
        allMag(k) allMag(k)+z(2)*STD_step(k)/sqrt(length(res(:,1)))];

end

B.3 Merge time series

Developed by Jörgen Blomvall, Linköping Institute of Technology

function [mergedDates, values] = mergeExcelData(sheetNames, data)
% Keep only the dates that are present, with a non-NaN value, in every sheet.
mergedDates = datenum('30-Dec-1899') + data{1}(:,1);
mergedDates(find(isnan(data{1}(:,2)))) = [];
n = length(mergedDates); % initialization added; not visible in the extracted listing

for i = 2:length(sheetNames)
    nMerged = length(mergedDates);
    dates = datenum('30-Dec-1899') + data{i}(:,1);
    newDates = zeros(size(mergedDates));
    k = 1; n = 0; % initializations added; not visible in the extracted listing

    for j = 1:nMerged
        while (dates(k) < mergedDates(j) && k < length(dates))
            k = k+1;
        end
        if (dates(k) == mergedDates(j) && ~isnan(data{i}(k,2)))
            n = n+1;
            newDates(n) = mergedDates(j);
        end
    end
    mergedDates = newDates(1:n);

end

values = zeros(n, length(sheetNames));

for i = 1:length(sheetNames)
    dates = datenum('30-Dec-1899') + data{i}(:,1);
    k = 1;
    for j = 1:n
        while (dates(k) < mergedDates(j) && k < length(dates))
            k = k+1;
        end
        if (dates(k) == mergedDates(j))
            values(j,i) = data{i}(k,2);
        else
            error = 1 % merged date missing from sheet i (printed, not raised)
        end
    end
end

B.4 Load data into Matlab from Excel

Developed by Jörgen Blomvall, Linköping Institute of Technology

function [dates, values] = loadThesisData(interpolate)

%[status, sheetNames] = xlsfinfo('test_merge.xls'); % Does not work for all
% Matlab versions

sheetNames = {'DJtech' 'WoMat' 'ConsDisc' 'EnergySec' 'ConStap' 'Health'...
    'Util' 'sp1500' 'sp500' 'spEarnYld' 'spMktCap' 'spPERat' 'spDaiNetDiv'...
    'spIndxPxBook' 'spIndxAdjPe' 'spEqDvdYi12m' 'spGenPERat' 'spPrice'...
    'spMovAvg200' 'spVol90d' 'MoodCAA' 'MoodBAA' 'tresBill3m' 'USgenTBill1M'...
    'GovtYield10Y' 'CPI' 'PCECYOY'};

%read each sheet into one cell of data
for i = 1:length(sheetNames)
    data{i} = xlsread('runEqPred.xls', char(sheetNames(i)));
end

if interpolate
    [dates, values] = mergeInterpolExcelData(sheetNames, data);
else
    [dates, values] = mergeExcelData(sheetNames, data);
end

B.5 Permutations

function out = combinations(k);

total_num = 2^k;
indicator = zeros(total_num,k);
for i = 1:k;
    temp_ones = ones( total_num/( 2^i),2^(i-1) );
    temp_zeros = zeros( total_num/(2^i),2^(i-1) );
    x_temp = [temp_ones; temp_zeros];
    indicator(:,i) = reshape(x_temp,total_num,1);
end;

out = indicator;
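To see what the indicator matrix looks like, the same loop can be run inline for a small number of factors. The sketch below repeats the logic of the listing for k = 3 (an illustrative choice); each row marks which regressors enter one model, and the last row is the empty model.

```matlab
% Inline rerun of the loop from the listing for k = 3 factors.
k = 3;                                   % illustrative number of factors
total_num = 2^k;
indicator = zeros(total_num, k);
for i = 1:k
    temp_ones  = ones(total_num/(2^i), 2^(i-1));
    temp_zeros = zeros(total_num/(2^i), 2^(i-1));
    x_temp = [temp_ones; temp_zeros];
    % column-major reshape yields blocks of ones and zeros of length 2^(k-i)
    indicator(:, i) = reshape(x_temp, total_num, 1);
end
disp(indicator)                          % rows: 111, 110, 101, 100, 011, 010, 001, 000

% The final row selects no regressor at all, which is consistent with the
% multivariate listing looping over 2^numFactor - 1 models only.
nonEmptyModels = indicator(1:end-1, :);
```

Every 0/1 pattern appears exactly once, so the rows enumerate all subsets of the regressors rather than permutations in the strict sense.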

B.6 Removal of outliers and linear prediction

function [x, y, est, beta ,resVec, outliers]=predictClean(y,x,lastVal)

%remove outliers
xTmp=[]; outliers=0;
for i=1:length(x(1,:))
    xVec=x(:,i);
    for k=1:3 %nr of iterations for finding outliers
        H_hat=xVec*inv(xVec'*xVec)*xVec';
        Y=H_hat*y;
        index=find(abs(Y-mean(Y))>3*rlstd(Y));
        outliers=outliers+length(index);
        for j=1:length(index)
            if index(j)~= length(y)
                xVec(index(j))= 0.5*xVec(index(j)+1)+0.5*xVec(index(j)-1);
            else
                xVec(index(j))=0.5*xVec(index(j)-1)+0.5*xVec(index(j));
            end
        end
    end
    xTmp = [xTmp xVec];
end
x=xTmp;

%OLS
x=[ones(length(x),1) x]; %adding intercept
beta=x\y; % OLS
est=[1 lastVal]*beta; %predicted value
resVec=(y-x*beta).^2; %residual vector
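The OLS step at the end of the listing can be tried in isolation on data where the answer is known. The sketch below uses made-up numbers that lie exactly on the line y = 2 + 3x, so the fitted coefficients and the one-step prediction can be checked by hand; all values are illustrative assumptions.

```matlab
% OLS with an intercept via the backslash operator, as in the listing.
xRaw = [1; 2; 3; 4];                 % illustrative regressor values
y = 2 + 3 * xRaw;                    % data lying exactly on a line
lastVal = 5;                         % regressor value used for the forecast

x = [ones(length(xRaw), 1) xRaw];    % add the intercept column
beta = x \ y;                        % least-squares coefficients [intercept; slope]
est = [1 lastVal] * beta;            % predicted value at lastVal
resVec = (y - x * beta).^2;          % squared residuals, as stored by the listing
fprintf('beta = [%g %g], est = %g\n', beta(1), beta(2), est);
```

Note that `resVec` holds squared residuals, which is what the prediction listings later place on the diagonal in their variance computation.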

B.7 setSubColumn

#include "mex.h"
#include <string.h> /* memcpy */

void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
    int j;
    double *output;
    double *src;
    double *dest;
    double *iStart, *iEnd, *col;

    iStart = mxGetPr(prhs[0]);
    iEnd = mxGetPr(prhs[1]);
    col = mxGetPr(prhs[2]);

    src = mxGetPr(prhs[3]);
    dest = mxGetPr(prhs[4]);

    //mexPrintf("%d\n", (int)col[0]*mxGetM(prhs[4])+(int)iStart[0]-1);

    /* Populate the output: copy src into rows iStart..iEnd of column col of dest, in place */
    memcpy(&(dest[((int)col[0]-1)*mxGetM(prhs[4])+(int)iStart[0]-1]),
           src, (int)(iEnd[0]-iStart[0]+1)*sizeof(double));
}

B.8 Portfolio resampling

% Load Data & Set Parameters
[dates,values]=loadThesisData_Resampling4(false);

volDesired = 0.02; nrAssets=6; T=17; I=200; nrPortfolios=30;
errMean=true; errCov=true;

normPort=false; resampPort=true; stocksNr = [1 2 3 4 5 6];

EQP=[0.1417 0.1148 0.1062 0.4478 0.1024 0.1372 0.0979 0.0635...
    0.0897 0.1084];
HEP=[0.0616 0.0760 0.0708 0.0326 0.0231 0.0253...
    0.0398 0.0578 0.0674 0.0450];

for l = 1:10

    %1. Estimate Historical Mean & Cov
    if normPort
        histMean=mean(returns(1:end-(10-l),stocksNr));
        histMean(2)=EQP(l);
        %histMean(2)=HEP(l);
        histCov=cov(returns(1:end-(10-l),stocksNr));
    elseif resampPort
        histMean=mean(returns(1:end-(10-l),stocksNr));
        histMean(2)=EQP(l);
        %histMean(2)=HEP(l);
        histCov=cov(returns(1:end-(10-l),stocksNr));
    end

    %2. Sample the Distribution
    if resampPort
        wStarAll=zeros(nrAssets, nrPortfolios);
        for j=1:I
            r = mvnrnd(histMean,histCov,T);
            sampMean = mean(r);
            sampCov = cov(r);

            %3. Calculate efficient sampled Frontier
            if (errMean) && ~(errCov)
                sampMean=sampMean;
                sampCov=histCov;
            elseif errCov && ~(errMean)
                sampMean=histMean;
                sampCov=sampCov;
            elseif errCov && errMean
                sampMean=sampMean;
                sampCov=sampCov;
            else
                sampMean=histMean;
                sampCov=histCov;
            end

            minMean = abs(min(sampMean));
            maxMean = max(sampMean);
            z=1;
            for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
                [wStar(:,z), tmp] = solveQuad(sampMean, sampCov, nrAssets, k);
                z=z+1;
            end

            %4. Repeat step 2-3
            allReturn(:,j)=wStar'*histMean';
            for q=1:nrPortfolios
                allVol(q,j)=wStar(:,q)'*histCov*wStar(:,q);
            end
            wStarAll=wStarAll + wStar;
        end

        %5. Calculate Average Weights
        wStarAll=wStarAll./I;
        returnResamp=wStarAll'*histMean';
        for i=1:nrPortfolios
            volResamp(i)=wStarAll(:,i)'*histCov*wStarAll(:,i);
        end
    end

    %6. Original Frontier
    minMean = abs(min(histMean));
    maxMean = max(histMean);
    z=1;
    for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
        [wStarHist(:,z), tmp] = solveQuad(histMean, histCov, nrAssets, k);
        z=z+1;
    end
    returnHist=wStarHist'*histMean';
    for i=1:nrPortfolios
        volHist(i)=wStarHist(:,i)'*histCov*wStarHist(:,i);
    end

    prices((11-l),:)=values(end-(l-1),stocksNr);
    if resampPort
        [mvp_val mvp_nr] = min(volHist);
        [tmpMin, portNr] = min(abs(volResamp(mvp_nr:end)-volDesired));
        weights(l,:)=wStarAll(:, portNr+mvp_nr-1)';
    else
        [mvp_val mvp_nr] = min(volHist);
        [tmpMin, portNr] = min(abs(volHist(mvp_nr:end)-volDesired));
        weights(l,:)=wStarHist(:, portNr+mvp_nr-1)';
    end

end
[V, wealth]=buySell2(weights,prices)

B.9 Quadratic optimization

function [w, fval] = solveQuad(histMean, histCov, nrAssets, muBar)

clc;

H=histCov*2; f=zeros(nrAssets,1);
A=[]; b=[];
Aeq=[histMean; ones(1,nrAssets)]; beq=[muBar; 1];
lb=zeros(nrAssets,1); ub=ones(nrAssets,1);

options = optimset('LargeScale','off');

[w, fval] = quadprog(H, f, A, b, Aeq, beq, lb, ub, [], options);
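Written as an optimization problem, the call above appears to solve the following program. This is read off the arguments passed to `quadprog`; the symbols $\Sigma$, $\mu$ and $\bar{\mu}$ are shorthand introduced here for `histCov`, `histMean` and `muBar`.

```latex
% quadprog minimizes (1/2) w' H w + f' w, so H = 2*Sigma and f = 0 give w' Sigma w.
\[
\begin{aligned}
  \min_{w \in \mathbb{R}^{N}} \quad & w^\top \Sigma\, w \\
  \text{s.t.} \quad & \mu^\top w = \bar{\mu}, \\
                    & \mathbf{1}^\top w = 1, \\
                    & 0 \le w_i \le 1, \qquad i = 1, \ldots, N .
\end{aligned}
\]
```

Here $N$ is `nrAssets`. The two equality constraints are the rows of `Aeq`, the bounds rule out short positions and leverage, and `fval` is the variance $w^\top \Sigma w$ of the optimal portfolio for the target return $\bar{\mu}$.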

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Upphovsrätt (Copyright)

This document is kept available on the Internet - or its future replacement - for 25 years from the date of publication, provided that no extraordinary circumstances arise. Access to the document implies permission for anyone to read, download and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the author. Technical and administrative solutions are in place to guarantee authenticity, security and accessibility. The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

© May 12, 2008. Johan Bjurgert & Marcus Edstrand
