
    PhD Program in Business Administration and Quantitative Methods

    FINANCIAL ECONOMETRICS

    2007-2008

    ESTHER RUIZ

    CHAPTER 4. STOCHASTIC VOLATILITY MODELS

    4.1 Properties of ARSV(1) model

    ARCH-type models assume that the volatility can be observed one-step-ahead.

    However, a more realistic model for volatility can be based on modelling it having a

    predictable component that depends on past information and an unexpected noise. In

    this case, the volatility is a latent unobserved variable. One interpretation of the latent

    volatility is that it represents the arrival of new information into the market; see, for

    example, Clark (1973). In the simplest case, the log-volatility follows an AR(1) process.

    Then, we have the ARSV(1) model given by

        y_t = σ*_t ε_t
        log(σ*_t²) = μ + φ log(σ*_{t-1}²) + η_t

    where ε_t is a strict white noise with variance 1. The noise of the volatility equation, η_t, is assumed to be a Gaussian white noise with variance σ_η², independent of the noise of the level, ε_t. The Gaussianity of η_t, which may seem rather ad hoc, means that the log-volatility process has a Normal distribution. However, there are several empirical studies that support this assumption both for exchange rates and stock returns; see Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001) and Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001, 2003).

    The parameter μ is related to the marginal variance of returns. A more convenient re-parameterization of the ARSV(1) model is

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + η_t


    where σ* = [exp(μ/(1-φ))]^{1/2} is a scale parameter that removes the necessity of including a constant term in the equation of the log-volatility.

    The persistence is measured by the parameter φ. Finally, σ_η² measures the uncertainty of the volatility. If σ_η² = 0, then the process is homoscedastic. If we assume that the variance of log(σ_t²) is fixed regardless of the persistence parameter, φ, then the ARSV(1) can be re-parameterized once more as follows:

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + (1-φ²)^{1/2} η_t

    Note that, as φ → 1, the process approaches homoscedasticity.

    Stochastic Volatility models have several attractive features. First of all, they are closer than

    GARCH models to the models often postulated in financial theories. Furthermore, their

    properties are usually easy to derive as they follow directly from the properties of the

    log-Normal distribution. However, the presence of two noises makes their estimation

    hard.
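The ARSV(1) model defined above is straightforward to simulate. The sketch below (with illustrative parameter values, not taken from the chapter) generates returns from the re-parameterized model and verifies that they are leptokurtic:

```python
import numpy as np

# Simulation of the ARSV(1) model in its re-parameterized form
#   y_t = sigma_* sigma_t eps_t,  log(sigma_t^2) = phi log(sigma_{t-1}^2) + eta_t.
# Parameter values are illustrative, not taken from the chapter.
rng = np.random.default_rng(0)

def simulate_arsv1(T, phi=0.98, sigma2_eta=0.05, sigma_star=1.0, rng=rng):
    log_s2 = np.zeros(T)
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), T)
    for t in range(1, T):
        log_s2[t] = phi * log_s2[t - 1] + eta[t]
    eps = rng.normal(0.0, 1.0, T)          # strict white noise, variance 1
    y = sigma_star * np.exp(log_s2 / 2) * eps
    return y, log_s2

y, h = simulate_arsv1(50_000)
kurt = np.mean(y**4) / np.mean(y**2) ** 2
print(kurt)   # the normal-mixture returns are fat-tailed: kurtosis > 3
```

The latent log-volatility makes the returns a continuous mixture of normals, which is the source of the excess kurtosis discussed below.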

    The main statistical properties of SV models have been reviewed by Ghysels, Harvey and Renault (1996). In particular, the series y_t is stationary if the log-volatility process is stationary, i.e. if |φ| < 1. In that case, the autocorrelation function of the powers of absolute returns, |y_t|^c, is given by


        ρ_c(k) = [exp(c²σ_h²ρ_h(k)/4) - 1] / [κ_c exp(c²σ_h²/4) - 1],  for k ≥ 1,

    where σ_h² and ρ_h(k) are the variance and the ACF of the underlying log-volatility, h_t, and κ_c is a constant defined as:

        κ_c = E(|ε_t|^{2c}) / [E(|ε_t|^c)]² = Γ(c + 1/2) Γ(1/2) / [Γ((c+1)/2)]²

    where Γ(·) is the gamma function. For the cases of main interest, c = 1 and c = 2, this constant takes the values κ_1 = π/2 and κ_2 = 3, respectively. If, for example, c = 2, ε_t is Gaussian and log(σ_t²) is an AR(1) process, then

        ρ_2(k) = [exp(σ_h²ρ_h(k)) - 1] / [3 exp(σ_h²) - 1]

    This acf was derived by Taylor (1986), who showed that if σ_η² is small and/or φ is close to one, it can be approximated by

        ρ_2(k) ≈ [(exp(σ_h²) - 1) / (3 exp(σ_h²) - 1)] φ^k

    However, this approximation is not always appropriate. The approximate autocorrelations are always larger than the true ones and they decay more slowly. Therefore, we may have a distorted picture of the underlying dynamics of squared returns.
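The exact ACF of squares and Taylor's approximation can be compared numerically; the sketch below uses illustrative parameter values and confirms that the approximation never falls below the exact autocorrelations:

```python
import numpy as np

# Exact vs. approximate ACF of squared returns for an ARSV(1) with Gaussian eps_t
# and rho_h(k) = phi^k:
#   exact:          rho2(k) = (exp(sigma_h^2 phi^k) - 1) / (3 exp(sigma_h^2) - 1)
#   approximation:  rho2(k) ≈ (exp(sigma_h^2) - 1) / (3 exp(sigma_h^2) - 1) * phi^k
# Illustrative parameter values.
phi, sigma2_h = 0.98, 0.5
k = np.arange(1, 101)

rho_exact = (np.exp(sigma2_h * phi**k) - 1) / (3 * np.exp(sigma2_h) - 1)
rho_approx = (np.exp(sigma2_h) - 1) / (3 * np.exp(sigma2_h) - 1) * phi**k

# the approximation always lies above the true autocorrelations
print(np.all(rho_approx >= rho_exact))
```

The inequality follows from the convexity of exp(σ_h²x) in x, since the approximation replaces the exact numerator by its chord between x = 0 and x = 1.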


    The parameter σ_η² governs the degree of kurtosis independently of the persistence of volatility measured by φ. Introducing the noise η_t makes the ARSV(1) model more flexible in the sense that it is able to generate higher kurtosis than the GARCH(1,1) model without increasing ρ_2(1) and without forcing the volatility to be close to the non-stationarity region.


    The conditional distribution of y_t is not Normal even if we assume that ε_t is Gaussian. The volatility is unobservable. However, it is possible to estimate it by running the Kalman filter. For this, consider the following linear transformation of returns:

        log(y_t²) = μ + h_t + ξ_t
        h_t = φ h_{t-1} + η_t

    where μ = log(σ*²) + E[log(ε_t²)] and ξ_t = log(ε_t²) - E[log(ε_t²)] is a non-Gaussian, zero mean, white noise process with variance σ_ξ², independent of h_t. If, for example, ε_t is Gaussian, then E[log(ε_t²)] = -1.27 and Var[log(ε_t²)] = π²/2. Because log(ε_t²) is not truly Gaussian, the Kalman filter yields minimum mean square linear estimators (MMSLE) of h_t and future observations rather than minimum mean square estimators (MMSE).

    The variance of log(y_t²) is γ(0) = σ_h² + σ_ξ², its autocovariance function coincides with the autocovariance function of h_t, and its ACF is given by:

        ρ(k) = ρ_h(k) / (1 + σ_ξ²/σ_h²),  for k ≥ 1.

    The ACF of log-squared returns is therefore proportional to the ACF of h_t, with the factor of proportionality being smaller than one.


    4.2 Comparison with GARCH(1,1) model

    The relationship between kurtosis, persistence of shocks to volatility and first-

    order autocorrelation of squares is different in GARCH and ARSV models. This

    difference can explain why, when both models are fitted to the same series:

    i) The persistence estimated is usually larger in GARCH models than in ARSV models;

    Taylor (1994), Shephard (1996), Kim, Shephard and Chib (1998), Hafner and Herwartz

    (2000) and Anderson (2001).

    ii) The Gaussianity assumption for the errors seems more adequate in ARSV models than

    in GARCH models; Shephard (1996), Ghysels, Harvey and Renault (1996), Kim,

    Shephard and Chib (1998) and Hafner and Herwartz (2000).


    The relationship between kurtosis, persistence and ρ_2(1) for an ARSV(1) model is given by

        ρ_2(1) = [(κ_y/3)^φ - 1] / (κ_y - 1)

    where κ_y = 3 exp(σ_h²) is the kurtosis of y_t.

    When the Normal-GARCH model is fitted to represent the evolution of

    volatility, the kurtosis and first-order autocorrelation of squared returns implied by the

    estimated parameters could be much larger than the corresponding population

    coefficients of the simulated data. The persistence estimated by the GARCH model is

    also usually larger than the persistence of the underlying true autocorrelations of

    squares.
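The kurtosis–persistence–autocorrelation relationship discussed above is easy to evaluate numerically. A small sketch with illustrative values, using κ_y = 3 exp(σ_h²) for the Gaussian ARSV(1):

```python
import numpy as np

# Kurtosis and first-order autocorrelation of squares for a Gaussian ARSV(1):
#   kappa_y = 3 exp(sigma_h^2),  rho_2(1) = ((kappa_y/3)^phi - 1) / (kappa_y - 1).
# For fixed kurtosis, rho_2(1) stays moderate even for phi close to one
# (illustrative parameter values).
def arsv_kurtosis(sigma2_h):
    return 3.0 * np.exp(sigma2_h)

def arsv_rho2_1(phi, sigma2_h):
    kappa = arsv_kurtosis(sigma2_h)
    return ((kappa / 3.0) ** phi - 1.0) / (kappa - 1.0)

kappa = arsv_kurtosis(0.5)
rho1 = arsv_rho2_1(0.98, 0.5)
print(kappa, rho1)
```

This illustrates how the ARSV(1) can combine high kurtosis with a small first-order autocorrelation of squares, in contrast with the Normal-GARCH restriction.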

    4.3 Estimation

    Given that the conditional distribution of yt is not Normal, the ML estimator

    cannot be obtained by traditional methods. This is the reason why there is a long list of alternative methods to estimate SV models. Next, we describe some of the most promising from an empirical point of view.

    4.3.1 Method of Moments

    These methods have the difficulty that their efficiency depends on the choice of

    moments. Furthermore, a particular distribution needs to be assumed for ε_t. Finally, their efficiency is reduced as the process approaches non-stationarity, as is often the case in empirical applications to financial returns.

    Consider, for example, the estimator based on the sample variance. If the log-volatility is a random walk, the stationary form of log(y_t²) is

        Δ log(y_t²) = log(y_t²) - log(y_{t-1}²).

    Therefore, if we denote y*_t = Δ log(y_t²), then, assuming that ε_t is Normal, σ²_{y*} = σ_η² + 2(π²/2) = σ_η² + π². Consequently, σ_η² = σ²_{y*} - π². A method of moments (MM) estimator of σ_η² is then given by σ̃_η² = s²_{y*} - π². If σ_η² > 0, then √T(σ̃_η² - σ_η²) has an asymptotic normal distribution with zero mean and variance 2[(σ_η² + π²)² + π⁴].
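A quick simulation check of this moment estimator (illustrative parameter values; note that the estimator is quite noisy even in large samples):

```python
import numpy as np

# Method-of-moments estimator for the random-walk SV model:
#   y*_t = dlog(y_t^2) = eta_t + xi_t - xi_{t-1},  Var(y*_t) = sigma_eta^2 + pi^2,
# so sigma_eta^2 is estimated as the sample variance of y*_t minus pi^2.
rng = np.random.default_rng(1)
T, sigma2_eta = 200_000, 0.05

h = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), T))   # random-walk log-volatility
y = np.exp(h / 2) * rng.normal(0.0, 1.0, T)              # Gaussian eps_t

y_star = np.diff(np.log(y**2))                           # stationary transformation
sigma2_eta_mm = np.var(y_star) - np.pi**2
print(sigma2_eta_mm)   # noisy estimate of 0.05: the MM estimator has large variance
```

The large sampling variance of σ̃_η² relative to the true σ_η² illustrates the inefficiency of moment-based estimation mentioned above.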


    Melino and Turnbull (1990) proposed to estimate the stationary SV models by

    GMM using the following moments:

        |y_t| - E(|y_t|)
        y_t² - E(y_t²)
        |y_t|³ - E(|y_t|³)
        y_t⁴ - E(y_t⁴)
        |y_t y_{t-h}| - E(|y_t y_{t-h}|)
        y_t² y_{t-h}² - E(y_t² y_{t-h}²)

    GMM is relatively inefficient due to the largely arbitrary choice of unconditional

    moments that can be computed in closed form, while the likelihood-based procedures

    achieve the Cramer-Rao efficiency bound. The Efficient Method of Moments (EMM)

    seeks efficiency improvements, while maintaining the general flexibility of GMM, by

    letting the data guide the choice of an auxiliary quasi-likelihood which serves to

    generate an efficient set of moments; see Andersen, Chung and Sorensen (1999).

    4.3.2 Quasi Maximum Likelihood

    The QML estimator was independently proposed by Nelson (1988) and Harvey

    et al. (1994) and is based on the Kalman filter. This is applied to log(y_t²), where the

    observations are standardized by the sample standard deviation to obtain one-step ahead

    errors and their variances. These are then used to construct the Gaussian likelihood which

    is numerically maximized. Ruiz (1994) shows that the QML estimator is consistent and

    asymptotically Normal. However, the QML procedure is inefficient, as the method does

    not rely on the exact likelihood of log(y_t²).
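A minimal sketch of the QML idea, using simulated data and illustrative parameter values (this is not the exact routine of Harvey et al. (1994); it simply applies the Kalman filter to the demeaned log y_t² and maximizes the resulting Gaussian quasi-likelihood numerically, concentrating the constant out by demeaning):

```python
import numpy as np
from scipy.optimize import minimize

# QML estimation of the ARSV(1) via the Kalman filter applied to
#   x_t = log(y_t^2) - mean(log(y_t^2)) = h_t + xi_t,  h_t = phi h_{t-1} + eta_t.
def kalman_loglik(params, x):
    phi, s2_eta, s2_xi = params
    a, p = 0.0, s2_eta / (1.0 - phi**2)     # stationary initialization of h_t
    ll = 0.0
    for xt in x:
        v = xt - a                          # one-step-ahead prediction error
        f = p + s2_xi                       # ...and its variance
        ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + v**2 / f)
        k = p / f                           # Kalman gain
        a = phi * (a + k * v)               # predicted state for next period
        p = phi**2 * p * (1 - k) + s2_eta
    return ll

def qml_fit(y):
    x = np.log(y**2)
    x = x - x.mean()                        # concentrates the constant out
    obj = lambda th: -kalman_loglik(th, x)
    res = minimize(obj, x0=[0.9, 0.1, np.pi**2 / 2],
                   bounds=[(0.01, 0.999), (1e-4, 5.0), (1e-4, 20.0)],
                   method="L-BFGS-B")
    return res.x

# Usage on simulated data (illustrative parameter values)
rng = np.random.default_rng(2)
T, phi0, s2e0 = 5_000, 0.95, 0.05
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi0 * h[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h / 2) * rng.normal(0, 1, T)
phi_hat, s2_eta_hat, s2_xi_hat = qml_fit(y)
print(phi_hat, s2_eta_hat, s2_xi_hat)
```

Since ξ_t is not Gaussian, the maximized likelihood is only a quasi-likelihood, consistent with the discussion above.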

    The estimator of the constant of the model is uncorrelated with the estimators of

    the other parameters of the model; see Ruiz (1994).

    The standard theory for the estimation of unobserved component time series

    models with non-normal errors applies to the estimates of φ and σ_η². When h_t follows a random walk, the Kalman filter approach is still valid if the restriction φ = 1 is imposed. The only difference is that the first observation is used to initialize the Kalman filter, whereas when |φ| < 1 the filter is initialized with the stationary distribution of h_t.


    Finally, note that the model can be estimated by assuming that ε_t has a particular distribution, for example, the Normal. In this case, the parameter σ_ξ² does not need to be estimated as it is determined by this distribution. However, Ruiz (1994) shows that, even if the distribution of ε_t is known, estimating σ_ξ² reduces the finite sample variances of the estimates.

    Furthermore, by estimating σ_ξ², it is possible to test whether ε_t is Normal by testing H₀: σ_ξ² = π²/2. One possible test statistic could be a quasi-Likelihood Ratio (LR) test. Since the null hypothesis is on the boundary of the admissible parameter space, the distribution of the LR test is given by (1/2)χ²₀ + (1/2)χ²₁, where χ²₀ is a degenerate distribution with all its mass at the origin. The size of the LR test can therefore be set appropriately simply by using the 2α, rather than the α, significance point of a χ²₁ distribution for a test of size α. For example, for α = 5%, the corresponding critical value for the quasi-LR test statistic is 2.71.
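The critical value can be checked directly: a test of size α based on the mixture (1/2)χ²₀ + (1/2)χ²₁ uses the 2α upper quantile of χ²₁.

```python
from scipy.stats import chi2

# Critical value for the boundary quasi-LR test of H0: sigma_xi^2 = pi^2/2.
# Under H0 the LR statistic is distributed as (1/2)chi2_0 + (1/2)chi2_1, so a
# size-alpha test uses the 2*alpha upper quantile of a chi2 with 1 df.
alpha = 0.05
cv = chi2.ppf(1 - 2 * alpha, df=1)
print(round(cv, 2))   # 2.71
```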

    4.3.3 Monte Carlo Likelihood

    Sandmann and Koopman (1998) have proposed the Monte Carlo Likelihood (MCL) procedure, which estimates the likelihood function of log(y_t²) by a Gaussian part constructed via the Kalman filter plus a correction for departures from the Gaussian assumption, as follows:

        log L(Y*_T | θ) = log L_G(Y*_T | θ) + log E_G[ f(ξ | Y*_T, θ) / f_G(ξ | Y*_T, θ) ]

    where Y*_T = {log(y_1²), ..., log(y_T²)}, L_G(Y*_T | θ) is the Gaussian likelihood function, f(·) is the true density of ξ_t = log(ε_t²), f_G(·) is the importance density corresponding to the approximating Gaussian model and E_G refers to the expectation with respect to this density.

    The MCL procedure also generates simultaneously estimates of the latent volatilities.


    4.3.4 MCMC

    ML estimation of the parameters of SV models has progressed greatly thanks to the development of numerical methods based on importance sampling and Markov Chain Monte Carlo (MCMC) procedures. In order to derive the likelihood, the vector of unobserved volatilities has to be integrated out of the joint probability distribution. If we denote Y_T = {y_1, ..., y_T}, σ = {σ_1, ..., σ_T} and θ = {φ, σ_η², σ*}, the likelihood is given by

        L(Y_T | θ) = ∫ f(Y_T | σ, θ) f(σ | θ) dσ.

    The dimension of this integral is T and its evaluation requires numerical methods.

    MCMC estimators of the parameters of SV models were proposed by Jacquier et al. (1994). The Bayesian approach for estimating the parameters, θ, is to augment this vector with the latent log-volatilities. After M Monte Carlo replicates of the parameters have been obtained, θ^(i), i = 1, ..., M, it is possible to obtain density estimates. On the other hand, a natural choice to obtain smooth estimates of the log-volatilities is the marginal posterior expectation,

    which can be estimated by the sample mean. Finally, it is also possible to obtain interval

    predictions of future volatilities conditional on the information available at time T that take

    into account the inherent model variability and the parameter uncertainty.

    Kim et al. (1998) have also proposed an MCMC algorithm that samples the

    unobserved volatilities simultaneously by means of an approximating offset mixture of

    normal models, together with an importance reweighting procedure to correct the

    linearization error. The KSC procedure provides efficient inferences, likelihood evaluation,

    filtered volatility estimates, diagnostics for model failure and computation of statistics for

    comparing non-nested volatility models.
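As an illustration of the augmentation idea, the sketch below samples the latent log-volatilities one at a time by Metropolis steps, proposing from the conditional prior. It is a simplification in the spirit of Jacquier, Polson and Rossi (1994), not their exact algorithm; φ and σ_η² are treated as known and only interior points are updated, to keep the example short.

```python
import numpy as np

# Single-site Metropolis sampler for h_t | h_{t-1}, h_{t+1}, y_t in the ARSV(1).
# Proposal = conditional prior N(mu_t, v); acceptance ratio = likelihood ratio.
rng = np.random.default_rng(3)

def sample_h(y, phi, s2_eta, n_sweeps=200):
    T = len(y)
    h = np.log(y**2 + 1e-8)                    # crude initialization
    v = s2_eta / (1.0 + phi**2)                # conditional prior variance of h_t
    for _ in range(n_sweeps):
        for t in range(1, T - 1):
            mu = phi * (h[t - 1] + h[t + 1]) / (1.0 + phi**2)
            prop = rng.normal(mu, np.sqrt(v))  # propose from the conditional prior
            # log likelihood ratio of y_t | h_t at the proposal vs current value
            logr = (-0.5 * prop - 0.5 * y[t]**2 * np.exp(-prop)) \
                 - (-0.5 * h[t] - 0.5 * y[t]**2 * np.exp(-h[t]))
            if np.log(rng.uniform()) < logr:
                h[t] = prop
    return h

# Usage on a short simulated series (illustrative parameter values)
T, phi0, s2e0 = 300, 0.95, 0.05
h_true = np.zeros(T)
for t in range(1, T):
    h_true[t] = phi0 * h_true[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h_true / 2) * rng.normal(0, 1, T)
h_draw = sample_h(y, phi0, s2e0)
print(np.corrcoef(h_true[1:-1], h_draw[1:-1])[0, 1])   # draws track the true h_t
```

Averaging many such draws estimates the marginal posterior expectation of each h_t, i.e. the smoothed volatility mentioned above.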

    Example: Standard & Poor's 500

    Parameter   GMM               JPR               QML               MCL               KSC
    φ           0.9602 (0.0479)   0.9596 (0.0203)   0.9401 (0.0699)   0.9288 (0.0249)   0.9392 (0.0237)
    σ_η²        0.0541 (0.0219)   0.0172 (0.0196)   0.0128 (0.0222)   0.0499 (0.0190)   0.0405 (0.0168)
    σ*          1.0005 (0.0157)   0.9673 (0.0673)   1.2051 (0.0371)   1.1248 (0.0542)   1.1260 (0.0619)

    (standard errors in parentheses)


    4.4 Extensions

    4.4.1 Leverage effect

    Harvey and Shephard (1996) proposed introducing the leverage effect in the

    ARSV model through correlation between the noises ε_t and η_{t+1}. Therefore, the volatility of the asymmetric ARSV(1), A-ARSV(1), model is given by

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + η_t

    with Corr(ε_t, η_{t+1}) = δ. Harvey and Shephard (1996) show that, in this case, the kurtosis of y_t is the same as in the symmetric case. The acf of squared observations, derived by Taylor (1994), is given by

        ρ_2(k) = [(1 + δ²σ_η²φ^{2(k-1)}) exp(σ_h²φ^k) - 1] / [3 exp(σ_h²) - 1],  k ≥ 1.

    It is interesting to observe that, as in the symmetric ARSV model, the rate of decay is below φ for the smaller lags, but it converges to φ as the lag increases in the presence of correlation between the level and volatility noises, ε_t and η_t, respectively. However, notice that, in practice, the autocorrelations of large order are indistinguishable from zero.
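The asymmetric model can be simulated by drawing (ε_t, η_{t+1}) jointly Gaussian with correlation δ; with δ < 0, negative returns today are followed by higher volatility tomorrow. A sketch with illustrative parameter values:

```python
import numpy as np

# Simulation of the asymmetric ARSV(1) of Harvey and Shephard (1996), in which
# the leverage effect enters through corr(eps_t, eta_{t+1}) = delta < 0.
rng = np.random.default_rng(4)
T, phi, s2_eta, delta = 100_000, 0.95, 0.05, -0.6

# draw (eps_t, eta_{t+1}) jointly Gaussian with the required correlation
cov = [[1.0, delta * np.sqrt(s2_eta)],
       [delta * np.sqrt(s2_eta), s2_eta]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=T)
eps, eta_next = z[:, 0], z[:, 1]

h = np.zeros(T)
for t in range(1, T):
    h[t] = phi * h[t - 1] + eta_next[t - 1]    # eta_{t+1} is paired with eps_t
y = np.exp(h / 2) * eps

# negative returns today predict higher volatility tomorrow
lev = np.corrcoef(y[:-1], y[1:] ** 2)[0, 1]
print(lev)   # negative
```

The timing of the correlation (ε_t with η_{t+1} rather than η_t) keeps ε_t independent of σ_t, so y_t remains a martingale difference.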


    For a given value of the kurtosis, the autocorrelation of order one of squares is

    larger, the larger is the correlation between the noises.

    Therefore, if as expected in empirical applications, the magnitude of the

    asymmetry parameter is rather small, the relationship between persistence, kurtosis and

    autocorrelations of squares is similar to the one derived for the symmetric ARSV(1)

    model.

    Jacquier et al. (2004) have proposed an ARSV model with leverage effect where ε_t and η_t are contemporaneously correlated. However, Yu (2002) shows that this latter specification has problems and provides empirical evidence favouring the specification proposed by

    Harvey and Shephard (1996).

    With respect to estimation, see Yu (2004).

    4.4.2 Long-memory

    The long-memory property has been incorporated into SV models by Harvey (1998)

    and Breidt, Crato and deLima (1998), who propose LMSV models where the log-

    volatility follows an ARFIMA(p,d,q) process. In particular, when p=1 and q=0, the

    model for the series of returns, y_t, is given by:

        y_t = exp(h_t/2) ε_t               (1.a)
        (1 - φL)(1 - L)^d h_t = η_t        (1.b)

    where h_t = log(σ_t²).

    Finally, note that model (1) is stationary if |φ| < 1 and d < 0.5. In that case, the variance of h_t is given by

        σ_h² = σ_η² [Γ(1-2d) / Γ(1-d)²] F(1, 1+d; 1-d; φ) / (1+φ)     (3)


    where F(·,·;·;·) is the hypergeometric function. Note that when φ = 0, F(1, 1+d; 1-d; 0) = 1, and (3) becomes the variance of an ARFIMA(0,d,0) process, σ_h² = σ_η² Γ(1-2d)/[Γ(1-d)]², as given by Harvey (1998). On the other hand, when d = 0, F(1, 1; 1; φ) = (1-φ)^{-1}, and (3) becomes the variance of an AR(1) process, σ_h² = σ_η²/(1-φ²), as in Harvey, Ruiz and Shephard (1994).

    As k → ∞, these autocorrelations behave like ρ_h(k) ~ A k^{2d-1}, where A is a factor of proportionality that depends on d and φ. Therefore, the dependence between observations a long time span apart decays at a very slow hyperbolic rate. Finally, it is possible to show that when d = 0, expression (4) becomes the ACF of an AR(1) process, ρ_h(k) = φ^k, and when φ = 0, it becomes the ACF of the ARFIMA(0,d,0) process,

        ρ_h(k) = [d(1+d)···(k-1+d)] / [(1-d)(2-d)···(k-d)].
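The ARFIMA(0,d,0) ACF and its hyperbolic decay can be evaluated numerically through log-gamma functions; a small sketch with an illustrative value of d:

```python
import numpy as np
from scipy.special import gammaln

# ACF of an ARFIMA(0,d,0) process,
#   rho(k) = d(1+d)...(k-1+d) / [(1-d)(2-d)...(k-d)]
#          = Gamma(1-d) Gamma(k+d) / [Gamma(d) Gamma(k+1-d)],
# computed via log-gamma for numerical stability.
def arfima_acf(d, kmax):
    k = np.arange(1, kmax + 1)
    return np.exp(gammaln(1 - d) - gammaln(d)
                  + gammaln(k + d) - gammaln(k + 1 - d))

d = 0.3
rho = arfima_acf(d, 1000)
# the ratio rho(k) / k^(2d-1) stabilizes for large k: hyperbolic decay
ratio = rho[-1] / 1000 ** (2 * d - 1)
print(rho[0], ratio)
```

Note that ρ(1) = d/(1-d) and that the ratio converges to the proportionality factor A of the text.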

    Andersen and Bollerslev (1997) and Robinson (2001) show that the autocovariances

    of squared and absolute returns decay at the same rate as the autocovariances of ht for

    large lags. This argument is often used to justify the use of these transformations to

    identify and model the long-memory of volatility. However, the rates of decay of the autocorrelations of |y_t| or y_t² and those of h_t can be rather different at low lags: the autocorrelations of squares decay towards zero more quickly than the autocorrelations of the log-volatility, although the rates of decay of both series are the same for large lags. The same behaviour

    can be observed when comparing the rates of decay of the autocorrelations of absolute

    returns and log-volatility autocorrelations although, in this case, they are closer than

    when comparing squared returns and log-volatilities.

    Another important difference between the ACF of |y_t| or y_t² and the ACF of h_t is the magnitude of the autocorrelations themselves, which are clearly smaller for |y_t| and y_t² than for the log-volatility process. This fact shows up in Figure 1(b), which displays the ACF of h_t together with the ACF of |y_t| and y_t² for the same model as before. In this case, the ACF of squared and absolute returns is nearly five times smaller than the ACF of the log-volatilities.


    It is also remarkable that the behaviour of the ACF of short-memory and long-

    memory SV models can be rather similar in some cases. Figure 2(a) plots the ACF of y_t² for three LMSV models with parameters {φ=0.98, d=0, σ_η²=0.027}, {φ=0.93, d=0.2, σ_η²=0.027} and {φ=0.88, d=0.3, σ_η²=0.026}, respectively. These models are selected so that their coefficient of variation, defined as Var(σ_t²)/[E(σ_t²)]², is approximately one and the first order autocorrelation of y_t² is 0.19 in all of them. Notice that, in practice,

    the rates of decay of the short-memory ARSV model and the LMSV model with small d

    could be difficult to distinguish in the first lags. Indeed, the main differences only arise

    after the autocorrelation of approximately order 80. Furthermore, observe that the

    autocorrelations up to order 20 of the two long memory models displayed in Figure 2(a)

    are nearly indistinguishable. In these cases, the knowledge of the behaviour in the long-

    run will be essential. The same conclusions would be drawn if the ACF of absolute

    returns were used.

    The dynamic dependence of the series of returns also appears in the logarithms of squared returns. Both ACFs decay at the same hyperbolic rate, but the ACF of log(y_t²) takes smaller values; indeed, it takes even smaller values than the ACF of the other two transformations considered. There is therefore a difficulty in distinguishing among different LMSV models using only the information contained in the ACF. As we


    will see in the next section, this problem is aggravated by the negative bias of the sample ACF of log(y_t²) in LMSV models.

    With respect to estimation, Harvey (1998) and Breidt et al. (1998) proposed a QML estimator based on maximising the discrete Whittle approximation of the likelihood function of log(y_t²) in the frequency domain. The estimates are obtained by minimizing

        L(θ) = (1/(2T)) Σ_{j∈M} [ log f(λ_j; θ) + I_{y*}(λ_j) / f(λ_j; θ) ]

    where M = {j = 1, 2, ..., [T/2]}, f(·; θ) is the spectral density of log(y_t²), λ_j = 2πj/T are the Fourier frequencies, and I_{y*}(λ) = |W_{y*}(λ)|² and W_{y*}(λ) = (2πT)^{-1/2} Σ_{t=1}^{T} log(y_t²) exp(iλt) are the periodogram and discrete Fourier transform of log(y_t²) at frequency λ.

    The finite sample properties of the QML estimator have been considered by Pérez and Ruiz (2001).
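A sketch of the Whittle objective for the short-memory case (d = 0), in which the spectral density of the demeaned log(y_t²) is that of an AR(1) plus white noise. This specializes the estimator described above and uses illustrative simulated data; it is not the routine of the cited papers.

```python
import numpy as np
from scipy.optimize import minimize

# Whittle QML for the short-memory ARSV(1): the spectral density of the
# demeaned log(y_t^2) is
#   f(lam) = s2_eta / (2 pi (1 + phi^2 - 2 phi cos lam)) + s2_xi / (2 pi).
def whittle_fit(y):
    x = np.log(y**2)
    x = x - x.mean()
    T = len(x)
    j = np.arange(1, T // 2 + 1)
    lam = 2 * np.pi * j / T                               # Fourier frequencies
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2 * np.pi * T)   # periodogram

    def spec(lam, phi, s2_eta, s2_xi):
        return s2_eta / (2 * np.pi * (1 + phi**2 - 2 * phi * np.cos(lam))) \
               + s2_xi / (2 * np.pi)

    def obj(th):
        f = spec(lam, *th)
        return np.sum(np.log(f) + I / f)                  # Whittle objective

    res = minimize(obj, x0=[0.9, 0.1, np.pi**2 / 2],
                   bounds=[(0.01, 0.999), (1e-4, 5.0), (1e-4, 20.0)],
                   method="L-BFGS-B")
    return res.x

# Usage on simulated data (illustrative parameter values)
rng = np.random.default_rng(5)
T, phi0, s2e0 = 10_000, 0.95, 0.05
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi0 * h[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h / 2) * rng.normal(0, 1, T)
phi_hat, s2e_hat, s2xi_hat = whittle_fit(y)
print(phi_hat, s2e_hat, s2xi_hat)
```

For the long-memory case, the same objective applies with the ARFIMA(1,d,0)-plus-noise spectral density in place of the AR(1)-plus-noise one.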

    Arteche (2004) has proven the consistency and asymptotic normality of a related

    estimator, the local Whittle estimator. He shows that the added noise has a distorting effect

    on the estimates of the memory parameter of the signal. A suitable choice of the bandwidth

    is important to lessen its impact.

    Once the parameters of the model have been estimated, the underlying volatility at

    time t may be estimated by an algorithm proposed by Harvey (1998).

    Example: Daily returns of the IBEX35 observed from 7/1/1987 to 30/12/1998. We remove any correlation in the data by fitting an MA(1) model and focus the analysis on the residuals from this model.


    Estimation results (standard errors in parentheses)

    Parameter   ARSV(1)           RWSV              LMSV      ARLMSV
    φ           0.9898 (0.0057)   ---               ---       0.6632
    d           ---               ---               0.7538    0.7035
    σ_η²        0.0168 (0.0042)   0.0099 (0.0024)   0.0906    0.0155
    σ*²         0.9297 (0.0374)   0.9484 (0.0436)   1.5112    1.5484

    Sample moments of standardised observations using smoothed estimates of volatility.

    Sample moment   ARSV(1)   RWSV      LMSV      ARLMSV
    Mean            0.0101    0.0074    0.0113    0.0122
    Variance        1.0000    1.0000    0.9999    0.9999
    Skewness        -0.0284   -0.0441   -0.0183   -0.0127
    Kurtosis        3.5620    3.7070    3.3941    3.4380

    Aut. of squares   ARSV(1)    RWSV       LMSV      ARLMSV
    1                 0.0709**   0.0894**   0.0419*   0.0455*
    2                 0.0676**   0.0923**   0.0378*   0.0320
    3                 0.0306     0.0392*    0.0217    0.0173
    4                 0.0342     0.0542**   0.0129    0.0053
    5                 0.0335     0.0464*    0.0290    0.0231
    10                0.0206     0.0258     0.0225    0.0210
    50                0.0011     0.0011     -0.0011   -0.0017
    100               0.0300     0.0320     0.0256    0.0249


    Ljung-Box test   ARSV(1)    RWSV       LMSV     ARLMSV
    Q2(10)           43.15**    80.06**    17.89    15.50
    Q2(20)           101.70**   144.61**   71.37*   68.43*
    Q2(100)          150.41**   198.87**   117.67   113.39

    ** Significant at 1%; * Significant at 5%

    There are other alternative estimation methods proposed for LMSV models:

    a) Methods based on state space models: Chan and Petris (2000)

    b) Bayesian procedures: Hsu and Breidt (1997)

    c) GMM: Wright (1999)

    d) Semiparametric estimation: Deo and Hurvich (1998)

    References

    Andersen, T.G., H.-J. Chung and B.E. Sorensen (1999), Efficient method of moments estimation of a stochastic volatility model: A Monte Carlo study, Journal of Econometrics, 91, 61-87.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001), The distribution of realized stock return volatility, Journal of Financial Economics, 61, 43-76.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001), The distribution of realized exchange rate volatility, Journal of the American Statistical Association, 96, 42-55.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2003), Modeling and forecasting realized volatility, Econometrica, 71, 579-625.

    Arteche, J. (2004), Gaussian semiparametric estimation in long memory in stochastic volatility and signal plus noise models, Journal of Econometrics, 119, 131-154.

    Broto, C. and E. Ruiz (2004), Estimation methods for stochastic volatility models: A survey, Journal of Economic Surveys, 18, 613-649.

    Carnero, M.A., D. Peña and E. Ruiz (2004), Persistence and kurtosis in GARCH and stochastic volatility models, Journal of Financial Econometrics, 2, 319-342.

    Clark, P.K. (1973), A subordinated stochastic process model with fixed variance for speculative prices, Econometrica, 41, 135-156.

    Ghysels, E., A.C. Harvey and E. Renault (1996), Stochastic volatility, in G.S. Maddala and C.R. Rao (eds.), Handbook of Statistics, 14, North-Holland, Amsterdam.


    Harvey, A.C. (1998), Long memory in stochastic volatility, in J. Knight and S. Satchell (eds.), Forecasting Volatility in Financial Markets, 307-320, Butterworth-Heinemann, Oxford.

    Harvey, A.C., E. Ruiz and N. Shephard (1994), Multivariate stochastic variance models, Review of Economic Studies, 61, 247-264.