
    PhD Program in Business Administration and Quantitative Methods

    FINANCIAL ECONOMETRICS

    2007-2008

    ESTHER RUIZ

    CHAPTER 4. STOCHASTIC VOLATILITY MODELS

    4.1 Properties of ARSV(1) model

    ARCH-type models assume that the volatility can be observed one-step-ahead.

    However, a more realistic model for volatility can be based on modelling it having a

    predictable component that depends on past information and an unexpected noise. In

    this case, the volatility is a latent unobserved variable. One interpretation of the latent

    volatility is that it represents the arrival of new information into the market; see, for

    example, Clark (1973). In the simplest case, the log-volatility follows an AR(1) process.

    Then, we have the ARSV(1) model given by

        y_t = σ*_t ε_t
        log(σ*_t²) = μ + φ log(σ*_{t-1}²) + η_t

    where ε_t is a strict white noise with variance 1. The noise of the volatility equation, η_t, is assumed to be a Gaussian white noise with variance σ_η², independent of the noise of the level, ε_t. The Gaussianity of η_t, which may seem rather ad hoc, means that the log-volatility process has a Normal distribution. However, there are several empirical studies that support this assumption both for exchange rates and stock returns; see Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001) and Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001, 2003).

    The parameter μ is related to the marginal variance of returns. A more convenient re-parameterization of the ARSV(1) model is

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + η_t


    where σ* = [exp(μ/(1-φ))]^{1/2} is a scale parameter that removes the necessity of including a constant term in the equation of the log-volatility.

    The persistence is measured by the parameter φ. Finally, σ_η² measures the uncertainty of the volatility. If σ_η² = 0, then the process is homoscedastic. If we assume that the variance of log(σ_t²) is fixed regardless of the persistence parameter, φ, then the ARSV(1) can be re-parameterized once more as follows:

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + (1-φ²)^{1/2} η_t

    Note that, as φ → 1, the process approaches homoscedasticity.

    Stochastic Volatility models have several attractive features. First of all, they are closer than

    GARCH models to the models often postulated in financial theories. Furthermore, their

    properties are usually easy to derive as they follow directly from the properties of the

    log-Normal distribution. However, the presence of two noises makes their estimation

    hard.
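The ARSV(1) model defined above is straightforward to simulate. The sketch below (with illustrative parameter values, not taken from the chapter) generates returns from the re-parameterized model and verifies that they are leptokurtic:

```python
import numpy as np

# Simulation of the ARSV(1) model in its re-parameterized form
#   y_t = sigma_* sigma_t eps_t,  log(sigma_t^2) = phi log(sigma_{t-1}^2) + eta_t.
# Parameter values are illustrative, not taken from the chapter.
rng = np.random.default_rng(0)

def simulate_arsv1(T, phi=0.98, sigma2_eta=0.05, sigma_star=1.0, rng=rng):
    log_s2 = np.zeros(T)
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), T)
    for t in range(1, T):
        log_s2[t] = phi * log_s2[t - 1] + eta[t]
    eps = rng.normal(0.0, 1.0, T)          # strict white noise, variance 1
    y = sigma_star * np.exp(log_s2 / 2) * eps
    return y, log_s2

y, h = simulate_arsv1(50_000)
kurt = np.mean(y**4) / np.mean(y**2) ** 2
print(kurt)   # the normal-mixture returns are fat-tailed: kurtosis > 3
```

The latent log-volatility makes the returns a continuous mixture of normals, which is the source of the excess kurtosis discussed below.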

    The main statistical properties of SV models have been reviewed by Ghysels, Harvey and Renault (1996). In particular, the series y_t is stationary if the log-volatility process is stationary, i.e. if |φ| < 1. In that case, the autocorrelation function of the powers of absolute returns, |y_t|^c, is given by


        ρ_c(k) = [exp(c²σ_h²ρ_h(k)/4) - 1] / [κ_c exp(c²σ_h²/4) - 1],  for k ≥ 1,

    where σ_h² and ρ_h(k) are the variance and the ACF of the underlying log-volatility, h_t, and κ_c is a constant defined as:

        κ_c = E(|ε_t|^{2c}) / [E(|ε_t|^c)]² = Γ(c + 1/2) Γ(1/2) / [Γ((c+1)/2)]²

    where Γ(·) is the gamma function. For the cases of main interest, c = 1 and c = 2, this constant takes the values κ_1 = π/2 and κ_2 = 3, respectively. If, for example, c = 2, ε_t is Gaussian and log(σ_t²) is an AR(1) process, then

        ρ_2(k) = [exp(σ_h²ρ_h(k)) - 1] / [3 exp(σ_h²) - 1]

    This acf was derived by Taylor (1986), who showed that if σ_η² is small and/or φ is close to one, it can be approximated by

        ρ_2(k) ≈ [(exp(σ_h²) - 1) / (3 exp(σ_h²) - 1)] φ^k

    However, this approximation is not always appropriate. The approximate autocorrelations are always larger than the true ones and they decay more slowly. Therefore, we may have a distorted picture of the underlying dynamics of squared returns.
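The exact ACF of squares and Taylor's approximation can be compared numerically; the sketch below uses illustrative parameter values and confirms that the approximation never falls below the exact autocorrelations:

```python
import numpy as np

# Exact vs. approximate ACF of squared returns for an ARSV(1) with Gaussian eps_t
# and rho_h(k) = phi^k:
#   exact:          rho2(k) = (exp(sigma_h^2 phi^k) - 1) / (3 exp(sigma_h^2) - 1)
#   approximation:  rho2(k) ≈ (exp(sigma_h^2) - 1) / (3 exp(sigma_h^2) - 1) * phi^k
# Illustrative parameter values.
phi, sigma2_h = 0.98, 0.5
k = np.arange(1, 101)

rho_exact = (np.exp(sigma2_h * phi**k) - 1) / (3 * np.exp(sigma2_h) - 1)
rho_approx = (np.exp(sigma2_h) - 1) / (3 * np.exp(sigma2_h) - 1) * phi**k

# the approximation always lies above the true autocorrelations
print(np.all(rho_approx >= rho_exact))
```

The inequality follows from the convexity of exp(σ_h²x) in x, since the approximation replaces the exact numerator by its chord between x = 0 and x = 1.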


    The parameter σ_η² governs the degree of kurtosis independently of the persistence of volatility measured by φ. Introducing the noise η_t makes the ARSV(1) model more flexible in the sense that it is able to generate higher kurtosis than the GARCH(1,1) model without increasing ρ_2(1) and without forcing the volatility to be close to the non-stationarity region.


    The conditional distribution of y_t is not Normal even if we assume that ε_t is Gaussian. The volatility is unobservable. However, it is possible to estimate it by running the Kalman filter. For this, consider the following linear transformation of returns:

        log(y_t²) = μ + h_t + ξ_t
        h_t = φ h_{t-1} + η_t

    where μ = log(σ*²) + E[log(ε_t²)] and ξ_t = log(ε_t²) - E[log(ε_t²)] is a non-Gaussian, zero mean, white noise process with variance σ_ξ², independent of h_t. If, for example, ε_t is Gaussian, then E[log(ε_t²)] = -1.27 and Var[log(ε_t²)] = π²/2. Because log(ε_t²) is not truly Gaussian, the Kalman filter yields minimum mean square linear estimators (MMSLE) of h_t and future observations rather than minimum mean square estimators (MMSE).

    The variance of log(y_t²) is γ(0) = σ_h² + σ_ξ², its autocovariance function coincides with the autocovariance function of h_t, and its ACF is given by:

        ρ(k) = ρ_h(k) / (1 + σ_ξ²/σ_h²),  for k ≥ 1.

    The ACF of log-squared returns is therefore proportional to the ACF of h_t, with the factor of proportionality being smaller than one.


    4.2 Comparison with GARCH(1,1) model

    The relationship between kurtosis, persistence of shocks to volatility and first-

    order autocorrelation of squares is different in GARCH and ARSV models. This

    difference can explain why, when both models are fitted to the same series:

    i) The persistence estimated is usually larger in GARCH models than in ARSV models;

    Taylor (1994), Shephard (1996), Kim, Shephard and Chib (1998), Hafner and Herwartz

    (2000) and Anderson (2001).

    ii) The Gaussianity assumption for the errors seems more adequate in ARSV models than

    in GARCH models; Shephard (1996), Ghysels, Harvey and Renault (1996), Kim,

    Shephard and Chib (1998) and Hafner and Herwartz (2000).


    The relationship between kurtosis, persistence and ρ_2(1) for an ARSV(1) model is given by

        ρ_2(1) = [(κ_y/3)^φ - 1] / (κ_y - 1)

    where κ_y = 3 exp(σ_h²) is the kurtosis of y_t.

    When the Normal-GARCH model is fitted to represent the evolution of

    volatility, the kurtosis and first-order autocorrelation of squared returns implied by the

    estimated parameters could be much larger than the corresponding population

    coefficients of the simulated data. The persistence estimated by the GARCH model is

    also usually larger than the persistence of the underlying true autocorrelations of

    squares.
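The kurtosis–persistence–autocorrelation relationship discussed above is easy to evaluate numerically. A small sketch with illustrative values, using κ_y = 3 exp(σ_h²) for the Gaussian ARSV(1):

```python
import numpy as np

# Kurtosis and first-order autocorrelation of squares for a Gaussian ARSV(1):
#   kappa_y = 3 exp(sigma_h^2),  rho_2(1) = ((kappa_y/3)^phi - 1) / (kappa_y - 1).
# For fixed kurtosis, rho_2(1) stays moderate even for phi close to one
# (illustrative parameter values).
def arsv_kurtosis(sigma2_h):
    return 3.0 * np.exp(sigma2_h)

def arsv_rho2_1(phi, sigma2_h):
    kappa = arsv_kurtosis(sigma2_h)
    return ((kappa / 3.0) ** phi - 1.0) / (kappa - 1.0)

kappa = arsv_kurtosis(0.5)
rho1 = arsv_rho2_1(0.98, 0.5)
print(kappa, rho1)
```

This illustrates how the ARSV(1) can combine high kurtosis with a small first-order autocorrelation of squares, in contrast with the Normal-GARCH restriction.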

    4.3 Estimation

    Given that the conditional distribution of yt is not Normal, the ML estimator

    cannot be obtained by traditional methods. This is the reason why there is a long list of alternative methods to estimate SV models. Next, we describe some of the most promising from an empirical point of view.

    4.3.1 Method of Moments

    These methods have the difficulty that their efficiency depends on the choice of

    moments. Furthermore, a particular distribution needs to be assumed for ε_t. Finally, their efficiency is reduced as the process approaches non-stationarity, as is often the case in empirical applications to financial returns.

    Consider, for example, the estimator based on the sample variance. If the log-volatility is a random walk, the stationary form of log(y_t²) is

        Δ log(y_t²) = log(y_t²) - log(y_{t-1}²).

    Therefore, if we denote y*_t = Δ log(y_t²), then, assuming that ε_t is Normal, σ²_{y*} = σ_η² + 2(π²/2) = σ_η² + π². Consequently, σ_η² = σ²_{y*} - π². A method of moments (MM) estimator of σ_η² is then given by σ̃_η² = s²_{y*} - π². If σ_η² > 0, then √T(σ̃_η² - σ_η²) has an asymptotic normal distribution with zero mean and variance 2[(σ_η² + π²)² + π⁴].
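A quick simulation check of this moment estimator (illustrative parameter values; note that the estimator is quite noisy even in large samples):

```python
import numpy as np

# Method-of-moments estimator for the random-walk SV model:
#   y*_t = dlog(y_t^2) = eta_t + xi_t - xi_{t-1},  Var(y*_t) = sigma_eta^2 + pi^2,
# so sigma_eta^2 is estimated as the sample variance of y*_t minus pi^2.
rng = np.random.default_rng(1)
T, sigma2_eta = 200_000, 0.05

h = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), T))   # random-walk log-volatility
y = np.exp(h / 2) * rng.normal(0.0, 1.0, T)              # Gaussian eps_t

y_star = np.diff(np.log(y**2))                           # stationary transformation
sigma2_eta_mm = np.var(y_star) - np.pi**2
print(sigma2_eta_mm)   # noisy estimate of 0.05: the MM estimator has large variance
```

The large sampling variance of σ̃_η² relative to the true σ_η² illustrates the inefficiency of moment-based estimation mentioned above.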


    Melino and Turnbull (1990) proposed to estimate the stationary SV models by

    GMM using the following moments:

        |y_t| - E(|y_t|)
        y_t² - E(y_t²)
        |y_t|³ - E(|y_t|³)
        y_t⁴ - E(y_t⁴)
        |y_t y_{t-h}| - E(|y_t y_{t-h}|)
        y_t² y_{t-h}² - E(y_t² y_{t-h}²)

    GMM is relatively inefficient due to the largely arbitrary choice of unconditional

    moments that can be computed in closed form, while the likelihood-based procedures

    achieve the Cramer-Rao efficiency bound. The Efficient Method of Moments (EMM)

    seeks efficiency improvements, while maintaining the general flexibility of GMM, by

    letting the data guide the choice of an auxiliary quasi-likelihood which serves to

    generate an efficient set of moments; see Andersen, Chung and Sorensen (1999).

    4.3.2 Quasi Maximum Likelihood

    The QML estimator was independently proposed by Nelson (1988) and Harvey

    et al. (1994) and is based on the Kalman filter. This is applied to log(y_t²), where the

    observations are standardized by the sample standard deviation to obtain one-step ahead

    errors and their variances. These are then used to construct the Gaussian likelihood which

    is numerically maximized. Ruiz (1994) shows that the QML estimator is consistent and

    asymptotically Normal. However, the QML procedure is inefficient, as the method does

    not rely on the exact likelihood of log(y_t²).
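A minimal sketch of the QML idea, using simulated data and illustrative parameter values (this is not the exact routine of Harvey et al. (1994); it simply applies the Kalman filter to the demeaned log y_t² and maximizes the resulting Gaussian quasi-likelihood numerically, concentrating the constant out by demeaning):

```python
import numpy as np
from scipy.optimize import minimize

# QML estimation of the ARSV(1) via the Kalman filter applied to
#   x_t = log(y_t^2) - mean(log(y_t^2)) = h_t + xi_t,  h_t = phi h_{t-1} + eta_t.
def kalman_loglik(params, x):
    phi, s2_eta, s2_xi = params
    a, p = 0.0, s2_eta / (1.0 - phi**2)     # stationary initialization of h_t
    ll = 0.0
    for xt in x:
        v = xt - a                          # one-step-ahead prediction error
        f = p + s2_xi                       # ...and its variance
        ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + v**2 / f)
        k = p / f                           # Kalman gain
        a = phi * (a + k * v)               # predicted state for next period
        p = phi**2 * p * (1 - k) + s2_eta
    return ll

def qml_fit(y):
    x = np.log(y**2)
    x = x - x.mean()                        # concentrates the constant out
    obj = lambda th: -kalman_loglik(th, x)
    res = minimize(obj, x0=[0.9, 0.1, np.pi**2 / 2],
                   bounds=[(0.01, 0.999), (1e-4, 5.0), (1e-4, 20.0)],
                   method="L-BFGS-B")
    return res.x

# Usage on simulated data (illustrative parameter values)
rng = np.random.default_rng(2)
T, phi0, s2e0 = 5_000, 0.95, 0.05
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi0 * h[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h / 2) * rng.normal(0, 1, T)
phi_hat, s2_eta_hat, s2_xi_hat = qml_fit(y)
print(phi_hat, s2_eta_hat, s2_xi_hat)
```

Since ξ_t is not Gaussian, the maximized likelihood is only a quasi-likelihood, consistent with the discussion above.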

    The estimator of the constant of the model is uncorrelated with the estimators of

    the other parameters of the model; see Ruiz (1994).

    The standard theory for the estimation of unobserved component time series

    models with non-normal errors applies to the estimates of φ and σ_η². When h_t follows a random walk, the Kalman filter approach is still valid if the restriction φ = 1 is imposed. The only difference is that the first observation is used to initialize the Kalman filter, whereas when |φ| < 1 the filter is initialized with the stationary distribution of h_t.


    Finally, note that the model can be estimated by assuming that ε_t has a particular distribution, for example, the Normal. In this case, the parameter σ_ξ² does not need to be estimated as it is determined by this distribution. However, Ruiz (1994) shows that, even if the distribution of ε_t is known, estimating σ_ξ² reduces the finite sample variances of the estimates.

    Furthermore, by estimating σ_ξ², it is possible to test whether ε_t is Normal by testing H₀: σ_ξ² = π²/2. One possible test statistic could be a quasi-Likelihood Ratio (LR) test. Since the null hypothesis is on the boundary of the admissible parameter space, the distribution of the LR test is given by (1/2)χ²₀ + (1/2)χ²₁, where χ²₀ is a degenerate distribution with all its mass at the origin. The size of the LR test can therefore be set appropriately simply by using the 2α, rather than the α, significance point of a χ²₁ distribution for a test of size α. For example, for α = 5%, the corresponding critical value for the quasi-LR test statistic is 2.71.
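The critical value can be checked directly: a test of size α based on the mixture (1/2)χ²₀ + (1/2)χ²₁ uses the 2α upper quantile of χ²₁.

```python
from scipy.stats import chi2

# Critical value for the boundary quasi-LR test of H0: sigma_xi^2 = pi^2/2.
# Under H0 the LR statistic is distributed as (1/2)chi2_0 + (1/2)chi2_1, so a
# size-alpha test uses the 2*alpha upper quantile of a chi2 with 1 df.
alpha = 0.05
cv = chi2.ppf(1 - 2 * alpha, df=1)
print(round(cv, 2))   # 2.71
```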

    4.3.3 Monte Carlo Likelihood

    Sandmann and Koopman (1998) have proposed the Monte Carlo Likelihood (MCL) procedure, which estimates the likelihood function of log(y_t²) by a Gaussian part constructed via the Kalman filter plus a correction for departures from the Gaussian assumption, as follows:

        log L(Y*_T | θ) = log L_G(Y*_T | θ) + log E_G[ f(ξ | Y*_T, θ) / f_G(ξ | Y*_T, θ) ]

    where Y*_T = {log(y_1²), ..., log(y_T²)}, L_G(Y*_T | θ) is the Gaussian likelihood function, f(·) is the true density of ξ_t = log(ε_t²), f_G(·) is the importance density corresponding to the approximating Gaussian model and E_G refers to the expectation with respect to this density.

    The MCL procedure also generates simultaneously estimates of the latent volatilities.


    4.3.4 MCMC

    ML estimation of the parameters of SV models has progressed greatly thanks to the development of numerical methods based on importance sampling and Markov Chain Monte Carlo (MCMC) procedures. In order to derive the likelihood, the vector of unobserved volatilities has to be integrated out of the joint probability distribution. If we denote Y_T = {y_1, ..., y_T}, σ = {σ_1, ..., σ_T} and θ = {φ, σ_η², σ*}, the likelihood is given by

        L(Y_T | θ) = ∫ f(Y_T | σ, θ) f(σ | θ) dσ.

    The dimension of this integral is T and its evaluation requires numerical methods.

    MCMC estimators of the parameters of SV models were proposed by Jacquier et al. (1994). The Bayesian approach for estimating the parameters, θ, is to augment this vector with the latent log-volatilities. After M Monte Carlo replicates of the parameters have been obtained, θ^(i), i = 1, ..., M, it is possible to obtain density estimates. On the other hand, a natural choice to obtain smooth estimates of the log-volatilities is the marginal posterior expectation,

    which can be estimated by the sample mean. Finally, it is also possible to obtain interval

    predictions of future volatilities conditional on the information available at time T that take

    into account the inherent model variability and the parameter uncertainty.

    Kim et al. (1998) have also proposed an MCMC algorithm that samples the

    unobserved volatilities simultaneously by means of an approximating offset mixture of

    normal models, together with an importance reweighting procedure to correct the

    linearization error. The KSC procedure provides efficient inferences, likelihood evaluation,

    filtered volatility estimates, diagnostics for model failure and computation of statistics for

    comparing non-nested volatility models.
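As an illustration of the augmentation idea, the sketch below samples the latent log-volatilities one at a time by Metropolis steps, proposing from the conditional prior. It is a simplification in the spirit of Jacquier, Polson and Rossi (1994), not their exact algorithm; φ and σ_η² are treated as known and only interior points are updated, to keep the example short.

```python
import numpy as np

# Single-site Metropolis sampler for h_t | h_{t-1}, h_{t+1}, y_t in the ARSV(1).
# Proposal = conditional prior N(mu_t, v); acceptance ratio = likelihood ratio.
rng = np.random.default_rng(3)

def sample_h(y, phi, s2_eta, n_sweeps=200):
    T = len(y)
    h = np.log(y**2 + 1e-8)                    # crude initialization
    v = s2_eta / (1.0 + phi**2)                # conditional prior variance of h_t
    for _ in range(n_sweeps):
        for t in range(1, T - 1):
            mu = phi * (h[t - 1] + h[t + 1]) / (1.0 + phi**2)
            prop = rng.normal(mu, np.sqrt(v))  # propose from the conditional prior
            # log likelihood ratio of y_t | h_t at the proposal vs current value
            logr = (-0.5 * prop - 0.5 * y[t]**2 * np.exp(-prop)) \
                 - (-0.5 * h[t] - 0.5 * y[t]**2 * np.exp(-h[t]))
            if np.log(rng.uniform()) < logr:
                h[t] = prop
    return h

# Usage on a short simulated series (illustrative parameter values)
T, phi0, s2e0 = 300, 0.95, 0.05
h_true = np.zeros(T)
for t in range(1, T):
    h_true[t] = phi0 * h_true[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h_true / 2) * rng.normal(0, 1, T)
h_draw = sample_h(y, phi0, s2e0)
print(np.corrcoef(h_true[1:-1], h_draw[1:-1])[0, 1])   # draws track the true h_t
```

Averaging many such draws estimates the marginal posterior expectation of each h_t, i.e. the smoothed volatility mentioned above.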

    Example: Standard & Poor's 500

    Parameter   GMM               JPR               QML               MCL               KSC
    φ           0.9602 (0.0479)   0.9596 (0.0203)   0.9401 (0.0699)   0.9288 (0.0249)   0.9392 (0.0237)
    σ_η²        0.0541 (0.0219)   0.0172 (0.0196)   0.0128 (0.0222)   0.0499 (0.0190)   0.0405 (0.0168)
    σ*          1.0005 (0.0157)   0.9673 (0.0673)   1.2051 (0.0371)   1.1248 (0.0542)   1.1260 (0.0619)

    (standard errors in parentheses)


    4.4 Extensions

    4.4.1 Leverage effect

    Harvey and Shephard (1996) proposed introducing the leverage effect in the

    ARSV model through correlation between the noises ε_t and η_{t+1}. Therefore, the volatility of the asymmetric ARSV(1), A-ARSV(1), model is given by

        y_t = σ* σ_t ε_t
        log(σ_t²) = φ log(σ_{t-1}²) + η_t

    with Corr(ε_t, η_{t+1}) = δ. Harvey and Shephard (1996) show that, in this case, the kurtosis of y_t is the same as in the symmetric case. The acf of squared observations, derived by Taylor (1994), is given by

        ρ_2(k) = [(1 + δ²σ_η²φ^{2(k-1)}) exp(σ_h²φ^k) - 1] / [3 exp(σ_h²) - 1],  k ≥ 1.

    It is interesting to observe that, as in the symmetric ARSV model, the rate of decay is below φ for the smaller lags, but it converges to φ as the lag increases in the presence of correlation between the level and volatility noises, ε_t and η_t, respectively. However, notice that, in practice, the autocorrelations of large order are indistinguishable from zero.
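The asymmetric model can be simulated by drawing (ε_t, η_{t+1}) jointly Gaussian with correlation δ; with δ < 0, negative returns today are followed by higher volatility tomorrow. A sketch with illustrative parameter values:

```python
import numpy as np

# Simulation of the asymmetric ARSV(1) of Harvey and Shephard (1996), in which
# the leverage effect enters through corr(eps_t, eta_{t+1}) = delta < 0.
rng = np.random.default_rng(4)
T, phi, s2_eta, delta = 100_000, 0.95, 0.05, -0.6

# draw (eps_t, eta_{t+1}) jointly Gaussian with the required correlation
cov = [[1.0, delta * np.sqrt(s2_eta)],
       [delta * np.sqrt(s2_eta), s2_eta]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=T)
eps, eta_next = z[:, 0], z[:, 1]

h = np.zeros(T)
for t in range(1, T):
    h[t] = phi * h[t - 1] + eta_next[t - 1]    # eta_{t+1} is paired with eps_t
y = np.exp(h / 2) * eps

# negative returns today predict higher volatility tomorrow
lev = np.corrcoef(y[:-1], y[1:] ** 2)[0, 1]
print(lev)   # negative
```

The timing of the correlation (ε_t with η_{t+1} rather than η_t) keeps ε_t independent of σ_t, so y_t remains a martingale difference.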


    For a given value of the kurtosis, the autocorrelation of order one of squares is

    larger, the larger is the correlation between the noises.

    Therefore, if as expected in empirical applications, the magnitude of the

    asymmetry parameter is rather small, the relationship between persistence, kurtosis and

    autocorrelations of squares is similar to the one derived for the symmetric ARSV(1)

    model.

    Jacquier et al. (2004) have proposed an ARSV model with leverage effect where ε_t and η_t are contemporaneously correlated. However, Yu (2002) shows that this latter specification has problems and provides empirical evidence favouring the specification proposed by

    Harvey and Shephard (1996).

    With respect to estimation, see Yu (2004).

    4.4.2 Long-memory

    The long-memory property has been incorporated into SV models by Harvey (1998)

    and Breidt, Crato and deLima (1998), who propose LMSV models where the log-

    volatility follows an ARFIMA(p,d,q) process. In particular, when p=1 and q=0, the

    model for the series of returns, y_t, is given by:

        y_t = exp(h_t/2) ε_t               (1.a)
        (1 - φL)(1 - L)^d h_t = η_t        (1.b)

    where h_t = log(σ_t²).

    Finally, note that model (1) is stationary if |φ| < 1 and d < 0.5. In that case, the variance of h_t is given by

        σ_h² = σ_η² [Γ(1-2d) / Γ(1-d)²] F(1, 1+d; 1-d; φ) / (1+φ)     (3)


    where F(·,·;·;·) is the hypergeometric function. Note that when φ = 0, F(1, 1+d; 1-d; 0) = 1, and (3) becomes the variance of an ARFIMA(0,d,0) process, σ_h² = σ_η² Γ(1-2d)/[Γ(1-d)]², as given by Harvey (1998). On the other hand, when d = 0, F(1, 1; 1; φ) = (1-φ)^{-1}, and (3) becomes the variance of an AR(1) process, σ_h² = σ_η²/(1-φ²), as in Harvey, Ruiz and Shephard (1994).

    As k → ∞, these autocorrelations behave like ρ_h(k) ~ A k^{2d-1}, where A is a factor of proportionality that depends on d and φ. Therefore, the dependence between observations a long time span apart decays at a very slow hyperbolic rate. Finally, it is possible to show that when d = 0, expression (4) becomes the ACF of an AR(1) process, ρ_h(k) = φ^k, and when φ = 0, it becomes the ACF of the ARFIMA(0,d,0) process,

        ρ_h(k) = [d(1+d)···(k-1+d)] / [(1-d)(2-d)···(k-d)].
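The ARFIMA(0,d,0) ACF and its hyperbolic decay can be evaluated numerically through log-gamma functions; a small sketch with an illustrative value of d:

```python
import numpy as np
from scipy.special import gammaln

# ACF of an ARFIMA(0,d,0) process,
#   rho(k) = d(1+d)...(k-1+d) / [(1-d)(2-d)...(k-d)]
#          = Gamma(1-d) Gamma(k+d) / [Gamma(d) Gamma(k+1-d)],
# computed via log-gamma for numerical stability.
def arfima_acf(d, kmax):
    k = np.arange(1, kmax + 1)
    return np.exp(gammaln(1 - d) - gammaln(d)
                  + gammaln(k + d) - gammaln(k + 1 - d))

d = 0.3
rho = arfima_acf(d, 1000)
# the ratio rho(k) / k^(2d-1) stabilizes for large k: hyperbolic decay
ratio = rho[-1] / 1000 ** (2 * d - 1)
print(rho[0], ratio)
```

Note that ρ(1) = d/(1-d) and that the ratio converges to the proportionality factor A of the text.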

    Andersen and Bollerslev (1997) and Robinson (2001) show that the autocovariances

    of squared and absolute returns decay at the same rate as the autocovariances of ht for

    large lags. This argument is often used to justify the use of these transformations to

    identify and model the long-memory of volatility. However, the rates of decay of the autocorrelations of |y_t| or y_t² and those of h_t can be rather different at low lags: the autocorrelations of squares decay towards zero more quickly than the autocorrelations of the log-volatility, although the rates of decay of both series are the same for large lags. The same behaviour

    can be observed when comparing the rates of decay of the autocorrelations of absolute

    returns and log-volatility autocorrelations although, in this case, they are closer than

    when comparing squared returns and log-volatilities.

    Another important difference between the ACF of |y_t| or y_t² and the ACF of h_t is the magnitude of the autocorrelations themselves, which are clearly smaller for |y_t| and y_t² than for the log-volatility process. This fact shows up in Figure 1(b), which displays the ACF of h_t together with the ACF of |y_t| and y_t² for the same model as before. In this case, the ACF of squared and absolute returns is nearly five times smaller than the ACF of the log-volatilities.


    It is also remarkable that the behaviour of the ACF of short-memory and long-

    memory SV models can be rather similar in some cases. Figure 2(a) plots the ACF of y_t² for three LMSV models with parameters {φ=0.98, d=0, σ_η²=0.027}, {φ=0.93, d=0.2, σ_η²=0.027} and {φ=0.88, d=0.3, σ_η²=0.026}, respectively. These models are selected so that their coefficient of variation, defined as Var(σ_t²)/[E(σ_t²)]², is approximately one and the first order autocorrelation of y_t² is 0.19 in all of them. Notice that, in practice,

    the rates of decay of the short-memory ARSV model and the LMSV model with small d

    could be difficult to distinguish in the first lags. Indeed, the main differences only arise

    after the autocorrelation of approximately order 80. Furthermore, observe that the

    autocorrelations up to order 20 of the two long memory models displayed in Figure 2(a)

    are nearly indistinguishable. In these cases, the knowledge of the behaviour in the long-

    run will be essential. The same conclusions would be drawn if the ACF of absolute

    returns were used.

    The dynamic dependence of the series of returns also appears in the logarithms of squared returns. Both ACFs decay at the same hyperbolic rate, but the ACF of log(y_t²) takes smaller values; indeed, it takes even smaller values than the ACF of the other two transformations considered. There is therefore a difficulty in distinguishing among different LMSV models using only the information contained in the ACF. As we


    will see in the next section, this problem is aggravated by the negative bias of the sample ACF of log(y_t²) in LMSV models.

    With respect to estimation, Harvey (1998) and Breidt et al. (1998) proposed a QML estimator based on maximising the discrete Whittle approximation of the likelihood function of log(y_t²) in the frequency domain. The estimates are obtained by minimizing

        L(θ) = (1/(2T)) Σ_{j∈M} [ log f(λ_j; θ) + I_{y*}(λ_j) / f(λ_j; θ) ]

    where M = {j = 1, 2, ..., [T/2]}, f(·; θ) is the spectral density of log(y_t²), λ_j = 2πj/T are the Fourier frequencies, and I_{y*}(λ) = |W_{y*}(λ)|² and W_{y*}(λ) = (2πT)^{-1/2} Σ_{t=1}^{T} log(y_t²) exp(iλt) are the periodogram and discrete Fourier transform of log(y_t²) at frequency λ.

    The finite sample properties of the QML estimator have been considered by Pérez and Ruiz (2001).
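A sketch of the Whittle objective for the short-memory case (d = 0), in which the spectral density of the demeaned log(y_t²) is that of an AR(1) plus white noise. This specializes the estimator described above and uses illustrative simulated data; it is not the routine of the cited papers.

```python
import numpy as np
from scipy.optimize import minimize

# Whittle QML for the short-memory ARSV(1): the spectral density of the
# demeaned log(y_t^2) is
#   f(lam) = s2_eta / (2 pi (1 + phi^2 - 2 phi cos lam)) + s2_xi / (2 pi).
def whittle_fit(y):
    x = np.log(y**2)
    x = x - x.mean()
    T = len(x)
    j = np.arange(1, T // 2 + 1)
    lam = 2 * np.pi * j / T                               # Fourier frequencies
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2 * np.pi * T)   # periodogram

    def spec(lam, phi, s2_eta, s2_xi):
        return s2_eta / (2 * np.pi * (1 + phi**2 - 2 * phi * np.cos(lam))) \
               + s2_xi / (2 * np.pi)

    def obj(th):
        f = spec(lam, *th)
        return np.sum(np.log(f) + I / f)                  # Whittle objective

    res = minimize(obj, x0=[0.9, 0.1, np.pi**2 / 2],
                   bounds=[(0.01, 0.999), (1e-4, 5.0), (1e-4, 20.0)],
                   method="L-BFGS-B")
    return res.x

# Usage on simulated data (illustrative parameter values)
rng = np.random.default_rng(5)
T, phi0, s2e0 = 10_000, 0.95, 0.05
h = np.zeros(T)
for t in range(1, T):
    h[t] = phi0 * h[t - 1] + rng.normal(0, np.sqrt(s2e0))
y = np.exp(h / 2) * rng.normal(0, 1, T)
phi_hat, s2e_hat, s2xi_hat = whittle_fit(y)
print(phi_hat, s2e_hat, s2xi_hat)
```

For the long-memory case, the same objective applies with the ARFIMA(1,d,0)-plus-noise spectral density in place of the AR(1)-plus-noise one.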

    Arteche (2004) has proven the consistency and asymptotic normality of a related

    estimator, the local Whittle estimator. He shows that the added noise has a distorting effect

    on the estimates of the memory parameter of the signal. A suitable choice of the bandwidth

    is important to lessen its impact.

    Once the parameters of the model have been estimated, the underlying volatility at

    time t may be estimated by an algorithm proposed by Harvey (1998).

    Example: Daily returns of the IBEX35 observed from 7/1/1987 to 30/12/1998. We remove any correlation in the data by fitting an MA(1) model and focus the analysis on the residuals from this model.


    Estimation results (standard errors in parentheses)

    Parameter   ARSV(1)           RWSV              LMSV      ARLMSV
    φ           0.9898 (0.0057)   ---               ---       0.6632
    d           ---               ---               0.7538    0.7035
    σ_η²        0.0168 (0.0042)   0.0099 (0.0024)   0.0906    0.0155
    σ*²         0.9297 (0.0374)   0.9484 (0.0436)   1.5112    1.5484

    Sample moments of standardised observations using smoothed estimates of volatility.

    Sample moment   ARSV(1)   RWSV      LMSV      ARLMSV
    Mean            0.0101    0.0074    0.0113    0.0122
    Variance        1.0000    1.0000    0.9999    0.9999
    Skewness        -0.0284   -0.0441   -0.0183   -0.0127
    Kurtosis        3.5620    3.7070    3.3941    3.4380

    Aut. of squares   ARSV(1)    RWSV       LMSV      ARLMSV
    1                 0.0709**   0.0894**   0.0419*   0.0455*
    2                 0.0676**   0.0923**   0.0378*   0.0320
    3                 0.0306     0.0392*    0.0217    0.0173
    4                 0.0342     0.0542**   0.0129    0.0053
    5                 0.0335     0.0464*    0.0290    0.0231
    10                0.0206     0.0258     0.0225    0.0210
    50                0.0011     0.0011     -0.0011   -0.0017
    100               0.0300     0.0320     0.0256    0.0249


    Ljung-Box test   ARSV(1)    RWSV       LMSV     ARLMSV
    Q2(10)           43.15**    80.06**    17.89    15.50
    Q2(20)           101.70**   144.61**   71.37*   68.43*
    Q2(100)          150.41**   198.87**   117.67   113.39

    ** Significant at 1%; * Significant at 5%

    There are other alternative estimation methods proposed for LMSV models:

    a) Methods based on state space models: Chan and Petris (2000)

    b) Bayesian procedures: Hsu and Breidt (1997)

    c) GMM: Wright (1999)

    d) Semiparametric estimation: Deo and Hurvich (1998)

    References

    Andersen, T.G., H.-J. Chung and B.E. Sorensen (1999), Efficient method of moments estimation of a stochastic volatility model: A Monte Carlo study, Journal of Econometrics, 91, 61-87.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001), The distribution of realized stock return volatility, Journal of Financial Economics, 61, 43-76.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001), The distribution of realized exchange rate volatility, Journal of the American Statistical Association, 96, 42-55.

    Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2003), Modeling and forecasting realized volatility, Econometrica, 71, 579-625.

    Arteche, J. (2004), Gaussian semiparametric estimation in long memory in stochastic volatility and signal plus noise models, Journal of Econometrics, 119, 131-154.

    Broto, C. and E. Ruiz (2004), Estimation methods for stochastic volatility models: A survey, Journal of Economic Surveys, 18, 613-649.

    Carnero, M.A., D. Peña and E. Ruiz (2004), Persistence and kurtosis in GARCH and stochastic volatility models, Journal of Financial Econometrics, 2, 319-342.

    Clark, P.K. (1973), A subordinated stochastic process model with fixed variance for speculative prices, Econometrica, 41, 135-156.

    Ghysels, E., A.C. Harvey and E. Renault (1996), Stochastic volatility, in G.S. Maddala and C.R. Rao (eds.), Handbook of Statistics, 14, North-Holland, Amsterdam.


    Harvey, A.C. (1998), Long memory in stochastic volatility, in J. Knight and S. Satchell (eds.), Forecasting Volatility in Financial Markets, 307-320, Butterworth-Heinemann, Oxford.

    Harvey, A.C., E. Ruiz and N. Shephard (1994), Multivariate stochastic variance models, Review of Economic Studies, 61, 247-264.