

On the stability of the excess sensitivity of aggregate

consumption growth in the US

Gerdie Everaert1, Lorenzo Pozzi∗2, and Ruben Schoonackers3

1Ghent University & SHERPPA

2Erasmus University Rotterdam & Tinbergen Institute

3National Bank of Belgium & Ghent University

June 16, 2016

Abstract

This paper investigates whether there is time variation in the excess sensitivity of aggregate

consumption growth to anticipated aggregate disposable income growth using quarterly US data over

the period 1953-2014. Our empirical framework contains the possibility of stickiness in aggregate

consumption growth and takes into account measurement error and time aggregation. Our empirical

specification is cast into a Bayesian state space model and estimated using Markov Chain Monte Carlo

(MCMC) methods. We use a Bayesian model selection approach to deal with the non-regular test for

the null hypothesis of no time variation in the excess sensitivity parameter. Anticipated disposable

income growth is calculated by incorporating an instrumental variables estimation approach into our

MCMC algorithm. Our results suggest that the excess sensitivity parameter in the US is stable at

around 0.23 over the entire sample period.

JEL Classification: E21, C11, C22, C26

Keywords: Excess sensitivity, time variation, consumption, income, MCMC, Bayesian

model selection

∗Corresponding author at: Department of Economics, P.O. Box 1738, 3000 DR Rotterdam, the Netherlands, Tel:+31(10) 4081256, Email: [email protected], Website: http://people.few.eur.nl/pozzi.


1 Introduction

Traditional permanent income and life cycle models of consumption predict that (log) real aggregate

private consumption follows a random walk (Hall, 1978). Empirical studies however have revealed that

aggregate consumption growth is excessively sensitive to anticipated disposable income growth (see e.g.,

Campbell and Mankiw, 1989, 1990, 1991). The most common interpretations given to this observation are

the occurrence of liquidity constraints (Flavin, 1985; Deaton, 1991; Ludvigson, 1999) and the prevalence

of precautionary and buffer stock savings motives (Carroll, 1992; Ludvigson and Michaelides, 2001) which

increase the weight given by consumers to current income in their consumption decisions.

A number of recent papers argue that the degree of excess sensitivity (ES) remains statistically signif-

icant but becomes of lower magnitude once other forms of aggregate consumption predictability are taken

into account (Basu and Kimball, 2002; Sommer, 2007; Kiley, 2010; Carroll et al., 2011). In particular,

Sommer (2007) and Carroll et al. (2011) show that the ES measured in quarterly US data is considerably

lower in a model that contains a ‘stickiness’ mechanism - due to, for instance, habit formation, rational

inattention or imperfect information - that makes aggregate consumption growth depend on its

own past. Sommer (2007) further argues that it is necessary to account for measurement error and time

aggregation to obtain valid ES estimates.

In a different, somewhat older, branch of the literature the assumption that the ES parameter is

constant has been relaxed in favor of time-varying specifications (Campbell and Mankiw, 1991; McK-

iernan, 1996; Bacchetta and Gerlach, 1997; Girardin et al., 2000; Peersman and Pozzi, 2007). Some of

these studies have reported that ES has become less important during the last decades in the US (see

e.g., Bacchetta and Gerlach, 1997) and in other developed economies (see e.g., Girardin et al., 2000;

Blundell-Wignall et al., 1991). This is attributed to financial liberalization and the development of finan-

cial markets. These structural developments are thought to have improved the possibilities of consumers

to smooth consumption over time and across states of the world, i.e., by curbing the importance of credit

constraints and precautionary saving motives in consumer decisions these developments have, over time,

reduced the ES parameter. However, these studies have been conducted using restrictive empirical set-

tings, not taking into account other relevant empirical features of aggregate consumption growth data

such as ‘stickiness’, measurement error, time aggregation and stochastic volatility.

The contribution of this paper is to measure potential time variation in ES using an elaborate empir-

ical framework that includes components to account for other potentially relevant features of aggregate

consumption growth: AR term (stickiness), MA terms (measurement error, time aggregation), stochastic

volatility (moderation in volatility) and a time-varying intercept (omitted variables). Regardless of the

interpretations given to the ES estimates and the still not fully resolved puzzle that they constitute, any

sound explanation brought forward must start from an adequate measurement of the puzzle. Moreover,

the accurate measurement of ES is also important insofar as these estimates are used as input in other

studies. Indeed, ES estimates are used with increasing frequency in the calibration of DSGE models that

focus on the implications of the presence of so-called ‘rule-of-thumb’ consumers for the impact of govern-


ment expenditure shocks and productivity shocks on the macroeconomy (see Gali et al., 2007; Furlanetto

and Seneca, 2012, and references therein). While the interpretation of the ES of aggregate consumption

growth to aggregate income growth in terms of the presence of ‘rule-of-thumb consumers’ is debatable,

these papers suggest that the impact of aggregate income growth on aggregate consumption growth must

be taken into account explicitly to draw valid conclusions on the effects of shocks in DSGE models.

The paper further contributes to the literature by suggesting an appropriate methodological approach

that allows us to test whether the time variation in the ES parameter is empirically relevant and that

adequately deals with all the complications that arise when estimating time-varying ES in our elaborate

empirical set-up. More specifically, our methodological approach addresses the following three issues.

First, a common feature of the ES literature is that model uncertainty is mostly ignored. Typically,

it is assumed from the outset which features of aggregate consumption growth are included in the model,

while if the ES parameter is allowed to vary over time it is assumed to evolve according to a random

walk process. Hence, time variation in ES is imposed without testing whether it is actually relevant.

In contrast, we explicitly address model uncertainty. Instead of merely including various components

and deciding whether they are constant or vary over time, we test which features of the model are

relevant and fall back to a more parsimonious specification when appropriate. This not only avoids

over-parameterization but will also provide us with information on which components are relevant and

whether the ES parameter varies over time or not. Model selection for time-varying parameter models is

a challenge, though, as it leads to a testing problem that is non-regular from a classical point of view. We

will therefore use the Bayesian stochastic model specification search outlined in Fruhwirth-Schnatter and

Wagner (2010). Their approach relies on splitting the time-varying parameter in a constant part and in

a time-varying part and introducing a stochastic binary model indicator which is one if the time-varying

part is to be included in the model and zero otherwise. In order to select the most appropriate model,

we further add binary indicators on the other variables/components (lagged consumption growth, MA

errors,...). Using Markov Chain Monte Carlo (MCMC) methods, these stochastic binary indicators are

then sampled jointly with the unknown parameters. As we introduce in total 11 binary indicators in our

general model specification, $2^{11} = 2048$ different models are considered in the selection procedure.
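To make the indicator idea concrete, here is a minimal sketch of a parameter split into a constant part and an optional time-varying part. The specific form used (constant plus a scaled, standardized random walk that starts at zero) is an assumption made for illustration, as are the function name `tvp_path` and all numerical values; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def tvp_path(beta_const, sigma_eta, indicator, n_periods, rng):
    """Constant part plus an optional random-walk part (hypothetical helper).

    indicator = 1 keeps the time-varying part; indicator = 0 drops it,
    so the parameter collapses to the constant beta_const.
    """
    shocks = rng.standard_normal(n_periods)
    random_walk = np.cumsum(shocks) - shocks[0]  # standardized walk starting at 0
    return beta_const + indicator * sigma_eta * random_walk

# Illustrative values only: 0.23 echoes the reported ES estimate, 0.02 is made up.
constant_es = tvp_path(0.23, 0.02, indicator=0, n_periods=244, rng=rng)
varying_es = tvp_path(0.23, 0.02, indicator=1, n_periods=244, rng=rng)

n_indicators = 11
n_models = 2 ** n_indicators  # each binary indicator is either 0 or 1
print(n_models)
```

Setting the indicator to zero reproduces the constant-parameter model exactly, which is why sampling the indicator amounts to choosing between the two specifications.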

Second, as anticipated aggregate income growth is not observed, the ES parameter is estimated from

the relation between consumption growth and ex-post observed aggregate disposable income growth using

an instrumental variables (IV) method. Time-varying ES parameter models with endogenous regressors

have been estimated by, among others, Bacchetta and Gerlach (1997) and Peersman and Pozzi (2007)

using approximate methods. In a recent paper, Kim and Kim (2011) show that the control function

approach to IV estimation can be used to construct an exact state space representation which can then

be estimated by maximum likelihood (ML). Building on their paper, we incorporate a control function

type of approach in our MCMC algorithm to deal with endogeneity. The advantage of our Gibbs sampling

approach compared to ML estimation is that it is computationally easier to implement and, as such, does

not suffer from the numerical optimization problems inherent to ML estimation (see Kim and Kim, 2011).

Moreover, as our Bayesian IV approach relies on sampling the posterior distribution rather than on using


asymptotic approximations, it allows for exact inference even when instruments are weak.

Third, time aggregation and measurement error in the log level of consumption imply moving average

(MA) error terms in the aggregate consumption growth equation. We follow Chib and Greenberg (1994)

who present exact methods to analyze Bayesian regression models with MA errors using MCMC sampling.

Our empirical framework is applied to US data over the period 1953-2014. The estimation results

suggest that the degree of ES of aggregate consumption growth to anticipated disposable income growth

for the US is stable and lies around 0.23 over the entire sample period (1954-2014). This estimated

magnitude of the ES parameter is in accordance with recent findings of Sommer (2007) and Carroll et al.

(2011) who consider an empirical framework with a constant ES parameter but with the possibility of

stickiness in aggregate consumption growth along with time aggregation and measurement error. The

lack of time variation in the ES parameter however stands in contrast to the findings reported by some

of the previous studies that investigate time-varying ES for the US (e.g., the study by Bacchetta and

Gerlach (1997) who argue that the degree of ES has dropped gradually over time). The time variation

reported in the literature has been obtained from the estimation of more restricted empirical models

however. These typically do not allow for stickiness in aggregate consumption growth, nor do they allow

for time aggregation and measurement error. Moreover, by modeling the ES parameter as a random

walk, time variation is imposed without testing whether it is actually relevant. Upon estimating more

restricted empirical models that are in line with the time-varying ES frameworks employed in previous

studies, we also find that the ES parameter has decreased over time. However, even in these restricted

settings, our Bayesian model specification search selects a model with a constant ES parameter. Our

results therefore imply that the finding of time variation in the ES parameter in the literature may be due

to specification errors and imposing time variation rather than being the outcome of genuine structural

economic developments such as financial liberalization. Our model selection approach further confirms

that there is notable stickiness in aggregate consumption growth and an MA(1) component in the error

term. The estimated coefficient on lagged consumption growth lies around 0.53, a result which is in

accordance with the findings reported recently by Sommer (2007), Kiley (2010) and Carroll et al. (2011).

The remainder of the paper is organized as follows. In Section 2 we present the benchmark theoretical

framework for aggregate consumption growth to which we add anticipated disposable income growth to

allow for time-varying ES. Section 3 outlines our empirical specification and estimation methodology.

Section 4 presents the estimation results for the US over the period 1953-2014. This consists of the

selection of a preferred model, a comparison with the literature and a number of robustness checks.

Section 5 concludes.

2 Theoretical framework

In this section we first present a benchmark model for aggregate consumption growth with AR(1) stick-

iness modeled through habit formation in consumer preferences and MA(3) errors occurring through

the presence of time aggregation and measurement error. This benchmark model is a generalization of


the model presented by Sommer (2007) as it includes a time-varying intercept in aggregate consumption

growth that allows us to capture and control for unspecified and/or hard-to-estimate components of

aggregate consumption growth. Following, among others, Bacchetta and Gerlach (1997) and Carroll

et al. (2011) we then add anticipated income growth to the consumption growth equation to allow for

time-varying ES of consumption to income. The section ends with a discussion.

2.1 A benchmark model

Suppose a representative permanent income consumer maximizes the stream of discounted utilities

$$\max E_t \sum_{j=0}^{T} \rho^{j}\, U(C_{t+j}; X_{t+j}), \tag{1}$$

subject to a budget constraint, where Et denotes the consumer’s expectation conditional on period t

information, ρ is the discount factor, Ct is the level of period t ‘effective’ consumption and Xt is a vector

of variables that shift marginal utility at time t.1 ‘Effective’ consumption is assumed to be equal to

$$C_t = C^*_t - \gamma C^*_{t-1}, \tag{2}$$

where C∗t is the representative agent’s consumption level in period t such that utility depends on the

level of consumption C∗t relative to last period’s consumption level C∗t−1 with γ (where 0 ≤ γ ≤ 1 )

capturing the strength of habits. When γ = 0 habits are irrelevant and the consumer derives utility

only from the level of consumption. When γ = 1 habits are most important and the consumer derives

utility only from the change in consumption. When 0 < γ < 1 the consumer derives utility from both the

level of consumption and from the change in consumption. Hayashi (1985) and Dynan (2000) show that,

provided the real interest rate is constant and T is large, the first-order condition under time-nonseparable

preferences can be written as

$$E_{t-1}\left[R\rho\,\frac{U'(C_t;X_t)}{U'(C_{t-1};X_{t-1})}\right] = 1, \tag{3}$$

where $R$ is the real interest factor (which equals 1 plus the real interest rate) and $U'(C_t;X_t) = \partial U(C_t;X_t)/\partial C_t$. Assuming that the utility function is of the CRRA type $U(C_t;X_t) = \frac{C_t^{1-\psi}}{1-\psi} X_t$ with $\psi > 0$, so that $U'(C_t;X_t) = C_t^{-\psi} X_t$, and substituting this into equation (3) gives

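$$E_{t-1}\left[R\rho\left(\frac{C_t}{C_{t-1}}\right)^{-\psi}\frac{X_t}{X_{t-1}}\right] = E_{t-1}[Z_t] = 1, \tag{4}$$

where $Z_t \equiv R\rho\left(\frac{C_t}{C_{t-1}}\right)^{-\psi}\frac{X_t}{X_{t-1}}$. Assuming that $\Delta\ln C_t$ and $\Delta\ln X_t$ are jointly conditionally normally distributed implies that $\ln Z_t = \ln(R\rho) - \psi\Delta\ln C_t + \Delta\ln X_t$ is conditionally Gaussian as well. From the lognormal property we can therefore write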

$$E_{t-1}[Z_t] = \exp\left[E_{t-1}(\ln Z_t) + \frac{1}{2}V_{t-1}(\ln Z_t)\right]. \tag{5}$$

1 Examples are hours worked (see e.g., Kiley, 2010) and/or government consumption (see e.g., Evans and Karras, 1998).


We then substitute equation (5) into equation (4) and take logs of the resulting equality to obtain

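$$E_{t-1}(\Delta\ln C_t) = \frac{1}{2\psi}\sigma^2_{\ln Z,t} + \frac{1}{\psi}\ln(R\rho) + \frac{1}{\psi}\mu_{\Delta\ln X,t}, \tag{6}$$

where $\sigma^2_{\ln Z,t} \equiv V_{t-1}[\ln Z_t] = V_{t-1}[\Delta\ln X_t] - 2\psi\,\mathrm{cov}_{t-1}[\Delta\ln X_t, \Delta\ln C_t] + \psi^2 V_{t-1}[\Delta\ln C_t]$ and $\mu_{\Delta\ln X,t} = E_{t-1}(\Delta\ln X_t)$. This can also be written as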

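$$\Delta\ln C_t = \frac{1}{2\psi}\sigma^2_{\ln Z,t} + \frac{1}{\psi}\ln(R\rho) + \frac{1}{\psi}\mu_{\Delta\ln X,t} + \varepsilon_t, \tag{7}$$

where $\varepsilon_t = \left[\Delta\ln C_t - E_{t-1}(\Delta\ln C_t)\right]$. Note that the innovation $\varepsilon_t$ implicitly reflects the revision in permanent income of the optimizing permanent income consumer (Campbell and Mankiw, 1990).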
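For readers who want the intermediate step, the following sketch spells out how the expression for expected consumption growth follows from equations (4) and (5). It uses only the definitions of $\ln Z_t$, $\sigma^2_{\ln Z,t}$ and $\mu_{\Delta\ln X,t}$ given in the text.

```latex
\begin{align*}
0 = \ln E_{t-1}[Z_t]
  &= E_{t-1}(\ln Z_t) + \tfrac{1}{2} V_{t-1}(\ln Z_t) \\
  &= \ln(R\rho) - \psi E_{t-1}(\Delta\ln C_t) + \mu_{\Delta\ln X,t}
     + \tfrac{1}{2}\sigma^2_{\ln Z,t} \\
\Rightarrow\quad
E_{t-1}(\Delta\ln C_t)
  &= \frac{1}{2\psi}\sigma^2_{\ln Z,t} + \frac{1}{\psi}\ln(R\rho)
     + \frac{1}{\psi}\mu_{\Delta\ln X,t}.
\end{align*}
```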

After collecting the first three terms of equation (7) into a time-varying variable β0t and using the

approximation ∆ lnCt = ∆ ln(C∗t − γC∗t−1) ≈ ∆ lnC∗t − γ∆ lnC∗t−1 as suggested by Muellbauer (1988)

and Dynan (2000), we obtain

$$\Delta\ln C^*_t = \beta_{0t} + \gamma\Delta\ln C^*_{t-1} + \varepsilon_t, \tag{8}$$

where $\beta_{0t} = \frac{1}{2\psi}\sigma^2_{\ln Z,t} + \frac{1}{\psi}\ln(R\rho) + \frac{1}{\psi}\mu_{\Delta\ln X,t}$. As such, $\beta_{0t}$ is a catch-all term that allows us to capture and control for unspecified (i.e., the conditional mean and variance of $\Delta\ln X_t$) and/or hard-to-estimate (i.e., the conditional variance $V_{t-1}[\Delta\ln C_t]$ in $\sigma^2_{\ln Z,t}$) components of aggregate consumption growth.2,3

2.2 Time aggregation and measurement error

Assuming that consumption decisions are made more frequently than the intervals at which consumption

is measured causes time aggregation. Sommer (2007) shows that in combination with the presence of

habits this induces an MA(2) structure in the error term εt of true aggregate consumption growth ∆ lnC∗t ,

where ‘true’ refers to consumption in the absence of measurement error and other transitory components.

This implies that equation (8) should be written as

$$\Delta\ln C^*_t = \beta_{0t} + \gamma\Delta\ln C^*_{t-1} + \theta_{\xi}(L)\,\xi_t, \tag{9}$$

with $E_{t-1}(\xi_t) = 0$ and $\theta_{\xi}(L) = 1 + \theta_{\xi 1}L + \theta_{\xi 2}L^2$ an MA(2) lag polynomial with parameters being complicated functions of $\gamma$.4

Sommer (2007) further notes that aggregate consumption data measured at the quarterly level are

often plagued by measurement error and other sources of transitory consumption fluctuations. He argues

that measurement error is best modeled as an MA(1) structure in the log-level of consumption. This

implies that measured aggregate consumption growth ∆ lnCt should be modeled as the sum of true

2 Note that in Sommer (2007) the intercept in consumption growth is assumed to be constant ($\beta_{0t} = \beta_0$ for all $t$) as his model includes no preference shifters $X_t$ while he implicitly assumes a constant conditional variance of consumption growth.

3 While the model is derived under a constant interest factor $R$, the presence of $\beta_{0t}$ in the model implicitly also allows us to control for a time-varying interest rate in the estimation.

4 The proof in Sommer (2007) is based on a model with a constant mean in aggregate consumption growth whereas we have the time-varying intercept $\beta_{0t}$ in the model. We can however rewrite equation (8) as $(\Delta\ln C^*_t - \mu_t) = \gamma(\Delta\ln C^*_{t-1} - \mu_{t-1}) + \varepsilon_t$ with $\mu_t = \beta_{0t} + \gamma\mu_{t-1}$ so that his proof equally applies to our model.


aggregate consumption growth ∆ lnC∗t given by equation (9) and an MA(2) error term

$$\Delta\ln C_t = \Delta\ln C^*_t + \theta_{\zeta}(L)\,\varsigma_t, \tag{10}$$

where $E_{t-1}(\varsigma_t) = 0$ and $\theta_{\zeta}(L) = 1 + \theta_{\zeta 1}L + \theta_{\zeta 2}L^2$ an MA(2) lag polynomial. This specification allows for

‘general’ measurement error as it encompasses the simpler case where measurement error is assumed to

be a white noise error term in the log-level of consumption. In this case of ‘classical’ measurement error,

there is only an MA(1) error term in equation (10) with θζ1 = −1.
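As a short worked step, the classical case can be written out as follows, assuming the white noise measurement error in the log-level is denoted by $\varsigma_t$ as in equation (10). First-differencing the log-level relation produces the MA(1) term with coefficient $-1$.

```latex
\begin{align*}
\ln C_t &= \ln C^*_t + \varsigma_t, \qquad \varsigma_t \text{ white noise} \\
\Delta\ln C_t &= \Delta\ln C^*_t + \varsigma_t - \varsigma_{t-1}
             = \Delta\ln C^*_t + (1 - L)\,\varsigma_t,
\end{align*}
% so that theta_{zeta 1} = -1 and theta_{zeta 2} = 0 in equation (10).
```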

Combining equations (9) and (10) to obtain an expression containing only measured consumption

growth ∆ lnCt yields

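$$\begin{aligned}
\Delta\ln C_t &= \beta_{0t} + \gamma\Delta\ln C_{t-1} + \theta_{\zeta}(L)(\varsigma_t - \gamma\varsigma_{t-1}) + \theta_{\xi}(L)\,\xi_t, \\
&= \beta_{0t} + \gamma\Delta\ln C_{t-1} + \theta(L)\,\varepsilon_t,
\end{aligned} \tag{11}$$

with $E_{t-1}(\varepsilon_t) = 0$ and $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \theta_3 L^3$ an MA(3) lag polynomial.5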
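The order of the composite error term can be checked numerically. The sketch below computes the theoretical autocovariances of the sum of the two independent MA components in equation (11) and shows that they vanish beyond lag 3. All coefficient values, the unit innovation variances and the helper `ma_autocov` are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

def ma_autocov(coeffs, variance, max_lag):
    """Theoretical autocovariances (lags 0..max_lag) of an MA process
    with lag-polynomial coefficients coeffs and innovation variance variance."""
    coeffs = np.asarray(coeffs, dtype=float)
    out = []
    for k in range(max_lag + 1):
        if k < len(coeffs):
            out.append(variance * np.dot(coeffs[: len(coeffs) - k], coeffs[k:]))
        else:
            out.append(0.0)  # no overlap of coefficients beyond the MA order
    return np.array(out)

gamma = 0.5                    # stickiness parameter (illustrative)
theta_zeta = [1.0, -0.6, 0.2]  # MA(2) measurement-error polynomial (illustrative)
theta_xi = [1.0, 0.4, 0.1]     # MA(2) time-aggregation polynomial (illustrative)

# theta_zeta(L) * (1 - gamma L): polynomial product via convolution, order 3
meas_part = np.convolve(theta_zeta, [1.0, -gamma])

max_lag = 6
# Independence of the two components means their autocovariances add up
acov = ma_autocov(meas_part, 1.0, max_lag) + ma_autocov(theta_xi, 1.0, max_lag)
print(np.round(acov, 4))
```

The autocovariance at lag 3 is nonzero while all higher lags are exactly zero, which is the signature of an MA(3) process.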

2.3 Time-varying excess sensitivity

Empirical studies have demonstrated that aggregate consumption growth is excessively sensitive to an-

ticipated disposable income growth (see e.g., Campbell and Mankiw, 1989, 1990, 1991). We follow the

standard approach to test for this type of ES by adding anticipated income growth to the consumption

growth model. In line with, among others, Bacchetta and Gerlach (1997) we allow the ES parameter to

vary over time. More specifically, we extend our benchmark in equation (11) to

$$\Delta\ln C_t = \beta_{0t} + \beta_{1t}E_{t-1}(\Delta\ln Y_t) + \gamma\Delta\ln C_{t-1} + \theta(L)\,\varepsilon_t, \tag{12}$$

where ∆ lnYt is aggregate (total) disposable income growth and β1t is the time-varying ES parameter.6

Note that as the innovation εt implicitly reflects the revision in permanent income of the optimizing

permanent income consumer of Sections 2.1 and 2.2, it can be correlated with the variable ∆ lnYt (also

see Campbell and Mankiw, 1990). We explicitly deal with this issue when estimating equation (12) as

discussed in Section 3 below.

2.4 Discussion

We use the model in equation (12) to test for time-varying ES of aggregate consumption growth with

respect to anticipated income growth against the benchmark model in equation (11). This deviates from

the past literature in a number of ways. First, ES is usually tested against a framework where aggregate

consumption growth is either white noise (i.e., the standard random walk model) or an MA(1) process

5 In fact, $\theta(L)\varepsilon_t$ is the sum of the two independent MA processes $\theta_{\zeta}(L)(\varsigma_t - \gamma\varsigma_{t-1})$ and $\theta_{\xi}(L)\xi_t$, the first being of order 3 and the second of order 2. Following Hamilton (1994), we can write the sum of two independent MA processes as an MA process of which the order equals that of the highest order process in the sum.

6 This approach differs somewhat from the method followed by Campbell and Mankiw (1990) and Kiley (2010) who test for ES by writing down aggregate consumption growth as the sum or (weighted) average of consumption growth of optimizing permanent income consumers and consumption growth of current income (‘rule-of-thumb’) consumers.


if time aggregation and classical measurement error are taken into account (see e.g., Bacchetta and

Gerlach, 1997). The results of Sommer (2007) and Carroll et al. (2011) show however that allowing

for stickiness (i.e., the dependence of aggregate consumption growth on its own lag) is important when

testing for the ES of consumption to income. In our model this is incorporated by introducing a habit

formation mechanism. Second, as noted by Sommer (2007), allowing for classical measurement error may

not be sufficient so that a more general framework with MA(3) errors is called for. Third, the time-

varying variable β0t controls for all potentially omitted variables that may affect aggregate consumption

growth (i.e., marginal utility shifters such as hours worked and government consumption, the conditional

variance of consumption growth which reflects a potential precautionary savings motive, the interest rate

that captures potential inter-temporal substitution effects).

It should be noted that this empirical model provides a convenient framework for the measurement

of ES but does not provide an explanation for the observed ES. It contains features that allow us to

capture all the characteristics of aggregate consumption growth data: AR term (stickiness), MA terms

(measurement error, time aggregation), a time-varying intercept (omitted variables) and the anticipated

income growth regressor (excess sensitivity). However, the model does not allow us to infer much about

the underlying consumer behavior that is responsible for the ES observed in aggregate data. The ES term

of the model is arbitrarily added to the benchmark model without an explanation given to it. In fact, the

same is true for the stickiness observed in aggregate consumption growth data. Habits are a convenient way

to allow for stickiness in aggregate consumption growth but we acknowledge that alternative mechanisms

could be responsible for this stickiness. Alternative mechanisms by which stickiness could be incorporated

into aggregate consumption growth are rational inattention (see e.g., Reis, 2006; Carroll et al., 2011) and

imperfect information (see e.g., Pischke, 1995). As a result, we do not claim that we can estimate

deep structural parameters from this model and hence we avoid giving a structural interpretation to our

estimates (e.g., in the remainder of the paper we label the coefficient on the AR(1) term in aggregate

consumption growth as the ‘stickiness’ parameter rather than the ‘habit’ parameter).

3 Empirical methodology

In this section we outline our empirical specification and econometric methodology to estimate the model

for aggregate consumption growth outlined in Section 2.

3.1 Empirical specification

We use equation (12) to test for time-varying ES of aggregate consumption growth with respect to income

growth against the benchmark model in equation (11). The empirical implementation of equation (12)

requires a number of further assumptions. These are outlined below.


Time-varying parameters

The parameters β0t and β1t in equation (12) are allowed to change over time according to a random walk

process

$$\beta_{i,t+1} = \beta_{it} + \eta_{it}, \qquad \eta_{it} \sim N(0, \sigma^2_{\eta_i}), \tag{13}$$

with i = 0, 1. This allows for a very flexible evolution of the parameters βit over time. A random walk

process is particularly convenient to capture smooth transition and structural change. It is in line with

the smooth transition specifications used in previous research on time-varying ES (e.g., the deterministic

trend specification by Campbell and Mankiw (1991) or the random walk specification by Bacchetta and

Gerlach (1997)). The reason for this is that time variation in ES is often (implicitly or explicitly) linked

to financial liberalization and in particular to consumer credit availability which is a variable many would

argue is - at least in the US - structurally trended and relatively smooth.7 As a robustness check, reported

in Section 4.6 below, we change the process for the ES parameter β1t into the sum of a random walk and

a stationary AR(1) process to also allow for higher frequency (e.g., cyclical) movements in ES.

Anticipated income growth and instrumental variables

Anticipated income growth Et−1 (∆ lnYt) is not observed, but can be estimated by assuming that observed

income growth ∆ lnYt is linearly related to a set of forecasting variables Zt known to the consumer at

time t− 1, so that

$$\Delta\ln Y_t = Z_t\delta + \nu_t, \tag{14}$$

where $Z_t$ is uncorrelated with $\varepsilon_t$ in equation (12) and where $\nu_t$ is an error term that is unpredictable at time $t-1$, i.e., $E_{t-1}(\nu_t) = 0$. Taking expectations $E_{t-1}$ of equation (14) gives

$$E_{t-1}(\Delta\ln Y_t) = Z_t\delta + E_{t-1}\nu_t = Z_t\delta, \tag{15}$$

and substituting this in equation (12) yields

$$\Delta\ln C_t = \beta_{0t} + \beta_{1t}Z_t\delta + \gamma\Delta\ln C_{t-1} + \theta(L)\,\varepsilon_t. \tag{16}$$

Equation (16) is an instrumental variables (IV) type of regression model with instruments Zt where δ is

estimated using the first stage regression model (14).
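The logic of equations (14) to (16) can be illustrated with a deliberately simplified two-step least squares sketch on simulated data. It assumes constant coefficients, no lagged consumption growth and no MA terms, and it is not the MCMC procedure used in the paper; every numerical value and variable name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000                                      # large sample so the contrast is visible

delta = np.array([0.5, 0.3])                    # first-stage coefficients (illustrative)
beta0, beta1 = 0.4, 0.23                        # intercept and ES parameter (illustrative)

Z = rng.standard_normal((n, 2))                 # instruments known at t-1
nu = rng.standard_normal(n)                     # unpredictable income shock
eps = 0.5 * nu + 0.5 * rng.standard_normal(n)   # consumption shock, correlated with nu

dy = Z @ delta + nu                             # equation (14)
dc = beta0 + beta1 * (Z @ delta) + eps          # equation (16) with gamma = 0, theta(L) = 1

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones((n, 1))
first_stage_X = np.hstack([const, Z])
# First stage: project realized income growth on the instruments
dy_anticipated = first_stage_X @ ols(first_stage_X, dy)

# Second stage: regress consumption growth on anticipated income growth
beta1_iv = ols(np.hstack([const, dy_anticipated[:, None]]), dc)[1]
# Naive regression on realized income growth picks up the shock correlation
beta1_naive = ols(np.hstack([const, dy[:, None]]), dc)[1]
print(round(beta1_iv, 3), round(beta1_naive, 3))
```

In this simulated setting the second-stage estimate lies close to the value used to generate the data, whereas the naive regression on realized income growth is pushed upward by the correlation between the two shocks.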

Variance-covariance structure of the error terms

As noted in Section 2.3, shocks to aggregate income and consumption growth are correlated. This implies

that equations (14) and (16) are seemingly-unrelated regression equations with cross-equation parameter

7 We refer to the evolution of three different credit availability indicators for the US as presented by Carroll et al. (2012) (their Figure 15), all of which are smooth trending variables.


restrictions. Moreover, as there is a large literature documenting a sharp decline in the volatility of

macroeconomic variables since the mid 1980s (see e.g. Kim and Nelson, 1999; Stock and Watson, 2003,

for early contributions to this ‘Great Moderation’ literature), we allow the variances of the error terms

νt and εt to change over time. We therefore model the covariance matrix of the error terms νt and εt as

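$$\Omega_t = \begin{bmatrix} \sigma^2_{\nu t} & \rho\sigma_{\nu t}\sigma_{\varepsilon t} \\ \rho\sigma_{\nu t}\sigma_{\varepsilon t} & \sigma^2_{\varepsilon t} \end{bmatrix}, \tag{17}$$

with $\sigma^2_{\nu t}$ the variance of $\nu_t$, $\sigma^2_{\varepsilon t}$ the variance of $\varepsilon_t$ and $\rho$ the correlation between $\nu_t$ and $\varepsilon_t$. Using a Cholesky factorization of $\Omega_t$ we can write

$$\begin{bmatrix} \nu_t \\ \varepsilon_t \end{bmatrix} = \begin{bmatrix} \sigma_{\nu t} & 0 \\ \rho\sigma_{\varepsilon t} & \sigma_{\varepsilon t}\sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} \mu_{1t} \\ \mu_{2t} \end{bmatrix}, \tag{18}$$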

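where $\mu_{1t}$ and $\mu_{2t}$ are i.i.d. error terms with unit variance. Replacing $\varepsilon_t$ in equation (16) by

$$\varepsilon_t = \frac{\rho\sigma_{\varepsilon t}}{\sigma_{\nu t}}\nu_t + \sigma_{\varepsilon t}\sqrt{1-\rho^2}\,\mu_{2t}, \tag{19}$$

yields

$$\Delta\ln C_t = \beta_{0t} + \beta_{1t}Z_t\delta + \gamma\Delta\ln C_{t-1} + \rho\nu^*_t + \theta(L)\,\mu_t, \tag{20}$$

with $\nu^*_t = \sigma_{\varepsilon t}\,\theta(L)\,\nu_t/\sigma_{\nu t}$ and where $\mu_t = \sigma_{\varepsilon t}\sqrt{1-\rho^2}\,\mu_{2t}$ is an error term that is not correlated with any other error term in the model. Note that, as a result, we have $\sigma^2_{\varepsilon t} = \sigma^2_{\mu t}/(1-\rho^2)$.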
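The factorization behind the control function term can be verified numerically for a single period. The sketch below, with purely illustrative values for the two standard deviations and the correlation, checks that the closed-form lower-triangular factor in equation (18) matches a numerical Cholesky factor of the covariance matrix in equation (17), and that the split in equation (19) reproduces the variance of the consumption shock.

```python
import numpy as np

sigma_nu, sigma_eps, rho = 0.9, 0.6, 0.4   # illustrative values for one period t

# Covariance matrix of (nu_t, eps_t), equation (17)
omega = np.array([[sigma_nu**2, rho * sigma_nu * sigma_eps],
                  [rho * sigma_nu * sigma_eps, sigma_eps**2]])

# Closed-form lower-triangular factor, equation (18)
chol_closed_form = np.array([[sigma_nu, 0.0],
                             [rho * sigma_eps, sigma_eps * np.sqrt(1 - rho**2)]])
chol_numeric = np.linalg.cholesky(omega)   # NumPy returns the lower-triangular factor

# Equation (19): eps_t = (rho * sigma_eps / sigma_nu) * nu_t + mu_t,
# with mu_t uncorrelated with nu_t and Var(mu_t) = sigma_eps^2 * (1 - rho^2)
var_mu = sigma_eps**2 * (1 - rho**2)
var_eps_rebuilt = (rho * sigma_eps / sigma_nu) ** 2 * sigma_nu**2 + var_mu
print(np.allclose(chol_closed_form, chol_numeric), np.isclose(var_eps_rebuilt, sigma_eps**2))
```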

Stochastic volatility

The time variation in the variances is modeled as stochastic volatility in the error terms of equations (14) and (20), i.e., $\nu_t \sim N(0, e^{h_{\nu t}})$ and $\mu_t \sim N(0, e^{h_{\mu t}})$, with

$$h_{j,t+1} = h_{jt} + \upsilon_{jt}, \qquad \upsilon_{jt} \sim N(0, \sigma^2_{\upsilon_j}), \tag{21}$$

for j = ν, µ. The random walk assumption for the stochastic volatility component hjt is common in

macroeconometrics (see e.g. Primiceri, 2005) as it allows for highly persistent or permanent changes in

volatility, like the Great Moderation or the upsurge in volatility during the recent Great Recession.
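A random walk in the log-variance is easy to simulate, which may help to see why it accommodates permanent shifts in volatility. The sketch below generates one such path and draws an error term whose variance follows it; the sample length, the initial log-variance and the innovation standard deviation are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
n_periods = 244                 # illustrative number of quarters
sigma_upsilon = 0.1             # std. dev. of log-variance innovations (illustrative)

h = np.empty(n_periods)
h[0] = np.log(0.5)              # initial log-variance (illustrative)
for t in range(n_periods - 1):
    # Equation (21): the log-variance follows a driftless random walk
    h[t + 1] = h[t] + sigma_upsilon * rng.standard_normal()

variance_path = np.exp(h)       # positive by construction
# Error term with time-varying variance: nu_t ~ N(0, exp(h_t))
nu = np.sqrt(variance_path) * rng.standard_normal(n_periods)
```

Because shocks to the log-variance never die out, a decline such as the Great Moderation is captured as a persistent move of the path rather than as a temporary deviation.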

3.2 Bayesian estimation and stochastic model specification search

Bayesian estimation

Equation (20) is a control function type of IV regression model similar to the one outlined by Kim and Kim (2011) to deal with endogeneity in a time-varying parameter model,⁸ further extended to allow for stochastic volatility. However, instead of using their two-step or joint maximum likelihood (ML) procedure to estimate the non-linear model implied by equations (14) and (20), we use the Gibbs sampler as outlined in Section 3.3 below. The first advantage over the ML approach of Kim and Kim (2011) is that our sampling approach can much more easily take into account that Ztδ is a generated regressor in equation (20). Second, the extension to stochastic volatility further complicates the estimation problem, making the ML approach of Kim and Kim (2011) infeasible. Moreover, as our Bayesian approach relies on sampling the posterior distribution rather than using asymptotic approximations, it allows for exact inference even when the instruments Zt are weak.

⁸ An apparent difference is that our specification includes anticipated income growth, which is a predetermined regressor calculated using instrumental variables, while the specification of Kim and Kim (2011) includes an endogenous regressor. Besides the error term from the first step auxiliary regression, the control function equation in Kim and Kim (2011) therefore includes the endogenous regressor instead of the term Ztδ that we include in our equation (20).

Stochastic model specification search

The unobserved components (UC) model outlined above encompasses various potentially relevant features

of aggregate consumption growth. Next to allowing for stickiness, ES to income and MA errors, we

model the intercept, the ES parameter and the income and consumption growth innovation variances as

random walks to capture structural changes in the economy. However, instead of merely including these

components and deciding whether they are constant or vary over time, we test which features of the model

are relevant and fall back to a more parsimonious specification when appropriate. This not only avoids

over-parameterization but will also provide us with information on which of the model’s components are

relevant, with the key question being whether the ES parameter β1t varies over time or is constant. Model selection for UC models is a challenge, though, as it leads to testing problems that are non-regular from a classical point of view. Although β1t can be filtered using the Kalman filter and the variance of the innovations $\sigma^2_{\eta_1}$ can be estimated using ML, the question whether the time variation is relevant implies testing $\sigma^2_{\eta_1} = 0$ against $\sigma^2_{\eta_1} > 0$, which is a non-regular testing problem as the null hypothesis lies on the boundary of the parameter space.

Our Bayesian approach is also convenient to deal with this non-regular testing problem. In a Bayesian

setting, each of the potential models is assigned a prior probability and the goal is to derive the posterior

probability for each model conditional on the data. The modern approach to Bayesian model selection

is to apply Markov Chain Monte Carlo (MCMC) methods by jointly sampling model indicators and

parameters. The stochastic variable selection procedure of George and McCulloch (1993), for instance, is

designed to identify non-zero regression effects in models with observed variables. In a recent article,

Fruhwirth-Schnatter and Wagner (2010) extend this to model selection in UC models. Their approach

relies on a non-centered parameterization of the UC model in which (i) binary stochastic indicators for

each of the model components are sampled together with the parameters and (ii) the standard inverse

gamma (IG) prior for the variances of innovations to the time-varying components is replaced by a

Gaussian prior centered at zero for the standard deviations. As we introduce 11 binary indicators in

our general model specification, a total of $2^{11} = 2048$ different models are considered in the model selection

procedure. The exact implementation applied to our model is outlined below.


Non-centered parameterization

Fruhwirth-Schnatter and Wagner (2010) argue that a first piece of information on the hypothesis whether

the variance of innovations to a state variable is zero or not can be obtained by considering a non-centered

parameterization. This implies rearranging the data generating process for the time-varying parameters

βit in equation (13) to

$$\beta_{it} = \beta_{i0} + \sigma_{\eta_i}\beta^*_{it}, \tag{22}$$

$$\text{with}\quad \beta^*_{i,t+1} = \beta^*_{it} + \eta^*_{it}, \qquad \beta^*_{i0} = 0, \qquad \eta^*_{it} \sim N(0, 1), \tag{23}$$

for i = 0, 1 and where βi0 is the initial value of βit if this coefficient varies over time ($\sigma_{\eta_i} > 0$) while being the constant value of βit if there is no time variation ($\sigma_{\eta_i} = 0$). A crucial aspect of the non-centered parameterization is that it is not identified as the signs of $\sigma_{\eta_i}$ and $\beta^*_{it}$ can be changed without changing their product in equation (22). As a result of the non-identification, the likelihood function is symmetric around 0 along the $\sigma_{\eta_i}$ dimension. When βit is time-varying ($\sigma^2_{\eta_i} > 0$), the likelihood function is bimodal with modes $-\sigma_{\eta_i}$ and $+\sigma_{\eta_i}$. For $\sigma^2_{\eta_i} = 0$, the likelihood function is unimodal around zero. As such, allowing for non-identification of $\sigma_{\eta_i}$ provides useful information on whether $\sigma^2_{\eta_i} > 0$.

Likewise, the stochastic volatility terms in equation (21) can be reparameterized to

$$h_{jt} = h_{j0} + \sigma_{\upsilon_j}h^*_{jt}, \tag{24}$$

$$\text{with}\quad h^*_{j,t+1} = h^*_{jt} + \upsilon^*_{jt}, \qquad h^*_{j0} = 0, \qquad \upsilon^*_{jt} \sim N(0, 1), \tag{25}$$

for j = ν, µ.
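The two properties of the non-centered parameterization used here can be illustrated with a small simulation of equations (22)-(23): flipping the signs of both the standard deviation and the standardized path leaves the coefficient path unchanged, and a zero standard deviation collapses the path to the constant βi0. The function name and the numerical values are assumptions made for this sketch only.

```python
import random


def simulate_noncentered(beta_init, sigma_eta, n_periods, seed=0):
    """Return (beta_star, beta) following equations (22)-(23)."""
    rng = random.Random(seed)
    beta_star = [0.0]  # beta*_{i0} = 0
    for _ in range(n_periods - 1):
        beta_star.append(beta_star[-1] + rng.gauss(0.0, 1.0))
    beta = [beta_init + sigma_eta * b for b in beta_star]
    return beta_star, beta


# Assumed illustrative values: initial coefficient 0.23, innovation s.d. 0.05.
beta_star, beta = simulate_noncentered(0.23, 0.05, n_periods=50, seed=1)

# Sign flip: (-sigma) * (-beta_star) gives the same centered path.
beta_flipped = [0.23 + (-0.05) * (-b) for b in beta_star]
print(max(abs(u - v) for u, v in zip(beta, beta_flipped)))

# No time variation: sigma_eta = 0 leaves only the constant beta_init.
_, beta_constant = simulate_noncentered(0.23, 0.0, n_periods=50, seed=1)
print(set(beta_constant))
```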

The parsimonious specification

The non-centered parameterization is very useful for model selection as the transformed components β∗it

and h∗jt - in contrast to βit and hjt - do not degenerate to static components if their innovation variances

are zero. For instance, if the variance $\sigma^2_{\eta_1} = 0$ then $\sigma_{\eta_1} = 0$ in equation (22) and the time-varying part

β∗1t of the ES parameter β1t drops from the model. In this case the constant ES parameter is represented

by β10. Hence, in the non-centered parameterization the question whether the ES parameter varies over

time or not can be expressed as a standard variable selection problem. To this aim, Fruhwirth-Schnatter

and Wagner (2010) introduce a binary indicator that can be either 0 or 1. If the binary indicator is zero,

β∗1t is excluded from the model and ση1 is set to zero. If the binary indicator is 1, β∗1t is included in the

model and ση1 is an unconstrained unknown parameter that is estimated from the data.

As our aim is to do selection on all features of the model, we introduce binary indicators on each of the

observed variables and unobserved components. This leads to the following parsimonious specification of the model in equation (20)

$$\Delta \ln C_t = (\iota_{\beta_{00}}\beta_{00} + \iota_{\beta_{0t}}\sigma_{\eta_0}\beta^*_{0t}) + (\iota_{\beta_{10}}\beta_{10} + \iota_{\beta_{1t}}\sigma_{\eta_1}\beta^*_{1t})\,Z_t\delta + \iota_\gamma\gamma\,\Delta \ln C_{t-1} + \iota_\rho\rho\,\nu^*_t + \sum_{l=1}^{q}\iota_{\theta_l}\theta_l\,\mu_{t-l} + \mu_t, \tag{26}$$

where ιk is a binary indicator that is either 0 or 1 for k = β00, β0t, β10, β1t, γ, ρ, θ1, . . . , θq. Similarly, the parsimonious specification for the stochastic volatility components is given by

$$h_{jt} = h_{j0} + \iota_{h_j}\sigma_{\upsilon_j}h^*_{jt}, \tag{27}$$

for j = ν, µ with $\iota_{h_j}$ a binary indicator that is either 0 or 1.

Gaussian priors centered at zero for all model parameters

Our Bayesian estimation approach requires choosing prior distributions for the model parameters. When

using the standard IG prior distribution for the variance parameters, the choice of the shape and scale

hyperparameters that define this distribution has a strong influence on the posterior distribution when

the true value of the variance is close to zero. More specifically, as the IG distribution does not have

probability mass at zero, using it as a prior distribution tends to push the posterior density away from

zero. This is of particular importance when estimating the variances $\sigma^2_{\eta_i}$ of the innovations to the time-varying parameters βit as for these components we want to decide whether they are relevant or not. As $\sigma_{\eta_i}$ is a regression coefficient in equation (26), a further important advantage of the non-centered parameterization is that it allows us to replace the standard IG prior on the variance parameter $\sigma^2_{\eta_i}$ by a Gaussian prior centered at zero on $\sigma_{\eta_i}$. Centering the prior distribution at zero makes sense as, for both $\sigma^2_{\eta_i} = 0$ and $\sigma^2_{\eta_i} > 0$, the likelihood function is symmetric around zero in $\sigma_{\eta_i}$. Fruhwirth-Schnatter and Wagner (2010) show that the posterior density of $\sigma_{\eta_i}$ is much less sensitive to the hyperparameters of the Gaussian distribution and is not pushed away from zero when $\sigma^2_{\eta_i} = 0$.

As we also do model selection on the observed variables in the model, we do not want prior information

for the corresponding parameters to drive the variables that are selected. We therefore use a Gaussian

prior distribution centered at zero, i.e., N (0, V0), for all unknown coefficients and standard deviations.

The variance V0 is chosen such that the prior distribution has support over the range of relevant parameter

values. Hence, we set $V_0 = 0.5^2$ for the coefficients (β00, β10, hν0, hµ0, γ, ρ, θ, δ) and $V_0 = 0.2^2$
for the standard deviations (ση0, ση1, συν, συµ). For the binary indicators ιk, we choose a Bernoulli

prior distribution where each indicator has a prior probability p0 of being included in the model, i.e.,

p(ιk = 1) = p0. In our baseline scenario we set p0 = 0.50 but we will also report results for p0 = 0.25 and

p0 = 0.75.
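The implication of the Bernoulli prior for whole models can be worked out directly. The sketch below assumes, as a labeled assumption, that the 11 indicators are a priori independent, so that a model with k included components has prior probability p0^k (1 − p0)^(11−k). The function name is hypothetical.

```python
from itertools import product


def prior_model_probability(indicators, p0):
    """Prior probability of one model under independent Bernoulli(p0) indicators."""
    k = sum(indicators)
    return p0 ** k * (1.0 - p0) ** (len(indicators) - k)


n_indicators = 11
all_models = list(product((0, 1), repeat=n_indicators))
print(len(all_models))  # number of candidate models, 2**11

# With p0 = 0.5 every model receives the same prior weight.
print(prior_model_probability(all_models[0], 0.5))

# With p0 = 0.25 sparse models are favored a priori; the weights still sum to one.
print(sum(prior_model_probability(m, 0.25) for m in all_models))
```

Under p0 = 0.5 the prior is flat over models, whereas p0 = 0.25 and p0 = 0.75 tilt it toward smaller and larger models respectively, which is why results are reported for all three values.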


3.3 Gibbs sampler

Taken together, equations (14) and (26) constitute the observation equations of the UC model, with

the unobserved states β∗it and hjt evolving according to the state equations (23), (25) and (27). In a

standard linear Gaussian SS model, the Kalman filter can be used to filter the unobserved states from

the data and to construct the likelihood function such that the unknown parameters can be estimated

using ML. However, the stochastic volatility components introduced in Section 3.1 and the stochastic

model specification search outlined in Section 3.2 imply a non-regular estimation problem for which the

standard approach via the Kalman filter and ML is not feasible. Instead, we use the Gibbs sampler which

is an MCMC method to simulate draws from the intractable joint and marginal posterior distributions

of the unknown parameters and the unobserved states using only tractable conditional distributions.

Intuitively, this amounts to reducing the complex non-linear model into a sequence of blocks for subsets

of parameters/states that are tractable conditional on the other blocks in the sequence.

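For notational convenience, define the time-varying parameter vector $\beta^*_t = (\beta^*_{0t}, \beta^*_{1t})$, the stochastic volatilities $h^*_t = (h^*_{\nu t}, h^*_{\mu t})$, the unknown parameter vectors $\phi_1 = (h_{\nu 0}, h_{\mu 0}, \sigma_{\upsilon_\nu}, \sigma_{\upsilon_\mu})$, $\phi_2 = (\beta_{00}, \beta_{10}, \gamma, \rho, \sigma_{\eta_0}, \sigma_{\eta_1})$, $\theta = (\theta_1, \ldots, \theta_q)$ and $\phi = (\phi_1, \phi_2, \theta, \delta)$ and the model indicators $M_{\phi_1} = (\iota_{h_\nu}, \iota_{h_\mu})$, $M_{\phi_2} = (\iota_{\beta_{00}}, \iota_{\beta_{10}}, \iota_\gamma, \iota_\rho, \iota_{\beta_{0t}}, \iota_{\beta_{1t}})$, $M_\theta = (\iota_{\theta_1}, \ldots, \iota_{\theta_q})$ and $M = (M_{\phi_1}, M_{\phi_2}, M_\theta)$. Further let $D_t = (\Delta \ln C_t, \Delta \ln Y_t, Z_t)$ be the data vector. Stacking observations over time, we denote $D = \{D_t\}_{t=1}^{T}$ and similarly for β∗ and h∗. The posterior density of interest is then given by $f(\phi, \beta^*, h^*, M \mid D)$. Building on Fruhwirth-Schnatter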

and Wagner (2010) for the stochastic model specification part, on Chib and Greenberg (1994) for the

moving average (MA) part and on Kim et al. (1998) for the stochastic volatility part, our MCMC scheme

is as follows:

1. Sample the binary indicators M and the parameters φ

(a) Sample the first step parameters δ from f (δ|h∗, D) using the regression model in equation (14) and calculate Ztδ and νt. Then ν∗t can be calculated using $\nu^*_t = \theta(L)\,\nu_t\, e^{h_{\mu t}/2}\left(\sqrt{(1-\rho^2)\,e^{h_{\nu t}}}\right)^{-1}$.

(b) Sample the binary indicators Mθ and the MA coefficients θ using equation (16) with the

centered components β0t and β1t calculated using equation (22):

i. Sample the binary indicators Mθ from f (Mθ|φ1, φ2, δ, β∗, h∗, D) marginalizing over the

parameters θ for which variable selection is carried out.

ii. Sample the unrestricted parameters in θ from f (θ|φ1, φ2, δ, β∗, h∗,Mθ, D) while setting

the restricted parameters θl (for which ιθl = 0) equal to 0.

(c) Sample the binary indicators Mφ2 and the second step parameters φ2 using the parsimonious

non-centered parameterization in equation (26):

i. Sample the binary indicators Mφ2 from f (Mφ2 |φ1, θ, δ, β∗, h∗, D) marginalizing over the

parameters φ2 for which variable selection is carried out.

ii. Sample the unrestricted parameters in φ2 from f (φ2|φ1, θ, β∗, h∗,Mφ2 , D) while setting

the restricted parameters (for which the corresponding binary indicator is 0) equal to 0.


(d) Sample the binary indicators Mφ1 and the stochastic volatility parameters φ1 using the parsimonious non-centered parameterization in equation (27):

i. Sample the binary indicators Mφ1 from f (Mφ1 |φ2, θ, δ, β∗, h∗, D) marginalizing over the

parameters φ1 for which variable selection is carried out.

ii. Sample the unrestricted parameters in φ1 from f (φ1|φ2, θ, β∗, h∗,Mφ1 , D) while setting

the restricted parameters συj (for which ιhj = 0) equal to 0.

2. Sample the time-varying parameters β∗ and the stochastic volatilities h∗:

(a) Sample the unrestricted (i.e., for which ιβit = 1) time-varying parameters in β∗ from f (β∗|φ, h∗, Mφ2 , D) using the non-centered parameterization in equation (26). The restricted time-varying parameters (for which ιβit = 0) in β∗ are sampled directly from their prior distribution using equation (23).

(b) Sample the unrestricted (i.e., for which ιhj = 1) stochastic volatilities h∗ from f (h∗|φ, β∗, Mφ1 , D) using a logarithmic transformation of the squared error terms νt and µt in equations (14) and (26) respectively. The restricted stochastic volatilities (for which ιhj = 0) in h∗ are sampled directly from their prior distribution using equation (25).

3. Perform a random sign switch for $\sigma_{\eta_i}$ and $\{\beta^*_{it}\}_{t=1}^{T}$ and for $\sigma_{\upsilon_j}$ and $\{h^*_{jt}\}_{t=1}^{T}$, e.g., $\sigma_{\eta_1}$ and $\{\beta^*_{1t}\}_{t=1}^{T}$ are left unchanged with probability 0.5 while with the same probability they are replaced by $-\sigma_{\eta_1}$ and $\{-\beta^*_{1t}\}_{t=1}^{T}$.
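The random sign switch in step 3 can be sketched in a few lines. This is only an illustration of the idea under assumed names and values, not the implementation described in Appendix A: with probability 0.5 the standard deviation and the whole standardized path change sign together, so their product, and hence the centered component, is unaffected.

```python
import random


def random_sign_switch(sigma, star_path, rng):
    """Flip the signs of sigma and of the standardized path jointly with probability 0.5."""
    if rng.random() < 0.5:
        return sigma, list(star_path)
    return -sigma, [-value for value in star_path]


rng = random.Random(3)
sigma_eta = 0.05                    # assumed draw of the standard deviation
beta_star = [0.0, 0.4, -1.2, 0.7]   # assumed draw of the standardized path

new_sigma, new_star = random_sign_switch(sigma_eta, beta_star, rng)

# The products sigma * beta*_t, which enter equation (22), are unchanged.
print([new_sigma * b for b in new_star])
print([sigma_eta * b for b in beta_star])
```

Applying the switch at every iteration lets the chain visit both modes of a bimodal posterior for the standard deviation, which is what makes the shape of that posterior informative about time variation.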

Given an arbitrary set of starting values, sampling from these blocks is iterated J times and, after a

sufficiently large number of burn-in draws B, the sequence of draws (B + 1, ..., J) approximates a sample

from the posterior distribution f (φ, β∗, h∗, M |D). Details on the exact implementation of each of

the blocks can be found in Appendix A. The results reported below are based on 15000 iterations, with

the first 5000 draws discarded as a burn-in sequence.
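The mechanics of iterating over tractable conditional distributions and discarding a burn-in can be illustrated on a deliberately simple toy problem, far removed from the model of this paper: sampling a standard bivariate normal with correlation r by alternating draws from its two univariate conditionals. The target distribution, the value of r and the function names are assumptions chosen for illustration; only the number of iterations (15000) and burn-in draws (5000) mirror the text.

```python
import math
import random


def gibbs_bivariate_normal(r, n_iter=15000, burn_in=5000, seed=0):
    """Toy Gibbs sampler: x | y ~ N(r*y, 1 - r^2) and y | x ~ N(r*x, 1 - r^2)."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - r ** 2)
    x, y = 0.0, 0.0  # arbitrary starting values
    kept = []
    for iteration in range(n_iter):
        x = rng.gauss(r * y, cond_sd)
        y = rng.gauss(r * x, cond_sd)
        if iteration >= burn_in:  # keep draws B + 1, ..., J only
            kept.append((x, y))
    return kept


def sample_correlation(draws):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in draws) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in draws) / n
    var_y = sum((y - mean_y) ** 2 for _, y in draws) / n
    return cov / math.sqrt(var_x * var_y)


draws = gibbs_bivariate_normal(r=0.35)
print(len(draws), sample_correlation(draws))
```

The retained draws should have a sample correlation close to the value of r used in the conditionals, even though the joint distribution was never sampled directly.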

4 Results

4.1 Data

To estimate the empirical model, quarterly US data are available for all variables used over the period 1953Q1 − 2014Q4. The use of lagged instruments reduces the sample size by 2 observations so that the effective sample period is 1953Q3 − 2014Q4. This implies an effective sample size equal to 246 observations.⁹ Where necessary, data are seasonally adjusted. For Ct, we use real per capita expenditures on nondurables and services (excluding clothing and footwear). For Yt, real per capita personal disposable income is used. Both variables are put in real terms using the deflator of nondurables and services (excluding clothing and footwear) with base year 2009 = 100. With respect to the estimation of anticipated income growth, external instruments are also used. In particular, we include as instruments lagged changes in the short run interest rate for which we take the three-month nominal T-bill rate, lagged changes in stock prices proxied using the S&P 500 index, lags of the inflation rate where the inflation rate is calculated as the log change in the CPI index, lags of the level of consumer confidence for which we take the index of consumer sentiment, and lags of the change in the unemployment rate.

⁹ Note that for some variables instruments are formed using further lags (i.e., a third and fourth lag). As data for these variables are typically available well before 1953Q1, these deeper lags do not reduce the effective sample size.

Data for nominal expenditures on nondurables and services (excluding clothing and footwear), for

nominal personal disposable income and for the corresponding deflator are taken from the National Income and Product Accounts (NIPA). Population data are taken from the OECD Quarterly National Accounts. For the three-month T-bill rate, data are taken from the Board of Governors. Data for the S&P 500 index come from Sommer (2007) and are updated with data from Thomson Reuters Datastream.

Finally, for the CPI index and the unemployment rate, data are taken from the Bureau of Labor Statistics

while for the consumer sentiment index, the University of Michigan index is used.

4.2 Instrument sets

Since anticipated income growth Et−1(∆ lnYt) is not observed, a set of instrumental variables Zt is used

to estimate it. With expectations being formed at t− 1, any variable lagged one or more periods serves

as a valid instrument. Note that in the face of unmodeled MA(q) errors, only variables dated t − q − 1 or earlier are in principle valid, as more recent lags would be correlated with the MA error term.

Our empirical approach, however, explicitly models and estimates the MA component in the error terms,

so that we do not face this issue. As Campbell and Mankiw (1990) and Kiley (2010), among others, show

that the choice of instruments can be critically important, the evaluation of the time variation in the ES

parameter is reported using three different instrument sets.

A first instrument set, Z1, is based upon Campbell and Mankiw (1990) and includes a constant,

lags 1-4 of disposable income growth and consumption growth, a lagged error correction term, i.e., log

consumption minus log disposable income (see also Campbell, 1987), lags 1-2 of changes in the short-term

interest rate and lags 1-2 of changes in stock prices. To construct the second instrument set, Z2, we add

the first and second lag of the inflation rate to the instruments contained in Z1 (see Fuhrer, 2000; Kiley,

2010). Following Sommer (2007), the third instrument set Z3 is constructed by adding lags 1-2 of the

consumer sentiment index and of the change in the unemployment rate to Z2.¹⁰ In column 3 of Table 1 we report the average explanatory power of the different instrument sets in explaining ∆ lnYt. For all sets we find an average adjusted R² over all iterations of about 30%, which is very reasonable.
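Constructing lagged instruments from a single series can be sketched as follows. The helper below is a hypothetical illustration rather than the authors' code: each row collects the requested lags of the series for one period, and the first periods are lost whenever no earlier data are available.

```python
def lag_matrix(series, lags):
    """Rows correspond to periods t = max(lags), ..., T-1; columns hold series[t - lag]."""
    start = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(start, len(series))]


# Toy series standing in for, e.g., changes in the short-term interest rate.
toy_series = [1, 2, 3, 4, 5]

# Lags 1-2: the first two periods drop out, leaving three usable rows.
print(lag_matrix(toy_series, lags=[1, 2]))
```

When earlier observations of a variable exist before the start of the estimation sample, deeper lags can be filled from those observations and need not shorten the sample further.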

4.3 Stochastic model specification search

We estimate the full model and investigate which components of the model are relevant. The number of

binary indicators in the full model equals 11 if - following the framework presented in Section 2 - we set

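the order of the MA process in the error term of the consumption equation equal to q = 3. We then have the set of binary indicators $(\iota_{\beta_{00}}, \iota_{\beta_{0t}}, \iota_{\beta_{10}}, \iota_{\beta_{1t}}, \iota_\gamma, \iota_\rho, \iota_{\theta_1}, \iota_{\theta_2}, \iota_{\theta_3}, \iota_{h_\nu}, \iota_{h_\mu})$.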

¹⁰ Our results are very similar when we use lags 1-4 for the external instruments (i.e., for the instruments not obtained from disposable income and consumption) instead of lags 1-2.


We first set all binary indicators equal to 1 to generate posterior distributions for all model parameters.

In Figure 1 we present the means of the posterior distributions of the time-varying parameters, i.e., the

intercept β0t and the ES parameter β1t. We also present the means of the stochastic volatilities, i.e.,

the time-varying variance of innovations to income, exp(hνt), and to consumption, exp(hµt). The 90%

highest posterior density (HPD) intervals are also presented in the figure. The results show some variation

in β0t and β1t, but especially for the ES parameter there is no clear (downward) trend. With respect

to volatility, there is a reduction in the variance of shocks to income and consumption growth around

the 1980s, while volatility in income growth goes up again at the end of the sample. Figure 1 further

plots the posterior distributions of the standard deviations σηi (for i = 0, 1) and συj (for j = ν, µ) of the

innovations to the time-varying components. When the posterior distribution of the standard deviation is

bimodal, with low or no probability mass at zero, this can be taken as a first indication of time variation

in the considered component. When looking at the results, we only observe clear bimodality in the

posterior distribution of συµ , i.e., for the stochastic volatility component in consumption growth. For the

time-varying intercept, the evidence is less clear. While the posterior distribution of ση0 appears to have

two modes, it also has a considerable probability mass at zero. For the innovations to the ES parameter

and to the stochastic volatility component in income growth, the posterior distributions of ση1 and συν

are clearly unimodal at zero. This suggests that these components are stable over time.

Figure 1: Posterior distributions of the time-varying parameters and the standard deviations of their innovations in the unrestricted model (all binary indicators set to 1)

[Figure not reproduced. Top row, time axis 1960-2010: (a) TV intercept: β0t; (b) TV ES: β1t; (c) SV in ∆ lnYt: exp(hνt); (d) SV in ∆ lnCt: exp(hµt). Bottom row, posterior densities: (e) ση0; (f) ση1; (g) συν; (h) συµ.]

Notes: (i) The top row reports the mean of the posterior distribution of the time-varying components together with the 90% highest posterior density (HPD) interval. The bottom row plots the posterior distributions of the standard deviations of the innovations to the time-varying components. (ii) Figures are presented for the results using instrument set Z3 but are similar when using instrument sets Z1 and Z2.

As a more formal model selection, we next sample the stochastic binary indicators together with the

other parameters in the model. Table 1 displays the individual posterior probabilities for the binary

indicators being one. These probabilities are calculated as the average selection frequencies over all


iterations of the Gibbs sampler. We report results for the three instrument sets discussed in Section

4.2. In the baseline specification, we assign a prior probability p0 = 0.5 to each of the binary indicators

being one. As noted by Scott and Berger (2010), however, this prior choice does not provide multiplicity

control for the Bayesian variable selection. When the number of possible variables is very large and each

of the binary indicators has a prior probability of 0.5, the fraction of selected variables will very likely be

around 0.5. To check the robustness of our results to the prior p0, the upper and lower parts of Table 1 report posterior inclusion probabilities when p0 is set to 0.25 and 0.75, respectively.
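Computing posterior inclusion probabilities as average selection frequencies amounts to taking column means of the sampled indicators. The sketch below uses a handful of made-up indicator draws for two components; the function name and the draws are hypothetical and serve only to show the calculation.

```python
def inclusion_probabilities(indicator_draws):
    """Average each binary indicator over the retained Gibbs draws."""
    n_draws = len(indicator_draws)
    n_components = len(indicator_draws[0])
    return [
        sum(draw[k] for draw in indicator_draws) / n_draws
        for k in range(n_components)
    ]


# Made-up draws of (indicator 1, indicator 2) from four iterations.
toy_draws = [(1, 0), (1, 1), (1, 0), (1, 0)]
print(inclusion_probabilities(toy_draws))
```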

The results show that the variables with constant parameters (intercept, anticipated income growth,

lagged consumption growth and the transformed error term from the first stage regression) have a very

high probability of being included. Hence, these are relevant components of the model. Apart from that,

the MA(1) term shows up with the highest posterior inclusion probability across the different settings,

with a probability around 67% in the benchmark case p0 = 0.50. The other MA components have lower

inclusion probabilities, only going up to 60% when setting p0 = 0.75. Interestingly, the time-varying

parameters β∗0t and β∗1t always have low posterior inclusion probabilities. In the benchmark case where

p0 = 0.5, they are below 20%, while for p0 = 0.75 they increase a bit but are still not above 30%. This

shows that time-varying ES is not a very relevant feature of the model. Finally, stochastic volatility only seems to have some relevance for consumption growth. In line with the bimodality in the posterior distribution of συµ, h∗µt has a posterior inclusion probability going up to over 70% when setting p0 = 0.75. The inclusion probability for stochastic volatility in income growth is never larger than 53%.

Table 1: Posterior inclusion probabilities for the binary indicators on the various model components over different priors and instrument sets

                               Posterior inclusion probabilities
            Instrument set     CP                        TVP           SV            MA
Prior       Z     R²adj        β00   β10   γ     ρ       β∗0t  β∗1t    h∗νt  h∗µt    θ1    θ2    θ3
p0 = 0.25   Z1    0.29         0.94  0.68  0.96  1.00    0.13  0.12    0.07  0.07    0.44  0.33  0.33
            Z2    0.31         0.95  0.81  0.99  1.00    0.09  0.11    0.07  0.08    0.47  0.27  0.27
            Z3    0.32         0.95  0.87  0.98  1.00    0.09  0.08    0.05  0.10    0.46  0.26  0.26
p0 = 0.50   Z1    0.29         0.95  0.77  0.97  1.00    0.16  0.15    0.21  0.31    0.67  0.41  0.44
            Z2    0.31         0.97  0.86  0.98  1.00    0.15  0.14    0.20  0.26    0.66  0.37  0.39
            Z3    0.32         0.96  0.91  0.98  1.00    0.16  0.11    0.21  0.33    0.67  0.34  0.35
p0 = 0.75   Z1    0.29         0.97  0.84  0.98  1.00    0.29  0.24    0.52  0.71    0.83  0.53  0.59
            Z2    0.31         0.96  0.90  0.98  1.00    0.30  0.23    0.53  0.70    0.85  0.51  0.55
            Z3    0.32         0.96  0.93  0.99  1.00    0.27  0.20    0.47  0.74    0.87  0.49  0.52

Note: The instrument set Z1 includes lags 1-4 of disposable income growth and consumption growth, a lagged error correction

term and lags 1-2 of the change in stock prices and the change in the short term interest rate. Instrument set Z2 adds the first and

second lag of the inflation rate. Instrument set Z3 further includes lags 1-2 of the consumer sentiment index and of the change in

the unemployment rate.

Besides inclusion probabilities for the individual components, the model selection search also allows us to compute overall model probabilities. The introduction of 11 binary indicators leads to $2^{11} = 2048$ possible models. As 6 out of the 11 binary indicators have relatively low individual probabilities, most models


have a probability of zero. More specifically, across the three considered values for the prior inclusion

probability p0, only 8 models were selected in at least 5% of the draws. The posterior probabilities for

these models are reported in Table 2, ordered according to the probabilities in the benchmark case where

p0 = 0.5. In fact, in this case only 3 models are selected in more than 5% of the Gibbs iterations. The

favored model includes a constant intercept term (β00), the constant ES parameter (β10), the constant

stickiness parameter (γ), correlation between the errors of the income and consumption equation (ρ), and

an MA(1) component in the error terms. The second best model adds stochastic volatility (h∗µt) to the

consumption equation, while the third best model adds an MA(3) term. In line with their low individual

inclusion probabilities, the favored models never include the time-varying parameters β∗0t and β∗1t. This

finding is robust over different values for p0. When setting p0 = 0.25, the same model is favored as for

p0 = 0.50, but the second best model now excludes the MA(1) component. When setting p0 = 0.75,

more components tend to be selected, with the best model now also including stochastic volatility in

consumption.
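Posterior model probabilities follow from the same indicator draws by counting how often each complete combination of indicators occurs. The sketch below is a hypothetical illustration with made-up draws for three components; it is not the authors' code.

```python
from collections import Counter


def model_probabilities(indicator_draws):
    """Relative frequency of each full indicator vector, most frequent first."""
    counts = Counter(tuple(draw) for draw in indicator_draws)
    n_draws = len(indicator_draws)
    return {model: count / n_draws for model, count in counts.most_common()}


# Made-up draws of three indicators from five iterations.
toy_model_draws = [(1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0)]
for model, probability in model_probabilities(toy_model_draws).items():
    print(model, probability)
```

Unlike the marginal inclusion probabilities, these frequencies describe the joint behavior of the indicators, so a component with a moderate marginal probability can still be absent from the most frequently visited model.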

Table 2: Posterior model probabilities over different priors p0

Model                                                                  Posterior probabilities
CP                      TVP           SV            MA
β00   β10   γ     ρ     β∗0t  β∗1t    h∗νt  h∗µt    θ1    θ2    θ3     p0 = 0.25   p0 = 0.50   p0 = 0.75
1     1     1     1     0     0       0     0       1     0     0      0.2042      0.1367      0.0188
1     1     1     1     0     0       0     1       1     0     0      0.0380      0.0769      0.0633
1     1     1     1     0     0       0     0       1     0     1      0.0498      0.0564      0.0190
1     1     1     1     0     0       0     0       0     0     0      0.1733      0.0372      0.0009
1     1     1     1     0     0       0     0       0     1     0      0.0833      0.0350      0.0026
1     1     1     1     0     0       0     1       1     0     1      0.0046      0.0284      0.0504
1     1     1     1     0     0       1     1       1     0     0      0.0012      0.0233      0.0558
1     1     1     1     0     0       0     0       0     0     1      0.0553      0.0174      0.0011

Notes: (i) Under the heading ‘Model’, a ‘1’ indicates that the component is included while a ‘0’ indicates that it is excluded from the model. (ii) The presented results are obtained using instrument set Z3 but are similar when using instrument sets Z1 and Z2.

4.4 Parsimonious model

Based on the model selection results reported above, the model that fits the data best is given by

$$\Delta \ln C_t = \beta_{00} + \beta_{10} Z_t\delta + \gamma\,\Delta \ln C_{t-1} + \rho\,\nu^*_t + \theta(L)\mu_t,$$

with θ(L) = 1 + θ1L an MA(1) lag polynomial. The variances of the error terms of the income growth and consumption growth equations are exp(hν0) and exp(hµ0) respectively.
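The dynamics implied by this equation can be traced with a short recursion. The sketch below is an illustration under stated assumptions: the coefficient values are the posterior means of the parsimonious model as reported in Table 3, while the constant anticipated income growth of 1.0 and the zero shocks are hypothetical inputs chosen only to show how stickiness propagates the effect of anticipated income growth.

```python
def consumption_growth_path(beta00, beta10, gamma, rho, theta1,
                            anticipated_income, nu_star, mu, initial_growth=0.0):
    """Iterate the parsimonious equation with an MA(1) error, theta(L) = 1 + theta1 * L."""
    growth = []
    prev_growth, prev_mu = initial_growth, 0.0
    for x, v, m in zip(anticipated_income, nu_star, mu):
        g = beta00 + beta10 * x + gamma * prev_growth + rho * v + m + theta1 * prev_mu
        growth.append(g)
        prev_growth, prev_mu = g, m
    return growth


n_periods = 300
path = consumption_growth_path(
    beta00=0.1352, beta10=0.2267, gamma=0.5280, rho=0.3470, theta1=-0.2791,
    anticipated_income=[1.0] * n_periods,  # hypothetical constant input
    nu_star=[0.0] * n_periods,             # shocks switched off for illustration
    mu=[0.0] * n_periods,
)

# Without shocks the path settles at (beta00 + beta10 * x) / (1 - gamma).
print(path[-1], (0.1352 + 0.2267 * 1.0) / (1.0 - 0.5280))
```

With a stickiness coefficient of about 0.53, the long-run response of consumption growth to a permanent change in anticipated income growth is β10/(1 − γ), roughly twice the impact coefficient β10.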

Descriptive statistics on the posterior distributions of the model parameters in this parsimonious


model are given in the first column of Table 3. The results show that the 90% HPD interval for the time-invariant intercept in the equation for aggregate consumption growth ranges from 0.07 to 0.22.¹¹ For the ES parameter the 90% HPD interval ranges from 0.10 to 0.35 with a mean value of 0.23. The 90% HPD interval for the stickiness parameter ranges from 0.35 to 0.68 with a mean of 0.53. Both these results are similar to findings reported by Carroll et al. (2011) and Kiley (2010)¹² and show that even though there is no time variation in the ES of private

consumption to disposable income, it remains an important factor contributing to the predictability of

aggregate consumption growth. With respect to the presence of autocorrelation of the MA form in the

error term, the posterior distribution points to a negative MA coefficient. The results also indicate that

it is important to control for correlation between shocks to income growth and shocks to consumption

growth as the 90% HPD interval of ρ ranges between 0.23 and 0.47.

Table 3: Posterior distributions of model parameters over alternative models

                                  Parsimonious    Restricted models               Robustness checks
                                  model           RM1             RM2             RoC1            RoC2            RoC4(b)

Linear regression coefficients
β00 (intercept)                   0.1352          0.3614          0.3094          0.1207          0.0983          0.1428
                                  [0.07;0.22]     [0.28;0.44]     [0.06;0.53]     [0.05;0.20]     [-0.01;0.23]    [0.07;0.22]
β10 (ES)                          0.2267          0.2858          0.2190          0.2161          0.3414          0.2066
                                  [0.10;0.35]     [0.16;0.42]     [0.09;0.35]     [0.10;0.34]     [0.09;0.58]     [0.10;0.29]
γ (stickiness)                    0.5280          -               -               0.5556          0.4870          0.5292
                                  [0.35;0.68]                                     [0.38;0.72]     [0.30;0.65]     [0.33;0.67]
ρ (endogeneity)                   0.3470          0.3464          0.3375          0.3327          0.3524          0.3148
                                  [0.23;0.47]     [0.23;0.48]     [0.22;0.46]     [0.22;0.46]     [0.21;0.48]     [0.19;0.44]

Variance error term
exp(hµ0)                          0.1355          0.1435          0.1264          0.1798          0.3486          0.1330
                                  [0.11;0.17]     [0.11;0.19]     [0.10;0.17]     [0.11;0.31]     [0.27;0.45]     [0.11;0.17]

Innovations to non-centered components
|ση0| (TV intercept β∗0t)         -               -               0.0535          -               -               -
                                                                  [0.01;0.11]
|συµ| (SV h∗µt)                   -               -               -               0.0828          -               -
                                                                                  [0.02;0.18]

MA coefficient
θ1                                -0.2791         0.2134          0.1596          -0.2762         -0.3084         -0.2699
                                  [-0.46;-0.08]   [0.09;0.33]     [0.02;0.28]     [-0.46;-0.08]   [-0.48;-0.11]   [-0.44;-0.08]

Notes: (i) The parsimonious model refers to the model selected by the stochastic specification search (see model in the first row of Table 2). It restricts the intercept and the ES parameter to be stable over time and includes stickiness in aggregate consumption growth and an MA(1) error term with stable volatility. RM1 refers to our Restricted Model 1, which removes stickiness in aggregate consumption growth from the parsimonious model. RM2 adds a time-varying intercept to RM1. RoC1 is the second best model according to the stochastic specification search in the benchmark case where p0 = 0.5 (see model in the second row of Table 2). It adds stochastic volatility in the consumption equation to the parsimonious model. RoC2 refers to a robustness check using the broader total personal consumer expenditures measure. RoC4(b) refers to a model with time-varying coefficients δt in the first stage income growth regression (14). (ii) Reported are the posterior mean together with the 90% HPD interval (in square brackets). (iii) Results are presented using instrument set Z3 but are similar when using instrument sets Z1 and Z2.

[11] To interpret the magnitude of this estimate, note that data for aggregate consumption growth are expressed in percentage terms (as numbers between 0 and 100). The mean of aggregate consumption growth (on a quarterly basis) over the full sample period equals 0.48%.

[12] More specifically, for the US Carroll et al. (2011) find an ES parameter of 0.27 and a sticky consumption growth coefficient of 0.55 when using an instrumental variable approach. Kiley (2010) reports an ES parameter of 0.3 and a coefficient on lagged consumption growth of 0.65 when using their preferred instrument set (which includes lagged levels of inflation).


4.5 Comparison to the literature

In this section we use our methodology to test for time variation in the ES parameter in two more restrictive

frameworks employed previously in the literature. First, McKiernan (1996) uses a framework without

stochastic volatility, without stickiness in aggregate consumption growth, with a constant intercept and

with an MA(1) error term in the consumption growth equation. This is our Restricted Model 1 (RM1).

Second, Bacchetta and Gerlach (1997) use the same framework but add a time-varying intercept in the

consumption growth equation. This is our Restricted Model 2 (RM2).

Figure 2: Time-varying ES parameter and corresponding posterior distribution of the standard error to its innovation in RM1 (binary indicators set to 1)

[Figure not reproduced. Panel (a): time-varying ES parameter β1t, plotted over 1960-2010. Panel (b): posterior distribution of ση1, plotted over the range -0.10 to 0.10.]

Notes: (i) The presented results are for instrument set Z3 but are similar when using instrument sets Z1 and Z2. (ii) See notes to Table 3 for the specification of RM1.

Figure 3: Time-varying ES parameter and corresponding posterior distribution of the standard error to its innovation in RM2 (binary indicators set to 1)

[Figure not reproduced. Panel (a): time-varying ES parameter β1t, plotted over 1960-2010. Panel (b): posterior distribution of ση1, plotted over the range -0.10 to 0.10.]

Notes: (i) The presented results are for instrument set Z3 but are similar when using instrument sets Z1 and Z2. (ii) See notes to Table 3 for the specification of RM2.

In panel (a) of Figures 2 and 3 we present the posterior mean of the time-varying ES parameter β1t

and its 90% HPD interval. In line with the results reported by Bacchetta and Gerlach (1997) and others,

we observe a decrease from the 1970s onward in the ES of aggregate consumption growth to anticipated

disposable income growth in both restricted models. However, even in these restricted settings, drawing

the conclusion that the ES parameter varies over time may be inappropriate. A drawback of previous


research is that time variation is imposed without testing whether it is actually relevant. Our methodology

allows us to test explicitly whether there is time variation in the ES parameter. As can be seen from panel

(b) in Figures 2 and 3, very little or no bimodality can be observed in the posterior distributions of the

innovations to the ES parameter, suggesting that time variation in the ES parameter is not very relevant.

This is confirmed by the posterior inclusion probabilities reported in Table 4. For both restricted models

RM1 and RM2, the inclusion probability of the time-varying component β∗1t of the ES parameter is very

low (at most 21%). Hence, our results suggest that the finding of time variation in the ES parameter

in the literature may be due to specification errors and imposing time variation rather than being the

outcome of genuine structural economic developments such as financial liberalization.
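Why bimodality is informative can be illustrated with a small sketch. It assumes the non-centered form β1t = β10 + ση1 β∗1t, with β∗1t a standard-normal random walk started at zero, which is our reading of the parameterization used in the paper; the function name and all numerical values are hypothetical. Flipping the sign of ση1 together with the sign of every innovation leaves β1t unchanged, so the posterior of ση1 is symmetric around zero: two modes away from zero point to time variation, whereas a single mode at zero points to a constant parameter.

```python
import random

def noncentered_path(beta0, sigma, shocks):
    """beta_t = beta0 + sigma * beta*_t, with beta*_t a driftless random walk and beta*_0 = 0."""
    state, path = 0.0, []
    for eta in shocks:
        path.append(beta0 + sigma * state)
        state += eta                       # beta*_{t+1} = beta*_t + eta_t
    return path

rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(50)]

path_pos = noncentered_path(0.23, 0.05, shocks)
# Flip the sign of sigma and of every innovation: the implied beta_t is unchanged.
path_neg = noncentered_path(0.23, -0.05, [-eta for eta in shocks])
# With sigma = 0 the time-varying part drops out and beta_t is constant.
path_zero = noncentered_path(0.23, 0.0, shocks)

print(max(abs(a - b) for a, b in zip(path_pos, path_neg)))
print(set(path_zero))
```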

Table 4: Posterior inclusion probabilities: restricted models and robustness checks

| Component | RM1 (restricted) | RM2 (restricted) | RoC2 (robustness) | RoC3 (robustness) | RoC4(a) (robustness) | RoC4(b) (robustness) |
|---|---|---|---|---|---|---|
| Variables with constant parameters | | | | | | |
| β00 (intercept) | incl | incl | 0.5568 | 0.9856 | 0.9742 | 0.9784 |
| β10 (ES) | incl | incl | 0.9060 | 0.9160 | 0.9442 | 0.9652 |
| γ (stickiness) | excl | excl | 0.9716 | 0.9516 | 0.9928 | 0.9962 |
| ρ (endogeneity) | incl | incl | 0.9998 | 0.9996 | 0.9996 | 0.9996 |
| Time-varying parameters | | | | | | |
| β∗0t (intercept) | excl | 0.8387 | 0.0660 | 0.2152 | 0.1410 | 0.1248 |
| β∗1t (ES RW) | 0.2092 | 0.1471 | 0.0966 | 0.0998 | 0.1194 | 0.1334 |
| β∗2t (ES AR1) | excl | excl | excl | 0.2098 | excl | excl |
| δ∗t (first step) | excl | excl | excl | excl | 0.3476 | incl |
| Stochastic volatilities | | | | | | |
| h∗νt (income eq.) | excl | excl | 0.1990 | 0.2154 | 0.2094 | 0.2244 |
| h∗µt (consumption eq.) | excl | excl | 0.4410 | 0.3630 | 0.2678 | 0.3086 |
| MA components | | | | | | |
| θ1 | incl | incl | 0.8206 | 0.6172 | 0.6936 | 0.6772 |
| θ2 | excl | excl | 0.4672 | 0.4010 | 0.3548 | 0.3700 |
| θ3 | excl | excl | 0.2864 | 0.4040 | 0.3582 | 0.3662 |

Notes: (i) The reported results are for instrument set Z3 and prior inclusion probability p0 = 0.50. (ii) “incl” means the component is always included in the model, while “excl” means that the component is excluded from the model. (iii) The posterior inclusion probability for δ∗t is the mean of the inclusion probabilities over the different first step coefficients. (iv) See notes to Table 3 for the specification of RM1, RM2 and RoC2. RoC3 refers to a model allowing the time-varying part of the ES parameter to be the sum of a random walk and a stationary AR(1) component. RoC4(a) refers to a model allowing (i.e. sampling the corresponding binary indicators) for time-varying coefficients δt in the first stage income growth regression (14), while in RoC4(b) this time variation is always included.

4.6 Robustness checks

In this section we investigate the robustness of our results to adding stochastic volatility to the consumption

equation, to the use of an alternative consumption measure, to the use of an alternative process for

the ES parameter and to the relaxation of the stability assumption imposed on the coefficients in the

first stage income growth regression. We discuss each in turn.


Adding stochastic volatility h∗µt to the consumption equation (RoC1)

Although it is not selected in the parsimonious model, the results reported in Section 4.3 show that

stochastic volatility may still be a relevant component in the consumption equation. First, there is

bimodality in the distribution of συµ , which signals that the variance of innovations to the stochastic

volatility component h∗µt may be non-zero. Second, in the benchmark case where p0 = 0.50, h∗µt is

included in the second model selected by the stochastic specification search. When setting p0 = 0.75, it

is even included in the best model. As a first robustness check we therefore add stochastic volatility to

the consumption equation. The descriptive statistics reported in Table 3 (column RoC1) show that the

posterior distributions of the model parameters are not sensitive to this extension.

Alternative consumption measure (RoC2)

Next, we check the robustness of our results with respect to the consumption measure used. Instead of

using nondurables and services consumption, we use the broader total personal consumer expenditures

measure for Ct (in real per capita terms) which includes nondurables, services and durables (data are

also from NIPA). Data for consumption and income are now also put in real terms using the deflator for

this broader consumption measure. This alternative measure for consumption is interesting because it

has been used in studies investigating ES in countries other than the US. For most countries outside the

US there is limited availability of more narrow consumption measures. The results reported in Table 4

(column RoC2) show that the posterior inclusion probability of the time-varying component of the ES

parameter β1t is less than 10%. This implies that, also when consumption Ct is measured using a broader

measure, the ES of aggregate consumption growth with respect to anticipated income growth is stable

over time. With respect to the estimated parameters in this setting, from Table 3 (column RoC2) we

observe that the magnitude of the ES parameter β10 is higher (i.e., about 0.34) compared to our preferred

parsimonious model, a result which is in line with higher ES estimates obtained in other studies that use

this broader consumption measure in international settings (see e.g., Everaert and Pozzi, 2014). The

higher variance of the error term in the consumption growth equation reflects the higher volatility of the

growth rate in total personal consumer expenditures observed also in raw data.

Alternative process for excess sensitivity β1t (RoC3)

We next investigate the robustness of our results with respect to the process that is assumed for the

ES parameter β1t. As explained in Section 3.1, our random walk assumption is in line with the smooth

transition specifications used in previous research where time-varying ES is (implicitly or explicitly)

linked to financial liberalization and consumer credit availability. A simple random walk process however

- while appropriate to capture structural change - may be less appropriate to capture more transitory

(e.g., cyclical) movements in the ES parameter. We therefore check the robustness of our results to the

specification of the ES parameter as the sum of a random walk and a stationary AR(1) process, with

the AR(1) component added to capture higher frequency movements in ES. To this end, we replace


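$(\iota_{\beta_{10}} \beta_{10} + \iota_{\beta_{1t}} \sigma_{\eta_1} \beta^*_{1t})$ in the non-centered specification (26) by $(\iota_{\beta_{10}} \beta_{10} + \iota_{\beta_{1t}} \sigma_{\eta_1} \beta^*_{1t} + \iota_{\beta_{2t}} \sigma_{\eta_2} \beta^*_{2t})$, where $\beta^*_{2t}$ follows a stationary AR(1) process

$$\beta^*_{2,t+1} = \varrho \beta^*_{2t} + \eta^*_{2t}, \qquad \beta^*_{20} = 0, \qquad |\varrho| < 1, \qquad \eta^*_{2t} \sim N(0, 1).$$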

The results reported in Table 4 (column RoC3) show that using this alternative specification does not

lead to important differences in the posterior inclusion probabilities of the model components compared

to the inclusion probabilities reported in Table 1 (compare with the case for p0 = 0.5 and instrument

set Z3). The posterior inclusion probability of the random walk component β∗1t is again very low (10%).

Additionally, the posterior inclusion probability of the added AR(1) component β∗2t is only 21% so that

there is no strong evidence that supports the presence of transitory time variation in the ES parameter.
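To make the RoC3 specification concrete, the following sketch builds the ES parameter as the sum of a constant, a random walk and a stationary AR(1) component in non-centered form, with binary indicators switching each component on or off. It is an illustration under assumed values rather than the estimation code: the function name `es_path`, the argument `varrho` (the AR(1) coefficient) and the deterministic shocks are all hypothetical.

```python
def es_path(beta10, sigma1, sigma2, varrho, indicators, eta1, eta2):
    """ES parameter as constant + random walk + stationary AR(1), in non-centered form.

    indicators = (iota_const, iota_rw, iota_ar): binary switches for the three components.
    """
    if abs(varrho) >= 1.0:
        raise ValueError("the AR(1) coefficient must satisfy |varrho| < 1")
    iota_const, iota_rw, iota_ar = indicators
    rw, ar, path = 0.0, 0.0, []              # beta*_{10} = beta*_{20} = 0
    for e1, e2 in zip(eta1, eta2):
        path.append(iota_const * beta10 + iota_rw * sigma1 * rw + iota_ar * sigma2 * ar)
        rw = rw + e1                         # random walk: beta*_{1,t+1} = beta*_{1t} + eta*_{1t}
        ar = varrho * ar + e2                # AR(1): beta*_{2,t+1} = varrho * beta*_{2t} + eta*_{2t}
    return path

# Deterministic shocks so the arithmetic can be followed by hand.
eta_rw = [1.0, 1.0, 1.0]
eta_ar = [1.0, 0.0, 0.0]
full = es_path(0.23, 0.1, 0.2, 0.5, (1, 1, 1), eta_rw, eta_ar)
constant = es_path(0.23, 0.1, 0.2, 0.5, (1, 0, 0), eta_rw, eta_ar)
print(full)
print(constant)
```

Setting both time-varying indicators to zero returns the constant ES parameter, which is the case favored by the low inclusion probabilities in column RoC3 of Table 4.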

Time-varying parameters δt in the income growth ∆ lnYt equation (RoC4)

As a final robustness check, we test the stability of the coefficients δ of the first stage income growth

regression (14). We have so far assumed that the vector δ is constant but this may not be the case. To

the extent that time variation is present in the first stage regression, this instability - if unaccounted for

- could be transmitted to the estimates obtained when estimating the consumption growth equation. We

therefore replace equation (14) by

$$\Delta \ln Y_t = Z_t \delta_t + \nu_t,$$

where $\delta_t$ is now a time-varying vector of coefficients. It includes $S$ coefficients, with $S$ being the total number of instruments (including the intercept). For instrument set Z3 we have $S = 20$. The non-centered parameterization of a coefficient $\delta_{st}$ (where $s = 1, \ldots, S$) is then given by

$$\delta_{st} = \delta_{s0} + \iota_{\delta_s} \sigma_{\omega_s} \delta^*_{st},$$

$$\text{with } \delta^*_{s,t+1} = \delta^*_{st} + \omega^*_{st}, \qquad \delta^*_{s0} = 0, \qquad \omega^*_{st} \sim N(0, 1),$$

with binary indicators ιδs from which posterior inclusion probabilities for the time-varying components

δ∗st can be calculated. The results reported in column RoC4(a) of Table 4 show that the average (over the

variables in the instrument vector Z3) posterior inclusion probability equals 35%. Moreover, allowing for

time variation in δt does not really affect the inclusion probabilities for the other components. Despite

the overall stability of the first stage regression coefficients there is, however, some heterogeneity across

the individual inclusion probabilities. Out of the 20 coefficients in Z3, 17 are selected to vary over time

in less than 50% of the draws, for another two this is between 50% and 60% while for one variable (i.e.,

the one period lagged growth rate of income) this goes up to 93%. In column RoC4(b) of Table 4 we

therefore also report results imposing that all coefficients in δt vary over time, i.e., we set the binary

indicators ιδs equal to 1 for s = 1, ..., S. Again, the inclusion probabilities for the other components are

not affected. Also the posterior distributions of the other parameters are unaffected (see column RoC4(b)


in Table 3). Highly similar results are obtained when modeling only those coefficients on Z3 as varying

over time that actually have a high probability to vary.

5 Conclusions

The recent literature measures the ES of aggregate consumption growth to anticipated aggregate disposable

income growth using an elaborate empirical framework that contains both the possibility of stickiness

in aggregate consumption growth and an adequate treatment of measurement error and time aggregation.

However, this framework has only served as a benchmark for measuring ES under the assumption that

the degree of ES is constant. This paper contributes to the literature by measuring time-varying ES in

this elaborate empirical framework using quarterly US data over the period 1953-2014. We estimate a

Bayesian UC model using MCMC methods. We test whether the time variation is statistically relevant

using the Bayesian model selection approach recently suggested by Fruhwirth-Schnatter and Wagner

(2010). To control for endogeneity in our framework, we further incorporate a control function type

approach to instrumental variables estimation in our MCMC algorithm.

Our estimation results suggest that the degree of ES of aggregate consumption growth to anticipated

disposable income growth for the US is stable and lies around 0.23 over the entire sample period. This

magnitude is in accordance with recent findings of Sommer (2007) and Carroll et al. (2011) who consider

a similar empirical framework but with a constant ES parameter instead of a time-varying one. It is

considerably lower than the magnitude of around 0.5 reported in earlier studies (see e.g., Campbell and

Mankiw, 1989, 1990, 1991) in which ES estimates are obtained in more restrictive empirical frameworks.

The lack of time variation in the ES parameter stands in contrast to the findings reported in previous

studies that investigate time-varying ES for the US (e.g., the study by Bacchetta and Gerlach (1997)

who argue that the degree of ES has dropped gradually over time). The time variation reported in the

literature has, however, been obtained from the estimation of more restricted empirical models, using

methodologies that impose time variation rather than test for it.

The importance of accurate measurement of ES and hence of the finding of a stable and relatively

small ES parameter is twofold. First, since the ES puzzle is not yet fully resolved, it is important that the

evolution and magnitude of the puzzle are well established. Only then can a valid (micro based) interpretation

be given to it in subsequent work. Second, ES estimates are used in the calibration of DSGE models

in a number of recent papers that argue that the impact of current income on aggregate consumption

influences the effects of productivity and government expenditures shocks on the economy. It is therefore

a bit surprising that many of the papers in that literature still use the ES estimates reported in earlier

research while recent research such as Carroll et al. (2011) - and now also this paper - clearly demonstrates

that these estimates are too high.


6 Acknowledgements

The authors would like to thank Tino Berger, Joshua Chan, Markus Eberhardt, Freddy Heylen and Glenn

Rayp for their constructive comments and suggestions. The paper has also benefited from discussions

with participants at the 23rd Symposium of the Society for Nonlinear Dynamics and Econometrics and

various seminars and workshops.

References

Bacchetta, P. and Gerlach, S. (1997). Consumption and credit constraints: international evidence. Journal

of Monetary Economics, 40:207–38.

Basu, S. and Kimball, M. (2002). Long-run labor supply and the elasticity of intertemporal substitution

for consumption. Mimeo, University of Michigan.

Blundell-Wignall, A., Browne, F., and Cavaglia, S. (1991). Financial liberalization and consumption

behaviour. OECD Working Papers, 81.

Campbell, J. (1987). Does saving anticipate declining labor income - An alternative test of the permanent

income hypothesis. Econometrica, 55(6):1249–1273.

Campbell, J. and Mankiw, N. (1989). Consumption, income, and interest rates: reinterpreting the time

series evidence. NBER Macroeconomics Annual, 4:185–216.

Campbell, J. and Mankiw, N. (1990). Permanent income, current income and consumption. Journal of

Business and Economic Statistics, 8:265–79.

Campbell, J. and Mankiw, N. (1991). The response of consumption to income: a cross-country investigation.

European Economic Review, 35:723–67.

Carroll, C. (1992). The buffer-stock theory of saving: some macroeconomic evidence. Brookings Papers

on Economic Activity, 2:61–156.

Carroll, C., Slacalek, J., and Sommer, M. (2011). International evidence on sticky consumption growth.

Review of Economics and Statistics, 93:1135–1145.

Carroll, C. D., Slacalek, J., and Sommer, M. (2012). Dissecting saving dynamics: measuring wealth,

precautionary, and credit effects. IMF Working Paper, WP/12/219.

Carter, C. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81:541–53.

Chib, S. and Greenberg, E. (1994). Bayes inference in regression models with ARMA(p,q) errors. Journal

of Econometrics, 64:183–206.

De Jong, P. and Shephard, N. (1995). The simulation smoother for time series models. Biometrika,

82:339–50.


Deaton, A. (1991). Savings and liquidity constraints. Econometrica, 59:1221–48.

Dynan, K. (2000). Habit formation in consumer preferences: evidence from panel data. American

Economic Review, 90(3):391–406.

Evans, P. and Karras, G. (1998). Liquidity constraints and the substitutability between private and

government consumption: the role of military and non-military spending. Economic Inquiry, 36:203–14.

Everaert, G. and Pozzi, L. (2014). The predictability of aggregate consumption growth in OECD countries:

a panel data analysis. Journal of Applied Econometrics, 29(3):431–453.

Flavin, M. (1985). Excess sensitivity of consumption to current income: liquidity constraints or myopia?

Canadian Journal of Economics, 38:117–36.

Fruhwirth-Schnatter, S. and Wagner, H. (2010). Stochastic model specification search for Gaussian and

partial non-Gaussian state space models. Journal of Econometrics, 154(1):85–100.

Fuhrer, J. (2000). Habit formation in consumption and its implications for monetary-policy models.

American Economic Review, 90(3):367–390.

Furlanetto, F. and Seneca, M. (2012). Rule-of-thumb consumers, productivity and hours. Scandinavian

Journal of Economics, 114(2):658–79.

Gali, J., Lopez-Salido, D., and Valles, J. (2007). Understanding the effects of government spending on

consumption. Journal of the European Economic Association, 5:227–270.

George, E. and McCulloch, R. (1993). Variable selection via Gibbs sampling. Journal of the American

Statistical Association, 88:881–889.

Girardin, E., Sarno, L., and Taylor, M. (2000). Private consumption behaviour, liquidity constraints and

financial deregulation in France: a nonlinear analysis. Empirical Economics, 25:351–368.

Hall, R. (1978). Stochastic implications of the life cycle-permanent income hypothesis: theory and

evidence. Journal of Political Economy, 86:971–87.

Hamilton, J. (1994). Time series analysis. Princeton.

Hayashi, F. (1985). The permanent income hypothesis and consumption durability: analysis based on

Japanese panel data. Quarterly Journal of Economics, 100(4):1083–1113.

Kiley, M. (2010). Habit persistence, nonseparability between consumption and leisure, or rule-of-thumb

consumers: which accounts for the predictability of consumption growth? Review of Economics and

Statistics, 92:679–683.


Kim, C.-J. and Nelson, C. R. (1999). Has the US Economy Become More Stable? A Bayesian Approach

Based on a Markov-Switching Model of the Business Cycle. Review of Economics and Statistics,

81(4):608–616.

Kim, S., Shephard, N., and Chib, S. (1998). Stochastic Volatility: Likelihood Inference and Comparison

with ARCH Models. Review of Economic Studies, 65(3):361–93.

Kim, Y. and Kim, C.-J. (2011). Dealing with endogeneity in a time-varying parameter model: joint

estimation and two-step estimation procedures. Econometrics Journal, 14:487–497.

Ludvigson, S. (1999). Consumption and credit: a model of time-varying liquidity constraints. Review of

Economics and Statistics, 81(3):434–47.

Ludvigson, S. and Michaelides, A. (2001). Does buffer-stock saving explain the smoothness and excess

sensitivity of consumption. American Economic Review, 91(3):631–47.

McKiernan, B. (1996). Consumption and the credit market. Economics Letters, 51:83–88.

Muellbauer, J. (1988). Habits, rationality and myopia in the life cycle consumption function. Annales

d’Economie et de Statistique, 9:47–70.

Peersman, G. and Pozzi, L. (2007). Business cycle fluctuations and excess sensitivity of private consumption.

Economica, 75:1–10.

Pischke, J. (1995). Individual income, incomplete information and aggregate consumption. Econometrica,

63:805–40.

Primiceri, G. (2005). Time varying structural vector autoregressions and monetary policy. Review of

Economic Studies, 72:821–852.

Reis, R. (2006). Inattentive consumers. Journal of Monetary Economics, 53:1761–1800.

Scott, J. G. and Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection

problem. The Annals of Statistics, 38(5):2587–2619.

Sommer, M. (2007). Habit formation and aggregate consumption dynamics. The B.E. Journal of Macroeconomics

- Advances, 7:article 21.

Stock, J. H. and Watson, M. W. (2003). Has the Business Cycle Changed and Why? In NBER Macroeconomics

Annual 2002, Volume 17, NBER Chapters, pages 159–230. National Bureau of Economic

Research, Inc.

Ullah, A., Vinod, H., and Singh, R. (1986). Estimation of linear models with moving average disturbances.

Journal of Quantitative Economics, 2:137–152.


Appendix A Gibbs sampling algorithm

In this appendix we provide details on the implementation of the Gibbs sampling algorithm outlined in

Section 3.3 to jointly sample the binary indicators M , the parameters φ, the time-varying parameters β∗

and the stochastic volatilities h.

Block 1: Sampling the binary indicators M and the parameters φ

For notational convenience, let us define a general regression model

$$y = x^{m} b^{m} + \theta(L)\, e, \qquad e \sim N(0, \Sigma), \tag{A-1}$$

where $y = (y_1, \ldots, y_T)'$ is a $(T \times 1)$ vector stacking observations on the dependent variable $y_t$, $x = (x_1', \ldots, x_T')'$ is a $(T \times k)$ unrestricted predictor matrix with typical row $x_t$, $b$ is a $(k \times 1)$ unrestricted parameter vector, $\theta(L)$ is a lag polynomial of order $q$ and $\Sigma$ is a diagonal matrix with elements $\sigma^2_{et}$ on the diagonal that may vary over time to allow for heteroscedasticity of a known form. The restricted predictor matrix $x^{m}$ and restricted parameter vector $b^{m}$ exclude those elements in $x$ and $b$ for which the corresponding binary indicator in $m$ is zero, with $m$ being a subset of the model $M$.

The MA(q) errors in equation (A-1) imply a model that is non-linear in the parameters. As suggested

by Ullah et al. (1986) and Chib and Greenberg (1994), conditional on θ a linear model can be obtained

from a recursive transformation of the data. For t = 1, . . . T let

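$$\tilde{y}_t = y_t - \sum_{i=1}^{q} \theta_i \tilde{y}_{t-i}, \qquad \text{with } \tilde{y}_t = 0 \text{ for } t \le 0, \tag{A-2}$$

$$\tilde{x}_t = x_t - \sum_{i=1}^{q} \theta_i \tilde{x}_{t-i}, \qquad \text{with } \tilde{x}_t = 0 \text{ for } t \le 0, \tag{A-3}$$

and further for $j = 1, \ldots, q$

$$\omega_{jt} = -\sum_{i=1}^{q} \theta_i \omega_{j,t-i} + \theta_{t+j-1}, \qquad \text{with } \omega_{jt} = 0 \text{ for } t \le 0, \tag{A-4}$$

where $\theta_s = 0$ for $s > q$. Equation (A-1) can then be transformed as

$$\tilde{y} = \tilde{x}^{m} b^{m} + \omega \lambda + e = \tilde{w}^{m} \Phi^{m} + e, \tag{A-5}$$

where $\tilde{w} = (\tilde{x}, \omega)$ with $\omega = (\omega_1', \ldots, \omega_T')'$ and $\omega_t = (\omega_{1t}, \ldots, \omega_{qt})$, $\Phi^{m} = (b^{m\prime}, \lambda')'$ and $\lambda = (e_0, \ldots, e_{-q+1})$ initial conditions that can be estimated as unknown parameters.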
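The recursions (A-2)-(A-4) can be checked numerically. The sketch below is an illustration rather than the authors' implementation: it assumes a single regressor, an MA(2) error with hypothetical coefficients, and the convention $\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$. It simulates the model, applies the transformation and verifies that the residual of the transformed regression (A-5) reproduces the white-noise error $e_t$. All names (`ma_transform`, `lam`, `err`) are illustrative.

```python
import random

def ma_transform(y, x, theta):
    """Apply the recursions (A-2)-(A-4) for a single regressor x and MA order q = len(theta)."""
    T, q = len(y), len(theta)
    y_tilde, x_tilde = [0.0] * T, [0.0] * T
    omega = [[0.0] * q for _ in range(T)]
    for t in range(T):                                    # list index t is period t + 1
        lags = [i for i in range(1, q + 1) if t - i >= 0]  # lags that stay inside the sample
        y_tilde[t] = y[t] - sum(theta[i - 1] * y_tilde[t - i] for i in lags)
        x_tilde[t] = x[t] - sum(theta[i - 1] * x_tilde[t - i] for i in lags)
        for j in range(q):                                # list index j is component j + 1
            head = theta[t + j] if t + j < q else 0.0     # theta_{t+j-1}, zero beyond order q
            omega[t][j] = head - sum(theta[i - 1] * omega[t - i][j] for i in lags)
    return y_tilde, x_tilde, omega

# Simulate y_t = x_t b + e_t + theta_1 e_{t-1} + theta_2 e_{t-2} with hypothetical values.
rng = random.Random(1)
T, b, theta = 40, 0.8, [0.5, -0.3]
lam = [rng.gauss(0, 1) for _ in theta]                    # lambda = (e_0, e_{-1}): pre-sample errors
e = [rng.gauss(0, 1) for _ in range(T)]
x = [rng.gauss(0, 1) for _ in range(T)]

def err(s):
    """Error in period s (1-based); pre-sample periods s <= 0 are read from lambda."""
    return e[s - 1] if s >= 1 else lam[-s]

y = [x[t] * b + err(t + 1) + sum(theta[i - 1] * err(t + 1 - i) for i in (1, 2))
     for t in range(T)]

y_tilde, x_tilde, omega = ma_transform(y, x, theta)
# (A-5): the transformed residual should reproduce the white-noise error e_t.
recovered = [y_tilde[t] - x_tilde[t] * b - sum(omega[t][j] * lam[j] for j in range(2))
             for t in range(T)]
max_gap = max(abs(r - v) for r, v in zip(recovered, e))
print(max_gap)
```

Because the transformed model is linear in $b$ and $\lambda$ once $\theta$ is fixed, both can be drawn with standard regression results.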

Conditional on $\theta$ and $\sigma^2_{et}$, equation (A-5) is a standard linear regression with observed variables $\tilde{y}$ and $\tilde{w}^{m}$ and heteroscedastic normal errors with known covariance matrix $\Sigma$. A naive implementation of the Gibbs sampler would be to first sample the binary indicators in $m$ from $f(m \mid \Phi, \Sigma, \tilde{y}, \tilde{w})$ and next $\Phi^{m}$ from $f(\Phi^{m} \mid m, \Sigma, \tilde{y}, \tilde{w})$. However, this approach does not result in an irreducible Markov chain as whenever an indicator in $m$ equals zero, the corresponding coefficient in $\Phi$ is also zero which implies that


the chain has absorbing states. Therefore, as in Fruhwirth-Schnatter and Wagner (2010) we marginalize over the parameters $\Phi$ when sampling $m$ and next draw $\Phi^{m}$ conditional on the binary indicators in $m$. The posterior distribution $f(m \mid \Sigma, \tilde{y}, \tilde{w})$ can be obtained using Bayes' Theorem as

$$f(m \mid \Sigma, \tilde{y}, \tilde{w}) \propto f(\tilde{y} \mid m, \Sigma, \tilde{w})\, p(m), \tag{A-6}$$

with $p(m)$ being the prior probability of $m$ and $f(\tilde{y} \mid m, \Sigma, \tilde{w})$ the marginal likelihood of the regression model (A-5) where the effect of the parameters $\Phi$ has been integrated out. Under the normal conjugate prior

$$p(\Phi^{m}) = N(\Phi^{m}_0, V^{m}_0), \tag{A-7}$$

the closed form solution of the marginal likelihood is given by

$$f(\tilde{y} \mid m, \Sigma, \tilde{w}) \propto |\Sigma|^{-0.5}\, \frac{|V^{m}_T|^{0.5}}{|V^{m}_0|^{0.5}} \exp\left( -\frac{\tilde{y}' \Sigma^{-1} \tilde{y} + (\Phi^{m}_0)' (V^{m}_0)^{-1} \Phi^{m}_0 - (\Phi^{m}_T)' (V^{m}_T)^{-1} \Phi^{m}_T}{2} \right), \tag{A-8}$$

where the posterior moments $\Phi^{m}_T$ and $V^{m}_T$ can be calculated as

$$\Phi^{m}_T = V^{m}_T \left( (\tilde{w}^{m})' \Sigma^{-1} \tilde{y} + (V^{m}_0)^{-1} \Phi^{m}_0 \right), \tag{A-9}$$

$$V^{m}_T = \left( (\tilde{w}^{m})' \Sigma^{-1} \tilde{w}^{m} + (V^{m}_0)^{-1} \right)^{-1}. \tag{A-10}$$

Following George and McCulloch (1993) we use a single-move sampler in which the binary indicators $\iota_k$ in $m$ are sampled recursively from the Bernoulli distribution with probability

$$p(\iota_k = 1 \mid \iota_{-k}, \Sigma, \tilde{y}, \tilde{w}) = \frac{f(\iota_k = 1 \mid \iota_{-k}, \Sigma, \tilde{y}, \tilde{w})}{f(\iota_k = 0 \mid \iota_{-k}, \Sigma, \tilde{y}, \tilde{w}) + f(\iota_k = 1 \mid \iota_{-k}, \Sigma, \tilde{y}, \tilde{w})}, \tag{A-11}$$

for $k$ depending on the particular subset $m$ of $M$. We further randomize over the sequence in which the binary indicators are drawn.

Conditional on $m$, the posterior distribution of $\Phi^{m}$ is given by

$$p(\Phi^{m} \mid m, \Sigma, \tilde{y}, \tilde{w}) = N(\Phi^{m}_T, V^{m}_T), \tag{A-12}$$

with posterior moments $\Phi^{m}_T$ and $V^{m}_T$ given in equations (A-9)-(A-10).
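For a single candidate regressor, the indicator step in (A-6)-(A-11) reduces to comparing two marginal likelihoods. The sketch below works through this scalar case; it is an illustration under assumed inputs (a zero-mean, unit-variance prior, unit error variances and a deliberately simple deterministic data set), and the function names are hypothetical. Constants common to both models are dropped, as in (A-8).

```python
import math

def log_marginal_likelihood(y, w, sigma2, include, phi0=0.0, v0=1.0):
    """Log of (A-8), up to a constant, for one candidate regressor w (scalar case)."""
    quad = sum(yt * yt / s for yt, s in zip(y, sigma2))                  # y' Sigma^{-1} y
    logml = -0.5 * sum(math.log(s) for s in sigma2)                      # log |Sigma|^{-0.5}
    if not include:
        return logml - 0.5 * quad                                        # empty model: no V terms
    v_t = 1.0 / (sum(wt * wt / s for wt, s in zip(w, sigma2)) + 1.0 / v0)             # (A-10)
    phi_t = v_t * (sum(wt * yt / s for wt, yt, s in zip(w, y, sigma2)) + phi0 / v0)   # (A-9)
    logml += 0.5 * math.log(v_t) - 0.5 * math.log(v0)
    return logml - 0.5 * (quad + phi0 ** 2 / v0 - phi_t ** 2 / v_t)

def inclusion_probability(y, w, sigma2, prior_incl=0.5):
    """(A-11) for a single indicator: P(iota = 1 | data)."""
    log_f1 = log_marginal_likelihood(y, w, sigma2, True) + math.log(prior_incl)
    log_f0 = log_marginal_likelihood(y, w, sigma2, False) + math.log(1.0 - prior_incl)
    diff = log_f0 - log_f1
    if diff > 700.0:                                                     # avoid overflow in exp
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# Deterministic toy data: a constant regressor and alternating noise that is orthogonal to it.
n = 100
w = [1.0] * n
noise = [1.0 if t % 2 == 0 else -1.0 for t in range(n)]
sigma2 = [1.0] * n
p_null = inclusion_probability(noise, w, sigma2)                         # regressor is irrelevant
p_signal = inclusion_probability([0.5 * wt + nt for wt, nt in zip(w, noise)], w, sigma2)
print(p_null, p_signal)
```

In the irrelevant-regressor case the data fit equally well with and without the regressor, so the ratio $|V_T|^{0.5}/|V_0|^{0.5}$ alone penalizes inclusion. This automatic penalty for unneeded components is what drives the low inclusion probabilities reported for the time-varying parts.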

Block 1(a): Sampling the first step parameters δ and calculating Ztδ and ν∗t

Conditional on the stochastic volatility component $h_{\nu t}$, equation (14) can be written in the general notation of equation (A-1) as: $y_t = \Delta \ln Y_t$, $x^{m}_t = x_t = Z_t$, $b^{m} = b = \delta$ and $\theta(L) = 1$ such that $\theta(L) e_t = \nu_t$ and the elements in $\Sigma$ are given by $\sigma^2_{et} = e^{h_{\nu t}}$. In the notation of equation (A-5) we further have $\tilde{y}_t = y_t$, $\tilde{w}^{m}_t = \tilde{w}_t = x_t$ and $\Phi^{m} = \Phi = \delta$. The parameters in $\delta$ can then be sampled from the posterior distribution


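in equation (A-12) and used to calculate $E_{t-1}(\Delta \ln Y_t) = Z_t \delta$ and

$$\nu^*_t = \frac{\theta(L) (\Delta \ln Y_t - Z_t \delta)\, e^{h_{\mu t}/2}}{\sqrt{1 - \rho^2}\, e^{h_{\nu t}/2}},$$

conditional on $\theta$, $\rho$, $h_{\nu t}$ and $h_{\mu t}$.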

Block 1(b): Sampling the binary indicators Mθ and the MA coefficients θ

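Conditional on the parameters $\delta$ and $\phi_2$, on the time-varying coefficients $\beta^*_t$, on the stochastic volatility $h_{\mu t}$ and on the binary indicators $M_{\phi_2}$, equation (16) can be written in the general notation of equation (A-1) as: $y_t = \Delta \ln C_t$, $x_t = (1, Z_t \delta, \beta^*_{0t}, \beta^*_{1t} Z_t \delta, \Delta \ln C_{t-1})$, $b = (\beta_{00}, \beta_{10}, \sigma_{\eta_0}, \sigma_{\eta_1}, \gamma)$ and $\theta(L) e_t = \theta(L) \varepsilon_t$, such that the elements in $\Sigma$ are given by $\sigma^2_{et} = \sigma^2_{\varepsilon t} = e^{h_{\mu t}} / (1 - \rho^2)$. The values of the binary indicators in the subset $M_{\phi_2}$ of $M$ then imply the restricted $x^{M_{\phi_2}}_t$ and $b^{M_{\phi_2}}$.

Under the normal conjugate prior $p(\theta) = N(b^{\theta}_0, V^{\theta}_0)$, the exact conditional distribution of $\theta$ is

$$p(\theta \mid \Phi^{M_{\phi_2}}, \Sigma, y, x) \propto \prod_{t=1}^{T} \exp\left( -\frac{e_t(\theta)^2}{2 \sigma^2_{et}} \right) \times \exp\left( -\frac{1}{2} (\theta - b^{\theta}_0)' (V^{\theta}_0)^{-1} (\theta - b^{\theta}_0) \right), \tag{A-13}$$

where $e_t(\theta) = \tilde{y}_t(\theta) - \tilde{w}^{m}_t(\theta) \Phi^{M_{\phi_2}}$ is calculated from the transformed model in equation (A-5) further conditioning on the initial conditions $\lambda$ to obtain $\Phi^{M_{\phi_2}} = (b^{M_{\phi_2}\prime}, \lambda')'$.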

Direct sampling of $\theta$ using equation (A-13) is not possible, though, as $e_t(\theta)$ is a non-linear function of $\theta$. To solve this issue, Chib and Greenberg (1994) propose to linearize $e_t(\theta)$ around $\theta^*$ using a first-order Taylor expansion

$$e_t(\theta) \approx e_t(\theta^*) - \Psi_t (\theta - \theta^*), \tag{A-14}$$

where $\Psi_t = (\Psi_{1t}, \ldots, \Psi_{qt})$ is a $1 \times q$ vector including the first-order derivatives of $e_t(\theta)$ evaluated at $\theta^*$ obtained using the following recursion

$$\Psi_{it} = -e_{t-i}(\theta^*) - \sum_{j=1}^{q} \theta^*_j \Psi_{i,t-j}, \tag{A-15}$$

where $\Psi_{it} = 0$ for $t \le 0$. An adequate approximation can be obtained by choosing $\theta^*$ to be the non-linear least squares estimate of $\theta$ conditional on the other parameters in the model, which can be obtained as

$$\theta^* = \arg\min_{\theta} \sum_{t=1}^{T} (e_t(\theta))^2. \tag{A-16}$$

Conditioning on $\theta^*$, equation (A-14) can then be rewritten as an approximate linear regression model

$$e_t(\theta^*) + \Psi_t \theta^* \approx \Psi_t \theta + e_t(\theta), \tag{A-17}$$

with dependent variable $e_t(\theta^*) + \Psi_t \theta^*$ and explanatory variables $\Psi_t$. As such, the approximate expression in (A-17) can be written in the general notation of equation (A-1) by setting $y_t = e_t(\theta^*) + \Psi_t \theta^*$, $x_t = \Psi_t$,


b = θ and θ(L) = 1 such that θ(L)et = εt and $\sigma^2_{e_t} = \sigma^2_{\varepsilon_t} = e^{h_{\mu t}}/(1-\rho^2)$. The values of the binary indicators in the subset Mθ of M imply the restricted $x_t^{M_\theta}$ and $b^{M_\theta}$. In the notation of equation (A-5) we further have $\tilde y_t = y_t$, $\tilde w_t^{m} = x_t^{M_\theta}$ and $\Phi^{m} = b^{M_\theta}$.

The binary indicators in Mθ can then be sampled from the posterior distribution in equation (A-6) with

marginal likelihood calculated from equation (A-8). Next, we sample θMθ using a Metropolis-Hastings

(MH) step. Suppose θMθ (i) is the current draw in the Markov chain. To obtain the next draw θMθ (i+ 1),

first draw a candidate θMθ (c) using the following normal proposal distribution

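\[
q\left(\theta^{M_\theta} \mid \theta^{*}, \Sigma, y, x\right) \sim N\left(\Phi_T^{M_\theta}, V_T^{M_\theta}\right), \tag{A-18}
\]

with posterior moments $\Phi_T^{M_\theta}$ and $V_T^{M_\theta}$ calculated using equations (A-9)-(A-10). The MH step then implies a further randomization which amounts to accepting the candidate draw $\theta^{M_\theta}(c)$ with probability

\[
\alpha\left(\theta^{M_\theta}(i), \theta^{M_\theta}(c)\right) = \min\left\{ \frac{p\left(\theta^{M_\theta}(c) \mid \Phi^{M_\theta}, \Sigma, y, x\right)}{p\left(\theta^{M_\theta}(i) \mid \Phi^{M_\theta}, \Sigma, y, x\right)} \, \frac{q\left(\theta^{M_\theta}(i) \mid \theta^{*}, \Sigma, y, x\right)}{q\left(\theta^{M_\theta}(c) \mid \theta^{*}, \Sigma, y, x\right)},\; 1 \right\}. \tag{A-19}
\]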

If θMθ (c) is accepted, θMθ (i + 1) is set equal to θMθ (c) while if θMθ (c) is rejected, θMθ (i + 1) is set equal

to θMθ (i). Note that the unrestricted θ is restricted to obtain θMθ by excluding θl when ιθl = 0. In this

case θl is not sampled but set equal to zero.
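The accept/reject rule in (A-19) can be sketched in a few lines. This is only an illustrative sketch: it assumes that the log of the target density in (A-13) and the log of the proposal density in (A-18) have already been evaluated at the current and candidate draws, and the function name `mh_accept` and its arguments are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def mh_accept(log_p_cand, log_p_curr, log_q_cand, log_q_curr, rng):
    """Independence-chain MH decision, computed in logs for numerical stability.

    log_p_*: log target density (up to a constant) at the candidate / current draw.
    log_q_*: log proposal density at the candidate / current draw.
    Returns True if the candidate draw is accepted.
    """
    # log of [p(cand) / p(curr)] * [q(curr) / q(cand)]
    log_ratio = (log_p_cand - log_p_curr) + (log_q_curr - log_q_cand)
    log_alpha = min(0.0, log_ratio)  # alpha = min(ratio, 1)
    return bool(np.log(rng.uniform()) < log_alpha)
```

If the function returns `True` the chain moves to the candidate; otherwise the current draw is repeated, as described above.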

Block 1(c): Sampling the binary indicators Mφ2 and the second step parameters φ2

Conditional on β∗t, δ and ν∗t, equation (26) can be written in the general notation of equation (A-1) setting yt = ∆ lnCt, xt = (1, Ztδ, β∗0t, β∗1tZtδ, ∆ lnCt−1, ν∗t), b = (β00, β10, ση0, ση1, γ, ρ) and θ(L)et = θ(L)µt, such that the elements in Σ are given by $\sigma^2_{e_t} = e^{h_{\mu t}}$. Further conditioning on the MA parameters θ, the unrestricted transformed variables $\tilde y_t$ and $\tilde w_t$ in equation (A-5) are obtained, with corresponding unrestricted extended parameter vector Φ = (b′, λ′)′. The values of the binary indicators in Mφ2 then imply the restricted $\tilde w_t^{M_{\phi_2}}$ and $\Phi^{M_{\phi_2}}$.

First, the binary indicators in Mφ2 are sampled from the posterior distribution in equation (A-6) with

marginal likelihood calculated from equation (A-8). Second, conditional on Mφ2 , φ2 is sampled, together

with λ, by drawing ΦMφ2 from the general expression in equation (A-12). Note that the unrestricted Φ =

(β00, β10, ση0 , ση1 , γ, ρ, λ) is restricted to obtain ΦMφ2 by excluding those coefficients for which the

corresponding binary indicator in Mφ2 is zero. These restricted coefficients are not sampled but set equal

to zero.

Block 1(d): Sampling the binary indicators Mφ1 and the stochastic volatility parameters φ1

Using $\nu_t \sim N(0, e^{h_{\nu t}})$ and $\mu_t \sim N(0, e^{h_{\mu t}})$ together with equation (18), we can write $\nu_t = e^{h_{\nu t}/2}\mu_{1t}$ and $\mu_t = e^{h_{\mu t}/2}\mu_{2t}$ with µ1t and µ2t i.i.d. error terms with unit variance. A key feature of these stochastic

volatility components is that they are nonlinear but can be transformed into linear components by taking


the logarithm of their squares

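\[
\ln\left(e^{h_{\nu t}/2}\mu_{1t}\right)^{2} = h_{\nu t} + \ln\left(\mu_{1t}\right)^{2}, \qquad \ln\left(e^{h_{\mu t}/2}\mu_{2t}\right)^{2} = h_{\mu t} + \ln\left(\mu_{2t}\right)^{2}, \tag{A-20}
\]

where $\ln(\mu_{1t})^2$ and $\ln(\mu_{2t})^2$ are log-chi-square distributed with expected value −1.2704 and variance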

4.93. Following Kim et al. (1998), we approximate the linear models in (A-20) by an offset mixture time

series model as

gjt = hjt + εjt, (A-21)

for j = ν, µ, where $g_{\nu t} = \ln\left((e^{h_{\nu t}/2}\mu_{1t})^2 + c\right)$ and $g_{\mu t} = \ln\left((e^{h_{\mu t}/2}\mu_{2t})^2 + c\right)$, with c = 0.001 being an offset constant, and the distribution of εjt given by the following mixture of normals

\[
f(\varepsilon_{jt}) = \sum_{n=1}^{M} q_n\, f_N\left(\varepsilon_{jt} \mid m_n - 1.2704,\; s_n^{2}\right), \tag{A-22}
\]

with component probabilities qn, means mn − 1.2704 and variances $s_n^2$. Equivalently, this mixture density can be written in terms of the component indicator variable κjt as

\[
\varepsilon_{jt} \mid (\kappa_{jt} = n) \sim N\left(m_n - 1.2704,\; s_n^{2}\right), \quad \text{with } \Pr(\kappa_{jt} = n) = q_n. \tag{A-23}
\]

Following Kim et al. (1998), we use a mixture of M = 7 normal distributions to make the approximation

to the log-chi-square distribution sufficiently good. Values for qn,mn, s2n are provided by Kim et al. in

their Table 4.
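The constants −1.2704 and 4.93 used above are the mean and variance of the logarithm of a chi-square variable with one degree of freedom, which have the closed forms −γ − ln 2 (with γ the Euler-Mascheroni constant) and π²/2. The following sketch, in which all names are illustrative assumptions, checks these values numerically and shows the offset log-square transform used to construct gjt.

```python
import numpy as np

# Closed-form moments of ln(z^2) for z ~ N(0, 1):
# mean = -euler_gamma - ln 2, variance = pi^2 / 2.
log_chi2_mean = -np.euler_gamma - np.log(2.0)
log_chi2_var = np.pi ** 2 / 2.0

def log_square_transform(resid, c=0.001):
    """Return g_t = ln(resid_t^2 + c); the offset c guards against ln(0)."""
    resid = np.asarray(resid, dtype=float)
    return np.log(resid ** 2 + c)

# Monte Carlo check of the closed-form moments (no offset, illustrative only).
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
mc_mean = np.log(z ** 2).mean()
mc_var = np.log(z ** 2).var()
```

The simulated moments `mc_mean` and `mc_var` should lie close to `log_chi2_mean` and `log_chi2_var`, up to Monte Carlo error.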

Conditional on hjt and εjt, calculated from equation (A-21) as εjt = gjt − hjt, we first sample the

mixture indicators κjt from the conditional probability mass

\[
p\left(\kappa_{jt} = n \mid h_{jt}, \varepsilon_{jt}\right) \propto q_n\, f_N\left(\varepsilon_{jt} \mid h_{jt} + m_n - 1.2704,\; s_n^{2}\right). \tag{A-24}
\]
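A minimal sketch of this indicator step is given below. It rests on two assumptions that should be stated clearly. First, the density in (A-24) is read as that of gjt with mean hjt + mn − 1.2704, which is the same as evaluating εjt = gjt − hjt at mean mn − 1.2704 as in (A-23). Second, the three-component mixture used in the example is hypothetical and is not the seven-component table of Kim et al. (1998). The function name and arguments are likewise illustrative.

```python
import numpy as np

def sample_mixture_indicators(eps, q, m, s2, rng, shift=1.2704):
    """Draw kappa_t with probability proportional to q_n * N(eps_t | m_n - shift, s2_n).

    eps: array (T,) holding eps_t = g_t - h_t.
    q, m, s2: arrays (M,) of component weights, means and variances.
    Returns (kappa, probs): kappa in {0, ..., M-1}, probs of shape (T, M).
    """
    eps = np.asarray(eps, dtype=float)[:, None]  # shape (T, 1) for broadcasting
    q, m, s2 = (np.asarray(a, dtype=float) for a in (q, m, s2))
    log_w = (np.log(q) - 0.5 * np.log(2.0 * np.pi * s2)
             - 0.5 * (eps - (m - shift)) ** 2 / s2)
    log_w -= log_w.max(axis=1, keepdims=True)    # stabilise before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF draw with one uniform per observation.
    u = rng.uniform(size=(probs.shape[0], 1))
    kappa = (probs.cumsum(axis=1) < u).sum(axis=1)
    kappa = np.minimum(kappa, q.size - 1)        # guard against rounding in the cumsum
    return kappa, probs

# Hypothetical three-component mixture, for illustration only
# (these are NOT the Kim et al. (1998) Table 4 values).
q_toy = np.array([0.2, 0.5, 0.3])
m_toy = np.array([-4.0, 0.0, 1.5])
s2_toy = np.array([4.0, 1.0, 0.3])
rng = np.random.default_rng(0)
kappa, probs = sample_mixture_indicators(
    np.array([-10.0, 0.0, 1.0]), q_toy, m_toy, s2_toy, rng)
```

Working with log weights and subtracting the row maximum avoids underflow when εjt lies far in the left tail, which is where the log-chi-square distribution places substantial mass.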

Conditional on h∗jt, gjt and κjt and using the non-centered parameterization in equation (27), equation (A-21) can then be rewritten in the general linear regression format of (A-1) setting $y_t = g_{jt} - (m_{\kappa_{jt}} - 1.2704)$, $x_t = (1, h^{*}_{jt})$, $b = (h_{j0}, \sigma_{\upsilon_j})$ and $\theta(L)e_t = \tilde\varepsilon_{jt} = \varepsilon_{jt} - (m_{\kappa_{jt}} - 1.2704)$ for j = ν, µ. Given the mixture distribution of εjt defined in equation (A-23), the centered error term $\tilde\varepsilon_{jt}$ has a heteroscedastic variance $s^2_{\kappa_{jt}}$ such that $\Sigma = \mathrm{diag}(s^2_{\kappa_{j1}}, \ldots, s^2_{\kappa_{jT}})$. In the notation of equation (A-5) we further have $\tilde y_t = y_t$ and the unrestricted $\tilde w_t = x_t$ and Φ = b. The values of the binary indicators in Mφ1 then imply the restricted $\tilde w_t^{M_{\phi_1}}$ and $\Phi^{M_{\phi_1}}$. Hence, the binary indicators ιhj, for j = ν, µ, in Mφ1 can first be sampled from the posterior distribution in equation (A-6) with marginal likelihood calculated from equation (A-8). Second, further conditioning on Mφ1, the parameters (hj0, συj), for j = ν, µ, in φ1 are sampled by drawing $\Phi^{M_{\phi_1}}$ using the general expression in equation (A-12). Note that the unrestricted Φ = (hj0, συj) is restricted to obtain $\Phi^{M_{\phi_1}}$ by excluding συj when ιhj = 0. In this case συj is not sampled but set equal to zero.


Block 2: Sampling time-varying parameters β∗ and stochastic volatilities h

In this block we use the forward-filtering and backward-sampling approach of Carter and Kohn (1994)

and De Jong and Shephard (1995) to sample the time-varying parameters β∗ and stochastic volatilities

h.

Block 2(a): Sampling the time-varying parameters β∗

Conditional on the coefficients φ2 and λ, on the stochastic volatility hµt, on the first block results Ztδ

and ν∗t and on the binary indicators Mφ2, equation (26) can be rewritten as

\[
y_t = \iota_{\beta_{0t}}\sigma_{\eta_0}\beta^{*}_{0t} + \iota_{\beta_{1t}}\sigma_{\eta_1}\beta^{*}_{1t}x_{1t} + \theta(L)\mu_t, \tag{A-25}
\]

with yt = ∆ lnCt − β00 − β10Ztδ − γ∆ lnCt−1 − ρν∗t and x1t = Ztδ. Using the recursive transformation

suggested by Ullah et al. (1986) and Chib and Greenberg (1994), the model in equation (A-25) can be

transformed to

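\[
\tilde y_t = \iota_{\beta_{0t}}\sigma_{\eta_0}\tilde\beta_{0t} + \iota_{\beta_{1t}}\sigma_{\eta_1}\tilde\beta_{1t} + \omega_t\lambda + \mu_t, \tag{A-26}
\]

where $\tilde y_t$ and ωt = (ω1t, . . . , ωqt) are calculated (conditional on θ) from equations (A-2) and (A-4) and similarly

\[
\tilde\beta_{0t} = \beta^{*}_{0t} - \sum_{i=1}^{q}\theta_i\,\tilde\beta_{0,t-i}, \quad \text{with } \tilde\beta_{0t} = 0 \text{ for } t \le 0, \tag{A-27}
\]

\[
\tilde\beta_{1t} = \beta^{*}_{1t}x_{1t} - \sum_{i=1}^{q}\theta_i\,\tilde\beta_{1,t-i}, \quad \text{with } \tilde\beta_{1t} = 0 \text{ for } t \le 0. \tag{A-28}
\]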
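The recursions in (A-27)-(A-28) amount to applying the inverse MA filter to a series with zero pre-sample values. A minimal sketch is given below; the function name `ma_recursive_transform` is hypothetical, and the input `z` stands for β∗0t in (A-27) or for β∗1t x1t in (A-28).

```python
import numpy as np

def ma_recursive_transform(z, theta):
    """Compute zt_t = z_t - sum_{i=1}^{q} theta_i * zt_{t-i}, with zt_t = 0 for t <= 0."""
    z = np.asarray(z, dtype=float)
    theta = np.asarray(theta, dtype=float)
    q = theta.size
    out = np.zeros_like(z)
    for t in range(z.size):
        acc = z[t]
        # Only lags that fall inside the sample contribute (pre-sample values are zero).
        for i in range(1, min(q, t) + 1):
            acc -= theta[i - 1] * out[t - i]
        out[t] = acc
    return out

# Small worked case with q = 1 and theta_1 = 0.5.
example = ma_recursive_transform([1.0, 1.0, 1.0], [0.5])  # -> [1.0, 0.5, 0.75]
```

Applying θ(L) to the transformed series recovers the original input, which offers a simple check on an implementation.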

Substituting equation (23) in (A-27)-(A-28) yields

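\[
\tilde\beta_{0,t+1} = \beta^{*}_{0t} - \sum_{i=1}^{q}\theta_i\,\tilde\beta_{0,t+1-i} + \eta^{*}_{0t}, \tag{A-29}
\]

\[
\tilde\beta_{1,t+1} = \beta^{*}_{1t}x_{1,t+1} - \sum_{i=1}^{q}\theta_i\,\tilde\beta_{1,t+1-i} + x_{1,t+1}\eta^{*}_{1t}, \tag{A-30}
\]

such that the state space representation of the model in equations (A-26), (23) and (A-29)-(A-30) is given by

\[
\tilde y_t - \omega_t\lambda = \overbrace{\begin{bmatrix} (0 \;\; \sigma_{\eta_0} \;\; 0 \;\; \ldots \;\; 0) & (0 \;\; \sigma_{\eta_1} \;\; 0 \;\; \ldots \;\; 0) \end{bmatrix}}^{\sigma_\eta} \overbrace{\begin{bmatrix} \alpha_{0t} \\ \alpha_{1t} \end{bmatrix}}^{\alpha_t} + \mu_t, \tag{A-31}
\]

\[
\underbrace{\begin{bmatrix} \alpha_{0,t+1} \\ \alpha_{1,t+1} \end{bmatrix}}_{\alpha_{t+1}} = \underbrace{\begin{bmatrix} T_{0t} & 0 \\ 0 & T_{1t} \end{bmatrix}}_{T_t} \underbrace{\begin{bmatrix} \alpha_{0t} \\ \alpha_{1t} \end{bmatrix}}_{\alpha_t} + \underbrace{\begin{bmatrix} K_{0t} & 0 \\ 0 & K_{1t} \end{bmatrix}}_{K_t} \underbrace{\begin{bmatrix} \eta^{*}_{0t} \\ \eta^{*}_{1t} \end{bmatrix}}_{\eta_t}, \tag{A-32}
\]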


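with αi,t+1 given by

\[
\underbrace{\begin{bmatrix} \beta^{*}_{i,t+1} \\ \tilde\beta_{i,t+1} \\ \tilde\beta_{it} \\ \vdots \\ \tilde\beta_{i,t-(q-2)} \end{bmatrix}}_{\alpha_{i,t+1}}
=
\underbrace{\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 \\
x_{i,t+1} & -\theta_1 & \cdots & -\theta_{q-1} & -\theta_q \\
0 & 1 & & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}}_{T_{it}}
\underbrace{\begin{bmatrix} \beta^{*}_{it} \\ \tilde\beta_{it} \\ \vdots \\ \tilde\beta_{i,t-(q-2)} \\ \tilde\beta_{i,t-(q-1)} \end{bmatrix}}_{\alpha_{it}}
+
\underbrace{\begin{bmatrix} 1 \\ x_{i,t+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{K_{it}}
\begin{bmatrix} \eta^{*}_{it} \end{bmatrix}, \tag{A-33}
\]

for i = 0, 1 and where x0t = 1 ∀t, $\mu_t \sim N(0, e^{h_{\mu t}})$ and η∗it ∼ N (0, 1). In line with equations (23) and (A-27)-(A-28), each of the states in αt is initialized at zero.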
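To make the structure of (A-33) concrete, the following sketch builds the matrices Tit and Kit for a given vector of MA coefficients and a given value of xi,t+1. The function name `transition_matrices` and the numerical values in the example are assumptions made purely for illustration.

```python
import numpy as np

def transition_matrices(theta, x_next):
    """Build T_it ((q+1) x (q+1)) and K_it (length q+1) as laid out in (A-33).

    theta: MA coefficients (theta_1, ..., theta_q); x_next: regressor value x_{i,t+1}.
    """
    theta = np.asarray(theta, dtype=float)
    q = theta.size
    T = np.zeros((q + 1, q + 1))
    T[0, 0] = 1.0        # random walk: beta*_{t+1} = beta*_t + eta*_t
    T[1, 0] = x_next     # filtered state loads on x_{t+1} * beta*_t ...
    T[1, 1:] = -theta    # ... minus the MA-weighted lags of the filtered state
    for r in range(2, q + 1):
        T[r, r - 1] = 1.0  # shift the stored lags down by one period
    K = np.zeros(q + 1)
    K[0] = 1.0
    K[1] = x_next
    return T, K

# One transition step for q = 2 with hypothetical values.
T_mat, K_vec = transition_matrices([0.5, -0.3], x_next=2.0)
alpha_t = np.array([1.0, 0.4, -0.2])   # (beta*_t, filtered_t, filtered_{t-1})
alpha_next = T_mat @ alpha_t + K_vec * 0.1
```

In this example the second element of `alpha_next` equals 2(1.0 + 0.1) − 0.5(0.4) + 0.3(−0.2) = 1.94, which agrees with evaluating (A-28) directly at t + 1.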

Equations (A-31)-(A-32) constitute a standard linear Gaussian state space model, from which the

unknown state variables αt can be filtered using the standard Kalman filter. Sampling αt from its

conditional distribution can then be done using the multimove simulation smoother of Carter and Kohn

(1994) and De Jong and Shephard (1995). Using βi0, σηi and β∗it, the centered time-varying coefficients

βit in equation (20) can easily be reconstructed from equation (22). Note that in a restricted model

with ιβit = 0, σηi is excluded from ση and αit is dropped from the state vector αt. In this case, no

forward-filtering and backward-sampling for β∗it is needed as this can be sampled directly from its prior

distribution using equation (23).

Block 2(b): Sampling the stochastic volatilities h

Conditional on the transformed errors gjt defined under equation (A-21), on the mixture indicators κjt

and on the parameters φ1, equations (A-21) and (24)-(25) can be written in the following conditional

state space representation

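\[
\left[\, g_{jt} - \left(m_{\kappa_{jt}} - 1.2704\right) - h_{j0} \,\right] = \left[\, \sigma_{\upsilon_j} \,\right]\left[\, h^{*}_{jt} \,\right] + \left[\, \tilde\varepsilon_{jt} \,\right], \tag{A-34}
\]

\[
\left[\, h^{*}_{j,t+1} \,\right] = \left[\, 1 \,\right]\left[\, h^{*}_{jt} \,\right] + \left[\, 1 \,\right]\left[\, \upsilon^{*}_{jt} \,\right], \tag{A-35}
\]

for j = ν, µ, where $\tilde\varepsilon_{jt} = \varepsilon_{jt} - \left(m_{\kappa_{jt}} - 1.2704\right)$ is εjt centered around zero with $\mathrm{var}(\tilde\varepsilon_{jt}) = s^2_{\kappa_{jt}}$ and var(υ∗jt) = 1. In line with equation (25), the random walk component h∗jt is initialized at zero.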

As equations (A-34)-(A-35) constitute a standard linear Gaussian state space model, h∗jt can be filtered

using the Kalman filter and sampled using the multimove simulation smoother of Carter and Kohn (1994)

and De Jong and Shephard (1995). Using hj0, συj and h∗jt, the centered stochastic volatility component

hjt in equation (21) can easily be reconstructed from equation (24). Note that in a restricted model with

ιhj = 0, and hence συj = 0, no forward-filtering and backward-sampling for h∗jt is needed as this can be

sampled directly from its prior distribution using equation (25).
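For the scalar model in (A-34)-(A-35), forward filtering and backward sampling reduce to a few scalar recursions. The sketch below follows the Carter and Kohn (1994) scheme only; it is not the De Jong and Shephard (1995) smoother and it is not the authors' implementation. It assumes that initialization at zero means h∗j0 = 0, so that the first state has prior mean zero and unit variance. The function name `ffbs_random_walk` and its arguments are hypothetical.

```python
import numpy as np

def ffbs_random_walk(y, sigma, s2, rng, state_var=1.0):
    """Draw the path of a random-walk state h*_t from its conditional posterior.

    Observation: y_t = sigma * h*_t + e_t, with e_t ~ N(0, s2[t]).
    State:       h*_{t+1} = h*_t + u_t,    with u_t ~ N(0, state_var).
    """
    y = np.asarray(y, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    n = y.size
    m = np.empty(n)   # filtered means
    C = np.empty(n)   # filtered variances
    a, P = 0.0, state_var  # assumption: h*_0 = 0, hence h*_1 ~ N(0, state_var)
    for t in range(n):
        F = sigma ** 2 * P + s2[t]          # prediction error variance
        gain = P * sigma / F                # Kalman gain
        m[t] = a + gain * (y[t] - sigma * a)
        C[t] = P * (1.0 - gain * sigma)
        a, P = m[t], C[t] + state_var       # one-step-ahead prediction
    h = np.empty(n)
    h[-1] = m[-1] + np.sqrt(C[-1]) * rng.standard_normal()
    for t in range(n - 2, -1, -1):
        J = C[t] / (C[t] + state_var)       # backward smoothing gain
        mean = m[t] + J * (h[t + 1] - m[t])
        var = C[t] * (1.0 - J)
        h[t] = mean + np.sqrt(var) * rng.standard_normal()
    return h
```

Here `y` corresponds to the left-hand side of (A-34), `sigma` to συj and `s2` to the sequence of mixture variances selected by the indicators κjt. With `sigma = 0` the observations carry no information and the draws come from the random-walk prior, in line with the restricted case described above.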
