
Bayesian Methods for DSGE models, Lecture 1

Macro models as data generating processes

Kristoffer Nimark, CREI, Universitat Pompeu Fabra and Barcelona GSE

June 30, 2014


Course admin

Lectures: Mon-Fri, 9-11, Room 40.246

Prácticas (practice sessions): Mon-Thu, 15.00-16.15, Room 40.153

To contact me:

- knimark@crei.cat

- Office 23.408 (4th floor, silver building between the Econ Dept and the park)


Course overview

1. Macro models as data generating processes

2. State space models and likelihood based estimation

3. Bayesian estimation of DSGE models

4. Bayesian analysis of DSGE models

5. Structural empirical models of news, noise and imperfect information

Macro models as data generating processes

Today:

- Introduction to Bayesian estimation

- Theoretical models as likelihood functions

- Micro-founded macro models

- Solving linear rational expectations models

Bayesian statistics

Bayesian statistics

Bayesians used to be fringe types, but Bayesian methods now dominate applied macroeconomics

- This is largely due to increased computing power

Bayesian methods have several advantages:

- They facilitate incorporating information from outside the sample

- Easy to compute confidence/probability intervals of functions of parameters

- Good (i.e. known) small sample properties

Why probabilistic models?

Is the world characterized by randomness?

- Is the weather random?

- Is a coin flip random?

- ECB interest rates?

It is difficult to say with certainty whether something is “truly” random.

Two schools of statistics

Probability and statistics: What’s the difference?

Probability is a branch of mathematics

- There is little disagreement about whether the theorems follow from the axioms

Statistics is an inversion problem: What is a good probabilistic description of the world, given the observed outcomes?

- There is some disagreement about how to interpret probabilistic statements, and about how we should interpret data when we make inference about unobservable parameters

Two schools of thought

What is the meaning of probability, randomness and uncertainty?

- The classical (or frequentist) view is that probability corresponds to the frequency of occurrence in repeated experiments

- The Bayesian view is that probabilities are statements about our state of knowledge, i.e. a subjective view.

The difference has implications for how we interpret estimated statistical models, and there is no general agreement about which approach is “better”.

Frequentist vs Bayesian statistics

Variance of estimator vs variance of parameter

- Bayesians think of parameters as having distributions, while frequentists conduct inference by thinking about the variance of an estimator

Frequentist confidence intervals:

- If the point estimates are the truth, and with repeated draws from the population of equal sample length, what is the interval that θ lies in 95% of the time?

Bayesian probability intervals:

- Conditional on the observed data, a prior distribution of θ and a functional form (i.e. model), what is the shortest interval that contains θ with 95% probability?

Coin flip example

After flipping a coin 10 times and counting the number of heads, what is the probability that the next flip will come up heads?

- A Bayesian with a uniform prior would say that the probability of the next flip coming up heads is (x + 1)/12, where x is the number of heads in the observed sample (Laplace's rule of succession)

- A frequentist cannot answer this question: a frequentist makes probabilistic statements about the likelihood of observing a particular realized sample, given a null hypothesis about the parameter that governs the probability of heads vs tails
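The Bayesian answer above is easy to verify in a few lines of code. This is an illustrative sketch (the function name and the Beta(a, b) parameterization are my own): a Beta(a, b) prior on the heads probability gives a Beta(a + x, b + n − x) posterior, and the predictive probability of heads is the posterior mean.

```python
def predictive_heads(x, n, a=1.0, b=1.0):
    """P(next flip is heads | x heads in n flips) under a Beta(a, b) prior.

    The posterior is Beta(a + x, b + n - x); the predictive probability of
    heads is its mean. With a uniform prior (a = b = 1) this is Laplace's
    rule of succession, (x + 1) / (n + 2).
    """
    return (a + x) / (a + b + n)
```

For example, seven heads in ten flips gives predictive_heads(7, 10) = 8/12 under the uniform prior, not the frequentist point estimate 7/10.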

The subjective view of probability

Important: Probabilities can be viewed as statements about our knowledge of a parameter, even if the parameter is a constant

- E.g. treating risk aversion as a random variable does not imply that we think of it as varying across time.

Subjective does not mean that anything goes!

The main components in Bayesian inference

Bayes’ Rule

Deriving Bayes’ Rule

p(A, B) = p(A | B)p(B)

Symmetrically,

p(A, B) = p(B | A)p(A)

implying

p(A | B)p(B) = p(B | A)p(A)

or

p(A | B) = p(B | A)p(A) / p(B)

which is known as Bayes’ Rule.

Data and parameters

The purpose of Bayesian analysis is to use the data y to learn about the “parameters” θ

- Parameters of a statistical model

- Or anything not directly observed

Replace A and B in Bayes’ rule with θ and y to get

p(θ | y) = p(y | θ)p(θ) / p(y)

The probability density p(θ | y) then describes what we know about θ, given the data.

Parameters as random variables

Since p(θ | y) is a density function, we thus treat θ as a random variable with a distribution.

This is the subjective view of probability

- Important: the randomness of a parameter is subjective in the sense of depending on what information is available to the person making the statement.

The density p(θ | y) is proportional to the prior times the likelihood function

p(θ | y) ∝ p(y | θ) p(θ)

(posterior ∝ likelihood function × prior)

But what exactly is a prior and a likelihood function?

The prior

The prior summarizes the investigator's knowledge about the parameters θ before observing the data y.

In practice, we need to choose functional forms and select “hyperparameters” that reflect this prior knowledge.

Example: Let's say that θ is 2-dimensional

- θ1 ∼ U[−1, 1]

- θ2 ∼ N(µθ2, σ²θ2)

−1, 1, µθ2 and σ²θ2 would then be the hyperparameters

The likelihood function

The likelihood function is the probability density of observing the data y conditional on a statistical model and the parameters θ.

For instance, if the data y conditional on θ is normally distributed we have that

p(y | θ) = (2π)^(−T/2) |Ω⁻¹|^(1/2) exp[−(1/2)(y − µ)'Ω⁻¹(y − µ)]

where Ω and µ are either elements or functions of θ.
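As a numerical check on the formula, the Gaussian likelihood can be evaluated in code. A minimal sketch (the function name is mine), working in logs for numerical stability:

```python
import numpy as np

def gaussian_loglik(y, mu, Omega):
    """log p(y | theta) for y ~ N(mu, Omega), i.e. the log of
    (2*pi)^(-T/2) |Omega^-1|^(1/2) exp(-0.5 (y - mu)' Omega^-1 (y - mu))."""
    T = len(y)
    e = y - mu
    _, logdet = np.linalg.slogdet(Omega)   # log|Omega| = -log|Omega^-1|
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet
                   + e @ np.linalg.solve(Omega, e))
```

In DSGE applications µ and Ω are functions of θ, so this is the building block that the estimation methods later in the course evaluate repeatedly.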

The prior and the likelihood function

Remember how we derived Bayes’ rule:

p(θ, y) = p(y | θ)p(θ)

The model determines both the likelihood function and what the prior is “about”.

The posterior density p(θ | y)

The posterior density is often the object of fundamental interest in Bayesian estimation.

- The posterior is the “result”.

- We are interested in the entire conditional distribution of the parameters θ

Model comparison

Bayesian statisticians generally do not “reject” models

Instead: assign probabilities to different models

- We can index different models by i = 1, 2, ..., m

p(θ | y, Mi) = p(y | θ, Mi)p(θ | Mi) / p(y | Mi)

Model comparison

The posterior model probability is given by

p(Mi | y) = p(y | Mi)p(Mi) / p(y)

where p(y | Mi) is called the marginal likelihood. It can be computed from

∫ p(θ | y, Mi) dθ = ∫ [p(y | θ, Mi)p(θ | Mi) / p(y | Mi)] dθ

by using that ∫ p(θ | y, Mi) dθ = 1, so that

p(y | Mi) = ∫ p(y | θ, Mi)p(θ | Mi) dθ
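The last expression suggests a simple (if often inefficient) way to approximate the marginal likelihood: average the likelihood over draws from the prior. A sketch under my own assumed example, y ∼ N(θ, 1) with prior θ ∼ N(0, 1), chosen only because its marginal likelihood is known analytically (y ∼ N(0, 2)):

```python
import numpy as np

def marginal_likelihood_mc(loglik, prior_draws):
    """Monte Carlo estimate of p(y|M) = integral of p(y|theta) p(theta|M) dtheta,
    averaging the likelihood over prior draws (log-sum-exp for stability)."""
    ll = np.array([loglik(th) for th in prior_draws])
    m = ll.max()
    return np.exp(m) * np.mean(np.exp(ll - m))

# Example: y ~ N(theta, 1), prior theta ~ N(0, 1)  =>  p(y|M) is the N(0, 2) density
rng = np.random.default_rng(0)
y = 0.5
est = marginal_likelihood_mc(
    lambda th: -0.5 * np.log(2 * np.pi) - 0.5 * (y - th) ** 2,
    rng.normal(0.0, 1.0, 200_000))
```

In realistic models this prior-sampling estimator has high variance; better estimators (e.g. based on posterior draws) come later in the course.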

The posterior odds ratio

The posterior odds ratio is the relative probability of two models conditional on the data

p(Mi | y) / p(Mj | y) = p(y | Mi)p(Mi) / [p(y | Mj)p(Mj)]

It is made up of

- The prior odds ratio p(Mi)/p(Mj)

- The Bayes factor p(y | Mi)/p(y | Mj)

Concepts of Bayesian analysis

Prior densities, likelihood functions, posterior densities, posterior odds ratios, Bayes factors etc. are part of almost all Bayesian analysis.

- Data, the choice of prior densities and the likelihood function are the inputs into the analysis

- Posterior densities etc. are the outputs

The rest of the course is about how to choose these inputs and how to compute the outputs

Bayesian computation

Bayesian computation

More specifics about the outputs

- Posterior mean E(θ | y) = ∫ θ p(θ | y) dθ

- Posterior variance var(θ | y) = E[(θ − E(θ | y))² | y]

- Posterior prob(θi > 0)

These objects can all be written in the form

E(g(θ) | y) = ∫ g(θ) p(θ | y) dθ

where g(θ) is the function of interest.

Posterior simulation

There are only a few cases when the expected value of functions of interest can be derived analytically.

- Instead, we rely on posterior simulation and Monte Carlo integration.

- Posterior simulation consists of constructing a sample from the posterior distribution p(θ | y)

Monte Carlo integration then uses that

ḡS = (1/S) Σs=1..S g(θ(s))

and that limS→∞ ḡS = E(g(θ) | y), where θ(s) is a draw from the posterior distribution.
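As a toy illustration of Monte Carlo integration (the example values are my own, not from the slides): if the posterior were N(0, 1) and g(θ) = θ², the Monte Carlo average should be close to E(θ² | y) = 1.

```python
import numpy as np

rng = np.random.default_rng(42)
theta = rng.normal(0.0, 1.0, 100_000)   # stand-in for posterior draws theta^(s)
g_bar = np.mean(theta ** 2)             # (1/S) sum_s g(theta^(s)) with g(theta) = theta^2
```

The approximation error shrinks at rate 1/√S, regardless of the dimension of θ.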

The end-product of Bayesian statistics

Most of Bayesian econometrics consists of simulating distributions of parameters using numerical methods.

- A simulated posterior is a numerical approximation to p(θ | y)

- We rely on ergodicity, i.e. that the moments of the constructed sample correspond to the moments of the distribution p(θ | y)

The most popular (and general) procedure to simulate the posterior is called the Random Walk Metropolis Algorithm
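A minimal sketch of the Random Walk Metropolis algorithm for a scalar parameter (function names and tuning constants are mine; the algorithm is treated properly later in the course): propose θ' = θ + step·ε and accept with probability min(1, p(θ'|y)/p(θ|y)), which only requires the log posterior up to a constant.

```python
import numpy as np

def rw_metropolis(log_post, theta0, n_draws, step=1.0, seed=0):
    """Random Walk Metropolis for a scalar theta. log_post is the log
    posterior kernel, i.e. log p(theta|y) up to an additive constant."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    theta, lp = theta0, log_post(theta0)
    for s in range(n_draws):
        prop = theta + step * rng.standard_normal()   # random walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:       # accept w.p. min(1, ratio)
            theta, lp = prop, lp_prop
        draws[s] = theta
    return draws

# Example: sample a N(2, 1) "posterior" and discard a burn-in
draws = rw_metropolis(lambda th: -0.5 * (th - 2.0) ** 2, 0.0, 50_000)
```

The ergodicity property mentioned above is what lets us treat the (autocorrelated) chain's sample moments as estimates of posterior moments.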

Bayesian DSGE models

“Modern macro models” (Kocherlakota)

1. Specifies constraints for households, technologies for firms and resource constraints for the whole economy

2. Specifies household preferences and firm objectives

3. Assumes forward-looking behavior

4. Includes stochastic processes

5. Is a model of the entire economy: no prices are exogenous

Bayesian DSGE models

A strategy for estimating structural macro models:

- Specify the problems of the representative agent and firm and the technologies at their disposal

- Log-linearize the model to find approximate first order conditions and budget constraints

- Estimate the parameters of the model using likelihood based methods

- “Likelihood based” means that we specify a complete statistical model that is assumed to be the data generating process

Bayesian estimation

- Combine prior and sample information in a transparent manner

A minimalistic DSGE model

The building blocks of a minimalistic DSGE model

- A representative agent who consumes and supplies labor

- A representative firm that sets prices to maximize profit

- A central bank that sets nominal interest rates

- Exogenous shocks (or “wedges”)

Preferences and technology

The representative agent’s objective

In period t the representative agent maximizes expected future utility

Et Σs=0..∞ β^s U(Ct+s(i), Nt+s(i))

where Ct is consumption and Nt is supplied hours of labor.

Agents are assumed to be forward looking, but also to value the present more than the future (0 < β < 1).

The representative agent’s objective

Period t utility

U(Ct(i), Nt(i)) = Ct(i)^(1−γ)/(1 − γ) − Nt(i)^(1+ϕ)/(1 + ϕ)

Main assumptions: Households like to consume but they do not like to work.

- Decreasing marginal utility of consumption

- Increasing marginal disutility of working

But we also impose specific functional forms.

The decisions made by the representative agent

- How much to consume and how much to save (Euler equation)

Uc(Ct) = βEt[Rt (Pt/Pt+1) Uc(Ct+1)]

- How much to work:

(Wt/Pt) Uc(Ct) = Nt^ϕ

Production of goods

Output of firm j ∈ (0, 1) is produced using labor as the sole input

Yt(j) = exp(at)Nt(j)

where at is an exogenous labor-augmenting technology process.

Price setting

Prices can be reset only infrequently (with probability 1 − θ each period).

This results in a (linearized) Phillips curve

πt = βEtπt+1 + κ(yt − xt)

where xt is potential output.

Monopolistic competition: Each firm produces a unique good and has some market power.

Monetary policy

Monetary policy is often represented by a simple Taylor-type rule

rt = φy yt + φπ πt + φr rt−1 + urt

This can be viewed as a reduced form representation of an optimizing central bank and fits the data quite well.

We will simplify the model further by assuming that φy = φr = 0.

The linearized structural system

After linearizing, the main model equations are given by

πt = βEtπt+1 + κ(yt − xt)

yt = Etyt+1 − σ(it − Etπt+1)

it = φπt

xt = ρxt−1 + uxt,  uxt ∼ N(0, σ²u)

where πt, yt, xt, it are inflation, output, potential output and the nominal interest rate respectively.

We want to estimate the parameters θ = (ρ, γ, κ, φ, σx, σy, σπ).

Solving the model

3 ways to solve a linear model

Solving a model using full information rational expectations as the equilibrium concept involves integrating out the expectations terms from the structural equations of the model, replacing agents' expectations with the mathematical expectation conditional on the state of the model.

There are three different ways of doing this:

1. Method of undetermined coefficients; can be very quick when feasible and illustrates the fixed point nature of the rational expectations solution.

2. Replacing expectations with linear projections onto observable variables.

3. Decoupling the stable and unstable dynamics of the model and setting the unstable part to zero.

The 3 equation NK model

As a vehicle to demonstrate the different solution methods, we will use a simple New Keynesian model

πt = βEtπt+1 + κ(yt − xt)

yt = Etyt+1 − σ(it − Etπt+1)

it = φπt

xt = ρxt−1 + ut,  ut ∼ N(0, σ²u)

where πt, yt, xt, it are inflation, output, potential output and the nominal interest rate respectively.

- A single variable, potential output xt, serves as the state.

Method I: Method of undetermined coefficients

Pros

- The method is quick when feasible

- It illustrates well the fixed point nature of rational expectations equilibria.

Cons

- Difficult to implement in larger models

Method of undetermined coefficients

πt = βEtπt+1 + κ(yt − xt)

yt = Etyt+1 − σ(it − Etπt+1)

it = φπt

xt = ρxt−1 + ut,  ut ∼ N(0, σ²u)

Start by substituting the interest rate rule into the Euler equation (in what follows, σ is written as 1/γ and β is set to 1 to simplify the algebra):

xt = ρxt−1 + uxt

yt = Et(yt+1) − (1/γ)[φπ πt − Et(πt+1)]

πt = Et(πt+1) + κ[yt − xt]

Solving the model using the method of undetermined coefficients

Conjecture that the solution can be put in the form

xt = ρxt−1 + uxt

yt = a xt

πt = b xt

Why is this a good guess?

Solving the model using the method of undetermined coefficients

Substitute the conjectured form of the solution into the structural equations

a xt = aρxt − (1/γ)[φπ b xt − bρxt]

b xt = bρxt + κ[a xt − xt]

where we used that xt = ρxt−1 + uxt implies that E[xt+1 | xt] = ρxt

Solving the model using the method of undetermined coefficients

Equate the coefficients on the right and left hand sides

a = aρ − (1/γ)φπ b + (1/γ)bρ

b = bρ + κ[a − 1]

or

[ 1 − ρ    (1/γ)(φπ − ρ) ] [ a ]   [  0 ]
[  −κ         1 − ρ      ] [ b ] = [ −κ ]

Solving the model using the method of undetermined coefficients

Solve for a and b:

[ a ]   [ 1 − ρ    (1/γ)(φπ − ρ) ]⁻¹ [  0 ]
[ b ] = [  −κ         1 − ρ      ]   [ −κ ]

or

a = −κ(ρ − φπ)/c,  b = κγ(ρ − 1)/c

where c = γ − κρ − 2γρ + κφπ + γρ² (> 0 whenever φπ > ρ)

The solved model

The solved model is of the form

xt = ρxt−1 + uxt

yt = [−κ(ρ − φπ)/c] xt

πt = [κγ(ρ − 1)/c] xt

where c = γ − κρ − 2γρ + κφπ + γρ² > 0
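The 2×2 system for (a, b) is easy to solve numerically as a check on the algebra. A sketch (the function name and the illustrative parameter values are mine; β is set to 1 as in the derivation):

```python
import numpy as np

def undetermined_coeffs(rho, gamma, kappa, phi):
    """Solve for (a, b) in y_t = a x_t, pi_t = b x_t from
    (1-rho) a + (1/gamma)(phi-rho) b = 0  and  -kappa a + (1-rho) b = -kappa."""
    A = np.array([[1.0 - rho, (phi - rho) / gamma],
                  [-kappa,    1.0 - rho]])
    a, b = np.linalg.solve(A, np.array([0.0, -kappa]))
    return a, b
```

For ρ = 0.9, γ = 1, κ = 0.1, φπ = 1.5, this gives c = 0.07 and hence a = 6/7 and b = −1/7, matching the closed form above.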

Method II: Replacing expectations with linearprojections

The second method uses the fact that projecting future values of variables on observables gives optimal expectations (in the sense of minimum error variance) if the observables span the space of the state.

How does it work?

- Replace Etπt+1 and Etyt+1 with linear projections of these variables on current inflation.

- There is nothing special about inflation. Projecting onto current output would also work.

But first we need to know more about projections.

Inner product spaces

A real vector space H is said to be an inner product space if for each pair of elements x and y in H there is a number 〈x, y〉, called the inner product of x and y, such that

〈x, y〉 = 〈y, x〉

〈x + y, z〉 = 〈x, z〉 + 〈y, z〉 for all x, y, z ∈ H

〈αx, y〉 = α〈x, y〉 for all x, y ∈ H and α ∈ R

〈x, x〉 ≥ 0 for all x ∈ H

〈x, x〉 = 0 if and only if x = 0

The projection theorem

If M is a closed subspace of the Hilbert space H and x ∈ H, then

(i) there is a unique element x̂ ∈ M such that

‖x − x̂‖ = inf_{y∈M} ‖x − y‖

and

(ii) x̂ ∈ M and ‖x − x̂‖ = inf_{y∈M} ‖x − y‖ if and only if x̂ ∈ M and (x − x̂) ∈ M⊥, where M⊥ is the orthogonal complement to M in H.

‖·‖ is defined as √〈·, ·〉

The element x̂ is called the orthogonal projection of x onto M.

The space L2 (Ω,F ,P)

The space L²(Ω, F, P) consists of all collections C of random variables X defined on the probability space (Ω, F, P) satisfying the condition

EX² = ∫Ω X²(ω)P(dω) < ∞

and we define the inner product of this space as

〈X, Y〉 = E(XY) for any X, Y ∈ C

Least squares estimation via the projection theorem

The inner product of the space L² corresponds to a covariance ⇒ the orthogonal projection will be the minimum variance estimate, since

‖x − x̂‖ = √(E(x − x̂)(x − x̂)')

- Both the information set and the variables we are trying to predict must be elements of the relevant space

Least squares estimation via the projection theorem

To find the estimate x̂ as a linear function βy of y, simply use that

〈x − βy, y〉 = E[(x − βy)y'] = 0

and solve for β:

β = E(xy')[E(yy')]⁻¹

The advantage of this approach is that once you have made sure that the variables y and x are in a well defined inner product space, there is no need to minimize the variance directly. The projection theorem ensures that an estimate with orthogonal errors is the (linear) minimum variance estimate.
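The sample analogue of this formula is just least squares. A small simulated check (the data generating process x = 2y + noise is my own choice), verifying both the coefficient and the orthogonality of the projection error:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
y = rng.standard_normal(T)
x = 2.0 * y + 0.5 * rng.standard_normal(T)   # x = 2y + noise

beta = (x @ y) / (y @ y)   # sample analogue of E(xy')[E(yy')]^(-1)
resid = x - beta * y       # projection error, orthogonal to y by construction
```

The orthogonality condition 〈x − βy, y〉 = 0 holds exactly in the sample, without any explicit variance minimization.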

Two useful properties of linear projections

1. If two random variables X and Y are Gaussian, then the projection of Y onto X coincides with the conditional expectation E(Y | X).

2. If X and Y are not Gaussian, the linear projection of Y onto X is the minimum variance linear prediction of Y given X.

Replacing expectations with linear projections

We will use that in equilibrium

E(πt+1 | πt) = [cov(πt, πt+1)/var(πt)] πt

E(yt+1 | πt) = [cov(πt, yt+1)/var(πt)] πt

if the innovations ut to xt are Gaussian.

Replacing expectations with linear projections

Let

c0 πt = E*(πt+1 | πt)

d0 πt = E*(yt+1 | πt)

denote initial candidate projections of expected inflation and output on current inflation. We can then write the structural equations as

πt = βc0 πt + κ(yt − xt)

yt = d0 πt − σ(φπt − c0 πt)

Replacing expectations with linear projections

Put the whole system in matrix form

[ xt ]   [ 1        0           0 ]⁻¹ ( [ ρ 0 0 ] [ xt−1 ]   [ 1 ]      )
[ πt ] = [ κ     1 − βc0       −κ ]   ( [ 0 0 0 ] [ πt−1 ] + [ 0 ] ut   )
[ yt ]   [ 0  −d0 + σφ − σc0    1 ]   ( [ 0 0 0 ] [ yt−1 ]   [ 0 ]      )

or

Xt = AXt−1 + Cut

Solution algorithm

1. Make an initial guess of the candidate projection coefficients c0 and d0

2. Compute the implied covariances of current inflation with future inflation and output using

ΣXX = E[XtXt'],  ΣXX = AΣXXA' + CC'

and

E[Xt+1Xt'] = AΣXX

3. Replace cs and ds with cs+1 and ds+1 in the candidate projections:

cs+1 = cov(πt, πt+1)/var(πt)

ds+1 = cov(πt, yt+1)/var(πt)

using the covariances from Step 2.

4. Repeat Steps 2-3 until cs and ds converge.
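A sketch of the algorithm in code (the function names and parameter values are mine; σ is set to 1/γ = 1 and β to 1 so the fixed point can be checked against the undetermined coefficients solution, where c converges to ρ and d to aρ/b). The Lyapunov equation in Step 2 is solved in vectorized form:

```python
import numpy as np

def update(c, d, rho=0.9, kappa=0.1, phi=1.5, beta=1.0, sigma=1.0):
    """One pass of Steps 2-3: build X_t = A X_{t-1} + C u_t for the candidate
    (c, d), compute the implied covariances, and return the updated (c, d)."""
    A0 = np.array([[1.0,   0.0,                          0.0],
                   [kappa, 1.0 - beta * c,              -kappa],
                   [0.0,   sigma * phi - sigma * c - d,  1.0]])
    B = np.zeros((3, 3)); B[0, 0] = rho
    A = np.linalg.solve(A0, B)
    C = np.linalg.solve(A0, np.array([1.0, 0.0, 0.0]))
    # Sigma = A Sigma A' + CC', solved via (I - A kron A) vec(Sigma) = vec(CC')
    Sigma = np.linalg.solve(np.eye(9) - np.kron(A, A),
                            np.outer(C, C).ravel()).reshape(3, 3)
    M = A @ Sigma                                   # E[X_{t+1} X_t']
    return M[1, 1] / Sigma[1, 1], M[2, 1] / Sigma[1, 1]

def solve_by_projection(c=0.0, d=0.0, tol=1e-10, max_iter=5_000, **kw):
    """Step 4: iterate Steps 2-3 until the candidate projections converge."""
    for _ in range(max_iter):
        c_new, d_new = update(c, d, **kw)
        if abs(c_new - c) + abs(d_new - d) < tol:
            return c_new, d_new
        c, d = c_new, d_new
    return c, d
```

At the default parameters the iteration converges to c = ρ = 0.9 and d = aρ/b = −5.4, so the implied loadings match the undetermined coefficients solution.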

Method III: Stable/unstable decoupling

Originally due to Blanchard and Kahn (1980)

- Computational aspects of the method have been further developed by others, for instance Klein (2000).

- The most accessible reference is probably Soderlind (1999), who also has code posted on his web site.

The method has several advantages:

- Fast

- Provides conditions for when a solution exists

- Provides conditions for when the solution is unique.

Method III: Stable/unstable decoupling

Start by putting the model into matrix form

[ 1 0 0 ] [ xt+1   ]   [ ρ   0   0 ] [ xt ]   [ 1 ]
[ 0 β 0 ] [ Etπt+1 ] = [ κ   1  −κ ] [ πt ] + [ 0 ] ut+1
[ 0 σ 1 ] [ Etyt+1 ]   [ 0  σφ   1 ] [ yt ]   [ 0 ]

or

A0 [x1t+1; Etx2t+1] = A1 [x1t; x2t] + C1 ut+1

- x1t is a vector containing the pre-determined and/or exogenous variables (i.e. xt)

- x2t is a vector containing the forward-looking (“jump”) variables (i.e. πt and yt).

Method III: Stable/unstable decoupling

Pre-multiply both sides of

A0 [x1t+1; Etx2t+1] = A1 [x1t; x2t] + C1 ut+1

by A0⁻¹ to get

[x1t+1; Etx2t+1] = A [x1t; x2t] + C ut+1

where A = A0⁻¹A1 and C = A0⁻¹C1.

For the model to have a unique stable solution, the number of stable eigenvalues of A must be equal to the number of exogenous/pre-determined variables.

Method III: Stable/unstable decoupling

Use a Schur decomposition to get

A = ZTZ^H

where T is upper block triangular

T = [ T11  T12 ]
    [  0   T22 ]

and Z is a unitary matrix, so that Z^H Z = ZZ^H = I (⟹ Z^H = Z⁻¹).

- For any square matrix W, W⁻¹AW is a so-called similarity transformation of A.

- Similarity transformations do not change the eigenvalues of a matrix

- We can always choose Z and T so that the unstable eigenvalues of A are shared with T22

Method III: Stable/unstable decoupling

Define the auxiliary variables

[θt; δt] = Z^H [x1t; x2t]

We can then rewrite the system above as

Z^H [x1t+1; Etx2t+1] = Z^H ZTZ^H [x1t; x2t]

or equivalently

Et [ θt+1 ]   [ T11  T12 ] [ θt ]
   [ δt+1 ] = [  0   T22 ] [ δt ]

since Z^H Z = I.

Method III: Stable/unstable decoupling

For this system to be stable, the auxiliary variables associated with the unstable roots in T22 must be zero for all t. (WHY?)

Imposing δt = 0 ∀t reduces the relevant state dynamics to

θt = T11 θt−1

To get back the original variables we simply use that

[x1t; x2t] = [Z11; Z21] θt

or

[x1t; x2t] = [Z11; Z21] Z11⁻¹ x1t

which is the solution to the model. It is in the form

x1t = M x1t−1 + εt

x2t = G x1t

where M = Z11 T11 Z11⁻¹ (= ρ in our example) and G = Z21 Z11⁻¹.
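A compact implementation of the method, using an ordered real Schur decomposition (scipy.linalg.schur with sort='iuc' orders the eigenvalues inside the unit circle first). The function name and the numerical test model (the NK model above with β = 1, σ = 1, ρ = 0.9, κ = 0.1, φπ = 1.5) are my own choices:

```python
import numpy as np
from scipy.linalg import schur

def solve_bk(A, n1):
    """Blanchard-Kahn solution of [x1_{t+1}; E_t x2_{t+1}] = A [x1_t; x2_t] + ...
    n1 = number of pre-determined variables; requires exactly n1 stable
    eigenvalues. Returns (M, G): x1_t = M x1_{t-1} + eps_t, x2_t = G x1_t."""
    T, Z, sdim = schur(A, sort='iuc')          # stable block ordered first
    if sdim != n1:
        raise ValueError("Blanchard-Kahn condition fails: %d stable roots" % sdim)
    Z11, Z21 = Z[:n1, :n1], Z[n1:, :n1]
    Z11inv = np.linalg.inv(Z11)
    M = Z11 @ T[:n1, :n1] @ Z11inv             # = Z11 T11 Z11^(-1)
    G = Z21 @ Z11inv                           # = Z21 Z11^(-1)
    return M, G

# The 3-equation NK model with beta = 1, sigma = 1, rho = 0.9, kappa = 0.1, phi = 1.5
rho, kappa, phi, beta, sigma = 0.9, 0.1, 1.5, 1.0, 1.0
A0 = np.array([[1.0, 0.0,   0.0],
               [0.0, beta,  0.0],
               [0.0, sigma, 1.0]])
A1 = np.array([[rho,   0.0,         0.0],
               [kappa, 1.0,        -kappa],
               [0.0,   sigma * phi, 1.0]])
M, G = solve_bk(np.linalg.solve(A0, A1), n1=1)
```

Here M recovers the state transition ρ = 0.9, and G the loadings (b, a) = (−1/7, 6/7) found earlier by undetermined coefficients.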

The model as a state space system

Xt = AXt−1 + Cut

Zt = DXt

where

Xt = xt,  A = ρ,  Cut = uxt

Zt = [ rt ]       [ φπ κγ(ρ − 1)/c ]
     [ πt ],  D = [ κγ(ρ − 1)/c    ]
     [ yt ]       [ −κ(ρ − φπ)/c   ]
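Putting the pieces together, the solved model can be simulated directly as a state space system. A sketch with my own illustrative parameter values (ρ = 0.9, γ = 1, κ = 0.1, φπ = 1.5, σu = 1):

```python
import numpy as np

def simulate_nk(T, rho=0.9, gamma=1.0, kappa=0.1, phi=1.5, sigma_u=1.0, seed=0):
    """Simulate X_t = rho X_{t-1} + u_t and Z_t = D X_t,
    Z_t = (r_t, pi_t, y_t)', using the solved-model loadings."""
    c = gamma - kappa * rho - 2 * gamma * rho + kappa * phi + gamma * rho ** 2
    b = kappa * gamma * (rho - 1.0) / c     # inflation loading on x_t
    a = -kappa * (rho - phi) / c            # output loading on x_t
    D = np.array([phi * b, b, a])           # r_t = phi * pi_t
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + sigma_u * rng.standard_normal()
    return x[:, None] * D                   # T x 3 array: (r_t, pi_t, y_t)
```

This state space form is the input to the Kalman filter and the likelihood-based estimation covered in the next lecture.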

Additional references

Bayesian econometrics:

- Bayesian Econometrics, Gary Koop, Wiley 2003.

- Contemporary Bayesian Econometrics and Statistics, John Geweke, Wiley 2005.

New Keynesian macro models:

- Monetary Policy, Inflation, and the Business Cycle: An Introduction to the New Keynesian Framework, Jordi Gali, Princeton 2008.
