Bayesian Methods for DSGE models
Lecture 1: Macro models as data generating processes
Kristoffer Nimark
CREI, Universitat Pompeu Fabra and Barcelona GSE
June 30, 2014
Course admin
Lectures: Mon-Fri, 9-11, Room 40.246
Practicas: Mon-Thur, 15.00-16.15, Room 40.153
To contact me:
- knimark@crei.cat
- Office 23.408 (4th floor, silver building between Econ Dept and the park)
Course overview
1. Macro models as data generating processes
2. State space models and likelihood based estimation
3. Bayesian estimation of DSGE models
4. Bayesian analysis of DSGE models
5. Structural empirical models of news, noise and imperfect information
Macro models as data generating processes
Today:
- Introduction to Bayesian estimation
- Theoretical models as likelihood functions
- Micro-founded macro models
- Solving linear rational expectations models
Bayesian statistics
Bayesians used to be fringe types, but Bayesian methods now dominate applied macroeconomics.
- This is largely due to increased computing power
Bayesian methods have several advantages:
- They facilitate incorporating information from outside of the sample
- It is easy to compute confidence/probability intervals for functions of parameters
- They have good (i.e. known) small sample properties
Why probabilistic models?
Is the world characterized by randomness?
- Is the weather random?
- Is a coin flip random?
- ECB interest rates?
It is difficult to say with certainty whether something is “truly” random.
Two schools of statistics
Probability and statistics: What’s the difference?
Probability is a branch of mathematics.
- There is little disagreement about whether the theorems follow from the axioms
Statistics is an inversion problem: what is a good probabilistic description of the world, given the observed outcomes?
- There is some disagreement about how to interpret probabilistic statements and about how we should interpret data when we make inference about unobservable parameters
Two schools of thought
What is the meaning of probability, randomness and uncertainty?
- The classical (or frequentist) view is that probability corresponds to the frequency of occurrence in repeated experiments
- The Bayesian view is that probabilities are statements about our state of knowledge, i.e. a subjective view
The difference has implications for how we interpret estimated statistical models, and there is no general agreement about which approach is “better”.
Frequentist vs Bayesian statistics
Variance of estimator vs variance of parameter
- Bayesians think of parameters as having distributions, while frequentists conduct inference by thinking about the variance of an estimator
Frequentist confidence intervals:
- If the point estimate were the truth, and we took repeated draws of equal sample length from the population, what is the interval that the estimate of θ would lie in 95% of the time?
Bayesian probability intervals:
- Conditional on the observed data, a prior distribution of θ and a functional form (i.e. model), what is the shortest interval that contains θ with 95% probability?
Coin flip example
After flipping a coin 10 times and counting the number of heads, what is the probability that the next flip will come up heads?
- A Bayesian with a uniform prior would say that the probability of the next flip coming up heads is (x + 1)/12, where x is the number of times the flip came up heads in the observed sample (this is the posterior mean of the heads probability)
- A frequentist cannot answer this question: a frequentist makes probabilistic statements about the likelihood of observing a particular realized sample given a null hypothesis about the parameter that governs the probability of heads vs tails
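The Bayesian answer can be checked with a few lines of code. This is a minimal sketch: the uniform prior is the conjugate Beta(1, 1), and the sample of 7 heads in 10 flips is a made-up example, not from the lecture.

```python
# Posterior predictive for the coin flip under a uniform prior.
# Hypothetical sample: x = 7 heads in n = 10 flips.
x, n = 7, 10
# Uniform prior = Beta(1, 1); Bernoulli likelihood => posterior Beta(x+1, n-x+1)
a, b = x + 1, n - x + 1
# P(next flip = heads | data) is the posterior mean of the heads probability
p_heads = a / (a + b)
print(p_heads)  # (x + 1) / (n + 2) = 8/12
```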
The subjective view of probability
Important: probabilities can be viewed as statements about our knowledge of a parameter, even if the parameter is a constant.
- E.g. treating risk aversion as a random variable does not imply that we think of it as varying across time
Subjective does not mean that anything goes!
The main components in Bayesian inference
Bayes’ Rule
Deriving Bayes’ Rule
p(A, B) = p(A | B)p(B)

Symmetrically,

p(A, B) = p(B | A)p(A)

implying

p(A | B)p(B) = p(B | A)p(A)

or

p(A | B) = p(B | A)p(A) / p(B)

which is known as Bayes' Rule.
Data and parameters
The purpose of Bayesian analysis is to use the data y to learn about the “parameters” θ:
- parameters of a statistical model
- or anything not directly observed
Replace A and B in Bayes' rule with θ and y to get

p(θ | y) = p(y | θ)p(θ) / p(y)

The probability density p(θ | y) then describes what we know about θ, given the data.
Parameters as random variables
Since p(θ | y) is a density function, we thus treat θ as a random variable with a distribution.
This is the subjective view of probability.
- Important: the randomness of a parameter is subjective in the sense of depending on what information is available to the person making the statement
The density p(θ | y) is proportional to the prior times the likelihood function:

p(θ | y) ∝ p(y | θ) p(θ)
(posterior ∝ likelihood function × prior)

But what exactly is a prior and a likelihood function?
The prior
The prior summarizes the investigator's knowledge about the parameters θ before observing the data y.
In practice, we need to choose functional forms and select “hyperparameters” that reflect this prior knowledge.
Example: let's say that θ is 2-dimensional
- θ₁ ∼ U[−1, 1]
- θ₂ ∼ N(μ_θ₂, σ²_θ₂)
−1, 1, μ_θ₂ and σ²_θ₂ would then be the hyperparameters.
The likelihood function
The likelihood function is the probability density of observing the data y conditional on a statistical model and the parameters θ.
For instance, if the data y conditional on θ are normally distributed, we have that

p(y | θ) = (2π)^(−T/2) |Ω⁻¹|^(1/2) exp[−(1/2)(y − μ)′Ω⁻¹(y − μ)]

where Ω and μ are either elements or functions of θ.
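As a sketch, the log of this Gaussian likelihood can be evaluated with NumPy. The data, μ and Ω below are illustrative placeholders, not objects from the model.

```python
import numpy as np

def gaussian_loglik(y, mu, Omega):
    """Log of (2*pi)^(-T/2) |Omega|^(-1/2) exp(-0.5 (y-mu)' Omega^{-1} (y-mu))."""
    T = y.shape[0]
    resid = y - mu
    _, logdet = np.linalg.slogdet(Omega)           # log |Omega|
    quad = resid @ np.linalg.solve(Omega, resid)   # (y-mu)' Omega^{-1} (y-mu)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)

# Illustrative values: three observations, zero mean, identity covariance
y = np.array([0.3, -0.1, 0.4])
print(gaussian_loglik(y, np.zeros(3), np.eye(3)))
```

Using `slogdet` and `solve` instead of forming the inverse of Ω keeps the evaluation numerically stable, which matters once this function sits inside an estimation loop.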
The prior and the likelihood function
Remember how we derived Bayes' rule:

p(θ, y) = p(y | θ)p(θ)

The model determines both the likelihood function and what the prior is “about”.
The posterior density p(θ | y)
The posterior density is often the object of fundamental interest in Bayesian estimation.
- The posterior is the “result”
- We are interested in the entire conditional distribution of the parameters θ
Model comparison
Bayesian statistics generally does not “reject” models.
Instead: assign probabilities to different models.
- We can index different models by i = 1, 2, ..., m

p(θ | y, Mᵢ) = p(y | θ, Mᵢ) p(θ | Mᵢ) / p(y | Mᵢ)
Model comparison
The posterior model probability is given by

p(Mᵢ | y) = p(y | Mᵢ) p(Mᵢ) / p(y)

where p(y | Mᵢ) is called the marginal likelihood. It can be computed from

∫ p(θ | y, Mᵢ) dθ = ∫ [p(y | θ, Mᵢ) p(θ | Mᵢ) / p(y | Mᵢ)] dθ

by using that ∫ p(θ | y, Mᵢ) dθ = 1, so that

p(y | Mᵢ) = ∫ p(y | θ, Mᵢ) p(θ | Mᵢ) dθ
The posterior odds ratio
The posterior odds ratio is the ratio of the probabilities of two models conditional on the data:

p(Mᵢ | y) / p(Mⱼ | y) = p(y | Mᵢ) p(Mᵢ) / [p(y | Mⱼ) p(Mⱼ)]

It is made up of
- the prior odds ratio p(Mᵢ)/p(Mⱼ)
- the Bayes factor p(y | Mᵢ)/p(y | Mⱼ)
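When the marginal likelihood is available in closed form, the Bayes factor can be computed directly. A sketch for Bernoulli data under two hypothetical Beta priors; the sample and the prior choices are made-up illustrations, not part of the lecture.

```python
from math import lgamma, exp

def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marglik(x, n, a, b):
    # Beta-Bernoulli marginal likelihood of a sequence with x heads in n flips:
    # p(y | M) = B(a + x, b + n - x) / B(a, b)
    return log_beta_fn(a + x, b + n - x) - log_beta_fn(a, b)

x, n = 7, 10
# Model i: uniform Beta(1, 1) prior; model j: Beta(20, 20) prior tight around 1/2
bayes_factor = exp(log_marglik(x, n, 1, 1) - log_marglik(x, n, 20, 20))
print(bayes_factor)
```

Working in logs avoids overflow in the gamma functions; the same pattern carries over to the (much harder) DSGE case, where the marginal likelihood has to be approximated numerically.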
Concepts of Bayesian analysis
Prior densities, likelihood functions, posterior densities, posterior odds ratios, Bayes factors etc. are part of almost all Bayesian analysis.
- Data, the choice of prior densities and the likelihood function are the inputs into the analysis
- Posterior densities etc. are the outputs
The rest of the course is about how to choose these inputs and how to compute the outputs.
Bayesian computation
More specifics about the outputs:
- Posterior mean: E(θ | y) = ∫ θ p(θ | y) dθ
- Posterior variance: var(θ | y) = E[(θ − E(θ | y))² | y]
- Posterior probability: prob(θᵢ > 0)
These objects can all be written in the form

E(g(θ) | y) = ∫ g(θ) p(θ | y) dθ

where g(θ) is the function of interest.
Posterior simulation
There are only a few cases where the expected value of functions of interest can be derived analytically.
- Instead, we rely on posterior simulation and Monte Carlo integration
- Posterior simulation consists of constructing a sample from the posterior distribution p(θ | y)
Monte Carlo integration then uses that

ḡ_S = (1/S) Σ_{s=1}^{S} g(θ^(s))

and that lim_{S→∞} ḡ_S = E(g(θ) | y), where θ^(s) is a draw from the posterior distribution.
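A minimal sketch of Monte Carlo integration. Here we pretend we can sample the posterior directly; the N(1, 0.5²) stand-in posterior and g(θ) = θ² are made-up choices whose answer is known in closed form.

```python
import random

# Monte Carlo approximation of E(g(theta) | y) from posterior draws.
# Stand-in posterior: N(1, 0.5^2); g(theta) = theta^2, so the true value
# is E(theta)^2 + var(theta) = 1 + 0.25 = 1.25.
random.seed(0)
S = 200_000
draws = (random.gauss(1.0, 0.5) for _ in range(S))
g_bar = sum(th * th for th in draws) / S
print(g_bar)  # close to 1.25
```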
The end-product of Bayesian statistics
Most of Bayesian econometrics consists of simulating distributions of parameters using numerical methods.
- A simulated posterior is a numerical approximation to p(θ | y)
- We rely on ergodicity, i.e. that the moments of the constructed sample correspond to the moments of the distribution p(θ | y)
The most popular (and most general) procedure for simulating the posterior is the Random Walk Metropolis algorithm.
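The algorithm can be sketched in a few lines. The target below is a standard normal stand-in for p(θ | y); in a real application `log_post` would evaluate the log prior plus the log likelihood of the model.

```python
import math
import random

def log_post(theta):
    # Stand-in for log p(theta | y) up to a constant: standard normal target
    return -0.5 * theta * theta

random.seed(1)
theta, chain = 0.0, []
for _ in range(50_000):
    proposal = theta + random.gauss(0.0, 1.0)   # random walk step
    # Symmetric proposal, so accept with probability min(1, posterior ratio)
    if math.log(random.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    chain.append(theta)

post_mean = sum(chain) / len(chain)
post_var = sum((t - post_mean) ** 2 for t in chain) / len(chain)
print(post_mean, post_var)  # close to 0 and 1 for this target
```

Because only posterior *ratios* are needed, the unknown normalizing constant p(y) never has to be computed, which is what makes the algorithm so widely applicable.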
Bayesian DSGE models
“Modern macro models” (Kocherlakota):
1. Specify constraints for households, technologies for firms and resource constraints for the whole economy
2. Specify household preferences and firm objectives
3. Assume forward-looking behavior
4. Include stochastic processes
5. Are models of the entire economy: no prices are exogenous
Bayesian DSGE models
A strategy for estimating structural macro models:
- Specify the problems of the representative agent and firm and the technologies at their disposal
- Log-linearize the model to find approximate first order conditions and budget constraints
- Estimate the parameters of the model using likelihood based methods
- “Likelihood based” means that we specify a complete statistical model that is assumed to be the data generating process
Bayesian estimation
- Combines prior and sample information in a transparent manner
A minimalistic DSGE model
The building blocks of a minimalistic DSGE model:
- A representative agent who consumes and supplies labor
- A representative firm that sets prices to maximize profits
- A central bank that sets nominal interest rates
- Exogenous shocks (or “wedges”)
Preferences and technology
The representative agent’s objective
In period t the representative agent maximizes expected future utility

E_t Σ_{s=0}^{∞} β^s U(C_{t+s}(i), N_{t+s}(i))

where C_t is consumption and N_t is hours of labor supplied.
Agents are assumed to be forward looking, but also to value the present more than the future (0 < β < 1).
The representative agent’s objective
Period t utility:

U(C_t(i), N_t(i)) = C_t(i)^(1−γ)/(1 − γ) − N_t(i)^(1+ϕ)/(1 + ϕ)

Main assumptions: households like to consume but they do not like to work.
- Decreasing marginal utility of consumption
- Increasing marginal disutility of working
But we also impose specific functional forms.
The decisions made by the representative agent
- How much to consume and how much to save (Euler equation):

U_c(C_t) = β E_t [R_t (P_t/P_{t+1}) U_c(C_{t+1})]

- How much to work:

(W_t/P_t) U_c(C_t) = N_t^ϕ
Production of goods
Output in firm j ∈ (0, 1) is produced using labor as the sole input
Yt(j) = exp(at)Nt(j)
where a_t is an exogenous labor-augmenting technology process.
Price setting
Prices can be reset only infrequently (each period with probability 1 − θ).
This results in a (linearized) Phillips curve

π_t = βE_tπ_{t+1} + κ(y_t − x_t)

where x_t is potential output.
Monopolistic competition: each firm produces a unique good and has some market power.
Monetary policy
Monetary policy is often represented by a simple Taylor-type rule:

r_t = φ_y y_t + φ_π π_t + φ_r r_{t−1} + u_{rt}

This can be viewed as a reduced form representation of an optimizing central bank, and it fits the data quite well.
We will simplify the model further by assuming that φ_y = φ_r = 0.
The linearized structural system
After linearizing, the main model equations are given by

π_t = βE_tπ_{t+1} + κ(y_t − x_t)
y_t = E_t y_{t+1} − σ(i_t − E_tπ_{t+1})
i_t = φπ_t
x_t = ρx_{t−1} + u_{xt},  u_{xt} ∼ N(0, σ²_u)

where π_t, y_t, x_t and i_t are inflation, output, potential output and the nominal interest rate respectively.
We want to estimate the parameters θ = {ρ, γ, κ, φ, σ_x, σ_y, σ_π}.
Solving the model
3 ways to solve a linear model
Solving a model using full information rational expectations as the equilibrium concept involves integrating out the expectations terms from the structural equations of the model by replacing agents' expectations with the mathematical expectation, conditional on the state of the model.
There are three different ways of doing this:
1. The method of undetermined coefficients: can be very quick when feasible and illustrates the fixed point nature of the rational expectations solution
2. Replacing expectations with linear projections onto observable variables
3. Decoupling the stable and unstable dynamics of the model and setting the unstable part to zero
The 3 equation NK model
As a vehicle for demonstrating the different solution methods, we will use a simple New Keynesian model:

π_t = βE_tπ_{t+1} + κ(y_t − x_t)
y_t = E_t y_{t+1} − σ(i_t − E_tπ_{t+1})
i_t = φπ_t
x_t = ρx_{t−1} + u_t,  u_t ∼ N(0, σ²_u)

where π_t, y_t, x_t and i_t are inflation, output, potential output and the nominal interest rate respectively.
- A single variable, potential output x_t, serves as the state
Method I: Method of undetermined coefficients
Pros
- The method is quick when feasible
- It illustrates well the fixed point nature of rational expectations equilibria
Cons
- Difficult to implement in larger models
Method of undetermined coefficients
π_t = βE_tπ_{t+1} + κ(y_t − x_t)
y_t = E_t y_{t+1} − σ(i_t − E_tπ_{t+1})
i_t = φπ_t
x_t = ρx_{t−1} + u_t,  u_t ∼ N(0, σ²_u)

Start by substituting the interest rate rule into the Euler equation (using σ = 1/γ, and setting β = 1 to keep the algebra short):

x_t = ρx_{t−1} + u_{xt}
y_t = E_t y_{t+1} − (1/γ)[φπ_t − E_tπ_{t+1}]
π_t = E_tπ_{t+1} + κ(y_t − x_t)
Solving the model using the method of undetermined coefficients
Conjecture that the solution can be put in the form

x_t = ρx_{t−1} + u_{xt}
y_t = a x_t
π_t = b x_t

Why is this a good guess?
Solving the model using the method of undetermined coefficients
Substitute the conjectured form of the solution into the structural equations:

a x_t = aρ x_t − (1/γ)[φ b x_t − bρ x_t]
b x_t = bρ x_t + κ[a x_t − x_t]

where we used that x_t = ρx_{t−1} + u_{xt} implies E[x_{t+1} | x_t] = ρx_t.
Solving the model using the method of undetermined coefficients
Equate the coefficients on the right and left hand sides:

a = aρ − (1/γ)φ b + (1/γ)bρ
b = bρ + κ(a − 1)

or

[ 1 − ρ    (1/γ)(φ − ρ) ] [ a ]   [  0 ]
[  −κ         1 − ρ     ] [ b ] = [ −κ ]
Solving the model using the method of undetermined coefficients
Solve for a and b:

[ a ]   [ 1 − ρ    (1/γ)(φ − ρ) ]⁻¹ [  0 ]
[ b ] = [  −κ         1 − ρ     ]   [ −κ ]

or

a = −κ(ρ − φ)/c
b = κγ(ρ − 1)/c

where c = γ − κρ − 2γρ + κφ + γρ² = γ(1 − ρ)² + κ(φ − ρ), which is positive whenever φ > ρ.
The solved model

The solved model is of the form

x_t = ρx_{t−1} + u_{xt}
y_t = −[κ(ρ − φ)/c] x_t
π_t = [κγ(ρ − 1)/c] x_t

where c = γ − κρ − 2γρ + κφ + γρ² > 0.
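The coefficient matching above can be verified numerically. The parameter values below are illustrative assumptions, not calibrated or estimated values.

```python
import numpy as np

# Solve the equate-coefficients system for a and b and compare with the
# closed form. rho, gamma, kappa, phi are illustrative values.
rho, gamma, kappa, phi = 0.9, 2.0, 0.3, 1.5

M = np.array([[1.0 - rho, (phi - rho) / gamma],
              [-kappa,    1.0 - rho]])
a, b = np.linalg.solve(M, np.array([0.0, -kappa]))

c = gamma - kappa * rho - 2 * gamma * rho + kappa * phi + gamma * rho ** 2
assert abs(a - (-kappa * (rho - phi) / c)) < 1e-12
assert abs(b - kappa * gamma * (rho - 1) / c) < 1e-12
print(a, b, c)  # 0.9, -0.3, 0.2 for these parameter values
```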
Method II: Replacing expectations with linear projections
The second method uses the fact that projecting the future values of variables on observables gives optimal expectations (in the sense of minimum error variance) if the observables span the space of the state.
How does it work?
- Replace E_tπ_{t+1} and E_t y_{t+1} with linear projections of these variables on current inflation
- There is nothing special about inflation: projecting onto current output would also work
But first we need to know more about projections.
Inner product spaces
A real vector space H is said to be an inner product space if for each pair of elements x and y in H there is a number ⟨x, y⟩, called the inner product of x and y, such that

⟨x, y⟩ = ⟨y, x⟩
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ H
⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ H and α ∈ R
⟨x, x⟩ ≥ 0 for all x ∈ H
⟨x, x⟩ = 0 if and only if x = 0
The projection theorem
If M is a closed subspace of the Hilbert space H and x ∈ H, then
(i) there is a unique element x̂ ∈ M such that

‖x − x̂‖ = inf_{y ∈ M} ‖x − y‖

and
(ii) x̂ ∈ M and ‖x − x̂‖ = inf_{y ∈ M} ‖x − y‖ if and only if x̂ ∈ M and (x − x̂) ∈ M⊥, where M⊥ is the orthogonal complement of M in H.

‖·‖ is defined as √⟨·, ·⟩.

The element x̂ is called the orthogonal projection of x onto M.
The space L2 (Ω,F ,P)
The space L²(Ω, F, P) consists of all collections C of random variables X defined on the probability space (Ω, F, P) satisfying the condition

E X² = ∫_Ω X²(ω) P(dω) < ∞

and we define the inner product on this space as

⟨X, Y⟩ = E(XY) for any X, Y ∈ C
Least squares estimation via the projection theorem
The inner product of the space L² corresponds to a covariance ⇒ the orthogonal projection is the minimum variance estimate, since

‖x − x̂‖ = √(E[(x − x̂)(x − x̂)′])

- Both the information set and the variables we are trying to predict must be elements of the relevant space
Least squares estimation via the projection theorem
To find the estimate x̂ as a linear function of y, simply use that

⟨x − βy, y⟩ = E[(x − βy)y′] = 0

and solve for β:

β = E(xy′)[E(yy′)]⁻¹

The advantage of this approach is that once you have made sure that the variables y and x are in a well-defined inner product space, there is no need to minimize the variance directly. The projection theorem ensures that an estimate with orthogonal errors is the (linear) minimum variance estimate.
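The projection formula can be sketched on simulated data, with population moments replaced by sample averages. All the numbers below (true coefficients, noise scale, sample size) are made-up illustrations.

```python
import numpy as np

# beta = E(xy') [E(yy')]^{-1}, estimated from a large simulated sample.
rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(size=(n, 2))                   # observables
x = y @ np.array([0.5, -1.0]) + 0.1 * rng.normal(size=n)

Exy = x @ y / n                               # sample E(xy')
Eyy = y.T @ y / n                             # sample E(yy')
beta = np.linalg.solve(Eyy, Exy)              # Eyy is symmetric

# Orthogonality of the projection error, as the projection theorem requires
err = x - y @ beta
print(beta, np.abs(err @ y / n).max())        # beta near [0.5, -1.0]
```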
Two useful properties of linear projections
1. If two random variables X and Y are Gaussian, then the projection of Y onto X coincides with the conditional expectation E(Y | X).
2. If X and Y are not Gaussian, the linear projection of Y onto X is the minimum variance linear prediction of Y given X.
Replacing expectations with linear projections
We will use that in equilibrium

E(π_{t+1} | π_t) = [cov(π_t, π_{t+1})/var(π_t)] π_t
E(y_{t+1} | π_t) = [cov(π_t, y_{t+1})/var(π_t)] π_t

if the innovations u_t to x_t are Gaussian.
Replacing expectations with linear projections
Let

c₀π_t = E*(π_{t+1} | π_t)
d₀π_t = E*(y_{t+1} | π_t)

denote initial candidate projections of expected inflation and output on current inflation. We can then write the structural equations as

π_t = βc₀π_t + κ(y_t − x_t)
y_t = d₀π_t − σ(φπ_t − c₀π_t)
Replacing expectations with linear projections
Put the whole system in matrix form:

X_t = A X_{t−1} + C u_t,  X_t = (x_t, π_t, y_t)′

with

A = M⁻¹ [ ρ 0 0 ]        C = M⁻¹ [ 1 ]
        [ 0 0 0 ]                [ 0 ]
        [ 0 0 0 ]                [ 0 ]

where

M = [ 1       0               0  ]
    [ κ    1 − βc₀          −κ  ]
    [ 0   −d₀ + σφ − σc₀     1  ]
Solution algorithm
1. Make an initial guess of c₀ and d₀ in the candidate projections.
2. Compute the implied covariances of current inflation with future inflation and output using

Σ_XX = E[X_t X_t′],  Σ_XX = A Σ_XX A′ + CC′

and

E[X_{t+1} X_t′] = A Σ_XX

3. Replace c_s and d_s with c_{s+1} and d_{s+1}:

c_{s+1} = cov(π_t, π_{t+1})/var(π_t)
d_{s+1} = cov(π_t, y_{t+1})/var(π_t)

using the covariances from Step 2.
4. Repeat Steps 2-3 until c_s and d_s converge.
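The steps above can be sketched as follows. The parameter values (and σ_u = 1) are illustrative assumptions, and the Lyapunov equation in Step 2 is solved by simple iteration rather than a library routine.

```python
import numpy as np

beta_, kappa, sigma, phi, rho = 1.0, 0.3, 0.5, 1.5, 0.9

c, d = 0.0, 0.0                          # Step 1: initial guess
for _ in range(200):                     # Steps 2-4: iterate to convergence
    M = np.array([[1.0,   0.0,                           0.0],
                  [kappa, 1.0 - beta_ * c,              -kappa],
                  [0.0,   -d + sigma * phi - sigma * c,  1.0]])
    A = np.linalg.solve(M, np.diag([rho, 0.0, 0.0]))
    C = np.linalg.solve(M, np.array([1.0, 0.0, 0.0]))
    Sigma = np.zeros((3, 3))             # Sigma_XX = A Sigma_XX A' + C C'
    for _ in range(500):
        Sigma = A @ Sigma @ A.T + np.outer(C, C)
    lead = A @ Sigma                     # E[X_{t+1} X_t'], order (x, pi, y)
    c = lead[1, 1] / Sigma[1, 1]         # cov(pi_t, pi_{t+1}) / var(pi_t)
    d = lead[2, 1] / Sigma[1, 1]         # cov(pi_t, y_{t+1}) / var(pi_t)

print(c, d)  # converges to c = rho and d = rho*sigma*(rho - phi)/(1 - rho)
```

With a single AR(1) state, inflation itself inherits the AR(1) autocorrelation, so c converges to ρ; the iteration on d is a contraction with factor ρ and converges geometrically.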
Method III: Stable/unstable decoupling
Originally due to Blanchard and Kahn (1980).
- Computational aspects of the method have been further developed by others, for instance Klein (2000)
- The most accessible reference is probably Soderlind (1999), who also has code posted on his web site
The method has several advantages:
- It is fast
- It provides conditions for when a solution exists
- It provides conditions for when the solution is unique
Method III: Stable/unstable decoupling
Start by putting the model into matrix form:

[ 1  0  0 ] [ x_{t+1}    ]   [ ρ    0   0 ] [ x_t ]   [ 1 ]
[ 0  β  0 ] [ E_tπ_{t+1} ] = [ κ    1  −κ ] [ π_t ] + [ 0 ] u_{t+1}
[ 0  σ  1 ] [ E_ty_{t+1} ]   [ 0   σφ   1 ] [ y_t ]   [ 0 ]
or

A₀ [ x¹_{t+1} ; E_t x²_{t+1} ] = A₁ [ x¹_t ; x²_t ] + C₁ u_{t+1}

- x¹_t is a vector containing the pre-determined and/or exogenous variables (i.e. x_t)
- x²_t is a vector containing the forward-looking (“jump”) variables (i.e. π_t and y_t)
Method III: Stable/unstable decoupling
Pre-multiply both sides of

A₀ [ x¹_{t+1} ; E_t x²_{t+1} ] = A₁ [ x¹_t ; x²_t ] + C₁ u_{t+1}

by A₀⁻¹ to get

[ x¹_{t+1} ; E_t x²_{t+1} ] = A [ x¹_t ; x²_t ] + C u_{t+1}

where A = A₀⁻¹A₁ and C = A₀⁻¹C₁.
For the model to have a unique stable solution, the number of stable eigenvalues of A must equal the number of exogenous/pre-determined variables.
Method III: Stable/unstable decoupling
Use a Schur decomposition to get

A = Z T Z^H

where T is upper block triangular,

T = [ T₁₁  T₁₂ ]
    [  0   T₂₂ ]

and Z is a unitary matrix, so that Z^H Z = Z Z^H = I (⇒ Z^H = Z⁻¹).
- For any invertible square matrix W, W⁻¹AW is a so-called similarity transformation of A
- Similarity transformations do not change the eigenvalues of a matrix
- We can always choose Z and T so that the unstable eigenvalues of A are the eigenvalues of T₂₂
Method III: Stable/unstable decoupling
Define the auxiliary variables

[ θ_t ; δ_t ] = Z^H [ x¹_t ; x²_t ]

We can then rewrite the system above as

Z^H [ x¹_{t+1} ; E_t x²_{t+1} ] = Z^H Z T Z^H [ x¹_t ; x²_t ]

or equivalently

E_t [ θ_{t+1} ]   [ T₁₁  T₁₂ ] [ θ_t ]
    [ δ_{t+1} ] = [  0   T₂₂ ] [ δ_t ]

since Z^H Z = I.
Method III: Stable/unstable decoupling
For this system to be stable, the auxiliary variables associated with the unstable roots in T₂₂ must be zero for all t. (Why?)
Imposing δ_t = 0 for all t reduces the relevant state dynamics to

θ_t = T₁₁ θ_{t−1}

To get back the original variables we simply use that

[ x¹_t ; x²_t ] = [ Z₁₁ ; Z₂₁ ] θ_t

or

[ x¹_t ; x²_t ] = [ Z₁₁ ; Z₂₁ ] Z₁₁⁻¹ x¹_t

which is the solution to the model. It is of the form

x¹_t = M x¹_{t−1} + ε_t
x²_t = G x¹_t

where M = Z₁₁ T₁₁ Z₁₁⁻¹ (= ρ in our example) and G = Z₂₁ Z₁₁⁻¹.
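The decoupling can be sketched for the 3-equation model using an ordered Schur decomposition from scipy. The parameter values are illustrative assumptions; with β = 1 and σ = 1/γ they match the undetermined-coefficients example, so the result can be cross-checked against a and b from Method I.

```python
import numpy as np
from scipy.linalg import schur

beta_, kappa, sigma, phi, rho = 1.0, 0.3, 0.5, 1.5, 0.9

A0 = np.array([[1.0, 0.0,   0.0],
               [0.0, beta_, 0.0],
               [0.0, sigma, 1.0]])
A1 = np.array([[rho,   0.0,         0.0],
               [kappa, 1.0,        -kappa],
               [0.0,   sigma * phi, 1.0]])
A = np.linalg.solve(A0, A1)

# Complex Schur form with the stable eigenvalues (|lambda| < 1) ordered first
T, Z, n_stable = schur(A, output='complex', sort='iuc')
assert n_stable == 1            # one pre-determined variable: x_t

Z11, Z21, T11 = Z[:1, :1], Z[1:, :1], T[:1, :1]
M = (Z11 @ T11 @ np.linalg.inv(Z11)).real   # law of motion of x1_t
G = (Z21 @ np.linalg.inv(Z11)).real         # x2_t = G x1_t, order (pi, y)
print(M.item(), G.ravel())  # M = rho; G matches b and a from Method I
```

The assertion on `n_stable` is exactly the Blanchard-Kahn counting condition: a unique stable solution requires as many stable eigenvalues as pre-determined variables.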
The model as a state space system
X_t = A X_{t−1} + C u_t
Z_t = D X_t

where

X_t = x_t,  A = ρ,  C u_t = u_{xt}

Z_t = [ r_t ; π_t ; y_t ],  D = [ φκγ(ρ − 1)/c ; κγ(ρ − 1)/c ; −κ(ρ − φ)/c ]
Additional references
Bayesian econometrics:
- Bayesian Econometrics, Gary Koop, Wiley, 2003
- Contemporary Bayesian Econometrics and Statistics, John Geweke, Wiley, 2005
New Keynesian macro models:
- Monetary Policy, Inflation, and the Business Cycle: An Introduction to the New Keynesian Framework, Jordi Gali, Princeton, 2008