
Lecture 5

Macroeconomics I
University of Tokyo
Core Macro I - Spring 2013

Dynamic Programming I: Theory I
LS, Chapter 3
(Extended with King (2002) “A Simple Introduction to Dynamic Programming in Macroeconomic Models”)

Julen Esteban-Pretel
National Graduate Institute for Policy Studies


Deterministic, Finite Horizon Models

The General Problem

§ We want to analyze problems of the following type:

\[
\max_{\{u_t\}} \; R(x_0, x_1, \ldots, x_T;\, u_0, \ldots, u_{T-1}) \tag{5.1}
\]
\[
\text{s.t.}\quad g(x_0, x_1, \ldots, x_T;\, u_0, \ldots, u_{T-1}) \ge 0, \quad x_T \ge 0, \quad u_t \in \Omega \ \ \forall\, t = 0, 1, \ldots, T-1, \quad x_0 = \bar{x}_0 \text{ is given.}
\]

§ x_t is a vector of state variables. It describes the state of the system at any point in time.
- x_{it} could be the amount of capital good i at time t.
§ u_t is a vector of control variables, chosen every period by the decision maker.
- u_{jt} could be consumption of good j at time t.
§ R(·) is the objective function. In general it is a function of the states and the controls.
§ g(·) is the system of equations of motion, or transition equations: intertemporal constraints connecting the state and control variables.
§ Ω is the feasible set for the control variables.


The Recursive Problem

§ Assume that R(·) and g(·) are time separable:

\[
R(x_0, x_1, \ldots, x_T;\, u_0, \ldots, u_{T-1}) \equiv r_0(x_0, u_0) + r_1(x_1, u_1) + \cdots + r_{T-1}(x_{T-1}, u_{T-1}) + S(x_T)
\]

• where r(·) is the return function and S(x_T) is the value function at the end of the program, when no more decisions are made.
• The g(·) functions follow a Markov structure:

\[
x_1 = g_0(x_0, u_0), \quad x_2 = g_1(x_1, u_1), \quad \ldots, \quad x_T = g_{T-1}(x_{T-1}, u_{T-1})
\]

• Time separability allows interaction between states and controls, but only within periods.

§ The problem becomes:

\[
\max_{\{u_t\}_{t=0}^{T-1},\; u_t \in \Omega} \; \sum_{t=0}^{T-1} r_t(x_t, u_t) + S(x_T) \tag{5.2}
\]
\[
\text{s.t.}\quad x_{i,t+1} = g_{it}(x_t, u_t) \ \ \forall\, i = 1, 2, \ldots, n \text{ and } t = 0, 1, \ldots, T-1; \quad x_{i0} = \bar{x}_{i0} \text{ is given } \forall\, i = 1, 2, \ldots, n.
\]


Example: Cass-Koopmans Optimal Growth Model

§ The planner’s problem in the Cass-Koopmans economy is:

\[
\max_{\{c_t, k_{t+1}\}} \; \sum_{t=0}^{T} \beta^t u(c_t) \tag{5.3}
\]
\[
\text{s.t.}\quad k_{t+1} = (1-\delta)k_t + f(k_t) - c_t, \quad k_0 \text{ is given.}
\]

§ We have two ways to choose the control and the state variables:
1. c_t and k_{t+1} are the controls and k_t is the state.
2. k_{t+1} is the control and k_t is the state.
§ Using the second way:

\[
\max_{\{z_t\}} \; \sum_{t=0}^{T} \beta^t u\big((1-\delta)k_t + f(k_t) - z_t\big) \tag{5.4}
\]
\[
\text{s.t.}\quad k_{t+1} = z_t, \quad k_0 \text{ is given.}
\]

• Notice that the state does not appear in the transition equation.
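§ To make the mapping into the generic objects r(·) and g(·) of (5.2) concrete, here is a minimal sketch (ours, not from the lecture) for the second formulation; log utility, Cobb-Douglas technology and the parameter values are purely illustrative assumptions:

import numpy as np

# Illustrative parameter values (assumptions, not from the slides)
beta, delta, alpha = 0.96, 0.08, 0.36

def f(k):
    # production function, assumed Cobb-Douglas
    return k ** alpha

def r(k, z):
    # period return r(x_t, u_t): utility of consumption implied by state k and control z = k_{t+1}
    c = (1.0 - delta) * k + f(k) - z
    return np.log(c) if c > 0 else -np.inf   # log utility assumed; -inf rules out infeasible choices

def g(k, z):
    # transition equation x_{t+1} = g(x_t, u_t): the state does not appear, since k_{t+1} = z_t
    return z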


Bellman’s Method

§ Consider the problem in (5.2) at time t = 0.
§ Problem A:

\[
\max_{\{u_t\}_{t=0}^{T-1},\; u_t \in \Omega} \; \sum_{t=0}^{T-1} r_t(x_t, u_t) + S(x_T) \tag{5.2}
\]
\[
\text{s.t.}\quad x_{i,t+1} = g_{it}(x_t, u_t) \ \ \forall\, i = 1, 2, \ldots, n \text{ and } t = 0, 1, \ldots, T-1; \quad x_{i0} = \bar{x}_{i0} \text{ is given } \forall\, i = 1, 2, \ldots, n.
\]

§ Consider now the problem starting in t_0 > 0.
§ Problem B:

\[
\max_{\{u_t\}_{t=t_0}^{T-1},\; u_t \in \Omega} \; \sum_{t=t_0}^{T-1} r_t(x_t, u_t) + S(x_T) \tag{5.5}
\]
\[
\text{s.t.}\quad x_{i,t+1} = g_{it}(x_t, u_t) \ \ \forall\, i = 1, 2, \ldots, n \text{ and } t = t_0, \ldots, T-1; \quad x_{i t_0} = \bar{x}_{i t_0} \text{ is given } \forall\, i = 1, 2, \ldots, n.
\]


Principle of Optimality

§ Let the solution to Problem B be defined as a value function V(x_{t_0}, T - t_0).
§ Bellman’s principle of optimality states that:
• Any solution to Problem A (on the range t = 0, ..., T) which yields x_{t_0} = x̄_{t_0} must also solve Problem B (on the range t = t_0, ..., T).
§ In other words, if the rules for the control variables chosen for the t_0 problem are optimal for any given x̄_{t_0}, then they must be optimal for the x*_{t_0} of the larger problem.
§ Hence, we can solve the large Problem A by solving the smaller Problem B recursively.
§ Since t_0 is arbitrary, we can initially choose t_0 = T - 1, solve a two-period problem, and then work backwards.


Backward Induction: Step 1

§ Step 1: Set t_0 = T - 1, and Problem B becomes:

\[
\max_{u_{T-1}} \; r_{T-1}(x_{T-1}, u_{T-1}) + S(x_T) \tag{5.6}
\]
\[
\text{s.t.}\quad x_T = g_{T-1}(x_{T-1}, u_{T-1}), \quad x_{T-1} = \bar{x}_{T-1} \text{ is given.}
\]

• Plug the first constraint into the return function and obtain the policy function for T - 1:
\[
u_{T-1} = h_{T-1}(x_{T-1}) \tag{5.7}
\]
• Plug (5.7) into the return function to characterize the solution as a value function:
\[
V(x_{T-1}, 1) \equiv r_{T-1}\big(x_{T-1}, h_{T-1}(x_{T-1})\big) + S\big(g_{T-1}(x_{T-1}, h_{T-1}(x_{T-1}))\big) \tag{5.8}
\]


Backward Induction: Step 2

§ Step 2: Set t_0 = T - 2, so Problem B becomes:

\[
\max_{\{u_{T-2}, u_{T-1}\}} \; \big\{ r_{T-2}(x_{T-2}, u_{T-2}) + r_{T-1}(x_{T-1}, u_{T-1}) + S(x_T) \big\} \tag{5.9}
\]
\[
\text{s.t.}\quad \text{i) } x_T = g_{T-1}(x_{T-1}, u_{T-1}), \quad \text{ii) } x_{T-1} = g_{T-2}(x_{T-2}, u_{T-2}), \quad \text{iii) } x_{T-2} = \bar{x}_{T-2} \text{ is given.}
\]

• By Bellman’s Principle of Optimality, we can rewrite the problem as:
\[
\max_{u_{T-2}} \Big\{ r_{T-2}(x_{T-2}, u_{T-2}) + \max_{u_{T-1}} \big\{ r_{T-1}(x_{T-1}, u_{T-1}) + S(x_T) \big\} \Big\} \quad \text{s.t. (i), (ii) and (iii).} \tag{5.10}
\]
• Using the solution from Step 1:
\[
\max_{u_{T-2}} \big\{ r_{T-2}(x_{T-2}, u_{T-2}) + V(x_{T-1}, 1) \big\} \quad \text{s.t. (ii) and (iii).} \tag{5.11}
\]
• We can find the new policy function for T - 2 and the solution as a value function:
\[
u_{T-2} = h_{T-2}(x_{T-2}) \tag{5.12}
\]
\[
V(x_{T-2}, 2) \equiv r_{T-2}\big(x_{T-2}, h_{T-2}(x_{T-2})\big) + V\big[g_{T-2}(x_{T-2}, h_{T-2}(x_{T-2})),\, 1\big] \tag{5.13}
\]


Backward Induction: Step 3

§ Step 3: In general, the problem in period T - k is:

\[
V(x_{T-k}, k) = \max_{u_{T-k}} \big\{ r_{T-k}(x_{T-k}, u_{T-k}) + V(x_{T-k+1}, k-1) \big\} \tag{5.14}
\]
\[
\text{s.t.}\quad \text{i) } x_{T-k+1} = g_{T-k}(x_{T-k}, u_{T-k}), \quad \text{ii) } x_{T-k} = \bar{x}_{T-k} \text{ is given.}
\]

• The max problem yields the policy function:
\[
u_{T-k} = h_{T-k}(x_{T-k}) \tag{5.15}
\]


Backward Induction: Step 4

§ Step 4: Eventually, going backward, we reach period zero:

\[
V(x_0, T) = \max_{u_0} \big\{ r_0(x_0, u_0) + V(x_1, T-1) \big\} \tag{5.16}
\]
\[
\text{s.t.}\quad \text{i) } x_1 = g_0(x_0, u_0), \quad \text{ii) } x_0 = \bar{x}_0 \text{ is given.}
\]

• This yields the policy function:
\[
u_0 = h_0(x_0) \tag{5.17}
\]


Backward Induction: Step 5

§ Step 5: Use the known x̄_0 and the policy function for t = 0, u_0 = h_0(x_0), to obtain u_0.
• With x_0, u_0 and the transition equation x_1 = g_0(x_0, u_0), obtain x_1.
• With x_1 we can get u_1 from the policy function of that period, and using the transition equation we obtain x_2.
• This process is repeated until we have obtained all x_t and u_t, at which point the whole of Problem A is solved.


Backward Induction: Steps 1-5

§ Starting from the last period, we solve two-period problems until we reach period 0.
§ Step 1: Solve the date T - 1 decision problem and obtain
\[
u_{T-1} = h_{T-1}(x_{T-1}), \qquad V(x_{T-1}, 1) \equiv r_{T-1}\big(x_{T-1}, h_{T-1}(x_{T-1})\big) + S\big[g_{T-1}(x_{T-1}, h_{T-1}(x_{T-1}))\big].
\]
§ Step 2: Plug V(x_{T-1}, 1) into the max problem for date T - 2, solve it and obtain
\[
u_{T-2} = h_{T-2}(x_{T-2}), \qquad V(x_{T-2}, 2) \equiv r_{T-2}\big(x_{T-2}, h_{T-2}(x_{T-2})\big) + V\big[g_{T-2}(x_{T-2}, h_{T-2}(x_{T-2})),\, 1\big].
\]
§ Step 3: For any period T - k, we obtain the policy and value functions.
§ Step 4: Keep going backward until reaching date 0, obtaining u_0 and V(x_0, T).
§ Step 5: Use the initial condition x̄_0, the sequence of policy functions u_t = h_t(x_t), and the transition equations x_{t+1} = g_t(x_t, u_t) to obtain the sequences of x_t and u_t.
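§ The recipe above can be implemented numerically on a grid for the state. Below is a minimal, illustrative sketch (ours, not from the lecture) for the growth model in (5.4), using the stationary discounted return r(k, k') = u(c) with discount factor β, log utility, Cobb-Douglas technology, and a scrap value S(x_T) = 0; all parameter values are assumptions:

import numpy as np

# Illustrative parameters and state grid (assumptions)
beta, delta, alpha, T = 0.96, 0.08, 0.36, 50
k_grid = np.linspace(0.1, 10.0, 200)

def r(k, kp):
    # period return: log utility of consumption implied by state k and control kp = k'
    c = (1.0 - delta) * k + k ** alpha - kp
    return np.log(c) if c > 0 else -np.inf

V = np.zeros((T + 1, k_grid.size))             # V[T, :] = S(x_T), assumed to be zero
policy = np.zeros((T, k_grid.size), dtype=int)

# Steps 1-4: solve two-period problems backward, from t = T-1 down to t = 0
for t in range(T - 1, -1, -1):
    for i, k in enumerate(k_grid):
        values = np.array([r(k, kp) + beta * V[t + 1, j] for j, kp in enumerate(k_grid)])
        policy[t, i] = int(values.argmax())    # u_t = h_t(x_t), stored as a grid index
        V[t, i] = values[policy[t, i]]

# Step 5: with k_0 given, roll the policy functions and transition equations forward
i = int(np.argmin(np.abs(k_grid - 1.0)))       # grid point closest to an assumed k_0 = 1
k_path = [k_grid[i]]
for t in range(T):
    i = policy[t, i]                           # u_t = h_t(k_t), which is k_{t+1} itself
    k_path.append(k_grid[i])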


Deterministic, Infinite Horizon Models

Why Infinite Horizon?

§ Why should we study the case of an infinite time horizon?
§ Altruism:
• People do not live forever, but they care about their descendants.
• The existence of bequests indicates that there is altruism.
- Can bequests be accidental? Some may be, but since annuity markets are not fully used, at least some are not accidental.
• Altruism should take the form of caring about the utility of the descendants.
§ Simplicity:
• Many macro models with long finite horizons give results similar to their infinite-horizon counterparts.
• Infinite-horizon models are stationary, hence their solution can be found more easily.


Infinite Horizon Problem

§ In finite horizon problems we can find time-varying policy functions: u_t = h_t(x_t).
§ This is due to two reasons:
• T is finite.
• r_t(x_t, u_t) and g_t(x_t, u_t) have been allowed to depend on time.
§ In infinite horizon problems, assumptions are made so that the policy function is time-invariant.
§ Consider the following infinite horizon problem (with time separability):

\[
\max_{\{u_t\}_{t=0}^{\infty},\; u_t \in \Omega} \; \sum_{t=0}^{\infty} r_t(x_t, u_t) \quad \text{s.t. } x_{t+1} = g_t(x_t, u_t), \ \forall t \ge 0, \ x_0 \text{ is given.} \tag{5.18}
\]

§ For the problem to have a unique solution, we need a bounded objective function. One trick is to assume discounting, with β ∈ (0, 1).
§ Further, assume:
\[
\beta_t = \beta \ \ \forall t, \qquad r_t(x_t, u_t) = \beta^t r(x_t, u_t), \qquad g_t(x_t, u_t) = g(x_t, u_t). \tag{5.19}
\]
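§ For instance (our observation, connecting back to the example in (5.4)), the Cass-Koopmans problem fits these assumptions with
\[
r_t(k_t, z_t) = \beta^t\, u\big((1-\delta)k_t + f(k_t) - z_t\big) = \beta^t r(k_t, z_t), \qquad
g_t(k_t, z_t) = z_t = g(k_t, z_t).
\]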


From Sequential to D.P. Formulation

§ With the assumptions in (5.19), the infinite-horizon sequential problem becomes:

\[
\max_{\{u_t\}_{t=0}^{\infty},\; u_t \in \Omega} \; \sum_{t=0}^{\infty} \beta^t r(x_t, u_t) \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ \forall t \ge 0, \ x_0 \text{ given.} \tag{SP}
\]

§ Let V_{t_0}(x_{t_0}) be the value of the optimal program from t_0 onward, with given initial condition x_{t_0}:

\[
V_{t_0}(x_{t_0}) = \max_{\{u_t\}_{t=t_0}^{\infty},\; u_t \in \Omega} \; \sum_{t=t_0}^{\infty} \beta^{t-t_0} r(x_t, u_t) \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ \forall t \ge t_0
\]
\[
= \max_{u_{t_0} \in \Omega} \Big\{ r(x_{t_0}, u_{t_0}) + \max_{\{u_t\}_{t=t_0+1}^{\infty},\; u_t \in \Omega} \sum_{t=t_0+1}^{\infty} \beta^{t-t_0} r(x_t, u_t) \Big\} \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ \forall t \ge t_0
\]
\[
= \max_{u_{t_0} \in \Omega} \Big\{ r(x_{t_0}, u_{t_0}) + \beta \max_{\{u_t\}_{t=t_0+1}^{\infty},\; u_t \in \Omega} \sum_{t=t_0+1}^{\infty} \beta^{t-(t_0+1)} r(x_t, u_t) \Big\} \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ \forall t \ge t_0.
\]

§ Using the definition of V_t(x_t), we obtain, for a generic time period (given x_t):

\[
V_t(x_t) = \max_{u_t} \{ r(x_t, u_t) + \beta V_{t+1}(x_{t+1}) \} \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \tag{5.20}
\]
or
\[
V_t(x_t) = \max_{u_t} \{ r(x_t, u_t) + \beta V_{t+1}[g(x_t, u_t)] \}. \tag{5.21}
\]


From Sequential to D.P. Formulation (cont.)

§ Under suitable conditions (including concavity of r and g) we can prove (see LS, Appendix A1, A2):

1. The functional equation in (5.20), i.e. V_t(x_t) = max_{u_t} { r(x_t, u_t) + β V_{t+1}[g(x_t, u_t)] }, has a unique strictly concave, time-invariant solution:
\[
V(x_t) = \max_{u_t} \{ r(x_t, u_t) + \beta V[g(x_t, u_t)] \}. \tag{5.22}
\]
2. This solution is approached in the limit as j → ∞ by iterations on
\[
V^{(j+1)}(x_t) = \max_{u_t} \big\{ r(x_t, u_t) + \beta V^{(j)}[g(x_t, u_t)] \big\}. \tag{5.23}
\]
3. There is a unique time-invariant policy function u_t = h(x_t).


Sequential and D.P. Formulation

§ Hence, the sequential and D.P. formulations of the problem are:
§ Sequential formulation:
\[
\max_{\{u_t\}_{t=0}^{\infty},\; u_t \in \Omega} \; \sum_{t=0}^{\infty} \beta^t r(x_t, u_t) \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ x_0 = \bar{x}_0 \text{ given,} \tag{S.P.}
\]
where the solution is a sequence of optimal choices of the control variable: {u_t}_{t=0}^∞.
§ Dynamic Programming formulation (drop time subscripts and denote next period with a prime):
\[
V(x) = \max_{u} \{ r(x, u) + \beta V(x') \} \quad \text{s.t. } x' = g(x, u), \ x \text{ given,} \tag{D.P.}
\]
where the solution is the value function V(x), with associated policy function h(x).


Computational Methods

§ Consider the following problem:

\[
\max_{\{u_t\}_{t=0}^{\infty},\; u_t \in \Omega} \; \sum_{t=0}^{\infty} \beta^t r(x_t, u_t) \quad \text{s.t. } x_{t+1} = g(x_t, u_t), \ \forall t \ge 0, \ x_0 \text{ given.}
\]

§ We consider 4 methods to solve this problem:
§ Value function iteration.
• Constructs a sequence of value functions and associated policy functions by iterating on the value function.
§ Guess and verify.
• Involves guessing and verifying a solution for V in equation (5.22).
§ Howard’s improvement algorithm.
• Finds the solution by iterating on the policy function.
§ Benveniste-Scheinkman formula.
• Has the advantage of not requiring knowledge of the value function to obtain results.


Value Function Iteration

§ The initial step is to set up the Bellman equation:
\[
V(x) = \max_{u} \{ r(x, u) + \beta V(x') \} \quad \text{s.t. } x' = g(x, u), \ x \text{ given.}
\]
§ This method finds the solution by iterating (starting from V^{(0)} = 0 and continuing until V^{(j)} has converged) on:
\[
V^{(j+1)}(x) = \max_{u} \big\{ r(x, u) + \beta V^{(j)}(x') \big\} \quad \text{s.t. } x' = g(x, u), \ x \text{ given.} \tag{5.24}
\]
§ First: Set V^{(0)} = 0 and solve the max problem in
\[
V^{(1)}(x) = \max_{u} \{ r(x, u) \}.
\]
• This gives a rule u = h^{(0)}(x), which delivers r(x, h^{(0)}(x)) and V^{(1)}(x) = r(x, h^{(0)}(x)).
§ Second: Set up the Bellman equation again:
\[
V^{(2)}(x) = \max_{u} \big\{ r(x, u) + \beta V^{(1)}(x') \big\} \quad \text{s.t. } x' = g(x, u), \ x \text{ given,}
\]
or
\[
V^{(2)}(x) = \max_{u} \big\{ r(x, u) + \beta\, r\big[g(x, u),\, h^{(0)}(g(x, u))\big] \big\}, \quad x \text{ given.}
\]
§ Third: Solve again and obtain u = h^{(1)}(x) and V^{(2)}(x).
§ Continue this procedure until V^{(j)} converges.
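§ A minimal numerical sketch of value function iteration (ours, not from the lecture), again using the growth model with log utility and Cobb-Douglas technology as assumed functional forms and a grid over the state:

import numpy as np

# Illustrative parameters and state grid (assumptions)
beta, delta, alpha = 0.96, 0.08, 0.36
k_grid = np.linspace(0.1, 10.0, 200)

# Return matrix R[i, j] = r(k_i, k'_j), with -inf where consumption would be non-positive
c = (1.0 - delta) * k_grid[:, None] + k_grid[:, None] ** alpha - k_grid[None, :]
R = np.where(c > 0, np.log(np.where(c > 0, c, 1.0)), -np.inf)

V = np.zeros(k_grid.size)                      # V^(0) = 0
for j in range(1000):
    # V^(j+1)(k) = max_{k'} { r(k, k') + beta V^(j)(k') }, as in (5.24)
    V_new = (R + beta * V[None, :]).max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:       # stop once V^(j) has (numerically) converged
        V = V_new
        break
    V = V_new

policy = (R + beta * V[None, :]).argmax(axis=1)   # time-invariant policy u = h(k), as grid indices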


Guess and Verify

§ The initial step is again to set up the Bellman equation:
\[
V(x) = \max_{u} \{ r(x, u) + \beta V(x') \} \quad \text{s.t. } x' = g(x, u), \ x \text{ given.}
\]
§ This method is a variant of the previous one, in which we make an informed guess about the functional form of the value function.
§ First: Guess the form of the value function, V^G(x'), and substitute the guess into the Bellman equation:
\[
V(x) = \max_{u} \big\{ r(x, u) + \beta V^{G}[g(x, u)] \big\}. \tag{5.25}
\]
§ Second: Perform the RHS max problem and obtain the policy function u = h(x).
§ Third: Substitute the policy function into the Bellman equation and verify the initial guess (i.e. check that V(x) = V^G(x)).
§ If the initial guess is not correct, try the form of the value function suggested by the initial guess as the new guess.
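§ As an illustration (a standard textbook case, not worked out in these slides), take u(c) = ln c, f(k) = k^α and full depreciation (δ = 1), and guess V^G(k') = A + B ln k'. The Bellman equation and its first-order condition give
\[
V(k) = \max_{k'} \big\{ \ln(k^{\alpha} - k') + \beta (A + B \ln k') \big\}
\quad\Rightarrow\quad k' = \frac{\beta B}{1 + \beta B}\, k^{\alpha}.
\]
Substituting k' back in and matching the coefficient on ln k gives B = α/(1 - αβ), so the policy function is k' = h(k) = αβ k^α and the log form of the guess is verified (A is then pinned down by the constant terms).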


Howard’s Improvement Algorithm

§ This method involves making an initial guess of the policy function and iterating on it until it converges.
§ First: Pick a feasible policy, u = h^{(0)}(x), and compute the value associated with using this policy forever:
\[
V^{h^{(0)}}(x_0) = \sum_{t=0}^{\infty} \beta^t r\big(x_t, h^{(0)}(x_t)\big) \quad \text{s.t. } x_{t+1} = g\big(x_t, h^{(0)}(x_t)\big). \tag{5.26}
\]
§ Second: Generate a new policy function u = h^{(1)}(x) by solving:
\[
\max_{u} \big\{ r(x, u) + \beta V^{h^{(0)}}[g(x, u)] \big\}. \tag{5.27}
\]
- Use the new policy function as in the first step to obtain V^{h^{(1)}}.
§ We continue iterating over j until h^{(j)}(x) converges across the two steps.
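§ A minimal sketch of Howard’s policy-improvement iteration (ours, not from the lecture), under the same illustrative grid and functional-form assumptions as the earlier sketches:

import numpy as np

# Illustrative parameters and state grid (assumptions)
beta, delta, alpha = 0.96, 0.08, 0.36
k_grid = np.linspace(0.1, 10.0, 200)
n = k_grid.size

c = (1.0 - delta) * k_grid[:, None] + k_grid[:, None] ** alpha - k_grid[None, :]
R = np.where(c > 0, np.log(np.where(c > 0, c, 1.0)), -np.inf)

h = np.zeros(n, dtype=int)                     # feasible initial policy h^(0): always choose the lowest k'
for _ in range(100):
    # First step: evaluate the current policy exactly, V_h = r_h + beta * P_h V_h
    P = np.zeros((n, n))
    P[np.arange(n), h] = 1.0                   # deterministic transition implied by the policy
    r_h = R[np.arange(n), h]
    V_h = np.linalg.solve(np.eye(n) - beta * P, r_h)
    # Second step: one-step improvement, u = argmax { r(x, u) + beta V_h[g(x, u)] }
    h_new = (R + beta * V_h[None, :]).argmax(axis=1)
    if np.array_equal(h_new, h):               # the policy function has converged
        break
    h = h_new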


Benveniste-Scheinkman Formula

§ Restating the Bellman equation (in unconstrained form):
\[
V(x) = \max_{u} \{ r(x, u) + \beta V[g(x, u)] \}, \quad x \text{ given.}
\]
§ The Benveniste-Scheinkman (BS) formula is:
\[
V'(x) = \frac{\partial r(x, h(x))}{\partial x} + \beta V'[g(x, h(x))]\, \frac{\partial g(x, h(x))}{\partial x}. \tag{5.28}
\]
§ If the equation of motion does not involve the state (i.e. x_{t+1} = g(u_t)):
\[
V'(x_t) = \frac{\partial r(x_t, u_t)}{\partial x_t}. \tag{5.29}
\]


Derivation of BS Formula

§ The Bellman equation is:
\[
V(x) = \max_{u} \{ r(x, u) + \beta V[g(x, u)] \}, \quad x \text{ given.}
\]
§ FOC:
\[
\frac{\partial r(x, u)}{\partial u} + \beta V'[g(x, u)]\, \frac{\partial g(x, u)}{\partial u} = 0; \tag{5.30}
\]
this delivers the policy function u = h(x) (assuming concavity and no corner solutions).
§ Substitute u = h(x) into the Bellman equation to obtain
\[
V(x) = r(x, h(x)) + \beta V[g(x, h(x))].
\]
§ Differentiate both sides with respect to x, taking into account that u changes when x does, through the policy function. We obtain:
\[
V'(x) = \frac{\partial r(x, h(x))}{\partial x} + \frac{\partial r(x, h(x))}{\partial u}\, h'(x)
+ \beta V'[g(x, h(x))] \left[ \frac{\partial g(x, h(x))}{\partial x} + \frac{\partial g(x, h(x))}{\partial u}\, h'(x) \right]. \tag{5.31}
\]


Derivation of BS Formula (cont.)

§ Restating the FOC and the derivative of the value function, without explicitly displaying the arguments:
\[
\text{FOC:}\quad \frac{\partial r}{\partial u} + \beta V' \cdot \frac{\partial g}{\partial u} = 0,
\]
\[
V'(x) = \frac{\partial r}{\partial x} + \frac{\partial r}{\partial u}\, h'(x) + \beta V' \cdot \left[ \frac{\partial g}{\partial x} + \frac{\partial g}{\partial u}\, h'(x) \right]
= \frac{\partial r}{\partial x} + \beta V' \cdot \frac{\partial g}{\partial x} + \left[ \frac{\partial r}{\partial u} + \beta V' \cdot \frac{\partial g}{\partial u} \right] h'(x)
= \frac{\partial r}{\partial x} + \beta V' \cdot \frac{\partial g}{\partial x} \quad \text{(by the FOC).}
\]
§ This is the expression for the derivative of the value function derived by BS.
§ Notice that we have used the envelope theorem.
§ Writing BS in full, we recover (5.28):
\[
V'(x) = \frac{\partial r(x, h(x))}{\partial x} + \beta V'[g(x, h(x))] \cdot \frac{\partial g(x, h(x))}{\partial x}.
\]


Derivation of BS Formula (cont.)

§ Restating the full versions of the FOC and BS:
\[
\text{FOC:}\quad \frac{\partial r(x, h(x))}{\partial u} + \beta V'[g(x, h(x))] \cdot \frac{\partial g(x, h(x))}{\partial u} = 0,
\]
\[
\text{BS:}\quad V'(x) = \frac{\partial r(x, h(x))}{\partial x} + \beta V'[g(x, h(x))] \cdot \frac{\partial g(x, h(x))}{\partial x}.
\]
§ Reintroducing the time subscripts:
\[
\text{FOC:}\quad \frac{\partial r(x_t, u_t)}{\partial u_t} + \beta V'(x_{t+1}) \cdot \frac{\partial g(x_t, u_t)}{\partial u_t} = 0,
\]
\[
\text{BS:}\quad V'(x_t) = \frac{\partial r(x_t, u_t)}{\partial x_t} + \beta V'(x_{t+1}) \cdot \frac{\partial g(x_t, u_t)}{\partial x_t}.
\]
§ If the equation of motion does not involve the state (i.e. x_{t+1} = g(u_t)), the BS equation reduces to (5.29):
\[
V'(x_t) = \frac{\partial r(x_t, u_t)}{\partial x_t}.
\]


Euler Equation

§ Assume still that x_{t+1} = g(u_t).
§ Reproducing the FOC and the BS formula:
\[
\text{FOC:}\quad \frac{\partial r(x_t, u_t)}{\partial u_t} + \beta V'(x_{t+1}) \cdot g'(u_t) = 0,
\qquad
\text{BS:}\quad V'(x_t) = \frac{\partial r(x_t, u_t)}{\partial x_t}.
\]
§ Shift the BS formula one period forward to get V'(x_{t+1}) = ∂r(x_{t+1}, u_{t+1})/∂x_{t+1}. Plug this into the FOC to obtain the Euler equation:
\[
\frac{\partial r(x_t, u_t)}{\partial u_t} + \beta\, \frac{\partial r(x_{t+1}, u_{t+1})}{\partial x_{t+1}}\, g'(u_t) = 0.
\]
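§ As a closing illustration (our application of the formula, consistent with the example in (5.4) but not spelled out in these slides), let the control be z_t = k_{t+1}, with period return r(k_t, z_t) = u((1-δ)k_t + f(k_t) - z_t) and transition g(z_t) = z_t, so that g'(z_t) = 1. Then
\[
\frac{\partial r(k_t, z_t)}{\partial z_t} = -u'(c_t), \qquad
\frac{\partial r(k_{t+1}, z_{t+1})}{\partial k_{t+1}} = u'(c_{t+1})\,\big[1 - \delta + f'(k_{t+1})\big],
\]
and the Euler equation becomes the familiar condition
\[
u'(c_t) = \beta\, u'(c_{t+1})\,\big[1 - \delta + f'(k_{t+1})\big], \qquad c_t = (1-\delta)k_t + f(k_t) - k_{t+1}.
\]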