Transcript
Page 1

Dynamic Programming and Optimal Control

Fall 2009

Problem Set: Infinite Horizon Problems, Value Iteration, Policy Iteration

Notes:

• Problems marked with BERTSEKAS are taken from the book Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages, hardcover.

• The solutions were derived by the teaching assistants. Please report any error that you may find to [email protected] or [email protected].

• The solutions in this document are handwritten. A typed version will be available in the future.

Page 2

Problem Set

Problem 1 (BERTSEKAS, p. 445, exercise 7.1)

Problem 2 (BERTSEKAS, p. 446, exercise 7.3)

Problem 3 (BERTSEKAS, p. 448, exercise 7.12)

Problem 4 (BERTSEKAS, p. 60, exercise 1.23)


Pages 3-6: handwritten solution to Problem 1 (BERTSEKAS exercise 7.1). The scanned handwriting is not legible in this transcript; a typed version is to follow (see the Notes above). The MATLAB implementation for part (b) is reproduced on pages 7-12 below.
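Since the handwritten derivation is not legible here, the following sketch states the dynamic programming recursion that the MATLAB code on pages 7-12 implements. The notation is introduced here for readability and is an assumption, not a transcription of the handwritten pages: the state is (i, j, s) with i and j the points won so far by the server and the receiver and s in {1, 2} the serve number; p_u is the probability that serve u in {F, S} (fast, slow) lands in, and q_u is the probability that the server then wins the point.

    J(i,j,1) = max_{u in {F,S}} [ q_u p_u J(i+1,j,1) + (1-q_u) p_u J(i,j+1,1) + (1-p_u) J(i,j,2) ]
    J(i,j,2) = max_{u in {F,S}} [ q_u p_u J(i+1,j,1) + (1-q_u p_u) J(i,j+1,1) ]

A fault on the first serve leads to the second serve; a fault on the second serve loses the point. With the boundary values for won and lost games, J(0,0,1) is the probability that the server wins the game, which is the quantity JValsave(m) = J(1,1,1) reported by the code (the indices there are shifted by one).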

Pages 7-12: MATLAB file Bertsekas7_1b.m

% Solution to Exercise 7.1 b)
% in Bertsekas - "Dynamic Programming and Optimal Control" Vol. 1 p. 445
%
% --
% ETH Zurich
% Institute for Dynamic Systems and Control
% Angela Schöllig
% [email protected]
%
% --
% Revision history
% [12.11.09, AS] first version
%
%
% clear workspace and command window
clear;
clc;

%% PARAMETERS
% Landing probability
p(2) = 0.95;   % Slow serve

% Winning probability
q(1) = 0.6;    % Fast serve
q(2) = 0.4;    % Slow serve

% Define value iteration error bound
err = 1e-100;

% Specify if you want to test one value for p(1) - landing probability for
% fast serve: use test = 1, or a row of probabilities: use test = 2
test = 1;

if test == 1
    % Define p(1)
    p(1) = 0.2;
    % Number of runs
    mend = 1;
else
    % Define increment for p(1)
    incr = 0.01;
    % Number of runs
    mend = ((0.95-incr)/incr+2);
end

%% VALUE ITERATION
for m=1:mend
    %% PARAMETER
    % Landing probability
    if test == 2
        p(1) = (m-1)*incr;   % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code

    % Set the initial costs to 0, although any value will do (except for the
    % termination cost, which must be equal to 0).
    J = (p(1)+0.1).*ones(4, 4, 2);

    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve
    FVal = ones(4, 4, 2);

    % Value iteration loop
    % Initialize cost-to-go
    costToGo = zeros(4, 4, 2);
    iter = 0;
    while (1)
        iter = iter+1;
        if (mod(iter,1e3)==0)
            disp(['Value Iteration Number ',num2str(iter)]);
        end

        % Update the value
        for i = 1:3
            [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1)+(1-p).*J(4,i,2));
            [costToGo(4,i,2),FVal(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));
            [costToGo(i,4,1),FVal(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            [costToGo(i,4,2),FVal(i,4,2)] = max(q.*p.*J(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FVal(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1)+(1-p).*J(i,j,2));
                [costToGo(i,j,2),FVal(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FVal(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1)+(1-p).*J(4,4,2));
        [costToGo(4,4,2),FVal(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));

        % max(max(max(J-costToGo)))/max(max(max(costToGo)))
        if (max(max(max(J-costToGo)))/max(max(max(costToGo))) < err)
            % update cost
            J = costToGo;
            break;
        else
            J = costToGo;
        end
    end

    % Probability of player 1 winning the game
    JValsave(m) = J(1, 1, 1);
end;

%% POLICY ITERATION
for m=1:mend
    %% PARAMETER
    % Landing probability
    if test == 2
        p(1) = (m-1)*incr;   % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code

    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve - in vector form
    FPol = 2.*ones(32,1);

    % Policy iteration loop
    iter = 0;
    while (1)
        iter = iter+1;
        disp(['Policy Iteration Number ',num2str(iter)]);
        if (mod(iter,1e3)==0)
            disp(['Policy Iteration Number ',num2str(iter)]);
        end

        % Update the value
        % Initialize matrices
        APol = zeros(32,32);
        bPol = zeros(32,1);
        for i = 1:3
            % [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1)+(1-p).*J(4,i,2));
            f = FPol(12+i);
            APol(12+i,12+i) = -1;
            bPol(12+i) = -q(f)*p(f);
            APol(12+i,13+i) = (1-q(f))*p(f);
            APol(12+i,28+i) = (1-p(f));

            % [costToGo(4,i,2),FPol(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));
            f = FPol(28+i);
            APol(28+i,28+i) = -1;
            bPol(28+i) = -q(f)*p(f);
            APol(28+i,13+i) = 1-p(f)*q(f);

            % [costToGo(i,4,1),FPol(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            f = FPol(4*i);
            APol(4*i,4*i) = -1;
            APol(4*i,4*i+4) = q(f)*p(f);
            APol(4*i,20+4*(i-1)) = 1-p(f);

            % [costToGo(i,4,2),FPol(i,4,2)] = max(q.*p.*J(i+1,4,1));
            f = FPol(4*i+16);
            APol(4*i+16,4*i+16) = -1;
            APol(4*i+16,4*i+4) = q(f)*p(f);

            for j = 1:3
                % Maximize over input 1 and 2
                % [costToGo(i,j,1),FPol(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1)+(1-p).*J(i,j,2));
                f = FPol(4*(i-1)+j);
                APol(4*(i-1)+j,4*(i-1)+j) = -1;
                APol(4*(i-1)+j,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+1) = (1-q(f))*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+16) = (1-p(f));

                % [costToGo(i,j,2),FPol(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
                f = FPol(4*(i-1)+j+16);
                APol(4*(i-1)+j+16,4*(i-1)+j+16) = -1;
                APol(4*(i-1)+j+16,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j+16,4*(i-1)+j+1) = 1-q(f)*p(f);
            end
        end

        % [costToGo(4,4,1),FPol(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1)+(1-p).*J(4,4,2));
        f = FPol(16);
        APol(16,16) = -1;
        APol(16,15) = q(f)*p(f);
        APol(16,12) = (1-q(f))*p(f);
        APol(16,32) = (1-p(f));

        % [costToGo(4,4,2),FPol(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));
        f = FPol(32);
        APol(32,32) = -1;
        APol(32,15) = q(f)*p(f);
        APol(32,12) = 1-q(f)*p(f);

        % Calculate cost
        JPol = zeros(32,1);
        JPol = APol\bPol;

        % Can now update the policy.
        FPolNew = FPol;
        JPolNew = JPol;
        JPol_re = reshape(JPol,4,4,2);
        JPol_re(:,:,1) = JPol_re(:,:,1)';
        JPol_re(:,:,2) = JPol_re(:,:,2)';

        % Initialize cost-to-go
        costToGo = zeros(4, 4, 2);
        FPolNew_re = zeros(4, 4, 2);

        % Update the optimal policy
        for i = 1:3
            [costToGo(4,i,1),FPolNew_re(4,i,1)] = max(q.*p + (1-q).*p.*JPol_re(4,i+1,1)+(1-p).*JPol_re(4,i,2));
            [costToGo(4,i,2),FPolNew_re(4,i,2)] = max(q.*p + (1-q.*p).*JPol_re(4,i+1,1));
            [costToGo(i,4,1),FPolNew_re(i,4,1)] = max(q.*p.*JPol_re(i+1,4,1) + (1-p).*JPol_re(i,4,2));
            [costToGo(i,4,2),FPolNew_re(i,4,2)] = max(q.*p.*JPol_re(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FPolNew_re(i,j,1)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q).*p.*JPol_re(i,j+1,1)+(1-p).*JPol_re(i,j,2));
                [costToGo(i,j,2),FPolNew_re(i,j,2)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q.*p).*JPol_re(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FPolNew_re(4,4,1)] = max(q.*p.*JPol_re(4,3,1) + (1-q).*p.*JPol_re(3,4,1)+(1-p).*JPol_re(4,4,2));
        [costToGo(4,4,2),FPolNew_re(4,4,2)] = max(q.*p.*JPol_re(4,3,1) + (1-q.*p).*JPol_re(3,4,1));

        JPolNew(1:16) = reshape(costToGo(:,:,1)', 16,1);
        JPolNew(17:32) = reshape(costToGo(:,:,2)', 16,1);
        FPolNew(1:16) = reshape(FPolNew_re(:,:,1)', 16,1);
        FPolNew(17:32) = reshape(FPolNew_re(:,:,2)', 16,1);

        % Quit if the policy is the same, or if the costs are essentially the
        % same.
        if ((norm(JPol - JPolNew)/norm(JPol) < 1e-10) )
            break;
        else
            FPol = FPolNew;
        end
    end

    % Probability of player 1 winning the game
    JPolsave(m) = JPolNew(1,1);
end;

if test == 2
    plot(0:incr:(0.95-incr), JValsave,'.');
    hold on
    plot(0:incr:(0.95-incr), JPolsave,'r');
    grid on
    title('Probability of the server winning a game')
    xlabel('p_F')
    ylabel('Probability of winning')
    legend('Value Iteration', 'Policy Iteration')
else
    disp(['Probability of player 1 winning ',num2str(JValsave(1))]);
    subplot(1,2,1)
    bar3(FVal(:,:,1),0.25,'detached')
    title(['First Serve for p_F =', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
    subplot(1,2,2)
    bar3(FVal(:,:,2),0.25,'detached')
    title(['Second Serve for p_F =', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
end

Page 13: [Figure] "Probability of the server winning a game": probability of winning (vertical axis, roughly 0.2 to 0.9) plotted against p_F (horizontal axis, 0 to 1), with curves for Value Iteration and Policy Iteration. This appears to be the plot produced by Bertsekas7_1b.m when test = 2.

Pages 14-18: handwritten solution to Problem 2 (BERTSEKAS exercise 7.3). The scanned handwriting is largely illegible in this transcript. The legible fragments appear to cover the evaluation of the stationary policies via a linear system of the form J = (I - alpha*P)^(-1) g, the limiting cases alpha -> 0 and alpha -> 1, and a policy-improvement step carried out with the problem data. The MATLAB implementation for part (c) is reproduced on pages 20-23 below.
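Since the handwritten policy iteration above is hard to read, the generic form of the two steps it appears to carry out is stated here for reference, written with the data that also appears in script_P73c.m on the following pages. The notation P_mu, g_mu is introduced here and is not transcribed from the handwritten pages.

    Policy evaluation (reward of the current stationary policy mu):   J_mu = (I - alpha * P_mu)^(-1) * g_mu
    Policy improvement:   mu'(i) = argmax_{u in {1,2}} [ g(i,u) + alpha * sum_j p_ij(u) * J_mu(j) ],   i = 1, 2

Here P_mu is the 2x2 transition matrix and g_mu the stage-reward vector under policy mu; u = 1 means advertise / do research and u = 2 means don't, with g(1,1) = 4, g(1,2) = 6, g(2,1) = -5, g(2,2) = -3, transition rows (0.8, 0.2) and (0.7, 0.3) under u = 1, and (0.5, 0.5) and (0.4, 0.6) under u = 2.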
Pages 20-23: MATLAB file script_P73c.m

% script_P73c.m

%

% Matlab Script with implementation of a solution to Problem 7.3c from the

% class textbook.

%

% Dynamic Programming and Optimal Control

% Fall 2009

% Problem Set, Chapter 7

% Problem 7.3 c

%

% --

% ETH Zurich

% Institute for Dynamic Systems and Control

% Sebastian Trimpe

% [email protected]

%

% --

% Revision history

% [11.11.09, ST] first version

%

%

% clear workspace and command window

clear;

%clc;

%% Program control

% Flag defining whether we run the value iteration with or without error

% bounds.

flag_errbnds = true;

% False: No error bounds; fixed number of iterations MAX_ITER.

% True: Use error bounds. Terminate when result within tolerance ERR_TOL.

MAX_ITER = 1000;

ERR_TOL = 1e-6;

%% Setup problem data

% Stage costs g. g is a 2x2 matrix where the first index corresponds to the

% state i=1,2 and the second index corresponds to the admissible control

% input u=1 (=advertising/do research), or 2 (=don't advertise/don't do

% research).

g = [4 6; -5 -3];

% Transition probabilities. P is a 2x2x2 matrix, where the first index

% corresponds to the origin state, the second index corresponds to the

% destination state and the third input corresponds to the applied control

% input where (1,2) maps to (advertise/do research, don't advertise/don't

% do research). For example, the probability of transition from node 1 to

% node 2 given that we do not advertise is P(1,2,2).

P = zeros(2,2,2);

% Advertise/do research (u=1):


P(:,:,1) = [0.8 0.2; 0.7 0.3];

% Don't advertise/don't do research (u=2):

P(:,:,2) = [0.5 0.5; 0.4 0.6];

% Discount factor.

%alpha = 0.9;

alpha = 0.99;

%% Value Iteration

% Initialize variables that we update during value iteration.

% Cost (here it really is the reward):

costJ = [0,0];

costJnew = [0,0];

% Policy

policy = [0,0];

% Control variable for breaking the loop

doBreak = false;

% Loop over value iterations k.

for k=1:MAX_ITER

for i=1:2 % loop over two states

% One value iteration step for each state.

[costJnew(i),policy(i)] = max( g(i,:) + alpha*costJ*squeeze(P(i,:,:)) );

end;

% Save results for plotting later.

res_J(k,:) = costJnew;

% Construct string to be displayed later:

dispstr = ['k=',num2str(k,'%5d'), ...

' J(1)=',num2str(costJnew(1),'%6.4f'), ...

' J(2)=',num2str(costJnew(2),'%6.4f'), ...

' mu(1)=',num2str(policy(1),'%3d'), ...

' mu(2)=',num2str(policy(2),'%3d')];

% Use error bounds if flag has been set.

if flag_errbnds

% Compute error bounds.

lower = costJnew + alpha/(1-alpha)*min(costJnew-costJ); % lower bound on J*

upper = costJnew + alpha/(1-alpha)*max(costJnew-costJ); % upper bound on J*

% display string:

dispstr = [dispstr,' lower=[',num2str(lower,'%9.4f'), ...

'] upper=[',num2str(upper,'%9.4f'),']'];

% If desired tolerance reached: can terminate the algorithm.

if all((upper-lower)<=ERR_TOL)

% Difference between upper and lower bound less than specified


% tolerance for all states: set cost to average of upper and

% lower bound and terminate.

% Note that using the bounds, we can get a better result than

% just using J_{k+1}.

costJnew = mean([lower; upper],1);

dispstr = [dispstr, ...

sprintf('\nResult within desired tolerance. Terminate.\n')];

doBreak = true;

end;

% Save results for plotting later.

res_lower(k,:) = lower;

res_upper(k,:) = upper;

end;

% Update the cost. One could also implement an immediate update of the

% cost as a Gauss-Seidel type update, which would yield faster

% convergence.

costJ = costJnew;

% Display:

disp(dispstr);

if doBreak

break;

end;

end;

% display the obtained costs

disp('Result:');

disp([' J*(1) = ',num2str(costJ(1),'%9.6f')]);

disp([' J*(2) = ',num2str(costJ(2),'%9.6f')]);

disp([' mu*(1) = ',num2str(policy(1))]);

disp([' mu*(2) = ',num2str(policy(2))]);

%% Plots

% Plot costs and bounds over iterations.

kk = 1:size(res_J,1);

figure;

subplot(2,1,1);

if(flag_errbnds)

plot(kk,[res_J(:,1),res_lower(:,1), res_upper(:,1)], ...

[kk(1),kk(end)],[costJ(1), costJ(1)],'k');

else

plot(kk,[res_J(:,1)],[kk(1),kk(end)],[costJ(1), costJ(1)],'k');

end;

grid;

%legend('lower bound','upper bound','J_k','J*');

%xlabel('iteration');

%


subplot(2,1,2);

if(flag_errbnds)

plot(kk,[res_J(:,2),res_lower(:,2), res_upper(:,2)], ...

[kk(1),kk(end)],[costJ(2),costJ(2)],'k');

else

plot(kk,[res_J(:,2)],[kk(1),kk(end)],[costJ(2), costJ(2)],'k');

end;

grid;

if(flag_errbnds)

legend('J_k','lower bound','upper bound','J*');

else

legend('J_k','J*');

end;

xlabel('iteration');
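Two remarks on the script above. First, the error bounds it computes are the standard value iteration bounds for discounted problems: with J_{k+1} = T J_k,

    J_{k+1}(i) + alpha/(1-alpha) * min_j [ J_{k+1}(j) - J_k(j) ]  <=  J*(i)  <=  J_{k+1}(i) + alpha/(1-alpha) * max_j [ J_{k+1}(j) - J_k(j) ],

which is exactly what the variables lower and upper hold; the loop stops once the two bounds agree to within ERR_TOL and returns their average. Second, the comment about a Gauss-Seidel type update refers to reusing the freshly updated value of state 1 when updating state 2 within the same sweep. A minimal sketch of that variant of the inner loop, using the same variable names as the script (this is an illustration, not part of the original file), would be:

for i=1:2  % loop over the two states
    % Gauss-Seidel sweep: costJ(1:i-1) already contains the new values,
    % so later states are updated with the most recent information.
    [costJ(i),policy(i)] = max( g(i,:) + alpha*costJ*squeeze(P(i,:,:)) );
end;

Note that the error bounds above are derived for the synchronous update that the script actually uses.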

Pages 24-25: handwritten working for Problems 3 and 4 (BERTSEKAS exercises 7.12 and 1.23). The scanned handwriting is not legible in this transcript.

Page 26: handwritten working (continued). The legible fragments indicate a proof by induction that the value iterates are monotone, i.e. J_{k+1}(x) >= J_k(x) for all x in S.
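A typed sketch of the standard induction step behind that monotonicity claim reads as follows; the operator notation T is introduced here and is an assumption about the intended argument, not a transcription of the handwritten page. Suppose J_k(x) >= J_{k-1}(x) for all x in S. The DP operator

    (T J)(x) = max_u [ g(x,u) + alpha * sum_{x'} p_{xx'}(u) J(x') ]

is monotone, i.e. J >= J' implies T J >= T J'. Applying T to the induction hypothesis gives J_{k+1}(x) = (T J_k)(x) >= (T J_{k-1})(x) = J_k(x) for all x in S. The base case J_1(x) >= J_0(x) has to be verified directly from the chosen initialization.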

Page 27: further handwritten working; not legible in this transcript.

