Value Iteration


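Bellman operator $\mathcal{T}: \mathbb{R}^{SA} \to \mathbb{R}^{SA}$. For all $f \in \mathbb{R}^{SA}$,

$$(\mathcal{T} f)(s,a) = R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\big[V_f(s')\big],$$

where $V_f(s) := \max_a f(s,a)$.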

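$Q^\star = \mathcal{T} Q^\star$ (Bellman optimality).

Algorithm (Value Iteration): initialize $Q^{\star,0} = 0 \in \mathbb{R}^{SA}$; then for $h = 1, 2, \dots$ set $Q^{\star,h} = \mathcal{T} Q^{\star,h-1}$;

stop at a large $h = H$.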
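The iteration $Q^{\star,h} = \mathcal{T} Q^{\star,h-1}$ can be sketched in a few lines of plain Python. This is only an illustrative sketch: the tabular layout `R[s][a]`, `P[s][a][s']`, the function names, and the 2-state MDP are assumptions made for the example, not part of the notes.

```python
def bellman_backup(f, R, P, gamma):
    """Return Tf, where (Tf)(s,a) = R(s,a) + gamma * E_{s'~P(s,a)}[max_a' f(s',a')]."""
    n_states = len(R)
    V = [max(f[s]) for s in range(n_states)]  # V_f(s) = max_a f(s, a)
    return [
        [R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
         for a in range(len(R[s]))]
        for s in range(n_states)
    ]


def value_iteration(R, P, gamma, H):
    """Start from Q = 0 and apply the Bellman backup H times."""
    Q = [[0.0] * len(row) for row in R]
    for _ in range(H):
        Q = bellman_backup(Q, R, P, gamma)
    return Q


# Hypothetical 2-state, 2-action MDP (illustration only, not from the notes).
R = [[0.0, 1.0], [0.5, 0.0]]          # R[s][a]
P = [[[1.0, 0.0], [0.0, 1.0]],        # P[s][a][s']
     [[0.0, 1.0], [1.0, 0.0]]]
gamma = 0.9

Q1 = value_iteration(R, P, gamma, H=1)    # one backup of Q = 0 returns R
Q50 = value_iteration(R, P, gamma, H=50)  # close to a fixed point of T
```

Nested lists keep the sketch dependency-free; an array library would be the more natural choice for larger state spaces.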

Goal: bound $\|Q^{\star,H} - Q^\star\|_\infty$.

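Lemma: for all $f, f' \in \mathbb{R}^{SA}$, $\|\mathcal{T} f - \mathcal{T} f'\|_\infty \le \gamma \|f - f'\|_\infty$.

Given the lemma, and using $Q^\star = \mathcal{T} Q^\star$,

$$\|Q^{\star,H} - Q^\star\|_\infty = \|\mathcal{T} Q^{\star,H-1} - \mathcal{T} Q^\star\|_\infty \le \gamma \|Q^{\star,H-1} - Q^\star\|_\infty \le \cdots \le \gamma^H \|Q^{\star,0} - Q^\star\|_\infty \le \frac{\gamma^H R_{\max}}{1-\gamma}.$$

Choose $H$ large enough to make this at most $\epsilon$: it suffices that $\gamma^H R_{\max}/(1-\gamma) \le \epsilon$, i.e. $H \ge \log\!\big(\tfrac{R_{\max}}{\epsilon(1-\gamma)}\big) \big/ \log(1/\gamma)$.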
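As a quick worked instance of the requirement $\gamma^H R_{\max}/(1-\gamma) \le \epsilon$, the sketch below computes the smallest such $H$. The function name and the sample values of $\gamma$, $R_{\max}$, $\epsilon$ are hypothetical, chosen only for illustration.

```python
import math


def horizon_for_accuracy(gamma, r_max, eps):
    """Smallest H with gamma**H * r_max / (1 - gamma) <= eps, for 0 < gamma < 1."""
    bound0 = r_max / (1.0 - gamma)  # upper bound on ||Q^{*,0} - Q*||_inf
    if bound0 <= eps:
        return 0
    H = math.ceil(math.log(bound0 / eps) / math.log(1.0 / gamma))
    # Guard against floating-point rounding in the logarithms.
    while gamma ** H * bound0 > eps:
        H += 1
    while H > 0 and gamma ** (H - 1) * bound0 <= eps:
        H -= 1
    return H


# Hypothetical values, for illustration only.
H = horizon_for_accuracy(gamma=0.9, r_max=1.0, eps=0.01)
```

The required horizon grows roughly like $\frac{1}{1-\gamma}\log\frac{R_{\max}}{\epsilon(1-\gamma)}$, since $\log(1/\gamma) \ge 1-\gamma$.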

Greedy policy: $\pi_f(s) := \arg\max_a f(s,a)$.

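Proof. For all $(s,a)$,

$$\big|(\mathcal{T} f)(s,a) - (\mathcal{T} f')(s,a)\big| = \Big| R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}[V_f(s')] - R(s,a) - \gamma\, \mathbb{E}_{s' \sim P(s,a)}[V_{f'}(s')] \Big| = \gamma\, \Big|\mathbb{E}_{s' \sim P(s,a)}\big[V_f(s') - V_{f'}(s')\big]\Big| \le \gamma \|V_f - V_{f'}\|_\infty.$$

What remains is $\gamma \|V_f - V_{f'}\|_\infty \le \gamma \|f - f'\|_\infty$, where $V_f, V_{f'} \in \mathbb{R}^{S}$ and $f, f' \in \mathbb{R}^{SA}$.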
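The contraction inequality $\|\mathcal{T} f - \mathcal{T} f'\|_\infty \le \gamma \|f - f'\|_\infty$ can be sanity-checked numerically. The sketch below draws a random tabular MDP and random pairs $f, f'$; the MDP sizes, the seed, the value range, and all names are assumptions for illustration, and a passing check is of course not a proof.

```python
import random


def bellman_backup(f, R, P, gamma):
    """(Tf)(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' f(s',a')."""
    V = [max(row) for row in f]
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))] for s in range(len(R))]


def sup_dist(f, g):
    """Sup-norm distance between two tables indexed by (s, a)."""
    return max(abs(x - y) for rf, rg in zip(f, g) for x, y in zip(rf, rg))


rng = random.Random(0)  # fixed seed so the check is reproducible
n_states, n_actions, gamma = 3, 2, 0.9

# Hypothetical random MDP: rewards in [0, 1], normalized transition rows.
R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
P = []
for s in range(n_states):
    P.append([])
    for a in range(n_actions):
        w = [rng.random() + 1e-3 for _ in range(n_states)]
        P[s].append([x / sum(w) for x in w])


def random_f():
    return [[rng.uniform(-5.0, 5.0) for _ in range(n_actions)]
            for _ in range(n_states)]


violations = 0
for _ in range(1000):
    f, g = random_f(), random_f()
    lhs = sup_dist(bellman_backup(f, R, P, gamma), bellman_backup(g, R, P, gamma))
    if lhs > gamma * sup_dist(f, g) + 1e-12:  # small slack for rounding
        violations += 1
```

Under the lemma, `violations` should stay at zero for any seed.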

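It suffices to show that for all $s$,

$$|V_f(s) - V_{f'}(s)| = \Big|\max_a f(s,a) - \max_a f'(s,a)\Big| \le \max_a |f(s,a) - f'(s,a)|.$$

Without loss of generality assume $\max_a f(s,a) \ge \max_a f'(s,a)$, and define $a^\star := \arg\max_a f(s,a)$, so that $f(s,a^\star) = V_f(s) = \max_a f(s,a)$. Then

$$\max_a f(s,a) - \max_a f'(s,a) = f(s,a^\star) - \max_a f'(s,a)$$

$$\le f(s,a^\star) - f'(s,a^\star) \le \max_a |f(s,a) - f'(s,a)|.$$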

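Alternative view.

Assume $r_t \ge 0$ for all $t$, so that $0 \le V^\star(s) = \mathbb{E}\big[\sum_{t=1}^{\infty} \gamma^{t-1} r_t \mid s_1 = s, \pi^\star\big]$, and let $V^{\star,H}(s) := \max_a Q^{\star,H}(s,a)$.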

Claim: $Q^{\star,H}$, as the output of VI, is the optimal Q-function for the $H$-step truncated objective.

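$Q^{\star,0} = 0$; $Q^{\star,1} = \mathcal{T} Q^{\star,0} = R$; $Q^{\star,2}(s,a) = R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\big[\max_{a'} R(s',a')\big]$.

In general, $V^{\star,H}(s) = \max_{\pi} \mathbb{E}\big[\sum_{t=1}^{H} \gamma^{t-1} r_t \mid s_1 = s, \pi\big]$, where the maximum is over (possibly nonstationary) policies.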

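$\pi^\star$ is optimal with respect to $\mathbb{E}\big[\sum_{t=1}^{\infty} \gamma^{t-1} r_t\big]$. Running $\pi^\star$ is also feasible for the truncated objective, so for all $(s,a)$, with expectations conditioned on $s_1 = s$, $a_1 = a$ and on following $\pi^\star$ afterwards,

$$Q^\star(s,a) - Q^{\star,H}(s,a) \le \mathbb{E}\Big[\sum_{t=1}^{\infty} \gamma^{t-1} r_t\Big] - \mathbb{E}\Big[\sum_{t=1}^{H} \gamma^{t-1} r_t\Big] = \mathbb{E}\Big[\sum_{t=H+1}^{\infty} \gamma^{t-1} r_t\Big]$$

$$\le \sum_{t=H+1}^{\infty} \gamma^{t-1} R_{\max} = \frac{\gamma^H R_{\max}}{1-\gamma}.$$

In the other direction, when $r_t \ge 0$ the truncated return of any policy never exceeds its full return, so

$$Q^\star - Q^{\star,H} \ge 0 \;\Rightarrow\; Q^{\star,H} \le Q^\star.$$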
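The two-sided bound $0 \le Q^\star - Q^{\star,H} \le \gamma^H R_{\max}/(1-\gamma)$ can be checked on a small example. In this sketch the 2-state MDP, the names, and the use of a long run ($K = 500$ backups) as a stand-in for $Q^\star$ are all assumptions for illustration; rewards are chosen in $[0, R_{\max}]$ with $R_{\max} = 1$.

```python
def bellman_backup(f, R, P, gamma):
    """(Tf)(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' f(s',a')."""
    V = [max(row) for row in f]
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))] for s in range(len(R))]


# Hypothetical MDP with rewards in [0, r_max] (illustration only).
R = [[0.0, 1.0], [0.5, 0.0]]          # R[s][a]
P = [[[1.0, 0.0], [0.0, 1.0]],        # P[s][a][s']
     [[0.0, 1.0], [1.0, 0.0]]]
gamma, r_max = 0.9, 1.0

K = 500  # many backups; iterates[K] is used as a proxy for Q*
iterates = [[[0.0, 0.0], [0.0, 0.0]]]  # iterates[h] plays the role of Q^{*,h}
for _ in range(K):
    iterates.append(bellman_backup(iterates[-1], R, P, gamma))
Q_ref = iterates[K]


def gap_within_bound(H, tol=1e-12):
    """Check 0 <= Q_ref - Q^{*,H} <= gamma^H * r_max / (1 - gamma) entrywise."""
    bound = gamma ** H * r_max / (1.0 - gamma)
    return all(-tol <= Q_ref[s][a] - iterates[H][s][a] <= bound + tol
               for s in range(len(R)) for a in range(len(R[s])))


checks = [gap_within_bound(H) for H in range(100)]
```

Because the proxy `Q_ref` is itself a truncated value, it lies slightly below $Q^\star$, so the upper bound is checked conservatively.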

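For $h = 1, \dots, H$, let $\pi_h$ be the greedy policy with respect to $Q^{\star,h}$: $\pi_h(s) := \arg\max_a Q^{\star,h}(s,a)$, giving $\pi_1, \pi_2, \dots, \pi_H$.

The truncated objective is optimized by playing them in reverse order: $\pi_H$ first, then $\pi_{H-1}$, and so on down to $\pi_1$.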

LExerciseiyefudafebeuthffttfd.figtg8

2. Bound the suboptimality of the nonstationary policy $\pi_H, \pi_{H-1}, \dots, \pi_1$.