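Bellman operator $\mathcal{T}: \mathbb{R}^{S\times A} \to \mathbb{R}^{S\times A}$: for $f \in \mathbb{R}^{S\times A}$,
$$(\mathcal{T}f)(s,a) = R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\big[V_f(s')\big],$$
where $V_f(s) = \max_a f(s,a)$.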
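$Q^\star = \mathcal{T}Q^\star$ (Bellman optimality equation).
Value Iteration: initialize $Q^0 \in \mathbb{R}^{S\times A}$ (e.g., $Q^0 = 0$); then for $h = 1, 2, \dots$: $Q^h \leftarrow \mathcal{T}Q^{h-1}$. Stop at a large $h = H$.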
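A minimal sketch of value iteration, assuming a tabular MDP stored as nested Python lists: `R[s][a]` for $R(s,a)$ and `P[s][a][s2]` for $P(s_2 \mid s,a)$. This encoding, the function names, and the two-state example MDP are hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical tabular encoding used in this sketch:
#   R[s][a]      -> expected reward R(s, a)
#   P[s][a][s2]  -> transition probability P(s2 | s, a)
Table = List[List[float]]


def bellman_operator(f: Table, R: Table, P: List[Table], gamma: float) -> Table:
    """(T f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[V_f(s')], V_f(s) = max_a f(s, a)."""
    num_states = len(R)
    v_f = [max(row) for row in f]  # V_f(s) for every state
    return [
        [
            R[s][a] + gamma * sum(P[s][a][s2] * v_f[s2] for s2 in range(num_states))
            for a in range(len(R[s]))
        ]
        for s in range(num_states)
    ]


def value_iteration(R: Table, P: List[Table], gamma: float, H: int) -> Table:
    """Start from Q^0 = 0, apply Q^h <- T Q^{h-1} for h = 1, ..., H, and return Q^H."""
    q = [[0.0] * len(row) for row in R]
    for _ in range(H):
        q = bellman_operator(q, R, P, gamma)
    return q


# Hypothetical 2-state, 2-action MDP (illustration only):
#   state 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1)
#   state 1: action 0 stays (reward 0.5), action 1 moves to state 0 (reward 0)
R = [[0.0, 1.0], [0.5, 0.0]]
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
Q_H = value_iteration(R, P, gamma=0.9, H=200)
print(Q_H)
```

For this toy MDP with $\gamma = 0.9$, solving $Q^\star = \mathcal{T}Q^\star$ by hand gives $Q^\star = [[4.95, 5.5], [5.0, 4.95]]$, and $Q^{200}$ agrees with it to within $\gamma^{200} R_{\max}/(1-\gamma) < 10^{-8}$.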
Goal: bound $\|Q^H - Q^\star\|_\infty$.
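Lemma: $\mathcal{T}$ is a $\gamma$-contraction under $\|\cdot\|_\infty$, i.e., for all $f, f' \in \mathbb{R}^{S\times A}$,
$$\|\mathcal{T}f - \mathcal{T}f'\|_\infty \le \gamma\, \|f - f'\|_\infty.$$
Applying the lemma $H$ times (using $Q^\star = \mathcal{T}Q^\star$):
$$\|Q^H - Q^\star\|_\infty = \|\mathcal{T}Q^{H-1} - \mathcal{T}Q^\star\|_\infty \le \gamma\, \|Q^{H-1} - Q^\star\|_\infty \le \cdots \le \gamma^H \|Q^0 - Q^\star\|_\infty.$$
How large does $H$ need to be? To get $\|Q^H - Q^\star\|_\infty \le \epsilon$ it suffices that $\gamma^H \|Q^0 - Q^\star\|_\infty \le \epsilon$. With rewards in $[0, R_{\max}]$ and $Q^0 = 0$ we have $\|Q^0 - Q^\star\|_\infty \le R_{\max}/(1-\gamma)$, and since $\gamma^H \le e^{-(1-\gamma)H}$, any $H \ge \frac{1}{1-\gamma}\log\frac{R_{\max}}{\epsilon(1-\gamma)}$ is enough.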
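To make the horizon requirement concrete, here is a small sketch (function names are hypothetical) that computes the smallest $H$ with $\gamma^H R_{\max}/(1-\gamma) \le \epsilon$, assuming rewards in $[0, R_{\max}]$ and $Q^0 = 0$, next to the simpler sufficient choice $H = \lceil \frac{1}{1-\gamma}\log\frac{R_{\max}}{\epsilon(1-\gamma)} \rceil$ obtained from $\gamma^H \le e^{-(1-\gamma)H}$.

```python
import math


def horizon_exact(eps: float, gamma: float, r_max: float) -> int:
    """Smallest integer H >= 0 with gamma**H * r_max / (1 - gamma) <= eps.

    Assumes rewards in [0, r_max] and Q^0 = 0, so that
    ||Q^0 - Q*||_inf <= r_max / (1 - gamma).
    """
    initial_gap = r_max / (1.0 - gamma)
    if initial_gap <= eps:
        return 0
    # gamma**H * initial_gap <= eps  <=>  H >= log(initial_gap / eps) / log(1 / gamma)
    return math.ceil(math.log(initial_gap / eps) / math.log(1.0 / gamma))


def horizon_sufficient(eps: float, gamma: float, r_max: float) -> int:
    """Looser closed form H = ceil(log(r_max / (eps * (1 - gamma))) / (1 - gamma)),
    valid because gamma**H <= exp(-(1 - gamma) * H)."""
    return max(0, math.ceil(math.log(r_max / (eps * (1.0 - gamma))) / (1.0 - gamma)))


for gamma in (0.9, 0.99):
    print(gamma, horizon_exact(0.01, gamma, 1.0), horizon_sufficient(0.01, gamma, 1.0))
```

With $\epsilon = 0.01$, $R_{\max} = 1$ and $\gamma = 0.9$, the exact requirement is $H = 66$ while the closed form asks for 70; the gap comes from the looseness of $\gamma^H \le e^{-(1-\gamma)H}$.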
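Proof. $\|\mathcal{T}f - \mathcal{T}f'\|_\infty = \max_{s,a} |(\mathcal{T}f)(s,a) - (\mathcal{T}f')(s,a)|$, and for every $(s,a)$:
$$\begin{aligned}
|(\mathcal{T}f)(s,a) - (\mathcal{T}f')(s,a)| &= \big|R(s,a) + \gamma\,\mathbb{E}_{s'\sim P(s,a)}[V_f(s')] - R(s,a) - \gamma\,\mathbb{E}_{s'\sim P(s,a)}[V_{f'}(s')]\big| \\
&= \gamma\,\big|\mathbb{E}_{s'\sim P(s,a)}[V_f(s') - V_{f'}(s')]\big| \\
&\le \gamma\,\|V_f - V_{f'}\|_\infty \le \gamma\,\|f - f'\|_\infty.
\end{aligned}$$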
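For the last inequality it suffices to show that for all $s$,
$$|V_f(s) - V_{f'}(s)| = \big|\max_a f(s,a) - \max_a f'(s,a)\big| \le \max_a |f(s,a) - f'(s,a)|.$$
Fix $s$ and assume w.l.o.g. $\max_a f(s,a) \ge \max_a f'(s,a)$ (otherwise swap $f$ and $f'$). Define $a^\star = \arg\max_a f(s,a)$. Then
$$\big|\max_a f(s,a) - \max_a f'(s,a)\big| = f(s,a^\star) - \max_a f'(s,a) \le f(s,a^\star) - f'(s,a^\star) \le \max_a |f(s,a) - f'(s,a)|.$$
Taking the max over $s$ gives $\|V_f - V_{f'}\|_\infty \le \|f - f'\|_\infty$, which completes the proof. $\blacksquare$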
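Both inequalities in the proof can be spot-checked numerically. The sketch below (hypothetical helper names; random MDPs and random $f, f'$ drawn with Python's standard `random` module) checks $|\max_a f(s,a) - \max_a f'(s,a)| \le \max_a |f(s,a) - f'(s,a)|$ and $\|\mathcal{T}f - \mathcal{T}f'\|_\infty \le \gamma\|f - f'\|_\infty$ up to a floating-point tolerance. A passing check is a sanity test, not a proof.

```python
import random
from typing import List

Table = List[List[float]]


def bellman_operator(f: Table, R: Table, P: List[Table], gamma: float) -> Table:
    """(T f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[max_a' f(s', a')]."""
    num_states = len(R)
    v_f = [max(row) for row in f]
    return [
        [R[s][a] + gamma * sum(P[s][a][y] * v_f[y] for y in range(num_states))
         for a in range(len(R[s]))]
        for s in range(num_states)
    ]


def sup_dist(f: Table, g: Table) -> float:
    """||f - g||_inf, the max over all (s, a) of |f(s, a) - g(s, a)|."""
    return max(abs(x - y) for row_f, row_g in zip(f, g) for x, y in zip(row_f, row_g))


def random_table(rng: random.Random, S: int, A: int, lo: float, hi: float) -> Table:
    return [[rng.uniform(lo, hi) for _ in range(A)] for _ in range(S)]


def random_transitions(rng: random.Random, S: int, A: int) -> List[Table]:
    """Each P[s][a] is a random probability vector over the S next states."""
    P = []
    for _ in range(S):
        rows = []
        for _ in range(A):
            weights = [rng.random() for _ in range(S)]
            total = sum(weights)
            rows.append([w / total for w in weights])
        P.append(rows)
    return P


def check_lemma(trials: int = 500, S: int = 4, A: int = 3, gamma: float = 0.9,
                seed: int = 0, tol: float = 1e-12) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        R = random_table(rng, S, A, 0.0, 1.0)
        P = random_transitions(rng, S, A)
        f = random_table(rng, S, A, -5.0, 5.0)
        g = random_table(rng, S, A, -5.0, 5.0)
        # Step of the proof: |max_a f(s,a) - max_a g(s,a)| <= max_a |f(s,a) - g(s,a)|.
        for row_f, row_g in zip(f, g):
            gap = max(abs(x - y) for x, y in zip(row_f, row_g))
            if abs(max(row_f) - max(row_g)) > gap + tol:
                return False
        # The lemma: ||T f - T g||_inf <= gamma * ||f - g||_inf.
        lhs = sup_dist(bellman_operator(f, R, P, gamma), bellman_operator(g, R, P, gamma))
        if lhs > gamma * sup_dist(f, g) + tol:
            return False
    return True


print(check_lemma())
```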
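$H$-step truncated objective: $\mathbb{E}\big[\sum_{t=1}^{H} \gamma^{t-1} r_t \,\big|\, s_1 = s\big]$. Writing $V^h(s) := \max_a Q^h(s,a)$, each VI step reads $Q^h(s,a) = R(s,a) + \gamma\,\mathbb{E}_{s'\sim P(s,a)}[V^{h-1}(s')]$.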
Claim: $Q^H$, the output of VI started from $Q^0 = 0$, is the optimal Q-function for the $H$-step truncated objective.
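Check for small $H$: $Q^0 = 0$, $Q^1 = \mathcal{T}Q^0 = R$, and
$$Q^2(s,a) = (\mathcal{T}Q^1)(s,a) = R(s,a) + \gamma\,\mathbb{E}_{s'\sim P(s,a)}\big[\max_{a'} R(s',a')\big],$$
i.e., the best achievable value of $\mathbb{E}[r_1 + \gamma r_2]$ when $(s_1, a_1) = (s, a)$.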
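The claim can be checked by brute force on a tiny MDP: enumerate every deterministic nonstationary Markov policy for steps $t = 2, \dots, H$ (this class contains an optimal policy for a finite-horizon objective), compute its $H$-step return by backward recursion, and compare the maximum with VI's $Q^H$. The sketch assumes a tabular MDP stored as nested lists `R[s][a]`, `P[s][a][s2]` (a hypothetical encoding); the function names and the example MDP are illustrative only.

```python
import itertools
from typing import List, Tuple

Table = List[List[float]]


def value_iteration(R: Table, P: List[Table], gamma: float, H: int) -> Table:
    """Q^0 = 0 and Q^h = T Q^{h-1}; returns Q^H."""
    S = len(R)
    q = [[0.0] * len(row) for row in R]
    for _ in range(H):
        v = [max(row) for row in q]  # V_{Q^{h-1}}
        q = [[R[s][a] + gamma * sum(P[s][a][y] * v[y] for y in range(S))
              for a in range(len(R[s]))] for s in range(S)]
    return q


def truncated_return(R: Table, P: List[Table], gamma: float,
                     rules: Tuple[Tuple[int, ...], ...], s: int, a: int) -> float:
    """E[sum_{t=1}^H gamma^{t-1} r_t | s_1 = s, a_1 = a] when the deterministic
    decision rules rules[0], ..., rules[H-2] are used at times t = 2, ..., H."""
    S = len(R)
    w = [0.0] * S  # value-to-go after step H
    for rule in reversed(rules):  # backward recursion from t = H down to t = 2
        w = [R[x][rule[x]] + gamma * sum(P[x][rule[x]][y] * w[y] for y in range(S))
             for x in range(S)]
    return R[s][a] + gamma * sum(P[s][a][y] * w[y] for y in range(S))


def brute_force_optimum(R: Table, P: List[Table], gamma: float, H: int) -> Table:
    """Max of the H-step return over all deterministic nonstationary Markov policies."""
    S, A = len(R), len(R[0])
    rules_per_step = list(itertools.product(range(A), repeat=S))  # all maps s -> a
    best = [[float("-inf")] * A for _ in range(S)]
    for rules in itertools.product(rules_per_step, repeat=H - 1):
        for s in range(S):
            for a in range(A):
                best[s][a] = max(best[s][a], truncated_return(R, P, gamma, rules, s, a))
    return best


# Hypothetical 2-state, 2-action MDP (illustration only).
R = [[0.0, 1.0], [0.5, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
for H in range(1, 5):
    vi = value_iteration(R, P, 0.9, H)
    bf = brute_force_optimum(R, P, 0.9, H)
    err = max(abs(x - y) for row_v, row_b in zip(vi, bf) for x, y in zip(row_v, row_b))
    print(H, err)
```

The enumeration covers $(|A|^{|S|})^{H-1}$ policies, so it is only feasible at toy sizes; VI obtains the same numbers with $H$ backups.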
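In general, with $Q^0 = 0$,
$$Q^H(s,a) = \max_{\pi}\, \mathbb{E}\Big[\sum_{t=1}^{H} \gamma^{t-1} r_t \,\Big|\, s_1 = s, a_1 = a; \pi\Big],$$
i.e., $Q^H$ is optimal w.r.t. $\mathbb{E}[\sum_{t=1}^{H}\gamma^{t-1} r_t]$, where the max ranges over (possibly nonstationary) policies.

Comparing $Q^H$ with $Q^\star$ (rewards in $[0, R_{\max}]$): let $\pi^\star$ be optimal for the infinite-horizon objective. Then
$$Q^\star(s,a) = \mathbb{E}\Big[\sum_{t=1}^{H}\gamma^{t-1} r_t \,\Big|\, s,a;\pi^\star\Big] + \mathbb{E}\Big[\sum_{t=H+1}^{\infty}\gamma^{t-1} r_t \,\Big|\, s,a;\pi^\star\Big] \le Q^H(s,a) + \sum_{t=H+1}^{\infty}\gamma^{t-1} R_{\max} = Q^H(s,a) + \frac{\gamma^H R_{\max}}{1-\gamma}.$$
Conversely, since $r_t \ge 0$, following an $H$-step optimal policy for $H$ steps and then acting arbitrarily earns at least $Q^H(s,a)$ in the infinite-horizon objective, so $Q^H(s,a) \le Q^\star(s,a)$. Hence
$$0 \le Q^\star(s,a) - Q^H(s,a) \le \frac{\gamma^H R_{\max}}{1-\gamma} \quad \forall (s,a),$$
the same rate as the contraction argument gives with $\|Q^0 - Q^\star\|_\infty \le R_{\max}/(1-\gamma)$.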
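A quick numerical sanity check of $0 \le Q^\star - Q^H \le \gamma^H R_{\max}/(1-\gamma)$, assuming a hypothetical two-state MDP with rewards in $[0, 1]$ whose $Q^\star$ (for $\gamma = 0.9$) was solved by hand from $Q^\star = \mathcal{T}Q^\star$. The nested-list encoding and the function names are illustrative choices.

```python
from typing import List

Table = List[List[float]]


def value_iteration(R: Table, P: List[Table], gamma: float, H: int) -> Table:
    """Q^0 = 0 and Q^h = T Q^{h-1}; returns Q^H."""
    S = len(R)
    q = [[0.0] * len(row) for row in R]
    for _ in range(H):
        v = [max(row) for row in q]
        q = [[R[s][a] + gamma * sum(P[s][a][y] * v[y] for y in range(S))
              for a in range(len(R[s]))] for s in range(S)]
    return q


def truncation_gap_ok(R: Table, P: List[Table], gamma: float, H: int,
                      r_max: float, q_star: Table, tol: float = 1e-9) -> bool:
    """True iff 0 <= Q*(s,a) - Q^H(s,a) <= gamma**H * r_max / (1 - gamma) for all (s, a)."""
    q_h = value_iteration(R, P, gamma, H)
    bound = gamma ** H * r_max / (1.0 - gamma)
    gaps = [qs - qh for row_s, row_h in zip(q_star, q_h) for qs, qh in zip(row_s, row_h)]
    return min(gaps) >= -tol and max(gaps) <= bound + tol


# Hypothetical 2-state, 2-action MDP (illustration only).
R = [[0.0, 1.0], [0.5, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
# Q* for gamma = 0.9, solved by hand: move 0 -> 1, then stay in state 1,
# so V*(1) = 0.5 / (1 - 0.9) = 5 and V*(0) = 1 + 0.9 * 5 = 5.5.
Q_STAR = [[4.95, 5.5], [5.0, 4.95]]
print(all(truncation_gap_ok(R, P, 0.9, H, 1.0, Q_STAR) for H in range(60)))
```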
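Let $\pi_h := \pi_{Q^h}$ be the greedy policy of $Q^h$, $\pi_{Q^h}(s) = \arg\max_a Q^h(s,a)$. An $H$-step optimal policy is nonstationary: at time $t$ ($H - t + 1$ steps to go) it takes $\pi_{H-t+1}$, i.e., it plays $\pi_H, \pi_{H-1}, \dots, \pi_1$ for $t = 1, 2, \dots, H$.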
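The sketch below builds $\pi_h = \pi_{Q^h}$ from the VI iterates, runs them in the order $\pi_H, \dots, \pi_1$, and evaluates the resulting $H$-step return by backward recursion; it should reproduce $\max_a Q^H(s,a)$ for every start state. The MDP encoding (nested lists), the tie-breaking rule (smallest action index), and all names are hypothetical choices.

```python
from typing import List

Table = List[List[float]]


def q_iterates(R: Table, P: List[Table], gamma: float, H: int) -> List[Table]:
    """Return [Q^0, Q^1, ..., Q^H] from value iteration with Q^0 = 0."""
    S = len(R)
    qs = [[[0.0] * len(row) for row in R]]
    for _ in range(H):
        v = [max(row) for row in qs[-1]]
        qs.append([[R[s][a] + gamma * sum(P[s][a][y] * v[y] for y in range(S))
                    for a in range(len(R[s]))] for s in range(S)])
    return qs


def greedy(q: Table) -> List[int]:
    """pi_q(s) = argmax_a q(s, a); ties go to the smallest action index."""
    return [row.index(max(row)) for row in q]


def evaluate_nonstationary(R: Table, P: List[Table], gamma: float,
                           policies: List[List[int]]) -> List[float]:
    """E[sum_{t=1}^H gamma^{t-1} r_t | s_1 = s] when policies[t-1] is used at time t."""
    S = len(R)
    w = [0.0] * S  # value-to-go after the last step
    for pi in reversed(policies):
        w = [R[s][pi[s]] + gamma * sum(P[s][pi[s]][y] * w[y] for y in range(S))
             for s in range(S)]
    return w


# Hypothetical 2-state, 2-action MDP (illustration only).
R = [[0.0, 1.0], [0.5, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
gamma, H = 0.9, 6
qs = q_iterates(R, P, gamma, H)
# At time t (H - t + 1 steps to go) play pi_{H-t+1}: the sequence pi_H, ..., pi_1.
policies = [greedy(qs[H - t + 1]) for t in range(1, H + 1)]
print(evaluate_nonstationary(R, P, gamma, policies), [max(row) for row in qs[H]])
```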
Exercise:
1. Prove the claim above.
2. Bound the suboptimality, in the infinite-horizon objective, of the nonstationary policy $\pi_H, \pi_{H-1}, \dots, \pi_1$ (defined for the first $H$ steps).