Value Iteration

Page 1

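Bellman operator $\mathcal{T}: \mathbb{R}^{S \times A} \to \mathbb{R}^{S \times A}$. For all $f \in \mathbb{R}^{S \times A}$,

$$(\mathcal{T} f)(s,a) = R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\big[V_f(s')\big],$$

where $V_f(s) = \max_a f(s,a)$.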
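
As a concrete illustration of the operator, here is a minimal Python sketch for a tabular MDP. The two-state, two-action MDP (`R`, `P`, `GAMMA`) and the function names are hypothetical and chosen only for this example; they are not part of the notes.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.9
# R[s][a]: immediate reward for taking action a in state s.
R = [[0.0, 1.0],
     [2.0, 0.0]]
# P[s][a][s2]: probability of moving to state s2 from (s, a).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0]]]


def greedy_value(f):
    """V_f(s) = max_a f(s, a)."""
    return [max(row) for row in f]


def bellman_operator(f, R=R, P=P, gamma=GAMMA):
    """(T f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[V_f(s')]."""
    v = greedy_value(f)
    return [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(R[s]))]
            for s in range(len(R))]


zero = [[0.0, 0.0], [0.0, 0.0]]
print(bellman_operator(zero))  # equals R, because V_0 is identically 0
```

Applying the operator to the all-zero function returns the reward table itself, which matches the first step of value iteration.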

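$Q^\star = \mathcal{T} Q^\star$ (Bellman optimality equation). If $\|f - Q^\star\|_\infty$ is small, then the greedy policy with respect to $f$ is near-optimal.

Value Iteration

Alg: define $Q^{\star,0} = 0 \in \mathbb{R}^{S \times A}$ and, for $h \ge 1$,

$$Q^{\star,h} = \mathcal{T} Q^{\star,h-1}.$$

Stop at a large $h = H$.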
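
The algorithm can be sketched in a few lines of Python. This is an illustrative sketch, not the notes' own code: the small MDP and the name `value_iteration` are assumptions made for the example.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.9
R = [[0.0, 1.0],
     [2.0, 0.0]]                      # R[s][a]
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0]]]        # P[s][a][s2]


def value_iteration(R, P, gamma, H):
    """Start from Q^{*,0} = 0 and apply the Bellman operator H times."""
    n_s, n_a = len(R), len(R[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(H):
        v = [max(row) for row in q]   # V_q(s) = max_a q(s, a)
        q = [[R[s][a] + gamma * sum(P[s][a][s2] * v[s2] for s2 in range(n_s))
              for a in range(n_a)]
             for s in range(n_s)]
    return q


print(value_iteration(R, P, GAMMA, 1))    # one application of T to 0 gives R
print(value_iteration(R, P, GAMMA, 200))  # close to a fixed point of T
```

With `H = 0` the function returns the zero table, and successive outputs for large `H` change very little, consistent with stopping at a large $h = H$.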

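Goal: bound $\|Q^{\star,H} - Q^\star\|_\infty$.

Lemma: for all $f, f' \in \mathbb{R}^{S \times A}$, $\|\mathcal{T} f - \mathcal{T} f'\|_\infty \le \gamma \|f - f'\|_\infty$.

Given the lemma, and using $Q^\star = \mathcal{T} Q^\star$,

$$\|Q^{\star,H} - Q^\star\|_\infty = \|\mathcal{T} Q^{\star,H-1} - \mathcal{T} Q^\star\|_\infty \le \gamma \|Q^{\star,H-1} - Q^\star\|_\infty \le \cdots \le \gamma^H \|Q^{\star,0} - Q^\star\|_\infty = \gamma^H \|Q^\star\|_\infty \le \frac{\gamma^H R_{\max}}{1-\gamma},$$

where the last two steps use $Q^{\star,0} = 0$ and assume $|R(s,a)| \le R_{\max}$. This shows how large $H$ needs to be.

Greedy policy: $\pi_f(s) = \arg\max_a f(s,a)$.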
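
To make the dependence on $H$ explicit, the bound $\gamma^H R_{\max}/(1-\gamma)$ can be solved for $H$ at a target accuracy $\varepsilon$. The symbol $\varepsilon$ and the numbers in the worked example are illustrative assumptions, not values from the notes; the derivation assumes $0 < \varepsilon < R_{\max}/(1-\gamma)$.

```latex
\begin{align*}
\frac{\gamma^{H} R_{\max}}{1-\gamma} \le \varepsilon
  \quad &\Longleftrightarrow \quad
  H \ge \frac{\log\!\big(R_{\max} / ((1-\gamma)\,\varepsilon)\big)}{\log(1/\gamma)} . \\
\intertext{Since $\log(1/\gamma) \ge 1-\gamma$, it also suffices that}
  H &\ge \frac{1}{1-\gamma}\,\log\frac{R_{\max}}{(1-\gamma)\,\varepsilon} . \\
\intertext{Worked example with $\gamma = 0.9$, $R_{\max} = 1$, $\varepsilon = 0.01$:}
  \frac{\log 1000}{\log(1/0.9)} &\approx 65.6,
  \quad\text{so } H = 66 \text{ iterations suffice.}
\end{align*}
```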

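Proof. For any $(s,a)$,

$$\begin{aligned}
\big|(\mathcal{T} f)(s,a) - (\mathcal{T} f')(s,a)\big|
&= \Big| R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}[V_f(s')] - R(s,a) - \gamma\, \mathbb{E}_{s' \sim P(s,a)}[V_{f'}(s')] \Big| \\
&= \gamma\, \Big| \mathbb{E}_{s' \sim P(s,a)}\big[V_f(s') - V_{f'}(s')\big] \Big| \\
&\le \gamma\, \|V_f - V_{f'}\|_\infty \\
&\le \gamma\, \|f - f'\|_\infty ,
\end{aligned}$$

where $V_f - V_{f'} \in \mathbb{R}^{S}$ and $f - f' \in \mathbb{R}^{S \times A}$.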

Page 2

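It suffices to show that, for all $s$, $|V_f(s) - V_{f'}(s)| \le \max_a |f(s,a) - f'(s,a)|$, i.e.,

$$\Big| \max_a f(s,a) - \max_a f'(s,a) \Big| \le \max_a \big| f(s,a) - f'(s,a) \big| .$$

Without loss of generality assume $V_f(s) \ge V_{f'}(s)$. Define $a^\star = \arg\max_a f(s,a)$, so that $f(s,a^\star) = V_f(s) = \max_a f(s,a)$. Then

$$\begin{aligned}
\max_a f(s,a) - \max_a f'(s,a)
&= f(s,a^\star) - \max_a f'(s,a) \\
&\le f(s,a^\star) - f'(s,a^\star) \\
&\le \max_a \big| f(s,a) - f'(s,a) \big| \le \|f - f'\|_\infty .
\end{aligned}$$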
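
The contraction property $\|\mathcal{T} f - \mathcal{T} f'\|_\infty \le \gamma \|f - f'\|_\infty$ can also be checked numerically. The sketch below is a sanity check on randomly generated inputs, not a proof; the random MDP, the seed, and all function names are assumptions made for the example.

```python
import random


def sup_norm_diff(f, g):
    """||f - g||_inf over all (s, a) entries."""
    return max(abs(x - y) for rf, rg in zip(f, g) for x, y in zip(rf, rg))


def bellman_operator(f, R, P, gamma):
    """(T f)(s, a) = R(s, a) + gamma * E_{s' ~ P(s, a)}[max_a' f(s', a')]."""
    v = [max(row) for row in f]
    return [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(R[s]))]
            for s in range(len(R))]


def random_mdp(n_s, n_a, rng):
    """Random rewards in [0, 1] and random normalized transition rows."""
    R = [[rng.random() for _ in range(n_a)] for _ in range(n_s)]
    P = []
    for _ in range(n_s):
        rows = []
        for _ in range(n_a):
            w = [rng.random() for _ in range(n_s)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    return R, P


rng = random.Random(0)
gamma, n_s, n_a = 0.9, 4, 3
R, P = random_mdp(n_s, n_a, rng)

worst_ratio = 0.0
for _ in range(1000):
    f = [[rng.uniform(-10, 10) for _ in range(n_a)] for _ in range(n_s)]
    g = [[rng.uniform(-10, 10) for _ in range(n_a)] for _ in range(n_s)]
    lhs = sup_norm_diff(bellman_operator(f, R, P, gamma),
                        bellman_operator(g, R, P, gamma))
    rhs = gamma * sup_norm_diff(f, g)
    assert lhs <= rhs + 1e-12          # small slack for floating-point error
    worst_ratio = max(worst_ratio, lhs / sup_norm_diff(f, g))

print(worst_ratio)  # largest observed ratio; the lemma says it cannot exceed gamma
```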

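Another way to bound $\|Q^{\star,H} - Q^\star\|_\infty$. Recall $V^\pi(s) = \mathbb{E}\big[\sum_{t=1}^{\infty} \gamma^{t-1} r_t \mid \pi, s_1 = s\big]$ and $V^\star(s) = \max_a Q^\star(s,a)$.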

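Claim: $Q^{\star,H}$, as the output of VI, is the optimal Q-function for the $H$-step truncated objective.

$$Q^{\star,0} = 0, \qquad Q^{\star,1} = \mathcal{T} 0 = R, \qquad Q^{\star,2}(s,a) = R(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\Big[\max_{a'} R(s',a')\Big].$$

$$V^{\star,H}(s) = \max_\pi \mathbb{E}\Big[\sum_{t=1}^{H} \gamma^{t-1} r_t \;\Big|\; \pi, s_1 = s\Big],$$

where the maximum is over (possibly nonstationary) policies.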
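
The claim can be checked by brute force on a small example. The sketch below uses a hypothetical deterministic MDP (`NEXT`, `R`) so that enumerating fixed action sequences is enough to find the best $H$-step truncated return; with stochastic transitions one would have to enumerate policies instead. All names are assumptions made for illustration.

```python
from itertools import product

GAMMA = 0.9
# Hypothetical deterministic MDP: NEXT[s][a] is the next state, R[s][a] the reward.
NEXT = [[0, 1], [2, 0], [2, 1]]
R = [[0.0, 1.0], [5.0, 0.0], [0.5, 2.0]]
N_S, N_A = len(NEXT), len(NEXT[0])


def value_iteration(H):
    """Q^{*,H}: apply the Bellman operator H times starting from 0."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(H):
        v = [max(row) for row in q]
        q = [[R[s][a] + GAMMA * v[NEXT[s][a]] for a in range(N_A)]
             for s in range(N_S)]
    return q


def truncated_return(s, actions):
    """sum_{t=1}^{H} gamma^(t-1) r_t along a fixed action sequence from s."""
    total, discount = 0.0, 1.0
    for a in actions:
        total += discount * R[s][a]
        discount *= GAMMA
        s = NEXT[s][a]
    return total


def brute_force_q(s, a, H):
    """Best H-step truncated return among sequences that start with action a."""
    return max(truncated_return(s, (a,) + rest)
               for rest in product(range(N_A), repeat=H - 1))


H = 5
q = value_iteration(H)
for s in range(N_S):
    for a in range(N_A):
        assert abs(q[s][a] - brute_force_q(s, a, H)) < 1e-9
print("value iteration matches the brute-force H-step optimum for H =", H)
```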

Page 3

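Since $Q^{\star,H}$ is optimal with respect to the truncated objective $\mathbb{E}\big[\sum_{t=1}^{H} \gamma^{t-1} r_t\big]$, it is enough to compare the truncated and infinite-horizon objectives. For any policy and any starting $(s,a)$, assuming $|r_t| \le R_{\max}$,

$$\bigg| \mathbb{E}\Big[\sum_{t=1}^{\infty} \gamma^{t-1} r_t\Big] - \mathbb{E}\Big[\sum_{t=1}^{H} \gamma^{t-1} r_t\Big] \bigg| = \bigg| \mathbb{E}\Big[\sum_{t=H+1}^{\infty} \gamma^{t-1} r_t\Big] \bigg| \le \sum_{t=H+1}^{\infty} \gamma^{t-1} R_{\max} = \frac{\gamma^H R_{\max}}{1-\gamma} .$$

Taking the maximum over policies in both objectives gives $|Q^\star(s,a) - Q^{\star,H}(s,a)| \le \gamma^H R_{\max} / (1-\gamma)$.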

$Q^{\star,0} = 0 \to Q^{\star,1} \to \cdots \to Q^{\star,H}$, with greedy policies $\pi_h(s) = \arg\max_a Q^{\star,h}(s,a)$ for $h = 1, \dots, H$.

Exercise:

1. [illegible]

2. Bound the suboptimality of the nonstationary policy $\pi_H, \pi_{H-1}, \dots, \pi_1$.