
Ch 17. Optimal control theory and the linear Bellman equation

HJ Kappen

BTSM Seminar, 12.07.19 (Thu)

Summarized by Joon Shik Kim

Introduction

• Optimising a sequence of actions to attain some future goal is the general topic of control theory.

• In an example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost that consists of two terms.

• The first is a path cost that specifies the energy consumption to contract the muscles.

• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.

• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

Discrete Time Control (1/3)

• The dynamics of the system is

  x_{t+1} = x_t + f(t, x_t, u_t),   t = 0, 1, ..., T-1,

  where x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t.

• A cost function assigns a cost to each sequence of controls:

  C(x_0, u_{0:T-1}) = φ(x_T) + Σ_{t=0}^{T-1} R(t, x_t, u_t),

  where R(t, x, u) is the cost associated with taking action u at time t in state x, and φ(x_T) is the cost associated with ending up in state x_T at time T.

Discrete Time Control (2/3)

• The problem of optimal control is to find the sequence u_{0:T-1} that minimises C(x_0, u_{0:T-1}).

• The optimal cost-to-go is

  J(t, x_t) = min_{u_{t:T-1}} ( φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s) )

            = min_{u_t} ( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) ).

Discrete Time Control (3/3)

• The algorithm to compute the optimal control, trajectory, and cost is given by

• 1. Initialization: J(T, x) = φ(x).

• 2. Backwards: For t = T-1, ..., 0 and for all x compute

  u*_t(x) = argmin_u { R(t, x, u) + J(t+1, x + f(t, x, u)) },

  J(t, x) = R(t, x, u*_t) + J(t+1, x + f(t, x, u*_t)).

• 3. Forwards: For t = 0, ..., T-1 compute

  x*_{t+1} = x*_t + f(t, x*_t, u*_t(x*_t)).
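The backward-forward scheme above can be turned into a short numerical sketch. The Python snippet below is illustrative only: the state grid, the dynamics f, the running cost R and the end cost φ are placeholder choices, not taken from the chapter; it simply implements the three steps literally for a discretised scalar state.

import numpy as np

# Illustrative discretisation (placeholder problem, not from the chapter):
# scalar state on a grid, a small set of admissible controls, quadratic costs.
T = 20
xs = np.linspace(-2.0, 2.0, 81)     # state grid
us = np.linspace(-1.0, 1.0, 21)     # control grid

def f(t, x, u):                     # dynamics increment: x_{t+1} = x_t + f(t, x_t, u_t)
    return 0.1 * (-x + u)

def R(t, x, u):                     # running cost
    return 0.05 * u**2

def phi(x):                         # end cost
    return (x - 1.0)**2

def nearest(x):                     # index of the grid point closest to x
    return int(np.abs(xs - x).argmin())

# 1. Initialization: J(T, x) = phi(x)
J = np.zeros((T + 1, len(xs)))
J[T] = phi(xs)
u_star = np.zeros((T, len(xs)))

# 2. Backwards: J(t, x) = min_u [ R(t, x, u) + J(t+1, x + f(t, x, u)) ]
for t in range(T - 1, -1, -1):
    for i, x in enumerate(xs):
        q = [R(t, x, u) + J[t + 1, nearest(x + f(t, x, u))] for u in us]
        k = int(np.argmin(q))
        J[t, i], u_star[t, i] = q[k], us[k]

# 3. Forwards: roll out the optimal trajectory from x_0
x = -1.5
for t in range(T):
    x = x + f(t, x, u_star[t, nearest(x)])
print("final state:", x, " cost-to-go at start:", J[0, nearest(-1.5)])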

The HJB Equation (1/2)

• Taking the continuous-time limit of the recursion gives

  J(t, x) = min_u ( R(t, x, u) dt + J(t+dt, x + f(x, u, t) dt) )

          = min_u ( R(t, x, u) dt + J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) f(x, u, t) dt ),

  and therefore

  -∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) )   (Hamilton-Jacobi-Bellman equation).

• The optimal control at the current x, t is given by

  u(x, t) = argmin_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) ).

• The boundary condition is

  J(x, T) = φ(x).

The HJB Equation (2/2)

Optimal control of mass on a spring
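A hedged numerical sketch of this example: assuming the linear mass-on-a-spring dynamics z'' = -z + u with a quadratic control cost and a quadratic end cost on the final position (an illustrative choice, not necessarily the exact cost used in the chapter), the discrete-time linear-quadratic case of the backward recursion reduces to a Riccati recursion.

import numpy as np

# Hedged sketch: mass on a spring, z'' = -z + u, Euler-discretised with step dt.
# The running control cost and the end cost on the final position are
# illustrative placeholder choices.
dt, T = 0.05, 200
A = np.array([[1.0, dt], [-dt, 1.0]])     # state (z, z_dot)
B = np.array([[0.0], [dt]])
Q  = np.zeros((2, 2))                     # no running state cost
Qf = np.array([[1.0, 0.0], [0.0, 0.0]])   # end cost on the position only
r  = np.array([[0.1]])                    # control cost weight

# Backward Riccati recursion: the cost-to-go is J(t, x) = x^T P_t x
P, K = Qf, [None] * T
for t in range(T - 1, -1, -1):
    K[t] = np.linalg.solve(r + B.T @ P @ B, B.T @ P @ A)   # feedback gain
    P = Q + A.T @ P @ (A - B @ K[t])

# Forward pass: apply the optimal feedback u_t = -K_t x_t
x = np.array([[1.0], [0.0]])              # initial displacement, zero velocity
for t in range(T):
    x = A @ x + B @ (-K[t] @ x)
print("final position:", float(x[0, 0]))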

Stochastic Differential Equations (1/2)

• Consider the random walk on the line

  x_{t+1} = x_t + ξ_t,   ⟨ξ_t⟩ = 0,   ⟨ξ_t²⟩ = ν,

  with x_0 = 0.

• In closed form, x_t = Σ_{i=1}^{t} ξ_i, so that ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt.

• In the continuous-time limit we define

  dx = x_{t+dt} - x_t = dξ   (Wiener process).

• The conditional probability distribution is

  ρ(x, t | x_0, 0) = 1/√(2πνt) exp( -(x - x_0)² / (2νt) ).
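A small simulation (illustrative, not from the chapter) confirms the stated statistics ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt, and the Gaussian form of ρ(x, t | x_0, 0) at a fixed time.

import numpy as np

# Illustrative check of the random-walk statistics <x_t> = 0 and <x_t^2> = nu * t.
rng = np.random.default_rng(0)
nu, dt, T, n_paths = 0.5, 0.01, 1.0, 20000
n_steps = int(T / dt)

# Increments d_xi with <d_xi> = 0 and <d_xi^2> = nu * dt (Gaussian choice)
d_xi = rng.normal(0.0, np.sqrt(nu * dt), size=(n_paths, n_steps))
x = d_xi.cumsum(axis=1)                   # x_t = sum of the increments, x_0 = 0

print("mean at time T:", x[:, -1].mean())                  # close to 0
print("variance at T:", x[:, -1].var(), " expected:", nu * T)
# A histogram of x at time T approaches the Gaussian
# rho(x, T | 0, 0) = exp(-x^2 / (2 nu T)) / sqrt(2 pi nu T).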

Stochastic Optimal Control Theory (2/2)

• The dynamics is now a stochastic differential equation

  dx = f(x(t), u(t), t) dt + dξ,

  where dξ is a Wiener process with ⟨dξ_i dξ_j⟩ = ν_{ij}(t, x, u) dt.

• Since ⟨dx²⟩ is of order dt, we must make a Taylor expansion of the cost-to-go up to order dx² (spelled out below).

• This yields the stochastic Hamilton-Jacobi-Bellman equation

  -∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(x, t) + (1/2) ν(t, x, u) ∂_x² J(x, t) ),

  with ⟨dx⟩ = f(x, u, t) dt (drift) and ⟨dx²⟩ = ν(t, x, u) dt (diffusion).
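The Taylor expansion step can be spelled out as follows (scalar case):

  ⟨J(t+dt, x+dx)⟩ = J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) ⟨dx⟩ + (1/2) ∂_x² J(t, x) ⟨dx²⟩ + ...

                  = J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) f(x, u, t) dt + (1/2) ν(t, x, u) ∂_x² J(t, x) dt.

Substituting this into J(t, x) = min_u ( R(t, x, u) dt + ⟨J(t+dt, x+dx)⟩ ) and dividing by dt gives the stochastic HJB equation above.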

Path Integral Control (1/2)

• In the problem where the control enters linearly and the control cost is quadratic, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go,

  J(x, t) = -λ log ψ(x, t).

• The HJB equation then becomes

  ∂_t ψ(x, t) = ( V(x, t)/λ - f^T ∂_x - (1/2) Tr( g ν g^T ∂_x² ) ) ψ(x, t),

  as sketched below.
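A sketch of why the log transform linearises the equation, assuming (as in the path integral setting) a cost of the form R(t, x, u) = V(x, t) + (1/2) u^T R u, control-affine dynamics dx = (f(x, t) + u) dt + dξ with ⟨dξ dξ^T⟩ = ν dt, and the relation λ R^{-1} = ν (the case g = 1 is shown for brevity):

  -∂_t J = min_u ( V + (1/2) u^T R u + (f + u)^T ∂_x J + (1/2) Tr( ν ∂_x² J ) ).

The minimising control is u = -R^{-1} ∂_x J, so

  -∂_t J = V + f^T ∂_x J - (1/2) (∂_x J)^T R^{-1} (∂_x J) + (1/2) Tr( ν ∂_x² J ).

Substituting J = -λ log ψ, i.e. ∂_x J = -λ ∂_x ψ / ψ and ∂_x² J = -λ ∂_x² ψ / ψ + λ (∂_x ψ)(∂_x ψ)^T / ψ², the quadratic term cancels against the corresponding part of the diffusion term exactly when λ R^{-1} = ν, leaving the linear equation

  ∂_t ψ = ( V/λ - f^T ∂_x - (1/2) Tr( ν ∂_x² ) ) ψ.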

Path Integral Control (2/2)

• Let ρ(y, τ | x, t) describe a diffusion process for τ > t defined by the Fokker-Planck equation

  ∂_τ ρ = ( -V/λ - ∂_y^T f + (1/2) Tr( ∂_y² g ν g^T ) ) ρ.   (1)

• The solution of the linear equation for ψ is then given by

  ψ(x, t) = ∫ dy ρ(y, T | x, t) exp( -φ(y)/λ ).

The Diffusion Process as a Path Integral (1/2)

• Let's look at the first term in equation (1) on the previous slide. The first term describes a process that kills a sample trajectory with a rate of V(x, t)dt/λ.

• Sampling process and Monte Carlo estimate: propagate each trajectory according to

  dx = f(x, t) dt + g(x, t) dξ,

  x → x + dx   with probability 1 - V(x, t)dt/λ,

  x_i → †      with probability V(x, t)dt/λ; in this case the path is killed.

• Averaging over the surviving trajectories gives the estimate (sketched in code below)

  ψ(x, t) = ∫ dy ρ(y, T | x, t) exp( -φ(y)/λ ) ≈ (1/N) Σ_{i ∈ alive} exp( -φ(x_i(T))/λ ).
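The killing-and-averaging scheme above can be written in a few lines. The snippet below is a minimal illustration: scalar state, Euler discretisation, and placeholder choices for the drift f, the gain g, the potential V and the end cost φ (none of these are the chapter's).

import numpy as np

# Hedged sketch of the Monte Carlo estimate of psi(x, t): propagate N trajectories
# under the uncontrolled dynamics, kill each one with probability V(x, t) dt / lam
# per step, and average exp(-phi(x_T) / lam) over the survivors (divided by N).
rng = np.random.default_rng(1)
lam, nu, dt, T, N = 1.0, 1.0, 0.01, 1.0, 5000
n_steps = int(T / dt)

f   = lambda x, t: -x                 # drift (placeholder)
g   = lambda x, t: 1.0                # noise gain (placeholder)
V   = lambda x, t: 0.5 * x**2         # state-dependent path cost (placeholder)
phi = lambda x: (x - 1.0)**2          # end cost (placeholder)

x = np.zeros(N)                       # all trajectories start at x = 0 at time t = 0
alive = np.ones(N, dtype=bool)

for step in range(n_steps):
    t = step * dt
    alive &= rng.random(N) >= V(x, t) * dt / lam          # kill with rate V dt / lam
    d_xi = rng.normal(0.0, np.sqrt(nu * dt), size=N)      # <d_xi^2> = nu dt
    x = x + f(x, t) * dt + g(x, t) * d_xi                 # Euler-Maruyama step

psi = np.exp(-phi(x[alive]) / lam).sum() / N              # (1/N) sum over alive paths
print("psi(0, 0) estimate:", psi, "  J = -lam log psi:", -lam * np.log(psi))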

The Diffusion Process as a Path Integral (2/2)

• The diffusion process defines a distribution over paths,

  p( x(t→T) | x, t ) = (1/ψ(x, t)) exp( -S(x(t→T))/λ ),

  where ψ is a partition function, J is a free energy, S is the energy of a path, and λ is the temperature.

Discussion

• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other, to maximise a common reward function.

• The path integral method has great potential for application in robotics.
