
Ch 17. Optimal control theory and the linear Bellman equation

HJ Kappen

BTSM Seminar, 12.07.19 (Thu)

Summarized by Joon Shik Kim

Introduction

• Optimising a sequence of actions to attain some future goal is the general topic of control theory.

• In an example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost that consists of two terms.

• The first is a path cost that specifies the energy consumption to contract the muscles.

• The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it.

• The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

Discrete Time Control (1/3)

• The dynamics of the system is

  x_{t+1} = x_t + f(t, x_t, u_t),   t = 0, 1, ..., T-1,

  where x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t.

• A cost function assigns a cost to each sequence of controls:

  C(x_0, u_{0:T-1}) = φ(x_T) + Σ_{t=0}^{T-1} R(t, x_t, u_t),

  where R(t, x, u) is the cost associated with taking action u at time t in state x, and φ(x_T) is the cost associated with ending up in state x_T at time T.

Discrete Time Control (2/3)

• The problem of optimal control is to find the sequence u_{0:T-1} that minimises C(x_0, u_{0:T-1}).

• The optimal cost-to-go is

  J(t, x_t) = min_{u_{t:T-1}} ( φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s) )

            = min_{u_t} ( R(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) ).

Discrete Time Control (3/3)

• The algorithm to compute the optimal control, trajectory, and cost is given by

• 1. Initialization: J(T, x) = φ(x).

• 2. Backwards: For t = T-1, ..., 0 and for all x compute

  u*_t(x) = argmin_u { R(t, x, u) + J(t+1, x + f(t, x, u)) },

  J(t, x) = R(t, x, u*_t) + J(t+1, x + f(t, x, u*_t)).

• 3. Forwards: For t = 0, ..., T-1 compute

  x*_{t+1} = x*_t + f(t, x*_t, u*_t(x*_t)).
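The backward-forward scheme above can be turned into a short numerical sketch. The Python snippet below is illustrative only: the state grid, the dynamics f, the running cost R and the end cost φ are placeholder choices, not taken from the chapter; it simply implements the three steps literally for a discretised scalar state.

import numpy as np

# Illustrative discretisation (placeholder problem, not from the chapter):
# scalar state on a grid, a small set of admissible controls, quadratic costs.
T = 20
xs = np.linspace(-2.0, 2.0, 81)     # state grid
us = np.linspace(-1.0, 1.0, 21)     # control grid

def f(t, x, u):                     # dynamics increment: x_{t+1} = x_t + f(t, x_t, u_t)
    return 0.1 * (-x + u)

def R(t, x, u):                     # running cost
    return 0.05 * u**2

def phi(x):                         # end cost
    return (x - 1.0)**2

def nearest(x):                     # index of the grid point closest to x
    return int(np.abs(xs - x).argmin())

# 1. Initialization: J(T, x) = phi(x)
J = np.zeros((T + 1, len(xs)))
J[T] = phi(xs)
u_star = np.zeros((T, len(xs)))

# 2. Backwards: J(t, x) = min_u [ R(t, x, u) + J(t+1, x + f(t, x, u)) ]
for t in range(T - 1, -1, -1):
    for i, x in enumerate(xs):
        q = [R(t, x, u) + J[t + 1, nearest(x + f(t, x, u))] for u in us]
        k = int(np.argmin(q))
        J[t, i], u_star[t, i] = q[k], us[k]

# 3. Forwards: roll out the optimal trajectory from x_0
x = -1.5
for t in range(T):
    x = x + f(t, x, u_star[t, nearest(x)])
print("final state:", x, " cost-to-go at start:", J[0, nearest(-1.5)])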

The HJB Equation (1/2)

• Taking the continuous-time limit of the recursion gives

  J(t, x) = min_u ( R(t, x, u) dt + J(t+dt, x + f(x, u, t) dt) )

          = min_u ( R(t, x, u) dt + J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) f(x, u, t) dt ),

  and therefore

  -∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) )   (Hamilton-Jacobi-Bellman equation).

• The optimal control at the current x, t is given by

  u(x, t) = argmin_u ( R(t, x, u) + f(x, u, t) ∂_x J(t, x) ).

• The boundary condition is

  J(x, T) = φ(x).

The HJB Equation (2/2)

Optimal control of mass on a spring
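A hedged numerical sketch of this example: assuming the linear mass-on-a-spring dynamics z'' = -z + u with a quadratic control cost and a quadratic end cost on the final position (an illustrative choice, not necessarily the exact cost used in the chapter), the discrete-time linear-quadratic case of the backward recursion reduces to a Riccati recursion.

import numpy as np

# Hedged sketch: mass on a spring, z'' = -z + u, Euler-discretised with step dt.
# The running control cost and the end cost on the final position are
# illustrative placeholder choices.
dt, T = 0.05, 200
A = np.array([[1.0, dt], [-dt, 1.0]])     # state (z, z_dot)
B = np.array([[0.0], [dt]])
Q  = np.zeros((2, 2))                     # no running state cost
Qf = np.array([[1.0, 0.0], [0.0, 0.0]])   # end cost on the position only
r  = np.array([[0.1]])                    # control cost weight

# Backward Riccati recursion: the cost-to-go is J(t, x) = x^T P_t x
P, K = Qf, [None] * T
for t in range(T - 1, -1, -1):
    K[t] = np.linalg.solve(r + B.T @ P @ B, B.T @ P @ A)   # feedback gain
    P = Q + A.T @ P @ (A - B @ K[t])

# Forward pass: apply the optimal feedback u_t = -K_t x_t
x = np.array([[1.0], [0.0]])              # initial displacement, zero velocity
for t in range(T):
    x = A @ x + B @ (-K[t] @ x)
print("final position:", float(x[0, 0]))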

Stochastic Differential Equations (1/2)

• Consider the random walk on the line

  x_{t+1} = x_t + ξ_t,   ⟨ξ_t⟩ = 0,   ⟨ξ_t²⟩ = ν,

  with x_0 = 0.

• In closed form, x_t = Σ_{i=1}^{t} ξ_i, so that ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt.

• In the continuous-time limit we define

  dx = x_{t+dt} - x_t = dξ   (Wiener process).

• The conditional probability distribution is

  ρ(x, t | x_0, 0) = 1/√(2πνt) exp( -(x - x_0)² / (2νt) ).
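A small simulation (illustrative, not from the chapter) confirms the stated statistics ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt, and the Gaussian form of ρ(x, t | x_0, 0) at a fixed time.

import numpy as np

# Illustrative check of the random-walk statistics <x_t> = 0 and <x_t^2> = nu * t.
rng = np.random.default_rng(0)
nu, dt, T, n_paths = 0.5, 0.01, 1.0, 20000
n_steps = int(T / dt)

# Increments d_xi with <d_xi> = 0 and <d_xi^2> = nu * dt (Gaussian choice)
d_xi = rng.normal(0.0, np.sqrt(nu * dt), size=(n_paths, n_steps))
x = d_xi.cumsum(axis=1)                   # x_t = sum of the increments, x_0 = 0

print("mean at time T:", x[:, -1].mean())                  # close to 0
print("variance at T:", x[:, -1].var(), " expected:", nu * T)
# A histogram of x at time T approaches the Gaussian
# rho(x, T | 0, 0) = exp(-x^2 / (2 nu T)) / sqrt(2 pi nu T).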

Stochastic Optimal Control Theory (2/2)

• The dynamics is now a stochastic differential equation

  dx = f(x(t), u(t), t) dt + dξ,

  where dξ is a Wiener process with ⟨dξ_i dξ_j⟩ = ν_{ij}(t, x, u) dt.

• Since ⟨dx²⟩ is of order dt, we must make a Taylor expansion of the cost-to-go up to order dx² (spelled out below).

• This yields the stochastic Hamilton-Jacobi-Bellman equation

  -∂_t J(t, x) = min_u ( R(t, x, u) + f(x, u, t) ∂_x J(x, t) + (1/2) ν(t, x, u) ∂_x² J(x, t) ),

  with ⟨dx⟩ = f(x, u, t) dt (drift) and ⟨dx²⟩ = ν(t, x, u) dt (diffusion).
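The Taylor expansion step can be spelled out as follows (scalar case):

  ⟨J(t+dt, x+dx)⟩ = J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) ⟨dx⟩ + (1/2) ∂_x² J(t, x) ⟨dx²⟩ + ...

                  = J(t, x) + ∂_t J(t, x) dt + ∂_x J(t, x) f(x, u, t) dt + (1/2) ν(t, x, u) ∂_x² J(t, x) dt.

Substituting this into J(t, x) = min_u ( R(t, x, u) dt + ⟨J(t+dt, x+dx)⟩ ) and dividing by dt gives the stochastic HJB equation above.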

Path Integral Control (1/2)

• In the problem where the control enters linearly and the control cost is quadratic, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go,

  J(x, t) = -λ log ψ(x, t).

• The HJB equation then becomes

  ∂_t ψ(x, t) = ( V(x, t)/λ - f^T ∂_x - (1/2) Tr( g ν g^T ∂_x² ) ) ψ(x, t),

  as sketched below.
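A sketch of why the log transform linearises the equation, assuming (as in the path integral setting) a cost of the form R(t, x, u) = V(x, t) + (1/2) u^T R u, control-affine dynamics dx = (f(x, t) + u) dt + dξ with ⟨dξ dξ^T⟩ = ν dt, and the relation λ R^{-1} = ν (the case g = 1 is shown for brevity):

  -∂_t J = min_u ( V + (1/2) u^T R u + (f + u)^T ∂_x J + (1/2) Tr( ν ∂_x² J ) ).

The minimising control is u = -R^{-1} ∂_x J, so

  -∂_t J = V + f^T ∂_x J - (1/2) (∂_x J)^T R^{-1} (∂_x J) + (1/2) Tr( ν ∂_x² J ).

Substituting J = -λ log ψ, i.e. ∂_x J = -λ ∂_x ψ / ψ and ∂_x² J = -λ ∂_x² ψ / ψ + λ (∂_x ψ)(∂_x ψ)^T / ψ², the quadratic term cancels against the corresponding part of the diffusion term exactly when λ R^{-1} = ν, leaving the linear equation

  ∂_t ψ = ( V/λ - f^T ∂_x - (1/2) Tr( ν ∂_x² ) ) ψ.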

Path Integral Control (2/2)

• Let ρ(y, τ | x, t) describe a diffusion process for τ > t defined by the Fokker-Planck equation

  ∂_τ ρ = ( -V/λ - ∂_y^T f + (1/2) Tr( ∂_y² g ν g^T ) ) ρ.   (1)

• The solution of the linear equation for ψ is then given by

  ψ(x, t) = ∫ dy ρ(y, T | x, t) exp( -φ(y)/λ ).

The Diffusion Process as a Path Integral (1/2)

• Let's look at the first term in equation (1) on the previous slide. The first term describes a process that kills a sample trajectory with a rate of V(x, t)dt/λ.

• Sampling process and Monte Carlo estimate: propagate each trajectory according to

  dx = f(x, t) dt + g(x, t) dξ,

  x → x + dx   with probability 1 - V(x, t)dt/λ,

  x_i → †      with probability V(x, t)dt/λ; in this case the path is killed.

• Averaging over the surviving trajectories gives the estimate (sketched in code below)

  ψ(x, t) = ∫ dy ρ(y, T | x, t) exp( -φ(y)/λ ) ≈ (1/N) Σ_{i ∈ alive} exp( -φ(x_i(T))/λ ).
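The killing-and-averaging scheme above can be written in a few lines. The snippet below is a minimal illustration: scalar state, Euler discretisation, and placeholder choices for the drift f, the gain g, the potential V and the end cost φ (none of these are the chapter's).

import numpy as np

# Hedged sketch of the Monte Carlo estimate of psi(x, t): propagate N trajectories
# under the uncontrolled dynamics, kill each one with probability V(x, t) dt / lam
# per step, and average exp(-phi(x_T) / lam) over the survivors (divided by N).
rng = np.random.default_rng(1)
lam, nu, dt, T, N = 1.0, 1.0, 0.01, 1.0, 5000
n_steps = int(T / dt)

f   = lambda x, t: -x                 # drift (placeholder)
g   = lambda x, t: 1.0                # noise gain (placeholder)
V   = lambda x, t: 0.5 * x**2         # state-dependent path cost (placeholder)
phi = lambda x: (x - 1.0)**2          # end cost (placeholder)

x = np.zeros(N)                       # all trajectories start at x = 0 at time t = 0
alive = np.ones(N, dtype=bool)

for step in range(n_steps):
    t = step * dt
    alive &= rng.random(N) >= V(x, t) * dt / lam          # kill with rate V dt / lam
    d_xi = rng.normal(0.0, np.sqrt(nu * dt), size=N)      # <d_xi^2> = nu dt
    x = x + f(x, t) * dt + g(x, t) * d_xi                 # Euler-Maruyama step

psi = np.exp(-phi(x[alive]) / lam).sum() / N              # (1/N) sum over alive paths
print("psi(0, 0) estimate:", psi, "  J = -lam log psi:", -lam * np.log(psi))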

The Diffusion Process as a Path Integral (2/2)

• The diffusion process defines a distribution over paths,

  p( x(t→T) | x, t ) = (1/ψ(x, t)) exp( -S(x(t→T))/λ ),

  where ψ is a partition function, J is a free energy, S is the energy of a path, and λ is the temperature.

Discussion

• One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time, but also among each other, to maximise a common reward function.

• The path integral method has great potential for application in robotics.
