
Stochastic Extended LQR for Optimization-based Motion Planning Under Uncertainty

Wen Sun, Jur van den Berg, and Ron Alterovitz

Abstract—We introduce a novel optimization-based motion planner, Stochastic Extended LQR (SELQR), which computes a trajectory and associated linear control policy with the objective of minimizing the expected value of a user-defined cost function. SELQR applies to robotic systems that have stochastic non-linear dynamics with motion uncertainty modeled by Gaussian distributions that can be state- and control-dependent. In each iteration, SELQR uses a combination of forward and backward value iteration to estimate the cost-to-come and the cost-to-go for each state along a trajectory. SELQR then locally optimizes each state along the trajectory at each iteration to minimize the expected total cost, which results in smoothed states that are used for dynamics linearization and cost function quadratization. SELQR progressively improves the approximation of the expected total cost, resulting in higher quality plans. For applications with imperfect sensing, we extend SELQR to plan in the robot's belief space. We show that our iterative approach achieves fast and reliable convergence to high-quality plans in multiple simulated scenarios involving a car-like robot, a quadrotor, and a medical steerable needle performing a liver biopsy procedure.

Note to Practitioners—The motion of a robot cannot be predicted with certainty in a variety of robotics applications, including aerial robots moving in turbulent conditions, mobile robots maneuvering on unfamiliar terrain, and robotic steerable needles being guided to clinical targets in soft tissue. Explicitly considering uncertainty during motion planning before task execution can improve the quality of computed plans. We introduce a novel optimization-based motion planner, Stochastic Extended LQR (SELQR), which computes a trajectory and associated linear control policy with the objective of minimizing the expected value of a user-defined cost function. For applications that include both motion uncertainty and imperfect sensing, we extend SELQR to plan in the space of the robot's beliefs, i.e., distributions over the set of possible states. We demonstrate the speed and effectiveness of SELQR in simulation for a variety of robotics applications.

Index Terms—motion planning under uncertainty, nonholonomic motion planning

I. INTRODUCTION

When a robot performs a task, the robot's motion may be affected by uncertainty from a variety of sources, including unpredictable external forces or actuation errors. Uncertainty arises in a variety of robotics applications, including aerial

This research was supported in part by the National Science Foundation under awards IIS-1117127 and IIS-1149965 and by the National Institutes of Health under award R21EB017952.

Wen Sun is with the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA.

Jur van den Berg is with the School of Computing, University of Utah, Salt Lake City, UT 84112 USA.

Ron Alterovitz is with the Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail: [email protected]).

(a) SELQR trajectory inserted from side

(b) SELQR trajectory inserted from front

Fig. 1. We show plans computed by SELQR for needle steering for a liver biopsy with motion uncertainty. The objective is to access the tumor (yellow) while avoiding the hepatic arteries (red), hepatic veins (blue), portal veins (pink), and bile ducts (green). The smooth trajectories explicitly consider uncertainty and minimize the a priori expected value of a cost function that considers obstacle avoidance and path length.

robots moving in turbulent conditions, mobile robots maneuvering on unfamiliar terrain, and robotic steerable needles being guided to clinical targets in soft tissue. A deliberative approach that accounts for uncertainty during motion planning before task execution can improve the quality of computed plans, increasing the chances that the robot will complete the desired motion in a manner that avoids obstacles and minimizes costs.

We introduce an optimization-based motion planner that explicitly considers the impacts of motion uncertainty. Recent years have seen the introduction of multiple successful optimization-based motion planners, although most have focused on robots with deterministic dynamics (e.g., [29], [18], [7]). Compared to commonly used sampling-based motion planners [9], optimization-based motion planners produce


plans that are smoother (without requiring a separate smoothing algorithm) and that are computed faster, albeit sometimes with a loss of completeness and global optimality. Prior optimization-based motion planners that consider deterministic dynamics can only minimize deterministic cost functions (e.g., minimizing path length while avoiding obstacles). In this paper we focus on robots with stochastic dynamics, and consequently minimize the a priori expected value of a cost function when a plan and corresponding controller are executed. The user-defined cost function can be based on path length, control effort, and obstacle collision avoidance.

We first introduce the Stochastic Extended LQR (SELQR) motion planner, a novel optimization-based motion planner with fast and reliable convergence that explicitly considers robot motion uncertainty. The method is designed for motion planning problems involving robotic systems with non-linear (but linearizable) dynamics, any cost function with positive (semi)definite Hessians, and motion uncertainty that can be reasonably modeled using Gaussian distributions that can be state- and control-dependent. Our approach builds on the linear quadratic regulator (LQR), a commonly used linear controller that does not explicitly consider obstacle avoidance. As an optimization-based approach, SELQR starts motion planning from a start state and returns a high-quality trajectory and an associated linear control policy that consider uncertainty and are optimized with respect to the given cost function.

To achieve fast performance, our approach in each iteration uses both the stochastic forward and inverse dynamics in a manner inspired by an iterated Kalman smoother [2]. In each iteration's backward pass, SELQR uses the stochastic dynamics to compute a control policy and estimate the cost-to-go of each state, which is the minimum expected future cost assuming the robot starts from each state. In each iteration's forward pass, SELQR estimates the cost-to-come to each state, which is the minimum cost to reach each state from the initial state. SELQR then approximates the expected total cost at each state by summing the cost-to-come and the cost-to-go. SELQR progressively improves the approximation of the cost-to-come and cost-to-go and hence improves its estimate of the expected total cost. A key insight in SELQR is that we locally optimize each state along a trajectory at each iteration to minimize the expected total cost, which results in smoothed states that are cost-informative and used for dynamics linearization and cost function quadratization. These smoothed states enable the fast and reliable convergence of SELQR.

We next extend SELQR to consider uncertainty in both motion and sensing. Although the robot in such cases often cannot directly observe its current state, it can estimate a distribution over the set of possible states (i.e., its belief state) based on noisy and partial sensor measurements. We introduce B-SELQR, a variant of SELQR that plans in belief space rather than state space for robots with both motion and sensing uncertainty, where belief states are modeled with Gaussian distributions. For such robots, the motion planning problem can be modeled as a Partially Observable Markov Decision Process (POMDP). Exact globally optimal solutions to POMDPs are prohibitive for most applications since the belief space (over which a control policy is to be computed) is, in the most general formulation, the infinite-dimensional space of all possible probability distributions over the finite-dimensional state space. B-SELQR quickly computes a trajectory and locally-valid controller from scratch in belief space.

In this work, we provide a refined archival version of the results presented at a conference [19] and incorporate several important extensions. We added detail to the method derivation and expanded the experiments to include a non-Gaussian noise scenario, different Gaussian noise levels for B-SELQR, and a new beacons-based scenario for B-SELQR. We demonstrate the speed and effectiveness of SELQR in simulation for a car-like robot, a quadrotor, and a medical steerable needle (see Fig. 1). We also demonstrate B-SELQR for scenarios with imperfect sensing.

II. RELATED WORK

Optimization-based motion planners have been studied for a variety of robotics applications and typically consider robot dynamics, trajectory smoothness, and obstacle avoidance. Optimization-based approaches have been developed that plan from scratch as well as approaches that locally optimize a feasible plan created by another motion planner (such as a sampling-based motion planner), e.g., [29], [7], [18], [3], [5], [11]. These methods work well for robots with deterministic dynamics, whereas SELQR is intended for robots with stochastic dynamics.

Our approach builds on Extended LQR [23], [24], which extends the standard LQR to handle non-linear dynamics and non-quadratic cost functions. Extended LQR assumes deterministic dynamics, implicitly relying on the fact that the optimal LQR solution is independent of the variance of the motion uncertainty. In contrast to Extended LQR, SELQR explicitly considers stochastic dynamics and incorporates the stochastic dynamics into backward value iteration when computing a control policy, enabling computation of higher quality plans. Approximate Inference Control [22] formulates the optimal control problem using Kullback-Leibler divergence minimization but focuses on cost functions that are quadratic in the control input. Our approach also builds on Iterative Linear Quadratic Gaussian (iLQG) [21], which uses a quadratic approximation to handle state- and control-dependent motion uncertainty but, in its original form, did not implement obstacle avoidance. To ensure that the dynamics linearization and cost function quadratization are locally valid, iLQG requires special measures such as a line search. Our method does not require a line search, enabling faster performance.

For problems with partial or noisy sensing, the planning and control problem can be modeled as a POMDP [6]. Solving a POMDP to global optimality has been shown to be PSPACE-complete. Point-based algorithms (e.g., [14], [8], [1]) have been developed for problems with discrete state, action, or observation spaces. Another class of methods [4], [16], [25], [13] utilizes sampling-based planners to compute candidate trajectories or roadmaps in the state space in which paths can be evaluated based on metrics that consider stochastic dynamics. These approaches can be highly effective, although the methods [16], [4] require the existence of a two-point boundary value problem solver that can connect any two sampled configurations (which for nonholonomic robots may be computationally expensive), while the methods [25], [13] focused on cost functions corresponding to estimates of collision. Optimization-based approaches have been developed for planning in belief space [26], [15], [10], [12] by approximating beliefs as Gaussian distributions and computing a value function valid only in local regions of the belief space. Platt et al. [15] achieve fast performance by defining deterministic belief system dynamics based on the maximum likelihood observation assumption. Methods [10], [12] based on TrajOpt [17] effectively generate high quality plans, but they do not compute closed-loop policies, meaning they may require replanning at each time step during execution. Van den Berg et al. [26] require a feasible plan for initialization and then use iLQG to optimize the plan in belief space. We will show that B-SELQR, which considers stochastic dynamics, converges faster and more reliably than using iLQG in belief space, generates a local policy, and can plan from scratch.

III. PROBLEM DEFINITION

Let $\mathbb{X} \subset \mathbb{R}^n$ be the $n$-dimensional state space of the robot and let $\mathbb{U} \subset \mathbb{R}^m$ be the $m$-dimensional control input space of the robot. We consider robotic systems with differentiable stochastic dynamics and state- and control-dependent uncertainty modeled using Gaussian distributions. Let $\tau \in \mathbb{R}^+$ denote time, and let us be given a continuous-time stochastic dynamics:

$$\mathrm{d}\mathbf{x}(\tau) = \mathbf{f}(\mathbf{x}(\tau),\mathbf{u}(\tau),\tau)\,\mathrm{d}\tau + N(\mathbf{x}(\tau),\mathbf{u}(\tau),\tau)\,\mathrm{d}\mathbf{w}(\tau), \quad (1)$$

with $\mathbf{f} : \mathbb{X}\times\mathbb{U}\times\mathbb{R}^+ \to \mathbb{X}$ and $N : \mathbb{X}\times\mathbb{U}\times\mathbb{R}^+ \to \mathbb{R}^{n\times n}$, where $\mathbf{x}(\tau)\in\mathbb{X}$, $\mathbf{u}(\tau)\in\mathbb{U}$, and $\mathbf{w}(\tau)$ is a Wiener process with $\mathrm{d}\mathbf{w}(\tau) \sim \mathcal{N}(\mathbf{0},\mathrm{d}\tau\, I)$. We have $(\mathbf{w}(\tau)-\mathbf{w}(0)) \sim \mathcal{N}(\mathbf{0},\tau I)$.

We assume time is discretized into intervals of duration $\Delta$, and time step $t \in \mathbb{N}$ starts at time $\tau = t\Delta$. As we will see in Sec. IV-E, by integrating the continuous-time dynamics both backward and forward in time, we can construct the stochastic discrete dynamics and the deterministic inverse discrete dynamics:

$$\mathbf{x}_{t+1} = \mathbf{g}_t(\mathbf{x}_t,\mathbf{u}_t) + M_t(\mathbf{x}_t,\mathbf{u}_t)\boldsymbol{\xi}_t, \quad (2)$$
$$\mathbf{x}_t = \bar{\mathbf{g}}_t(\mathbf{x}_{t+1},\mathbf{u}_t), \quad (3)$$

where $\boldsymbol{\xi}_t \sim \mathcal{N}(\mathbf{0},I)$, with $\mathbf{g}_t, \bar{\mathbf{g}}_t : \mathbb{X}\times\mathbb{U} \to \mathbb{X}$ and $M_t : \mathbb{X}\times\mathbb{U} \to \mathbb{R}^{n\times n}$ as derived in Sec. IV-E. Note that $\bar{\mathbf{g}}_t$ inverts the deterministic part of the dynamics: for any $\mathbf{x}'$, $\mathbf{x}$, and $\mathbf{u}$ such that $\mathbf{x}' = \mathbf{g}_t(\mathbf{x},\mathbf{u})$, we have $\mathbf{g}_t(\bar{\mathbf{g}}_t(\mathbf{x}',\mathbf{u}),\mathbf{u}) = \mathbf{x}'$ and $\bar{\mathbf{g}}_t(\mathbf{g}_t(\mathbf{x},\mathbf{u}),\mathbf{u}) = \mathbf{x}$.

Let the control objective be defined by a cost function that can incorporate metrics such as path length, control effort, and obstacle avoidance:

$$\mathbb{E}_{\mathbf{x}}\Big[c_l(\mathbf{x}_l) + \sum_{t=0}^{l-1} c_t(\mathbf{x}_t,\mathbf{u}_t)\Big], \quad (4)$$

where $l \in \mathbb{N}^+$ is the given time horizon and $c_l : \mathbb{X}\to\mathbb{R}$ and $c_t : \mathbb{X}\times\mathbb{U}\to\mathbb{R}$ are user-defined local cost functions. The expectation is taken because the dynamics are stochastic. We assume the local cost functions are twice differentiable and have positive (semi)definite Hessians: $\frac{\partial^2 c_l}{\partial\mathbf{x}\,\partial\mathbf{x}} \geq 0$, $\frac{\partial^2 c_t}{\partial\mathbf{u}\,\partial\mathbf{u}} > 0$, $\frac{\partial^2 c_t}{\partial[\mathbf{x}\;\mathbf{u}]\,\partial[\mathbf{x}\;\mathbf{u}]} \geq 0$. The objective is to compute a control policy $\pi$ (defined by $\pi_t : \mathbb{X}\to\mathbb{U}$ for all $t \in [0,l)$) such that selecting the controls $\mathbf{u}_t = \pi_t(\mathbf{x}_t)$ minimizes Eq. (4) subject to the stochastic discrete-time dynamics. This problem is addressed in Sec. IV.

For robotic systems with imperfect (e.g., partial and noisy) sensing, it is often beneficial during planning to explicitly consider the sensing uncertainty. We assume sensors provide data according to a stochastic observation model:

$$\mathbf{z}_t = \mathbf{h}(\mathbf{x}_t) + \mathbf{n}_t, \quad \mathbf{n}_t \sim \mathcal{N}(\mathbf{0}, V(\mathbf{x}_t)), \quad (5)$$

where $\mathbf{z}_t$ is the sensor measurement at step $t$ and the noise is state-dependent and drawn from a given Gaussian distribution. We formulate this motion planning problem as a POMDP by defining the belief state $\mathbf{b}_t \in \mathbb{B}$, which is the distribution of the state $\mathbf{x}_t$ given all past controls and sensor measurements. We approximate belief states using Gaussian distributions. In belief space we define the cost function as

$$\mathbb{E}_{\mathbf{z}}\Big[c_l(\mathbf{b}_l) + \sum_{t=0}^{l-1} c_t(\mathbf{b}_t,\mathbf{u}_t)\Big], \quad (6)$$

where the local cost functions are defined analogously to Eq. (4). The objective for this problem is to compute a control policy $\pi$ (defined by $\pi_t : \mathbb{B}\to\mathbb{U}$ for all $t \in [0,l)$) in order to minimize Eq. (6) subject to the stochastic discrete-time dynamics. This problem is addressed in Sec. V.

IV. STOCHASTIC EXTENDED LQR

SELQR explicitly considers a system's stochastic nature in the planning phase and computes a nominal trajectory and an associated linear control policy that consider the impact of uncertainty. With the control policy from SELQR, the robot then executes the plan in a closed-loop fashion with sensor feedback. As in related methods such as iLQG [21], SELQR approximates the value functions quadratically by linearizing the dynamics and quadratizing the cost functions. But, as we will show, SELQR uses a novel approach to compute promising candidate trajectories around which to linearize the dynamics and quadratize the cost functions, enabling faster performance.

A. Method Overview

To consider non-linear dynamics and any cost function with positive (semi)definite Hessians, SELQR uses an iterative approach that linearizes the (stochastic) dynamics and locally quadratizes the cost functions in each iteration. As shown in Algorithm 1 and described below, each iteration includes both a forward pass and a backward pass, where each pass performs value iteration.

As in LQR, SELQR uses backward value iteration to compute a control policy $\pi$ and, for all $t$, the cost-to-go $v_t(\mathbf{x})$, which is the minimum expected future cost that will be accrued between time step $t$ (including the cost at time step $t$) and time step $l$ if the robot starts at $\mathbf{x}$ at time step $t$. The backward value iteration, as described in Sec. IV-B, considers stochastic dynamics.


Algorithm 1: SELQR
Input: stochastic continuous-time dynamics (Eq. (1)); local cost functions $c_t$ for $0 \leq t \leq l$; $\Delta$: time step duration; $l$: number of time steps
Data: $\hat{\mathbf{x}}$: smoothed states; $\pi$: control policy; $\bar{\pi}$: inverse control policy; $v_t$: cost-to-go function; $\bar{v}_t$: cost-to-come function

1:  $\pi_t = 0$, $S_t = 0$, $\mathbf{s}_t = \mathbf{0}$, $s_t = 0$
2:  repeat
3:      $\bar{S}_0 := 0$, $\bar{\mathbf{s}}_0 := \mathbf{0}$, $\bar{s}_0 := 0$
4:      for $t := 0$; $t < l$; $t := t+1$ do
5:          $\hat{\mathbf{x}}_t = -(S_t + \bar{S}_t)^{-1}(\mathbf{s}_t + \bar{\mathbf{s}}_t)$ (smoothed state)
6:          $\hat{\mathbf{u}}_t = \pi_t(\hat{\mathbf{x}}_t)$, $\hat{\mathbf{x}}_{t+1} = \mathbf{g}_t(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$
7:          Linearize the inverse discrete dynamics around $(\hat{\mathbf{x}}_{t+1}, \hat{\mathbf{u}}_t)$ (Eq. (16))
8:          Quadratize $c_t$ around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (12))
9:          Compute $\bar{S}_{t+1}$, $\bar{\mathbf{s}}_{t+1}$, $\bar{s}_{t+1}$, $\bar{v}_{t+1}$, $\bar{\pi}_t$ (forward value iteration, Sec. IV-C)
10:     end
11:     Quadratize $c_l$ around $\hat{\mathbf{x}}_l$ in the form of Eq. (12) to compute $Q_l$, $\mathbf{q}_l$, and $q_l$
12:     $S_l := Q_l$, $\mathbf{s}_l := \mathbf{q}_l$, and $s_l := q_l$
13:     for $t := l-1$; $t \geq 0$; $t := t-1$ do
14:         $\hat{\mathbf{x}}_{t+1} = -(S_{t+1} + \bar{S}_{t+1})^{-1}(\mathbf{s}_{t+1} + \bar{\mathbf{s}}_{t+1})$ (smoothed state)
15:         $\hat{\mathbf{u}}_t = \bar{\pi}_t(\hat{\mathbf{x}}_{t+1})$, $\hat{\mathbf{x}}_t = \bar{\mathbf{g}}_t(\hat{\mathbf{x}}_{t+1}, \hat{\mathbf{u}}_t)$
16:         Linearize the stochastic discrete dynamics around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (11))
17:         Quadratize $c_t$ around $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$ (Eq. (12))
18:         Compute $S_t$, $\mathbf{s}_t$, $s_t$, $v_t$, $\pi_t$ (backward value iteration, Sec. IV-B)
19:     end
20: until converged (e.g., $v_0$ stops changing significantly)
21: return $\pi_t$ for $0 \leq t \leq l$

SELQR also uses forward value iteration to compute the cost-to-come $\bar{v}_t(\mathbf{x})$, which is the minimum past cost accrued from time step 0 to step $t$ (excluding the cost at time step $t$) assuming the robot's dynamics is deterministic, as described in Sec. IV-C. The sum of $v_t(\mathbf{x})$ and $\bar{v}_t(\mathbf{x})$ provides an estimate of $\hat{v}_t(\mathbf{x})$, the minimum expected total cost for the entire task execution given that the robot passes through state $\mathbf{x}$ at step $t$. Selecting $\mathbf{x}$ to minimize $\hat{v}_t$ yields a sequence of smoothed states

$$\hat{\mathbf{x}}_t = \operatorname*{argmin}_{\mathbf{x}} \hat{v}_t(\mathbf{x}) = \operatorname*{argmin}_{\mathbf{x}} (v_t(\mathbf{x}) + \bar{v}_t(\mathbf{x})), \quad 0 \leq t \leq l. \quad (7)$$

At each iteration, SELQR linearizes the (stochastic) dynamics and quadratizes the cost function around the smoothed states. With each iteration, SELQR progressively improves the estimate of the cost-to-come and cost-to-go at each state along a plan, and hence improves its estimate of the minimum expected total cost. With this improved estimate comes a better control policy. The algorithm terminates when the estimated total cost converges. The output of the motion planner is the control policy $\pi_t$ for all $t$, where each $\pi_t$ is computed during the backward value iteration, which considers the stochastic dynamics. During execution, a robot at state $\mathbf{x}$ executes control $\mathbf{u}_t = \pi_t(\mathbf{x})$ at time step $t$.

SELQR accounts for non-linear dynamics and non-quadratic cost functions in a manner inspired in part by the iterated Kalman smoother [2], which iteratively performs a forward pass (filtering) and a backward pass (smoothing) and at each iteration linearizes the non-linear system around the states from the smoothing pass. Likewise, SELQR consists of a backward pass (a backward value iteration) and a forward pass (a forward value iteration). The combination of these two passes at each iteration enables us to compute smoothed states around which we linearize the (stochastic) dynamics and quadratize the cost functions.
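To make the smoothed-state computation concrete, the following minimal sketch (Python/NumPy; the coefficient values are illustrative placeholders, not from the paper's C++ implementation) evaluates Eq. (7) for one time step, as in lines 5 and 14 of Algorithm 1.

```python
import numpy as np

# Quadratic cost-to-go v_t(x) = 0.5 x^T S x + x^T s + const and quadratic
# cost-to-come vbar_t(x) = 0.5 x^T S_bar x + x^T s_bar + const. Their sum
# is minimized at the smoothed state x_hat = -(S + S_bar)^{-1} (s + s_bar).
S = np.array([[2.0, 0.0],
              [0.0, 4.0]])       # illustrative cost-to-go Hessian
s = np.array([-2.0, 0.0])
S_bar = np.array([[1.0, 0.5],
                  [0.5, 3.0]])   # illustrative cost-to-come Hessian
s_bar = np.array([0.0, -6.0])

x_hat = -np.linalg.solve(S + S_bar, s + s_bar)
print(x_hat)  # state with the lowest estimated expected total cost at this step
```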

B. Backward Pass

We assume the cost-to-come functions $\bar{v}_t(\mathbf{x})$, the inverse control policy $\bar{\pi}_t$, and the smoothed state $\hat{\mathbf{x}}_l$ are available from the previous forward pass. The backward pass computes the cost-to-go functions $v_t(\mathbf{x})$ and control policy $\pi_t$ using backward value iteration [20] in a backward recursive manner:

$$v_l(\mathbf{x}) = c_l(\mathbf{x}), \quad (8)$$
$$v_t(\mathbf{x}) = \min_{\mathbf{u}}\big(c_t(\mathbf{x},\mathbf{u}) + \mathbb{E}_{\boldsymbol{\xi}_t}[v_{t+1}(\mathbf{g}_t(\mathbf{x},\mathbf{u}) + M_t(\mathbf{x},\mathbf{u})\boldsymbol{\xi}_t)]\big),$$
$$\pi_t(\mathbf{x}) = \operatorname*{argmin}_{\mathbf{u}}\big(c_t(\mathbf{x},\mathbf{u}) + \mathbb{E}_{\boldsymbol{\xi}_t}[v_{t+1}(\mathbf{g}_t(\mathbf{x},\mathbf{u}) + M_t(\mathbf{x},\mathbf{u})\boldsymbol{\xi}_t)]\big).$$

To make the backward value iteration tractable, SELQR linearizes the stochastic dynamics and quadratizes the local cost functions to maintain a quadratic form of the cost-to-go function $v_t(\mathbf{x})$:

$$v_t(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T S_t \mathbf{x} + \mathbf{x}^T\mathbf{s}_t + s_t.$$

The backward pass starts from step $l$ by quadratizing $c_l(\mathbf{x})$ around $\hat{\mathbf{x}}_l$ (line 11) as

$$c_l(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T Q_l \mathbf{x} + \mathbf{x}^T\mathbf{q}_l + q_l, \quad (9)$$

and constructing the quadratic $v_l(\mathbf{x})$ by setting $S_l = Q_l$, $\mathbf{s}_l = \mathbf{q}_l$, and $s_l = q_l$. Starting from $t = l-1$, $v_{t+1}(\mathbf{x})$ is available. To proceed to step $t$, SELQR first computes

$$\hat{v}_{t+1}(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T(S_{t+1}+\bar{S}_{t+1})\mathbf{x} + \mathbf{x}^T(\mathbf{s}_{t+1}+\bar{\mathbf{s}}_{t+1}) + (s_{t+1}+\bar{s}_{t+1}). \quad (10)$$

Minimizing the quadratic $\hat{v}_{t+1}(\mathbf{x})$ with respect to $\mathbf{x}$ gives the smoothed state $\hat{\mathbf{x}}_{t+1}$ (line 14). With the inverse control policy $\bar{\pi}_t$ from the last forward pass, SELQR computes $\hat{\mathbf{u}}_t$ and $\hat{\mathbf{x}}_t$ (line 15), around which the stochastic discrete dynamics can be linearized as

$$\mathbf{g}_t(\mathbf{x},\mathbf{u}) = A_t\mathbf{x} + B_t\mathbf{u} + \mathbf{a}_t,$$
$$M_t^{(i)}(\mathbf{x},\mathbf{u}) = F_t^i\mathbf{x} + G_t^i\mathbf{u} + \mathbf{e}_t^i, \quad 1 \leq i \leq n, \quad (11)$$

where $M_t^{(i)}$ denotes the $i$'th column of matrix $M_t$, and $A_t$, $B_t$, $F_t^i$, $G_t^i$, $\mathbf{a}_t$, and $\mathbf{e}_t^i$ are given matrices and vectors of the appropriate dimensions, and the cost function $c_t$ can be quadratized as

$$c_t(\mathbf{x},\mathbf{u}) = \tfrac{1}{2}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}Q_t & P_t^T\\ P_t & R_t\end{bmatrix}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix} + \begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}\mathbf{q}_t\\\mathbf{r}_t\end{bmatrix} + q_t. \quad (12)$$

By substituting the linear stochastic dynamics and quadratic local cost function into Eq. (8), expanding the expectation, and then collecting terms, we get a quadratic expression of the value function $v_t(\mathbf{x})$,

$$v_t(\mathbf{x}) = \min_{\mathbf{u}}\Big(\tfrac{1}{2}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}C_t & E_t^T\\ E_t & D_t\end{bmatrix}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix} + \begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}\mathbf{c}_t\\\mathbf{d}_t\end{bmatrix} + e_t\Big), \quad (13)$$

where

$$C_t = Q_t + A_t^T S_{t+1} A_t + \sum_{i=1}^n F_t^{iT} S_{t+1} F_t^i,$$
$$D_t = R_t + B_t^T S_{t+1} B_t + \sum_{i=1}^n G_t^{iT} S_{t+1} G_t^i,$$
$$E_t = P_t + B_t^T S_{t+1} A_t + \sum_{i=1}^n G_t^{iT} S_{t+1} F_t^i,$$
$$\mathbf{c}_t = \mathbf{q}_t + A_t^T S_{t+1}\mathbf{a}_t + A_t^T\mathbf{s}_{t+1} + \sum_{i=1}^n F_t^{iT} S_{t+1}\mathbf{e}_t^i,$$
$$\mathbf{d}_t = \mathbf{r}_t + B_t^T S_{t+1}\mathbf{a}_t + B_t^T\mathbf{s}_{t+1} + \sum_{i=1}^n G_t^{iT} S_{t+1}\mathbf{e}_t^i,$$
$$e_t = q_t + s_{t+1} + \tfrac{1}{2}\mathbf{a}_t^T S_{t+1}\mathbf{a}_t + \mathbf{a}_t^T\mathbf{s}_{t+1} + \tfrac{1}{2}\sum_{i=1}^n \mathbf{e}_t^{iT} S_{t+1}\mathbf{e}_t^i,$$

following a derivation similar to that in [21]. The minimization on the right-hand side of Eq. (13) gives the linear control policy:

$$\mathbf{u} = \pi_t(\mathbf{x}) = -D_t^{-1}E_t\mathbf{x} - D_t^{-1}\mathbf{d}_t. \quad (14)$$

Substituting $\mathbf{u}$ back into Eq. (13) gives $v_t(\mathbf{x})$ as a quadratic function of $\mathbf{x}$ with $S_t = C_t - E_t^T D_t^{-1} E_t$, $\mathbf{s}_t = \mathbf{c}_t - E_t^T D_t^{-1}\mathbf{d}_t$, and $s_t = e_t - \tfrac{1}{2}\mathbf{d}_t^T D_t^{-1}\mathbf{d}_t$ (line 18).
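As a concrete instance of Eqs. (13)-(14) and the resulting update for $S_t$ and $\mathbf{s}_t$, the sketch below performs one backward-pass step for a small system (Python/NumPy; the linearization and cost matrices are illustrative placeholders, not values from the paper).

```python
import numpy as np

n, m = 2, 1
rng = np.random.default_rng(0)

# Linearized stochastic dynamics around the smoothed state (Eq. (11)):
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
a = np.array([0.01, 0.0])
F = [0.01 * rng.standard_normal((n, n)) for _ in range(n)]  # noise columns w.r.t. x
G = [0.01 * rng.standard_normal((n, m)) for _ in range(n)]  # noise columns w.r.t. u
e = [0.01 * rng.standard_normal(n) for _ in range(n)]

# Quadratized local cost (Eq. (12)) and cost-to-go quadratic at step t+1:
Q, P, R = np.eye(n), np.zeros((m, n)), 0.1 * np.eye(m)
q, r = np.zeros(n), np.zeros(m)
S1, s1 = 2.0 * np.eye(n), np.array([1.0, -1.0])

# Coefficients of the joint quadratic in (x, u) (Eq. (13)):
C = Q + A.T @ S1 @ A + sum(Fi.T @ S1 @ Fi for Fi in F)
D = R + B.T @ S1 @ B + sum(Gi.T @ S1 @ Gi for Gi in G)
E = P + B.T @ S1 @ A + sum(Gi.T @ S1 @ Fi for Gi, Fi in zip(G, F))
c = q + A.T @ (S1 @ a + s1) + sum(Fi.T @ S1 @ ei for Fi, ei in zip(F, e))
d = r + B.T @ (S1 @ a + s1) + sum(Gi.T @ S1 @ ei for Gi, ei in zip(G, e))

# Linear control policy (Eq. (14)) and updated cost-to-go quadratic:
D_inv = np.linalg.inv(D)
K, k = -D_inv @ E, -D_inv @ d   # u = pi_t(x) = K x + k
S0 = C - E.T @ D_inv @ E        # S_t
s0 = c - E.T @ D_inv @ d        # s_t (the scalar constant term is omitted here)
```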

C. Forward Pass

The forward pass recursively computes the cost-to-come functions $\bar{v}_t(\mathbf{x})$ and the inverse control policy $\bar{\pi}_t$ using forward value iteration [23]:

$$\bar{v}_0(\mathbf{x}) = 0, \quad (15)$$
$$\bar{v}_{t+1}(\mathbf{x}) = \min_{\mathbf{u}}\big(c_t(\bar{\mathbf{g}}_t(\mathbf{x},\mathbf{u}),\mathbf{u}) + \bar{v}_t(\bar{\mathbf{g}}_t(\mathbf{x},\mathbf{u}))\big),$$
$$\bar{\pi}_t(\mathbf{x}) = \operatorname*{argmin}_{\mathbf{u}}\big(c_t(\bar{\mathbf{g}}_t(\mathbf{x},\mathbf{u}),\mathbf{u}) + \bar{v}_t(\bar{\mathbf{g}}_t(\mathbf{x},\mathbf{u}))\big).$$

To make the forward value iteration tractable, we linearize the inverse dynamics and quadratize the local cost functions so that we can maintain a quadratic form of the cost-to-come function $\bar{v}_t(\mathbf{x})$:

$$\bar{v}_t(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\bar{S}_t\mathbf{x} + \mathbf{x}^T\bar{\mathbf{s}}_t + \bar{s}_t.$$

The forward pass starts from time step 0 (line 3), constructing the quadratic $\bar{v}_0(\mathbf{x})$ by setting $\bar{S}_0 = 0$, $\bar{\mathbf{s}}_0 = \mathbf{0}$, and $\bar{s}_0 = 0$. At time step $t$, we assume $\bar{v}_t(\mathbf{x})$ and $v_t(\mathbf{x})$ are available. To proceed to step $t+1$, SELQR first computes the smoothed state $\hat{\mathbf{x}}_t$ by minimizing the sum of $v_t(\mathbf{x})$ and $\bar{v}_t(\mathbf{x})$ (line 5), which are quadratic. (We note that in our implementation of the matrix inversion for $S_t + \bar{S}_t$ in line 5, we add a small positive regularization $\eta I$, $\eta \in \mathbb{R}^+$, to zero matrices to avoid computing the inverse of a zero matrix, so $\hat{\mathbf{x}}_0 = \mathbf{0}$ in the first iteration.) Since $\pi_t$ is available, SELQR then computes $\hat{\mathbf{u}}_t$ and $\hat{\mathbf{x}}_{t+1}$ as shown in line 6. Then, the deterministic inverse discrete dynamics is linearized around $(\hat{\mathbf{x}}_{t+1},\hat{\mathbf{u}}_t)$:

$$\bar{\mathbf{g}}_t(\mathbf{x},\mathbf{u}) = \bar{A}_t\mathbf{x} + \bar{B}_t\mathbf{u} + \bar{\mathbf{a}}_t, \quad (16)$$

where $\bar{A}_t$, $\bar{B}_t$, and $\bar{\mathbf{a}}_t$ are given matrices and vectors of the appropriate dimensions (with $\bar{F}_t^i$, $\bar{G}_t^i$, and $\bar{\mathbf{e}}_t^i$ denoting the corresponding linearization of the noise columns, analogous to Eq. (11)), and the local cost function $c_t$ is quadratized around $(\hat{\mathbf{x}}_t,\hat{\mathbf{u}}_t)$ to get the quadratic form as in Eq. (12).

Substituting the linearized inverse dynamics and quadratic local cost function into Eq. (15), expanding the expectation, and then collecting terms, we get a quadratic expression for $\bar{v}_{t+1}(\mathbf{x})$,

$$\bar{v}_{t+1}(\mathbf{x}) = \min_{\mathbf{u}}\Big(\tfrac{1}{2}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}\bar{C}_t & \bar{E}_t^T\\ \bar{E}_t & \bar{D}_t\end{bmatrix}\begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix} + \begin{bmatrix}\mathbf{x}\\\mathbf{u}\end{bmatrix}^T\begin{bmatrix}\bar{\mathbf{c}}_t\\\bar{\mathbf{d}}_t\end{bmatrix} + \bar{e}_t\Big), \quad (17)$$

where

$$\bar{C}_t = \bar{A}_t^T(Q_t+\bar{S}_t)\bar{A}_t,$$
$$\bar{D}_t = \bar{B}_t^T(Q_t+\bar{S}_t)\bar{B}_t + \bar{B}_t^T P_t^T + P_t\bar{B}_t + R_t + \sum_{i=1}^n \bar{G}_t^{iT}(Q_t+\bar{S}_t)\bar{G}_t^i,$$
$$\bar{E}_t = \bar{B}_t^T(Q_t+\bar{S}_t)\bar{A}_t + P_t\bar{A}_t + \sum_{i=1}^n \bar{G}_t^{iT}(Q_t+\bar{S}_t)\bar{F}_t^i,$$
$$\bar{\mathbf{c}}_t = \bar{A}_t^T(Q_t+\bar{S}_t)\bar{\mathbf{a}}_t + \bar{A}_t^T(\mathbf{q}_t+\bar{\mathbf{s}}_t) + \sum_{i=1}^n \bar{F}_t^{iT}(Q_t+\bar{S}_t)\bar{\mathbf{e}}_t^i,$$
$$\bar{\mathbf{d}}_t = \bar{B}_t^T(Q_t+\bar{S}_t)\bar{\mathbf{a}}_t + P_t\bar{\mathbf{a}}_t + \bar{B}_t^T(\mathbf{q}_t+\bar{\mathbf{s}}_t) + \mathbf{r}_t + \sum_{i=1}^n \bar{G}_t^{iT}(Q_t+\bar{S}_t)\bar{\mathbf{e}}_t^i,$$
$$\bar{e}_t = \tfrac{1}{2}\bar{\mathbf{a}}_t^T(Q_t+\bar{S}_t)\bar{\mathbf{a}}_t + \bar{\mathbf{a}}_t^T(\mathbf{q}_t+\bar{\mathbf{s}}_t) + q_t + \bar{s}_t + \tfrac{1}{2}\sum_{i=1}^n \bar{\mathbf{e}}_t^{iT}(Q_t+\bar{S}_t)\bar{\mathbf{e}}_t^i,$$

following the derivation in [23]. Setting the derivative of Eq. (17) with respect to $\mathbf{u}$ equal to zero gives the inverse control policy for stage $t$:

$$\mathbf{u}_t = \bar{\pi}_t(\mathbf{x}_{t+1}) = -\bar{D}_t^{-1}\bar{E}_t\mathbf{x}_{t+1} - \bar{D}_t^{-1}\bar{\mathbf{d}}_t. \quad (18)$$

Plugging $\mathbf{u}_t$ into Eq. (17) gives $\bar{v}_{t+1}(\mathbf{x})$ as a quadratic function of $\mathbf{x}$ with $\bar{S}_{t+1} = \bar{C}_t - \bar{E}_t^T\bar{D}_t^{-1}\bar{E}_t$, $\bar{\mathbf{s}}_{t+1} = \bar{\mathbf{c}}_t - \bar{E}_t^T\bar{D}_t^{-1}\bar{\mathbf{d}}_t$, and $\bar{s}_{t+1} = \bar{e}_t - \tfrac{1}{2}\bar{\mathbf{d}}_t^T\bar{D}_t^{-1}\bar{\mathbf{d}}_t$ (line 9).
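Both value iterations end with the same algebraic step: minimize a joint quadratic in $(\mathbf{x},\mathbf{u})$ and read off the updated quadratic value function and affine policy. A minimal sketch of this shared update (hypothetical helper name, Python/NumPy):

```python
import numpy as np

def quadratic_value_update(C, D, E, c, d, e):
    """Minimize 0.5 [x;u]^T [[C, E^T], [E, D]] [x;u] + [x;u]^T [c;d] + e
    over u (D must be positive definite). Returns the resulting quadratic
    value function (S, s, s0) and the affine minimizer u = K x + k. This
    is the update used in line 18 (backward pass, unbarred quantities)
    and line 9 (forward pass, barred quantities) of Algorithm 1."""
    D_inv = np.linalg.inv(D)
    K, k = -D_inv @ E, -D_inv @ d
    S = C - E.T @ D_inv @ E
    s = c - E.T @ D_inv @ d
    s0 = e - 0.5 * d @ D_inv @ d
    return S, s, s0, K, k
```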

D. Iterative Forward and Backward Value Iteration

Without any a priori knowledge, SELQR initializes the cost-to-go functions and the control policy to zero (line 1). As shown in Algorithm 1, SELQR starts with a forward pass and then iteratively performs backward passes and forward passes until convergence (e.g., $v_0$ stops changing significantly). Similar to the iterated Kalman smoother and to Extended LQR [23], SELQR performs Gauss-Newton-like updates toward a local optimum.

Informed search methods often achieve speedups in practice by exploring from states that minimize a heuristic cost function. Analogously, in SELQR, the cost-to-go provides the minimum expected future cost, and the cost-to-come estimates the minimum expected cost that has already been accrued. The forward value iteration uses a deterministic inverse dynamics due to the intractability of computing a stochastic discrete inverse dynamics. Hence, the function $\hat{v}_t(\mathbf{x})$ estimates the minimum total cost assuming the robot passes through a given state $\mathbf{x}$ at time step $t$. Previous methods such as iLQG choose states for linearization and quadratization by blindly shooting the control policy from the last iteration without any information about the cost functions. These methods usually need measures such as a line search to maintain stability. By computing smoothed states that are informed by cost for linearization and quadratization, we show, experimentally, that our method provides faster convergence.


E. Discrete-Time Dynamics Implementation

If $\mathbf{f}(\mathbf{x},\mathbf{u},\tau)$ in Eq. (1) is linear in $\mathbf{x}$ and $N$ does not depend on $\mathbf{x}$, then the distribution of the state at any time $\tau$ is given by $\mathbf{x}(\tau) \sim \mathcal{N}(\hat{\mathbf{x}}(\tau),\Sigma(\tau))$, where $\hat{\mathbf{x}}(\tau)$ and $\Sigma(\tau)$ are defined by the following system of differential equations:

$$\dot{\hat{\mathbf{x}}} = \mathbf{f}(\hat{\mathbf{x}},\mathbf{u},\tau),$$
$$\dot{\Sigma} = \frac{\partial\mathbf{f}}{\partial\mathbf{x}}(\hat{\mathbf{x}},\mathbf{u},\tau)\,\Sigma + \Sigma\,\frac{\partial\mathbf{f}}{\partial\mathbf{x}}(\hat{\mathbf{x}},\mathbf{u},\tau)^T + N(\hat{\mathbf{x}},\mathbf{u},\tau)N(\hat{\mathbf{x}},\mathbf{u},\tau)^T.$$

For non-linear $\mathbf{f}$ and state- and control-dependent $N$, these equations provide first-order approximations. Instead of using Euler integration [21], we use the fourth-order Runge-Kutta method (RK4) to integrate the differential equations for $\hat{\mathbf{x}}$ and $\Sigma$ forward in time simultaneously to compute $\mathbf{g}_t$ and $M_t$ in Eq. (2), and integrate the differential equation for $\hat{\mathbf{x}}$ backward in time to compute $\bar{\mathbf{g}}_t$ in Eq. (3).
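A sketch of this integration step (Python/NumPy; `f`, `dfdx`, and `N` are user-supplied callables with illustrative signatures, not the paper's C++ API) is below; calling it with a negative step integrates the mean equation backward, which is how the deterministic inverse dynamics can be obtained.

```python
import numpy as np

def rk4_mean_cov(f, dfdx, N, x, Sigma, u, tau, dt):
    """One RK4 step propagating the mean (x_dot = f) and covariance
    (Sigma_dot = A Sigma + Sigma A^T + N N^T, with A = df/dx) forward in
    time. Integrating the mean equation with dt < 0 gives the
    deterministic inverse dynamics used for Eq. (3)."""
    def deriv(x, Sigma, t):
        A = dfdx(x, u, t)
        Nm = N(x, u, t)
        return f(x, u, t), A @ Sigma + Sigma @ A.T + Nm @ Nm.T

    k1x, k1S = deriv(x, Sigma, tau)
    k2x, k2S = deriv(x + 0.5 * dt * k1x, Sigma + 0.5 * dt * k1S, tau + 0.5 * dt)
    k3x, k3S = deriv(x + 0.5 * dt * k2x, Sigma + 0.5 * dt * k2S, tau + 0.5 * dt)
    k4x, k4S = deriv(x + dt * k3x, Sigma + dt * k3S, tau + dt)
    x_next = x + (dt / 6.0) * (k1x + 2 * k2x + 2 * k3x + k4x)
    Sigma_next = Sigma + (dt / 6.0) * (k1S + 2 * k2S + 2 * k3S + k4S)
    return x_next, Sigma_next
```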

V. STOCHASTIC EXTENDED LQR IN BELIEF SPACE

We introduce B-SELQR, a belief-state variant of SELQR for robotic systems with both motion and sensing uncertainty, where beliefs are modeled with Gaussian distributions. With an imperfect sensing model defined in the form of Eq. (5) and an objective function in the form of Eq. (6), the motion planning problem is a POMDP. B-SELQR needs a stochastic discrete forward belief dynamics and a deterministic discrete inverse belief dynamics. While the stochastic belief dynamics (Sec. V-A) can be modeled by an Extended Kalman Filter (EKF) [28] as shown in [26], the key challenge here is to develop the deterministic discrete inverse belief dynamics. We will show in Sec. V-B that the inverse belief dynamics can be derived by inverting the EKF.

A. Stochastic Discrete Belief Dynamics

Let us be given the belief of the robot's state at time step $t$ as $\mathbf{x}_t \sim \mathcal{N}(\hat{\mathbf{x}}_t,\Sigma_t)$ and a control input $\mathbf{u}_t$ that the robot will execute at time step $t$. The EKF is used to model the stochastic forward belief dynamics [26] by

$$\hat{\mathbf{x}}_{t+1} = \mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t) + \mathbf{w}_t, \quad \mathbf{w}_t \sim \mathcal{N}(\mathbf{0}, K_tH_t\Gamma_{t+1}),$$
$$\Sigma_{t+1} = \Gamma_{t+1} - K_tH_t\Gamma_{t+1}, \quad (19)$$

where

$$\Gamma_{t+1} = A_t\Sigma_tA_t^T + M_t(\hat{\mathbf{x}}_t,\mathbf{u}_t)M_t(\hat{\mathbf{x}}_t,\mathbf{u}_t)^T, \quad (20)$$
$$A_t = \frac{\partial\mathbf{g}}{\partial\mathbf{x}}(\hat{\mathbf{x}}_t,\mathbf{u}_t), \quad (21)$$
$$K_t = \Gamma_{t+1}H_t^T\big(H_t\Gamma_{t+1}H_t^T + V(\mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t))\big)^{-1}, \quad (22)$$
$$H_t = \frac{\partial\mathbf{h}}{\partial\mathbf{x}}(\mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t)). \quad (23)$$

We refer readers to [26] for details of the derivation. Defining the belief $\mathbf{b}_t = (\hat{\mathbf{x}}_t,\Sigma_t)$, the stochastic belief dynamics is given by

$$\mathbf{b}_{t+1} = \Phi(\mathbf{b}_t,\mathbf{u}_t) + W(\mathbf{b}_t,\mathbf{u}_t)\boldsymbol{\xi}_t, \quad \boldsymbol{\xi}_t \sim \mathcal{N}(\mathbf{0},I), \quad (24)$$

where $W(\mathbf{b}_t,\mathbf{u}_t) = \big[\sqrt{K_tH_t\Gamma_{t+1}}^{\,T},\; 0\big]^T$ and $\Phi(\mathbf{b}_t,\mathbf{u}_t)$ stands for the deterministic part of Eq. (19). The dynamics is stochastic since the observation is treated as a random variable.
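The following sketch (Python/NumPy; the callable names and signatures are illustrative, not the paper's implementation) computes the deterministic part $\Phi$ of Eq. (19) together with the covariance of the belief noise term, following Eqs. (20)-(23).

```python
import numpy as np

def belief_step(g, dgdx, M, h_jac, V, x_hat, Sigma, u):
    """Deterministic part Phi of the stochastic belief dynamics (Eq. (19)).
    g, dgdx, M, h_jac (Jacobian of h), and V are user-supplied callables."""
    A = dgdx(x_hat, u)                                   # Eq. (21)
    Mt = M(x_hat, u)
    Gamma = A @ Sigma @ A.T + Mt @ Mt.T                  # Eq. (20)
    x_pred = g(x_hat, u)
    H = h_jac(x_pred)                                    # Eq. (23)
    K = Gamma @ H.T @ np.linalg.inv(H @ Gamma @ H.T + V(x_pred))  # Eq. (22)
    Sigma_next = Gamma - K @ H @ Gamma
    # In Eq. (19) the predicted mean additionally receives the stochastic
    # term w_t ~ N(0, K H Gamma); here we return its covariance as well.
    return x_pred, Sigma_next, K @ H @ Gamma
```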

B. Deterministic Inverse Discrete Belief Dynamics

The deterministic inverse discrete belief dynamics takes the belief $\mathbf{b}_{t+1}$ at time step $t+1$ and the control $\mathbf{u}_t$ as input, and outputs the belief $\mathbf{b}_t$.

Proposition V.1 (Deterministic Inverse Discrete Belief Dynamics). Given $\mathbf{b}_{t+1} = (\hat{\mathbf{x}}_{t+1},\Sigma_{t+1})$ and the control input $\mathbf{u}_t$ applied at time step $t$, there exists a belief $\mathbf{b}_t = (\hat{\mathbf{x}}_t,\Sigma_t)$ such that $\mathbf{b}_{t+1} = \Phi(\mathbf{b}_t,\mathbf{u}_t)$, where $\mathbf{b}_t$ is given by

$$\hat{\mathbf{x}}_t = \bar{\mathbf{g}}(\hat{\mathbf{x}}_{t+1},\mathbf{u}_t), \quad (25)$$
$$\Sigma_t = \bar{A}_t^{-1}(\bar{\Gamma}_t - \bar{M}_t\bar{M}_t^T)\bar{A}_t^{-T}, \quad (26)$$

where

$$\bar{M}_t = M_t(\bar{\mathbf{g}}(\hat{\mathbf{x}}_{t+1},\mathbf{u}_t),\mathbf{u}_t), \quad \bar{A}_t = \frac{\partial\mathbf{g}}{\partial\mathbf{x}}(\bar{\mathbf{g}}(\hat{\mathbf{x}}_{t+1},\mathbf{u}_t),\mathbf{u}_t), \quad (27)$$
$$\bar{\Gamma}_t = (I - \Sigma_{t+1}\bar{H}_t^T\bar{V}_t^{-1}\bar{H}_t)^{-1}\Sigma_{t+1}, \quad (28)$$
$$\bar{H}_t = \frac{\partial\mathbf{h}}{\partial\mathbf{x}}(\hat{\mathbf{x}}_{t+1}), \quad \bar{V}_t = V(\hat{\mathbf{x}}_{t+1}). \quad (29)$$

Proof: We prove the proposition by substituting Eq. (25) and Eq. (26) into $\Phi(\mathbf{b}_t,\mathbf{u}_t)$ (the deterministic part of Eq. (19)) and showing equality to $\mathbf{b}_{t+1}$.

First, substituting Eq. (25) into $\mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t)$, we get

$$\hat{\mathbf{x}}_{t+1} = \mathbf{g}(\bar{\mathbf{g}}(\hat{\mathbf{x}}_{t+1},\mathbf{u}_t),\mathbf{u}_t), \quad (30)$$

which proves the correctness of Eq. (25). Hence, substituting Eq. (25) into Eq. (27), we see that $\bar{A}_t = A_t$ and $\bar{M}_t = M_t(\hat{\mathbf{x}}_t,\mathbf{u}_t)$. Substituting $\hat{\mathbf{x}}_{t+1} = \mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t)$ into Eq. (29), we can show that $\bar{H}_t = H_t$ and $\bar{V}_t = V(\mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t))$.

Second, substituting Eq. (26) back into Eq. (20) shows that $\Gamma_{t+1} = \bar{\Gamma}_t$. Substituting $\Gamma_{t+1} = \bar{\Gamma}_t$ into Eq. (22), we get

$$K_t = \bar{\Gamma}_t\bar{H}_t^T\big(\bar{H}_t\bar{\Gamma}_t\bar{H}_t^T + V(\mathbf{g}(\hat{\mathbf{x}}_t,\mathbf{u}_t))\big)^{-1} \quad (31)$$
$$= \bar{\Gamma}_t\bar{H}_t^T\big(\bar{H}_t\bar{\Gamma}_t\bar{H}_t^T + \bar{V}_t\big)^{-1}. \quad (32)$$

Substituting $K_t$ (Eq. (32)) and $\Gamma_{t+1} = \bar{\Gamma}_t$ into Eq. (19):

$$\Gamma_{t+1} - K_tH_t\Gamma_{t+1} = \bar{\Gamma}_t - \bar{\Gamma}_t\bar{H}_t^T(\bar{H}_t\bar{\Gamma}_t\bar{H}_t^T + \bar{V}_t)^{-1}\bar{H}_t\bar{\Gamma}_t$$
$$= (\bar{\Gamma}_t^{-1} + \bar{H}_t^T\bar{V}_t^{-1}\bar{H}_t)^{-1}$$
$$= \big(\Sigma_{t+1}^{-1}(I - \Sigma_{t+1}\bar{H}_t^T\bar{V}_t^{-1}\bar{H}_t) + \bar{H}_t^T\bar{V}_t^{-1}\bar{H}_t\big)^{-1}$$
$$= (\Sigma_{t+1}^{-1})^{-1} = \Sigma_{t+1},$$

where the second equality follows from the Sherman-Morrison-Woodbury identity and the third equality follows from Eq. (28). Hence we prove the correctness of Eq. (26). $\blacksquare$

Eqs. (25) and (26) model the deterministic discrete inverse belief dynamics, which we write as $\mathbf{b}_t = \bar{\Phi}(\mathbf{b}_{t+1},\mathbf{u}_t)$. One can show that $\mathbf{b}_{t+1} = \Phi(\bar{\Phi}(\mathbf{b}_{t+1},\mathbf{u}_t),\mathbf{u}_t)$. With the stochastic discrete forward belief dynamics and the deterministic inverse belief dynamics, together with the cost objective of Eq. (6) defined over belief space, we can directly apply SELQR to planning in belief space.
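A direct transcription of Eqs. (25)-(29) (Python/NumPy; callable names are illustrative):

```python
import numpy as np

def inverse_belief_step(g_bar, dgdx, M, h_jac, V, x_hat_next, Sigma_next, u):
    """Deterministic inverse belief dynamics Phi-bar (Eqs. (25)-(29));
    g_bar is the deterministic inverse dynamics of Eq. (3)."""
    x_hat = g_bar(x_hat_next, u)                              # Eq. (25)
    H = h_jac(x_hat_next)                                     # Eq. (29)
    V_t = V(x_hat_next)
    n = Sigma_next.shape[0]
    Gamma = np.linalg.solve(
        np.eye(n) - Sigma_next @ H.T @ np.linalg.inv(V_t) @ H,
        Sigma_next)                                           # Eq. (28)
    A = dgdx(x_hat, u)                                        # Eq. (27)
    M_t = M(x_hat, u)
    # Sigma_t = A^{-1} (Gamma - M M^T) A^{-T} (Eq. (26)); the middle factor
    # is symmetric, so two linear solves against A suffice.
    mid = Gamma - M_t @ M_t.T
    Sigma = np.linalg.solve(A, np.linalg.solve(A, mid.T).T)
    return x_hat, Sigma
```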


VI. EXPERIMENTS

We demonstrate SELQR in simulation for a car-like robot, a quadrotor, and a medical steerable needle. Each robot must navigate in an environment with obstacles. We also apply B-SELQR to a car-like robot. We implemented the methods in C++ and ran scenarios on a PC with an Intel i3 2.4 GHz processor.

In our experiments, we used the local cost functions

$$c_0(\mathbf{x},\mathbf{u}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_0^*)^TQ_0(\mathbf{x}-\mathbf{x}_0^*) + \tfrac{1}{2}(\mathbf{u}-\mathbf{u}^*)^TR(\mathbf{u}-\mathbf{u}^*),$$
$$c_t(\mathbf{x},\mathbf{u}) = \tfrac{1}{2}(\mathbf{u}-\mathbf{u}^*)^TR(\mathbf{u}-\mathbf{u}^*) + f(\mathbf{x}), \quad (33)$$
$$c_l(\mathbf{x},\mathbf{u}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_l^*)^TQ_l(\mathbf{x}-\mathbf{x}_l^*),$$

where $\mathbf{x}_0^*$ is the initial state, $\mathbf{x}_l^*$ is the goal state, and $Q_0$, $Q_l$, and $R$ are positive definite matrices. We use the first term in $c_t(\mathbf{x},\mathbf{u})$ to represent control effort, where each control has independent cost, so we assume $R$ is a scaled identity matrix. We also set $Q_0$ and $Q_l$ to scaled identity matrices, where the scaling parameter is large in order to fix the initial state and goal state for planning. We set the function $f(\mathbf{x})$ to enforce obstacle avoidance. For SELQR we used the same cost term as in [23],

$$f(\mathbf{x}) = q\sum_i \exp(-d_i(\mathbf{x})), \quad (34)$$

where $q \in \mathbb{R}^+$ and $d_i(\mathbf{x})$ is the signed distance between the robot at state $\mathbf{x}$ and the $i$'th obstacle. The parameter $q$ controls the preferred clearance from the obstacles: lower values result in more aggressive motion plans that allow the robot to move close to obstacles, while higher values encourage safer plans by heavily penalizing proximity to obstacles. Since the Hessian of $f(\mathbf{x})$ is not always positive semidefinite, we regularize the Hessian by computing its eigendecomposition and setting the negative eigenvalues to zero [23]. We assume each obstacle is convex; for non-convex obstacles, we apply convex decomposition. For B-SELQR, to approximately consider the probability of collision we set $f(\mathbf{b}) = q\sum_i \exp(-d_i(\mathbf{b}))$, where $d_i(\mathbf{b})$ is the minimum number of standard deviations of the mean of the robot's belief distribution needed to move to the obstacle's surface [26].
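For example, the eigenvalue clipping used to keep the quadratized obstacle cost positive semidefinite can be written as the following short sketch (Python/NumPy):

```python
import numpy as np

def regularize_hessian(H):
    """Project a symmetric Hessian onto the positive semidefinite cone by
    zeroing its negative eigenvalues, as done for the obstacle cost f(x)."""
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, 0.0)) @ V.T  # V diag(max(w, 0)) V^T
```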

A. Car-like Robot in a 2-D Environment

We first apply SELQR to a nonholonomic car-like robot that navigates in a 2-D environment and can perfectly sense its state. The robot's state $\mathbf{x} = [x, y, \theta, v]$ consists of its position $(x,y)$, orientation $\theta$, and speed $v$. The control inputs $\mathbf{u} = [a, \phi]$ consist of acceleration $a$ and steering wheel angle $\phi$. The deterministic continuous dynamics is given by

$$\dot{x} = v\cos(\theta), \quad \dot{y} = v\sin(\theta), \quad \dot{\theta} = v\tan(\phi)/d, \quad \dot{v} = a, \quad (35)$$

where $d$ is the length of the car-like robot. We assume the dynamics is corrupted by noise from a Wiener process (Eq. (1)) and define $N(\mathbf{x}(\tau),\mathbf{u}(\tau),\tau) = \alpha\|\mathbf{u}(\tau)\|$, $\alpha \in \mathbb{R}^+$. For the cost functions we set $Q_0 = Q_l = 200I$, $R = 1.0I$, and $q = 0.2$.
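A sketch of the car-like dynamics of Eq. (35) (Python/NumPy; the car length value is an illustrative default):

```python
import numpy as np

def car_dynamics(x, u, d=1.0):
    """Deterministic continuous dynamics of Eq. (35) for the car-like robot.
    State x = [x, y, theta, v]; control u = [a, phi]."""
    _, _, theta, v = x
    a, phi = u
    return np.array([v * np.cos(theta),
                     v * np.sin(theta),
                     v * np.tan(phi) / d,
                     a])
```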

Fig. 2. (a) The SELQR trajectory for a car-like robot moving to a green goal while avoiding red obstacles. (b) Mean and standard deviations of the deviation from the goal over 1,000 simulations, as a function of the noise level α, for Extended LQR (closed-loop), Stochastic Extended LQR (closed-loop), and the locally optimal trajectory (open-loop).

Fig. 2(a) shows the environment and the SELQR trajectory (illustrated by the path that results from following the control policy computed by SELQR in a simulation with zero noise). Consideration of stochastic dynamics is important for good performance. Fig. 2(b) shows the deviation from the goal for varying levels of noise α. We compare with Extended LQR, which uses deterministic dynamics to compute the control policy, and with simply executing the locally optimal trajectory, i.e., the open-loop execution of SELQR's nominal trajectory. We ran each method for 1,000 independent simulated runs. The locally optimal trajectory performs poorly since it is executed in an open-loop manner; the use of a closed-loop policy is needed for good performance in this scenario. The control policies from SELQR result in a smaller deviation from the goal since SELQR explicitly considers the control-dependent noise.

In Table I, we show SELQR's fast convergence for different values of $\Delta$. The results are averages of 100 independent runs for random instances. In each instance, the initial state $\mathbf{x}_0^*$ was chosen by uniformly sampling in the workspace, and the corresponding goal state was $\mathbf{x}_l^* = -\mathbf{x}_0^*$ (where the origin is the center of the workspace). Compared to iLQG, our method achieved approximately equal costs but required substantially less computation time.

B. Quadrotor in a 3-D Environment

To show that SELQR scales to higher dimensions, we apply it to a simulated quadrotor with a 12-D state space. Its state $\mathbf{x} = [\mathbf{p}, \mathbf{v}, \mathbf{r}, \mathbf{w}] \in \mathbb{R}^{12}$ consists of position $\mathbf{p}$, velocity $\mathbf{v}$, orientation $\mathbf{r}$ (angle-axis representation), and angular velocity $\mathbf{w}$. Its control input $\mathbf{u} = [u_1, u_2, u_3, u_4]$ consists of the forces exerted by each of the four rotors.


TABLE I
QUANTITATIVE COMPARISON OF SELQR AND ILQG. RESULTS AVERAGED OVER 100 INDEPENDENT RUNS.

                                    SELQR                                  iLQG
Scenario           ∆ (s)    Avg Cost  Avg Time (s)  Avg # Iters    Avg Cost  Avg Time (s)  Avg # Iters
Car-like robot     0.05     79.4      0.4           5.7            80.5      1.1           13.4
                   0.1      55.5      1.0           16.0           53.4      2.5           43.2
                   0.2      50.8      1.2           18.4           51.7      2.0           35.4
Quadrotor          0.025    552.1     30.3          7.7            798.0     52.7          23.4
                   0.05     272.7     50.1          14.4           292.1     113.7         51.6
                   0.1      191.1     66.3          20.0           197.1     163.9         76.4
Steerable needle   0.075    53.6      0.79          5.3            58.3      1.2           12.5
                   0.1      42.6      0.95          6.36           44.5      1.4           14.6
                   0.125    39.1      1.3           10.1           40.0      1.5           15.6

(a) q = 1.0. (b) q = 0.3.

Fig. 3. SELQR trajectories for a quadrotor in an environment with eight cylindrical obstacles.

We directly adopt the continuous dynamics $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u})$ with physical parameters of the quadrotor and the environment from [23]. The non-linear dynamics are given by

$$\dot{\mathbf{p}} = \mathbf{v},$$
$$\dot{\mathbf{v}} = -g\mathbf{e}_3 + \big((u_1+u_2+u_3+u_4)\exp([\mathbf{r}])\mathbf{e}_3 - k_v\mathbf{v}\big)/m,$$
$$\dot{\mathbf{r}} = \mathbf{w} + \tfrac{1}{2}[\mathbf{r}]\mathbf{w} + \big(1 - \tfrac{1}{2}\|\mathbf{r}\|/\tan(\tfrac{1}{2}\|\mathbf{r}\|)\big)[\mathbf{r}]^2\mathbf{w}/\|\mathbf{r}\|^2,$$
$$\dot{\mathbf{w}} = J^{-1}\big(\rho(u_2-u_4)\mathbf{e}_1 + \rho(u_3-u_1)\mathbf{e}_2 + k_m(u_1-u_2+u_3-u_4)\mathbf{e}_3 - [\mathbf{w}]J\mathbf{w}\big),$$

where $\mathbf{e}_i$ are the standard basis vectors, $g = 9.8\,\mathrm{m/s^2}$ is the acceleration due to gravity, $k_v = 0.15$ is a constant relating the velocity to an opposing force related to rotor drag, $m = 0.5\,\mathrm{kg}$ is the mass, $J = 0.05I\,\mathrm{kg\,m^2}$ is the moment of inertia matrix, $\rho = 0.17\,\mathrm{m}$ is the distance between the center of mass and the center of the rotors, and $k_m = 0.025$ is a constant relating the force of a rotor to its torque. The notation $[\mathbf{a}]$ refers to the skew-symmetric cross-product matrix of $\mathbf{a}$. We add noise defined by $N(\mathbf{x}(\tau),\mathbf{u}(\tau),\tau) = \alpha\|\mathbf{u}(\tau)\|$, where $\alpha \in \mathbb{R}^+$.
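The sketch below transcribes these dynamics (Python, using NumPy and SciPy's matrix exponential for exp([r]); this is an illustrative re-implementation, not the paper's C++ code), guarding the angle-axis term near ‖r‖ = 0.

```python
import numpy as np
from scipy.linalg import expm

# Physical parameters as given in the text.
g, k_v, m, rho, k_m = 9.8, 0.15, 0.5, 0.17, 0.025
J = 0.05 * np.eye(3)
e1, e2, e3 = np.eye(3)

def skew(a):
    """Cross-product matrix [a], i.e., skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def quadrotor_dynamics(x, u):
    """Continuous quadrotor dynamics for x = [p, v, r, w] in R^12."""
    v, r, w = x[3:6], x[6:9], x[9:12]
    nr = np.linalg.norm(r)
    p_dot = v
    v_dot = -g * e3 + (np.sum(u) * expm(skew(r)) @ e3 - k_v * v) / m
    if nr < 1e-8:
        r_dot = w  # limit of the angle-axis expression as ||r|| -> 0
    else:
        r_dot = (w + 0.5 * skew(r) @ w
                 + (1.0 - 0.5 * nr / np.tan(0.5 * nr)) * (skew(r) @ (skew(r) @ w)) / nr**2)
    w_dot = np.linalg.solve(
        J, rho * (u[1] - u[3]) * e1 + rho * (u[2] - u[0]) * e2
           + k_m * (u[0] - u[1] + u[2] - u[3]) * e3 - skew(w) @ (J @ w))
    return np.concatenate([p_dot, v_dot, r_dot, w_dot])
```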

Fig. 3 shows the SELQR trajectory for two different values of $q$, where we set $\alpha = 2\%$, $Q_0 = Q_l = 500I$, and $R = 20I$. As expected, the trajectory with larger $q$ has larger clearance from obstacles. In Table I, we show SELQR's fast convergence for the quadrotor scenario for different values of $\Delta$. We conducted randomized runs in a manner analogous to Sec. VI-A. For the quadrotor, compared to iLQG, our method achieved slightly better costs while requiring substantially less computation time.

C. Medical Needle Steering for Liver Biopsy

We also demonstrate SELQR for steering a flexible bevel-tip needle through liver tissue while avoiding critical vasculature modeled by a triangular surface mesh (Fig. 1). We use the stochastic needle model introduced in [27], where the kinematics are defined in SE(3). We represent the state $\mathbf{x}$ by the tip's position $\mathbf{p}$ and orientation $\mathbf{r}$ (angle-axis). The control input is $\mathbf{u} = [v, w, \kappa]^T$, where $v$ is the insertion speed, $w$ is the axial rotation speed, and $\kappa$ is the curvature, which can vary from 0 to a maximum curvature of $\kappa_0$ using duty-cycling. For the cost function, we set $\mathbf{u}^* = [0, 0, 0.5\kappa_0]^T$. Hence, we penalize large insertion speed, which given $l$ and $\Delta$ corresponds to penalizing path length. It also penalizes curvatures that are too large (close to the kinematic limits of the device) or too small (requiring high-rate duty cycling, which may cause tissue damage). For motion noise, we set $M_t$ from Eq. (2) to $\sigma\|\mathbf{u}_t\|_2 I$, where $\sigma \in \mathbb{R}^+$ is a positive constant that controls the size of the noise; the noise is thus proportional to the norm of the control input.

Fig. 1 shows the SELQR trajectories for two insertion locations with $\Delta = 0.1$ s, $l = 30$, $Q_0 = Q_l = 100I$, $R = I$, $q = 0.5$, and $\sigma = 0.01$. Table I shows SELQR's fast convergence for the steerable needle for varying $\Delta$. The results are averages of 100 independent runs for random instances. In each instance, the goal state was held constant, and we set the initial state $\mathbf{x}_0^*$ such that the needle was inserted into the tissue from a uniformly-sampled point on the left (corresponding to the skin surface). Compared to iLQG, our method achieved approximately equal costs but required substantially less computation time.

Although SELQR is designed to compute policies for robotic systems with Gaussian noise, the policies may in some cases in practice also be effective for robotic systems with non-Gaussian noise. For the needle steering scenario, we evaluated the computed policy in simulation under two different types of noise: (1) Gaussian noise sampled from the Gaussian distribution that was used for computing the policy, and (2) uniform noise sampled from a 6-D super-ball with radius $\sigma\|\mathbf{u}\|_2$; namely, we set the radius of the ball equal to the standard deviation of the Gaussian distribution and evaluated the probability of success of the policy (i.e., the motion does not collide with obstacles). We conducted 1,000 independent simulated runs for both types of noise. For $\alpha = 0.005$, the probability of success was 99.8% for the Gaussian distribution and 99.5% for the uniform distribution. For $\alpha = 0.007$, the probability of success was 97.4% for the Gaussian distribution and 97.0% for the uniform distribution. For $\alpha = 0.01$, we achieved 93.1% for the Gaussian distribution and 92.5% for the uniform distribution. The results show that the computed locally optimal policy can, in some cases, handle motion noise from non-Gaussian distributions.

D. Belief Space Planning for a Car-like Robot

We apply B-SELQR to the car-like robot in Sec. VI-A, but now with added uncertainty in sensing. We consider two environments, discussed below, in which the robot localizes itself using noisy measurements from sensors in the environment. The reliability of the measurement varies as a function of the robot's position.

For belief space planning we use the cost functions

$$c_0(\mathbf{b},\mathbf{u}) = \tfrac{1}{2}(\mathbf{b}-\mathbf{b}_0^*)^TQ_0(\mathbf{b}-\mathbf{b}_0^*) + \tfrac{1}{2}(\mathbf{u}-\mathbf{u}^*)^TR(\mathbf{u}-\mathbf{u}^*),$$
$$c_t(\mathbf{b},\mathbf{u}) = \tfrac{1}{2}\operatorname{tr}\big[\sqrt{\Sigma}\,Q_t\sqrt{\Sigma}\big] + \tfrac{1}{2}(\mathbf{u}-\mathbf{u}^*)^TR(\mathbf{u}-\mathbf{u}^*) + f(\mathbf{b}),$$
$$c_l(\mathbf{b},\mathbf{u}) = \tfrac{1}{2}(\hat{\mathbf{x}}-\mathbf{x}_l^*)^TQ_l(\hat{\mathbf{x}}-\mathbf{x}_l^*) + \operatorname{tr}\big[\sqrt{\Sigma}\,Q_l\sqrt{\Sigma}\big].$$

We set $Q_0 = 1000I$, $R = 2I$, $Q_t = 10I$, $Q_l = 500I$, and $q = 0.1$.

1) Light-dark Environment: We first consider the light-dark scenario suggested in [15]. The robot receives reliable measurements in the bright region and noisier measurements in the darker regions. Formally, the observation model is

$$\mathbf{z}_t = \mathbf{x}_t + \mathbf{n}_t, \quad \mathbf{n}_t \sim \mathcal{N}\big(\mathbf{0}, ((x_t - x^*)^2 + 1)\beta I\big), \quad (36)$$

where $\beta \in \mathbb{R}^+$ is a given constant.

Fig. 4 shows the B-SELQR trajectory (illustrated by the path that results from following the control policy computed by B-SELQR in a simulation with zero noise) and the associated beliefs along the trajectory for scenarios with and without obstacles, with $\beta = 0.01$. The computed control policies steer the robot to the light region, where the measurement noise is smallest, in order to better localize the robot before proceeding to the goal. We also show the convergence of B-SELQR. We compare with iLQG executed for the same cost functions in belief space using the method in [26]. The statistics were computed by averaging the results of 100 random instances. (For each random instance, we randomly sampled the initial state $\mathbf{x}_0^*$.) On average, B-SELQR requires fewer iterations to reach a desired solution quality.

We also tested B-SELQR under different sensing noise levels. Fig. 5 shows the computed trajectories under four different values of β. When the noise is small (Fig. 5 (a) and (b)), the robot travels less distance in the light region since it can localize itself more quickly. On the other hand, when the noise is larger, the robot usually needs to travel a greater distance in the light region to localize itself well. For large sensing noise, as shown in Fig. 5 (d), the uncertainty increases quickly again after the robot leaves the light region, raising the possibility of collision.

(a) B-SELQR for the light-dark environment without obstacles: expected cost (×10³) vs. number of iterations for B-SELQR and iLQG in belief space.

(b) B-SELQR for the light-dark environment with obstacles: expected cost (×10³) vs. number of iterations for B-SELQR and iLQG in belief space.

Fig. 4. (a) B-SELQR trajectory for a car-like robot navigating to a goal (green) in a 2-D light-dark environment (adapted from [15]). (b) B-SELQR trajectory for the environment with obstacles (red circles). The blue ellipsoids show 3 standard deviations of the belief distributions. B-SELQR converges faster than iLQG in belief space in both scenarios.

(a) β = 0.003. (b) β = 0.006. (c) β = 0.01. (d) β = 0.015.

Fig. 5. B-SELQR trajectories under different sensing noise levels.

2) Environment with Beacons: We also consider a scenario where the car-like robot estimates its location using measurements from two beacons, $b_1$ and $b_2$, placed in the environment at positions $(x_{b_1}, y_{b_1})$ and $(x_{b_2}, y_{b_2})$. The strength of the signal decays quadratically with the distance to the beacon. The robot also measures its current speed and orientation angle. The measurement uncertainty is scaled by a constant matrix $N$.


Fig. 6. (a) B-SELQR trajectories for a car-like robot navigating to a goal (green) in an environment where two beacons are used for localization (adapted from [26]) while avoiding obstacles (red). (b) Comparison between B-SELQR and iLQG in belief space in terms of the number of iterations needed for convergence (expected cost ×10³).

This gives us the non-linear observation model

$$\mathbf{z}_t = \begin{bmatrix} \dfrac{1}{1+(x_t-x_{b_1})^2+(y_t-y_{b_1})^2} \\[2mm] \dfrac{1}{1+(x_t-x_{b_2})^2+(y_t-y_{b_2})^2} \\[2mm] \theta_t \\ v_t \end{bmatrix} + N\mathbf{n}_t, \quad (37)$$

where $\mathbf{z}_t \in \mathbb{R}^4$ and $\mathbf{n}_t \sim \mathcal{N}(\mathbf{0},I)$. Fig. 6 (left) shows the quadratic decay in the signal strength of the beacons in the environment, where white indicates a strong signal and dark gray indicates a weak signal.
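A sketch of the observation model in Eq. (37) (Python/NumPy; argument names are illustrative):

```python
import numpy as np

def beacon_observation(x, beacons, N):
    """Non-linear observation model of Eq. (37): quadratically decaying
    beacon signals plus direct heading and speed measurements; `beacons`
    holds the two (x_b, y_b) positions and N scales unit Gaussian noise."""
    px, py, theta, v = x
    signals = [1.0 / (1.0 + (px - bx) ** 2 + (py - by) ** 2) for bx, by in beacons]
    z_mean = np.array(signals + [theta, v])
    return z_mean + N @ np.random.standard_normal(z_mean.shape[0])
```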

Fig. 6 shows the B-SELQR trajectory and associated beliefs along the trajectory. The computed control policies steer the robot to move in the vicinity of the two beacons in order to better localize the robot before proceeding to the goal. We also show the convergence of B-SELQR in Fig. 6. We compare with iLQG in belief space executed for the same cost functions, using 100 random instances. Again, on average, B-SELQR requires fewer iterations to reach a desired solution quality.

VII. CONCLUSION

We presented Stochastic Extended LQR (SELQR), a novel optimization-based motion planner that computes a trajectory and associated linear control policy with the objective of minimizing the expected value of a user-defined cost function. SELQR applies to robotic systems that have stochastic non-linear dynamics and state- and control-dependent motion uncertainty. We also extended SELQR to applications with imperfect sensing, requiring motion planning in belief space. Our approach converges faster and more reliably than related methods in both the robot's state space and belief space for multiple simulated scenarios, ranging from a mobile robot to a steerable needle.

In future work, we hope to broaden the applicability of the approach. The approach currently assumes motion and sensing uncertainty are modeled using Gaussian distributions. While this assumption is often appropriate, it is not valid for some problems. Our approach also relies on first- and second-order information, so to improve stability we plan to investigate the use of automatic differentiation. We also plan to formally analyze the method's convergence properties, which may benefit from the incorporation of a line search into SELQR. Finally, we plan to apply the methods to physical robots such as steerable needles in order to efficiently account for motion and sensing uncertainty.

REFERENCES

[1] H. Bai, D. Hsu, and W. S. Lee, "Integrated perception and planning in the continuous space: A POMDP approach," in Robotics: Science and Systems (RSS), Jun. 2013.

[2] B. M. Bell, "The iterated Kalman smoother as a Gauss-Newton method," SIAM J. Optimization, vol. 4, no. 3, pp. 626–636, 1994.

[3] O. Brock and O. Khatib, "Elastic strips: A framework for motion generation in human environments," Int. J. Robotics Research, vol. 21, no. 2, pp. 1031–1052, Dec. 2002.

[4] A. Bry and N. Roy, "Rapidly-exploring random belief trees for motion planning under uncertainty," in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), May 2011, pp. 723–730.

[5] K. Hauser and V. Ng-Thow-Hing, "Fast smoothing of manipulator trajectories using optimal bounded-acceleration shortcuts," in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), May 2010, pp. 2493–2498.

[6] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, "Planning and acting in partially observable stochastic domains," Artificial Intelligence, vol. 101, no. 1-2, pp. 99–134, 1998.

[7] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, "STOMP: Stochastic trajectory optimization for motion planning," in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), May 2011, pp. 4569–4574.

[8] H. Kurniawati, D. Hsu, and W. Lee, "SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces," in Robotics: Science and Systems (RSS), Jun. 2008.

[9] S. M. LaValle, Planning Algorithms. Cambridge, U.K.: Cambridge University Press, 2006.

[10] A. Lee, Y. Duan, S. Patil, J. Schulman, Z. McCarthy, J. van den Berg, K. Goldberg, and P. Abbeel, "Sigma hulls for Gaussian belief space planning for imprecise articulated robots amid obstacles," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), Nov. 2013, pp. 5660–5667.

[11] J. Pan, L. Zhang, and D. Manocha, "Collision-free and smooth trajectory computation in cluttered environments," Int. J. Robotics Research, vol. 31, no. 10, pp. 1155–1175, Sep. 2012.

[12] S. Patil, G. Kahn, M. Laskey, J. Schulman, K. Goldberg, and P. Abbeel, "Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation," in Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR), Aug. 2014.

[13] S. Patil, J. van den Berg, and R. Alterovitz, "Estimating probability of collision for safe motion planning under Gaussian motion and sensing uncertainty," in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), May 2012, pp. 3238–3244.

[14] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in Int. Joint Conf. Artificial Intelligence (IJCAI), 2003, pp. 1025–1032.

[15] R. Platt Jr., R. Tedrake, L. Kaelbling, and T. Lozano-Perez, "Belief space planning assuming maximum likelihood observations," in Robotics: Science and Systems (RSS), Jun. 2010.

[16] S. Prentice and N. Roy, "The belief roadmap: Efficient planning in belief space by factoring the covariance," Int. J. Robotics Research, vol. 28, no. 11, pp. 1448–1465, Nov. 2009.

[17] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, "Motion planning with sequential convex optimization and convex collision checking," Int. J. Robotics Research, vol. 33, no. 9, pp. 1251–1270, Aug. 2014.

[18] J. Schulman, J. Ho, A. Lee, I. Awwal, H. Bradlow, and P. Abbeel, "Finding locally optimal, collision-free trajectories with sequential convex optimization," in Robotics: Science and Systems (RSS), Jun. 2013.

[19] W. Sun, J. van den Berg, and R. Alterovitz, "Stochastic Extended LQR: Optimization-based motion planning under uncertainty," in Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR), Aug. 2014.

[20] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.

[21] E. Todorov, "A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems," in Proc. American Control Conference, Jun. 2005, pp. 300–306.

[22] M. Toussaint, "Robot trajectory optimization using approximate inference," in Proc. Int. Conf. Machine Learning (ICML), 2009, pp. 1049–1056.


[23] J. van den Berg, "Extended LQR: Locally-optimal feedback control for systems with non-linear dynamics and non-quadratic cost," in Int. Symp. Robotics Research (ISRR), Dec. 2013.

[24] ——, "Iterated LQR smoothing for locally-optimal feedback control of systems with non-linear dynamics and non-quadratic cost," in Proc. American Control Conference, Jun. 2014, pp. 1912–1918.

[25] J. van den Berg, P. Abbeel, and K. Goldberg, "LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information," Int. J. Robotics Research, vol. 30, no. 7, pp. 895–913, Jun. 2011.

[26] J. van den Berg, S. Patil, and R. Alterovitz, "Motion planning under uncertainty using iterative local optimization in belief space," Int. J. Robotics Research, vol. 31, no. 11, pp. 1263–1278, Sep. 2012.

[27] J. van den Berg, S. Patil, R. Alterovitz, P. Abbeel, and K. Goldberg, "LQG-based planning, sensing, and control of steerable needles," in Algorithmic Foundations of Robotics IX (Proc. WAFR 2010), ser. Springer Tracts in Advanced Robotics (STAR), D. Hsu et al., Eds., vol. 68. Springer, Dec. 2010, pp. 373–389.

[28] G. Welch and G. Bishop, "An introduction to the Kalman filter," University of North Carolina at Chapel Hill, Technical Report TR 95-041, Jul. 2006.

[29] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, "CHOMP: Covariant Hamiltonian optimization for motion planning," Int. J. Robotics Research, vol. 32, no. 9, pp. 1164–1193, Aug. 2012.

Wen Sun received his B.Tech degree in Computer Science and Technology from Zhejiang University, China, in 2012, a B.S. degree with Distinction in the School of Computing Science from Simon Fraser University, Canada, in 2012, and an M.S. degree in Computer Science from the University of North Carolina at Chapel Hill, USA, in 2014. He is currently a graduate student in the Robotics Institute, Carnegie Mellon University, USA. His research interests include motion planning under uncertainty and medical robotics.

Jur van den Berg received the Ph.D. degree in computer science from Utrecht University, the Netherlands, in 2007. After that, he was a postdoctoral research associate with the University of North Carolina at Chapel Hill and the University of California, Berkeley. Since 2011, he has been an assistant professor at the University of Utah, where he heads the Algorithmic Robotics Laboratory. His research interests are in developing new algorithms with strong theoretical foundations for relevant practical applications in mobile robotics, medical robotics, artificial intelligence, crowd simulation, virtual environments and computer games, autonomous transportation, and personal robotics, with a particular focus on motion planning under uncertainty, reciprocal collision avoidance, and information-theoretic decision making for autonomous virtual and real-world agents. In 2014 he joined Google[x] to work on the self-driving car.

Ron Alterovitz received his B.S. degree with Honors from Caltech, Pasadena, CA, in 2001 and the Ph.D. degree in Industrial Engineering and Operations Research from the University of California, Berkeley, CA, in 2006.

In 2009, he joined the faculty of the Department of Computer Science at the University of North Carolina at Chapel Hill, NC, where he leads the Computational Robotics Research Group. His research focuses on motion planning for medical and assistive robots. Prof. Alterovitz has co-authored a book on Motion Planning in Medicine, was awarded a patent for a medical device, has received multiple best paper finalist awards at IEEE robotics conferences, and is the recipient of the National Institutes of Health Ruth L. Kirschstein National Research Service Award and the National Science Foundation CAREER Award.