8
Evaluating Trajectory Collision Probability through Adaptive Importance Sampling for Safe Motion Planning Authors Omitted for Blind Review Abstract—This paper presents a tool for addressing a key component in many algorithms for planning robot trajectories under uncertainty: evaluation of the safety of a robot whose actions are governed by a closed-loop feedback policy near a nominal planned trajectory. We describe an adaptive importance sampling Monte Carlo framework that enables the evaluation of a given control policy for satisfaction of a probabilistic collision avoidance constraint which also provides an associated certificate of accuracy. In particular this adaptive technique is well-suited to addressing the complexities of non-linear dynamics. As a Monte Carlo method it is amenable to parallelization for computational tractability, and is generally applicable to a wide gamut of simulatable systems, including alternative noise models. Numerical experiments demonstrating the effectiveness of the adaptive importance sampling procedure are presented and discussed. I. I NTRODUCTION This paper addresses the problem of online planning of robot trajectories under uncertainty and performance/safety constraints. This problem has been recognized recently as a critical component towards deploying autonomous robotic systems (e.g., drones, self-driving cars, surgical robots, au- tonomous spacecraft, etc.) in unstructured environments [1], [2], [3], [4]. Conceptually, to enable an autonomous system (AS) to plan its actions under uncertainty (e.g., with re- spect to environment characterization), one needs to design a strategy (i.e., closed-loop policy) for a decision maker [5], a computationally expensive procedure [6]. Instead, a promising approach is to pose the decision-making problem as an optimization over a simpler class of nominal policies (or even open-loop action sequences), which are then evaluated via closed-loop predictions under the assumption that a local feedback control law strives to ensure nominal behavior [3], [7], [8], [6]. To illustrate this concept, inspired by recent results within the domain of Model Predictive Control (MPC) [9], and to motivate our work, let the dynamics of a robot be given by x t+1 = f (x t , u t )+ v t , z t = h(x t )+ w t , (1) where x t R dx is the state, u t R du is the control input, z t R dz is the measurement, and v t and w t represent process and measurement noise, respectively. Let X obs be the infeasible space (that is, the set of states that violate a given set of constraints), so that X feas := R d \X obs is the feasible space. Let X goal ⊂X feas and x 0 ∈X feas be the goal region and initial state, and let π t denote a feedback policy mapping the information available up to time t (i.e., the history of measurements and control actions) to a control action u t . Given a trajectory cost measure c and letting x 0 ,..., x T denote the robot’s random trajectory, the problem is, Closed-Loop Belief Space Planning: min {πt}t E [c( x 0 ,..., x T )] s.t. P ( x 0 ,..., x T ∩X obs 6= ) α P (x T ∈X goal ) 1 - α x 0 ∼N (x nom 0 ,P 0 ) Equation (1). (2) Problem (2) is, generally, intractable as it requires opti- mizing over output-feedback policies (i.e., the policies {π t } t ) – a POMDP problem that, even for relatively small-scale problems, might be not compatible with the tempo of AS- environment interaction. According to our previous discussion, the approach that we consider in this paper (see, e.g., [3], [7], [8], [6]) is to optimize over a simpler class of poli- cies, e.g., obtained by combining a nominal control sequence {u * t } t (yielding a nominal trajectory {x * t } t ) with a “feedback correction” term that strives to ensure nominal behavior. For example, one can consider the set of affine feedback policies u t := u * t u t , where u * t is a nominal control action and ¯ u t is a feedback correction term minimizing deviations from {x * t } t . The problem then becomes, Belief Space Planning with Closed-Loop predictions: min {u * t }t c( x * 0 ,..., x * T ) s.t. u t = u * t u t P ( x 0 ,..., x T ∩X obs 6= ) α P (x T ∈X goal ) 1 - α x 0 ∼N (x nom 0 ,P 0 ) Equation (1). (3) Problem (3) is computationally much more tractable as it only involves optimization over open-loop sequences (i.e., {u * t } t ), and represents a compromise between a POMDP formulation involving a minimization over the class of output- feedback control laws, and an open-loop formulation, in which the state is assumed to evolve in an open loop (i.e., it is conservatively assumed that over the future horizon no effort will be made to compensate for disturbances or new events). This approach, while very appealing from a computational standpoint, poses a number of challenges, chiefly that of quickly and reliably evaluating the probability of constraint violation (henceforth, we will focus on collision avoidance and refer to such probability as “collision probability” – CP – with the understanding that the constraints in (3) can

Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

Evaluating Trajectory Collision Probability throughAdaptive Importance Sampling for Safe Motion Planning

Authors Omitted for Blind Review

Abstract—This paper presents a tool for addressing a keycomponent in many algorithms for planning robot trajectoriesunder uncertainty: evaluation of the safety of a robot whoseactions are governed by a closed-loop feedback policy near anominal planned trajectory. We describe an adaptive importancesampling Monte Carlo framework that enables the evaluationof a given control policy for satisfaction of a probabilisticcollision avoidance constraint which also provides an associatedcertificate of accuracy. In particular this adaptive technique iswell-suited to addressing the complexities of non-linear dynamics.As a Monte Carlo method it is amenable to parallelizationfor computational tractability, and is generally applicable to awide gamut of simulatable systems, including alternative noisemodels. Numerical experiments demonstrating the effectivenessof the adaptive importance sampling procedure are presentedand discussed.

I. INTRODUCTION

This paper addresses the problem of online planning ofrobot trajectories under uncertainty and performance/safetyconstraints. This problem has been recognized recently asa critical component towards deploying autonomous roboticsystems (e.g., drones, self-driving cars, surgical robots, au-tonomous spacecraft, etc.) in unstructured environments [1],[2], [3], [4]. Conceptually, to enable an autonomous system(AS) to plan its actions under uncertainty (e.g., with re-spect to environment characterization), one needs to designa strategy (i.e., closed-loop policy) for a decision maker[5], a computationally expensive procedure [6]. Instead, apromising approach is to pose the decision-making problem asan optimization over a simpler class of nominal policies (oreven open-loop action sequences), which are then evaluatedvia closed-loop predictions under the assumption that a localfeedback control law strives to ensure nominal behavior [3],[7], [8], [6].

To illustrate this concept, inspired by recent results withinthe domain of Model Predictive Control (MPC) [9], and tomotivate our work, let the dynamics of a robot be given by

xt+1 = f(xt,ut) + vt, zt = h(xt) + wt, (1)

where xt ∈ Rdx is the state, ut ∈ Rdu is the control input,zt ∈ Rdz is the measurement, and vt and wt representprocess and measurement noise, respectively. Let Xobs be theinfeasible space (that is, the set of states that violate a givenset of constraints), so that Xfeas := Rd\Xobs is the feasiblespace. Let Xgoal ⊂ Xfeas and x0 ∈ Xfeas be the goal regionand initial state, and let πt denote a feedback policy mappingthe information available up to time t (i.e., the history ofmeasurements and control actions) to a control action ut.

Given a trajectory cost measure c and letting x0, . . . ,xTdenote the robot’s random trajectory, the problem is,

Closed-Loop Belief Space Planning:min{πt}t

E [c(x0, . . . ,xT )]

s.t. P (x0, . . . ,xT ∩ Xobs 6= ∅) ≤ αP (xT ∈ Xgoal) ≥ 1− αx0 ∼ N (xnom

0 , P0)

Equation (1).

(2)

Problem (2) is, generally, intractable as it requires opti-mizing over output-feedback policies (i.e., the policies {πt}t)– a POMDP problem that, even for relatively small-scaleproblems, might be not compatible with the tempo of AS-environment interaction. According to our previous discussion,the approach that we consider in this paper (see, e.g., [3],[7], [8], [6]) is to optimize over a simpler class of poli-cies, e.g., obtained by combining a nominal control sequence{u∗t }t (yielding a nominal trajectory {x∗t }t) with a “feedbackcorrection” term that strives to ensure nominal behavior. Forexample, one can consider the set of affine feedback policiesut := u∗t + ut, where u∗t is a nominal control action and ut isa feedback correction term minimizing deviations from {x∗t }t.The problem then becomes,

Belief Space Planning with Closed-Loop predictions:

min{u∗

t }tc(x∗0, . . . ,x

∗T )

s.t. ut = u∗t + ut

P (x0, . . . ,xT ∩ Xobs 6= ∅) ≤ αP (xT ∈ Xgoal) ≥ 1− αx0 ∼ N (xnom

0 , P0)

Equation (1).

(3)

Problem (3) is computationally much more tractable as itonly involves optimization over open-loop sequences (i.e.,{u∗t }t), and represents a compromise between a POMDPformulation involving a minimization over the class of output-feedback control laws, and an open-loop formulation, in whichthe state is assumed to evolve in an open loop (i.e., it isconservatively assumed that over the future horizon no effortwill be made to compensate for disturbances or new events).

This approach, while very appealing from a computationalstandpoint, poses a number of challenges, chiefly that ofquickly and reliably evaluating the probability of constraintviolation (henceforth, we will focus on collision avoidanceand refer to such probability as “collision probability” –CP – with the understanding that the constraints in (3) can

Page 2: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

also model performance or other safety requirements). Thisproblem is difficult as safety/performance constraints do notusually possess an additive structure, i.e., they can not beexpressed as the expectation of a summation of a set ofrandom variables. This is typical, for example, for chanceconstraints for obstacle avoidance as in Problem (3), anda fortiori for more complex constraints (expressed, e.g., aslogical formulas). The presence of non-linear dynamics andthe possible need of considering non-Gaussian noise modelsfurther compounds the difficulty of the problem. Within thiscontext, the goal of this paper is to design algorithms for thethe fast and reliable evaluation of collision probabilities (or,more in general, constraint violation probabilities).

Related Work: Evaluating a controller’s chance of fulfillingsafety/performance constraints while guiding a robot along areference trajectory has hitherto primarily been considered inan approximate fashion. As discussed in [6], most previousmethods essentially rely on two approaches. In the first ap-proach (see, e.g., [10], [11], [12], [13]), referred to as the“additive approach,” a trajectory CP is approximately evalu-ated by using Boole’s inequality (i.e., P (∪iAi) ≤

∑i P (Ai))

and summing pointwise CPs at a certain number of waypointsalong the reference trajectory. In contrast, in the secondapproach (see, e.g., [3], [8], [14]), referred to as the “multi-plicative approach,” a trajectory CP is approximately evaluatedby multiplying the complement of pointwise CPs at a certainnumber of waypoints along the reference trajectory. In anutshell, the additive approach treats waypoint collisions asmutually exclusive, while the multiplicative approach treatsthem as independent. Since neither mutual exclusivity norindependence hold in general, nor any accounting is made forwhat happens in between waypoints, such approaches can beoff by large multiples and hinder the computation of feasibletrajectories [6]. Even worse, it was shown in [6] that suchapproaches are asymptotically tautological, i.e., as the numberof waypoints approaches infinity, a trajectory CP is approxi-mated with a number greater than or equal to one, contraryto what one might expect from a refinement procedure. (Wenote, however, that such approximations would still be veryuseful in those cases where the problem is unconstrained andthe sole objective is to optimize safety/performance, as whatis needed in such cases is the characterization of relativeCPs among trajectories. However, in a constrained setting,one needs an accurate, absolute measure of CP.) The workof [7] introduced a first-order approximate correction to theindependence assumption of the multiplicative approach thatappears empirically to mitigate its aforementioned drawbacks,but can still be off by a considerable margin [6] and canboth under-approximate or over-approximate the true CP (incontrast to the additive approach, which is always on theconservative side).

The limitations of the above approximation approaches mo-tivated the authors of [6] to consider variance-reduced MonteCarlo (MC) to estimate trajectory CPs. In theory, the exactCP can be computed up to arbitrary accuracy by simulatinga large number of trajectories and counting the number that

violate a constraint. When the CP has a low value (e.g.,≤ 1%), however, as is desired for most robotic applications, anenormous number of Monte Carlo samples may be required toachieve confidence in the estimate. The work in [6] demon-strates statistical variance-reduction techniques based on themethods of importance sampling (IS) and control variates (CV)that reduce the required number of samples to a few hundredper evaluation, amenable to real-time implementation (notably,the samples may be processed in parallel). However, thesetechniques were designed and validated only for linear time-invariant systems under the influence of Gaussian process andmeasurement noise. As both assumptions do not always hold inpractice, the objective of this work is to extend the variance-reduced method introduced in [6] to the case of non-lineardynamics and non-Gaussian noise models.

Statement of Contributions: We consider a discrete LQGcontroller equipped with an EKF (as considered in [3], [7])applied to the linearized dynamics and Gaussianized noise(first-and-second moment-matching) of a general problem set-ting as our closed-loop feedback policy. As we demonstrate inthe experiments section of this paper, the interactions betweennoise and general non-linear dynamics models may cause thecollision probability for the true system to diverge significantlyfrom that of the linearized LQG system. This may be thecase even when both systems are driven using the same noisesampled from the true, non-moment-matched noise distribu-tions. Additionally, applying importance sampling techniquesin direct analogue to those for a linear system may lead to poorestimator performance (indeed, no better than naive MonteCarlo without modifications intended to reduce variance) dueto similar issues of model mismatch. To be specific, in thework of [6], the IS distribution is derived by shifting the meansof noise distributions by amounts computed pre-MC fromknowledge of the workspace geometry, linear dynamics, andnoise characteristics — at face value an approach that appearsequally applicable to the more general settings considered inthis paper. We show an example of a simple car where thisdoes not hold, and which necessitates the development of CPestimation techniques that can adapt to these errors introducedby non-linearity.

The primary contribution of this paper is an adaptiveimportance sampling framework (whereby the sampling dis-tribution is refined through the course of simulating sampletrajectories) which enables accelerated Monte Carlo collisionprobability estimation in general, non-linear problem settings.Unlike any aforementioned approximate approach, this methodallows for certified error margins through confidence intervalsconstructed using a standard error estimate accompanying theCP estimate. We demonstrate its effectiveness on a non-linearsimple car model where it uses fewer MC samples than anyalternative to accurately evaluate a policy with low associatedCP (0.5%). Given the successes of [6] in incorporating MCcollision probability estimation within a near-real-time safemotion planning algorithm, and especially in light of thisMC method’s parallelization potential, we believe that theadaptive importance sampling framework described in this

Page 3: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

paper represents a tool worthy for consideration in a widevariety of robotic applications that plan with safety constraintsin operation.

Organization: The remainder of this paper is organizedas follows: in Section II we review background material onimportance sampling relevant to robot collision probabilityestimation including adaptive techniques, in Section III wedefine the problem dynamics and LQG controller model, andin Section IV we present the adaptive IS framework that repre-sents the main contribution of this paper. Section V providesillustrative experiments with a simple non-linear car modelfor the framework developed in the previous section, andSection VI contains conclusions and speculation on possibledirections of future research.

II. BACKGROUND MATERIAL

In this section we review the basics of importance sampling,a technique for reducing the variance of a Monte Carloestimator, as well as an adaptive variant and some objectivesto keep in mind when designing importance sampling distribu-tions. Our discussion is tailored to the problem of computingtrajectory CPs, where the high-dimensional nature of the noise(a joint distribution spanning the entire length of a trajectory)increases the challenge of selecting a good distribution.

A. Importance Sampling

Consider a random variable X ∈ Rn distributed accordingto the probability density function (pdf) P : Rn → R≥0. Theexpectation E [f(X)] =

∫Rn f(x)P (x) dx of a function f(X)

may be rewritten as

EP [f(X)] =

∫Rn

f(x)P (x) dx

=

∫Rn

(f(x)

P (x)

Q(x)

)Q(x) dx

= EQ[f(X)P (X)

Q(X)

],

(4)

where Q is an alternative pdf satisfying Q(x) > 0 for allx ∈ Rn where P (x) > 0, and EP and EQ denote expectationscomputed under the distributions X ∼ P and X ∼ Q,respectively. The quantity P (x)/Q(x) is referred to as thelikelihood ratio of the pdfs at x.

In the context of Monte Carlo estimation, the above com-putation implies that given independent and identically dis-tributed (i.i.d.) samples {X(i)}mi=1 drawn from Q, the expec-tation p := E [f(X)] may be estimated by

pQ :=1

m

m∑i=1

f(X(i))P (X(i))

Q(X(i)). (5)

The variance of the estimator pQ may be estimated by

V Q :=1

m(m− 1)

m∑i=1

(f(X(i))

P (X(i))

Q(X(i))− pQ

)2

. (6)

In this paper, X will denote a vector of noise samples that actupon a robot at a sequence of discrete time increments, and

f(X) will denote the indicator function that the robot, guidedby a trajectory-tracking controller, collides into the obstacleset given the trajectory noise sample X. Thus p = E [f(X)]is the robot’s collision probability, the quantity we wish toestimate.

We see from (4) that pQ defined in (5) is an unbiasedestimator of the CP p, that is, as long as the support of Qcontains the support of P , the estimator pQ converges inprobability to p as m → ∞ regardless of the choice of Q.Thus the selection criterion for the IS pdf Q is to minimizethe variance of pQ. This has the practical effect of reducingthe number of samples required before the estimator yields auseable result, which, as argued in [6], is critical for real-timeaccurate CP estimation for robotic applications with a verylow collision chance constraint. The optimal IS distributionin this regard is given by π∗(x) := f(x)P (x)/EP [f(X)]. Infact, for every X drawn from π∗ we have

f(X)P (X)

π∗(X)= EP [f(X)],

so that the estimator pπ∗

indeed has zero variance and a singlesample is sufficient for estimation.

Constructing and sampling from π∗ is usually not possible(note: the normalization factor EP [f(X)] is precisely thequantity we wish to estimate), but its form yields someinsights. In the case that f is an indicator function, π∗ hassupport on only the “important” parts of P where f(x) = 1;for our purposes, to sample from π∗ is to sample only noisetrajectories that lead to collision and weight them in thecomputation of pπ

∗by their relative likelihood. This motivates

the search for IS distributions Q that artificially inflate theoccurrence of the rare event f(X) = 1, but this should beaccomplished while maintaining relative probability accordingto P lest the likelihood ratio P/Q be very large for somelikely realization of the event (corresponding to a large valuein the variance integral for pQ). Mathematically, we aimat minimizing the Kullback-Leibler divergence (or relativeentropy) between Q and the target distribution π∗, that is

D (π∗‖Q) =

∫Rn

π∗(x) log π∗(x)− π∗(x) logQ(x) dx, (7)

a measure of difference between probability distributions.From (7) we see that this is equivalent to maximizing

J(π,Q) =

∫Rn

π(x) logQ(x) dx (8)

where π(x) = f(x)P (x) is a constant scale factor differentfrom π∗(x). We note that one interpretation of the way theauthors of [6] choose their trajectory-noise IS distributionis as a mixture of distributions that each maximize J(π,Q)subject to the constraints that: (a) Q equals P up to a shiftin mean, and (b) the trajectory that evolves from the mean ofQ intersects some particular convex obstacle near the nominaltrajectory for P .

Page 4: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

B. Adaptive Importance Sampling

Now let us consider importance sampling distributionsQθ parameterized by a set of values θ, e.g., weights fora mixture distribution, mean and covariance matrix for aGaussian distribution, etc. We aim to improve the choice ofparameter θ through iterative refinement where knowledgeabout the quality of Qθ is itself estimated through MonteCarlo sampling. That is, we aim to compute (or approximate)θ∗ = argmaxθ J(π,Qθ) by sampling from some Qθt in orderto update Qθt+1 . A classic example of such an adaptive ISprocedure is the cross-entropy (CE) method [15]. Briefly, theCE method is used to simulate rare events by: (1) samplingfrom Qθt , (2) selecting the portion of the data that satisfies arelaxed definition of “rare,” (3) fitting the parameters θt+1 ofthe next distribution in order to maximize the empirical sumcorresponding to the integral in J(π,Qθ) over those obser-vations, and (4) repeating this procedure while successivelystepping up the rarity threshold until a suitable IS distributionis achieved. We note that the parameter fitting in step (3) maybe interpreted as selecting θt+1 according to the maximum-likelihood estimate (MLE) of θ given the dataset of rare eventrealizations.

In the context of safe robotic motion planning where the rareevent is trajectory collision, the CE method may be applied,e.g., with minimum obstacle clearance (stepped down towards0) as the rarity factor to be considered in step (2). Then thereis the question of how Qθ should be parameterized. As recog-nized in [6], there are typically multiple ways in which a noise-perturbed robot trajectory can collide with its surroundings.Although the noise pdf P is usually unimodal (correspondingto a robot centered on the nominal trajectory), an effective ISdistribution should likely be multimodal (corresponding to themany ways the robot can drift into obstacles). This motivatesthe use of mixture IS distributions with pdfs of the form

Q(α,η)(x) =

D∑d=1

αdqd(x; ηd), (9)

parameterized by θ = (α, η); the αd are the mixture weightsand the ηd are the internal parameters of the componentdensities qd. For mixture distibutions, fitting the αd and ηdMLE exactly for importance sampling is generally intractable,but approximation schemes have been studied in the literature[16], [17], [18]. In particular, we reproduce in Algorithm 1 anexpectation-maximization (EM) scheme from [18] designedfor adaptive IS in general mixture classes. We omit thedetails here of performing the per-component parameter fitting(argmaxηd in Algorithm 1, Line 3) but refer the reader to [18]for derivations in the cases of Gaussian mixtures and mixturesof multivariate t distributions. The convergence of Algorithm 1may be monitored by computing the normalized perplexity,that is exp(−

∑Ni=1 ωi,j log ωi,j)/N , which is non-decreasing

over iterations j for N sufficiently large.

Algorithm 1 Mixture Population MC (Algorithm 1, [18])Require: Target density π(x), initial mixture IS proposal

Q(α1,η1)(x) =∑Dd=1 α

1dqd(x; η1d), population size m,

number of EM iterations k1: for j = 1 : k do2: Sample {Xi,j}mi=1 from the current proposal Q(αj ,ηj)

and compute the normalized importance weights

ωi,j =π(Xi,j)∑D

d=1 αjdqd(Xi,j ; η

jd)

/m∑`=1

π(X`,j)∑Dd=1 α

jdqd(X`,j ; η

jd)

for i = 1, . . . , N and mixture posterior probabilities

ρd(Xi,j ;αj , ηj) = αjdqd(Xi,j ; η

jd)

/D∑k=1

αjkqk(Xi,j ; ηjk)

for i = 1, . . . , N , d = 1, . . . , D.3: Update α and η according to

αj+1d =

N∑i=1

ωi,jρd(Xi,j ;αj , ηj),

ηj+1d = argmax

ηd

[N∑i=1

ωi,jρd(Xi,j ;αj , ηj) log qd(Xi,j ; ηd)

]for d = 1, . . . , D.

4: end for5: return Optimized mixture IS distribution Q(αk+1,ηk+1)

III. PROBLEM FORMULATION

As in the introduction, let the dynamics of a robot be givenby

xt+1 = f(xt,ut) + vt, zt = h(xt) + wt, (10)

where xt ∈ Rdx is the state, ut ∈ Rdu is the control input,zt ∈ Rdz is the measurement, and vt ∼ Vt and wt ∼ Wt

are the process and measurement noise respectively at time t.We restrict our attention to independent noise distributions Vt

and Wt with zero mean and finite second moments (we notethat cases of colored noise may be addressed through stateaugmentation). Let Vt and Wt denote the covariance matricesof Vt and Wt, respectively. We are interested in estimatingthe collision probability that arises from tracking a nominalpath P∗ = {x∗0,u∗0,x∗1,u∗1, . . . ,x∗T } where x∗t+1 = f(x∗t ,u

∗t )

for t = 0, . . . , T − 1. The true initial state x0 satisfies x0 =x∗0 +p0, where the initial state uncertainty p0 ∼ P0 is drawnfrom a distribution with zero mean and covariance matrixP0, and the controller uses the information from observationsz0, · · · , zT−1 to choose actions u0, . . . ,uT−1 from which thetrue state xt evolves according to (10).

Similar to [3] and [7] we consider an LQR state-feedbackcontroller equipped with an extended Kalman Filter for stateestimation (together, LQG control) to control the deviation of arobot from a nominal trajectory. With deviation variables fromP∗ defined as xt = xt−x∗t , ut = ut−u∗t , and zt = zt−h(x∗t ),

Page 5: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

we linearize (10) according to:

xt+1 = At+1xt +Bt+1ut + vt +O(‖xt‖2 + ‖ut‖2),

zt = Htxt + wt +O(‖xt‖2),(11)

where At = ∂f∂x (x∗t−1,u

∗t−1), Bt = ∂f

∂u (x∗t−1,u∗t−1), and

Ht = ∂h∂x (x∗t ). The LQG controller maintains an estimate xt

of the true state deviation xt using an extended Kalman filter

xt+1 = Kt+1zt+1 + (I −Kt+1Ht+1)(At+1xt +Bt+1ut),

where Kt is the Kalman gain matrix at time t correspondingto the linearized system (11) with noise covariance matricesP0, Vt, and Wt [19]. Then at each time t = 0, . . . , T − 1 theLQG controller applies the input ut = u∗t + ut with

ut = u∗t + Lt+1xt,

where Lt is the finite time horizon LQR feedback gain matrixcorresponding to (11) with appropriately chosen state regula-tion and control effort penalties for the robotic application.

We emphasize that when simulating trajectories in thispaper, we evolve the exact state evolution according to the truedynamics (10). However, we note here that with yt =

[xt; xt

]we may write down an approximate system with linearizeddynamics and Gaussian noise (moment-matched to the truenoise up to second order)

yt+1 = Ftyt +Gtqt, qt ∼ N (0, Qt), (12)

for appropriate choices of matrices Ft, Gt and Qt. See [3]for a full derivation. In addition to evolving approximate statetrajectories, this system may be used to evolve approximatepointwise uncertainty distributions of xt as multivariate Gaus-sians provided that the input qt at each time step is also drawnfrom a multivariate Gaussian [3], [7], [6].

The problem we wish to solve in this paper is then to devisean accurate, computationally-efficient algorithm equipped withan error estimate to estimate the CP

P (x0, . . . ,xT ∩ Xobs 6= ∅) ≤ α,

where Xobs denotes the infeasible space (i.e., the set of statesthat violate a given set of constrains), and the state trajectoryxt is controlled via the control law ut = u∗t +ut. As discussedin the introduction, the primary motivation of this problem isto enable belief space planning with closed-loop predictionsfor general non-linear problems with possibly non-Gaussiannoise models.

A few comments are in order. First, in this paper we arenot proposing a new planning algorithm, rather an algorithmto addressing one of the key bottlenecks for planning underuncertainty. Second, the method proposed here can be usedin combination with a variety of planning frameworks, e.g.,exhaustive evaluation of RRT plans [3] or meta-algorithms asin [6]. Third, in stark contrast with alternative methods (withthe exception of [6], from which we draw inspiration), weprovide a computable error estimate that can be used as acertificate of accuracy for the trajectory’s estimated CP.

IV. ADAPTIVE IMPORTANCE SAMPLING FORCOLLISION PROBABILITY ESTIMATION

In this section we present an algorithm for the accurate,computationally-efficient estimation of a trajectory’s CP undernon-linear dynamics and non-Gaussian noise models. Ourapproach is to select importance sampling distributions for CPestimation as a mixture of reparameterized copies of the actualprocess and measurement noise, corresponding to differentmodes of failure, similar to [6]. In the notation of Section II,

X = (p0,v0,w0, . . . ,vT−1,wT−1) ,

P (x) = P0(p0) ·V0(v0) · . . . ·WT−1(wT−1),

(where we have assumed the independence of the noisedistributions in the construction of this joint pdf) and f(X)is the event that the noise random variable X gives rise toa colliding trajectory under LQG control. We consider ISdistributions of the form

Q(α,η)(x) =

D∑d=1

αdqd(x; ηd)

qd(x; ηd) = P0(p0; ηd) ·V0(v0; ηd) · . . . ·WT−1(wT−1; ηd)

where ηd encodes all of the parameters required to specify theprocess and measurement noise distributions. For example, inthe case of Gaussian noise, ηd consists of (2T + 1) meanvectors and covariance matrices: one pair for the initial stateuncertainty and for each process/measurement noise distribu-tion at each time step t = 0, . . . , T − 1.

Algorithm 2 Monte Carlo Path CP EstimationRequire: Nominal path P∗, noise distribution P , workspace

obstacle set Xobs, number of mixture components D,adaptive IS sample count m, adaptive IS EM iterationsk, schedule of CE obstacle clearances δ1 > · · · > δ` = 0,final IS sample count mf , warm-start parameter β ∈ [0, 1]

1: Using the procedure detailed in [6], compute a set ofD shifted trajectory noise means (each of which pushesthe robot into collision when applied directly to theapproximate system (12)) and use these means, scaled byβ, to construct the mixture distribution Q(α0,η0).

2: for j = 1 : ` do3: Using Algorithm 1 with target πj(x) = fδj (x)P (x),

compute Q(αj ,ηj) as an update from Q(αj−1,ηj−1).4: end for5: Sample {X(i)}mi=1 i.i.d. from Q(αk,ηk).6: Compute

p =1

m

m∑i=1

f(X(i))P (X(i))

Q(αk,ηk)(X(i)),

V =1

m(m− 1)

m∑i=1

(f(X(i))

P (X(i))

Q(αk,ηk)(X(i))− p)2

.

7: return p, V

Page 6: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

The CP estimation scheme incorporating adaptive impor-tance sampling is detailed in Algorithm 2. In keeping withthe stepping scheme of the cross-entropy method, we definefδ(X) to be the indicator that the robot, subjected to noiseX, passes within a Euclidean distance δ of the obstacle setXobs (alternatively, the trajectory xt collides with the inflatedobstacle set {x ∈ Rn | ∃ s ∈ Xobs s.t. ‖x − s‖ ≤ δ}). Wenote that this stepping scheme is most useful for producingIS distributions capable of estimating extremely low CPs. ForCPs above 5%, we have found that it suffices to choose k = 1,δ1 = 0 and apply Algorithm 1 directly to optimize the mixturedistribution parameters: the tradeoff to consider is havingenough “interesting” samples in the initial mixture populationof size m to make good updates in Algorithm 1 versusspending too many MC samples tuning the IS distribution Qinstead of using it to estimate p.

The starting distribution parameters (α0, η0) fed into theadaptive IS procedure are derived from the linearized, Gaus-sianized system (12) using the method described in [6]. Thatwork suggests η0 matching P up to shifts in noise means. Wescale those shifts down initially by a factor β to account forthe fact that the full non-linear dynamics may propagate noisemore unpredictably, and rely on adaptive importance samplingto converge to better parameter values.

Finally, although there are many hyperparameters to tune inAlgorithms 1 and 2, we recall from Section II that any Q weconsider is only an approximation to the optimal IS distibutionπ∗. Thus if fitting low-variance-yielding parameters (α, η) ismore computationally efficient than fitting some theoretically-better (α∗, η∗), then it makes sense to only implement theformer. For example the ηd update equation in Algorithm 1,Line 3 has a closed-form solution for Gaussian mixturecomponents qd, but can be difficult, in general, to solve exactly(e.g., if the process or measurement noise is itself a mixtureof Gaussians). In such cases, taking the mean or varianceparameter updates as suggested by a Gaussian approximationmay produce better overall results than solving the argmaxηdexactly.

V. NUMERICAL EXPERIMENTS

In this section we consider a unit-speed 2D simple car model[5] with state x = [x; y; θ] subject to the continuous dynamicsxy

θ

=

cos(θ)sin(θ)u

, u ∈ [−1/R, 1/R]

where the absolute value of the control input u = [u] isbounded by the reciprocal of the car’s turning radius R. Thiscorresponds to the discrete model with timestep dt:

xt+1 = f(xt,ut) =

xt + [sin(θt + utdt)− sin(θt)]/utyt + [cos(θt)− cos(θt + utdt)]/ut

θt + utdt

in the case that ut 6= 0, and

xt+1 = f(xt,ut) =

xt + dt cos(θt)yt + dt sin(θt)

θt

Exact vs. Linearized Dynamics

Fig. 1. Illustration of the pointwise uncertainty in simple car position atincrements of dt = 0.05 for simulating the exact dynamics (10) vs. theapproximate LQG linearized dynamics (12) under the influence of Gaussiannoise. The green and purple ellipses denote 95% pointwise marginal confi-dence intervals for the exact (computed empirically over 5000 samples) andapproximate (derivable from (12)) dynamics respectively.

otherwise. We assume that the state is fully observed (i.e.h(xt) = xt) up to the measurement noise wt. Obstaclecollisions are computed modeling the car as a point robot; thisis equivalent to modeling it as a disk subject to redefinitionof Xobs (we note that in principle Monte Carlo simulation isamenable to finer models of collision geometry).

A. Effects of Linearization

Figure 1 exhibits a nominal path where a combination of thenoise magnitude and discretization step results in a divergencein the state trajectory predicted by the linearized dynamics (12)compared to the true state evolution governed by (10), evenin the case of Gaussian p0,vt,wt. The true CP associatedwith the exact dynamics in this case is 2.5%, while the CPassociated with the approximate dynamics is 0.18% (bothfigures were computed using naive, i.e., non-IS, Monte Carlowith 106 samples). This illustrates the fact that if a system issufficiently non-linear, any approach based on the linearizedsystem may fail to return a conservative judgement. An MC-based approach, provided it has enough samples to ensurelow estimator variance, is affected by the complexity of thedynamics only in total running time.

B. Effects of Non-Gaussian Noise

In order to study the effects of non-Gaussian noise modelson trajectory CP we consider two alternate shapes for noisecorresponding to a Gaussian N (0,Σ): a Gaussian mixturemodel (GMM) consisting of a “thin” term N (0, (α/β)Σ)and a “fat” term N (0, (1 − α)/(1 − β)Σ) with mixtureweight β > 0.5 and (1 − β) respectively and α < 0.5 avariance allocation parameter, and a multivariate t-distribution

Page 7: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

with mean 0, ν > 2 degrees of freedom, and scale matrix(ν−2ν

)Σ. Both the GMM and t-distribution have identical

first and second moments to N (0,Σ) but are fatter-tailed(equivalently, sharper-peaked) than a normal distribution. Wenote that despite the shape difference, the CP between noisedistributions did not vary appreciably across a wide range ofparameter choices and CP values. This somewhat surprisingobservation may be a consequence of the fact that summinga series of i.i.d. random variables from any finite-variancedistribution should converge in distribution to a Gaussian(assuming the local dynamics look sufficiently linear): a fat-tailed noise value at one step is effectively washed out bysharp-peaked values over time.

The one noteworthy difference we see when running Al-gorithm 2 to estimate CP with the alternative noise-shapeassumptions is that adaptively fitting a good IS distributionmay take longer. This is because the likelihood ratios of fattails and thin peaks are much more sensitive to fluctuationsin adaptive IS parameter updates (shifts in the mean in par-ticular). For example, t-distributed noise with ν = 4 requiresm a factor of 2 greater than for the corresponding Gaussianfor the estimation examples in Figures 2 and 3 in order toreturn similar results. If only a “ballpark” estimate of the CPis required, e.g., as in the initial search phases of a safe motionplanning algorithm attempting to satisfy a collision chanceconstraint through bisection [6], naive MC may suffice until astrong certificate of safety is required in the last phase. ThenMC samples are best spent deriving an IS distribution thatwill achieve lower estimator variance as a function of totalsamples.

C. Adaptive Importance Sampling Performance

Figures 2 and 3 illustrate two example applications ofAlgorithm 2 to the problem of CP estimation. As noted inSection IV, for the 5% CP nominal trajectory and associatedLQG policy in Figure 3, omitting the CE method-style iterationproduces a low variance estimator in fewer overall samplesthan would be required to further refine the importancesampling distribution Qα,η . For the 0.5% CP example inFigure 2, just one CE iteration works best. As discussed in[18], Algorithm 1 in general requires some minimum samplecount to ensure stability; this is m = 5000 and m = 2000 forthe 0.5% and 5% cases respectively. The number of mixturecomponents is D = 4 in both cases.

It is interesting to note that both of the warm-start ISdistributions (derived from the linearized system (12)) exhibitperformance comparable to naive Monte Carlo. This holdsover a range of scaling parameters β ∈ [0, 0.3]; any largerthan 0.3 and the IS distribution Qα0,η0 is actively antagonistic,with likelihood ratios for collision trajectories skewed worsethan the underlying noise P and preventing the adaptive ISprocedure from finding a suitable KL-divergence minimum.Indeed in some cases we find that β as low as 0.05 worksbest, letting the mixture components naturally differentiatethemselves by adjusting to their most likely members of thecolliding sample population over the course of Algorithm 1.

0 5000 10000 15000 20000 25000 30000

Total MC Samples

0.000

0.001

0.002

0.003

0.004

0.005

0.006

0.007

0.008Adaptive Importance Sampling, CP ≈ 0.5%

Naive MCIS with Qα0, η0

IS with Qα1, η1

IS with Qα2, η2

Fig. 2. Example run of Algorithm 2 for estimating the collision probabilityof a simple car trajectory perturbed by Gaussian process and measurementnoise with a true CP of approximately 0.5%. The dark lines represent theevolution of the MC estimator p of the collision probability for a set ofimportance sampling distributions (including the true distribution in gray);the shadows around each line represent a confidence interval of ±1 standarderror. The red, green, and blue lines are associated with the initial warm-startIS distribution, IS distribution after one iteration in Alg. 2, and two iterationsin Alg. 2 respectively. The estimator plots for later iterations are shifted toreflect the total number of simulated MC trajectories required to produce boththe IS distribution and the estimate.

0 2000 4000 6000 8000 10000

Total MC Samples

0.00

0.02

0.04

0.06

0.08

0.10Adaptive Importance Sampling, CP ≈ 5%

Naive MCIS with Qα0, η0

IS with Qα1, η1

Fig. 3. Example run of Algorithm 2 for estimating the collision probabilityof a simple car trajectory perturbed by Gaussian process and measurementnoise with a true CP of approximately 5%.

This danger in fully trusting the linearized IS distributionQα0,η0 further reflects the model mismatch between (10) and(12): deviations in the pointwise IS distribution terms causeskewed likelihood ratios that individually may seem small, butwhen multiplied together through the joint distribution P (x)they can wreak havoc on IS variance-reduction performance.

VI. CONCLUSIONS

We have presented an adaptive importance sampling frame-work inspired by the statistics literature [15], [18] and demon-strated its success in quantifying CP (with tight estimatederror) for an LQG with EKF control policy applied to a

Page 8: Evaluating Trajectory Collision Probability through Adaptive Importance …web.stanford.edu/~pavone/papers/Schmerling.Pavone.RSS16.pdf · 2016-03-14 · Adaptive Importance Sampling

non-linear system. In particular, we note that this proceduresucceeds in achieving a level of certifiable accuracy for whichthere are no comparable existing methods. The adaptive natureof the procedure has been demonstrated as essential forconstructing IS distributions that significantly outperform anordinary Monte Carlo procedure in terms of required trajectorysimulations on an estimation problem where safety constraintviolation is desired to be a rare event.

While this work may in its current form see direct appli-cation within a non-linear LQG control planner (for whichwe stress that a parallel implementation would be a keytechnology for enabling real-time use) the observations wemake in Section V open up a large number of future researchavenues. We may imagine alternative control strategies insteadof LQG that accommodate fat-tailed noise distributions withinfinite variance, e.g., noise modeled by non-Gaussian stabledistributions. Such distributions might be useful to modelextreme outliers, as are sometimes seen in perception systems,and we expect that in these cases simulating the true noisedistribution will be an essential aspect of evaluating constraintsas opposed to the relative insignificance we see for LQGcompared to the non-linear dynamics (Section V-B). We alsoseek some coherent means of marshalling the many adaptive IShyperparameters in Algorithms 1 and 2. The warm-start pro-cedure in particular, in light of the discussion in Section V-C,seems ripe for improvement. It is possible that sampling fromthe true distribution P and clustering the samples that passwithin a certain distance threshold from Xobs to form the initialmixture distribution may yield better convergence properties.We may even imagine removing the requirement for a distancemetric to obstacles in the workspace (relying only on a black-box collision checker) by seeding the mixture distributionwith the first D samples from the true distribution P thatcollide with the obstacle set. Finally, we note that althoughthe “rare event” considered throughout this paper has beenobstacle collision, adaptive importance sampling as a MonteCarlo variance reduction technique may be applied to estimatea variety of other performance or safety requirements.

REFERENCES

[1] W. J. A. Dahm, “Technology Horizons: a Vision for Air Force Science& Technology During 2010-2030,” USAF HQ, 2010.

[2] “Science and Technology Strategic Plan, C4ISR,” Office of NavalResearch, Tech. Rep., 2012, available at http://www.onr.navy.mil/Science-Technology/Departments/Code-31.aspx.

[3] J. V. D. Berg, P. Abbeel, and K. Goldberg, “LQG-MP: Optimizedpath planning for robots with motion uncertainty and imperfect stateinformation,” International Journal of Robotics Research, vol. 30, no. 7,pp. 895–913, 2011.

[4] J. A. Starek, B. Acıkmese, I. A. D. Nesnas, and M. Pavone, “SpacecraftAutonomy Challenges for Next Generation Space Missions,” inAdvances in Control System Technology for Aerospace Applications,ser. Lecture Notes in Control and Information Sciences, E. Feron, Ed.Springer, Sep. 2015, vol. 460, ch. 1, pp. 1–48. [Online]. Available:http://link.springer.com/chapter/10.1007/978-3-662-47694-9 1

[5] S. Lavalle, Planning Algorithms. Cambridge University Press, 2006.[6] L. Janson, E. Schmerling, and M. Pavone, “Monte Carlo Motion

Planning for Robot Trajectory Optimization Under Uncertainty,” inInternational Symposium on Robotics Research, Sestri Levante, Italy,Sep. 2015.

[7] S. Patil, J. van den Berg, and R. Alterovitz, “Estimating probability ofcollision for safe motion planning under Gaussian motion and sensinguncertainty,” in Proc. IEEE Conf. on Robotics and Automation, 2012,pp. 3238–3244.

[8] W. Sun, L. G. Torres, J. V. D. Berg, and R. Alterovitz, “Safe MotionPlanning for Imprecise Robotic Manipulators by Minimizing Probabilityof Collision,” in International Symposium on Robotics Research, 2013.

[9] F. Oldewurtel, C. Jones, and M. Morari, “A tractable approximationof chance constrained stochastic MPC based on affine disturbancefeedback,” in Proc. IEEE Conf. on Decision and Control, 2008, pp.4731–4736.

[10] B. Luders, M. Kothari, and J. P. How, “Chance constrained RRT forprobabilistic robustness to environmental uncertainty,” in AIAA Conf.on Guidance, Navigation and Control, 2010.

[11] G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How,“Probabilistically safe motion planning to avoid dynamic obstacles withuncertain motion patterns,” Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.

[12] M. Kothari and I. Postlethwaite, “A probabilistically robust path plan-ning algorithm for UAVs using rapidly-exploring random trees,” Journalof Intelligent & Robotic Systems, vol. 71, no. 2, pp. 231–253, 2013.

[13] B. D. Luders, S. Karaman, and J. P. How, “Robust sampling-basedmotion planning with asymptotic optimality guarantees,” in AIAA Conf.on Guidance, Navigation and Control, 2013.

[14] W. Liu and M. H. Ang, “Incremental sampling-based algorithm forrisk-aware planning under motion uncertainty,” in Proc. IEEE Conf. onRobotics and Automation, 2014, pp. 2051–2058.

[15] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorialon the cross-entropy method,” Annals of Operations Research, vol. 134,no. 1, pp. 19–67, 2005.

[16] O. Cappe, A. Guillin, J.-M. Marin, and C. P. Robert, “Population MonteCarlo,” Journal of Computational and Graphical Statistics, vol. 13,no. 4, 2004.

[17] R. Douc, A. Guillin, J.-M. Marin, and C. P. Robert, “Convergenceof adaptive mixtures of importance sampling schemes,” The Annals ofStatistics, vol. 35, no. 1, pp. 420–448, 2007.

[18] O. Cappe, R. Douc, A. Guillin, J.-M. Marin, and C. P. Robert, “Adaptiveimportance sampling in general mixture classes,” Statistics and Comput-ing, vol. 18, no. 4, pp. 447–459, 2008.

[19] S. Haykin, Kalman filtering and neural networks. John Wiley & Sons,2004, vol. 47.