Multiple Model Unscented Kalman Filtering in Dynamic ...mediatum.ub.tum.de/doc/1456521/document.pdfMultiple Model Unscented Kalman Filtering in Dynamic Bayesian Networks for Intention

Multiple Model Unscented Kalman Filtering in Dynamic BayesianNetworks for Intention Estimation and Trajectory Prediction

Jens Schulz1, Constantin Hubmann1, Julian Löchner1, and Darius Burschka2

Abstract— Dynamic Bayesian networks (DBNs) are a popularmethod for driver intention estimation and trajectory predic-tion. To account for hybrid state spaces and non-linear systemdynamics, sequential Monte Carlo (SMC) methods are oftenthe inference method of choice. However, in state estimationproblems with high uncertainty, SMC methods typically sufferfrom either high complexity (using many samples) or lowaccuracy (using an insufficient number of samples). In thispaper, we present a multiple model unscented Kalman filterbased DBN inference method for driver intention estimationand multi-agent trajectory prediction. This inference methodreduces complexity, while still keeping the benefits of sample-based evaluation of non-linear and non-continuous transitionmodels. Firstly, the state of the DBN is approximated as amixture of Gaussians and estimated over time by tracking themulti-agent system. Secondly, a probabilistic forward simula-tion of the belief is performed to generate interaction-awaretrajectories for all agents and all intention hypotheses. Theproposed method is compared to SMC-based inference methodsin terms of accuracy, variance and runtime in both simulationsand real-world scenarios.

I. INTRODUCTIONTo drive in an anticipatory and cooperative manner, au-

tonomous vehicles need to anticipate the future behaviorof surrounding traffic participants and estimate their currentintentions. The trajectories of multiple vehicles in a trafficscene are often highly interdependent because drivers mustavoid collisions, comply with traffic rules, and thus react toother drivers’ actions. This introduces the need for combi-natorial and interaction-aware motion prediction, which stillrepresents a great challenge today [1].

Dynamic Bayesian networks (DBNs) are commonly usedfor driver intention estimation because they allow the def-inition of domain specific hierarchies within the decisionmaking process of human drivers and the modeling of inter-dependencies between multiple agents. In our previous work[2], we proposed such a framework which applies sequentialMonte Carlo (SMC) inference to estimate intentions and pre-dict interaction-aware trajectories for all vehicles in a scene.SMC methods have the advantage that they can representarbitrary distributions (hybrid, non-Gaussian, multi-modal)and can be applied to highly non-linear systems. However,in state estimation problems with high uncertainty, e.g., asa result of unknown intentions, varying human behaviorand noisy measurements, they typically suffer from eitherhigh complexity or low accuracy, depending on the number

1Jens Schulz, Constantin Hubmann, and Julian Löchner arewith BMW Group, Munich, Germany {jens.schulz |constantin.hubmann | julian.loechner}@bmw.de

2Darius Burschka is with the Department of Computer Science, TechnicalUniversity of Munich, Germany [email protected] c© 2018 IEEE

Fig. 1. Estimated belief represented by a mixture of Gaussians andcorresponding sigma points and interaction-aware trajectory prediction forthe different possible intentions.

of samples used. Although using fewer samples does notintroduce bias, the variance in the results increases dueto the randomness in the sampling process. Furthermore,the problems of particle degeneracy (just a few particlesrepresent the state well) and particle impoverishment (mostparticles have the same value) arise because of imprecisemodels and importance resampling [3].

In this paper, we propose a multiple model unscentedKalman filter (MM-UKF) based inference method for ourpreviously presented intention estimation and trajectory pre-diction framework. As MM-UKF approximates the beliefusing a mixture of Gaussians, it is not associated withthe aforementioned problems of SMC methods, while stillkeeping the benefits of sample-based evaluation of complex,non-linear and non-continuous transition models.

The presented DBN models the development of a traf-fic situation as a stochastic process comprising multipleinteracting agents. The continuous actions of each agentare conditioned on the current environment and the agent’sroute and maneuver intentions (see IV-A for definitions).To account for interdependencies between multiple agents,each of the discrete modes of the MM-UKF represents oneof the possible combinations of discrete intentions of allagents, whereas the continuous states are represented by onemultivariate Gaussian given each mode. A typical scenario isdepicted in Fig. 1, showing the Gaussian mixture distributionand the corresponding sigma points.

The proposed inference method is evaluated in both sim-ulations and real world scenarios and compared to the previ-ously presented SMC inference. The results show that giventhe discrete intentions of all agents, the state can reasonablybe approximated by a single multivariate Gaussian. MM-UKF achieves lower variance and higher accuracy comparedto SMC inference with a similar runtime.

II. RELATED WORK

A. Intention Estimation and Motion Prediction

Intention estimation and motion prediction of human traf-fic participants have been widely studied within the fieldof autonomous vehicles. Commonly, route and maneuverintentions are estimated with either discriminative classifiers,such as support vector machines (SVMs) [4], random forests(RFs) [5], or artificial neural networks (ANNs) [6], or withgenerative models, such as Bayesian networks [7]. The pre-diction of continuous trajectories is often based on regressionmethods such as Gaussian processes (GPs) [8], [9], RFs [10],ANNs [11] or planning-based methods [12].

Modeling the development of a traffic situation as astochastic process conditioned on hidden variables repre-senting the agents’ intentions allows both inferring theseintentions at the current time by incorporating measurements,and predicting the future by iteratively applying the transitionmodels to the estimated belief (forward simulation). There-fore, the problems of intention estimation and motion pre-diction are handled within a combined framework, utilizingthe same behavior models. In [8], multiple GPs are definedconditioned on the maneuver intention of a driver (turn-left,turn-right, go-straight, stop-and-go), allowing the estimationof his current intention given a set of observations. TheseGPs are then used as transition models of a particle filterin order to predict the continuous trajectory. As contextualinformation, such as the existence of surrounding trafficparticipants, is not incorporated, the framework is not suitedfor multi-agent scenarios. In our previous work [12], weaddress the interrelated problems of behavior generation ofthe ego vehicle and behavior prediction of the surroundingvehicles in a combined manner. Multi-agent maneuvers basedon the concept of homotopy are determined and correspond-ing trajectories are optimized using mixed integer quadraticprogramming. A multiple model Kalman filter that usesthe trajectories as a transition model infers the intentionsof surrounding agents. The ego vehicle is then controlledso that it complies with the most probable prediction. In[10], a framework is presented that describes the trafficscene using a DBN consisting of multiple interacting agents.Learned context-dependent action models are conditionedon the route intention, allowing a reduction in learningcomplexity. Using SMC inference, they are able to estimatethe multi-modal hybrid belief state and predict the futurescene development based on a non-linear model. Althoughthey do not evaluate the route estimation capabilities, theyshow that their trajectory prediction outperforms a constantturn rate and velocity (CTRV) model in simulation. In ourprevious work [2], we also model the development of a trafficscene with a DBN consisting of multiple interacting agentsusing SMC inference. The decision making process of eachtraffic participant is divided into three levels: route intention,maneuver intention and action (see Sec. IV-A for definitions).Possible routes and maneuvers are queried at runtime givena digital map, and interdependencies between agents as wellas dependencies on the static context (e.g., road curvature)

are represented within the network structure. By including allagents within the state space, it is possible to explicitly ac-count for the combinatorial aspect of interacting agents. Weshow that our interaction-aware approach outperforms puremap-based and CTRV models that neglect other participants.

As this paper focuses on comparing MM-UKF-basedinference to our previous SMC inference, we refer to [2]for a more detailed review of behavior prediction literature.

B. Inference in hybrid dynamic Bayesian networks

DBNs are probabilistic graphical models that consist ofmultiple random variables which are connected via di-rected arcs representing their dependencies. Each variableis described by a probability distribution conditioned on itsparents. A detailed overview of different possible inferencealgorithms for specific variants of DBNs is presented in [13].

In the special case of discrete-only hidden variables orwhen transition and observation models are conjugate (e.g.,linear-Gaussian), exact inference is generally possible. How-ever, for non-linear DBNs with hybrid state space (i.e., bothdiscrete and continuous hidden variables), such as presentedwithin this work, approximate inference becomes necessary.A common method for inference in arbitrary DBNs are SMCmethods, as they can represent arbitrary distributions andcope with non-linear system dynamics. SMC has alreadybeen successfully applied for DBN inference in the contextof behavior prediction, e.g., by [10] and in our previouswork [2]. However, particle-based inference often suffersfrom high complexity, as many particles may be neededto approximate the belief appropriately and to reduce thevariance caused by randomness in the sampling process.Another trade-off can be found in the problems of degener-acy and impoverishment: because of inaccurate models andsystem noise, most particles will end up with small weightseventually (because they represent the state poorly) and onlyfew will have non-negligible weight. A method to reducethis degeneracy and focus the set of particles to regionsin the state space of high probability is called importanceresampling, which, however, introduces another problemcalled impoverishment, meaning that most of the particlesrepresent exactly the same values. A detailed analysis ofthese problems can be found in [3].

In the special case of a switching state space model, it ispossible to represent the belief by a mixture of Gaussians andapply an (interacting) multiple model extended or unscentedKalman filter. Then, the discrete variables are representedby the mode on which the underlying EKFs/e UKFs areconditioned on. The unscented Kalman filter has often beenshown to be superior to the extended Kalman filter (e.g.,[14], [15], [16]). Furthermore, the UKF does not require theanalytic evaluation of any derivatives, making it simpler andmore widely applicable than the EKF [13]. A comparison ofEKF, UKF, and SMC was conducted by [16], highlightingthe benefits of UKF-based inference. They employ a two-step serial process by first sampling the discrete variablesand then applying EKF or UKF for the remaining variablesfor each particle (which they refer to as ’non-strict’ Rao-

Blackwellization). In [17], an in-depth analysis of UKFwas conducted, demonstrating that the advantages and dis-advantages of UKF and SMC in terms of complexity andaccuracy depend on the dimensionality of the state space andthe problem at hand. Furthermore, the authors recommendselection of sigma points that are not too close to the meanbecause otherwise nonlinear effects away from the mean arenot accounted for, although this may be required by the actualspread of the distribution.

III. PROBLEM STATEMENT

A traffic scene S consists of a set of agentsV = {V 0, · · ·, V K}, with K ∈ N0, in a static en-vironment (map) with discrete time, continuous state,and continuous action space. The map consists of aroad network with topological, geometric and infrastruc-ture (yield lines, traffic signs, etc.) information as wellas the prevailing traffic rules. At time step t, the setof agents V is represented by their kinematic statesXt = [x

0t , · · ·,xKt ]>, route intentions Rt = [r0t , · · ·, rKt ]>,

and maneuver intentions Mt = [m0t , · · ·,mKt ]>. The kine-matic state xit = [x

it, y

it, θ

it, v

it]> of agent V i consists of

the Cartesian position, heading, and absolute velocity. Theagent’s length and width are considered to be given determin-istically by the most recent measurement and, for the sakeof brevity, are not included within xi. The route intention ritdefines a path through the road network the agent desires tofollow, the maneuver intention mit the desired order relativeto other agents in cases of intersecting or merging routes (seeSec. IV-A for detailed definitions). Other types of maneuverssuch as lane changes or overtaking are not considered withinthis work. At each time step, each agent executes an actionait that depends on its intentions, the map and the kinematicstates of all agents, transforming the current kinematic statexit to a new state x

it+1. The actions of all agents are denoted

as A = [a0t , · · · ,aKt ]>. The complete dynamic part ofa scene is thus described by St = [Xt, Rt,Mt, At]>. Ateach time step, a noisy measurement Zt = [z0t , · · ·, zKt ]>with zit = [z

ix,t, z

iy,t, z

iθ,t, z

iv,t]> is observed according to

the distribution p(Zt|Xt), containing information about thekinematic states of all agents. The number of agents, possibleroutes and maneuvers is arbitrary and may change over time:agents may appear or disappear and their possible routes andmaneuvers need to be adapted accordingly.

The objective is to estimate the route and maneuverintentions (R,M) of all agents at the current time. This workfocuses on an alternative inference method with improvedruntime complexity compared to our previous approach [2]while keeping high estimation accuracy.

IV. APPROACH

We describe the development of a traffic scene as a Markovprocess consisting of all agents in a scene. Modeling thisprocess in a DBN allows explicit specification of relationsbetween agents, the definition of causal as well as temporaldependencies and handling of the uncertainty of measure-ments and human behavior. The decision making process

V 0 V 0

VK VK

r0

m0

a0

x0

z0

rK

mK

aK

xK

zK

r0

m0

a0

x0

z0

rK

mK

aK

xK

zK

···

···

···

···

time step t time step t + 1

Fig. 2. dynamic Bayesian network (DBN) showing the interdependenciesbetween agents V i. Random variables are drawn as circles, causal andtemporal dependencies as solid and dashed arrows, respectively. r, m, anda depend on the map. [2]

of each agent is composed of three hierarchical layers: theroute intention, i.e. the path the agent desires to follow, themaneuver intention, i.e. whether it is going to pass a conflictarea at an intersection before or after another agent, andthe continuous action comprising acceleration and yaw-rate.First, the set of possible routes and maneuvers is determinedgiven a topological map. Each agent’s continuous action isthen derived by context-dependent behavior models givenits route and maneuver intentions. Recurrently updating thebelief of the DBN with observations of the agents’ posesand velocities allows the drivers’ intentions to be inferred.A probabilistic trajectory prediction is generated by forwardsimulation of the current belief. Thus, the predicted tra-jectories explicitly incorporate the intentions of drivers andrespect the interdependencies between multiple agents.

Fig. 2 shows two consecutive time slices of the DBN andthe dependencies between the single random variables. Ingeneral, this DBN describes a hybrid, non-linear system witha multi-modal, non-Gaussian belief. However, in contrast toour previous work [2], we present multiple model unscentedKalman filtering for inference, which approximates the beliefwith a mixture of Gaussians. To account for changing situ-ations, the network structure is adapted at runtime (creatingand deleting agents as well as route/maneuver hypotheses).Thus, it can be applied to varying situations with an arbitrarynumber of agents, intention hypotheses and different roadlayouts.

A. Transition Models

This section gives an overview of the transition models ofthe DBN. As the focus of this work is the comparison ofthe MM-UKF inference to the previously presented SMC

lH

lH

lH

(a)

1

2

0

{1,2}≺{0}≺{}{1} ≺{0}≺{2}{2} ≺{0}≺{1}{} ≺{0}≺{1,2}(b)

Fig. 3. (a): Possible routes (green). (b): Conflict areas (yellow) fromV 0’s perspective for turning left and the resulting four possible maneuvers,representing the sequence of agents passing the conflict areas. [2]

inference, we refer to our previous work [2] for a moredetailed description of the transition models used. For thesake of brevity, the superscript denoting the single agent isomitted in this subsection (e.g., x instead of xi).

1) Vehicle Kinematics: the action a = [a, θ̇]> of an agentcomprises the longitudinal acceleration a and the yaw rateθ̇. It is the outcome of the agent’s decision making process(described later in this section) and is conditioned on thecurrent context and the agent’s intentions. The kinematicstate is predicted according to the probability distributionp(xt+1|xt,a) = N (x̂t+1,Q), with

x̂t+1=

xt + vt∆T cos(θt+1) +

12at∆T

2 cos(θt+1)yt + vt∆T sin(θt+1) +

12at∆T

2 sin(θt+1)

θt + θ̇t∆Tvt + at∆T

(1)and Q = diag(σ2x, σ

2y, σ

2θ , σ

2v). Although this model is sim-

plistic, we argue that it is sufficient for prediction purposes.2) Measurement: high-level cuboid objects representing

the single agents in a scene are used as measurements, thuslow-level sensor specifics are abstracted. The data associ-ation, i.e., object detection and tracking, is considered tobe given as it is handled by a different software module.We model the kinematic state of each agent to be measuredwith zero-mean Gaussian noise, such that each measurementz = [xz, yz, θz, vz]

> is distributed according to p(z|x) =N (ẑ,R), with ẑ = x and R = diag(σ2zx , σ

2zy , σ

2zθ, σ2zv ).

3) Route Intention: the desired route r ∈ R of an agentrepresents the first layer of its decision making process.It is given by a sequence of consecutive lanes and servesas a path that guides the agent’s behavior. The set ofpossible routes R is determined given the agent’s pose,the topological map, and a specified metric horizon lH(see Fig. 3a). Initially, the desired route r is considered tobe distributed uniformly across the set of possible routes:p(ri|x,map) = |R|−1, ∀ri ∈ R. For the sake of problemsimplification, the decision on a route is considered constant,i.e., the probability of a route switch is assumed to be zero.

4) Maneuver Intention: the desired maneuver m ∈ Mrepresents the second layer of the decision making processand defines the sequence, in which agents desire to mergeor cross at conflict areas (where merging or intersectinglanes overlap), as shown in Fig. 3b. For all vehicle pairs

〈V i, V j〉 having a common conflict area, the maneuverof vehicle V i specifies, whether it will pass this conflictarea before (V i≺V j) or after V j (V i�V j). This kind ofmaneuver is based on the concept of pseudo-homotopy oftrajectories, which we developed in an earlier work [12].Our data suggests, that vehicles that have right of way aretypically only insignificantly influenced by other vehiclesapproaching the intersection. Thus, different maneuvers areonly considered for vehicles that do not have right of way.

An agent’s set of possible maneuvers M is derived givenits route, the map, and the kinematic states of all agents. Thedesired maneuver m is initially distributed uniformly acrossthe set of possible maneuvers: p(mi|X, r,map) = |M|−1,∀mi ∈ M. The decision on a maneuver is also consideredconstant. However, as we track all hypotheses, it is stillpossible to estimate the correct maneuver if a driver changeshis intention and therefore his behavior.

5) Action Model: an agent’s action a = [a, θ̇] de-pends on its route and maneuver intentions, the kine-matic states of all agents, and the map. It represents thethird layer of the decision making process. Within thiswork, the same heuristics-based probabilistic action modelp(a|r,m,X,map) is used as defined in [2]. The accelerationmodel is a non-linear mapping from features to a Gaussiandistribution p(a|r,m,X,map) = N (µa, σ2a), where µa is afunction of (r,m,X,map) and σa is constant. The set offeatures is derived given the map and the kinematic state ofall agents and describes the current context. The yaw rateis distributed according to the Gaussian p(θ̇|r,x, a,map) =N (µθ̇, σ2θ̇), where µθ̇ is a function of (r,x, a,map) so thatthe agent stays close to the center of its desired lane and σθ̇is constant.

B. Inference Algorithm

Our goal is to estimate all drivers’ route and maneuverintentions and to predict their future trajectories. The ex-istence of different possibilities of routes and maneuversgenerally leads to multi-modality in the predicted trajecto-ries. As agents interact with each other and therefore theirtrajectories become interdependent, the discrete intentions ofone agent may also lead to multi-modality in the predictionof another agent. Fig. 4 depicts a typical motion predictionsituation illustrating this combinatorial aspect: dependingon the intended route and maneuver of one agent, anotheragent may completely change its future behavior, resulting indifferent high probability clusters of the continuous state. Aseach UKF only represents a single Gaussian, this potentialmulti-modality creates the need for having multiple modes.

1) Multiple Models: given the route and maneuver ofeach agent, our evaluation suggests that the belief canreasonably be approximated by a single multivariate Gaus-sian. Thus, we define one UKF per possible combinationof route and maneuver intentions of all agents (R,M).The number of possible combinations of K agents de-fines the number of modes of the MM-UKF and isgiven by J =

∏Ki=0

(∑r∈Ri

∑m∈Mi|r 1

). Hence, the be-

lief of the DBN is tracked using a set of J weighted

B

A2 2 5 1

5

4

3,6

1,3

4,6mode rA rB mB

1 s l A≺B2 s l A�B3 s r -4 l l A≺B5 l l A�B6 l r -

Fig. 4. Example of how different modes may result in different clustersof the continuous state (here shown as Gaussians of the positions) at afuture time step, due to the interdependencies between agents. The routesare abbreviated as straight, left, and right.

UKFs U = {(U1, p(U1)), · · · , (UJ , p(UJ))}. Each UKFU j = [Xj , Rj ,M j , Aj ]> represents the complete scene in-cluding all agents and has attached the corresponding modeprobability p(U j).

The general procedure is exemplarily depicted in Fig. 5:Initially, a multivariate Gaussian distribution of the kinematicstate X0 is derived from the first measurement Z0 accordingto p(X|Z) (step 1). Then, for each agent, the set of possibleroutes R0 and the set of possible maneuvers M0 for eachof its possible routes is determined given its own state,the map, and all other agents’ states (step 2-3). For eachpossible combination of (Rj0,M

j0 ) of all agents, one UKF

U j0 is created and initialized to the corresponding Rj0 and

M j0 and the Gaussian distribution Xj0 = X0. Furthermore, it

is assigned with the mode probability p(Rj0,Mj0 ) = p(U

j0 ) =

p(Rj0|Xj0 ,map)p(M

j0 |R

j0, X

j0 ,map).

For each UKF, the belief U jt is predicted to Ujt+1|t

(step 4-5) and updated to U jt+1 (step 6), as shown in the nextsection. The probability of each mode is updated accordingto the measurement likelihood L(Xjt+1|t|Zt+1):

p(U jt+1) =p(U jt )L(X

jt+1|t|Zt+1)∑

l∈J p(Ult)L(X lt+1|t|Zt+1)

. (2)

The Gaussian mixture that represents all modes, i.e., possiblecombinations of routes and maneuvers of all agents, is thengiven by the set of UKFs and corresponding probabilities.

As new agents appear in a new measurement or newroutes or maneuvers emerge due to a route split or a newconflict area within the route horizon, the existing modesare duplicated accordingly in order to represent the newagents or the new possible intentions and the probabilities aresplit uniformly. If routes or maneuvers become impossibleor agents disappear, the corresponding modes are removedor merged. Furthermore, unlikely modes can be disregardedto decrease runtime complexity (however, this option wasdisabled within the evaluation of this paper to ensure a faircomparison with the SMC-based inference). As the numberof modes changes over time, the proposed method is ofthe type multiple model with variable structure [18]. Tosimplify the problem, the intentions of a driver are modeled

r0

m0

a0

x0

z0

x0

z0

time step 0 time step 1

r1 r2

m1 m2m3

x

a

x

a

x

a1) x|z

2) r|x

3) m|x, r

4) a|x, r,m

5)x′ |

x,a

6) x′|x, a, z′

U={(U1, P (U1)),(U2, P (U2)),

(U3, P (U3))}

Fig. 5. MM-UKF inference exemplarily shown for a single vehiclewith three different modes: (r1,m1), (r2,m2), (r2,m3). Distributions aredepicted simplified as being one dimensional. One UKF represents thecomplete state space, i.e., kinematic state, route, maneuver, and action ofall agents in the scene. The steps of the algorithm are marked from 1 to 6.

N (X̂t−1,Pt−1)χt−1

χt|t−1

N (X̂t|t−1,Pt|t−1)

N (X̂t,Pt)

Fig. 6. Estimated belief and annotations for the Gaussians of the lastposterior, the current prior and current posterior as well as the sigma pointsused for the prediction step. The posterior of the last time step and the priorof the current time step are drawn at lower height to improve visualization.

as constant over time, i.e., the probability of switching of thecurrent mode is assumed to be zero. Thus, a non-interactingmultiple model filter is applied. To model the possibility ofmode switches, an interacting multiple model filter could beapplied [19].

2) Single Model UKF: generally a UKF determines adeterministically chosen set of points that captures the meanand covariance of the original distribution and propagatesthem through the non-linear prediction and update functions.Then a new distribution is fitted to the resulting transformedpoints. As in our case the measurement is Gaussian andthe measurement function is linear, sigma points are onlyutilized for the prediction step to be able to capture thenon-linear system dynamics. A Gaussian is fitted directlyafter prediction and before the measurement update. Then astandard Kalman filter correction step is performed using thepredicted Gaussian and the measurement. This procedure isdepicted in Fig. 6 and will be explained in the remainder ofthis section. For the sake of brevity, the mode superscript jis omitted within this section.

Typically, a set of 2L+ 1 sigma points is chosen, with Lbeing the dimensionality of the state, such that one sigmapoint is placed at the mean and the others are symmetri-cally spread around it. If the system is affected by non-additive process noise, the mean state and the covariancemust be augmented by one dimension per noise term, suchthat it can be accounted for by the sigma points. Additivenoise can simply be added to the fitted Gaussian afterthe transformation. In this work, for each agent, the mean

state and covariance are augmented by the non-additiveprocess noise terms for acceleration and yaw-rate resultingin x̂a,i = [x̂i

>µia µ

iθ̇]> and P a,i = diag(P i, σ2a, σ

2θ̇). where

µa is a function of (r,m, X̂,map) and µθ̇ a functionof (r, x̂, â,map) (step 4). The overall augmented distri-bution consisting of all agents is thus given by a Gaus-sian with mean X̂a = [x̂a,1, · · · , x̂a,K ]> and covarianceP a = diag(P a,1, · · · ,P a,K).

For K agents, the state has a size of L = K(4 + 2),resulting in a total of 2L+ 1 = 12K + 1 sigma points beingcreated. The sigma points are given by

χ0t = X̂at (3)

χit = X̂at + (

√(L+ λ)P at )i, i = 1, . . . , L (4)

χit = X̂at − (

√(L+ λ)P at )i, i = L+ 1, . . . , 2L, (5)

with (√

(L+ λ)P at )i being the ith column of the matrixsquare root of (L+ λ)P at . Each sigma point χ

it is then

propagated through the transition function of the kinematicstate (1) to the new time step (step 5). The resulting weightedsigma points χit+1|t are then used to derive the predictedGaussian with mean and covariance

X̂t+1|t =

2L∑i=0

W isχit+1|t (6)

Pt+1|t = Q +

2L∑i=0

W ic [χit+1|t − X̂t+1|t][χ

it+1|t − X̂t+1|t]

>

(7)

with the state and covariance weights

W 0s =λ

L+ λ, W 0c = W

0s + (1− α2 + β),

W is = Wic =

1

2(L+ λ), i = 1, . . . , 2L (8)

and λ = α2(L+ κ′)− L. The choice of how to spreadsigma points is extensively discussed in [17]. We found wewere able to achieve good results when choosing what theauthor refers to as the Gauss set which uses the parametersα = 1, β = 0, κ′ = 3− L. The measurement step is thenperformed according to the standard Kalman filter (step6), correcting the mean and covariance of the belief andallowing the determination of the measurement likelihoodL(Xjt+1|t|Zt+1) of the specific mode.

For the intention estimation process, the DBN is thus ap-plied as a filter, comparing the different model hypotheses tothe actual observations. As DBNs are generative models, i.e.,they can generate values of any of their random variables, itis further possible to do a probabilistic forward simulationby iteratively predicting the current belief (including theestimated intentions) into subsequent time steps, applyingthe same models as for the filtering. Thus, for each UKF,one multi-agent trajectory is generated and weighted withthe corresponding probability p(U jt ).

Fig. 7. Forward simulation of belief to generate multi-agent trajectories.

V. EVALUATION

To highlight the differences of SMC-based and MM-UKF-based inference, their route and maneuver intention estimatesare compared in simulation and real world scenarios. Thedata is recorded with a measuring vehicle using GPS/INS,lidar and radar sensors to detect and track surrounding trafficparticipants. We use a standard sequential importance resam-ple (SIR) particle filter, also known as the bootstrap filter,with low-variance resampling. As SMC inference convergesto the optimal estimate for n → ∞, we compare the resultof each estimator to the mean estimate over ten runs ofSMC with a high number of samples (n = 105), which weabbreviate as BE for best estimator. The imprecision of eachestimator is measured using the Kullback-Leibler divergence

DKL(riBE‖ri) =

|R|∑j=1

riBE,j logriBE,jp(rij)

(9)

between the estimate ri = [p(ri1), · · · , p(ri|R|)] and theestimate of the best estimator riBE (analogous for maneuverestimates).

For each possible combination of routes and maneuversof all agents, one forward simulation resulting in a multi-agent trajectory is executed, as depicted in Fig. 7. Eachforward simulation is initialized to the mean kinematicstate of the given mode. As this procedure is the samefor SMC-based and MM-UKF-based inference, the resultingtrajectories given the same initial belief are identical. For adetailed evaluation of the forward simulation, we refer to [2].

A statistical evaluation of the mean D̄KL of all agents’route intentions over six different scenes (real and simulated)with 15 vehicles in total is depicted in Fig. 8. On average, theMM-UKF outperforms SMC with up to n = 50000 particles,while having a maximum runtime per time step of τ = 0.57 scompared to τ = 7.9 s of SMC. As the estimation results arestrongly dependent on the scenario, three different scenesare evaluated in more detail in Fig. 9. The first sceneconsists of three vehicles with human drivers approachingan intersection. The blue vehicle stops to yield to the greenvehicle. As when turning right, the blue vehicle would nothave to yield, it can be inferred that it either wants to turnleft or go straight ahead (t ≈ 8 s). As soon as the greenvehicle has crossed the intersection, the blue vehicle starts todrive again and goes straight (t ≈ 9 s). The MM-UKF has aruntime comparable to SMC with n = 600 but outperformsSMC with up to n = 50000. In the second scene, alsousing real data, the blue vehicle follows the green vehicle,

102 103 104 10510−4

10−1

102

number of particles

D̄KL

MM-UKFSMC

Fig. 8. Comparison of Kullback-Leibler divergence of MM-UKF and SMCwith different number of particles over multiple scenes (15 agents in total).

which approaches the intersection and slows down beforeturning left. To maintain the desired headway distance, theblue vehicle also slows down, irrespective of its desiredroute (t ≈ 20 s). As soon as the green vehicle has left theintersection, the blue vehicle accelerates again and continuesstraight (t ≈ 30 s). The runtime of MM-UKF is similar toSMC with n = 300, but it performs even better than SMCwith n = 105. The third scene shows how maneuvers androutes are estimated simultaneously. Initially, all three routesof the simulated blue vehicle have equal probability. As itonly slows down slightly because of the upcoming curvature,but does not stop in order to yield to the green vehicle, itis inferred that neither going straight, nor yielding is likely.As soon as it starts turning, it is correctly inferred that it isturning left and merges before the green vehicle.

In all scenes, it was shown that SMC can result in highvariances in the estimates depending on how many samplesare used. As MM-UKF chooses its sigma points determinis-tically, multiple runs of the same scene always result in thesame estimates. Although SMC is naturally able to achievehigher accuracy, this is usually at the cost of a higher runtime.In all of the evaluated examples, MM-UKF outperformedSMC in terms of accuracy per runtime. However, it has tobe noted that the number of needed sigma points and thusthe runtime of MM-UKF is dependent on the number ofmodes and vehicles in a scene. The results indicate that thepresented MM-UKF-based inference provides a reasonablecompromise between accuracy and runtime.

VI. CONCLUSIONS

In this work, we propose a multiple model unscentedKalman filter (MM-UKF) based inference method for theintention estimation and trajectory prediction DBN presentedin [2]. By including all agents within the state space, it ispossible to account for interactions between multiple agents.Each possible combination of discrete route and maneuverintentions of all traffic participants forms one of the multiplemodes of the filter. The comparison to SMC-based filteringsuggests that the continuous belief given a specific modecan reasonably be approximated by a single multivariateGaussian within the presented scenarios given the non-linear transition models. It is shown that MM-UKF has agenerally lower runtime than SMC with comparable accuracyor achieves improved accuracy given a similar runtime. Thecomparison of multiple runs with different SMC randominitialization highlights the high variance in the estimationresults, which is naturally not the case for MM-UKF. This

ensures reproducible results allowing an easier validation ofthe overall framework. However, SMC has the advantagethat the number of samples can be adjusted according toruntime requirements, whereas the number of sigma pointsis defined by the situation. Furthermore, discrete variablesand non-additive noise terms can be added straightforwardlyfor SMC, whereas for MM-UKF the modes need to accountfor each discrete variable and the state of the sigma pointsneeds to be augmented for each non-additive noise term.

Future work should focus on a statistical evaluation of anextensive dataset of urban traffic scenes. Detailed analysisof the scenes in which SMC achieves superior results willhelp in determining the approximation limits using Gaussianmixtures and sigma points in the current domain.

REFERENCES[1] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion predic-

tion and risk assessment for intelligent vehicles,” Robomech J., vol. 1,no. 1, p. 1, 2014.

[2] J. Schulz, C. Hubmann, J. Löchner, and D. Burschka, “Interaction-aware probabilistic behavior prediction in urban environments,” in Int.Conf. on Intell. Robots and Syst. (IROS) (accepted), IEEE, 2018.

[3] T. Li, S. Sun, T. P. Sattar, and J. M. Corchado, “Fight sample degen-eracy and impoverishment in particle filters: A review of intelligentapproaches,” Expert Syst. with Appl., vol. 41, no. 8, pp. 3944–3954,2014.

[4] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How, “Be-havior classification algorithms at intersections and validation usingnaturalistic data,” in Intell. Veh. Symp. (IV), pp. 601–606, IEEE, 2011.

[5] M. Barbier, C. Laugier, O. Simonin, and J. Ibanez-Guzman, “Clas-sification of Drivers Manoeuvre for Road Intersection Crossing withSynthetic and Real Data,” in Intell. Veh. Symp. (IV), p. 7, IEEE, 2017.

[6] D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “GeneralizableIntention Prediction of Human Drivers at Intersections,” in Intell. Veh.Symp. (IV), pp. 1665–1670, IEEE, 2017.

[7] M. Liebner, M. Baumann, F. Klanner, and C. Stiller, “Driver intentinference at urban intersections using the intelligent driver model,” inIntell. Veh. Symp. (IV), pp. 1162–1167, IEEE, 2012.

[8] Q. Tran and J. Firl, “Online maneuver recognition and multimodaltrajectory prediction for intersection assistance using non-parametricregression,” in Intell. Veh. Symp. (IV), pp. 918–923, IEEE, 2014.

[9] A. Armand, D. Filliat, and J. Ibanez-Guzman, “Modelling stop inter-section approaches using gaussian processes,” in Intell. Transp. Syst.-(ITSC), pp. 1650–1655, IEEE, 2013.

[10] T. Gindele, S. Brechtel, and R. Dillmann, “Learning context sensitivebehavior models from observations for predicting traffic situations,” inInt. Conf. on Intell. Transp. Syst.-(ITSC), pp. 1764–1771, IEEE, 2013.

[11] D. Lenz, F. Diehl, M. T. Le, and A. Knoll, “Deep neural networksfor Markovian interactive scene prediction in highway scenarios,” inIntell. Veh. Symp. (IV), pp. 685–692, IEEE, 2017.

[12] J. Schulz, K. Hirsenkorn, J. Löchner, M. Werling, and D. Burschka,“Estimation of collective maneuvers through cooperative multi-agentplanning,” in Intell. Veh. Symp. (IV), pp. 624–631, IEEE, 2017.

[13] K. P. Murphy, “Dynamic bayesian networks,” Probabilistic GraphicalModels, M. Jordan, vol. 7, 2002.

[14] S. J. Julier and J. K. Uhlmann, “Unscented filtering and nonlinearestimation,” Proc. of the IEEE, vol. 92, no. 3, pp. 401–422, 2004.

[15] E. A. Wan and R. Van Der Merwe, “The unscented Kalman filter fornonlinear estimation,” in Adaptive Syst. for Signal Process., Commun.,and Control Symp., pp. 153–158, IEEE, 2000.

[16] M. N. Andersen, R. O. Andersen, and K. Wheeler, “Filtering in hybriddynamic Bayesian networks,” in Int. Conf. on Acoustics, Speech, andSignal Process., vol. 5, pp. V–773, IEEE, 2004.

[17] S. Bitzer, “The UKF exposed: How it works, when it works and whenits better to sample.,” doi:10.5281/zenodo.44386, 2016.

[18] X.-R. Li and Y. Bar-Shalom, “Multiple-Model Estimation with Vari-able Structure,” IEEE Trans. on Automat. Control, vol. 41, no. 4, p. 16,1996.

[19] Y. Bar-Shalom, P. K. Willett, and X. Tian, Tracking and data fusion.YBS publishing, 2011.

t = 23 s

t = 9 s

t = 5 s

5 10 1510−8

10−4

100

time [s]

DKL

102 103 104 10510−6

10−1

104

number of particles

D̄KL

0 5 100

0.5

1

time [s]

P(R

)

SMC, n=102, τ=0.022 s

0 5 100

0.5

1

time [s]

SMC, n=103, τ=0.081 s

0 5 100

0.5

1

time [s]

SMC, n=104, τ=0.66 s

0 5 100

0.5

1

time [s]

SMC, n=105, τ=7.6 s

0 5 100

0.5

1

leftright

straight

time [s]

MM-UKF, τ=0.061 s

(a) Scene 1 (real data): The MM-UKF has a maximum runtime τ = 0.061 s, which is comparable to SMC with n = 600, but performs as good as SMCwith n = 50000.

t = 35 s

t = 28 s

t = 20 s

10 15 20 25 30 3510−8

10−4

100

time [s]

DKL

102 103 104 105

10−3

10−1

101

number of particles

D̄KL

10 20 300

0.5

1

time [s]

P(R

)

SMC, n=102, τ=0.013 s

10 20 300

0.5

1

time [s]

SMC, n=103, τ=0.061 s

10 20 300

0.5

1

time [s]

SMC, n=104, τ=0.53 s

10 20 300

0.5

1

time [s]

SMC, n=105, τ=5.7 s

10 20 300

0.5

1

left

right

straight

time [s]

MM-UKF, τ=0.020 s

(b) Scene 2 (real data): The MM-UKF has a maximum runtime τ = 0.020 s, which is comparable to SMC with n = 300, but performs even better thanSMC with n = 100000.

t = 16 s

t = 13 s

t = 10 s

0 5 1010−8

10−4

100

time [s]

DKL

102 103 104 105

10−4

10−2

number of particles

D̄KL

0 5 100

0.5

1

time [s]

P(R

,M)

SMC, n=102, τ=0.024 s

0 5 100

0.5

1

time [s]

SMC, n=103, τ=0.18 s

0 5 100

0.5

1

time [s]

SMC, n=104, τ=1.8 s

0 5 100

0.5

1

time [s]

SMC, n=105, τ=22 s

0 5 100

0.5

1 — left, ≺V 1- - left, �V 1— right— straight, ≺V 1- - straight, �V 1

time [s]

MM-UKF, τ=0.037 s

(c) Scene 3 (simulated data): The MM-UKF has a maximum runtime τ = 0.037 s, which is comparable to SMC with n = 200, but performs as good asSMC with n = 40000.

Fig. 9. Comparison of route/maneuver estimation accuracy and variance of MM-UKF and SMC with different number of particles. The plots depict theroute/maneuver probabilities of the blue agent in each scene over ten runs with different random initialization (mean: thick line, min/max: filled areas). AsMM-UKF has deterministic output, there is no variance in the estimate. Furthermore, the DKL is plotted over time and its mean D̄KL over the numberof used particles. The results of MM-UKF are depicted as dashed lines, whereas the ones of SMC as solid lines with different grayscale depending on thenumber of particles (100, 200, 500, 1000, 5000, 10000, 50000, 100000): the more particles, the darker the line. The maximum runtime τ for one timestep is based on non-optimized C++ code running on an Intel Core i7-5820K CPU @ 3.30GHz.

Documents

Multiple Model Unscented Kalman Filtering in Dynamic ...mediatum.ub.tum.de/doc/1456521/document.pdfMultiple Model Unscented Kalman Filtering in Dynamic Bayesian Networks for Intention