
Chapter 2

Basis of probabilistic robotics

This chapter will introduce the basis of probabilistic robotics used by the methods presented in this work. First, some probability theory will be introduced, ending with Bayes' rule. Bayesian probability theory is then used to explain the basis of Bayesian filters for parametric, non-parametric and semi-parametric algorithms.

2.1 Introduction

Probabilistic robotics has been a common approach in robotics over the last decade. It pays tribute to the uncertainty in robot perception and motion instead of relying on a single best estimate like classical batch algorithms. This uncertainty, as stated in [2], comes from the incompleteness of robot perception and from inherently unpredictable environments, which are present even in structured settings. As it is not possible to represent all variables of the environment, there are always hidden variables not taken into account in the observation and motion models. As a result of these hidden variables, there is not always a good match between the models and the real world. The basis of probabilistic robotics is the use of probability theory to represent the uncertainty of the environment as well as the uncertainty related to the perception and motion capabilities of robots. This probabilistic paradigm allows the maximum entropy principle to be used to avoid the problem of incompleteness by converting it into uncertainty, i.e. into probability distributions, employing for that purpose preliminary knowledge (the motion and observation models). Then, Bayesian inference allows decisions to be made from the uncertainty estimated before (in this work, the problem of decision making is not considered). A general Bayesian Robot Programming (BRP) framework (see Figure 2.1), which models the incompleteness and uncertainty present in robotics, is given in [24] together with example implementations.


Figure 2.1: Bayesian Robot Programming Framework

Sensors are limited in what they can perceive. Incompleteness of robot sensing arises from factors such as sensor range and resolution, while uncertainty is usually related to calibration, noise or planning problems. In the same way, robot motion is limited by mechanical constraints and noise, and the main source of uncertainty comes from the simplified models used for the prior estimation of the robot state. Figure 2.2a shows an example of uncertainty where the robot cannot disambiguate its position by only employing a LIDAR sensor. The measurements received in that well-structured environment are exactly the same in 4 positions of the map (an example of incompleteness in robot perception). Probability theory provides the means to represent this uncertainty using different probability distributions; such a distribution represents what is called the belief of the robot. For example, Figure 2.2b shows a multimodal Gaussian distribution used to represent the uncertainty in the robot position when only a LIDAR sensor is employed. As in other non-probabilistic algorithms, it is possible to resolve these ambiguous situations by merging the information of other sensors, as in Figure 2.2c, where the corners of the environment are painted with different colors and the robot uses a camera to disambiguate the ambiguous belief produced by the LIDAR sensor.

Now consider the case where motion information is included in the robot's belief, as shown in Figure 2.3. At first, in Figure 2.3a, the robot is placed in an environment but does not know its initial position (in this example the map is known), so the probability distribution of the robot's belief is a uniform distribution (blue path). Then the robot senses the environment with a LIDAR sensor (range-bearing measurements), as shown in Figure 2.3b, so that its belief is updated with 4 probable positions represented by a multimodal Gaussian distribution. When the robot moves down, as depicted in Figure 2.3c, the information about the action taken by the robot is included into


Figure 2.2: (a) Map explored with LIDAR: an example of ambiguity generated by a structured environment sensed with a LIDAR sensor. (b) Robot belief: a multimodal Gaussian distribution used to represent the belief of the robot. (c) Disambiguated belief: a map with painted walls used to disambiguate the robot belief when a camera is used together with the LIDAR sensor.

the robot's belief and, as the robot knows the map of the environment, it is able to disambiguate its position belief, since only one of the 4 previous positions allows the robot to move in that way.

Stated probabilistically, robot perception is a state estimation problem which can be solved with Bayesian filters. Bayesian filters attempt to update the robot belief employing the sensor and motion information. This update, however, can turn into either a gain or a loss of information depending on the dynamics of the robot's environment.

In contrast with classical solutions, probabilistic algorithms have weaker requirements on the accuracy of robot sensors, since the degree of uncertainty of the robot can be represented by means of probability distributions. Hence, probabilistic algorithms tend to be more robust and scalable for real-world environments. However, the solutions proposed in this paradigm are generally more complex computationally, since they offer not just a single solution but a complete probability distribution. Indeed, in [12], the probabilistic paradigm has been shown to be NP-hard. This is the reason why researchers in this area focus on providing solutions as efficient as possible, especially when dealing with continuous state spaces, where traditional solutions tend to be more efficient.

2.2 General concepts of probability theory

In order to clarify the algorithms presented in this work, some basic probabilistic concepts are introduced in this section. In probabilistic robotics, the perception of the environment is represented by state spaces. Furthermore, all elements


Figure 2.3: (a) Uniform distribution: the initial uniform distribution of a robot placed in the map. (b) Posterior belief: a set of LIDAR measurements is incorporated into the robot's belief, producing a multimodal Gaussian distribution. (c) Disambiguated belief: the robot disambiguates its position by incorporating motion information into its belief.

involved in the robot's world, such as sensor measurements, actions and the state of the robot itself, are modelled as random variables. The value of a random variable is enclosed in a specific domain which depends on the element of the robot's world it represents. These values are governed by probabilistic rules which must be inferred from other random variables and the observed information.

Let X denote a random variable and x a specific value in the domain of possible values that X can take. When the element represented depends on more than one value, the vector of these random variables is called a multivariate random variable. To represent the probability that a single or multivariate random variable X takes a value x, this document uses the notation p(X = x), which is usually abbreviated as p(x).

On the other hand, random variables can also be classified according to their domain. In that sense, we may find the following types of random variables:

• Continuous random variables: These variables take values in a continuous domain and their probability distributions are represented by continuous probability density functions (PDFs). These PDFs must integrate to 1 over the entire domain:

∫_X p(x) dx = 1    (2.1)

In this work, the mainly used density function is the normal or Gaussian distribution with mean µ and variance σ², abbreviated as N(x; µ, σ²), whose PDF is defined for a single random variable as:

p(X = x) = p(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²))    (2.2)
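As a quick numerical check, the density (2.2) can be evaluated directly; the following is a minimal sketch, not tied to any particular library:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate normal density N(x; mu, sigma^2), equation (2.2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# The density peaks at the mean and is symmetric around it:
peak = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)   # 1/sqrt(2*pi), about 0.399
```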


Figure 2.4: Discrete multivariate distribution for the robot 2D position.

For multivariate random variables, the normal distribution has the following PDF:

p(X = x) = p(x) = |2πΣ|^(−1/2) e^(−(1/2)(x−µ)ᵀ Σ⁻¹ (x−µ))    (2.3)

• Discrete random variables: These variables take values in a discrete domain and their probability distributions are represented by a discrete sum over the possible values of the random variable domain. Each value is bounded above by 1 and the probability distribution must sum to 1:

Σ_X p(x) = 1    (2.4)

An example of a discrete multivariate random variable might be the position of a robot in a 2D sampled space (grid), as in Figure 2.4.

Another common term in probability theory, used in this work, is the joint probability. The joint probability is the probability that two or more random variables simultaneously take given values in their respective domains, and is represented as follows:

p(X = x ∧ Y = y) = p(x ∧ y) = p(x, y)    (2.5)

Another important concept is absolute independence, which refers to a set of variables that are completely independent of each other,


and hence their joint probability factors into the product of the individual probabilities, as shown in the following equation:

p(x, y) = p(x) p(y)    (2.6)

2.2.1 Conditional probability

In robotics, it is sometimes necessary to express that one variable carries information about other random variables. For example, the measurements of a LIDAR sensor carry information about the robot's position. In those cases, these variables are said to be conditioned (e.g. the position of the robot is conditioned on the measurements of a LIDAR sensor since, depending on the position of the robot, the LIDAR sensor will return one set of range-bearing measurements or another). The probability that a random variable X takes a value x, conditioned on a second random variable Y taking the value y, is denoted as follows:

p(x|y) = p(X = x|Y = y) (2.7)

When p(y) > 0 the conditional probability is calculated as:

p(x|y) = p(x, y) / p(y)    (2.8)

Otherwise, for p(y) = 0, p(x|y) is considered undefined. On the other hand, if the random variables are independent, then the conditional probability p(x|y) has the same value whatever value Y takes, i.e. p(x|y) = p(x). This rule is derived from the property of absolute independence presented above (2.6):

p(x|y) = p(x, y) / p(y) = p(x) p(y) / p(y) = p(x)    (2.9)
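As an illustration of (2.8), conditional probabilities can be read off a joint table; the joint values below are hypothetical, chosen only for the example:

```python
# Hypothetical joint distribution over two binary random variables X and Y.
p_joint = {("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
           ("x1", "y0"): 0.2, ("x1", "y1"): 0.4}

def p_y(y):
    """Marginal p(y) obtained by summing the joint over x."""
    return sum(p for (_, yy), p in p_joint.items() if yy == y)

def p_x_given_y(x, y):
    """Conditional probability p(x|y) = p(x, y) / p(y), equation (2.8)."""
    py = p_y(y)
    if py == 0.0:
        raise ValueError("p(x|y) is undefined when p(y) = 0")
    return p_joint[(x, y)] / py

p = p_x_given_y("x1", "y1")   # 0.4 / 0.7
```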

Once these axioms of probability theory have been defined, the following rules (the ones most used in probabilistic robotics) can be derived from them:

1. Theorem of total probability: This property comes from the axioms of probability and rule (2.9), and is described by the following expressions for discrete and continuous random variables respectively:

p(x) = Σ_Y p(x|y) p(y)    (2.10)

p(x) = ∫_Y p(x|y) p(y) dy    (2.11)

The product p(x|y)p(y) in (2.10) and (2.11) is defined as 0 if either p(x|y) or p(y) is 0.
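The discrete rule (2.10) amounts to a weighted sum over the conditioning variable. A small sketch with an assumed door/sensor model (the numbers are illustrative, not from the text):

```python
# p(y): prior over a door being open or closed (assumed values).
p_y = {"open": 0.3, "closed": 0.7}
# p(x|y): probability that the sensor reports "door detected" (assumed model).
p_x_given_y = {"open": 0.9, "closed": 0.1}

# Theorem of total probability (2.10): p(x) = sum over y of p(x|y) p(y).
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)   # 0.9*0.3 + 0.1*0.7 = 0.34
```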


2. Chain rule: This property is derived from the basic conditional probability rule and allows the joint distribution of a set of random variables to be calculated using only conditional probabilities. To explain this rule, consider the joint probability of a set of random variables X1, ..., Xn. The chain rule allows this probability to be calculated as:

p(x1, ..., xn) = p(xn|xn−1, ..., x1) p(xn−1, ..., x1)

And, by applying this rule recursively, it can be written as:

p(∩_{k=1}^{n} xk) = ∏_{k=1}^{n} p(xk | ∩_{j=1}^{k−1} xj)    (2.12)

3. Bayes rule: This is the most important rule in probabilistic robotics, and in probabilistic inference in general, as it provides a way to calculate p(x|y) from its "inverse" conditional probability p(y|x). As in (2.8), this rule requires p(y) > 0. The rule for discrete and continuous random variables is as follows:

p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / Σ_{x′} p(y|x′) p(x′)    (2.13)

p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / ∫_{X′} p(y|x′) p(x′) dx′    (2.14)

In equations (2.13) and (2.14), since p(y)⁻¹ does not depend on the value x, this rule is often written in normalized form:

p(x|y) = η p(y|x) p(x)    (2.15)

In this notation, η refers to a normalization factor which avoids the explicit calculation of p(y) and implies that the result of equation (2.15) must be normalized so that it sums (or integrates) to 1.
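The normalized form (2.15) is how Bayes updates are usually coded: multiply the prior by the likelihood, then divide by the sum so the result is a proper distribution. A sketch over a discrete set of candidate states (the numbers are hypothetical):

```python
def bayes_update(prior, likelihood):
    """Normalized Bayes rule (2.15): posterior(x) = eta * p(y|x) * p(x)."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    eta = 1.0 / sum(unnormalized.values())
    return {x: eta * p for x, p in unnormalized.items()}

# Hypothetical case: two candidate robot positions and a measurement that is
# three times more likely at x1 than at x2.
prior = {"x1": 0.5, "x2": 0.5}
likelihood = {"x1": 0.6, "x2": 0.2}
posterior = bayes_update(prior, likelihood)   # {"x1": 0.75, "x2": 0.25}
```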

It is possible to condition the Bayes rule on more than one random variable; thus, for example, for two conditioning random variables Y and Z, the Bayes rule would be expressed as:

p(x|y, z) = p(y|x, z) p(x|z) / p(y|z) = p(z|x, y) p(x|y) / p(z|y)    (2.16)


4. Conditional independence: This rule extends independence to joint probabilities conditioned on a set of random variables, and its application is very similar to (2.6). If X and Y are conditionally independent given Z, then:

p(x, y|z) = p(x, y, z) / p(z) = p(x|y, z) p(y, z) / p(z) = p(x|y, z) p(y|z) = p(x|z) p(y|z)    (2.17)

where the second equality uses (2.8), the third uses (2.12), and the last step uses the conditional independence assumption p(x|y, z) = p(x|z).

Although the rules for absolute and conditional independence are similar, conditional independence does not imply absolute independence (2.6), nor vice versa. This means that two variables can be jointly conditionally dependent on a variable Z and at the same time be independent of each other (2.6). However, in some cases, both kinds of independence may hold simultaneously.

2.2.2 Expectation, variance and entropy

The algorithms used in probabilistic robotics require a set of statistics to be computed from probability distributions. The most important statistics in probabilistic robotics are the expectation, the covariance and the entropy. These statistics are described in the following subsections.

• Expectation: The expectation is the expected value of a random variable if the process is repeated infinitely, and is calculated as the weighted mean of all possible values of the random variable. This weighted mean is calculated for discrete and continuous random variables as follows:

E[X] = Σ_X x p(x)    (2.18)

E[X] = ∫_X x p(x) dx    (2.19)

An important property of the expectation is its linearity with respect to the random variables. Suppose a, b ∈ ℝ; then:

E[aX + b] = a E[X] + b    (2.20)

• Variance and covariance: The variance σ² is a single value which measures the expected squared deviation of a single random variable from the mean value obtained with the expectation statistic, whilst the covariance Σ is a matrix used for multivariate probabilities which captures not


only the variance of each individual random variable but also the correlation between each pair of variables. The following matrix represents the covariance matrix of two random variables X and Y:

Σ(X, Y) = E[(X − E[X])(Y − E[Y])]    (2.21)

Σ(X, Y) = [ σ²(x)    σ(x, y) ]
          [ σ(y, x)  σ²(y)  ]

The covariance between two variables, σ(x, y), indicates how one random variable is affected by a change in the other. Thus, if the covariance between two variables is positive, both variables change in the same way, but if it is negative, an increase in one random variable implies a decrease in the other. When two random variables are independent they are uncorrelated, which means that their covariance is zero. With these properties, one may notice that the variance is a special case of the covariance for a single random variable.

The intrinsic properties of a covariance matrix are:

1. Bilinearity: for constants a and b and random variables X, Y, Z: σ(aX + bY, Z) = a σ(X, Z) + b σ(Y, Z)

2. Symmetry: σ(x, y) = σ(y, x)

3. Positive semi-definiteness: σ²(x) = σ(x, x) ≥ 0 for every random variable X, and σ(x, x) = 0 implies that X is a constant random variable.

When a vector of random variables (a multivariate) x is transformed by a linear transformation A, the covariance matrix is transformed according to the following equation, which follows from the linearity of the expectation:

Σ(Ax) = A Σ(x) Aᵀ    (2.22)
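Equation (2.22) can be checked empirically with NumPy: draw samples with covariance Σ, transform them with A, and compare the sample covariance against AΣAᵀ. The matrices Σ and A below are arbitrary choices made only for the check:

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])   # assumed covariance of x
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])       # assumed linear transformation

# Draw zero-mean samples with covariance Sigma and transform them by A.
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100_000)
y = x @ A.T

# The empirical covariance of y approaches A Sigma A^T, equation (2.22).
Sigma_emp = np.cov(y.T)
Sigma_pred = A @ Sigma @ A.T
```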

• Entropy: The entropy H originates in information theory, and is a measure of unpredictability or information content:

H(X) = E[−log₂ p(X)]    (2.23)

(Note: entropy is not used by the algorithms described here, but the term is used throughout this document.)

This expectation resolves for discrete and continuous random variables as follows:

H(X) = −Σ_X p(x) log₂ p(x)    (2.24)


H(X) = −∫_X p(x) log₂ p(x) dx    (2.25)

The entropy is therefore a good statistic with which to estimate the gain of information when a robot takes a specific action. Higher values of entropy indicate higher uncertainty, so decision-making algorithms use this statistic to look for those actions that make the new entropy lower than the current one, i.e. that produce a gain of information.
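For instance, using (2.24), a uniform belief over four poses (the ambiguous LIDAR situation described earlier) carries log₂ 4 = 2 bits of entropy, while a peaked belief carries much less; a minimal sketch:

```python
import math

def entropy(p):
    """Discrete entropy H(X) = -sum of p(x) log2 p(x), equation (2.24)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)

# A uniform belief over 4 poses (maximum uncertainty) versus a peaked one.
h_uniform = entropy([0.25, 0.25, 0.25, 0.25])   # 2.0 bits
h_peaked = entropy([0.97, 0.01, 0.01, 0.01])    # much lower
```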

2.3 Bayesian filters

Before describing the general Bayes filter algorithm, it is necessary to introduce some basic concepts related to it. Then the general Bayes filter algorithm is introduced, before explaining Gaussian filters and non-parametric filters as implementations of Bayesian filters.

2.3.1 Concepts

One of the most important concepts related to Bayes filters is that of state. The state represents the current characteristics of the environment, including the robot's own characteristics, such as the position of the robot, the position of people around the robot, the weather or anything else that can affect the objective pursued by the robot. States can be composed of dynamic or static elements. Dynamic elements are those which change over time (e.g. people, other cooperative or non-cooperative robots, etc.), while static elements are those which do not change but are relevant to accomplishing the robot's mission (e.g. static radio emitters, walls, etc.). Objects which are distinct, stationary features of the environment, and hence can be recognized reliably, are called landmarks. A typical example of a landmark is the visual one shown in Figure 2.5 for an Augmented Reality (AR) application, but in this work radio emitters or ultrasonic sensors which transmit their RFID are considered landmarks too.

Following the Markov assumption, a complete state xt is a state which contains the necessary information about past events and states in order to predict future states stochastically. In that way, a complete state xt is independent of past events such as the measurements z0:t−1, the actions u0:t−1 and even the previous states x0:t−1. (The concept of complete state comes from the processes known as Markov chains.)

Measurement vectors z carry information relative to data perceived from the environment through any sensor, on board the robot or not. An example of a measurement vector might be a set of n range measurements zt = [r1, r2, . . . , rn]ᵀ, which is indeed the most common measurement vector used in this work. On the other hand, control vectors or motion vectors u carry information about changes produced in


Figure 2.5: Example of a visual landmark.

the state (people moving around, the motion of the robot, the motion of dynamic landmarks, etc.). An example of a motion vector might be the data provided by odometers in a ground mobile robot, ut = [v, w]ᵀ, where v is the linear velocity of the robot and w its angular velocity.

Finally, one of the most important concepts in probabilistic robotics is the concept of belief. The belief of a robot is defined as its internal knowledge of the current state, i.e. the belief represents the state of the robot with a probability distribution which is conditioned not only on what the robot perceives but also on what the robot does. As in [30], here bel(xt) is used to represent the belief over the current state, which is an abbreviation of:

bel(xt) = p(xt|z1:t,u1:t) (2.26)

2.3.2 State estimation

In probabilistic robotics, Bayes rule is the basic rule used in Bayesian filters and in probabilistic inference in general. In rule (2.13), X might be a random variable representing a quantity to be estimated and Y the data used to infer X. In that rule, p(x) is known as the prior distribution, which summarizes the current knowledge of X prior to incorporating the new data y. The probability p(x|y) is called the posterior probability distribution over X. The Bayes rule allows the posterior probability to be inferred through the probability p(y|x), which is often called the generative model. As explained before, the probability p(y) is not calculated explicitly; instead, equation (2.15) is used.

When updating the state from x0:t−1 to xt, the generative laws of probabilistic robotics are used to calculate the posterior p(xt|x0:t−1, z1:t, u1:t). But, as stated above, under the Markov assumption the posterior is independent of past states,


measurements and actions, since xt is complete and hence the posterior can beexpressed as:

bel(xt) = p(xt|zt,ut) (2.27)

To calculate this posterior, it is useful to first calculate a prior probability before incorporating the measurement zt; this probability, often referred to as the prediction and denoted bel⁻(xt), is calculated as:

bel⁻(xt) = p(xt|ut)    (2.28)

The process of calculating the posterior belief from this prior belief is called the correction or measurement update. A generic Bayesian filter algorithm for continuous states is detailed in Algorithm 2.1.

Algorithm 2.1: Generic Bayesian filter algorithm

Input: bel(xt−1), ut and zt

Output: bel(xt)

1 for all xt do

2 bel⁻(xt) = ∫ p(xt|xt−1, ut) bel(xt−1) dxt−1;

3 bel(xt) = η p(zt|xt) bel⁻(xt);

4 endfor

where bel⁻(xt) denotes the predicted belief and η the normalization factor of (2.15).

The algorithm shows how the prediction stage is calculated using the theorem of total probability (2.11), by integrating over the previous state the product of the motion model and the previous belief bel(xt−1). The prediction (2.28) depends on the motion model and the control values contained in the motion vector. The predicted belief is then used to integrate the measurement zt and obtain the posterior belief bel(xt). To calculate this posterior, the algorithm again uses the normalized Bayes rule (2.15), with the predicted belief bel⁻(xt) playing the role of the prior p(xt), as proved in [30]. The probability p(zt|xt) is called the measurement probability and, as can be seen, depends on the actual state xt, which is a common characteristic of hidden Markov models (HMMs) and dynamic Bayes networks (DBNs). This probability is calculated from the measurement model of the sensor employed for each kind of measurement.

Another important aspect is that, as one may notice, this is a recursive algorithm, since it depends on the belief at the previous step. This recursion implies that an initial belief bel(x0) must be given. The initial value of the belief is one of the major challenges in RO-SLAM and hence one of the major objectives of this work. The following sections explain different general strategies to set the initial belief and the means to obtain the posterior probability according to the Bayes filter of Algorithm 2.1.
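Algorithm 2.1 can be sketched for a discrete 1-D grid, where the integral of the prediction stage becomes a matrix-vector product and η becomes a division by the sum. The corridor, motion model and sensor model below are hypothetical choices made for illustration, not taken from the text:

```python
import numpy as np

def bayes_filter_step(belief, motion_kernel, likelihood):
    """One step of Algorithm 2.1 on a discrete grid.

    belief[j]           -- bel(x_{t-1} = j)
    motion_kernel[i, j] -- p(x_t = i | x_{t-1} = j, u_t)
    likelihood[i]       -- p(z_t | x_t = i)
    """
    prediction = motion_kernel @ belief      # total probability (2.11)
    posterior = likelihood * prediction      # unnormalized Bayes rule
    return posterior / posterior.sum()       # eta normalizes to 1

# Hypothetical 5-cell corridor: the commanded motion moves the robot one cell
# to the right with probability 0.8 and leaves it in place with probability 0.2
# (probability mass at the last cell simply stays there).
n = 5
motion = 0.2 * np.eye(n) + 0.8 * np.eye(n, k=-1)
motion[-1, -1] += 0.8

belief = np.full(n, 1.0 / n)                       # uniform initial belief
likelihood = np.array([0.1, 0.1, 0.9, 0.1, 0.1])   # assumed sensor model
belief = bayes_filter_step(belief, motion, likelihood)
# belief now peaks at cell 2, where the measurement is most likely
```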


2.3.3 Gaussian filters

Gaussian filters are the earliest and most common implementation of the Bayesian filter algorithm for continuous spaces. They are based on representing the belief by multivariate normal distributions (2.3). Gaussian filters thus represent the belief with the usual Gaussian parameters, i.e. the mean vector µ and the covariance matrix Σ. The mean µ is directly associated with the state vector x, while the covariance matrix Σ is a matrix with the quadratic dimensionality of the state vector x, formed as described in Section 2.2.2. The covariance Σ represents the degree of uncertainty of the belief for each parameter and has the advantage of keeping the correlation between the different parameters of the state vector. This representation is known as the moments parametrization, but there exists an alternative representation of these filters, known as the canonical parametrization, which is used in information filters. The transformation between the canonical and moments parametrizations is bijective. In this work the canonical representation is not used and will not be explained in detail. (The mean and covariance of a Gaussian distribution are the first and second moments of a probability distribution; all higher-order cumulants are zero for Gaussian distributions.)

However, a Gaussian filter requires its parameters (µ and Σ) to be initialized with an initial estimate and assumes that the posterior belief can be represented with a unimodal Gaussian distribution. This assumption does not fit many localization applications where the robot has to localize itself from scratch in an unknown environment (e.g. the global localization and kidnapped robot problems).

In the following sections, the Kalman Filter (KF) and the Extended Kalman Filter (EKF) are introduced, as they are among the most common and efficient implementations of Gaussian filters, and of Bayesian filters in general. The other implementations of Gaussian filters, not used in this work, are only mentioned without further detail.

The Kalman Filter - KF

The Kalman filter implements a Gaussian filter for continuous state spaces; discrete and hybrid state spaces cannot be used with this approach. This implementation is used for filtering or prediction employing linear Gaussian generative models. Hence, the generative models must be linear systems with added Gaussian noise.

For the motion model, this property is expressed by the following linear equation:

xt = At xt−1 + Bt ut + εt    (2.29)

which can be used to calculate p(xt|ut, xt−1) by embedding equation (2.29) in (2.3):

p(xt|ut, xt−1) = |2πRt|^(−1/2) e^(−(1/2)(xt − At xt−1 − Bt ut)ᵀ Rt⁻¹ (xt − At xt−1 − Bt ut))    (2.30)


Here, At and Bt are the matrices which characterize the motion model, with dimensions n×n and n×m respectively, n being the dimensionality of the state vector xt and m the dimensionality of the motion vector ut. This representation assumes that the motion model is a linear dynamic system with an added Gaussian noise εt, characterized by a zero mean vector of the same dimensionality as xt and a covariance represented by the matrix Rt.

For the observation model, linearity is expressed by the following equation:

zt = Ct xt + δt    (2.31)

and hence the measurement probability p(zt|xt) becomes:

p(zt|xt) = |2πQt|^(−1/2) e^(−(1/2)(zt − Ct xt)ᵀ Qt⁻¹ (zt − Ct xt))    (2.32)

where the matrix Ct represents a linear observation model with an added Gaussian noise δt, parametrized with a zero mean vector of the same dimensionality as zt and covariance Qt.

As described before, Gaussian filters require an initial belief and, in the case of Kalman filters, that means giving the initial belief as a vector with the initial expectation of the belief and a covariance matrix which represents the uncertainty of that initial expectation. Special care must be taken when initializing this kind of filter since a very large initial covariance (i.e. a large degree of initial uncertainty) might lead the filter to diverge and make the covariance matrix become inconsistent, making the filter fail in real implementations.

A general algorithm for the Kalman filter is shown in Algorithm 2.2 as pseudo-code. In that algorithm, given the current mean vector µ_{t−1} and covariance Σ_{t−1}, lines 1 and 2 represent the prediction stage of the algorithm, using the motion vector u_t. The parameters of the predicted Gaussian distribution b̄el(x_t) are µ̄_t and Σ̄_t. These lines implement Equation (2.30).

Algorithm 2.2: Kalman filter algorithm

Input: µ_{t−1}, Σ_{t−1}, u_t and z_t
Output: µ_t and Σ_t

/* Prediction stage */
1  µ̄_t = A_t µ_{t−1} + B_t u_t
2  Σ̄_t = A_t Σ_{t−1} A_t^T + R_t
/* Correction stage */
3  S = C_t Σ̄_t C_t^T + Q_t
4  K_t = Σ̄_t C_t^T S^{−1}
5  µ_t = µ̄_t + K_t (z_t − C_t µ̄_t)
6  Σ_t = (I − K_t C_t) Σ̄_t
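Algorithm 2.2 can be sketched directly in Python with NumPy. This is a minimal illustration of one predict/correct iteration; the matrices A, B, C, R and Q are supplied by the caller, and the concrete values used in any example are assumptions for illustration only:

```python
import numpy as np

def kalman_filter(mu, Sigma, u, z, A, B, C, R, Q):
    """One iteration of Algorithm 2.2 (linear Kalman filter)."""
    # Prediction stage (lines 1-2): propagate mean and covariance
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    # Correction stage (lines 3-6): incorporate the measurement z
    S = C @ Sigma_bar @ C.T + Q              # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)   # Kalman gain
    mu_new = mu_bar + K @ (z - C @ mu_bar)   # update with the innovation
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```

For instance, with a 2D constant-velocity state and a position-only measurement, each call moves the mean toward the measurement and shrinks the covariance along the observed dimension.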


Then, the posterior bel(x_t) is calculated in lines 3 through 6, the correction stage of the algorithm, incorporating the measurement vector z_t. The new parameters of the posterior Gaussian distribution bel(x_t) = p(x_t | u_t, z_t) are µ_t and Σ_t. The implementation of Equation (2.32) uses a matrix K_t, called the Kalman gain, which specifies the degree to which the new measurement vector is incorporated into the new state estimation. The mean vector is updated using the Kalman gain multiplied by the difference between the received measurement and the expected one (this difference is called the innovation).

The computational complexity of this algorithm depends on the number of parameters of the state vector and the number of parameters of the measurement vector. The solution is quite efficient and, in fact, there are different newer versions of the same algorithm which reduce its complexity for certain sparse updates.

The Extended Kalman Filter - EKF

The extended Kalman filter is an adaptation of the previous Kalman filter, specially designed for non-linear systems, which are the most common in real applications. Here, the motion and/or the observation model are supposed to be non-linear and hence linearisation techniques are employed to approximate the belief by a Gaussian distribution. The difference between KF and EKF is that, while in KF the Gaussian distribution was exact, in EKF the Gaussian distribution is approximated. Now, the new generative models are:

x_t = g(u_t, x_{t−1}) + ε_t   (2.33)

z_t = h(x_t) + δ_t   (2.34)

The matrices A_t and B_t have been replaced by the non-linear function g(u_t, x_{t−1}), and h(x_t) replaces the matrix C_t. The linearisation of g(u_t, x_{t−1}) and h(x_t) is often done with a method called (first order) Taylor expansion, which consists of calculating the Jacobian of g(u_t, x_{t−1}) at the value x_{t−1} = µ_{t−1}, and of h(x_t) at the value x_t = µ̄_t, to approximate both functions:

G_t = g′(u_t, µ_{t−1}) = ∂g(u_t, x_{t−1}) / ∂x_{t−1} |_{x_{t−1} = µ_{t−1}}   (2.35)

H_t = h′(µ̄_t) = ∂h(x_t) / ∂x_t |_{x_t = µ̄_t}   (2.36)

Then, the approximation of the functions g and h around the points µ_{t−1} and µ̄_t respectively is:

g(u_t, x_{t−1}) ≈ g(u_t, µ_{t−1}) + G_t (x_{t−1} − µ_{t−1})   (2.37)

h(x_t) ≈ h(µ̄_t) + H_t (x_t − µ̄_t)   (2.38)

The implementation of the extended Kalman filter is quite similar to the linear one, as shown in Algorithm 2.3.


Algorithm 2.3: Extended Kalman filter algorithm

Input: µ_{t−1}, Σ_{t−1}, u_t and z_t
Output: µ_t and Σ_t

/* Prediction stage */
1  µ̄_t = g(u_t, µ_{t−1})
2  Σ̄_t = G_t Σ_{t−1} G_t^T + R_t
/* Correction stage */
3  S = H_t Σ̄_t H_t^T + Q_t
4  K_t = Σ̄_t H_t^T S^{−1}
5  µ_t = µ̄_t + K_t (z_t − h(µ̄_t))
6  Σ_t = (I − K_t H_t) Σ̄_t
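Algorithm 2.3 differs from the linear case only in that the models g and h are evaluated directly, while their Jacobians G_t and H_t replace A_t and C_t. A minimal Python sketch, where the model functions and their Jacobians are supplied by the caller (the example functions in any usage are assumptions, not part of the original text):

```python
import numpy as np

def extended_kalman_filter(mu, Sigma, u, z, g, h, G_fn, H_fn, R, Q):
    """One iteration of Algorithm 2.3 (extended Kalman filter).
    g, h are the non-linear motion/observation models; G_fn, H_fn
    return their Jacobians evaluated at the given mean."""
    # Prediction stage: propagate the mean through g, covariance through G_t
    mu_bar = g(u, mu)
    G = G_fn(u, mu)
    Sigma_bar = G @ Sigma @ G.T + R
    # Correction stage: linearize h around the predicted mean
    H = H_fn(mu_bar)
    S = H @ Sigma_bar @ H.T + Q
    K = Sigma_bar @ H.T @ np.linalg.inv(S)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_bar
    return mu_new, Sigma_new
```

Note that the filter only ever evaluates g and h at single points, so any differentiable model can be plugged in without modifying the filter itself.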

Other implementations

In addition to the Kalman filter and the extended Kalman filter, there are other implementations of the Gaussian filter that in general are less than or equally efficient compared to Kalman filters (KF and EKF). Some of these implementations are based on different linearization methods, like moments matching (the assumed density filter, ADF), which calculates the linearization in a way that preserves the true mean and covariance of the posterior distribution. The problem of this filter is that, despite the approximation to the true mean and covariance of the posterior distribution, it makes some assumptions on the moments equations which are inconsistent in general, and a bad selection of the number of moments equations may lead to such inconsistencies, as explained in [7].

Another common implementation of the Gaussian filter is known as the Unscented Kalman Filter (UKF). This algorithm originates from the unscented transform as a method of linearization. The method uses a set of so-called sigma points in order to sample deterministically the space around the mean of the current Gaussian distribution, assigning weights to these sigma points to predict the new mean and covariance for each new motion vector received. The non-linear function g is then applied to each sigma point, checking how this function changes the shape of the Gaussian. To compute the predicted observation, the same process of linearization from sigma points is applied to the observation model. The method has the advantage of not requiring the computation of the Jacobians of the non-linear functions g and h, and it tends to provide better results than the EKF. The UKF propagates the PDF in a simple and effective way and it is accurate up to second order in estimating mean and covariance [21]. In addition, its computational complexity is virtually the same as that of the extended Kalman filter. The drawback of this method is that of all Gaussian filters: they only model uni-modal distributions and hence do not fit other non-Gaussian distributions.
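The core of the UKF, the unscented transform, can be sketched as follows. This is a minimal illustration of the transform itself, not the full filter; the scaling parameters α, β and κ follow the commonly used scaled formulation and are assumptions, not taken from the original text:

```python
import numpy as np

def unscented_transform(mu, Sigma, g, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mu, Sigma) through a non-linear function g
    using 2n+1 deterministically chosen sigma points."""
    n = len(mu)
    lam = alpha ** 2 * (n + kappa) - n
    # Matrix square root of the scaled covariance
    sqrt_S = np.linalg.cholesky((n + lam) * Sigma)
    # Sigma points: the mean plus symmetric perturbations along sqrt_S columns
    points = [mu] + [mu + sqrt_S[:, i] for i in range(n)] \
                  + [mu - sqrt_S[:, i] for i in range(n)]
    # Weights for reconstructing the mean (w_m) and covariance (w_c)
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1 - alpha ** 2 + beta)
    # Push each sigma point through g and recombine
    Y = np.array([g(p) for p in points])
    mu_y = w_m @ Y
    diff = Y - mu_y
    Sigma_y = (w_c[:, None] * diff).T @ diff
    return mu_y, Sigma_y
```

For a linear g the transform is exact, which is a convenient sanity check: propagating N(µ, Σ) through g(x) = 2x must return mean 2µ and covariance 4Σ.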


Extended Information Filters (EIF), presented in [27], are an equivalent implementation of the EKF which employs the canonical parametrization of a Gaussian instead of the moments representation used by Kalman filters (likewise, Information Filters (IF) are the equivalent implementation of the KF for the canonical parametrization). The canonical representation is composed of an information matrix Ω = Σ^{−1} and an information vector ξ = Σ^{−1} µ. One of the main advantages of this filter is that complete uncertainty about the initial state can be represented by simply setting the values of the information matrix to zero. Furthermore, the extended information filter tends to be more stable numerically than the EKF. However, the EIF needs to recover the mean value of the moments representation to apply the non-linear functions to the current state. Although the information filter is inefficient compared to the EKF for higher dimensions of the state vector, some solutions have been proposed for the case where the matrices to be inverted in the information filter are sparse.
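The relation between the two parametrizations can be made concrete with a small sketch (illustrative helper names; note that the recovery step is exactly the inversion the EIF must perform before applying g and h, and that the Ω = 0 "complete uncertainty" case is not invertible):

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Moments (mu, Sigma) -> canonical (xi, Omega) parametrization."""
    Omega = np.linalg.inv(Sigma)   # information matrix
    xi = Omega @ mu                # information vector
    return xi, Omega

def to_moments(xi, Omega):
    """Canonical -> moments; needed by the EIF to evaluate g and h."""
    Sigma = np.linalg.inv(Omega)
    return Sigma @ xi, Sigma
```

The round trip to_canonical followed by to_moments recovers the original (µ, Σ), which is what makes the two filters equivalent representations of the same Gaussian belief.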

Finally, to overcome the problem of multi-modal distributions, some authors have proposed semi-parametric, multi-hypothesis algorithms which can deal with these situations [10, 25]. The multi-hypothesis solution for Kalman filters is known as the Gaussian Mixture Model (GMM). This solution will be presented later as a part of the solution proposed for RO-SLAM.

2.3.4 Non-parametric filters

Unlike the previous Gaussian filters, non-parametric filters are not based on a function to describe the probability distribution; instead, this kind of filter uses a set of ordered or random samples which cover totally or partially the state space by sampling it. Hence the number of parameters of these filters is not fixed, since the algorithm developed can use the number of samples necessary to solve each particular problem. Moreover, the efficiency of these algorithms relies on the number of parameters/samples used, so that the larger the number of parameters used, the greater the precision obtained and the greater the amount of computational resources required.

In Figure 2.6, an example of a normal distribution is shown with three different representations. The first one (Figure 2.6a) uses the classical continuous 2D Gaussian bell (2.2), which is a probability distribution used in Gaussian filters. The second (Figure 2.6b) and third (Figure 2.6c) representations are sample-based representations typical of non-parametric filters.

In the examples shown in Figure 2.6, although the probability distribution represented by these sample-based representations is based on a Gaussian distribution, one of the main advantages of non-parametric representations is their ability to represent multi-modal probability distributions, as is the case of Figure 2.7a, and other non-Gaussian distributions like the one in Figure 2.7b.


Figure 2.6: Different representations of a 2D Gaussian distribution. (a) Continuous 2D Gaussian; (b) grid-based Gaussian; (c) particle-based Gaussian.

A common improvement of non-parametric filters, whose number of parameters can be selected, is the use of adaptive techniques. Adaptive techniques in non-parametric filters try to adapt the number of required parameters in the initialization phase, or even during the execution of the algorithm as the distribution of samples concentrates in a certain area of the state space, in order to reduce the computational resources of the method.

The Particle Filter - PF

As stated in [28], Sequential Monte Carlo methods (also known as particle filters) are used for filtering and smoothing in general state-space models. These methods are based on importance sampling, where each sample is called a particle. This method, as other non-parametric methods, approximates the posterior bel(x_t) = p(x_t | u_t, z_t) by a finite number of particles. The main difference with respect to other non-parametric filters is that, in particle filters, particles are generated randomly from the posterior bel(x_t). But, as they are non-parametric, the amount of distributions that they can represent is greater than the ones represented by Gaussian filters (even for the case of multi-hypothesis Gaussian filters). As it is a sample-based filter, they are able to model non-linear transformations without having to approximate the non-linear model with linearization techniques, as is the case of most of the Gaussian filters explained above.


Figure 2.7: Typical probability distributions in non-parametric filters. (a) Multi-modal distribution; (b) non-Gaussian distribution.

Particles are a set P_t of concrete instantiations of the state at a particular instant t, and they are denoted as:

P_t = { p_t^[1], p_t^[2], ..., p_t^[N] }   (2.39)

Where N is the number of particles, which is usually a large number, often higher than 500. With this representation, the particles p_t^[n] can be seen as hypotheses of the state of the true world at time t. The likelihood of a particle is represented by a weight ω_t^[n] associated to each particle p_t^[n]; the weights must sum up to 1 (i.e. Σ_{n=1}^N ω_t^[n] = 1). The probability for a state hypothesis x_t to be included in the particle set P_t should be proportional to the posterior bel(x_t):

p_t^[n] ∼ p(x_t | z_t, u_t)   (2.40)

Hence, when the density of hypotheses p_t^[n] in a region of the state space is very high, the real state is likely to fall into this region.

The first particle filter implementation [17], which is based on the Sequential Importance Resampling (SIR) method, is presented in Algorithm 2.4. In this algorithm the initialization stage has been omitted. The initialization stage


consists of initiating N particles with a prior distribution which is assumed to be similar to the real one.

Algorithm 2.4: General particle filter algorithm

Input: P_{t−1}, u_t and z_t
Output: P_t

/* Prediction stage */
1  for all p_{t−1}^[n] in P_{t−1} do
2    Draw p_t^[n] ∼ p(x_t | u_t, p_{t−1}^[n])
     /* Importance factor */
3    ω_t^[n] = p(z_t | p_t^[n])
4  end for
/* Normalize weights */
5  ω_t^[n] = ω_t^[n] / Σ_{i=1}^N ω_t^[i]
/* Resampling */
6  for i = 1 ... N do
7    Draw p_t^[i] according to p(p_t^[i] = p_t^[j]) = ω_t^[j], for j = 1 ... N
8  end for

The first part of the algorithm is the prediction stage, which is aimed at drawing a new set of particles P_t of the same size as the particle set P_{t−1}, according to the probability p(x_t | u_t, p_{t−1}^[n]). This prediction stage consists mainly of the application of the motion model to each particle of the set P_{t−1}, using the new motion vector u_t. The correction stage is divided into two parts: the calculation of the importance factor, and the step of importance sampling, more commonly known as the resampling step. The resampling step is the most important step of the particle filter algorithm.

The first step of the correction stage, known as the calculation of the importance factor, is mainly focused on the update of the weight ω_t^[n] of each particle with the probability p(z_t | p_t^[n]). Then, the new weights must be normalized in order to apply the importance sampling step. Once the predicted belief b̄el(x_t) and the weights are updated according to the observation model of the measurements, the resampling step calculates the posterior bel(x_t) = η p(z_t | p_t^[n]) b̄el(x_t) by drawing a new set of random particles P_t of size N. The basic sequential importance resampling (SIR) method consists of choosing N random numbers and selecting the particles which correspond to these random numbers. The resulting particle set usually contains many duplicates, since the particles are drawn with replacement, as line 7 of Algorithm 2.4 shows. Different implementations of the resampling step are presented in [30], which improve the variance of the weights, making the filter more robust to different situations.


Finally, at the end of each iteration, the expected state can be calculated as follows:

x̄_t = E[bel(x_t)] = E[P_t] = Σ_{n=1}^N ω_t^[n] p_t^[n]   (2.41)
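Algorithm 2.4, together with the expected-state computation of Equation (2.41), can be sketched as follows. This is a minimal SIR sketch for a one-dimensional state; the motion sampler and measurement likelihood are supplied by the caller, and the example models used in any test are assumptions for illustration:

```python
import numpy as np

def particle_filter(particles, u, z, motion_sample, meas_likelihood, rng):
    """One iteration of Algorithm 2.4 (SIR particle filter).
    motion_sample(u, p, rng) samples from p(x_t | u_t, p);
    meas_likelihood(z, p) evaluates p(z_t | p)."""
    N = len(particles)
    # Prediction stage (lines 1-2): propagate each particle through the motion model
    predicted = np.array([motion_sample(u, p, rng) for p in particles])
    # Importance factors (line 3), then normalization (line 5)
    w = np.array([meas_likelihood(z, p) for p in predicted])
    w /= w.sum()
    # Resampling (lines 6-8): draw N particles with replacement,
    # each with probability equal to its normalized weight
    idx = rng.choice(N, size=N, p=w)
    resampled = predicted[idx]
    # Expected state (Equation 2.41); after resampling the weights are
    # uniform, so the weighted sum reduces to the sample mean
    x_hat = resampled.mean(axis=0)
    return resampled, x_hat
```

With a uniform prior and a Gaussian measurement likelihood centred on the true state, a single iteration already concentrates the particle set around the measurement.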

Other implementations

In addition to the particle filter, a set of different approaches can be found in the literature. One of these algorithms is the histogram filter. The histogram filter decomposes the state space into finitely many regions and represents the cumulative posterior for each region by a single probability value. When applied to finite state spaces, the filter is known as the discrete Bayes filter; when applied to continuous state spaces, the algorithm is called the histogram filter.

An example of a discrete Bayes filter is the occupancy grid map algorithm [15]. This algorithm decomposes the finite state space into regions which take values in a binary domain. When the state does not change over time, there is another algorithm called the binary Bayes filter with static state [30].

The histogram algorithm is a continuous state estimator which, as described above, decomposes the state space into a finite set of regions. Hence the accuracy of this method depends on the granularity employed in the decomposition of the state. For this reason, multiple decomposition methods are proposed in the literature [30], where one of the most adaptive and efficient methods is known as the density tree decomposition. The advantage of dynamic decomposition methods such as the density tree decomposition is that they are able to achieve a higher approximation quality with the same number of regions used in a static decomposition method. In histogram filters, each cell of the grid has an associated uniform probability distribution, similar to the weight associated to each particle in the particle filter, so that each state value has a probability p(x_t) = p_{k,t} / |x_{k,t}|, where p_{k,t} is the probability of a region of the histogram and |x_{k,t}| is the volume of the region. An example of a Gaussian distribution formed with a histogram filter is depicted in Figure 2.6b.

2.4 Summary and conclusions

Probabilistic robotics offers a different solution from classical perception techniques, which are based on methods to extract the optimal solution from a set of observations and motion commands. As the dynamic and observation models employed are not exact, probabilistic robotics treats robotic systems as stochastic processes where motion and observation information are modelled with an additive noise which usually follows a zero mean normal distribution. The algorithms of probabilistic robotics are mainly Bayesian estimators based on the Bayes rule. The basis of the Bayes rule, as well as other basic probabilistic terms and rules, have been introduced in this chapter. Another important term introduced


in this chapter is the Markov assumption. The Markov assumption is a characteristic by which the current state is considered complete, i.e. all past events and states are summarized in the current state, and hence future states can be estimated from this state. Processes which follow this assumption are known as Markov chain processes.

The basis of probabilistic theory, and especially the Bayes rule, are the main tools employed in Bayes filters, as shown in this chapter. The chapter describes a generic Bayes filter on which all probabilistic algorithms considered in this work are based. This generic algorithm is composed of two stages: the prediction stage and the correction stage. The prediction stage computes a prior distribution of the state belief b̄el(x_t) = p(x_t | u_t, x_{t−1}), employing the motion information received u_t and the previous state x_{t−1}. On the other hand, the correction stage is aimed at incorporating the measurement information at time t into the filter from the prior distribution, so that the posterior is bel(x_t) = η p(z_t | x_t) b̄el(x_t).

Based on this generic Bayes filter, the chapter introduces some Bayesian filters, divided into Gaussian filters and non-parametric filters. As commented in this chapter, Gaussian filters are nowadays one of the most implemented approaches because of the intrinsic nature of motion and measurement models, which tend to follow a Gaussian distribution. The Kalman Filter (KF) for linear models, and its extended version (EKF) for non-linear models, have been described as examples of Gaussian filters. These implementations have the main advantage of being efficient Bayesian filters compared with other non-parametric filters, which do not scale very well with the size of the state vector. Other Gaussian filters are also mentioned without giving many details.

Although Gaussian filters usually fit well with most observation and dynamic models, there are other models which do not follow a Gaussian distribution and, instead, follow a multi-modal distribution which might be modelled with a non-parametric filter. For non-Gaussian distributions and other multi-modal distributions, a particle filter is described in this chapter as an example of the most widely used non-parametric filter. This filter adapts better to these models and might perform better than other solutions when employing a high density of particles. The random characteristic of this particle filter makes it more adaptive than other non-parametric filters but, at the same time, this randomness might cause some issues related with the variance of the distribution in some situations, as described in [30].

In this work, two approaches are used: one of them uses a particle filter to model the multi-modality of the range-only observation model, and the other is based on a semi-parametric model called the Gaussian Mixture Model approach, which builds on multiple-hypothesis Gaussian filters.
