Auton Robot
DOI 10.1007/s10514-014-9386-z
Feature based graph-SLAM in structured environments
P. de la Puente · D. Rodriguez-Losada
Received: 19 November 2012 / Accepted: 24 January 2014
© Springer Science+Business Media New York 2014
Abstract Introducing a priori knowledge about the latent structure of the environment in simultaneous localization and mapping (SLAM) can improve the quality and consistency of its solutions. In this paper we describe and analyze a general framework for the detection, evaluation, incorporation and removal of structure constraints into a feature-based graph formulation of SLAM. We specifically show how including different kinds and levels of features in a hierarchical manner allows the system to easily discover new structure and why it makes more sense than other possible representations. The main algorithm in this framework follows an expectation maximization approach to iteratively infer the most probable structure and the most probable map. Experimental results show how this approach is suitable for the integration of a large variety of constraints and how our method can produce consistent maps in regular environments.
Keywords Mobile robots · Mapping · SLAM · Structured environments
P. de la Puente (B) · D. Rodriguez-Losada
ETSI Industriales, Universidad Politecnica de Madrid, c/Jose Gutierrez Abascal, 2, 28006 Madrid, Spain
e-mail: email@example.com

D. Rodriguez-Losada
e-mail: firstname.lastname@example.org

1 Introduction

Most human-designed scenarios present regular geometrical properties that should be preserved in the maps built and used by mobile robots aiming at performing safe autonomous or semi-autonomous navigation. If some information about the layout characteristics of the operation environment is available, its proper representation and application may lead to much more meaningful models and significantly improve the accuracy of the resulting maps. Indeed, when simultaneous localization and mapping (SLAM) is formulated in a Bayesian framework (Thrun et al. 2005), making use of a prior for the environment based on its structure fits very well, appearing as a need that comes up naturally (de la Puente and Censi 2012). Also, human cognition exploits domain knowledge to a large extent, usually employing a priori assumptions for the interpretation of situations and environments.
There are several different probabilistic SLAM techniques to which one can apply this perspective. This article concerns the so-called graph-based approach, which has recently become very popular (Olson et al. 2006; Grisetti et al. 2007b; Olson 2008; Grisetti et al. 2010b; Kümmerle et al. 2011; Kaess et al. 2008). Other SLAM solutions can be found in the literature (Durrant-Whyte and Bailey 2006; Bailey and Durrant-Whyte 2006).
In general, the graph formulation of SLAM seeks to find a maximum probability configuration for a sequence of robot poses and a set of observed features (Grisetti et al. 2010a). Several previous methods (Olson et al. 2006; Grisetti et al. 2007b, 2009) marginalize out the features and leave only the trajectory, with robot poses represented as nodes in a graph and constraints between those nodes given by rigid body transformations and their associated covariances. Some constraints are usually created from odometry data, while reobserving features in the environment allows constraints between non-successive poses to be added. The generation of graph constraints is referred to as the SLAM front-end, while the subsequent minimization of the error of those constraints is known as the SLAM back-end.
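As a rough illustration of the pose-only formulation described above, the following sketch computes the residual of a single rigid body constraint between two 2D poses. The function names and the (x, y, theta) parametrization are assumptions made for this example; they are not taken from the cited methods.

```python
import math

def normalize_angle(a):
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def relative_pose(xi, xj):
    """Pose xj expressed in the frame of pose xi; poses are (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = math.cos(xi[2]), math.sin(xi[2])
    return (c * dx + s * dy, -s * dx + c * dy, normalize_angle(xj[2] - xi[2]))

def constraint_error(xi, xj, u_ij):
    """Residual between the predicted relative pose and the measured one u_ij."""
    f = relative_pose(xi, xj)
    return (f[0] - u_ij[0], f[1] - u_ij[1], normalize_angle(f[2] - u_ij[2]))

# Hypothetical data: odometry reports 1 m forward and a 90 degree turn,
# while the current estimate of the second pose is 0.1 m off sideways.
x0 = (0.0, 0.0, 0.0)
x1 = (1.0, 0.1, math.pi / 2)
odometry = (1.0, 0.0, math.pi / 2)
error = constraint_error(x0, x1, odometry)
```

In a pose graph, a back-end would adjust the poses so that residuals of this kind, weighted by the constraint covariances, become as small as possible.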
Recently, it has been noted that the marginalization of features may not be the best option for dealing with prior information (Trevor et al. 2010; Parsley and Julier 2011). In particular, approaches of this kind are not well suited for handling or exploiting the possible existence of structure in the environment. In this paper we study how to manage the representation of structure in a hierarchy of nodes and constraints, with different levels of abstraction, marginalizing out poses instead of features. Only the last robot pose and the features belong to the graph. Poses are marginalized out at each iteration, with some approximations that work well in practice.
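To make the idea of marginalizing out a pose concrete, the sketch below eliminates one variable from a Gaussian expressed in information form using the Schur complement. This is the textbook exact marginalization; the approximations mentioned above are not reproduced here. The function name and the toy 1D problem are assumptions for illustration only.

```python
def marginalize_first(info_matrix, info_vector):
    """Eliminate variable 0 from a Gaussian in information form (Schur complement)."""
    pivot = info_matrix[0][0]
    n = len(info_vector)
    reduced_matrix = [
        [info_matrix[i][j] - info_matrix[i][0] * info_matrix[0][j] / pivot
         for j in range(1, n)]
        for i in range(1, n)
    ]
    reduced_vector = [
        info_vector[i] - info_matrix[i][0] * info_vector[0] / pivot
        for i in range(1, n)
    ]
    return reduced_matrix, reduced_vector

# Toy 1D problem over [pose, feature_1, feature_2] (all values hypothetical):
# prior pose = 0, feature_1 - pose = 2, feature_2 - pose = 5, unit information each.
info_matrix = [[3.0, -1.0, -1.0],
               [-1.0, 1.0, 0.0],
               [-1.0, 0.0, 1.0]]
info_vector = [-7.0, 2.0, 5.0]

reduced_matrix, reduced_vector = marginalize_first(info_matrix, info_vector)
```

The reduced system still has its mean at feature_1 = 2 and feature_2 = 5, but the two features are now directly correlated through the eliminated pose, which is why marginalization tends to make the remaining graph denser.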
In the paper, we show some advantages of this new representation method and present an account of the features and constraints we use, with some hints on several problems that may be encountered. The nodes in our graph are features created at different levels. Structure constraints are detected from the current state and a graph minimization is performed iteratively, following a novel expectation maximization (EM) (Dempster et al. 1977) based approach. These constraints are weighted according to their reliability, and constraints considered incorrect can be detected and removed. With this perspective, the SLAM front-end and back-end stages can benefit from each other's results. These are the main contributions of the paper.
The paper is organized as follows. After discussing related work in Sect. 2, the mathematical basis and the derivation of our EM based method to incorporate structure are explained in Sect. 3. Then, in Sect. 4, we describe the features and constraints we use and present our method for hierarchically representing the structure of the environment within a feature based graph SLAM approach. In Sect. 5 we describe the incremental graph building process and the structure detection algorithms that we use to create structure features and constraints, along with the outline of the proposed general mapping algorithm. Sect. 6 contains our experimental results, comparisons and analysis and, finally, in Sect. 7, we summarize our conclusions and future work.
2 Related work
Using prior knowledge about the structure of the environment seems to be promising in robotics. For path planning in unknown but structured environments, Dolgov and Thrun (2008) propose a Markov-random-field model to detect the local principal directions of the environment. They extract lines using computer vision morphological operations and the Hough transform, but their approach is independent of the method used to obtain the features. In the SLAM context, depending on the underlying estimation technique, different approaches have been followed. Chong and Kleeman (1997) apply collinearity constraints to enhance the state estimation given by an Unscented Kalman Filter. Rodríguez-Losada et al. (2006) reduce the linearization errors introduced by the extended Kalman filter (EKF) by imposing parallelism and orthogonality constraints between segments within the SPMap (Castellanos et al. 1999) framework. Nguyen et al. (2007) build accurate simplified plane based 3D maps with a fast and lightweight solution based on parallelism and perpendicularity assumptions. Enforcing a priori known relative constraints also leads to consistency and efficiency improvements for particle filters, as shown by Beevers and Huang (2008).
The use of prior knowledge in graph-based approaches has been exploited by Kümmerle et al. (2011) to achieve global consistency of outdoor maps by introducing association constraints for the correspondences detected between 3D scans and publicly available aerial images. Also, Karg et al. (2010) propose a solution for SLAM in structured multi-story buildings, adding global alignment constraints between corresponding places at different floors.
Parsley and Julier (2010) further study the challenges of employing prior information in SLAM. Their work presents some similarities to ours, as they also consider a hierarchy of structure elements, yet theirs is oriented towards part-of relationships and not as much towards pure geometrical relations between features. They propose a probabilistic framework to transfer information between different map representations, for the utilization of generic prior knowledge. Their implementation is EKF based.
Trevor et al. (2010) take an approach closely related to ours, too. They propose to incorporate virtual measurements to create constraints between features in a graph, employing the square root SAM algorithm of Dellaert (2005) and the M-space feature representation proposed by Folkesson et al. (2007). We deem our framework more flexible, as it integrates the iterative detection, hierarchical representation and rearrangement of structure. Their tests and conclusions suggest that several sensors or preprocessing techniques are needed in order for the alignment of points to be inferred, whereas our system can work with data from a single sensor or from different sensors, with high level features or with data not preprocessed at all. The hierarchical representation method allows different types of structure elements to be added or removed at different levels. Furthermore, this method allowed us to develop a considerably larger variety of constraints and to conduct more complex experiments, which in turn revealed some possible problems that to our knowledge have not been tackled before.
A more recent work by Parsley and Julier (2011) also presents important similarities to the approach we propose. Their method integrates a prior map into graph SLAM, with data association between features in each map inferred by means of an approximation to EM. This work is another one of the very few which consider the need to establish constraints between features. The main difference between the two approaches is that we detect the high level features defining the structure by estimating a hierarchy of structure elements, without needing a prior map. We believe there is value in this contribution, as a prior map may not always be available in SLAM and it may not be needed if the environment is indeed structured. They present one experiment in which planes are obtained from point clouds and constrained to their counterparts in the prior map. The key novelty of our approach lies in the model selection process over a wide set of priors by using EM. Actually, the method we propose could be used for creating different kinds of prior maps.
The standard graph-based approach originated with the work by Lu and Milios (1997). They presented a brute-force method for globally refining a map by minimizing the error in a network of pose relations. Thrun and Montemerlo (2006) included landmarks in the graph, together with the robot poses, and proposed applying variable elimination to marginalize out the features. Since then, both the construction (front-end) and the minimization (back-end) of the graph have been addressed in different ways.
Olson et al. (2006), for instance, developed an efficient stochastic gradient descent based optimization engine that allowed large pose graphs to be corrected even with poor initial estimates. Grisetti et al. (2007b) presented an extension of this method, introducing a novel tree parametrization of the nodes. In more recent work, Grisetti et al. (2010b) solved the minimization system with a sparse solver package named CSparse (Davis 2006), which is used in this work too. Olson (2008) also proposed a front-end based on a robust place recognition algorithm. Kümmerle et al. (2011) introduced g2o, an open-source C++ general and efficient framework for optimizing graph-based nonlinear error functions. Their approach can be easily applied to a large variety of problems, including 2D SLAM with landmarks, and it allows the user to interchange different solvers. Carlone et al. (2011) recently proposed and theoretically analyzed a linear approximation for graph-based SLAM with robot poses.
The full SLAM problem is equivalent to the Structure from Motion problem studied by the computer vision research community. In this context, Gallup et al. (2010) use piecewise planar priors for 3D scene reconstruction in the presence of non-planar objects. They apply a minimization approach involving a data term and a smoothness term, where a classifier learned from hand-labeled planar and non-planar image regions is employed. In general, the concept of priors in SLAM is highly related to the concept of regularizers in optimization. Newcombe et al. (2011), for instance, propose a regularized cost that promotes smooth reconstruction in dense visual SLAM. In a novel approach to mapping with dense data, the prior is a regularization required to make the problem well posed (de la Puente and Censi 2012). The work presented in this paper does not use a prior fully specified in advance. Yet, once the structure is inferred at each iteration, its contribution to the error function could be considered similar to a regularizer.
3 Structure and map inference with expectation-maximization (EM)
Our graph notation is similar to that used by Olson et al. (2006) and Grisetti et al. (2007b):

- $x$ is the state vector representing the nodes.
- A function $f(x_i, x_j) = f_{ij}(x)$ represents the constraint equations between nodes $i$ and $j$, with expected values $u_{ij}$ and variances $P_{ij}$.
- The error value at a given estimate for $x_i$ and $x_j$ is defined as $e_{ij}(x) = f_{ij}(x) - u_{ij}$.
The cost, or negative log probability of the node positions, in vectorial form, is given by:

$$-\ln p(x) \propto e^T(x)\, P^{-1} e(x) \qquad (1)$$

In our case, we propose to distinguish between two different types of constraints:
- We denote the error values due to structural constraints imposed between features $i$ and $j$ by $e_{s_{ij}}$ and the corresponding covariance by $P_{s_{ij}}$.
- The rest of the constraints come from the measurements. The corresponding error values will be denoted $e_{z_{ij}}$, with covariance $P_{z_{ij}}$.
A prior for the structure of the environment could be given in advance, e.g. the points are aligned and the lines are either parallel or orthogonal to each other; or, the points should belong to circles and the circles ought to be concentric; and so on. These priors could be defined in different ways; a good choice would be to use a domain-specific language that the system could interpret, as proposed by de la Puente and Censi (2012) for a general approach that keeps a common dense representation close to the measurement space while retaining the expressivity of methods employing geometric primitives.
With the graph approach and notation described earlier, considering that the involved densities are Gaussian, it seems intuitive that the cost function to minimize, put in vectorial form as in Eq. 1, would be:

$$e_z^T(x)\, P_z^{-1} e_z(x) + e_s^T(x)\, P_s^{-1} e_s(x) \qquad (2)$$
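The two-term cost of Eq. 2 can be evaluated with a few lines of code. The sketch below assumes, purely for simplicity, that both covariances are diagonal, so each quadratic form reduces to a sum of squared errors divided by variances; the residual values are hypothetical.

```python
def quadratic_cost(errors, variances):
    """e^T P^{-1} e for a diagonal covariance P given by its variances."""
    return sum(e * e / v for e, v in zip(errors, variances))

def total_cost(e_z, var_z, e_s, var_s):
    """Measurement term plus structure term, following the form of Eq. 2."""
    return quadratic_cost(e_z, var_z) + quadratic_cost(e_s, var_s)

# Hypothetical residuals: two measurement constraints and one structure constraint.
e_z, var_z = [0.2, -0.1], [0.04, 0.01]
e_s, var_s = [0.05], [0.0025]
cost = total_cost(e_z, var_z, e_s, var_s)
```

Here each residual equals one standard deviation, so every constraint contributes about one unit to the cost. A structure constraint with a small variance dominates the sum as soon as it is violated, which is why its reliability matters.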
However, the prior for the structure may not always be explicit and it may rather be inferred from the estimate for the state $x$.
In a Bayesian context, we will denote the measurements $z$ and the structure $s$. The measurements will be modeled by the corresponding constraints. The structure will be modeled as a collection of virtual features and constraints (see Sect. 4), with a weight associated to each constraint depending on the current state of the features.
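A toy example may help to picture a structure constraint whose weight depends on the current state. The sketch below alternates between weighting a candidate parallelism constraint between two line orientations and re-solving a small weighted least-squares problem. The Gaussian-shaped weight, the tolerance and all numeric values are assumptions chosen for illustration, angle wrapping is ignored, and this is not the algorithm derived in this paper.

```python
import math

def solve_map(z_a, z_b, var_z, weight, var_s):
    """Minimize the measurement cost plus the weighted parallelism cost (closed form)."""
    k = 1.0 / var_z          # information of each orientation measurement
    c = weight / var_s       # information of the weighted structure constraint
    mean = 0.5 * (z_a + z_b)
    diff = k * (z_a - z_b) / (k + 2.0 * c)
    return mean + 0.5 * diff, mean - 0.5 * diff

def parallelism_weight(a, b, tolerance):
    """Assumed weight: close to 1 for nearly parallel lines, close to 0 otherwise."""
    return math.exp(-0.5 * ((a - b) / tolerance) ** 2)

def infer(z_a, z_b, var_z=1e-4, var_s=1e-6, tolerance=0.05, iterations=5):
    """Alternate between weighting the constraint and re-estimating the angles."""
    a, b = z_a, z_b
    for _ in range(iterations):
        w = parallelism_weight(a, b, tolerance)
        a, b = solve_map(z_a, z_b, var_z, w, var_s)
    return a, b

near_parallel = infer(0.02, -0.01)   # measured angles differ by 0.03 rad
not_parallel = infer(0.0, 0.5)       # measured angles differ by 0.5 rad
```

In the first case the constraint receives a weight near one and the two orientations are pulled almost together; in the second the weight is negligible and the estimates stay at the measured values, so an implausible constraint has no practical effect.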
We aim at determining the most probable state $x$ considering both the measurements, $z$, and the unknown structure of the environment, $s$. Empirical Bayes techniques (Carlinan...