
Robotics and Autonomous Systems 52 (2005) 229–246

A model-based method for indoor mobile robot localization using monocular vision and straight-line correspondences

Omar Ait Aider, Philippe Hoppenot∗, Etienne Colle

Laboratoire Systèmes Complexes, University of Evry, 40 rue du Pelvoux, 91020 Evry Cedex, France

Received in revised form 4 March 2005; accepted 21 March 2005. Available online 23 May 2005.

Abstract

A model-based method for indoor mobile robot localization is presented herein; this method relies on monocular vision and uses straight-line correspondences. A classical four-step approach has been adopted (i.e. image acquisition, image feature extraction, image and model feature matching, and camera pose computing). These four steps will be discussed with special focus placed on the critical matching problem. An efficient and simple method for searching image and model feature correspondences, which has been designed for indoor mobile robot self-location, will be highlighted: this is a three-stage method based on the interpretation tree search approach. During the first stage, the correspondence space is reduced by virtue of splitting the navigable space into view-invariant regions. In making use of the specificity of the mobile robotics frame of reference, the global interpretation tree is divided into two sub-trees; two low-order geometric constraints are then defined and applied directly on 2D–3D correspondences in order to improve pruning and search efficiency. During the last stage, the pose is calculated for each matching hypothesis and the best one is selected according to a defined error function. Test results illustrate the performance of this approach.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Mobile robot localization; Visual feature matching

∗ Corresponding author. Tel.: +33 1 69 47 75 61; fax: +33 1 69 47 06 03.
E-mail addresses: [email protected] (O.A. Aider), [email protected] (P. Hoppenot), [email protected] (E. Colle).

1. Introduction

In conjunction with development of the service robotics concept, such as cleaning robots and disabled-person assistance, we have witnessed over the past few years an extension of the field of mobile robot applications to indoor environments (e.g. offices, homes). Autonomy becomes a key issue when considering such an extension of potential uses. In this context, autonomy refers to the ability to perform navigational missions, explore the environment and attain goals without human intervention, all while avoiding obstacles. It is neither elegant nor practical to equip or modify home and office environments, as opposed to solutions designed for the industrial environment. Autonomy can indeed be obtained if one important task is correctly accomplished: self-localization.

Localization entails positioning the robot relative to its environment during navigational sessions. More formally, it indicates computing and updating three parameters (x, y, θ), which define the position and orientation of the robot within the environmental frame. While GPS is considered the best localization system for outdoor applications, no equivalent system has yet been identified for indoor localization [1].

Researchers have devised a variety of techniques for mobile robot positioning. Solutions can be categorized into two groups: relative localization (dead-reckoning) and absolute localization. Relative localization utilizes information from proprioceptive sensors such as odometer readings. Dead-reckoning algorithms are simple and fast; some factors, however, such as slippage, cause error to accumulate and locational uncertainty to rise. Dead-reckoning therefore proves insufficient on its own. Absolute localization is based on exteroceptive sensor information. These methods yield a stable locating error but are more complex and costly in terms of computation time. One very popular solution for achieving online localization consists of combining the relative and absolute methods. Relative localization is used with a high sampling rate in order to keep the robot pose up-to-date, whereas absolute localization is applied periodically, with a lower sampling rate, to correct relative positioning misalignments.

As regards absolute localization within indoor environments, map-based approaches naturally come to mind. In the large majority of cases, it is assumed that a map (model) of the workspace has been established. Three main types of environmental representation are available: topological maps, grid maps and the geometric primitive model. The latter two representations are called metric maps. The environmental model can either be pre-stored (architect's drawing or CAD model) or built online simultaneously with the localization using sensor data fusion or structure from motion techniques (SLAM) as in [2].

The classical approach to model-based localization consists of matching the local representation of the environment built from sensor information with the global model map. When using monocular vision, the localization process is composed of the four following stages [3]:

- image acquisition from the current robot position;
- image segmentation and feature extraction;
- matching 2D-image and 3D-model features;
- camera pose computation (a skeleton of this cycle is sketched below).
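The following skeleton illustrates how these four stages chain together in an online loop. It is a minimal sketch, not the authors' implementation; every stage function is a hypothetical stub named here only for illustration.

```python
# Hypothetical skeleton of the four-stage model-based localization cycle.
# Each stub stands in for the corresponding stage described in the text.

def acquire_image():
    """Stage 1: grab a frame from the on-board camera (stub)."""
    return None

def extract_lines(image):
    """Stage 2: segment the image and extract 2D line features (stub)."""
    return []

def match_features(lines, model, pose_estimate):
    """Stage 3: search for 2D-3D line correspondences (stub)."""
    return []

def compute_pose(matches, pose_estimate):
    """Stage 4: estimate (x, y, phi) from the matched lines (stub)."""
    return pose_estimate

def localize_once(model, pose_estimate):
    image = acquire_image()
    lines = extract_lines(image)
    matches = match_features(lines, model, pose_estimate)
    if not matches:            # too few features: keep the dead-reckoning estimate
        return pose_estimate
    return compute_pose(matches, pose_estimate)
```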

It is observed that model-based localization requires a succession of several processing steps and that each step may be time-consuming due to the large amount of data involved. The strategy adopted for each step must therefore be carefully assessed for both online and real-time use.

The matching step represents one of the most important and complex phases in model-based localization; it entails determining the correct correspondence between image features and model features. An alternative approach that avoids the matching stage during online navigation is pattern tracking (an example can be found in [4]), but most trackers use locally planar (or nearly planar) patterns of the environment. Consequently, when navigating in 3D environments, changes in pattern appearance and occlusion occurrences may cause tracking failure. Another approach, presented in [5], consists of using only unique features to model the environment. These features are extracted from the visual scene of the robot using Kernel Principal Component Analysis. Unfortunately, finding a sufficient number of such features seems very hard, especially in the presence of noise and illumination variations; in this case, the border between two features becomes fuzzy. Thus, the matching phase remains absolutely necessary. Most methods designed to treat this problem have been developed for the object-recognition domain [6]. In the field of mobile robotics, this problem is equivalent to considering local parts of the indoor environment model as objects to be recognized; however, certain particularities must be taken into account. Matching methods can be classified into two groups: methods that search for a solution within the "correspondence space" (such as alignment [7,8], geometric hashing [9] or interpretation tree search [10,11]) and methods that search within the "transformation space" such as the generalized Hough transform [12].


A considerable body of work exists on model-based robot localization using visual sensors. Some research has focused specially on the matching problem. One popular approach is described by Kosaka and Kak [13] and by DeSouza and Kak [3]. In these works, both robot orientation and position are updated from an initial setting, in accordance with the navigational commands. The new robot position and orientation obtained are then used to project model features onto the image plane and compare them with the actual image being viewed. Uncertainty on the new robot position and orientation is propagated to predict uncertainty zones in the image, where a search for corresponding features is required. Uncertainty propagation is performed using covariance matrix propagation by means of the Jacobian matrix and the Mahalanobis distance, and by assuming local linearity of the command and projection equations. The correct correspondences are selected through applying the nearest neighbor rule. Once matching has been achieved, the pose is corrected using Kalman filtering. Due to the dimensions of sensed features as well as to perspective effects, the uncertainty propagation process may produce oversized zones on the image plane. Since uncertainty grows as the distance covered between two image acquisitions increases, this approach necessitates a high sampling rate. Moreover, it must be recalled that not every image acquisition provides sufficient information to accurately update the robot pose.

An alternative approach has been adopted by Talluri and Aggarwal [14]. In this work, the localization problem is handled by introducing the notion of Edge Visibility Regions, which are equivalent to aspect graphs within the object-recognition domain. These regions are used to restrict the matching process to a small subset of features in the global model; an interpretation tree search method is then applied. For the global localization problem, a generalized Hough transform is implemented to select a small set of candidate regions. The investigation is unfortunately restricted to horizontal features representing rooftop edges. Conversely, Munoz and Gonzalez [15] used only vertical-line correspondences and an interpretation tree search in order to determine the interpretation that maximizes the number of correspondence pairs. A restriction is introduced on the global model edges thanks to dead-reckoning pose estimation. The geometric constraint added to eliminate false correspondences is based on the order of model vertices stored in a circular list and involves identifying the order of appearance of corresponding segments on the image. Confining the search to a particular category of segments constitutes a real shortcoming in the indoor application. One disadvantage of model-based localization is in fact the required minimum number of distinguishable features eligible to be used for matching. Eliminating some of the environmental features would not seem to be a wise strategy, especially when the reliability of a matching result is correlated with the number of correspondence pairs.

It thus appears that a method not requiring a high sampling rate would be preferable. In some cases, it may actually prove necessary to delay application of absolute localization to a subsequent image acquisition, thereby increasing pose estimation uncertainty. It is not guaranteed that a sufficient number of viewed features will be contained in each image acquisition. In addition, mobile robots are generally multitask systems, and localization is not the only process running during navigation. For some image acquisitions, therefore, system resources may already be allocated. Consequently, the periodicity of absolute localization is variable, which necessitates developing a localization method less sensitive to pose estimation quality.

In this paper, an incremental model-based localization method using monocular vision will be presented. The classical four-step general approach of model-based localization, as discussed above, has been adopted. The most important contribution herein is the development of an effective strategy for resolving the crucial problem of finding the correct match between image and model features. The aim of the approach derived is to preserve matching efficiency without increasing the sampling rate of absolute location operations.

The method is based on straight-line correspondences. The environment is assumed to be partially known (modeled), which implies that the environment model is constructed offline and pre-stored. Both the matching and pose-computing methods are based on a full-perspective camera model. The equations that model perspective projections of 3D features on the image are non-linear. A large number of object-recognition applications make use of simplified camera models such as the para-perspective and weak-perspective models [16]. In mobile robotics applications, the dimension of viewed features is high in comparison with the distance to the camera focal center. It then becomes necessary to use a full-perspective model. A pinhole camera model has been obtained by means of offline calibration.

The global localization process will be presented in the next three sections.


The indoor environment representation used in the method will be developed in Section 2. The model contains the main parts of the environment that are not supposed to move, such as wall and door edges, or whose movements are known and cannot be due to randomness, e.g. heavy pieces of furniture. Furthermore, the navigation plane has been split into View-Invariant Regions (or VIRs) [17] in order to improve efficiency by taking the occlusion phenomenon into account during the matching process, as will be shown below. The construction of VIRs will be briefly developed in this section as well. The algorithm generated to solve the critical feature-matching problem will then be discussed in detail in Section 3. The method chosen is based on the interpretation tree search approach, with the originality lying in the subdivision of the correspondence space using a constraint on the orientation of model edges and image features. Two interpretation trees are then obtained and the dimension of the space to be analyzed is reduced exponentially. Another contribution of this work consists of defining low-order geometric constraints, which are directly applicable on 2D–3D correspondences, in order to prune these interpretation trees. In addition, some of the restrictions on existing methods due to singular model edge configurations have been eliminated. Lastly, the camera pose-computing method based on numerical optimization has been simplified by taking the specific configuration of our robot into account. The method is then evaluated with both synthetic data and real images. Results, presented in Section 4, indicate that the correct correspondence combination is derived in more than 98% of cases with great efficiency and that the method is less sensitive to the quality of the initial pose estimate. Moreover, localization accuracy satisfies contextual requirements and enables achieving a navigational session within an indoor environment.

2. Indoor environment modeling

The model adopted for the present work is of the geometric feature type. This type is well-suited to applications using visual sensors. One of its advantages is concision and the possibility of representing uncertainty and frame changing in order to predict what is expected to be seen from a predicted camera pose. Existing camera pose-computing methods (for a given set of 3D–2D correspondences) generally use point or straight-line correspondences. Image features result from image segmentation into contours which correspond to physical elements within the indoor workspace, such as edges constituted by intersections between flat surfaces. These edges tend to be straight segments. Lines are easier to extract from contour images and their characterization, by means of polygonal approximation, is reliable even in the presence of noise. Partial occlusion (due to either the view angle or the presence of non-modeled objects) does not affect line representation parameters; using straight lines would therefore seem to be more judicious.

Indoor environments are man-made and semi-structured. This kind of environment can generally be viewed as a polygonal navigable space limited by vertical planes (walls). The environment can then be simply represented by a wire frame model composed of a set of line segments corresponding to the edges of these planes. The model is fixed and built offline; given the capability of camera pose estimation in a semi-structured environment (e.g. indoor environments), it is relatively straightforward to predict which parts will be sensed and then accurately describe these parts. This procedure allows avoiding online map construction, which necessitates major processing of 3D reconstruction from stereovision or image sequences and introduces noise on model feature parameters. The 3D environment model is thus merely a wire frame representation comprising a set of straight segments.

One disadvantage with the wire frame representation is its inability to incorporate the occlusion phenomenon; the model appears to be transparent. This situation is managed by applying the notion of view-invariant regions. This adaptation of the aspect graphs used in object recognition to the domain of mobile robotics has been proposed by Guibas et al. [18]; Talluri and Aggarwal [14] suggested that the polygonal navigable space (which defines all possible positions of the camera) be clustered into non-overlapping polygons, called Edge-Visibility Regions (EVR), while Simsarian et al. [17] introduced the notion of View-Invariant Regions (VIR), with a visibility list associated to each region. Each list is created from the subset of segments visible from the corresponding region (see Fig. 1) within the global environment model.

Fig. 1. Simple example of view-invariant regions.


The method employed to construct the VIRs has been well explained in Simsarian et al. [17]. Beginning with a global polygon, the principle herein is to first define the region boundaries. Each boundary splits the polygons it crosses into two new polygons. Crossing a boundary causes either the disappearance or the appearance of at least one model edge. The starting assumption is that the occlusion phenomenon results from the position of the camera relative to the convex vertices of the environment. Thus, VIR boundaries are defined as follows: (i) for a vertical convex vertex Lc in the environment map, trace all the straight lines defined by the projection of Lc on the ground plane and the projection of each vertical edge (or extremity of a non-vertical edge), as shown in Fig. 1; (ii) erase the part of the straight lines situated between the projection of Lc and those of the vertical edges; (iii) repeat the previous steps for all the convex vertices of the environment. The remaining straight-line segments are the VIR boundaries.
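The sketch below shows one possible in-memory representation of such a VIR table, with a visibility list and per-edge orientation ranges attached to each region. It is a hypothetical structure, not the authors' data model; the shapely library is assumed for the point-in-polygon test.

```python
# A hypothetical VIR table: each region of the navigation plane carries
# the list of model segments visible from inside it (Section 2).
from dataclasses import dataclass
from shapely.geometry import Point, Polygon

@dataclass
class VIR:
    boundary: Polygon   # polygonal region of the navigation plane
    visibility: list    # indices of model segments visible from inside
    phi_range: dict     # segment index -> (phi_min, phi_max), see Fig. 2

def virs_containing(virs, x, y):
    """Return the VIR(s) containing the position (x, y)."""
    p = Point(x, y)
    return [v for v in virs if v.boundary.contains(p)]
```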

A range of orientations (from which the edge could be viewed) is associated with each segment of each visibility list. The orientation range [φmin, φmax] is determined as shown in Fig. 2:

- φmin = min(φ1, φ2, ...)
- φmax = max(φ1, φ2, ...)

where (φ1, φ2, ...) are the view angles of the edge from each VIR corner.

The same reasoning may be applied to a non-vertical edge using the projections of its extremities on the horizontal plane.

Fig. 2. Definition of a range of orientations from which a model edge is visible.
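As a small illustration, the range can be collected by evaluating the view angle at every corner of the region. The sketch below assumes hypothetical 2D ground-plane coordinates for the projected edge point and the VIR corners; it is one plausible reading of Fig. 2, not the authors' exact procedure.

```python
import math

def orientation_range(edge_xy, corners):
    """[phi_min, phi_max] from the view angles of a projected edge point
    seen from each corner (cx, cy) of a VIR, in the spirit of Fig. 2."""
    ex, ey = edge_xy
    phis = [math.atan2(ey - cy, ex - cx) for (cx, cy) in corners]
    return min(phis), max(phis)
```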

3. Straight-line matching strategy

In this section, we will focus on the critical phase of identifying the correct match between the set of image lines and model lines. The most popular method is the interpretation tree search using point correspondences, as presented by Grimson and Lozano-Perez [10]. Murray and Cook [19] developed a variant using line correspondences. Considering a set of m model segments L = {L1, L2, ..., Lm} and a set of n image lines l = {l1, l2, ..., ln}, the interpretation tree represents all possible combinations of L and l element couples. The theoretical number of possibilities (in recognition of the fact that one or more model lines may not be present in the image due to occlusion or non-detection during contour extraction) is (n + 1)^m. Yet, a considerable portion of the tree never gets examined. Some geometric constraints are introduced in order to prune the tree. The most commonly used constraints are unary and binary. Unary constraints must be satisfied by each Li–lj couple, such as length. Binary constraints specify that for each set of couple pairs Li–lj, Lk–ll, the angle between Li and Lk must be the same as the estimated angle between lj and ll. Higher-order geometric constraints can then be introduced. One weakness of the approach is that the two sets L and l must be expressed in a space of the same dimension: model segments lie in 3D space whereas image lines lie in 2D space. The 3D parameters of each 2D image line must then be estimated by means of a stereovision technique. Another possibility would be to introduce a verification step into the search process by computing the camera pose from the current hypothetical matching. In projecting each model line onto the image using the computed pose and then comparing the lines obtained with the viewed lines, the current matching can either be accepted or rejected. This procedure unfortunately dramatically increases computational cost and makes the algorithm unusable in real-time applications.

Our goal here is to develop a matching method well-adapted to mobile robotics by improving the efficiency and reducing the computational cost of the search process. To attain this objective, certain constraints have to be imposed:

- correspondence space dimensions must be reduced to the greatest extent possible by making use of the available robot pose estimate, with the aim of limiting the number of possible combinations;

- geometric constraints must be directly applicable on 2D–3D correspondences in order to minimize the number of pose-computing operations;

- a full-perspective camera model must be adopted for expressing equations.

We are proposing a three-stage method that works within the correspondence space and utilizes the specificity of the mobile robotics context. The first stage consists of reducing the correspondence space. Based on approximate knowledge of the camera pose, one or two VIRs, which possibly contain the robot pose, are selected. Only the edges of their visibility lists are then retained during the subsequent stages of the matching process. Each of the two sets of features (model and image features) is divided into two subsets by use of a constraint on their orientation. Two interpretation trees are then built with these feature sets. In the second stage, a finite and reduced set of correspondence combination hypotheses is generated by pruning the interpretation trees. Two geometric constraints are used at this point to discard inconsistent correspondences by testing their local coherence. In the third stage, the most globally consistent hypothesis is selected. A pose is computed for each correspondence combination. Model features are then projected onto the image plane. Matching hypotheses are ordered according to an error function defined using the principle of the Hausdorff distance between projected and viewed features. Details of these three stages will be presented in the next subsections, following a mathematical formulation of the localization problem.

In order to provide a mathematical formulation of the camera-localization problem, let us consider two coordinate systems: the global coordinate system (O, x, y, z) related to the environment, and the camera coordinate system (Oc, xc, yc, zc), where Oc is the camera optical center and where the image plane is orthogonal to zc, situated at a distance f (the focal length) from Oc. In this work, it has been assumed that the homogeneous transformation between the robot frame and a camera frame tied to the robot base is known at each moment. Localizing the mobile robot is thus considered equivalent to computing the camera pose. Camera location is characterized by a translation vector T = [x y z]^t and a rotation matrix R. T represents the translation between Oc and O. R is composed of the three elementary angles Ψ, θ and φ about the xc, yc and zc-axes, respectively. R and T carry the camera frame onto the global frame. In general, indoor mobile robots operate within a 3D environment, yet their displacements lie in a 2D space. The camera thus moves in a horizontal plane at a known and constant height. z and θ are then assumed to be known and Ψ is zero. At this point, the system features 3 degrees of freedom (DOF), with φ, x and y as parameters.

3.1. Reducing correspondence space (selecting model segments)

The presented algorithm can achieve both incremental and global localization. The difference is that in the case of incremental localization, the robot knows an approximate estimate of its location thanks to a dead-reckoning system (generally odometry). This assumption is not very restrictive: most mobile robots are equipped with wheel encoders which provide an estimation of the pose. In addition, the estimate can be of very poor quality, which makes it possible to take into account the growth of odometry error with respect to the distance covered. The complete model of the indoor workspace generally contains a great number of segments. Given an estimate of the camera orientation φe and position [xe, ye], and upper bounds Δφ and Δt on the orientation and position errors, only a few regions can be retained. Thus, only segments belonging to the visibility lists of these regions will compose the set L of model segments participating in the matching process. In the case of global localization, the same process described above is used, including all the VIRs as candidates. The search for a consistent feature matching becomes slower but the final result is not affected. Note that the case when the robot is completely lost in its environment occurs rarely; generally it corresponds to situations where the robot is stopped (system resetting). Under this condition, the time constraint is more flexible.
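A sketch of this selection step follows, reusing the hypothetical VIR structure sketched in Section 2. The position error bound Δt is used as a tolerance around the position estimate, so every VIR that could contain the true pose contributes its visibility list.

```python
# Selecting candidate model segments from the pose estimate (a sketch).
from shapely.geometry import Point

def candidate_segments(virs, xe, ye, d_t):
    """Union of the visibility lists of every VIR lying within d_t of the
    position estimate (xe, ye); these segments form the set L used in the
    matching process. Pass the full VIR list for global localization."""
    selected = set()
    for v in virs:
        if v.boundary.distance(Point(xe, ye)) <= d_t:
            selected.update(v.visibility)
    return selected
```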

Fig. 3. Projection of a model segment.

3.2. Matching hypothesis generation

Unary geometric constraint definition: Let us consider a 3D line segment Li of L defined by its direction vector vi and its position vector pi in the global coordinate system. v′i = R vi and p′i = R(pi − T) are the expressions of vi and pi in the camera coordinate system. Assuming that lj is the projection of Li in the image plane, Li and lj belong to an interpretation plane passing through the focal point Oc. Let nj be the unit vector normal to this plane, expressed in the camera coordinate system (Fig. 3). It is possible to calculate nj knowing lj and the intrinsic camera parameters u0, v0, αu, αv [16]. Expressing the dot products nj · v′i and nj · p′i, one can write

$$n_j^t R v_i = 0 \quad (1.a)$$

$$n_j^t R (p_i - T) = 0 \quad (1.b)$$

According to the mobile robotics context presented before, the rotation matrix can be written as follows:

$$R = \begin{bmatrix} \cos\theta\cos\phi & -\cos\theta\sin\phi & \sin\theta \\ \sin\phi & \cos\phi & 0 \\ -\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \end{bmatrix}$$

By replacing R, we obtain two equations of the following forms:

$$a_{ij}\cos\phi + b_{ij}\sin\phi + c_{ij} = 0 \quad (2.a)$$

$$A_{ij}\,x + B_{ij}\,y + C_{ij} = 0 \quad (2.b)$$

with:

$$a_{ij} = (n_{jx}\cos\theta - n_{jz}\sin\theta)v_{ix} + n_{jy}v_{iy}$$
$$b_{ij} = n_{jy}v_{ix} - (n_{jx}\cos\theta - n_{jz}\sin\theta)v_{iy}$$
$$c_{ij} = (n_{jx}\sin\theta + n_{jz}\cos\theta)v_{iz}$$
$$A_{ij} = -(n_{jx}\cos\theta - n_{jz}\sin\theta)\cos\phi - n_{jy}\sin\phi$$
$$B_{ij} = -n_{jy}\cos\phi + (n_{jx}\cos\theta - n_{jz}\sin\theta)\sin\phi$$
$$C_{ij} = [(n_{jx}\cos\theta - n_{jz}\sin\theta)x_i + n_{jy}y_i]\cos\phi + [n_{jy}x_i - (n_{jx}\cos\theta - n_{jz}\sin\theta)y_i]\sin\phi + [n_{jx}\sin\theta + n_{jz}\cos\theta](z_i - z)$$

Thus, each correspondence permits calculating φ using (1.a); replacing φ by this value in (1.b) then constrains the position of the robot to a straight line in the xy-plane whose equation is Aij x + Bij y + Cij = 0.

A unary geometric constraint is then expressed as follows: a correspondence Li–lj is an acceptable matching if the corresponding calculated orientation verifies |φ − φe| < Δφ and the calculated straight line intersects the circle whose center is [xe, ye] and whose radius is Δt (Fig. 4).

Fig. 4. Unary geometric constraint.

Unfortunately, if the model segment is vertical, the parameters aij and bij are zero. The orientation cannot be calculated in this way, and the unary geometric constraint is then unusable. We propose the following solution.
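The unary test can be sketched as follows. Eq. (2.a) is solved in closed form by writing a cos φ + b sin φ = √(a² + b²) cos(φ − atan2(b, a)); the callable line_ABC, which returns the (A, B, C) of Eq. (2.b) for a candidate orientation, is a hypothetical helper standing in for the coefficient formulas above.

```python
import math

def unary_test(a, b, c, line_ABC, phi_e, xe, ye, d_phi, d_t):
    """Local-coherence test for one Li-lj pairing (a sketch of the unary
    constraint): accept if some solution phi of Eq. (2.a) lies within
    d_phi of the estimate phi_e and the line of Eq. (2.b) meets the
    uncertainty circle of center (xe, ye) and radius d_t."""
    r = math.hypot(a, b)
    if r < abs(c):                       # Eq. (2.a) has no real solution
        return False
    alpha = math.atan2(b, a)
    delta = math.acos(max(-1.0, min(1.0, -c / r))) if r else 0.0
    for phi in (alpha + delta, alpha - delta):
        err = (phi - phi_e + math.pi) % (2 * math.pi) - math.pi  # wrap angle
        if abs(err) >= d_phi:
            continue
        A, B, C = line_ABC(phi)
        # distance from (xe, ye) to the line A x + B y + C = 0
        if abs(A * xe + B * ye + C) <= d_t * math.hypot(A, B):
            return True
    return False
```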

Binary geometric constraint definition: Several works, such as Betke and Gurvits [20], use vertical lines to calculate the camera pose. They generally assume that the angle θ = 90°, so that the projection of a vertical segment of the model is a vertical line in the image.

Fig. 5. Angle of view of vertical edges for θ = 90°.

A classical triangulation method is then used to constrain the camera position. In practice, in most mobile robots θ is known but variable (pan-and-tilt camera systems). The projection of a vertical segment is then not necessarily vertical in the image, and the triangulation method cannot be used directly. We propose to extend the method to the case where θ ≠ 90°. The solution is first presented for θ = 90° and then extended to the general case.

Let us consider a subset of vertical segments Lv of L and a subset of vertical image lines lv of l. The equation of lvi in the image plane can be written v = vi (Fig. 5).

Consider a couple of correspondences (Li–lj, Lk–ll) observed from an unknown camera position (x, y). The angle of view ω between lj and ll can be calculated from the image measurements vj, vl and the camera intrinsic parameters, by first computing the view angle of each feature:

$$\omega_j = \arctan\left(\frac{v_j - v_0}{\alpha_v}\right) \quad \text{and} \quad \omega_l = \arctan\left(\frac{v_l - v_0}{\alpha_v}\right)$$

and then:

$$\omega_{jl} = |\omega_j - \omega_l| \quad (3)$$

Fig. 6. Binary geometric constraint (top view of two vertical model segments Li and Lk).

The relationship between the view angle and the coordinates of the two vertical segments can be written:

$$d_{ik}^2 = d_i^2 + d_k^2 - 2\,d_i\,d_k\cos(\omega_{jl}) \quad (4)$$

where di = √((xi − x)² + (yi − y)²) and dk = √((xk − x)² + (yk − y)²), with xi, xk, yi and yk being the coordinates of the vertical edges in the horizontal plane.

According to this relationship, the view angle restricts the possible camera positions to an arc of a circle, as shown in Fig. 6. Theoretically, two circles can satisfy the angular constraint. The false one can be eliminated by verifying whether the right-to-left order of the vertical lines in the image is consistent with each range of positions.

A binary geometric constraint can then be expressed as follows: a couple of correspondences (Li–lj, Lk–ll) is acceptable if the corresponding circle arc intersects the circle whose center is [xe, ye] and whose radius is Δt, and if the right-to-left image order of lj and ll is preserved.
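A sketch of this test follows. The two candidate circles are obtained from the inscribed-angle theorem applied to Eq. (4): the chord joining the two edges subtends the angle ω from every point of a circle of radius d_ik/(2 sin ω). The right-to-left order check that discards the false circle is omitted here.

```python
import math

def view_angle(vj, vl, v0, alpha_v):
    """Eq. (3): view angle between two (virtually vertical) image lines."""
    return abs(math.atan((vj - v0) / alpha_v) - math.atan((vl - v0) / alpha_v))

def binary_test(pi, pk, w, xe, ye, d_t):
    """Binary-constraint test (a sketch): accept the couple if one of the
    two circles from which the chord pi-pk subtends the angle w passes
    within d_t of the pose estimate (xe, ye)."""
    (xi, yi), (xk, yk) = pi, pk
    d_ik = math.hypot(xk - xi, yk - yi)
    if w <= 1e-9 or d_ik <= 0.0:
        return False
    r = d_ik / (2.0 * math.sin(w))                 # circumscribed radius
    mx, my = (xi + xk) / 2.0, (yi + yk) / 2.0      # chord midpoint
    h = math.sqrt(max(r * r - (d_ik / 2.0) ** 2, 0.0))
    ux, uy = (yk - yi) / d_ik, -(xk - xi) / d_ik   # unit normal to the chord
    for cx, cy in ((mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)):
        if abs(math.hypot(xe - cx, ye - cy) - r) <= d_t:
            return True
    return False
```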

To generalize the binary constraint to the case where θ is not equal to 90°, one first has to identify the image lines which potentially correspond to vertical segments in the model. An image line l is defined in the image plane by its slope ρ and its distance to the origin d. l may correspond to a vertical segment in the model if the following rules are verified:

- if 0 < θ < 90°, then ρmin < ρ < ρmax when l is contained in the image right half, and −ρmax < ρ < −ρmin otherwise;
- if 90° < θ < 180°, then −ρmax < ρ < −ρmin when l is contained in the image right half, and ρmin < ρ < ρmax otherwise.

Fig. 7. Only lines in bold are selected as possibly corresponding to vertical edges.

ρmin and ρmax are proportional to θmin and θmax, which can generally be situated around 45° and 135°, respectively (Fig. 7).

The limits imposed by these rules must be made fuzzy by introducing flexible thresholds, in order to take into account noisy data such as line parameter estimation and camera calibration errors.

We now have a set of image lines which may possibly correspond to vertical edges. To apply the binary constraint, we need to calculate the vi measurements. If the camera were in a horizontal position (θ = 90°), the lines would be vertical, with equation v = vi. We define a so-called "horizon" line in the image, whose equation is u = uhor. The value of uhor can easily be calculated knowing θ, as follows:

$$u_{hor} = \tan\left(\theta - \frac{\pi}{2}\right)\alpha_u + u_0 \quad (5)$$

To use the triangulation as in the case where θ = 90°, one needs to retrieve the measurements vi. In other words, one has to build a virtual image corresponding to the case θ = 90° from segments observed at any value of θ. The vi measurements can be obtained from the intersections vhi of each line with the horizon line (Fig. 8), as follows:

$$v_i = (v_{hi} - v_0)\sin\theta + v_0 \quad (6)$$

The demonstration is given in Appendix A.

Fig. 8. Binary constraint in the general case (θ ≠ 90°).

Constructing the matching hypothesis list: The matching hypothesis generation process can now be summarized as follows:

- Starting from a set of m model lines L = {L1, L2, ..., Lm} and a set of n image lines l = {l1, l2, ..., ln}, rather than using a classical interpretation tree constructed from L and l elements, L is first divided into a group of vertical segments Lv = {L1, L2, ..., Lmv} and a group of non-vertical segments Lh = {L1, L2, ..., Lmh}. A subset lv is extracted from l using the previous procedure. Two smaller interpretation sub-trees are constructed, one from Lh and l elements and the other from couples of Lv and couples of lv elements.
- The unary geometric constraint is applied to prune the first sub-tree. Only the retained correspondences serve to compose a list of combinations of Lh and l elements.
- The binary geometric constraint is used to prune the second sub-tree. It is applied on the couples of correspondences formed from Lv and lv. Only the retained couples serve to compose a list of combinations of Lv and lv elements. One more constraint is applied on the combinations of this list: it verifies the consistency of the right-to-left order of appearance of the segments on the image. The correct order can be computed offline and does not change when the camera moves inside the same VIR. The way to compute this order is presented in Appendix B. The combinations which do not verify the constraint are discarded.

The two lists obtained are then combined to form a complete list of possible matching hypotheses of L and l elements.

The combinations which do not contain a sufficient number of correspondences are discarded. This number represents a low threshold on the fraction of hypothesized matches among the model edges that are supposed to be viewed, eliminating the possibility of accidental correspondence combinations, also called false positives [6,21].
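The pruning of the two sub-trees can be sketched as follows. unary_ok and binary_ok stand for the constraint tests defined in this section; the assembly of full combinations and the threshold on the number of correspondences are left out for brevity.

```python
from itertools import combinations

def prune_subtrees(Lh, Lv, lines, lv, unary_ok, binary_ok):
    """Sketch of the two-sub-tree pruning of Section 3.2: the unary test
    filters the Lh x l pairings, the binary test filters couples of
    (Lv, lv) pairings. Matching hypotheses are then assembled from the
    survivors (assembly omitted here)."""
    pairs_h = [(L, s) for L in Lh for s in lines if unary_ok(L, s)]
    pairs_v = [(L, s) for L in Lv for s in lv]
    couples_v = [(c1, c2) for c1, c2 in combinations(pairs_v, 2)
                 if binary_ok(c1, c2)]
    return pairs_h, couples_v
```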

3.3. Matching hypothesis verification

The thresholds used in the unary and binary constraints are not tight enough to eliminate all false matching hypotheses. The previous step serves to verify the local coherence of each correspondence or couple of correspondences, but a combination composed of locally consistent correspondences may not be globally consistent. The last step is to select the optimal (most globally consistent) match, if it exists, among the set of retained hypotheses. Several approaches exist, taking into account the fraction of correctly matched model features, the probability of accidental pairings and image reconstruction errors [6,21,22].

At this level, a small set of matching hypotheses is retained. It thus becomes feasible to test each hypothesis by computing the corresponding camera pose. Indeed, for each correspondence combination, a camera pose is computed by means of a least squares method. The model edges are projected using this pose onto the image plane. The degree of fit between projected segments and viewed segments is estimated using a distance function based on the Hausdorff distance. In the theoretical case, the right matching hypothesis would produce null errors. In practice, image measurements contain noise introduced at different stages (image acquisition, segmentation into contours, polygonal approximation). In addition, calibration errors make the projection of model segments uncertain. Thus, the best hypothesis according to this function is selected as the correct correspondence combination. The pose computing and the distance function definition are explained in the following subsections.

Camera pose computing: The method used to compute R and T is an adaptation of the Phong–Horaud [23] method to the mobile robotics context. It is based on numerical non-linear least squares optimization of the error criterion obtained from the equation system (1), expressed with respect to φ, x, y. The Levenberg–Marquardt algorithm is used.
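This optimization step can be sketched with SciPy's Levenberg–Marquardt solver. The residuals implement the criterion of Eq. (1) directly; this is an illustration under that formulation, not the authors' exact Phong–Horaud adaptation, and it needs at least two line correspondences (four residuals for three unknowns).

```python
import numpy as np
from scipy.optimize import least_squares

def rot(phi, theta):
    """Rotation matrix of Section 3 for known tilt theta and heading phi."""
    c, s, ct, st = np.cos(phi), np.sin(phi), np.cos(theta), np.sin(theta)
    return np.array([[ct * c, -ct * s, st],
                     [s,       c,      0.0],
                     [-st * c, st * s, ct]])

def pose_from_matches(matches, theta, z, x0):
    """Refine (phi, x, y) from Eq. (1); matches = [(n_j, v_i, p_i)] with
    n_j the interpretation-plane normal and (v_i, p_i) the model line
    direction and position vectors (numpy arrays). x0 = initial guess."""
    def residuals(params):
        phi, x, y = params
        R, T = rot(phi, theta), np.array([x, y, z])
        res = []
        for n, v, p in matches:
            res.append(n @ R @ v)          # Eq. (1.a)
            res.append(n @ R @ (p - T))    # Eq. (1.b)
        return res
    return least_squares(residuals, x0, method="lm").x
```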

Distance between projected and viewed segments: One might think it enough to use the residuals at convergence of the error criterion (1) as a metric for the distance between projected and viewed segments in the image plane. This is what is done in [22], where a χ² test is derived from these residuals to evaluate the reliability of a hypothesized matching. This is not a good idea, because this criterion in fact treats image segments as infinite straight lines. This means that a pose which makes a projected segment collinear with its hypothetical viewed correspondent can be interpreted as correct even if the two segments are not superimposed. The function used in this work is inspired by the Hausdorff distance between two sets of features. The Hausdorff distance is the longest metric distance among the distances between each feature of one set and its nearest neighbor in the second set. The problem is then to define the metric distance between two straight segments with different orientations, positions and lengths. The solution adopted consists of measuring the sum of distances between each point on one segment and its nearest point on the other.

Let us consider two segments: a viewed segment l and a projected segment lp, defined by their starting and ending points in the image plane, (ms, me) and (msp, mep), respectively. Let y = ax + b be the equation of the segment l in the 2D frame (msp, xp, yp), where $x_p = \overrightarrow{m_{sp}m_{ep}}/\|\overrightarrow{m_{sp}m_{ep}}\|$, $y_p = \overrightarrow{m_{sp}m_i}/\|\overrightarrow{m_{sp}m_i}\|$ and mi is the intersection of the straight line containing l with the straight line perpendicular to lp passing through msp (Fig. 9). Let m(xm, ym) be a point on the segment l. The nearest point to m on lp is:

- msp if xm ≤ 0;
- its perpendicular projection on lp if 0 < xm < λp (where λp is the length of lp);
- mep if xm ≥ λp.

Thus, the distance between the two segments is defined by

$$D(l_p, l) = \frac{1}{\lambda_p}\int_{x_s}^{x_e} d(x)\,dx \quad (7)$$


Fig. 9. Distance between two segments in the image plane.

where

- d(xm) = √(xm² + (ym − ysp)²) if xm ≤ 0;
- d(xm) is the perpendicular distance to lp if 0 < xm < λp;
- d(xm) = √((xm − xep)² + (ym − yep)²) if xm ≥ λp.

It is now possible to define the distance between two sets of projected and viewed segments, Lp = {lp1, lp2, ..., lpm} and L = {l1, l2, ..., lm}, using the Hausdorff principle, by:

$$h(L_p, L) = \max_{i=1,\ldots,k}\big(k\mathrm{min}\,(D(l_{pi}, l_i))\big) \quad (8)$$

where kmin represents the k shortest distances D among the distances between the segments of the correspondences composing the matching hypothesis, and k is the threshold on the number of correspondences defined above.
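A discrete approximation of D is straightforward to sketch: sample points along one segment and average their distances to the other, which replaces the closed-form integral of Eq. (7) with a mean over samples.

```python
import numpy as np

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment [a, b] (2D arrays)."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def segment_distance(l, lp, samples=20):
    """Sampled approximation of Eq. (7): mean distance from points taken
    on the viewed segment l to their nearest point on the projected
    segment lp; each segment is an (endpoint, endpoint) pair of arrays."""
    (a, b), (ap, bp) = l, lp
    ts = np.linspace(0.0, 1.0, samples)
    return float(np.mean([point_to_segment(a + t * (b - a), ap, bp)
                          for t in ts]))
```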

3.4. Taking uncertainty due to noise into account

The unary and binary constraints used to test the local coherence of correspondences and to prune the two interpretation sub-trees are based both on a pose estimation with error margins and on image segments. Each image segment is represented by the vector n normal to its interpretation plane. The components of n are noisy due to errors in contour detection as well as in polygonal approximation, and this uncertainty propagates to the parameters of the pose constraint regions (a straight line represented by the three parameters Aij, Bij, Cij in the case of a unary constraint, and an arc of a circle represented by its center coordinates xc, yc and its radius d in the case of a binary constraint). In order to estimate the variation margins of the parameters of these constraint regions, a Monte Carlo method is applied offline. This approach is most useful due to the strong non-linearity of the equations yielding the constraint region parameters with respect to the components of n.

Uncertainty in the unary constraint: For each model segment Li, a set of synthetic image segments lj, represented by their interpretation-plane normal vectors nj, is used. For each correspondence Li–lj, random samples n_j^S, belonging to a normal distribution of mean value nj and standard deviation matrix σnj, are generated. φ_ij^S, A_ij^S, B_ij^S and C_ij^S are computed for each sample, and the mean values and standard deviations of these parameters are then estimated. During the online application of the geometric constraints on a hypothetical correspondence Li–lj, the orientation φij and the parameters Aij, Bij, Cij of the straight-line constraint are computed. The test then consists of searching, within the zone established by the offline-computed standard deviations around φij and Aij, Bij, Cij, for a straight line intersecting the circle defined by the pose-estimation margins. If at least one such straight line exists, the local coherence of the correspondence is validated.
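The offline Monte Carlo estimation can be sketched as follows. constraint_params is a hypothetical callable mapping a normal vector to the tuple (φ, A, B, C) via the formulas of Section 3.2; the sampling scheme is an illustration of the described procedure, not the authors' code.

```python
import numpy as np

def unary_margins(n_j, sigma_n, constraint_params, trials=1000, rng=None):
    """Monte Carlo margins for the unary constraint: perturb the
    interpretation-plane normal n_j with Gaussian noise and collect the
    spread of the resulting (phi, A, B, C) parameters."""
    rng = rng or np.random.default_rng()
    samples = []
    for _ in range(trials):
        n = n_j + rng.normal(0.0, sigma_n, size=3)
        n /= np.linalg.norm(n)            # keep a unit normal vector
        samples.append(constraint_params(n))
    samples = np.asarray(samples)
    return samples.mean(axis=0), samples.std(axis=0)
```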

Uncertainty in the binary constraint: The same reasoning is applied for the binary constraint. For each couple of vertical model segments Li–Lj, a set of synthetic image segment couples li–lj, represented by their respective measurements of intersection vi–vj with the horizon line, is used. For each correspondence couple (Li–li, Lj–lj), random samples v_i^S–v_j^S belonging to a normal distribution of mean value vi–vj and standard deviation matrix σv are generated, and the mean values and standard deviations of the parameters x_cij^S, y_cij^S, d_ij^S of the corresponding constraint arcs are computed offline. During the online application of the geometric constraints on a hypothetical correspondence (Li–li, Lj–lj), the arc-of-constraint parameters xcij, ycij, dij are computed; the test then consists of searching, inside the zone established by the standard deviations around xcij, ycij and dij, for an arc intersecting the circle defined by the pose-estimation margins.

3.5. Summary of the matching strategy

The complete matching process may be summarized by the following diagram (Fig. 10).


Fig. 10. Summary of the correspondence search strategy.

3.6. Complexity and efficiency analysis

The improvements brought by the presented matching approach are analyzed considering two parameters: the complexity of the pruning process with respect to the number of detected features, and the quality of the initial pose estimate, which is directly related to the frequency of absolute pose computing.

Let us assume a set L of m model segments (composed of mv vertical edges and mh non-vertical edges) and a set l of n image segments (with nv possibly vertical edges). The complexity of a classical matching search strategy (as defined by Grimson [11]) is O((n + 1)^m), while the complexity of the presented strategy is O((n + 1)^mh + (nv + 1)^mv). Considering a simple example where mv = 5, mh = 5, n = 10, nv = 5, the interpretation space dimension is divided by 240,660.

Let us now compare the matching search efficiency of the presented strategy with the strategy of DeSouza and Kak [3] described in the introduction, considering an initial pose error of Δφ = 10° and Δt = 0.5 m. When observing an edge at a distance of 6 m from the camera, the propagation of the robot pose uncertainty into the image yields a predicted uncertainty zone of size Δv ≈ 400 pixels, obliging a search in almost the entire image. With our algorithm, the same error selects only one VIR, restricting the search to the set of segments belonging to this region.


4. Experimental evaluation

The method presented has been tested on both synthetic and real images. The goal of the test with synthetic data is to evaluate method performance from a statistical point of view, using a great number of varied practical situations. The test with real images and a real indoor environment serves to illustrate the feasibility of the approach under realistic conditions.

4.1. Evaluation with synthetic images

This test consists of generating synthetic views of a virtual indoor model from known poses and then running the proposed algorithm in order to both determine the correct correspondence combination and locate the camera.

A geometrical model of a hall composed of 65 edges was used for purposes of this test. A large number of camera poses are generated randomly; for each pose, a synthetic image is obtained by projecting the viewed model segments using a camera simulator. Specifications of the camera simulator are as follows:

- resolution: 640 × 480;
- intrinsic parameters: u0 = 240, v0 = 320, αu = −1000 and αv = 1000.

Uncertainty due to calibration errors has been modeled by two noise parameters, εm (multiplicative error) and εa (additive error), which belong to a Gaussian distribution with null mean values and respective standard deviations ηm and ηa, introduced into the projection model as follows:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} (1 + \varepsilon_m)\,\alpha_u\,\dfrac{x}{z} + u_0 + \varepsilon_a \\ (1 + \varepsilon_m)\,\alpha_v\,\dfrac{y}{z} + v_0 + \varepsilon_a \end{bmatrix}$$

In order to model the noise due to the imaging process and the line-extraction program, noise was added on the synthetic image line parameters as follows:

$$\begin{bmatrix} \rho' \\ d' \end{bmatrix} = \begin{bmatrix} \rho + \varepsilon_\rho \\ d + \varepsilon_d \end{bmatrix}$$

with ερ and εd belonging to a Gaussian distribution with null mean values and respective standard deviations ηρ and ηd. The actual values used in this test are displayed in Table 1.

Table 1. Noise parameter values used in the synthetic data test

Noise parameter    Value
ηm                 0.02
ηa                 500 (f)
ηρ                 1.0 (°)
ηd                 3 (pixels)

To simulate the presence of non-modeled objects in the scene, randomly generated lines were added to the synthetic images. The initial pose estimations necessary to the algorithm were randomly generated as well. The number of model segments viewed in each image varied between 5 and 12; the number of added (non-modeled) segments also varied from 5 to 12. Images not containing sufficient viewed segments were discarded. A set of 430 images was ultimately selected for the test.

As mentioned, the initial pose estimations necessary to the algorithm were generated randomly. A pose estimation quality (QI) parameter, which determines the difference between the reference pose and the estimated pose, was defined. Three series of tests were conducted by varying the QI value (see Table 2).

Table 2. PEQ values used for the tests

QI        1     2     3
Δt (m)    0.30  0.50  0.75
Δφ (°)    10    15    20
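The noise injection of this synthetic test can be sketched as follows; the parameter names mirror Table 1 and the two equations above.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_projection(x, y, z, alpha_u, alpha_v, u0, v0, eta_m, eta_a):
    """Projection with the calibration-noise model of the synthetic test:
    multiplicative Gaussian error eps_m and additive error eps_a with
    standard deviations eta_m and eta_a."""
    eps_m, eps_a = rng.normal(0, eta_m), rng.normal(0, eta_a)
    u = (1 + eps_m) * alpha_u * x / z + u0 + eps_a
    v = (1 + eps_m) * alpha_v * y / z + v0 + eps_a
    return u, v

def noisy_line(rho, d, eta_rho, eta_d):
    """Line-parameter noise (eps_rho, eps_d) added to synthetic lines."""
    return rho + rng.normal(0, eta_rho), d + rng.normal(0, eta_d)
```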

First of all, the capacity of the algorithm to identify a sufficiently reliable combination of correct correspondences is studied. The results presented in Table 3 have been categorized into four groups:

- Success: a correct combination has been found;
- False negatives: no combination found;
- Consistent false positives: the proposed combination is composed of at least one false correspondence, but the interpretation is still globally consistent and the computed camera pose remains accurate;
- Inconsistent false positives: the algorithm has proposed a combination with several incorrect correspondences and the global interpretation is false.

Table 3. Synthetic data test results

QI                              1     2     3
Success                         421   418   415
False negatives                 5     5     5
Consistent false positives      4     7     7
Inconsistent false positives    0     0     3

Fig. 11. Pruning performance with unary and binary constraints.

The second result concerns the efficiency of the matching strategy. Fig. 11 shows the number of matching hypotheses retained after pruning the interpretation trees using both unary and binary constraints. Samples were ordered in accordance with the maximum number of combinations.

Results analysis: Two remarks can be made based on the results obtained with synthetic data. First, the pruning strategy using unary and binary constraints offers a good level of selectivity. As shown in Fig. 11, the number of retained hypotheses does not actually rise considerably with respect to the number of model and image segments; the exponential explosion in the maximum number of correspondence combinations is avoided. The second remark pertains to the capacity of the distance function to select the most globally consistent correspondence combination. Table 3 reveals that the number of algorithm failures is very small (less than 2%, if the false negatives and inconsistent false positives are considered as failures, with QI = 3). Furthermore, the results show that method performance does not decrease when the quality of the initial pose estimate deteriorates (QI = 3).

4.2. Evaluation with real images

Camera calibration: A SONY FCB-IX70P camera with a 640 × 480 resolution was used for this test. To obtain the intrinsic parameters of the pinhole camera model, the Zhang camera calibration method [24] was implemented; this method is better adapted to the mobile robotics context by virtue of its enhanced flexibility. As opposed to the classical Faugeras–Toscani method, the Zhang method does not degenerate with coplanar points. Several images of a planar pattern are introduced. The four intrinsic parameters αu, αv, u0 and v0, along with the extrinsic model for each view, are computed using a least-squares optimization technique. The transformation (rotation and translation) between the pattern poses in each view need not be known. This procedure enables modeling a large space around the camera without requiring very accurate or costly equipment. In addition, this method yields results comparable to those of classical methods. The intrinsic camera parameters are then computed using the Zhang method.

Image feature extraction: The image contains a great amount of unprocessed information. For each robotics application, especially real-time applications, it is necessary to identify the parts of the image that contain pertinent information and then to represent such information in a form usable by the system. This is the role of low-level image processing. In the present work, contours have been extracted by means of a Canny filter. Among the contour pixels, linear chains longer than a set threshold are detected using a Hough transform of the image plane. To increase segment detection efficiency, the voting process in the Hough table is limited, for each contour pixel, to the cells corresponding to a range of orientations defined by the gradient orientation of the pixel. Each chain is then represented by two parameters (ρ, d) obtained by polygonal approximation of the pixel coordinates [16].

Fig. 12. Examples of images taken during the navigational session.
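A comparable extraction stage can be sketched with OpenCV's Canny detector and probabilistic Hough transform. Note that the gradient-orientation restriction of the Hough voting described above is not reproduced in this sketch, and the thresholds are illustrative defaults.

```python
import cv2
import numpy as np

def extract_segments(gray, canny_lo=50, canny_hi=150, min_len=40):
    """Canny contour extraction followed by a Hough line detection, in
    the spirit of Section 4.2 (gray: 8-bit grayscale image)."""
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    segs = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                           minLineLength=min_len, maxLineGap=5)
    return [] if segs is None else [tuple(s[0]) for s in segs]
```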

Experimental evaluation: The experimental test consists of locating the robot during a navigational session in the real-world environment corresponding to the previous model. A sequence of images (some samples are presented in Fig. 12) is saved during a robot navigational session along a path defined in the navigation plane, as shown in Fig. 13a. The ground truth value is measured for each acquisition, with graduations being expressed in meters. For each image, the initial position estimate is represented by a black circle and the orientation margin by a blue angle. This initial estimate is obtained by updating the pose in light of the navigation commands; the value of QI considered in the test is 3. Computed robot poses are symbolized by red triangles. Fig. 13b shows the wire frame of the hall; the edges selected thanks to the VIRs are plotted in red. An example of an image used for localization, with correctly matched segments, is shown in Fig. 13c. Results from the localization process are presented in Table 4.

Fig. 13. Localization process: (a) updating the robot pose (red triangle), starting from an initial position (black circle) and orientation estimate (blue angle); (b) selection of a subset of model features using VIRs; (c) viewed image and correctly matched features. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

Table 4
Results of real image test

Image number    Δφ (°)    Δt (m)    Hypothesis number    Correspondence number
1               2.1       0.15      65                   5
2               1.8       0.12      55                   4
3               1.5       0.19      24                   5
4               2.5       0.13      120                  6
5               2.0       0.16      65                   5
6               2.1       0.09      180                  4
7               2.3       0.13      121                  4
8               –         –         –                    –
9               0.5       0.08      360                  4
10              1.8       0.20      391                  9

These results demonstrate that localization accuracy is maintained throughout the navigational session (Δφ < 3° and Δt < 0.2 m). The number of correspondence combinations retained is quite low. The method's tolerance of a relatively poor initial pose estimate allows adopting a low sampling rate for image acquisition. As could be expected, an acquisition with insufficient features occurred during navigation (Image 8). The tolerance on initial pose estimation quality was then increased for the next acquisition, but this change did not affect algorithm performance.

Results analysis: Results obtained with real images confirm the performance expected from the tests conducted with synthetic data. The algorithm efficiently updates the robot pose. Performance in terms of localization accuracy satisfies the requirements of the application context and allows the robot to accomplish automatic indoor navigational sessions.

5. Conclusion and subsequent work program

A method for incremental model-based localization of an indoor mobile robot using monocular vision and straight-line correspondences has been presented herein. All steps composing the localization process were studied, with special attention focused on the main contribution of this work: a strategy for the online matching of image and model correspondences.

According to simulation and experimental results, the stated objective of this work, which was to guarantee online matching search efficiency and absolute robot pose computing, has been attained. This efficiency is due to several factors. First, the interpretation tree subdivision enables avoiding the exponential explosion in the number of correspondence combinations. Second, the low order and good selectivity of the geometric constraints make it possible to retain just a small number of correspondence combinations. In addition, these constraints are directly applicable to 2D–3D correspondences. Finally, accurate pose computing using each one of the small set of retained global interpretations serves to make a robust and efficient selection of the correct matching hypothesis.

Another advantage of this method is its ability to considerably reduce the sampling rate of absolute pose computing, thanks to its relative insensitivity to the quality of the initial pose estimate. This characteristic enables applying the method even without any dead-reckoning system: the necessary pose estimate can be obtained by merely integrating the commands generated by the navigation system.
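As a rough illustration of this command-integration idea (an assumption-laden sketch, not the paper's implementation), a unicycle motion model can propagate the pose estimate between two absolute localizations:

```python
# Propagating the pose estimate by integrating (v, omega) velocity commands
# issued by the navigation system; the unicycle model and the time step dt
# are assumptions made for this example.
import math

def predict_pose(x, y, phi, commands, dt=0.1):
    """Integrate a list of (v, omega) commands from pose (x, y, phi)."""
    for v, omega in commands:
        x += v * math.cos(phi) * dt
        y += v * math.sin(phi) * dt
        phi += omega * dt
    return x, y, phi
```

Such an integrated estimate drifts between absolute fixes, which is precisely why the matching stage only needs a coarse initial pose (QI = 3 in the tests above).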

Current interest is focused on introducing color and texture parameters to accelerate the matching process. These parameters can be used as additional constraints for testing the local consistency of hypothetical correspondences. Another area of concentration is the global localization problem, which consists of identifying a pose estimate when the robot is completely lost. This case may arise, for example, upon resetting the system. The approach entails introducing recognition using content-based image retrieval.

Appendix A

Let l_i be an image line selected as possibly corresponding to a vertical edge (see Fig. 7), viewed with a camera orientation R(φ, θ, 0) and translation T = [x y z]^T. v_hi is the coordinate of its intersection with the horizon line. This intersection point is the projection of a point p_i(x_i, y_i, z_i) on the corresponding model segment. In projecting p_i, we have

(x_i', y_i', z_i') = R(p_i - T) \quad\text{and}\quad (u_{hi}, v_{hi}) = \left( \alpha_u \frac{x_i'}{z_i'} + u_0,\; \alpha_v \frac{y_i'}{z_i'} + v_0 \right)

Since z_i = z, we obtain

v_{hi} = \frac{\alpha_v \left( (x_i - x)\sin\phi + (y_i - y)\cos\phi \right)}{\left( -(x_i - x)\cos\phi + (y_i - y)\sin\phi \right)\sin\theta} + v_0

The goal is to calculate v_i as it would be if θ were equal to 90°:

v_i = \frac{\alpha_v \left( (x_i - x)\sin\phi + (y_i - y)\cos\phi \right)}{-(x_i - x)\cos\phi + (y_i - y)\sin\phi} + v_0

Thus, v_i = (v_{hi} - v_0)\sin\theta + v_0.
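As a quick numeric sanity check of this rectification formula (not from the paper; all parameter values below are arbitrary assumptions), one can verify that applying the correction to v_hi reproduces the value obtained directly with θ = 90°:

```python
# Check: v_i computed directly with theta = 90 deg must equal
# (v_hi - v0) * sin(theta) + v0 computed from the tilted view.
import math

alpha_v, v0 = 800.0, 240.0                      # assumed intrinsics
x, y, phi, theta = 1.0, 2.0, math.radians(30), math.radians(75)
xi, yi = 4.0, 7.0                               # assumed model point

num = alpha_v * ((xi - x) * math.sin(phi) + (yi - y) * math.cos(phi))
den = -(xi - x) * math.cos(phi) + (yi - y) * math.sin(phi)

v_hi = num / (den * math.sin(theta)) + v0       # observed with tilt theta
v_i_direct = num / den + v0                     # what theta = 90 deg gives
v_i_rectified = (v_hi - v0) * math.sin(theta) + v0

assert abs(v_i_direct - v_i_rectified) < 1e-9
```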

Appendix B

Let us consider a set {L_v1, L_v2, ..., L_vi} of vertical model segments and their respective intersection measurements with the horizon line {v_1, v_2, ..., v_i}. Let [φ_min, φ_max] be an estimation of the camera orientation and ω the angle shown in Fig. 14.

The angle of view of a segment L_vi is:

\omega_i = \arctan\left( \frac{v_i - v_0}{\alpha_v} \right) - \phi_{\max} - \omega



Fig. 14. Computing the left-to-right order of appearance of vertical segments on the image.

The order of appearance of the segments from right to left on the image is inversely proportional to ω_i.
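A minimal sketch of how this ordering test can be used (an illustration of the statement above, with all parameter values assumed): computing ω_i for each candidate vertical segment and sorting predicts the expected order of appearance, so correspondence hypotheses whose image order disagrees can be pruned.

```python
# Predicting the order of appearance of vertical segments from their
# horizon-line intersections v_i, per the Appendix B formula. Following the
# statement that the right-to-left rank is inversely proportional to
# omega_i, segments are sorted by decreasing omega_i (rightmost first).
import math

def right_to_left_order(v_list, v0, alpha_v, phi_max, omega):
    """Return segment indices ordered from right to left on the image."""
    omegas = [math.atan((v - v0) / alpha_v) - phi_max - omega
              for v in v_list]
    return sorted(range(len(v_list)), key=lambda i: omegas[i], reverse=True)
```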

References

[1] J. Borenstein, H.R. Everett, L. Feng, D. Wehe, Mobile robot positioning: sensors and techniques, J. Rob. Syst. 14 (4) (1997) 231–249.

[2] S. Se, D.G. Lowe, J. Little, Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, Int. J. Rob. Res. 21 (8) (2002) 735–758.

[3] G.N. DeSouza, A.C. Kak, Vision for mobile robot navigation: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2) (2002) 237–267.

[4] O. Ait Aider, T. Chateau, J.T. Lapreste, Indoor autonomous navigation using visual memory and pattern tracking, in: Proceedings of the British Machine Vision Conference, vol. 2, Kingston, England, September, 2004, pp. 657–665.

[5] H. Tamimi, A. Zell, Vision based localization of mobile robots using kernel approaches, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, September, 2004, pp. 1896–1901.

[6] X. Pennec, Toward a generic framework for recognition based on uncertain geometric features, Videre, J. Comput. Vis. Res. Q. 1 (2) (1998) 58–88, The MIT Press.

[7] N. Ayache, O.D. Faugeras, HYPER: a new approach for the recognition and positioning of two-dimensional objects, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1) (1986) 44–54.

[8] D.G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artif. Intell. 31 (3) (1987) 355–395.

[9] Y. Lamdan, H.J. Wolfson, Geometric hashing: a general and efficient model-based recognition scheme, in: Proceedings of the Second ICCV, 1988, pp. 238–289.

[10] W.E.L. Grimson, T. Lozano-Perez, Localizing overlapping parts by searching the interpretation tree, IEEE Trans. Pattern Anal. Mach. Intell. 9 (4) (1987) 469–481.

[11] W.E.L. Grimson, Object Recognition: The Role of Geometric Constraints, MIT Press, 1990.

[12] W.E.L. Grimson, D.P. Huttenlocher, On the sensitivity of the Hough transform for object recognition, IEEE Trans. Pattern Anal. Mach. Intell. 12 (3) (1990) 255–274.

[13] A. Kosaka, A.C. Kak, Fast vision-guided mobile robot navigation using model-based reasoning and prediction of uncertainties, Comput. Vis. Graph. Image Process.: Image Understanding 56 (3) (1992) 271–329.

[14] R. Talluri, J.K. Aggarwal, Mobile robot self-location using model-image feature correspondence, IEEE Trans. Rob. Autom. 12 (1) (1996) 63–77.

[15] A.J. Munoz, J. Gonzalez, Two-dimensional landmark-based position estimation from a single image, in: Proceedings of IEEE International Conference on Robotics and Automation, 1998, pp. 3709–3714.

[16] R. Horaud, O. Monga, Vision par Ordinateur, Outils Fondamentaux, Hermes, Paris, France, 1993.

[17] K.T. Simsarian, T.J. Olson, N. Nandhakumar, View invariant regions and mobile robot self-localization, IEEE Trans. Rob. Autom. 12 (5) (1996) 810–815.

[18] L. Guibas, R. Motwani, P. Raghavan, The robot localization problem in two dimensions, in: Proceedings of the Third ACM-SIAM Symposium on Discrete Algorithms, 1992, pp. 259–268.

[19] D.W. Murray, D.B. Cook, Using the orientation of fragmentary 3D edge segments for polyhedral object recognition, Int. J. Comput. Vis. 2 (1988) 153–169.

[20] M. Betke, L. Gurvits, Mobile robot localization using landmarks, IEEE Trans. Rob. Autom. 13 (2) (1997) 251–263.

[21] W.E.L. Grimson, D.P. Huttenlocher, On the verification of hypothesized matches in model-based recognition, IEEE Trans. Pattern Anal. Mach. Intell. 13 (12) (1991) 1202–1213.

[22] C.C. Chang, W.H. Tsai, Reliable determination of object pose from line features by hypothesis testing, IEEE Trans. Pattern Anal. Mach. Intell. 21 (12) (1999) 1235–1241.

[23] T.Q. Phong, R. Horaud, P.D. Tao, Object pose from 2-D to 3-D point and line correspondences, Int. J. Comput. Vis. 15 (1995) 225–243.

[24] Z. Zhang, A flexible new technique for camera calibration, IEEE Trans. Pattern Anal. Mach. Intell. 22 (11) (2000) 1330–1334.

Dr Omar Ait Aider spent three years, from 1999 to 2002, as a junior researcher at Laboratoire Systemes Complexes, University of Evry, France, preparing his PhD and as a teaching assistant in Electronics and Informatics at Institut Universitaire de Technologie of Evry. From 2003 to 2005, Dr Ait Aider was a postdoctoral research fellow in the GRAVIR team at LASMEA (CNRS), Blaise Pascal University, France.



Philippe Hoppenot obtained his PhD in robotics in 1997. He is presently associate professor at LSC, the University of Evry, France. He is at the origin of the ARPH (Assistance with a Robot to Persons with Handicap) project. His research interests include man–machine co-operation, rehabilitation engineering and robotics, the last point dealing with planning, navigation and localization for mobile robots. A new centre of interest is the manipulator arm embarked on a mobile base. He also works on a tool robot to help autistic children have better relations with their environment.

Etienne Colle is a professor at the University of Evry, France. His research interests include signal processing for perception devices, telerobotics and rehabilitation engineering. He takes an active part in the animation of the scientific community interested in rehabilitation engineering. Since January 2005 he has been the head of LSC (CNRS), University of Evry.