
A pure vision-based topological SLAM system


The International Journal of Robotics Research 31(4) 403–428. © The Author(s) 2012. DOI: 10.1177/0278364911435160

Wen Lik Dennis Lui and Ray Jarvis

Abstract
In this paper, we present a feasible solution to the problem of autonomous navigation in initially unknown environments using a pure vision-based approach. The mobile robot performs range sensing with a unique omnidirectional stereovision system, estimates its motion using visual odometry and detects loop closures via a place recognition system as it performs topological map building and localization concurrently. Owing to the importance of performing loop closing regularly, the mobile robot is equipped with an active loop closure detection and validation system that assists it to return to target loop closing locations, validates ambiguous loop closures and provides it with the ability to overturn the decision of an incorrectly committed loop closure. A refined incremental probabilistic framework for an appearance-based place recognition system is fully described and the final system is validated in multiple experiments conducted in indoor, semi-outdoor and outdoor environments. Lastly, the performance of the probabilistic framework is compared with the rank-based framework, with additional experiments conducted in the semi-autonomous mode, where the mobile robot, provided with a priori information in the form of a topological map built on a separate occasion in an offline manner, is required to reach its target destination.

Keywords
topological SLAM, autonomous navigation, place recognition, loop closure, vision-based approach, appearance-based approach

1. Introduction

A mobile robot is considered autonomous when it can safely navigate itself in the environment with minimal or no human guidance. However, there are many cases where the practical application of autonomous mobile robots would require other high-level perceptual capabilities. For instance, if a mobile robot is directed by a user to collect a cup of coffee from the local cafe down the street, it has to be equipped with the ability to locate and recognize the cafe upon arrival, identify the location of and manipulate the cup of coffee, and interact with people when ambiguity arises. Nevertheless, the main focus of this research is the fundamental problem of autonomous navigation, which is the problem of self-navigation from one point to another whilst exploring an initially unknown environment. The extent to which a mobile robot is able to navigate autonomously in an initially unknown environment is critically dependent on the quality of its sensors and the sophistication of its mapping and planning strategies, which, together, make up its sensor-driven spatial intelligence.

The way that we model and understand the world is primarily dependent on visual information. Visual sensors have become an attractive source of information on robots as they are cheap, passive and widely available. This growth is further enhanced by the availability of cheap computing power and off-the-shelf cameras with better image quality and higher frame rates. However, similar to other sensing modes, passive vision-based systems are plagued with their own set of problems, such as a dependency on texture-rich scenes, sensitivity to lighting variation, higher computational requirements and colour inconstancy. For mobile robots, the list of problems extends to motion blurring, the windowing problem suffered by perspective camera systems (e.g. the need to register different viewpoints of the same location and problems with minimally or non-overlapping regions), greater power consumption due to increasing computational requirements and the need to satisfy real-time constraints.

Intelligent Robotics Research Centre, Department of Electrical and Computer Systems Engineering, Monash University, Australia

Corresponding author:
Wen Lik Dennis Lui, Intelligent Robotics Research Centre, Department of Electrical and Computer Systems Engineering, Building 72, Monash University, Clayton, Victoria 3800, Australia.
Email: [email protected]

In this paper, we present a feasible solution to this fundamental problem of self-navigation in mobile robotics by primarily relying on visually perceived information, such as range estimation, motion estimation and place recognition. These components were integrated into the mobile robot's navigation system, which balances its effort between loop closing and exploration, decides its next course of action, performs path planning and executes the selected path. As the mobile robot engages the environment, the positional drift associated with the mobile robot's estimated location increases over time, thus making it necessary to perform loop closing regularly by detecting loop closures via a place recognition system and maintaining the global consistency of its internal representation of the environment (in the form of a topological map) by employing a relaxation technique. Owing to the importance of performing loop closing regularly, an active loop closure detection and validation system, which enables the mobile robot to actively search for loop closures and to validate ambiguous loop closures, is proposed, developed and validated.

The paper draws on earlier work (Lui and Jarvis 2010a,b,c; Tungadi et al. 2010). The main contributions of this paper are: (a) a full description and analysis of the proposed probabilistic Haar-based place recognition system briefly outlined by Tungadi et al. (2010), which provides the developer control over what is deemed a discriminative match and over its convergence speed; (b) experiments conducted in a wide range of environments (indoor, semi-outdoor and outdoor); (c) a comparison of the rank-based framework in Lui and Jarvis (2010c) with the probabilistic framework for the place recognition system; and (d) results on semi-autonomous experiments where the mobile robot, provided with a priori information in the form of a topological map built on a separate occasion in an offline manner, is required to reach a user-specified destination (goal oriented).

The rest of this paper is organized as follows: Section 2 provides an overview of related work and Section 3 provides an overview of the system, briefly summarizing previous work that is crucial towards the understanding of the material presented in this paper. Subsequently, Section 4 presents the place recognition system in two frameworks and Section 5 provides details of the semi-autonomous navigation mode of the mobile robot. Experimental results are illustrated in Section 6, where links to videos of the experiments presented are provided, followed by a discussion in Section 7. Finally, conclusions are drawn in Section 8.

2. Related work

It is generally agreed that autonomous robot navigation can only be achieved in an initially unknown environment if the mobile robot performs simultaneous localization and mapping (SLAM), and is equipped with a path planner that decides, plans and executes its next course of action. SLAM has been a very popular and important research problem in the field of mobile robotics, and it has made significant progress over the past two decades (Thrun et al. 2005; Bailey and Durrant-Whyte 2006). Owing to the extensive literature on SLAM systems, this review will focus specifically on appearance-based approaches.

In recent years, appearance-based SLAM systems have gained widespread attention within the mobile robotics community. Several factors play a major role in this development, such as the improving technologies and decreasing prices of both cameras and processors, the passive nature of vision sensors and the potential of visual information being used for developing other high-level perceptual capabilities desirable on a mobile robot. Moreover, appearance-based approaches are naturally suited for loop closure detection (a key element in SLAM), map merging and solving the kidnapped robot problem (global localization).

Normally, appearance-based systems maintain a set of locations, each represented by an appearance model. For this reason, the internal representation of the explored environment in such systems can be conveniently represented in the form of a topological map, which is a linked collection of waypoints/nodes, each based on some distinctive abstract feature; in this case, an appearance model. Since each location is represented by its corresponding appearance model, the mobile robot can localize itself as it revisits previously explored areas of the environment by searching for the node whose appearance model best matches the appearance model associated with its current location. However, this scheme may only work for image retrieval systems (Rui and Huang 1999; Smeulders et al. 2000), where a priori information in the form of a database of appearance models for each location in the map is provided. In addition to relocalizing to the correct existing node in the map when it is indeed revisiting, appearance-based SLAM systems have to first discriminate whether the current location is a previously seen or new location in the environment, thus making them function like a place recognition system.

The appropriate appearance model to employ is dependent on the target application, since different models have different strengths and weaknesses. The most convenient approach is to select an appropriate colour space and employ this as the appearance model. Images are then compared using image similarity metrics such as the Euclidean distance, sum of squared differences, sum of absolute differences or normalized cross-correlation. However, such approaches are very primitive: they are error prone under illumination changes; they are scale, rotation and translation variant; and they require large amounts of memory as the database increases in size.

There are other methods, such as the use of image histograms and principal component analysis (PCA). Image histograms (Ulrich and Nourbakhsh 2000) are rotation and translation invariant. However, their matching accuracy is greatly affected by illumination variation and their maximum database capacity is limited by the number of bins used in the histogram representation. The total number of bins also determines the total memory required to maintain the database of appearance models. On the other hand, PCA, which initially gained popularity in the domain of face recognition (Turk and Pentland 1991; Hsieh and Tung 2009), reduces the high-dimensional data onto low-dimensional subspaces. A set of principal components (eigenvectors) is used to represent each image and also for measuring the similarity between images. Linear PCA (Kröse et al. 1999) is rotation, translation and scale invariant; however, its matching accuracy is similarly affected by illumination variation. As such, both approaches are better suited for indoor applications.

Nevertheless, there are other approaches that have been experimentally proven to be more robust towards illumination variation and occlusion, such as the approach of Zhang and Kleeman (2009), where image cross-correlation is performed in the Fourier domain on patch-normalized omnidirectional images, or the use of Haar wavelet coefficients, as in Ho and Jarvis (2008). In addition, the bag-of-visual-words paradigm has, over the past few years, become a popular model in appearance-based systems. It uses a combination of visual cues extracted from the image and builds a visual dictionary, as in the systems proposed by Angeli et al. (2008) and Cummins and Newman (2008).

It has been acknowledged that a probabilistic framework is the most appropriate solution to deal with the problem of non-ideal sensory data and associated uncertainties. It is an elegant solution because it provides a way to measure and account for the uncertainty in the location of the mobile robot and the features in the map, allows the performance of the mobile robot to degrade gracefully as uncertainty becomes prevalent and provides a general platform for multisensor fusion. Despite this, it was reported by Milford and Wyeth (2008) that a biologically inspired appearance-based localization and mapping system, known as RatSLAM, showed remarkable results even when these uncertainties were not explicitly taken into account.

An initial attempt to develop a probabilistic appearance-based localization system was reported by Kröse et al. (1999). A priori information of the environment, in the form of PCA features extracted from greyscale panoramic images, was required by the system. These panoramic images were acquired from respective positions in the environment, where this environment could be visualized as a two-dimensional (2D) grid map divided into resolutions of 10 cm × 10 cm. Based on this information, together with a nearest-neighbour model, the distribution which describes the probability of the occurrence of a set of one-dimensional (1D) features for every location in the environment was derived. Subsequently, the posterior probability distribution, which describes the probability of the current mobile robot's location being within a particular grid cell in its metric map given a set of features extracted from the image, was derived by applying Bayes' theorem. The mobile robot's location could then be determined in an indoor environment by finding the location associated with the index having the maximum posterior probability.

Similarly, Ho and Jarvis (2008) proposed an appearance-based localization system, using the Haar coefficients as the appearance model, in a particle filter framework which required a priori information of the environment. The a priori information provided was in the form of a database of image signatures, created from Haar-decomposed synthetic panoramic images. These synthetic panoramic images were generated from an extensive three-dimensional (3D) virtual model of the world built using a Riegl Z420i terrestrial laser scanner. Unfortunately, a mobile robot with no prior knowledge of the environment is still required to differentiate between a new location and one visited previously, which such localization systems do not address.

It was only recently that this problem in appearance-based localization and mapping systems was properly addressed using a complete probabilistic framework, by two independent research groups (Angeli et al. 2008; Cummins and Newman 2008). Both groups employed the bag of visual words as the appearance model to represent raw image data as a collection of attributes/words chosen from the vocabularies in the dictionary. However, there are major differences between the two systems in several aspects, such as the construction and maintenance of the visual dictionary and the way that the problem was formulated and resolved.

Cummins and Newman (2008, 2010) formulated this problem as a recursive Bayes estimation problem and were able to provide a measure of the probability of the mobile robot being at a previously visited or new location. Ideally, providing an accurate measure of the mobile robot's likelihood of being at a new location would imply an evaluation against all possible unknown places. This is not practical because (a) it is not possible to have a database of all unknown places and (b) the computational cost associated with this process is expected to be very expensive. In order to resolve this issue and provide a reasonable estimation, a sampler, which randomly selects from a large collection of observations from street-side imagery (Google Street View or previous runs), was used to sample an observation to build the place model which represents the unmapped locations. For optimal performance, a Chow Liu tree which captures the co-occurrence statistics of visual words (SURF features (Bay et al. 2006)) was constructed from a large number of observations. This process was reported to involve 2,800 images from 28 km of urban streets and the entire process took 2 hours on a 3 GHz Pentium IV. Nevertheless, the system performed reasonably well even when standard dictionaries were used.

On the other hand, Angeli et al. (2008) formulated this as a loop closure detection problem and solved it via a Bayesian framework which involved the creation of a virtual image, using a number of the most frequently seen words (scale-invariant feature transform (SIFT) features (Lowe 2004) and local colour histograms), to represent the appearance model of all unmapped locations. This method was reported as being fast and incremental, as the construction of the visual dictionary (standard tree structure) was performed as an online process as new images were acquired by the system. Kawewong et al. (2011) extended this to work with their proposed position-invariant robust features (PIRF) using the probabilistic framework described in Angeli et al. (2008). There are other notable works on general topological SLAM systems, such as those of Choset and Nagatani (2001), Folkesson and Christensen (2007), Tapus and Siegwart (2008), and Tully et al. (2009). Nevertheless, they are not strictly appearance-based approaches but may be applied with the necessary modifications.

3. System overview

As illustrated in Figure 1, the research platform used in this work is powered by a wheelchair motor/gear-set. It is equipped with a unique variable single/multi-baseline omnidirectional stereovision system, which we refer to as the Eye-Full Tower, a Bumblebee stereo range camera (Point Grey 2010), a Logitech web camera and two laptops. Prior work that is crucial towards the understanding of the material presented in this paper is summarized in the following sections.

3.1. The Eye-Full Tower

The Eye-Full Tower features a vertical variable baseline that is adjustable in the range of 30 cm to 65 cm, monitored by a sonar range sensor. Two omnidirectional catadioptric vision systems can be found on the Eye-Full Tower, each made up of a Canon PowerShot S3 IS camera, with a maximum resolution of 6 MP and a 320 × 240 live video stream at 30 Hz, pointing upwards to an equiangular mirror designed by Chahl and Srinivasan (1997). The device features a vertical baseline due to several factors, as highlighted in Ollis et al. (1999): (a) the resolution of both catadioptric vision systems falls off at an approximately equal rate, (b) the angular fields of view match and (c) the search for corresponding epipolar lines is eased. In addition, the catadioptric vision systems were intentionally designed with the camera pointing upwards to the mirror in order to cover more of the ground area. This configuration maintains the overlap of crucial regions surrounding the mobile robot even when a wide/long baseline is used. The missing parts, due to this wider/longer separation between the cameras, are taller objects such as tree tops and higher levels of buildings, which do not concern most ground mobile robots.

The Eye-Full Tower is the primary range estimation sensor used by the mobile robot, where stereo correspondences were established using a local area-based approach and depth was estimated via triangulation (also utilizing the proposed camera calibration technique specifically for omnidirectional catadioptrics with equiangular mirrors). Both of these processes were offloaded to the graphics processing unit (GPU) using the Nvidia CUDA libraries (Nvidia CUDA Zone 2008), providing an option to use this as a real-time sensor. An automatic baseline selection technique was proposed to select baseline(s) for both the single- and multi-baseline versions of the system, where multiple stereo pairs could be fused together via a voxel voting approach. The baseline(s) were selected according to the histogram of the disparity distribution and were adjusted to the environment. Figures 2(a) and 2(b) show the results from a single stereo pair in an indoor environment. For more details, please refer to Lui and Jarvis (2010b).

Fig. 1. The mobile research platform.

3.2. Visual odometry

A three-degree-of-freedom (3DoF) visual odometry system is available on the mobile robot. Motion estimation was achieved by combining the distance travelled, estimated by a ground plane optical flow tracking system (Fernandez and Price 2004), with the bearing estimated by the panoramic visual compass system (Labrosse 2007; Scaramuzza and Siegwart 2008). Modifications were made to the original algorithms (Fernandez and Price 2004; Labrosse 2007) to deal with observed issues such as the vibration induced by the motion of the robot, computational efficiency and ease of experimentation. Figure 3 shows the estimated trajectory which combined the bearing measured by the panoramic visual compass and the distance travelled measured by the odometry system on an ActivMedia Pioneer P3-AT (Mobile Robots 2010) mobile robot. The complete visual odometry system (with the distance travelled estimated by the ground plane optical flow system) was tested more rigorously in an indoor and a semi-outdoor environment using the mobile platform in Figure 1, reporting an average positional drift of 5.64% of the total distance travelled in the semi-outdoor environment. For more details, please refer to Lui and Jarvis (2010c).

Fig. 2. Segmenting the 2D local grid map (indoor environment): (a), (b) voxelated environments 1 and 2; (c), (d) segmenting environments 1 and 2; (e), (f) grid maps 1 and 2; (g), (h) unwarped images 1 and 2.

3.3. Reactive obstacle avoidance

Lui and Jarvis (2010a) briefly mentioned that a secondary source of pseudo-range estimation is available in the form of disparity maps, returned by a Bumblebee stereo range camera system, which were used by a low-level reactive obstacle avoidance system. This system was made available as a safety measure to steer the mobile robot away from obstacles which the primary sensor had failed to detect. The two sensors complement one another due to the vertical stereo setup for the primary sensor and the horizontal stereo setup for this secondary sensor, where the former is more sensitive towards horizontal features whereas the latter is better for vertical features.

This system determines whether the mobile robot should avoid an obstacle based on a compressed 1D disparity map, where each element of this 1D map is the maximum disparity value of the corresponding column in the original 2D disparity map. Using this 1D disparity map, the system decides whether it should head straight forward, or steer left or right, depending on the direction perceived to have less obstruction.
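As a concrete illustration, the column-wise compression and three-way steering decision can be sketched as follows; the disparity threshold and the equal left/centre/right split of the field of view are illustrative assumptions, not values taken from the actual system.

```python
import numpy as np

def steer_from_disparity(disparity_2d, obstacle_thresh=40):
    """Compress a 2D disparity map column-wise and pick a steering direction.

    Each element of the 1D map is the maximum disparity of one image
    column (larger disparity = closer obstacle). The threshold and the
    equal three-way split of the view are illustrative assumptions.
    """
    disparity_1d = disparity_2d.max(axis=0)            # column-wise maxima
    left, centre, right = np.array_split(disparity_1d, 3)
    obstruction = [int(np.sum(part > obstacle_thresh))
                   for part in (left, centre, right)]

    if obstruction[1] == min(obstruction):
        return "forward"                               # centre is clearest
    return "left" if obstruction[0] < obstruction[2] else "right"

# Example: a synthetic map with a close obstacle over the centre and right
dmap = np.zeros((120, 160), dtype=np.uint8)
dmap[:, 60:] = 60
print(steer_from_disparity(dmap))                      # -> "left"
```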

3.4. Complete loop closing system

As described in Lui and Jarvis (2010a), besides loop closure detection, a complete loop closing system should exhibit the following properties: (a) actively search for loop closures instead of waiting for loop closures to happen by chance; (b) there should be ways of measuring the validity of a detected loop closure; (c) validate an ambiguous loop closure event by gathering new evidence in the future if past and present information is not capable of resolving its ambiguity; and (d) there should be some way of undoing the modifications made to the map for a system maintaining a single hypothesis or, for the case of a multiple-hypotheses system (Tomatis et al. 2002; Beevers and Huang 2005; Tully et al. 2009), pruning down the number of hypotheses it has to maintain based on the validation results.

Fig. 3. Testing the panoramic visual compass system in an outdoor environment: ground truth (white) and estimated trajectory (yellow).

A complete loop closing system was proposed to be made up of the following modules: (a) a loop closure detection module with an active loop closing strategy, (b) a loop closure validation module and (c) system restoration. Loop closure detection was achieved by using a place recognition system that is described later in this paper. To perform loop closing, the system actively searches for the selected target loop closing location, using the proposed active loop closing strategy, before the accumulated positional drift in the mobile robot's estimated location becomes too large for it to make well-informed decisions. When a loop closure was detected, past and present information (e.g. the number of SURF matches (Bay et al. 2006), the minimum Bayes risk (Radke et al. 2005), the location of the matching node) was used to validate it. If the loop closure was found to be ambiguous, the mobile robot proactively gathers new evidence to validate the loop closure based on its local context consistency (trying to match to the neighbourhood surrounding the matching node over n iterations). If the validation process failed, system restoration was performed in order to reverse the decision of loop closure to no loop closure in that particular time step, since modifications were made to the map and the mobile robot's pose when it committed to a detected loop closure. To achieve the active loop closing strategy and loop closure validation module, this system was tightly integrated with the path planner.

For path planning purposes, the 3D voxelated environment produced by the Eye-Full Tower would be compressed into a 2D local grid map, from which frontier headings (Yamauchi 1997) were extracted to provide the system with traversable directions (Figure 2). The path planner would then select a frontier as its goal location depending on whether it is in the pure exploration, exploration and loop closing, or loop closure validation mode. For example, if the system is in the pure exploration mode, it will avoid selecting a goal location that would direct it back to an existing node in its evolving topological map, since the objective here is to explore and find new locations in the environment. Paths were planned to the frontier headings using the 2D distance transform (Jarvis 1985).
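A 2D distance transform for path planning can be realized as a wavefront expansion from the goal cell over the free cells of the grid map; descending the distance values from the robot's cell then yields a shortest grid path. The sketch below is a minimal illustration of that idea on a 4-connected grid, not the authors' implementation.

```python
from collections import deque

def distance_transform(grid, goal):
    """Wavefront expansion from the goal over a 2D occupancy grid.

    grid[r][c] == 0 for free space, 1 for obstacles. Returns a map of
    cell -> number of 4-connected steps to the goal.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def plan_path(grid, start, goal):
    """Follow steepest descent of the distance transform from start to goal."""
    dist = distance_transform(grid, goal)
    if start not in dist:
        return None                       # goal unreachable from start
    path, cell = [start], start
    while cell != goal:
        r, c = cell
        cell = min(((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)),
                   key=lambda n: dist.get(n, float("inf")))
        path.append(cell)
    return path

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(plan_path(grid, (0, 0), (2, 0)))    # routes around the obstacle row
```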

The topological map built by the system is used both for specifying start/end points and for waypoints. If no frontier headings were valid for exploration at the current location of the mobile robot, it would proceed by searching for existing nodes in its evolving topological map registered with valid frontier headings that have yet to be explored. Under such circumstances, the system would backtrack to the selected node in the topological map, using it for waypoints. The selected node to backtrack to would be the end point of the backtracking process and the starting point of the exploration process. The nodal propagation technique (Jarvis 1992) was used to plan a path back to the selected node. The use of the topological map for specifying waypoints is illustrated more clearly in the semi-autonomous navigation mode (Section 5), where the objective of the system is to navigate to an existing node in a topological map provided to the system.

For some SLAM systems, the proposed system restoration may not be directly applicable without modification, since it is possible for systems to retain all information or to maintain multiple hypotheses with the potential to retrospectively remove invalidated loop closures. For more details on the proposed active loop closing strategy, loop closure validation and system restoration modules, please refer to Lui and Jarvis (2010a).

3.5. Topological SLAM

The mobile robot starts off without a priori knowledge of the environment in the pure exploration mode. It would build a topological map and establish the spatial relationship between nodes in the map by relying on the 3DoF motion estimate provided by the visual odometry system. Each node in the evolving topological map contains information on its global 2D location, the global heading of the omnidirectional image registered to the node and an estimation variance (initialized with the variance of the previous node plus 6% of the distance travelled). The initial position of the mobile robot was assumed to be at the origin of the global coordinate system, with its current heading initialized as 0° and a variance of 0. An appearance model would be associated with each node in the map and a global relaxation algorithm (Duckett et al. 2000) was employed in order to maintain its global consistency.

The mobile robot would switch from the pure exploration to the exploration and loop closing mode after travelling for a certain distance since the last deliberated loop closure event (the active loop closing strategy selects a target loop closure location and the mobile robot has to be able to localize itself to the node or at least to its neighbours). Relative metric information was recovered via a RANSAC procedure detailed in Lui and Jarvis (2010c) when a loop closure was detected, providing the system with the best transformation matrix. The parameters of the transformation matrix were calculated using a closed-form solution to the least-squares estimation of two 3D point patterns (Umeyama 1991), with the two 3D point patterns generated using SURF correspondences (Bay et al. 2006) and 3D information from the Eye-Full Tower. The relative orientation was recovered using the difference between the locations of SURF correspondences to yield more reliable estimates. In the following section, the place recognition system, responsible for detecting loop closures, is thoroughly described.
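For reference, a compact sketch of the Umeyama (1991) closed-form rigid alignment between two 3D point patterns is shown below (rotation and translation only, no scale). This is a generic implementation of the cited method, not the authors' full RANSAC pipeline, which would wrap this estimator around SURF-generated point pairs.

```python
import numpy as np

def umeyama_rigid(src, dst):
    """Closed-form least-squares rigid transform (Umeyama 1991, no scaling).

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns R (3x3) and t (3,) such that dst ~= src @ R.T + t.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_dst).T @ (src - mu_src) / len(src)
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                       # reflection correction
    R = U @ S @ Vt
    t = mu_dst - R @ mu_src
    return R, t

# Example: recover a known rotation/translation from noiseless points
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
theta = np.radians(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = umeyama_rigid(src, dst)
print(np.allclose(R, R_true), np.round(t, 3))
```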

4. Place recognition system

The place recognition system utilizes image signatures, created by decomposing unwarped omnidirectional images using the standard 2D Haar decomposition technique, as the appearance model for each node/location in the evolving topological map. It was chosen due to its simplicity, robustness, scalability and small memory requirements. The unwarped omnidirectional images were first downsampled to 512 × 128 before they were converted into the YIQ colour space. Since the Haar wavelets are not rotation invariant, the unwarped omnidirectional image would be column-wise shifted by the pixel equivalent of every 10° before it was decomposed and quantized, with the image signature, created from coefficients in a bounding box of size 56 × 14 from the origin, stored into the database. When the system is presented with a new omnidirectional image, the image is first converted into an image signature using the above-mentioned processes and then compared with those in the database using the following image query metric, as proposed by Jacobs et al. (1995):

score = w_0 \left| Q[0,0] - T[0,0] \right| - \sum_{x,y:\, Q[x,y] \neq 0} w_{bin(x,y)} \left( Q[x,y] == T[x,y] \right) \quad (1)

where Q is the query image, T is the target image, the w are the weights determined using a set of independent training images via the logistic regression model, x and y represent the location of the coefficient in the decomposed image, with the overall average intensity of the channel stored in the location x = y = 0, and w_{bin(x,y)} is the weight which corresponds to the bin number derived by the following expression:

bin(x, y) = \min(\max(x, y), 5). \quad (2)

These bins group the coefficients depending on their location in the decomposed image, and the corresponding weights bias the final similarity score based on the location of matching coefficients. The Haar wavelets were experimentally validated to be robust for use in indoor, semi-outdoor and outdoor environments. For more information, please refer to Lui and Jarvis (2010c). The following sections describe the two proposed frameworks for the place recognition system.
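The binning of Equation (2) and the scoring of Equation (1) translate directly into code. The sketch below assumes signatures are stored as sparse dictionaries mapping coefficient locations to quantized values (+1/-1), with the channel average at (0, 0); the weight table shown is illustrative, not the trained values from the paper.

```python
def bin_index(x, y):
    """Equation (2): group coefficient locations into bins 0..5."""
    return min(max(x, y), 5)

def haar_score(query, target, weights):
    """Equation (1): weighted score between two Haar signatures.

    query/target: dicts {(x, y): value}, with the overall average
    intensity stored at (0, 0) and quantized coefficients (+1/-1)
    elsewhere. Smaller scores indicate better matches, since every
    matching coefficient subtracts its bin weight.
    """
    score = weights[0] * abs(query[(0, 0)] - target[(0, 0)])
    for (x, y), value in query.items():
        if (x, y) == (0, 0) or value == 0:
            continue
        if target.get((x, y)) == value:
            score -= weights[bin_index(x, y)]
    return score

# Illustrative (untrained) bin weights
weights = [5.0, 1.3, 1.1, 0.9, 0.7, 0.5]
q = {(0, 0): 0.42, (1, 0): 1, (3, 2): -1, (7, 7): 1}
t = {(0, 0): 0.40, (1, 0): 1, (3, 2): -1, (7, 7): -1}
print(haar_score(q, t, weights))          # two matches lower the score
```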

4.1. Rank-based framework

The rank-based framework operates directly on the weighted scores computed using Equation (1). An image signature, I1, in the database is considered more similar to the current input query image signature, Q, than another image signature, I2, in the database if the returned weighted score for the I1-Q pair is smaller than the weighted score for the I2-Q pair. This framework ranks the weighted scores in ascending order, where each weighted score is the comparison of the input query image signature with an image signature in the database, and selects the top S matches that are smaller than a threshold, TR. In addition, for a match to qualify as a probable candidate, the location of the matching node has to be within the boundary of the circle centred at the current position of the mobile robot with a radius defined by the variance of its current estimated position. Each candidate which satisfies these conditions is also associated with a pair of omnidirectional images taken at the point in time when the node was created in the topological map. Using the unwarped bottom omnidirectional image of the candidate and the unwarped bottom omnidirectional image taken at the current location of the mobile robot, image correspondences are established by matching SURF features (Bay et al. 2006). Only candidates that share more than TS similar SURF features with the current unwarped bottom omnidirectional image are retained in the pool of final candidates. The rank-based framework is summarized in the flowchart shown in Figure 4; a code sketch of the selection logic follows below.

Fig. 4. Flow chart of the rank-based framework.
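A condensed sketch of the candidate selection logic is given here; `haar_score`, the SURF matcher and the node fields are assumed interfaces standing in for the real system components, and the defaults mirror the parameter names of the text rather than tuned values.

```python
def rank_based_candidates(query_sig, query_image, nodes, robot_pos, pos_var,
                          haar_score, count_surf_matches,
                          S=5, TR=-200, TS=19):
    """Select loop closure candidates by ranked Haar scores (Section 4.1).

    nodes: iterable of objects with .signature, .position, .bottom_image.
    Assumed interfaces: haar_score(sig_a, sig_b) -> weighted score,
    count_surf_matches(img_a, img_b) -> number of SURF correspondences.
    """
    ranked = sorted(nodes, key=lambda n: haar_score(query_sig, n.signature))
    candidates = []
    for node in ranked[:S]:
        if haar_score(query_sig, node.signature) >= TR:
            continue                       # not similar enough
        dx = node.position[0] - robot_pos[0]
        dy = node.position[1] - robot_pos[1]
        if (dx * dx + dy * dy) ** 0.5 > pos_var:
            continue                       # outside positional uncertainty circle
        if count_surf_matches(query_image, node.bottom_image) <= TS:
            continue                       # too few SURF correspondences
        candidates.append(node)
    return candidates
```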

4.2. Probabilistic framework

The proposed probabilistic framework was developed based on the incremental loop closure detection system described by Angeli et al. (2008), which employs the discrete Bayes filter. It was originally proposed to maintain temporal coherency and reduce the transient detection errors which give rise to false-positive detections. However, several modifications were made due to some significant differences in system characteristics. First, Haar coefficients of the omnidirectional images were used to discriminate between the images, instead of the bag-of-words approach using perspective cameras. Then, omnidirectional images were captured intermittently (i.e. after the mobile robot had travelled a considerable distance from the previous node) and only a single omnidirectional image is associated with a node in the topological map, instead of using a continuous stream of video images. Moreover, the topological relationships between these images were maintained in a bidirectional graph structure.

The problem is defined as follows. Given a set of nodes, n_s = 0, 1, ..., t, each associated with an omnidirectional image in the sequence I_0, I_1, ..., I_t, compute the probability of the mobile robot being in the vicinity of a previously explored node and the probability of it being at an unmapped/yet-to-be-explored location. The variable S_t is used to describe the hypothesis that, at time t, the mobile robot is at a previously visited node if S_t = j, where j denotes the index of an existing node in the topological map, or at a new location if j = -1. In the Bayesian framework, this is equivalent to searching for the past image I_k where the index k is derived by using the following expression:

k = \arg\max_{j = -1, \dots, t-e} p(S_t = j \mid I^t) \quad (3)

where I^t = I_0, I_1, ..., I_t. Subsequently, by using Bayes' rule and the Markov assumption, the full posterior is decomposed into

p(S_t \mid I^t) = \eta \, p(I_t \mid S_t) \, p(S_t \mid I^{t-1}) \quad (4)

where η is the normalization factor and p(I_t | S_t) is the likelihood L(S_t | I_t) of S_t given image I_t. By marginalizing the belief distribution, p(S_t | I^{t-1}), it can then be expressed as

p(S_t \mid I^{t-1}) = \sum_{k=-1}^{t-e} \underbrace{p(S_t \mid S_{t-1} = k)}_{\text{state transition}} \, \underbrace{p(S_{t-1} = k \mid I^{t-1})}_{\text{prior posterior}} \quad (5)

where e is the number of most recently visited nodes to be excluded and the prior is initialized to p(S_{t-1} = -1 | I^{t-1}) = 1. This decomposition is elegant because the state transition model can be used to maintain temporal coherency and reduce transient errors due to perceptual aliasing, and the prior posterior is readily available from the previous time step. The following sections describe the new state transition model and the likelihood-like function in a scoring scheme. These modules were redefined, resulting in a general set of expressions, such that the proposed framework can be easily adapted to various systems with different characteristics (i.e. convergence speed, image similarity metrics).
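The recursion of Equations (4) and (5) amounts to a standard discrete Bayes filter update: predict by mixing the prior posterior through the transition model, weight by the (likelihood-like) scores, then normalize. A generic sketch follows, with the transition and likelihood functions passed in as assumed callables.

```python
def bayes_filter_update(prior, transition, likelihood):
    """One discrete Bayes filter step over states -1 (new place), 0..t-e.

    prior:      dict state -> p(S_{t-1} = state | I^{t-1})
    transition: callable (j, k) -> p(S_t = j | S_{t-1} = k)
    likelihood: callable j -> L*(S_t = j | I_t)
    Returns the normalized posterior p(S_t | I^t) as a dict.
    """
    states = list(prior)
    # Prediction: Equation (5), belief mixed through the transition model
    belief = {j: sum(transition(j, k) * prior[k] for k in states)
              for j in states}
    # Correction: Equation (4), weight by the likelihood and normalize
    posterior = {j: likelihood(j) * belief[j] for j in states}
    eta = sum(posterior.values())
    return {j: p / eta for j, p in posterior.items()}
```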

4.2.1. State transition model

The state transition probabilities were modelled according to our system's characteristics, which were highlighted previously. They can be described as follows (note that the value of e is ignored until t ≥ e, since it is not possible to exclude more nodes than are available):

• p(S_t = -1 | S_{t-1} = -1) = 2.0/(t - e + 2.0) if the system believed it was at a new location in the previous time step (SI = -1), or 1.5/(t - e + 1.5) otherwise: describes the probability that the mobile robot is at a new location at time t, given that it was previously at a new location at time t - 1.

• p(S_t = j | S_{t-1} = -1) = (1 - p(S_t = -1 | S_{t-1} = -1))/(t - e + 1.0), j ∈ [0, t - e]: describes the probability that the mobile robot is currently at node j, given that it was previously at a new location at time t - 1.

• p(S_t = -1 | S_{t-1} = k) = 1 - (N_k + 1)/(N_k + 2) if the system believed it was at a new location in the previous time step (SI = -1), or 0.2 otherwise, k ∈ [0, t - e]: describes the probability that the mobile robot is currently at a new location, given that it was previously at node k at time t - 1.

• p(S_t = j | S_{t-1} = k), where j ∈ [0, t - e] and k ∈ [0, t - e]: describes the probability that the mobile robot is currently at node j, given that it was previously at node k at time t - 1, and can be expressed as

p(S_t = j \mid S_{t-1} = k) =
\begin{cases}
\frac{1.6}{N_k + 2.0} & SI > -1,\ j = k & (6) \\
\frac{2N_k + 2.0}{(N_k + 2.0)^2} & SI = -1,\ j = k & (7) \\
\left( 0.8 - \frac{1.6}{N_k + 2.0} \right) / N_k & SI > -1,\ j \in \text{neighbours of } k & (8) \\
\left( 0.8 - \frac{2N_k + 2.0}{(N_k + 2.0)^2} \right) / N_k & SI = -1,\ j \in \text{neighbours of } k & (9)
\end{cases}

Here N_k represents the total number of neighbours of node k, which can be found using the topological map, and SI is the selected index (largest probability) from the prior posterior, p(S_{t-1} = k | I^{t-1}), which represents the actual state of the system in the previous time step: at an unexplored location if SI = -1, or at a previously visited location if SI > -1. The derivation of these equations is provided in Appendix A.

Fig. 5. Probability of p(S_t = j | S_{t-1} = -1) as the total number of nodes increases.
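Read as code, the state transition model is a small piece of branching arithmetic. The sketch below implements the bullet points above for a node list and adjacency information; `neighbours(k)` is an assumed input, and the reconstruction of Equations (7) and (9) follows the text as extracted.

```python
def transition_prob(j, k, t, e, SI, neighbours):
    """p(S_t = j | S_{t-1} = k) following the state transition model.

    j, k: -1 for 'new location', otherwise node indices in [0, t - e].
    SI:   selected index from the prior posterior (-1 = was at new place).
    neighbours: callable k -> list of neighbour node indices of node k.
    """
    if k == -1:
        p_new = 2.0 / (t - e + 2.0) if SI == -1 else 1.5 / (t - e + 1.5)
        if j == -1:
            return p_new
        return (1.0 - p_new) / (t - e + 1.0)          # any existing node j

    Nk = len(neighbours(k))
    if j == -1:
        return 1.0 - (Nk + 1.0) / (Nk + 2.0) if SI == -1 else 0.2
    if j == k:
        if SI > -1:
            return 1.6 / (Nk + 2.0)                   # Equation (6)
        return (2.0 * Nk + 2.0) / (Nk + 2.0) ** 2     # Equation (7)
    if j in neighbours(k):
        if SI > -1:
            return (0.8 - 1.6 / (Nk + 2.0)) / Nk      # Equation (8)
        return (0.8 - (2.0 * Nk + 2.0) / (Nk + 2.0) ** 2) / Nk  # Equation (9)
    return 0.0                                        # non-neighbouring nodes
```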

4.2.2. Likelihood-like function in a scoring scheme

The total time it takes the original likelihood voting scheme in Angeli et al. (2008) to converge, when applied directly to our system, is considerably long due to the way that images were captured intermittently. This issue becomes more significant as the total number of nodes in the topological map increases, since the probability of the mobile robot being at a previously visited location, given that it was at a new location in the previous time step, p(S_t = j | S_{t-1} = -1) (refer to the state transition model), becomes smaller (as shown in Figure 5, assuming that A = 2.0 (defined in Appendix A)). This implies that the mobile robot has to proceed with navigation and remain around the vicinity of a previously explored area for a longer period of time before it builds up its belief and transits from no loop closure to loop closure. As such, the necessary modifications were made, as detailed in the following.

Prior to the description of the likelihood-like function, it is important to revisit the image querying metric used to derive the score between a query and target image pair. As described at the start of Section 4, a matching score is produced between a query and target image pair, i, by searching for matching coefficients in each channel. Depending on the location of the matching coefficients, each match contributes a value equal to the weight associated with that location (different weights for different bins) to the final score. The weight for each bin in each channel is tuned using a logistic regression (logit) model on a set of independent training data, as proposed by Jacobs et al. (1995). In the logit model, a regression model of the following form is assumed:

r_i^* = v + \sum_{\ell} w_\ell q_{\ell,i} + u_i \quad (10)

where q_{\ell,i} is the sum of the terms of Equation (1) in bin \ell, v and w_\ell are the unknowns to be found, u_i is an error term to make the equality hold and r_i^* is called a 'latent variable', which is not observed directly. Observed instead is a dummy variable r_i defined by

r_i = \begin{cases} 1 & \text{if } r_i^* > 0, \\ 0 & \text{otherwise.} \end{cases} \quad (11)

In the set of independent training data, we set r_i = 1 if the query-target pair, i, should be a match and r_i = 0 when it is not a match. Under the assumptions of the logit model, the probability that the query-target pair is indeed a match is given by

p_i = \mathrm{Sigmoid}\left( v + \sum_{\ell} w_\ell q_{\ell,i} + u_i \right) \quad (12)

where

\mathrm{Sigmoid}(\alpha) = \frac{e^\alpha}{1 + e^\alpha}. \quad (13)

The weights, w, are estimated using a maximum likelihood estimation method. Since Sigmoid(α) is a monotonic function and v is a constant for all query-target pairs, it is sufficient to compute only \sum_{\ell} w_\ell q_{\ell,i} to provide a measure of how probable it is that the query-target pair, i, is indeed a match, based on the coefficients of the decomposed images. This is also the underlying expression used to arrive at the final image querying metric described in Equation (1).

The likelihood-like function, L*(S_t | I_t), is proposed to replace the true likelihood function, L(S_t | I_t). L*(S_t | I_t) is computed by finding a subset H_t ⊆ I^{t-e} of images whose scores, D_i (i ∈ [-1, t - e]), are smaller than the threshold computed using the mean of all scores, μ_a, minus its standard deviation, σ (smaller Haar scores representing a better match). The scores, D_i (i ∈ [0, t - e]), are calculated using Equation (1), whose weights are estimated based on the logit model as described previously. At the same time, the mean of all inliers, μ_in, which represents the average score of those larger than the threshold, is computed. Subsequently, L*(S_t | I_t) is expressed as

L^*(S_t \mid I_t) = \begin{cases} \frac{(D_i - \mu_{in})^B + \mu_{in}}{\mu_{in}} & D_i \leq \mu_a - \sigma, \\ 1.00 & \text{otherwise.} \end{cases} \quad (14)

Equation (14) amplifies the difference between the score of an image and the mean score of all inliers by varying the power factor B, if D_i ≤ μ_a - σ is true. The value of the power factor B is modelled according to the probability of the state transition, p(S_t = j | S_{t-1} = -1), whose value depends on the total number of nodes in the topological map. To estimate the power factor B, two parameters F and G are introduced, related as follows:

G = F^B \, p(S_t = j \mid S_{t-1} = -1). \quad (15)

The parameter F is chosen by the system developer to determine what is deemed a discriminative score (e.g. if D_i - μ_in ≥ F, then D_i is a good score) and the parameter G (also chosen by the system developer) in Equation (15) roughly determines the value by which each discrete location represented by the prior posterior distribution would be multiplied if D_i - μ_in = F. These two parameters will be discussed in more detail in Section 4.2.3. Given Equation (15), B can be expressed as

B = \frac{\log\left( G / p(S_t = j \mid S_{t-1} = -1) \right)}{\log(F)} - \mathrm{offset} \quad (16)

where offset is an offset applied to the power factor B when a loop closure was detected in the previous time step. If the detected loop closure was indeed a true event, there is a higher probability of new locations in the vicinity exhibiting a similar appearance. Owing to the previously detected loop closure, there is also a higher probability of the system matching to an existing node in the map (e.g. possibly a neighbour of the previously detected loop closure) given sufficient similarity in the location's visual appearance. As such, this offset ensures that false positives are reduced in such scenarios, and a discriminative matching score is required for the system to remain in the state of loop closure detected from the previous to the current time step. The offset is currently defined as

\mathrm{offset} = \begin{cases} 0.3 & j = -1 \text{ and } SI > -1, \\ 0 & \text{otherwise.} \end{cases} \quad (17)

In Equation (15), F defines what is deemed a discriminative score for a matching node, and the combination of F and G affects the convergence speed of this framework. In Angeli et al. (2008), a virtual image, I_{-1}, is created and maintained such that it is statistically more likely for this virtual image to match the incoming query image if the mobile robot is currently at an unexplored location. However, for our system, the score of this virtual image is calculated by subtracting from the mean, μ_a, H times the standard deviation, σ. As such, L*(S_t = -1 | I_t) is expressed as

L^*(S_t = -1 \mid I_t) = \mu_a - H\sigma. \quad (18)
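As a rough sketch, the scoring step of Equations (14)-(18) can be coded as below. The sign conventions in the extracted equations are ambiguous (Haar scores from Equation (1) are negative, with smaller values indicating better matches), so the sketch uses the magnitude of D_i - μ_in as the discriminative difference and applies the offset whenever a loop closure was detected in the previous time step, a simplification of Equation (17); treat it as an interpretation of the scheme rather than the authors' exact formula.

```python
import math

def likelihood_like_scores(scores, p_trans, F, G, H, prev_loop_closure):
    """Likelihood-like scores per Equations (14)-(18) (interpreted sketch).

    scores: Haar scores D_i from Equation (1) for nodes 0..t-e
            (smaller = better match).
    p_trans: p(S_t = j | S_{t-1} = -1) from the state transition model.
    Returns (virtual_image_score, list of L* values, one per node).
    """
    n = len(scores)
    mu_a = sum(scores) / n
    sigma = math.sqrt(sum((d - mu_a) ** 2 for d in scores) / n)
    threshold = mu_a - sigma

    # Inliers: scores above the threshold, i.e. the non-matching majority
    inliers = [d for d in scores if d > threshold]
    mu_in = sum(inliers) / len(inliers) if inliers else mu_a

    offset = 0.3 if prev_loop_closure else 0.0        # simplified Equation (17)
    B = math.log(G / p_trans) / math.log(F) - offset  # Equation (16)

    likelihoods = []
    for d in scores:
        if d <= threshold:                            # candidate match, Eq. (14)
            diff = abs(d - mu_in)                     # discriminative difference
            likelihoods.append((diff ** B + abs(mu_in)) / abs(mu_in))
        else:
            likelihoods.append(1.00)
    return mu_a - H * sigma, likelihoods              # Equation (18)
```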

Finally, the full posterior, p(S_t | I^t), is computed and subsequently normalized. Similar to the rank-based framework, the probabilistic framework ensures that the matching node is within the boundary of the circle centred at the current position of the mobile robot, with a radius defined by the variance measure, and that the total number of SURF correspondences is above the threshold TS, if the matching node corresponds to a previously visited location. The probabilistic framework is summarized in the flowchart illustrated in Figure 6.

Fig. 6. Flow chart of the probabilistic framework.

4.2.3. Testing the refined probabilistic framework

The following test aims to provide additional insight into how the probabilistic framework might perform on some simple synthetic data. The test is devised as follows:

1. Assume that there are t nodes in the topological map, where node t - 2 is connected to node t - 1 and node t - 1 is connected to node t in sequence.

2. Initialize the prior posterior, p(S_{t-1} = k | I^{t-1}), such that p(S_{t-1} = -1 | I^{t-1}) = 0.75, with the rest equally distributed.

3. Manually create a set of scores for each node (except for the virtual node), such that the score for node t is D_{i=t}, with the rest of the nodes having a score of -190. This can be conceived as a scenario where the mobile robot is stationary at the same location and continuously taking the same image.

4. Keep feeding this set of scores to the probabilistic framework until it converges or until some large number of iterations has been performed.

5. Repeat steps (1) to (4) with t = t + 1 until t = 100, with the initial value of t = 3.
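A minimal harness for this test, reusing the `bayes_filter_update`, `transition_prob` and `likelihood_like_scores` sketches from above (all assumed interfaces), might look like the following; the virtual-image likelihood is simplified to 1.0 here, whereas the paper scores it via Equation (18).

```python
def run_convergence_test(t, D_t, F, G, H, max_iters=500):
    """Feed a fixed score set to the filter until node t dominates.

    Steps (1)-(4) above: nodes 0..t are chained, node t scores D_t and
    all other nodes score -190, mimicking a stationary robot repeatedly
    imaging one place. Returns iterations to convergence, or None.
    """
    neighbours = lambda k: [n for n in (k - 1, k + 1) if 0 <= n <= t]
    states = list(range(-1, t + 1))
    prior = {s: 0.25 / (t + 1) for s in states}
    prior[-1] = 0.75                                  # step (2) initialization
    scores = [-190.0] * t + [D_t]                     # step (3) score set

    for it in range(1, max_iters + 1):
        SI = max(prior, key=prior.get)                # selected index
        p_trans = transition_prob(0, -1, t, 0, SI, neighbours)
        _, like = likelihood_like_scores(scores, p_trans, F, G, H,
                                         prev_loop_closure=(SI > -1))
        prior = bayes_filter_update(
            prior,
            lambda j, k: transition_prob(j, k, t, 0, SI, neighbours),
            lambda j: 1.0 if j == -1 else like[j])    # simplified virtual score
        if prior[t] > 0.5:                            # converged on node t
            return it
    return None

print(run_convergence_test(t=10, D_t=-200.0, F=5, G=15, H=1.5))
```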

This procedure was repeated with different values for the parameters F, G and H, and the test results are illustrated in Figures 7-9. Figure 7(a) clearly shows how the parameter F can be used to offer greater flexibility for the developer to determine what is deemed a discriminative matching score. For example, given F = 5, G = 15 and D_{i=t} = 200 (resulting in D_{i=t} - μ_in ≈ 10 as the total number of nodes increases), the system converged in two or three iterations for any number of nodes in the map. On the other hand, when F = 15 and the values of G and H remained unchanged, the system failed to converge when the total number of nodes in the map was less than 86. In reality, given that F = 15, G = 15, H = 1.5 and D_{i=t} - μ_in is approximately 10.0 consistently, the system is unlikely to converge even when the total number of nodes in the map is greater than 86, given the number of iterations it requires before converging and taking into account the effect of perceptual aliasing as the size of the database increases.

Fig. 7. Analyzing the effects of (a) the F parameter and (b) the G parameter (H fixed at 1.5).

Fig. 8. Convergence with different values of F ((a) F = 5, (b) F = 10, (c) F = 15) when G = 15 and H = 1.5.

The test allows several conclusions to be drawn:

1. The speed of convergence increases as the total number of nodes in the topological map increases. This conclusion sounded counter-intuitive at first. However, after taking into account the higher likelihood of perceptual aliasing as the map size increases, this was found to be a desirable trait.

2. The system will not converge, even if it is continuously fed with the same set of scores, if D_{i=t} - μ_in is deemed indiscriminative given the values of the parameters F, G and H. For example, when F = 15, G = 15, H = 1.5 and D_{i=t} = 200 (as illustrated in the test results), the system did not converge, since this was not a discriminative enough score for the system to believe it was indeed revisiting a location in the map. As discussed earlier, even when there are more than 86 nodes in the map, the system is unlikely to converge, due to the higher likelihood of perceptual aliasing.

3. The speed of convergence increases as the difference between D_{i=t} and μ_in becomes larger, given that the parameters F, G and H remain constant.

4. The speed at which the framework converges might differ in real experiments, depending on the path taken by the mobile robot and the severity of perceptual aliasing.

5. The parameters F, G and H can be varied appropriately to adapt to different image query metrics and to tune the speed of convergence of the framework for a particular system to provide reliable place recognition results.

Fig. 9. Convergence with different values of F ((a) F = 5, (b) F = 10, (c) F = 15) when G = 15 and H = 2.0.

4.3. The revisiting problem

Regardless of whether the rank-based or the probabilistic framework is chosen, the final output should either be one or more candidates qualifying as matching nodes, implying that the mobile robot is revisiting a previously explored location, or, where there are no qualifying candidates, a suggestion that it is at a new location. The rank-based framework can produce one or more candidates, whereas the probabilistic framework only provides one candidate. Since it is highly unlikely for the mobile robot to have exactly the same pose as the matching node, it is important to recover the relative metric information between the current position of the mobile robot and the location of the matching node. As described in Section 3.5, the method developed to recover the relative metric information was briefly outlined, with more details available in Lui and Jarvis (2010c).

Fig. 10. Flowchart resolving the revisiting problem.

Since there can be one or more candidates provided by the rank-based framework, the system selects the matching node with the minimum Euclidean distance to the current location of the mobile robot, based on the recovered relative metric information. In addition, the minimum Euclidean distance has to be below a threshold, TD. Similarly, this can be applied to the matching candidate provided by the probabilistic framework. The final selection of a matching candidate is summarized in the flowchart in Figure 10.
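The final selection step is a simple nearest-candidate filter. A sketch follows, under the assumption that each candidate carries the relative metric offset recovered by the RANSAC/Umeyama procedure of Section 3.5:

```python
import math

def select_matching_node(candidates, T_D=1.0):
    """Pick the loop closure candidate nearest to the robot (Section 4.3).

    candidates: list of (node, (dx, dy)) pairs, where (dx, dy) is the
    recovered relative metric offset between the robot and the node.
    Returns the accepted node, or None if no candidate is within T_D metres.
    """
    best, best_dist = None, float("inf")
    for node, (dx, dy) in candidates:
        dist = math.hypot(dx, dy)
        if dist < best_dist:
            best, best_dist = node, dist
    return best if best_dist < T_D else None
```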


Fig. 11. Example of the goal seeking mode: (a) topological map with initial location (green), goal location (red) and shortest path (light blue); (b) goal seeking scenario 1: node skipping (actual path taken); (c) goal seeking scenario 2: possible shorter path (actual path taken).

5. Semi-autonomous navigation

A semi-autonomous robot is one which cannot achieve its designated task completely without human guidance, nor operate in an uncontrolled environment, but illustrates considerable autonomy in achieving many of the sub-objectives required to complete the task. This section describes how our mobile robot performs goal seeking autonomously given a coarse map of the environment, where the position of the target destination is restricted to lie within the given map. It is semi-autonomous because a priori information in the form of a coarse map is provided to the system, where this map was built autonomously on a separate occasion by the system as the mobile robot was manually driven by a human operator. The support of a human operator in building a map, whilst not as impressive as autonomous map building, can be justified if the mobile robot is to operate for some time in this environment, since this approach is more reliable and simpler. Once this map is loaded into the system, the human operator specifies the initial and goal locations in the map and the mobile robot autonomously finds its way to the goal.

As the mobile robot loads the map into its system (with its initial and goal locations specified), it plans a path to the goal using the nodal propagation technique (Jarvis 1992). The path generated is the shortest path for our system, but this algorithm can accommodate a more comprehensive cost function which takes relevant factors into consideration (e.g. terrain traversability). Assume that the topological map loaded into the system is as shown in Figure 11(a). The information available in this topological map is identical to what has been described in Section 3.5. For this example, the initial location of the mobile robot is specified to be at node 1 and the goal is to reach node 9 autonomously. The system initially plans a path using the nodal propagation technique and yields the shortest path (based on the topological map) from node 1 to node 9 via nodes 2-3-4-5-6-7-8-9 in sequence. However, there are times when the mobile robot deviates from the planned sequence, which may be due to the reactive obstacle avoidance overriding the system (Section 3.3), ambiguities in the matching of image signatures, drift in the estimated position as a result of severe perceptual aliasing, or a combination of these. For example, in Figure 11(b), the mobile robot was supposed to navigate to node 3 but instead ended up at the node in orange. Nevertheless, this goal seeking mode is able to deal with deviations from the planned path and drives the mobile robot to the next best node of the current path. In this case, the current path computed using nodal propagation is 2-3-4-5-6-7-8-9 when the mobile robot is at the orange node. However, the system decides it should head for node 4 instead, based on the Euclidean distance of the mobile robot's current position to each node in the current path and the 2D local grid map generated at the current location.
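A sketch of this next-best-node recovery: after a deviation, the robot heads for the node of the remaining planned path that is most cheaply reachable, here scored by Euclidean distance alone (the real system also consults the 2D local grid map before committing).

```python
import math

def next_best_node(robot_pos, planned_path, node_positions, reached):
    """Choose the next waypoint after deviating from the planned path.

    planned_path: remaining node ids in visiting order, e.g. [2, 3, 4, ...].
    node_positions: dict node id -> (x, y) global coordinates.
    reached: set of ids already visited. Scores candidates by Euclidean
    distance only; the full system also checks grid-map traversability.
    """
    pending = [n for n in planned_path if n not in reached]
    return min(pending,
               key=lambda n: math.dist(robot_pos, node_positions[n]))
```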

Although the path planned using the nodal propagation technique is guaranteed to be the shortest, it is restricted to the routes along which the human operator drove the mobile robot during the map building process. Normally, this map would not be a dense topological map but a rather coarse one, implying that there may be better ways of reaching the target node. The system does not blindly follow the path suggested by the nodal propagation technique but is always on the lookout for shortcuts, identified using the 2D local grid map, which may bring the mobile robot to the target node in a shorter time. As demonstrated in Figure 11(c), the mobile robot might find that it can reach node 9 by taking a shortcut from node 2, deviating from the planned path and reaching node 9 with less time and effort. Of course, this requires the mobile robot to explore new areas, with the risk that the shortcut leads it to a dead end instead. Nevertheless, risky shortcuts can be avoided by taking into consideration the effective range of the sensor and the Euclidean distance to the goal if the mobile robot were to head off in this direction.
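A hedged illustration of such a guard: before committing to a shortcut heading, the robot could require that the free space observed in the local grid map along that heading, together with the sensor's effective range, still leaves a plausibly shorter route to the goal. The thresholds and helper names below are hypothetical, chosen only to make the idea concrete.

```python
import math

SENSOR_RANGE = 8.0  # assumed effective range of the range sensor (m)

def shortcut_is_worth_taking(robot_xy, goal_xy, free_distance_along_heading,
                             planned_path_length):
    """Accept a shortcut only if it is visibly clear and plausibly shorter.

    `free_distance_along_heading` is the extent of traversable cells seen
    in the 2D local grid map in the shortcut direction (capped by the
    sensor range); `planned_path_length` is the metric length of the
    current nodal-propagation path.
    """
    straight_line = math.hypot(goal_xy[0] - robot_xy[0],
                               goal_xy[1] - robot_xy[1])
    # Reject headings whose observed clearance covers only a small part
    # of the distance still to be travelled: beyond the sensed free
    # space the terrain is unknown and may end in a dead end.
    if free_distance_along_heading < min(SENSOR_RANGE, 0.5 * straight_line):
        return False
    # Only deviate when the straight-line distance promises real savings.
    return straight_line < 0.8 * planned_path_length
```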

6. Experimental results

The mobile robot was tested in both semi-outdoor and outdoor environments in the fully autonomous mode. It was also shown to work in an indoor environment in Lui and Jarvis (2010c) using the rank-based framework without an active loop closure detection and validation system. To illustrate the system operating in an easier setting, the indoor experiment will be presented later in this section. In this case, the mobile robot was able to return to a previously visited location due to the way the environment was structured for the experiment. In this paper, experiments were conducted in the fully autonomous mode with the complete loop closing system that includes the active loop closure detection and validation system described in Section 3.4. Both the autonomous and semi-autonomous experiments were performed using the rank-based framework. It was chosen over the probabilistic framework for the following reasons:

1. The rank-based framework was anticipated to perform worse in terms of the total number of false-positive detections, since a fixed threshold was used. As such, online experiments conducted using this framework can demonstrate the robustness of the system as a whole.

2. In order to compare the two frameworks, the same dataset has to be used for consistency. This implies that the experiments have to be conducted using either one of the two frameworks.

The following defines all relevant system parameters for the autonomous and semi-autonomous experiments:

1. TD = 1 m: Euclidean distance threshold for accepting a matching candidate as a valid loop closure event (Section 4.3).

2. TR = −200: Haar similarity score threshold for selecting matching candidates for the rank-based framework (Section 4.1).

3. The parameter TS described in Section 4.1 is set at 19. However, this classification is further extended due to the use of the total number of SURF correspondences as an indicator for measuring the certainty of the loop closure event in the complete loop closing system described in Section 3.4. Matching candidates which produce a number of SURF correspondences in the range 0–19 are rejected, those with 20–35 require more evidence, and those having more than 35 correspondences are directly accepted as candidates without the need to gather new evidence, provided that these candidates pass the Euclidean distance test in Figure 10.

4. The active loop closure validation strategy is iterated a maximum of three times, where n ∈ (1, 3) (described in Section 3.4, with more details available in Lui and Jarvis (2010a)).

5. Size of Haar image signatures: 56 × 14.

6. Resolution of omnidirectional images is 640 × 480 (not unwarped).

7. Stereo correlation mask size is 31 × 31, maximum disparity search range of 64 pixels, disparity step size of 0.5 pixels, with a baseline fixed at 30 cm unless stated otherwise.

8. The set of weights, w, for the image querying metric described in Equation (1), determined using a set of independent training images via the logistic regression model, is defined in Table 1 (a sketch of evaluating such a metric follows the table).

Table 1. Haar wavelet weights.

        Channel Y   Channel I   Channel Q
Bin 0   −30.9657    −9.1906     −144.8866
Bin 1   0.6369      0.4206      0.8616
Bin 2   0.9599      0.8220      1.2840
Bin 3   0.7929      0.4761      0.8645
Bin 4   0.6542      0.5790      1.0526
Bin 5   0.08416     0.0991      0.1687
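Equation (1) itself appears earlier in the paper; purely as a hedged illustration of how a weighted Haar-signature comparison of this kind can be evaluated, the sketch below follows the general form of the fast multiresolution image querying metric of Jacobs et al. (1995), using the Table 1 weights. The signature layout, the dict-based interface and the bin grouping are our assumptions, not the paper's exact Equation (1).

```python
import numpy as np

# Learned weights from Table 1 (rows: bins 0-5; columns: Y, I, Q).
W = np.array([[-30.9657,  -9.1906, -144.8866],
              [  0.6369,   0.4206,    0.8616],
              [  0.9599,   0.8220,    1.2840],
              [  0.7929,   0.4761,    0.8645],
              [  0.6542,   0.5790,    1.0526],
              [  0.08416,  0.0991,    0.1687]])

def bin_index(i, j):
    """Group a coefficient position into one of 6 bins (Jacobs et al. 1995)."""
    return min(max(i, j), 5)

def haar_score(query, target):
    """Jacobs-style weighted comparison of two truncated Haar signatures.

    `query` and `target` are dicts per channel, each holding a scalar
    DC term ("dc") and a sparse {(i, j): sign} map of the largest
    truncated coefficients ("coeffs"). Lower scores mean more similar.
    """
    score = 0.0
    for c, channel in enumerate(("Y", "I", "Q")):
        q, t = query[channel], target[channel]
        score += W[0, c] * abs(q["dc"] - t["dc"])
        for (i, j), sign in q["coeffs"].items():
            if t["coeffs"].get((i, j)) == sign:
                score -= W[bin_index(i, j), c]  # reward matching coefficients
    return score
```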

Fig. 12. Plan views of stitched laser scans. (a) Semi-outdoor environment 1 (grid resolution: 60 cm × 60 cm ± 5 cm in each dimension). (b) Outdoor environment.

The resulting topological map for each experiment was compared with the ground truth. Using the plan views of the experimental area created from stitched laser scans taken by a Riegl LMS Z420i terrestrial laser scanner, the ground truth for each node in the topological map was recovered by utilizing the omnidirectional images captured during the experiments and exploiting the grid pattern on the floor (for the semi-outdoor environment) or distinctive landmarks (for the outdoor environment) to localize the mobile robot in the environment. The plan views of these maps created by the terrestrial laser scanner are shown in Figure 12. For the indoor experiment, no plan view of the environment is available and a crude ground truth trajectory was approximated (measured manually by hand) by estimating the location of the mobile robot in the environment from the captured video and images.

A number of experiments were conducted, but only selected experiments are presented in the following sections. These experiments were categorized according to the degree of lighting variation and the flatness of the terrain in the environment. The following are the three categories:

• Indoor environment—least lighting variation and terrain undulation (least violation of the flat world assumption).

• Semi-outdoor environment—moderate lighting variation and terrain undulation.

• Outdoor environment—out of the three environments, an outdoor environment experiences the most lighting variation and terrain undulation.

All experiments were conducted in a static or partially static environment, where in some cases the mobile robot was operating while people were moving in the environment, with plants in the vicinity swaying back and forth due to strong winds.

6.1. Autonomous indoor experiment

The indoor environment and experiment are shown in Figures 13 and 14. The mobile robot started to explore the environment at t = 0. At t = 12, it returned close to its original starting location (a loop closing location), with some drift noticeable in its estimated location. At t = 13, it rotated and travelled in the opposite direction due to a forward obstruction perceived by the forward-facing Bumblebee stereovision system (the pathway is rather narrow). This caused it to close a loop at node 11 in the topological map. Nevertheless, it managed to turn back and close the main loop (node 0 of the topological map) at t = 16. It continued to roam around the environment, detecting multiple loop closures on its way from t = 17 to t = 21. A total of eight loop closures took place in this experiment and none were false positives. The rough ground truth trajectory approximated by hand is shown in Figure 15. A subset of the matching omnidirectional images which raised loop closure events is provided in Figure 16. A video of this experiment is available at http://www.youtube.com/watch?v=z077PZpPjnI

Fig. 13. Indoor environment.

6.2. Autonomous semi-outdoor experiments

As described previously, a semi-outdoor environment is characterized as having moderate lighting variation and terrain undulation. Parts of the semi-outdoor environment illustrated in Figure 17 were partially covered by a roof with semi-opaque stripes, and other parts were under a ceiling with fluorescent lamps that automatically switch on and off at different times of the day or when it is too dark. The terrain is also not as flat as that of the indoor environment.

6.2.1. Semi-outdoor experiment 1 The semi-outdoor experiment is illustrated in Figure 18. The mobile robot started to explore the environment at t = 0. At t = 4, the mobile robot detected a loop closure at node 2 of the topological map, which was verified to be valid based on ground truth (matching node 5 to node 3 according to the node numbers of the ground truth trajectory in Figure 19). Subsequently, at t = 5, it detected a loop closure event matching the current location of the mobile robot to node 3 of the topological map (matching node 6 to node 4 of the ground truth trajectory). The mobile robot tried to validate this loop closure event but was unsuccessful, prompting system restoration at t = 8. Observe the position of node 4 in the topological map as compared to its ground truth position (node 6 of the ground truth trajectory). The mobile robot lost track of its heading for a while due to perceptual aliasing but eventually recovered as it closed the loop at t = 14. It proceeded with exploration and closed another smaller loop at t = 22. At t = 25, it created node 20 in the topological map (node 26 of the ground truth trajectory). The position of node 20 was found to be incorrect when compared with ground truth, but the mobile robot recovered from the localization error at t = 26 when it revisited node 14 in the topological map. Finally, it proceeded with exploration and closed a bigger loop at t = 32.

The scale and rotationally corrected topological map was superimposed onto the plan view of the laser map, with ground truth for comparison, as shown in Figure 19. A subset of the matching omnidirectional image pairs which raised loop closure events is provided in Figure 20. The following node pairs (node numbers based on the ground truth trajectory) raised loop closure events in this experiment: 5–3, 15–3, 23–19, 24–18, 25–17, 27–17 and 33–1. Note that this does not include false loop closure events that were subsequently restored and rectified. A video of this experiment is available at http://www.youtube.com/watch?v=YXzYGkdjLiE


Fig. 14. Indoor experiment. (a) t = 1. (b) t = 10. (c) t = 12. (d) t = 13. (e) t = 15. (f) t = 16. (g) t = 17. (h) t = 21.

Fig. 15. Indoor experiment: ground truth comparison.


6.2.2. Semi-outdoor experiment 2 Although conducted in the same semi-outdoor environment, this experiment is slightly different and interesting due to the low lighting conditions. As illustrated in Figure 21, the mobile robot again clearly switched from the pure exploration mode to the exploration and loop closing mode, detecting its first loop closure at t = 11.

Note that the blurring in the omnidirectional image taken at t = 11 (Figure 22) did not deter it from matching to the correct node. This blurring was the result of the low lighting conditions, which caused the camera to be out of focus when the image was captured. The mobile robot accumulated positional errors due to the low lighting conditions but managed to partially recover with the loop closures detected at t = 24 and t = 26. Comparing the resulting topological map with ground truth, it can be clearly seen that the low lighting conditions caused a severe drift in orientation for the loop on the left.

Fig. 16. Matching omnidirectional image pairs for the indoor experiment. (a) t = 16. (b) t = 21.

Fig. 17. Semi-outdoor environment.



Fig. 18. Semi-outdoor experiment 1. (a) t = 2. (b) t = 3. (c) t = 4. (d) t = 5. (e) t = 6. (f) t = 7. (g) t = 8 (recovery). (h)–(j) Recovery. (k) t = 13. (l) t = 14. (m) t = 21. (n) t = 22. (o) t = 25. (p) t = 26. (q) t = 31. (r) t = 32.

Nevertheless, the shape of the mobile robot's perceived trajectory is very similar to the ground truth, and the positions of nodes in the map can be made more accurate by revisiting. The scale and rotationally corrected topological map was superimposed onto the plan view of the map created using stitched laser scans, with ground truth for comparison, as seen in Figure 23. A subset of the matching omnidirectional image pairs which raised loop closure events is provided in Figure 22. The following node pairs (node numbers based on the ground truth trajectory) raised loop closure events in this experiment: 12–2, 25–15 and 27–1. A video of this experiment is available at http://www.youtube.com/watch?v=KLuramqN-2Q

6.3. Autonomous outdoor experiment

This experiment was conducted in the outdoor environment illustrated in Figures 24 and 25. This area is not covered and contains an abundance of natural features (trees, plants, etc.). The terrain is an unpaved road surface which becomes soft during wet seasons. The pathway is relatively narrow compared with the previous environments, but wide enough for the mobile robot to traverse. The robot started exploring at t = 0, turned back at t = 7 and detected loop closures at t = 8 and t = 13. This experiment showed that the appearance-based localization and mapping system could work in an outdoor environment. Unfortunately, the mobile robot platform was not designed to operate on such soft terrain. As a result, there were numerous times when the mobile robot's casters were stuck in the soil, requiring some human assistance to overcome the resistance of the soil built up on the sides of the casters as the mobile robot tried to rotate.


Fig. 19. Semi-outdoor experiment 1: ground truth comparison.


The scale and rotationally corrected topological map was superimposed onto the plan view of the map created using stitched laser scans, with ground truth for comparison, in Figure 26. A subset of the matching omnidirectional image pairs which raised loop closure events is provided in Figure 27. The following node pairs (node numbers based on the ground truth trajectory) raised loop closure events in this experiment: 9–5 and 11–2. A video of this experiment is available at http://www.youtube.com/watch?v=LaFFzRYEWGQ

6.4. Semi-autonomous experiment

The following experiment was conducted in the semi-autonomous navigation mode described in Section 5. As described previously, a priori knowledge in the form of a coarse topological map, built autonomously on a separate occasion as the mobile robot was manually driven in the environment, was provided to the system. Similar to the fully autonomous experiments, the semi-autonomous navigation mode employs the rank-based framework using the relevant parameters specified for the autonomous experiments. The offline map (built while manually driving in the semi-outdoor environment in Figure 17) provided to the mobile robot in these experiments is illustrated in Figure 28.

Fig. 20. Matching omnidirectional image pairs for semi-outdoor experiment 1. (a) t = 4. (b) t = 32.

Fig. 21. Semi-outdoor experiment 2. (a) t = 10. (b) t = 11. (c) t = 23. (d) t = 24. (e) t = 25. (f) t = 26.

In this experiment, the initial position of the mobile robot was defined by the user as node 16 and its goal was to reach node 33 in the topological map. Once the map was loaded into the system, with the initial and goal locations defined, the nodal propagation technique was used to plan a path to the goal based on the topological map (yellow nodes at t = 0 in Figure 29). This planned path is basically a rough plan for the mobile robot to reach its target destination based on previously visited locations. The mobile robot can deviate from this planned path depending on whether it thinks there is a viable shortcut or whether it has been temporarily overridden by the reactive obstacle avoidance system. In this experiment, the mobile robot navigated according to the planned path at t = 1 and t = 2. However, at t = 2, it failed to detect the loop closure at node 18. It deviated from the planned path and detected a loop closure at node 26. The new planned path based on the mobile robot's current position is also illustrated at t = 2. Subsequently, it followed its planned path until it decided to take a detour at t = 6 and detected a loop closure at t = 8, which required further validation. In the semi-autonomous mode, the mobile robot is not equipped with the active loop closing validation strategy, since it is regularly trying to visit previously seen nodes when it is not taking a shortcut. The loop closure detected at t = 8 was later invalidated at t = 11 and system restoration was performed. For the case when it takes a detour, it should not be detecting loop closures anyway, since it is in unexplored areas. It reached its target destination at t = 13. A video of this experiment is available at http://www.youtube.com/watch?v=UU_DwkmowFo


Fig. 22. Matching omnidirectional image pairs for semi-outdoor experiment 2. (a) t = 11. (b) t = 26.

Fig. 23. Semi-outdoor experiment 2: ground truth comparison.

Fig. 24. Outdoor environment.


Fig. 25. Outdoor experiment. (a) t = 1. (b) t = 3. (c) t = 6. (d) t = 7. (e) t = 8. (f) t = 9. (g) t = 12. (h) t = 13.

Fig. 26. Outdoor experiment: ground truth comparison.

Fig. 27. Matching omnidirectional image pairs for the outdoor experiment. (a) t = 8. (b) t = 13.



Table 2. Performance of rank-based and probabilistic frameworks (G = 15, H = 2).

Framework              Ground truth LC   Total   Type I(a) errors   Type I(b) errors   Type II(b) errors   Type II(a) errors
Rank (TR = −190)       133               169     43 (25.4%)         28 (16.5%)         2 (1.6%)            7 (5.3%)
Rank (TR = −200)       133               134     19 (14.2%)         10 (7.5%)          2 (1.7%)            18 (13.5%)
Rank (TR = −210)       133               101     6 (5.9%)           5 (4.9%)           0 (0%)              38 (28.6%)
Probability (F = 5)    133               104     8 (7.7%)           4 (3.8%)           1 (1.04%)           37 (27.8%)
Probability (F = 10)   133               90      7 (7.7%)           3 (3.3%)           0 (0%)              50 (37.6%)
Probability (F = 15)   133               68      3 (4.4%)           1 (1.47%)          0 (0%)              68 (51.1%)

Fig. 28. Offline map (ground truth trajectory shown with white nodes).


6.5. Comparison of the rank-based and probabilistic frameworks

A total of 1 indoor experiment, 10 semi-outdoor experiments, 1 outdoor experiment, 3 semi-autonomous experiments and 2 offline experiments were conducted (with only selected experiments presented in this paper). The experiments conducted were of mixed difficulty, with the one conducted in the indoor environment being the easiest relative to the other environments. As a result, no false positives were reported there and the system managed to close the loop even when it was not equipped with the active loop closure detection and validation system. The indoor experiment is therefore less informative for comparison purposes, which explains the need for more challenging experiments.

The two frameworks were then evaluated, using real experimental data, based on the number of loop closure hypotheses that are true according to ground truth and on the total number of Type I and II errors, defined as:

1. Type I(a) error—incorrectly accepting a loop closure hypothesis, when in fact it should be rejected (false positive).

2. Type I(b) error—incorrectly accepting a loop closure hypothesis, when in fact it should be rejected after validation (false positive after validation).

3. Type II(a) error—incorrectly rejecting a loop closure hypothesis, when in fact it should be accepted (false negative), causing the robot to miss a loop closing opportunity.

4. Type II(b) error—incorrectly rejecting a loop closure hypothesis after validation, when in fact it should remain accepted (false negative after validation), causing the robot to invalidate a good loop closure.

The probabilistic framework was evaluated on the same dataset using three different values for the parameter F (5, 10 and 15), with G = 15 and H = 2, where the parameters F, G and H are thoroughly described in Section 4.2. In addition, the performance of the rank-based framework was evaluated using different threshold values (TR = −190, −200 and −210), with TR = −200 being the original threshold used in the experiments. Based on the ground truth locations of each image in the dataset, the actual number of loop closures that should have been detected by the system is 133. This number was derived from the mobile robot's location with respect to previously seen locations (existing locations in the topological map): if its current location was within 0.5 m of a previously seen location according to ground truth, then a loop closure should be detected, matching the current location of the mobile robot to this existing node in the map.
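As an illustration of how such a ground truth count can be computed, the sketch below applies the 0.5 m proximity rule to a sequence of ground truth poses; the data layout and the 1 m node-spacing rule (mentioned in Section 7 as the current key frame criterion) are stand-ins, not the authors' evaluation code.

```python
import math

GT_RADIUS = 0.5  # proximity (m) at which a loop closure should be detected

def count_ground_truth_loop_closures(poses, node_spacing=1.0):
    """Count poses that fall within GT_RADIUS of an earlier map node.

    `poses` is the ground truth trajectory as (x, y) tuples; a new map
    node is assumed to be created whenever the robot has moved at least
    `node_spacing` metres from the previous node.
    """
    nodes, closures = [], 0
    for x, y in poses:
        near = [n for n in nodes
                if math.hypot(x - n[0], y - n[1]) < GT_RADIUS]
        if near:
            closures += 1  # current location matches an existing node
        elif not nodes or math.hypot(x - nodes[-1][0],
                                     y - nodes[-1][1]) >= node_spacing:
            nodes.append((x, y))  # otherwise a new node is created here
    return closures
```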

The results in Table 2 show that the threshold selected to run the experiments using the rank-based framework (TR = −200) has the best balance between the different error types in comparison to the other thresholds (TR = −190, −210). A lower threshold (TR = −190) increases Type I(a) errors but reduces Type II(a) errors, since it is easier to produce a match with a more lenient threshold. On the other hand, a higher threshold (TR = −210) reduces Type I(a) errors but increases Type II(a) errors. In comparison with the probabilistic framework, the results in Table 2 show that the rank-based framework with TR = −210 seems to perform equally to the probabilistic framework with F = 5. Although the total errors in each respective category are highly similar between the two, we wish to highlight that the probabilistic framework with F = 5 will outperform the rank-based framework with TR = −210 for the following reasons:

• The rank-based framework, which uses a fixed threshold, will consistently miss a loop closure event (a Type II(a) error) even if the mobile robot continues to roam around the vicinity of a previously visited area, if the selected threshold is too high. It is also less likely for the system to detect a loop closure event at neighbours of the target loop closure node, since the selected threshold was too high for the system to detect the initially deliberated loop closure in the first place (neighbours would be expected to exhibit similar visual appearance).

• The number of Type II(a) errors reported by the probabilistic framework may be the same as that of the rank-based framework on this dataset, but the final map quality differs, since the probabilistic framework is more likely to converge given more time to roam around the vicinity of the target loop closure location.

Therefore, it is only fair to compare the performance of the probabilistic framework against the rank-based framework using a selected threshold that is capable of producing decent maps (in this case TR = −200). Referring to Table 2, the rank-based framework with TR = −200 missed only 13.5% of all loop closures that it was supposed to detect (133 across all experiments). Despite reporting fewer Type II(a) errors, it produced more Type I(a) and Type I(b) errors than the probabilistic framework. Type I errors are considered worse than Type II errors, since an incorrectly accepted loop closure damages the map building and localization process.

The severity of Type II errors greatly differs depending on whether they arise in the context of the probabilistic or the rank-based framework. As discussed previously, they are less severe for the former and more severe for the latter, since the latter is more likely to be required to travel a greater distance in search of a loop closure. In general, Type II errors were more tolerable than Type I errors as long as the mobile robot managed to detect loop closures before the positional drift in its localization became too unreliable for it to make well-informed decisions. Without the active validation strategy, the rank-based framework (TR = −200) produced a total of 19 Type I(a) errors, which is equivalent to 14.2% of all true loop closures to be detected according to ground truth (134 − 19 = 115 for the rank-based framework). With the complete loop closing system described in Section 3.4, the system managed to reduce the total number of Type I(a) errors to 10 after validation; these remaining errors were categorized as Type I(b) errors. In fact, 60% of the remaining errors were close matches, since the matching node was an immediate neighbour of the true node according to ground truth. Given the system's ability to recover local metric information relating its current location to the matching node, these close matches would cause minimal damage to the map building and localization process. Unfortunately, the validation process incorrectly invalidated 1.7% of the good loop closures detected, which were classified as Type II(b) errors. As mentioned earlier, Type II errors are in general more tolerable than Type I errors.

In this comparison, the probabilistic framework was demonstrated to produce far fewer Type I(a) errors than the rank-based framework with TR = −200, which implies that the active validation strategy was needed less often. With the active validation strategy, it further reduced the total number of Type I(a) errors by approximately 50%, with the remaining unrectified loop closures classified as Type I(b) errors. Unfortunately, the reduction in Type I errors comes at the price of more Type II(a) errors, which implies that more effort is required for the mobile robot to detect a loop closure. The tradeoff is essentially between an increase in the total time required to detect loop closures for the probabilistic framework and an increase in total operational time spent on loop closure validation for the rank-based framework. However, the probabilistic framework is still considered superior to the rank-based framework, since it does not impose a fixed threshold to discriminate between possible previously seen and new locations and is arguably more general and suitable for all types of environment (although the fixed threshold used in the rank-based framework with TR = −200 has been shown to work in indoor, semi-outdoor and outdoor environments). For each experiment conducted in the fully autonomous mode, the final topological map was compared with ground truth. The topological map built by the mobile robot was shown to be coherent, although not highly precise when compared with ground truth. This reinforces the fact that autonomous navigation in unknown environments does not necessarily require the utmost precision, making the system suitable for many applications that do not require the most accurate map possible. Nevertheless, these maps can be further improved given sufficient time for the mobile robot to make more observations of ambiguous locations in the topological map.

7. Discussion

The mobile robot was tested in various environments operating in the fully autonomous mode, using the rank-based framework. These experiments clearly illustrated the capability of the mobile robot to perform autonomous navigation in unknown environments. A topological map, which maintains the spatial relationships between its nodes, was built by the mobile robot as it navigated. Using the active loop closing strategy, the mobile robot was able to revisit nodes in the environment in order to maintain the global consistency of the map and to recover from positional drift in its localization via loop closing. The complete loop closing system, which incorporates active loop closure validation, has also been shown to be effective in reducing the number of Type I(a) errors (defined in Section 6.5) for both the rank-based and probabilistic frameworks.


Fig. 29. Semi-autonomous experiment. (a) Map loaded. (b) t = 0. (c) t = 1. (d) t = 2. (e) t = 3. (f) t = 4. (g) t = 5. (h) t = 6. (i) t = 7. (j) t = 8. (k) t = 9. (l) t = 10. (m) t = 11 (recovery). (n)–(p) Recovery. (q) t = 12. (r) t = 13.



Although multiple experiments were conducted, only a subset was presented here (selected to provide a good overview of the system's performance in different environments and conditions, with additional experiments available in Appendix B). Semi-outdoor experiment 1 demonstrated the capability of the mobile robot to actively search for and detect loop closures, validate loop closures, perform system restoration and switch from the pure exploration mode to the exploration and loop closing mode while performing topological SLAM. Semi-outdoor experiment 2 was interesting due to the low lighting conditions, where the mobile robot demonstrated its robustness by matching to the correct node even though the query image was out of focus. The mobile robot was further challenged to operate in an outdoor environment, as illustrated in the outdoor experiment. The outdoor experiment was considered successful, although there were multiple times when the mobile robot required human assistance to overcome the resistance of the soil buildup next to its casters. This is more of a hardware issue and could be resolved by using a suitable outdoor mobile robot platform. In addition, the semi-autonomous experiment, which utilizes an offline map, showed that the mobile robot was capable of finding its way to a target goal location even though the topological map provided may not be highly accurate.

Although not presented in this paper, the mobile robot was also tested with the automatic baseline selection technique developed by Lui and Jarvis (2010b). The entire operation of the system remained the same except that the robot automatically determined its baseline before it commenced navigation and readjusted its baseline after every 15 m of distance travelled. This was tested in the semi-outdoor environment illustrated in Figure 17. The automatically selected baseline was consistent with the results presented in Lui and Jarvis (2010b), with the baseline automatically selected at 30 cm. Since all of the autonomous experiments, including those in the same environment, were conducted using a fixed baseline of 30 cm, the experimental results that include the use of the automatic baseline selection technique were omitted.

To truly illustrate the benefit of using the automatic baseline selection technique in a fully autonomous experiment, the mobile robot would be required to navigate in a large and diverse environment containing both cluttered and open spaces. For example, the mobile robot could navigate from a cluttered indoor corridor to an open semi-outdoor/outdoor environment, which might require it to engage automatic doors, handle unlevelled ground, recognize man-made pathways (to confine it to travel only along man-made pathways in wide open areas), and so on. In addition, the mobile robot should also decide when to execute the automatic baseline selection procedure by detecting changes in the environment (e.g. from cluttered to open spaces or vice versa) instead of performing this procedure after every 15 m of distance travelled. A video showing the automatic baseline technique operating in a semi-outdoor environment is available at http://www.youtube.com/watch?v=KxAsp8a3KlQ

The rank-based framework was compared with the probabilistic framework on the same dataset collected using the rank-based framework, revealing the pros and cons of both. The probabilistic framework was concluded to be superior to the rank-based framework, although it might require more effort to detect loop closures, depending on the parameters. It produces fewer Type I errors, which are in general considered riskier than Type II errors, and it compensates for the longer time required to detect loop closures by having to perform fewer validations. In addition, the probabilistic framework was concluded to be a more attractive and elegant framework to utilize because it eliminates the requirement of a fixed threshold, can be conveniently adapted to other systems operating on different appearance models and image similarity metrics, and suits systems requiring different convergence speeds.

In comparison to the current state-of-the-art system, FAB-MAP (Cummins and Newman 2008), the framework proposed in this article is not as probabilistically accurate. Nevertheless, our proposed framework has the potential to scale up reasonably well. The use of Haar coefficients as the appearance model has been demonstrated to work reliably in an outdoor environment in Lui and Jarvis (2010c) with a database size of approximately 9,500 images, and Ho and Jarvis (2008) demonstrated that this appearance model works reliably in a semi-outdoor environment with a database size of 150,000 images. An additional attractive property of the proposed appearance model is its small memory requirement. The robustness of the place recognition system can be further improved by periodically re-optimizing the weights, w, described in Section 4.2.2 as the system captures new images. Nevertheless, more experiments have to be conducted to test the scalability of the fully autonomous system. The initial independent training dataset required to estimate the weights is considerably small (very sparse locations), and the total time spent on estimating the weights is in the order of seconds. In comparison, FAB-MAP requires a longer initial training time due to the large number of detected feature descriptors that must be clustered into distinct visual words. Of course, FAB-MAP still works reasonably well with standard dictionaries; similarly, our system also works well with standard weights. Lastly, the proposed framework is simpler from both an implementation and a conceptual point of view, with the downside of being probabilistically less accurate.

There are many possible extensions to the current system. An interesting consideration would be to find locations containing unique and viewpoint invariant landmarks at which to perform loop closing. The ease and accuracy with which an appearance-based system can relocalize itself to such a unique and viewpoint invariant landmark are expected to improve, making the overall system more robust.



In addition, the notion of 'k-reachability' and 'k-connectivity' sets in nodal graphs might be a beneficial factor to consider when deciding the target loop closure node (a simple version of such a selection rule is sketched below). For example, a more connected node does not only imply a higher likelihood of being able to detect loop closures (e.g. due to a better position estimate) but also provides more options for loop closure validation due to its denser connectivity (since loop closure validation is based on local context consistency). Moreover, as a node becomes more connected, it becomes more crucial to ensure that the loop closure event is valid, since committing to an incorrect loop closure at a densely connected node is expected to inflict greater damage on the map.
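The following is a minimal sketch of how node connectivity could inform target selection, under our assumption (not the authors') that candidates are scored by a weighted combination of graph degree and travel cost; the weighting is illustrative only.

```python
def pick_loop_closure_target(adjacency, distances, alpha=0.5):
    """Score candidate target nodes by connectivity and proximity.

    `adjacency` maps node -> list of neighbouring nodes, and `distances`
    maps node -> estimated travel distance from the robot. A higher
    degree offers more validation options (local context consistency),
    while a shorter distance makes the visit cheaper.
    """
    def score(node):
        degree = len(adjacency[node])
        return alpha * degree - (1 - alpha) * distances[node]
    return max(distances, key=score)
```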

Another possible improvement is the development of a proper key frame detection system for the place recognition system, which would detect unique key frames in a live video stream based on some measure of the image similarity between the current input image and a reference image (a simple version is sketched below). This should replace our current method, which relies on the distance from the previous node (e.g. ≥ 1 m). Appearance-based approaches that determine key frames by detecting sufficient change in image content have been studied (Sivic and Zisserman 2003; Nourani-Vatani and Pradelier 2010; Zhang et al. 2010), but not extensively. Unfortunately, the most difficult problem in the development of a key frame detection system is devising a proper framework for evaluating the performance of different appearance-based approaches, since there is no clear justification of the correct number of key frames and where they should be for a given environment (ground truth). Finally, in addition to balancing the mobile robot's efforts between loop closing and exploration, it is definitely worthwhile considering ways to include goal seeking in the equation.
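As a hedged sketch of the alternative discussed here, a key frame detector might declare a new key frame whenever the similarity to the last key frame falls below a threshold; the similarity function and threshold value are placeholders, not a validated design.

```python
def detect_key_frames(frames, similarity, threshold=0.6):
    """Select key frames by sufficient change in image content.

    `similarity(a, b)` returns a score in [0, 1]; a frame becomes a new
    key frame once its similarity to the previous key frame drops below
    `threshold`, replacing a fixed travel-distance rule (e.g. >= 1 m).
    """
    if not frames:
        return []
    key_frames = [frames[0]]
    for frame in frames[1:]:
        if similarity(key_frames[-1], frame) < threshold:
            key_frames.append(frame)
    return key_frames
```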

8. Conclusion

The system described in this paper demonstrates a feasible vision-based solution for a fully autonomous mobile robot navigating in initially unknown environments. This was achieved through the research and development of a variable single/multiple baseline omnidirectional stereovision system, a visual odometry system and a place recognition system, integrated into a complete autonomous navigation system that performs map building and localization concurrently. We validated the system in the fully autonomous mode in indoor, semi-outdoor and outdoor environments, demonstrated the system in a goal-oriented semi-autonomous mode, thoroughly described the refined probabilistic framework and compared the performance of the rank-based and probabilistic frameworks for the Haar-based place recognition system.

Funding

This work was supported by the School of Engineering, Department of Electrical and Computer Systems Engineering and the Intelligent Robotics Research Centre at Monash University.

Acknowledgments

Many thanks to Nghia Ho for providing plan views of the stitched laser scans for use in this work.

References

Angeli A, Filliat D, Doncieux S and Meyer J-A (2008) A fast and incremental method for loop-closure detection using bags of visual words. IEEE Transactions on Robotics 24: 1027–1037.

Bailey T and Durrant-Whyte H (2006) Simultaneous localisation and mapping (SLAM): Part II state of the art. IEEE Robotics and Automation Magazine 13: 108–117.

Bay H, Tuytelaars T and Gool LV (2006) SURF: Speeded up robust features. In Proceedings of the Ninth European Conference on Computer Vision.

Beevers KR and Huang WH (2005) Loop closing in topological maps. In IEEE International Conference on Robotics and Automation, Barcelona, Spain, pp. 4368–4372.

Chahl J and Srinivasan M (1997) Reflective surfaces for panoramic imaging. Applied Optics 36: 8275–8285.

Choset H and Nagatani K (2001) Topological simultaneous localization and mapping (SLAM): toward exact localization without explicit localization. IEEE Transactions on Robotics and Automation 17: 125–137.

Cummins M and Newman P (2008) FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research 27: 647–665.

Cummins M and Newman P (2010) Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, in press.

Duckett T, Marsland S and Shapiro J (2000) Learning globally consistent maps by relaxation. In IEEE International Conference on Robotics and Automation, San Francisco, CA, pp. 3841–3846.

Fernandez D and Price A (2004) Visual odometry for an outdoor mobile robot. In IEEE Conference on Robotics, Automation and Mechatronics, Singapore, pp. 816–821.

Folkesson J and Christensen HI (2007) Graphical SLAM for outdoor applications. Journal of Field Robotics 24: 51–70.

Ho N and Jarvis R (2008) Vision based global localisation using a 3D environmental model created by a laser range scanner. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France.

Hsieh P-C and Tung P-C (2009) A novel hybrid approach based on sub-pattern technique and whitened PCA for face recognition. Pattern Recognition 42: 978–984.

Jacobs CE, Finkelstein A and Salesin DH (1995) Fast multiresolution image querying. Computer Graphics (Annual Conference Series) 29: 277–286.

Jarvis R (1985) Collision-free trajectory planning using distance transforms. Mechanical Engineering Transactions, Journal of the Institution of Engineers 10: 187–191.

Jarvis R (1992) Optimal pathways for road vehicle navigation. In IEEE Region 10 International Conference Technology Enabling Tomorrow: Computers, Communications, and Automation Towards the 21st Century, Melbourne, Australia, pp. 876–880.

Kawewong A, Tongprasit N, Tangruamsub S and Hasegawa O (2011) Online and incremental appearance-based SLAM in highly dynamic environments. The International Journal of Robotics Research 30: 33–55.

Kröse B, Bunschoten R, Vlassis N and Motomura Y (1999) Appearance based robot localization. In Kraetzschmar G (ed.), IJCAI-99 Workshop on Adaptive Spatial Representations of Dynamic Environments, pp. 53–58.

Labrosse F (2007) The visual compass: Performance and limitations of an appearance-based method. Journal of Field Robotics 23: 913–941.

Lowe D (2004) Distinctive image features from scale invariant keypoints. International Journal of Computer Vision 60: 91–110.

Lui WLD (2011) Autonomous Robot Navigation: Appearance Based Topological SLAM. PhD thesis, Intelligent Robotics Research Centre, Department of Electrical and Computer Systems Engineering, Monash University, Australia, http://arrow.monash.edu.au/hdl/1959.1/473850.

Lui WLD and Jarvis R (2010a) An active visual loop closure detection and validation system for topological SLAM. In Australasian Conference on Robotics and Automation, Brisbane, Australia.

Lui WLD and Jarvis R (2010b) Eye-Full Tower: A GPU-based variable multibaseline omnidirectional stereovision system with automatic baseline selection for outdoor mobile robot navigation. Robotics and Autonomous Systems 58: 747–761.

Lui WLD and Jarvis R (2010c) A pure vision-based approach to topological SLAM. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, pp. 3784–3791.

Milford M and Wyeth G (2008) Mapping a suburb with a single camera using a biologically inspired SLAM system. IEEE Transactions on Robotics 24: 1038–1053.

Mobile Robots (2010) Autonomous mobile robot cores, bases and accessories. Available at: http://www.mobilerobots.com/Mobile_Robots.aspx.

Nourani-Vatani N and Pradelier C (2010) Scene change detection for vision-based topological mapping and localization. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, pp. 3792–3797.

Nvidia CUDA Zone (2008) Available at: http://www.nvidia.com/.

Ollis M, Herman H and Singh S (1999) Analysis and Design of Panoramic Stereo Vision using Equi-angular Pixel Cameras. Technical Report CMU-RI-TR-99-04, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

Point Grey (2010) Products and services. Available at: http://www.ptgrey.com/products/stereo.asp.

Radke RJ, Andra S, Al-Kohafi O and Roysam B (2005) Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing 14: 294–307.

Rui Y and Huang TS (1999) Image retrieval: Current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation 10: 39–62.

Scaramuzza D and Siegwart R (2008) Appearance-guided monocular omnidirectional visual odometry for outdoor ground vehicles. IEEE Transactions on Robotics 24: 1015–1026.

Sivic J and Zisserman A (2003) Video Google: A text retrieval approach to object matching in videos. In IEEE International Conference on Computer Vision, pp. 1470–1477.

Smeulders AW, Worring M, Santini S, Gupta A and Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22: 1349–1380.

Tapus A and Siegwart R (2008) Topological SLAM. In: Probabilistic Reasoning and Decision Making in Sensory-motor Systems (Springer Tracts in Advanced Robotics, vol. 46). Berlin: Springer-Verlag, pp. 99–127.

Thrun S, Burgard W and Fox D (2005) Probabilistic Robotics. Cambridge, MA: MIT Press.

Tomatis N, Nourbakhsh I and Siegwart R (2002) Hybrid simultaneous localization and map building: closing the loop with multi-hypotheses tracking. In IEEE International Conference on Robotics and Automation, Washington, DC, pp. 2749–2754.

Tully S, Kantor G, Choset H and Werner F (2009) A multi-hypothesis topological SLAM approach for loop closing on edge-ordered graphs. In IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, pp. 11–15.

Tungadi F, Lui WLD, Kleeman L and Jarvis R (2010) Robust online map merging system using laser scan matching and omnidirectional vision. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan.

Turk M and Pentland A (1991) Eigenfaces for recognition. Journal of Cognitive Neuroscience 3: 71–86.

Ulrich I and Nourbakhsh I (2000) Appearance-based place recognition for topological localization. In IEEE International Conference on Robotics and Automation, San Francisco, CA, pp. 1023–1029.

Umeyama S (1991) Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13: 376–380.

Yamauchi B (1997) A frontier-based approach for autonomous exploration. In IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, CA, pp. 146–151.

Zhang AM and Kleeman L (2009) Robust appearance based visual route following for navigation in large-scale outdoor environments. The International Journal of Robotics Research 28: 331–356.

Zhang H, Li B and Yang D (2010) Keyframe detection for appearance-based visual SLAM. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, pp. 2071–2076.

Appendix A: Deriving the state transition model

The new state transition model is expressed as

\[
p(S_t = -1 \mid S_{t-1} = -1) = P_{nc}^{S_{t-1}=-1} \tag{19}
\]

\[
p(S_t = j \mid S_{t-1} = -1) = \frac{P_{c}^{S_{t-1}=-1}}{t - e + 1.0}, \quad j \in [0, t-e] \tag{20}
\]

\[
p(S_t = -1 \mid S_{t-1} = k) = P_{c}^{S_t=-1}, \quad k \in [0, t-e] \tag{21}
\]

\[
p(S_t = j \mid S_{t-1} = k) =
\begin{cases}
P_{nc,m}^{S_t=-1} & \text{if } j = k \\
P_{nc,1}^{S_t=-1} & \text{if } j \in \text{neighbourhood of } k
\end{cases} \tag{22}
\]

where

\[
P_{nc}^{S_{t-1}=-1} = \frac{A}{t - e + A},
\qquad
\begin{cases}
A = 2.0 & \text{if } S_I = -1 \\
A = 1.5 & \text{if } S_I > -1
\end{cases} \tag{23}
\]

\[
P_{c}^{S_{t-1}=-1} = 1 - P_{nc}^{S_{t-1}=-1} \tag{24}
\]

\[
P_{nc}^{S_t=-1} =
\begin{cases}
\dfrac{N_k + 1}{N_k + 2} & \text{if } S_I = -1 \\
0.2 & \text{if } S_I > -1
\end{cases} \tag{25}
\]

\[
P_{c}^{S_t=-1} = 1 - P_{nc}^{S_t=-1} \tag{26}
\]

\[
P_{nc,m}^{S_t=-1} = P_{nc}^{S_t=-1} \cdot \frac{C}{N_k + C} \tag{27}
\]

\[
P_{nc,1}^{S_t=-1} = \frac{P_{nc}^{S_t=-1} - P_{nc,m}^{S_t=-1}}{N_k} \tag{28}
\]

where $N_k$ is the total number of neighbours of node $k$, $C$ is a constant set at 2.0 and $S_I$ is the selected index (largest probability) of the prior posterior, $p(S_{t-1} \mid I^{t-1})$. These expressions have several implications. First, $p(S_t = -1 \mid S_{t-1} = -1) \approx A \, p(S_t = 0 \mid S_{t-1} = -1)$ given that $t$ is sufficiently large (depending on the values selected for $e$ and $A$), while still ensuring that the condition $p(S_t \geq -1 \mid S_{t-1} = -1) = 1.0$ remains true. Similarly, $P_{nc,m}^{S_t=-1} = C \, P_{nc,1}^{S_t=-1}$, while the condition $p(S_t \geq -1 \mid S_{t-1} = k) = 1.0$ remains true.
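To make the model concrete, the sketch below evaluates these transition probabilities for a given previous state, following Equations (19)-(28) directly; the variable names and the calling convention are ours, not the paper's.

```python
def transition_probs(prev_state, t, e, neighbours, SI, C=2.0):
    """Evaluate the state transition model of Equations (19)-(28).

    prev_state: -1 for the "new place" state, or an existing node index k.
    t, e:       current time index and offset; existing nodes span [0, t-e].
    neighbours: list of neighbours of node k (unused when prev_state == -1).
    SI:         selected index (largest probability) of the prior posterior.
    Returns a dict mapping next states to probabilities (sums to 1).
    """
    n_nodes = t - e + 1
    probs = {}
    if prev_state == -1:
        A = 2.0 if SI == -1 else 1.5          # Eq. (23)
        p_nc = A / (t - e + A)                # Eq. (19): stay in "new place"
        probs[-1] = p_nc
        for j in range(n_nodes):              # Eq. (20): uniform over nodes
            probs[j] = (1.0 - p_nc) / n_nodes
    else:
        k, Nk = prev_state, len(neighbours)
        p_nc = (Nk + 1.0) / (Nk + 2.0) if SI == -1 else 0.2  # Eq. (25)
        probs[-1] = 1.0 - p_nc                # Eqs. (21), (26): new place
        p_m = p_nc * C / (Nk + C)             # Eq. (27): mass kept on k itself
        probs[k] = p_m
        for j in neighbours:                  # Eqs. (22), (28): spread the
            probs[j] = (p_nc - p_m) / Nk      # remainder over k's neighbours
    return probs
```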

Fig. 30. A different semi-outdoor environment.

Appendix B: Additional videos of experiments

For the autonomous experiments in the semi-outdoor environment in Figure 17, see

• http://www.youtube.com/watch?v=oe7k2TODrlE
• http://www.youtube.com/watch?v=JKWDhsLfgJs
• http://www.youtube.com/watch?v=4u5cBLzqitY

For the autonomous experiment in the semi-outdoor environment in Figure 30, see

• http://www.youtube.com/watch?v=mIVwhsa-5ss

A full account of all experiments is available in Lui (2011).
