Image matching techniques for vision-based indoor navigation systems: a 3D map-based approach


This article was downloaded by: [Aston University] on 29 January 2014, at 02:48. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Journal of Location Based Services. Publication details, including instructions for authors and subscription information:

Image matching techniques for vision-based indoor navigation systems: a 3D map-based approach. Xun Li & Jinling Wang, School of Surveying and Geospatial Engineering, University of New South Wales, Sydney, Australia. Published online: 20 Jan 2014.

To cite this article: Xun Li & Jinling Wang, Journal of Location Based Services (2014): Image matching techniques for vision-based indoor navigation systems: a 3D map-based approach, Journal of Location Based Services, DOI: 10.1080/17489725.2013.837201

    To link to this article:



Image matching techniques for vision-based indoor navigation systems: a 3D map-based approach

    Xun Li* and Jinling Wang

    School of Surveying and Geospatial Engineering, University of New South Wales, Sydney, Australia

    (Received 8 March 2013; final version received 13 August 2013; accepted 15 August 2013)

In this paper, we give an overview of image matching techniques for various vision-based navigation systems: stereo vision, structure from motion and the map-based approach. Focusing on the map-based approach, which generally uses feature-based matching for mapping and localisation, and building on our earlier developed system, a performance analysis has been carried out and three major problems have been identified: inadequate geo-referencing and imaging geometry for mapping, vulnerability to drastic viewpoint changes, and a large percentage of mismatches at positioning. We introduce multi-image matching for mapping. By using the affine scale-invariant feature transform for viewpoint changes, the major improvement takes place on the epoch with large viewpoint changes. In order to deal with mismatches that could not be removed by Random Sample Consensus (RANSAC), we propose a new method that uses cross-correlation information to evaluate the quality of the homography model and select the proper one. The conducted experiments have proved that such an approach can reduce the chance of mismatches being retained by RANSAC, and the final positioning accuracy can be improved.

Keywords: indoor user localisation; map matching; navigation technologies; positioning systems; vision-based localisation

    1. Introduction

    Image matching techniques have been used in a variety of applications, such as 3D

    modelling, image stitching, motion tracking, object recognition and vision-based

    localisation. Over the past few years, many different methods have been developed, which

can be generally classified into two groups: area-based matching (intensity-based, such as cross-correlation and least-squares matching; Gruen 1985) and feature-based matching (e.g. the scale-invariant feature transform (SIFT); Lowe 2004). Area-based methods (Trucco

    and Verri 1998) may be comparatively more accurate, because they take into account a

    whole neighbourhood around the feature points being analysed to establish

    correspondences. Feature-based methods, on the other hand, use symbolic descriptions

    of the images that contain certain local image information to establish correspondence. No

    single algorithm, however, has been universally regarded as optimal for all applications

    since they all have their pros and cons. In this paper, we explore the performance of

© 2014 Taylor & Francis

    *Corresponding author. Email:

    Journal of Location Based Services, 2014





various image matching techniques according to the specific needs of vision-based

    positioning systems for indoor navigation applications.
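As a concrete illustration of the area-based matching discussed above, the following is a minimal sketch of template matching by normalised cross-correlation in plain NumPy. The patch size, image size and synthetic data are illustrative assumptions, not part of the system described in this paper.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalised cross-correlation between two equally sized patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_patch(template, image):
    """Slide `template` over `image`; return the (row, col) with the best NCC score."""
    th, tw = template.shape
    best, best_pos = -1.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            score = ncc(template, image[r:r + th, c:c + tw])
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

# Synthetic example: cut a patch out of a larger "image" and recover its location.
rng = np.random.default_rng(0)
image = rng.random((40, 40))
template = image[10:18, 22:30].copy()
pos, score = match_patch(template, image)
```

An exact copy of the patch scores close to the NCC maximum of 1 at its true location; in practice the correlation surface is evaluated around predicted positions rather than exhaustively, for speed.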

    The main purpose of vision-based navigation is to determine the position (and

orientation) of the vision sensor. Then the motion/trajectory of the mobile vehicle (the platform carrying the vision sensor) can be recovered. However, vision is not without its limitations. A vision sensor can measure relative position with a derivative order of 0, but it senses only a 2D projection of the 3D world; direct depth information is lost. Several approaches have been developed to

    tackle such a problem. One way is to use stereo cameras by which the distance to a landmark

can be directly measured. Another way is to use monocular vision, either with the integration of data from multiple viewpoints (structure from motion; SFM) or by relying on prior knowledge of the navigation environment (e.g. maps or models). Despite the varied forms of vision-based navigation systems, they all need the same basic but essential function to support their self-localisation mechanism: image matching. The way they use this function is, however, different, which in turn affects their choice of image matching methods.
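The depth loss noted above can be made concrete with a minimal pinhole-projection sketch (the focal length and test points are assumed purely for illustration): two 3D points on the same viewing ray project to identical image coordinates, so a single view alone cannot separate them.

```python
import numpy as np

def project(point_3d, f=1.0):
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z). Depth Z is divided out."""
    x, y, z = point_3d
    return np.array([f * x / z, f * y / z])

# Two points on the same viewing ray, at different depths ...
p_near = np.array([1.0, 2.0, 4.0])
p_far = 2.5 * p_near          # same direction, 2.5x the depth

# ... land on exactly the same image coordinates: depth cannot be
# recovered from a single view without extra information.
u_near = project(p_near)
u_far = project(p_far)
```

This is precisely why stereo baselines, multiple viewpoints (SFM) or prior maps are needed to restore the missing depth dimension.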

    For stereo vision-based approaches, stereo matching is employed to create a depth map

    (i.e. disparity map) for navigation. Area-based algorithms solve the stereo correspondence

    problem for every single pixel in the image. Therefore, these algorithms result in dense

    depth maps as the depth is known for each pixel (Kuhl 2004). Typical methods include

    Census (Zabih and Woodfill 1994), sum of absolute differences and sum of squared

    differences. The common drawback is that they are computationally demanding. To deal

    with the problem, some efforts have been made. Nalpantidis, Chrysostomou, and

    Gasteratos (2009) proposed a quad-camera-based system which used a custom-tailored

    correspondence algorithm to keep the computation load within reasonable limits.

Meanwhile, feature-based methods are less error-sensitive and require a lighter workload, but

    the resulting maps will be less detailed as the depth is not calculated for every pixel (Kuhl

2004). Therefore, how to obtain a disparity map that is both dense and accurate, while the system maintains a reasonable refresh rate, is the cornerstone of such systems' success and still remains an open question.
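As an illustrative sketch of the sum-of-absolute-differences block matching mentioned above, the following searches one scanline of a rectified stereo pair for the best disparity at a single pixel. The block size, search range and synthetic images are assumptions for the example, not parameters from the cited systems.

```python
import numpy as np

def sad_disparity(left, right, row, col, block=3, max_disp=10):
    """Best disparity for left[row, col] by sum-of-absolute-differences
    block matching along the same scanline of the right image
    (assumes rectified images; purely illustrative)."""
    h = block // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(min(max_disp, col - h) + 1):
        cand = right[row - h:row + h + 1, col - d - h:col - d + h + 1].astype(float)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Synthetic rectified pair: the right image is the left shifted 4 px leftward,
# so every (non-wrapped) pixel has a true disparity of 4.
rng = np.random.default_rng(1)
left = rng.random((20, 30))
right = np.roll(left, -4, axis=1)
d = sad_disparity(left, right, row=10, col=15)
```

Repeating this search for every pixel is what makes dense area-based stereo so computationally demanding, which motivates the hardware and algorithmic shortcuts discussed above.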

    For monocular vision sensor, the two approaches also differ from each other. For SFM,

    consecutive frames present a very small parallax and small camera displacement. Given

the location of a feature in one frame, a common strategy for SFM is to use a feature tracker to find its correspondence in the consecutive image frame. The Kanade-Lucas-Tomasi (KLT) tracker (Zhang et al. 2010) is widely used for small-baseline matching. The methodology

    for a feature tracker to track interest points through image sequence is usually based on a

combined use of feature- and area-based image matching methods. First, interest points are

    extracted by operators from the first image (e.g. Forstner 1986; Harris and Stephens 1988).

    Then due to the very short baseline, positions of corresponding interest points in the

    second image are predicted and matched with cross-correlation, which can be further

    refined using least squares matching. Some approaches perform outlier rejection based

    either on epipolar geometry (Remondino and Ressl 2006) or Random Sample Consensus

    (RANSAC) (Fischler and Bolles 1981) for the last step (Zhang et al. 2010).
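The RANSAC outlier-rejection step can be sketched on a deliberately simple motion model, a 2D translation between matched points rather than a full epipolar or homography model; the tolerance, iteration count and synthetic matches below are assumptions for the example, not values from the cited work.

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, tol=0.5, seed=0):
    """Estimate a 2D translation dst ~= src + t with RANSAC:
    repeatedly hypothesise t from one random correspondence,
    keep the hypothesis with the most inliers, then refit on them."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                      # minimal sample: one match
        resid = np.linalg.norm(dst - (src + t), axis=1)
        inliers = resid < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers

# Synthetic matches: a true shift of (3, -2) plus five gross mismatches.
rng = np.random.default_rng(2)
src = rng.random((30, 2)) * 100
dst = src + np.array([3.0, -2.0])
dst[:5] += rng.random((5, 2)) * 50 + 20          # outliers
t, inliers = ransac_translation(src, dst)
```

The same hypothesise-score-refit loop applies when the model is a fundamental matrix or homography; only the minimal sample size and residual definition change.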

    For the map-based approach, first a map in the form of image collection (or model) is

    built by a learning step. Then, self-localisation is realised by matching the query image

    with the corresponding scene in the database/map. Whenever a match is found, the

    position information of this reference image is transferred directly to the query image and

used as the user position. A major difference between the map-based approach and the two previous methods lies in the fact that the real-time query image might be taken at a substantially different viewpoint or distance, or under different illumination conditions, from the map images in the







database. Besides, the two images might be taken using