Source: sunw.csail.mit.edu › 2013 › papers › Raguram_43_SUNw.pdf

Improved Geometric Verification for Large Scale Landmark Image Collections

Rahul Raguram, Joseph Tighe, Jan-Michael Frahm
{rraguram, jtighe, jmf}@cs.unc.edu

Department of Computer Science
University of North Carolina at Chapel Hill

Abstract

In this work, we address the issue of geometric verification, with a focus on modeling large-scale landmark image collections gathered from the internet. In particular, we show that we can compute and learn descriptive statistics pertaining to the image collection by leveraging information that arises as a by-product of the matching and verification stages. Our approach is based on the intuition that validating numerous image pairs of the same geometric scene structures quickly reveals useful information about two aspects of the image collection: (a) the reliability of individual visual words and (b) the appearance of landmarks in the image collection. Both of these sources of information can then be used to drive any subsequent processing, thus allowing the system to bootstrap itself. The main result of this work is that this unsupervised “learning-as-you-go” approach significantly improves performance; our experiments demonstrate significant improvements in efficiency and completeness over standard techniques.

1. Introduction

In this work, we address the issue of geometric verification, with a focus on modeling large-scale landmark image collections. In particular, we show that we can compute and learn descriptive statistics pertaining to the image collection by leveraging information that arises as a by-product of the matching and verification stages.

In designing a 3D reconstruction system for internet photo collections, one of the key considerations is robustness to “clutter” – when operating on datasets downloaded using keyword searches on community photo sharing websites (such as Flickr), it has been observed that invariably, a large fraction of images in the collection are unsuitable for the purposes of 3D reconstruction [4, 6]. Thus, one of the fundamental steps in a 3D reconstruction system is geometric verification: the process of determining which images in an internet photo collection are geometrically related to each other. This is a computationally expensive process, and much work in recent years has focused on developing efficient ways to perform this step. For example, Agarwal et al. [1] use image retrieval techniques to determine, for every image in the dataset, a small set of candidate images to match against. An alternate approach, adopted by Frahm et al. [3], is to first cluster the images based on global image descriptors and to then perform the verification within each cluster. While these approaches are extremely promising, there are still some limitations. For instance, even the carefully optimized approach described in [3] spends approximately 50% of the processing time simply verifying image pairs against each other. In addition, the approach in [3] suffers from “incompleteness”; due to the coarse clustering, a large fraction of images are discarded following the clustering and verification steps. In this work, we aim to overcome these limitations.

2. Approach

Thus far, the typical way to perform geometric verification has been to estimate the geometric relationship between pairs independently. In other words, given a pair of images, features are matched between the images to obtain a set of putative correspondences, and then a robust estimation algorithm is used to identify a set of inliers. This process then repeats for the next pair of images, typically ignoring the results produced by any previous rounds of verification. Our main idea in this work is simple: as the geometric verification progresses, we learn information about the image collection, and subsequently use this learned information to improve efficiency and completeness. More specifically, since images of the same geometric structures are being repeatedly verified against each other, this process of repeated matching reveals useful information about two things:

(a) the stability and validity of low-level image features

(b) the global appearance of the landmarks in the dataset
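To make the baseline concrete, the per-pair verification loop described above can be sketched as follows. This is a toy model with assumed names (`ransac_inliers`, `matches`): a 2D translation stands in for the fundamental-matrix estimation used in real geometric verification, and each putative correspondence is a pair of 2D points.

```python
import random

def ransac_inliers(matches, thresh=2.0, iters=100, seed=0):
    """Toy RANSAC for one image pair: hypothesize a model from a minimal
    sample, count the correspondences consistent with it, keep the best set.
    The model here is a 2D translation (a simplified stand-in for the
    fundamental matrix used in practice)."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)  # minimal sample: 1 match
        dx, dy = x2 - x1, y2 - y1                 # hypothesized translation
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in matches
                   if abs((c - a) - dx) < thresh and abs((d - b) - dy) < thresh]
        if len(inliers) > len(best):
            best = inliers
    return best
```

In a conventional pipeline this loop runs over every candidate pair and the inlier sets are discarded once each pair is decided; the two subsections below describe how that discarded information can instead be fed back in.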

While current techniques either ignore this information, or leverage it for other tasks via an offline processing stage, we feed this information directly back into the verification pipeline. This approach, while simple, is also very effective; our results demonstrate significant improvements in efficiency compared to current techniques.

Figure 1. (a) All detected features for a single image. (b) Features filtered based on the results of geometric verification – only visual words that were inliers in at least 10 previous image pairs are shown. The features in (b) are heatmap colour coded based on the inlier counts.

2.1. Identifying useful visual words

As a motivating example, consider Figure 1(a), which shows all detected SIFT features for a single image. Note that a large number of features lie in areas of the image that are very unlikely to pass any geometric consistency check (e.g., features on vegetation, people, and in the sky). Now, if we have previously verified other images of the same scene, we can weight each visual word in the current image by the number of times it has previously passed the geometric check in other image pairs (see Figure 1(b)). This weighting tends to emphasize visual words that are stable and reliable, while also suppressing spurious visual words.

Consider a visual vocabulary, W, consisting of N visual words. In addition, consider a set of visual word priorities, C = {c1, c2, ..., cN}, where each ci represents a score that is proportional to the validity of the visual word. In the absence of any prior information, we assign ci = 0, ∀i. We then carry out pairwise geometric verification, using a robust estimator to identify a set of inliers, and update the priority of the inlier visual words based on the results of this process. In the simplest possible scheme, for each feature match that was found to be an inlier, we update a count ci for the corresponding visual word. Intuitively, over time, we expect that these counts will help identify visual words that are frequently matched as inliers, as well as words that repeatedly fail the geometric consistency check. This weighting of visual words can then be easily incorporated into a RANSAC framework [2, 5] that biases the sampling in favour of the more reliable words. In our experiments, this resulted in a significant computational speedup.
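A minimal sketch of this counting scheme, with hypothetical names (`WordPriorities`, `sample_match`) and a +1 smoothing term added so that words with no history can still be sampled; the paper's actual update and sampling rules may differ.

```python
import random
from collections import defaultdict

class WordPriorities:
    """Running inlier counts c_i per visual word (simplest count-based
    variant of the priority scheme described in the text)."""
    def __init__(self):
        self.counts = defaultdict(int)  # c_i = 0 for all words initially

    def update(self, inlier_words):
        # After verifying a pair, bump the count of every visual word
        # whose feature match was an inlier.
        for w in inlier_words:
            self.counts[w] += 1

    def sample_match(self, matches, rng):
        # Bias RANSAC sampling toward matches whose visual word has a
        # high inlier count; +1 keeps unseen words possible.
        weights = [self.counts[w] + 1 for w, _ in matches]
        return rng.choices(matches, weights=weights, k=1)[0]
```

Biased sampling of this kind (cf. PROSAC [2]) tends to hit an all-inlier minimal sample sooner, which is where the reported speedup would come from.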

2.2. Identifying landmark images

As a second contribution, we show that it is possible to learn additional useful information capturing higher-level information about the dataset. For instance, once we have obtained a sufficiently large set of successfully verified image pairs, we hypothesize that this set captures useful information about the global appearance of various landmarks present in the dataset. This information can then be used to train a classifier that distinguishes between landmark and non-landmark images (see Figure 2). Once trained, we can use this classifier in the registration stage. In other words, we first run the classifier on each image before verification. If the classifier has a positive response we continue with geometric verification, but if the response is negative we reject the image immediately, thus significantly reducing the overall compute time.

Figure 2. The top and bottom ten verified images (“Highest Scoring Verified Images” vs. “Lowest Scoring Verified Images”), according to classifier scores. Our approach uses the output of geometric verification to generate a training set in an online manner, without manual labeling. The resulting classifier is able to reliably distinguish between landmark and non-landmark images.
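One way to realize this idea, sketched with assumed details (the descriptor, model, and training procedure here are illustrative, not the paper's): treat images that pass geometric verification as positives and rejected images as negatives, then fit a simple logistic-regression scorer on a global image descriptor.

```python
import math

def train_landmark_classifier(pos, neg, lr=0.5, epochs=200):
    """Logistic-regression sketch of the online landmark classifier.
    `pos`/`neg` are global descriptors of images that passed/failed
    geometric verification; labels come for free from verification."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # sigmoid
            g = p - y                           # gradient of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def score(x):
        # Positive score: likely landmark, proceed to verification;
        # negative score: reject before the expensive matching step.
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return score
```

Skipping low-scoring images before pairwise matching is where the compute savings come from; in practice the classifier would be retrained periodically as verification produces more labels.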

3. Conclusion

In summary, this work presents techniques for taking advantage of the information generated during geometric verification, to improve the overall efficiency of the process. Our approach thus integrates online knowledge extraction seamlessly into structure-from-motion systems, and is particularly relevant for large-scale image collections. Our results demonstrate both improved efficiency, as well as higher image registration performance, potentially yielding more complete 3D models for these large-scale datasets.

References

[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in a day. In ICCV, 2009.

[2] O. Chum and J. Matas. Matching with PROSAC – progressive sample consensus. In CVPR, 2005.

[3] J.-M. Frahm, P. Fite-Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y.-H. Jen, E. Dunn, B. Clipp, S. Lazebnik, and M. Pollefeys. Building Rome on a cloudless day. In ECCV, pages 368–381, 2010.

[4] L. Kennedy, S.-F. Chang, and I. Kozintsev. To search or to label?: Predicting the performance of search-based automatic image classifiers. In ACM MIR, 2006.

[5] R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J.-M. Frahm. USAC: A universal framework for random sample consensus. PAMI, 2012.

[6] R. Raguram, C. Wu, J.-M. Frahm, and S. Lazebnik. Modeling and recognition of landmark image collections using iconic scene graphs. IJCV, 95(3):213–239, 2011.
