Building Rome in a Day

Sameer Agarwal1, Noah Snavely2, Ian Simon1, Steven M. Seitz1, Richard Szeliski3

1University of Washington, 2Cornell University, 3Microsoft Research

Outline

• 1. Introduction
• 2. System Design
• 3. Result
• 4. Conclusion

Introduction

• Entering the search term “Rome” on Flickr returns more than two million photographs.

• 3D reconstruction in Google Earth and Microsoft’s Virtual Earth

Exploring Photo Collection in 3D

Outline

• 1. Introduction
• 2. System Design
  – 1. Pre-processing & feature extraction
  – 2. Matching
  – 3. Geometric estimation
• 3. Result
• 4. Conclusion

Scene reconstruction

• Automatically estimate
  – position, orientation, and focal length of cameras
  – 3D positions of feature points

Feature detection

Detect features using SIFT [Lowe, IJCV 2004]



Feature matching

Match features between each pair of images

approximate nearest neighbor matching
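As a rough illustration of the matching step, the sketch below finds, for each descriptor in one image, its nearest neighbor in the other image. It is not the system's implementation: it uses exact brute-force search as a stand-in for approximate nearest neighbor search, and it adds a distance-ratio filter (threshold `ratio`) that the slides do not mention. The function name `match_descriptors` and all parameters are hypothetical.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbor of desc_a[i].

    Assumptions: exact brute-force search (a stand-in for approximate
    nearest neighbor search) and a ratio test to drop ambiguous matches.
    desc_b must contain at least two descriptors.
    """
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(d2, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    # Keep a match only if it is clearly closer than the runner-up.
    keep = d2[rows, best] < (ratio ** 2) * d2[rows, second]
    return [(int(i), int(best[i])) for i in np.flatnonzero(keep)]
```

Brute-force search costs time proportional to the product of the two descriptor counts, which is why large collections rely on approximate search structures instead.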

Feature matching

Refine matching using RANSAC [Fischler & Bolles 1987] to estimate fundamental matrices between pairs
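The idea of RANSAC-based refinement can be sketched as follows: repeatedly fit a fundamental matrix F to a random sample of eight matches, then keep the F that the most matches agree with. This is a minimal sketch under several assumptions, not the system's code: it uses the plain eight-point algorithm without coordinate normalization, an algebraic residual |x2ᵀ F x1| rather than a pixel-based distance, and a threshold suited to noise-free synthetic data. All function names are hypothetical.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from >= 8 matches (N x 2 arrays) so that x2h^T F x1h = 0."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    # Each match gives one linear equation in the 9 entries of F (row-major).
    A = np.column_stack([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2,
                         u1, v1, np.ones(len(x1))])
    _, _, vt = np.linalg.svd(A)
    F = vt[-1].reshape(3, 3)
    # A fundamental matrix has rank 2: zero out the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    F = u @ np.diag([s[0], s[1], 0.0]) @ vt
    return F / np.linalg.norm(F)

def epipolar_residual(F, x1, x2):
    """Algebraic residual |x2h^T F x1h| for every match."""
    x1h = np.column_stack([x1, np.ones(len(x1))])
    x2h = np.column_stack([x2, np.ones(len(x2))])
    return np.abs(np.einsum("ni,ij,nj->n", x2h, F, x1h))

def ransac_fundamental(x1, x2, threshold=1e-6, iterations=500, seed=0):
    """Return (F, inlier_mask) for the sample model with the most inliers."""
    rng = np.random.default_rng(seed)
    best_F, best_mask = None, np.zeros(len(x1), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(x1), size=8, replace=False)
        F = eight_point(x1[sample], x2[sample])
        mask = epipolar_residual(F, x1, x2) < threshold
        if mask.sum() > best_mask.sum():
            best_F, best_mask = F, mask
    return best_F, best_mask
```

With real, noisy pixel measurements one would normally normalize the coordinates before the eight-point step and use a threshold expressed in pixels.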

Structure from motion

Structure from motion: automatic recovery of camera motion and scene structure from two or more images. It is a self-calibration technique, also called automatic camera tracking or match moving.

Unknown camera viewpoints

Structure from motion

[Figure: Cameras 1, 2, and 3 with poses (R1, t1), (R2, t2), (R3, t3) observing 3D points p1 to p7]

minimize f(R, T, P)

Find the camera rotations R, camera positions T, and 3D point locations P that minimize the sum of squared reprojection errors f.
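To make the objective concrete, here is a minimal sketch of evaluating the sum of squared reprojection errors. It assumes a simple pinhole camera with one focal length per camera and no lens distortion, and it maps a world point X into camera i as R_i X + t_i; these conventions, the function name `reprojection_error`, and the `observations` layout are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def reprojection_error(rotations, translations, focals, points, observations):
    """Sum of squared reprojection errors.

    rotations[i]:    3x3 rotation matrix of camera i
    translations[i]: 3-vector t_i, with camera coordinates = R_i @ X + t_i
    focals[i]:       focal length of camera i (assumed pinhole, no distortion)
    points[j]:       3D position of point j
    observations:    iterable of (camera index, point index, observed (x, y))
    """
    total = 0.0
    for cam, pt, xy in observations:
        p_cam = rotations[cam] @ points[pt] + translations[cam]
        # Perspective division followed by scaling with the focal length.
        projected = focals[cam] * p_cam[:2] / p_cam[2]
        total += float(np.sum((projected - np.asarray(xy, dtype=float)) ** 2))
    return total
```

Structure from motion searches for the camera and point parameters that make this value as small as possible, typically with a nonlinear least-squares solver.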

Incremental structure from motion



Vocabulary trees (Nister & Stewenius, 2006)

• Computational efficiency
• k-means tree is used to quantize the feature descriptors
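The quantization idea can be sketched as a small hierarchical k-means tree: descriptors are clustered, each cluster is clustered again, and a descriptor's "visual word" is the path of nearest cluster centers from the root to a leaf. This is an illustrative sketch only; the plain Lloyd's k-means, the dictionary-based tree layout, and the names `kmeans`, `build_tree`, and `quantize` are all assumptions rather than the method of Nister & Stewenius.

```python
import numpy as np

def nearest(centers, data):
    """Index of the nearest center for every row of data."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def kmeans(data, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        labels = nearest(centers, data)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, nearest(centers, data)

def build_tree(data, branching, depth):
    """Recursively cluster data; None marks a leaf."""
    if depth == 0 or len(data) < branching:
        return None
    centers, labels = kmeans(data, branching)
    children = [build_tree(data[labels == j], branching, depth - 1)
                for j in range(branching)]
    return {"centers": centers, "children": children}

def quantize(tree, descriptor):
    """Descend the tree; the path of chosen branches is the visual word."""
    path, node = [], tree
    while node is not None:
        j = int(nearest(node["centers"], descriptor[None, :])[0])
        path.append(j)
        node = node["children"][j]
    return tuple(path)
```

Quantizing a descriptor costs roughly branching × depth distance computations, instead of one per visual word, which is where the efficiency gain comes from.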

TF-IDF (term frequency–inverse document frequency)

• Consider a document containing 100 words wherein the word cow appears 3 times.

• (TF) = (3 / 100) = 0.03.

• Assume we have 10 million documents and cow appears in one thousand of these.

• (IDF) = log10(10 000 000 / 1 000) = 4.

• TF-IDF score is the product of these quantities: 0.03 × 4 = 0.12

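• The word is important if the TF-IDF score is large. A high term frequency within a particular document, combined with a low document frequency of that term across the whole collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and keep the important ones.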
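The arithmetic of the "cow" example can be checked with a few lines of Python. The helper name `tf_idf` is hypothetical, and the base-10 logarithm is assumed because it reproduces the IDF value of 4 used above.

```python
import math

def tf_idf(term_count, doc_length, num_docs, docs_with_term):
    """Return (tf, idf, tf * idf) using a base-10 logarithm for the IDF."""
    tf = term_count / doc_length
    idf = math.log10(num_docs / docs_with_term)
    return tf, idf, tf * idf

# The example above: "cow" appears 3 times in a 100-word document,
# and in 1,000 of 10 million documents.
tf, idf, score = tf_idf(3, 100, 10_000_000, 1_000)
print(tf, idf, round(score, 4))
```

In the image setting, the "words" are quantized feature descriptors and the "documents" are images.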

Query expansion

• Large-scale image matching
• Better approach: use bag-of-words technique to find likely matches
• For each image, find the top M scoring other images, do detailed SIFT matching with those
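The candidate-selection step in the bullets above might look like the following sketch. It assumes each image is summarized by a vector of visual-word counts, weights those counts with TF-IDF, and scores image pairs by cosine similarity; the scoring function, the natural-log IDF, and the name `top_m_candidates` are assumptions for illustration.

```python
import numpy as np

def top_m_candidates(word_counts, m):
    """For each image, return the indices of the m most similar other images.

    word_counts: (num_images, vocab_size) array of visual-word counts.
    """
    counts = np.asarray(word_counts, dtype=float)
    num_images = counts.shape[0]
    # Term frequency per image and inverse document frequency per word.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    doc_freq = np.count_nonzero(counts, axis=0)
    idf = np.log(num_images / np.maximum(doc_freq, 1))
    vectors = tf * idf
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.maximum(norms, 1e-12)
    # Cosine similarity between every pair of images.
    scores = vectors @ vectors.T
    np.fill_diagonal(scores, -np.inf)  # never propose an image to itself
    order = np.argsort(-scores, axis=1)
    return order[:, :m]
```

Only the pairs returned here would go on to detailed SIFT matching, so the number of expensive pairwise comparisons grows with M times the number of images rather than with the square of the number of images.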



Matching and reconstruction statistics for the three data sets

Building Rome in a Day

Rome, Italy. Reconstructed from 150,000 images in 21 hours on 496 cores.

Colosseum

St. Peter’s Basilica

Trevi Fountain

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845). Total reconstruction time: 23 hours. Number of cores: 352.


Dubrovnik


San Marco Square

San Marco Square and environs, Venice. 14,079 photos, out of an initial 250,000. Total reconstruction time: 3 days. Number of cores: 496.



Conclusion

• Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.

Large-scale image matching

3D models

http://grail.cs.washington.edu/rome/

http://phototour.cs.washington.edu/applet/index.html
