Content Based Image Search

7/29/2019 Content Based Image Search

1/41

Content Based Image Search

Mark Desnoyer


2/41

Text Search - Bag of Words

Gnome

grass

him

caught

moonwalked

walksGnome

him

house

internet

walks

shuttle

PC

Nord


3/41

Text Search - TF-IDF


4/41

Text Search - Query


5/41

Current Image Search

On the web uses contextaround image

Words around itWords in the alt tag

Those words are treatedas a documentSame as normal textsearch

But we want pictures, nottext!

Query: Horse


6/41

Searching With Pictures

How about searching with pictures instead


7/41

Using Visual Words For Search

Use visual words paradigmwe've seen beforeCan use all the text search

machinery we already haveBut, we're searching withpictures now


8/41

The Players

O. Chum et al., Total recall: Automatic query expansionwith a generative feature model for object retrieval, in Proc.ICCV, vol. 2, 2007.

N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism:exploring photo collections in 3D, in InternationalConference on Computer Graphics and InteractiveTechniques (ACM New York, NY, USA, 2006), 835-846.

D. Nister and H. Stewenius, Scalable recognition with avocabulary tree, in Proc. CVPR, vol. 5, 2006.


9/41

SIFT Features

Succinct descriptorsScale invariantRobust to changes in lighting, viewpoint, blur etc.Therefore, normally used in this search context


10/41

Bag of Words First Step - Build a

Dictionary

Must be big to be expressive enough to differentiate objects

So, cluster SIFT features

Each cluster is a word in the dictionary

But K-means clustering 10M+ descriptors is O(NK)Hierarchical K-means (Nister)

Approximate K-means (Chum)


11/41

Vocabulary Tree

Copyright David Nistr, Henrik Stewnius

Vocabulary Tree (Nister)


12/41

Vocabulary Tree




13/41

Vocabulary Tree




14/41




15/41




16/41

Approximate Nearest Neighbour (Chum)

Most of the time in K-Means is spent doing NearestNeighbour

Nearest Neighbour can be approximated using kd-trees

O(N logK) vs. O(NK)


17/41

Another Problem - Synonyms

Visual Polysemy. Single visual word occurring on different (but locallysimilar) parts on different object categories.

Visual Synonyms. Two different visual words representing a similar part ofan object (wheel of a motorbike).


18/41

Video Google

Search for recurring objectsin a movie

Synonyms suppressed by

enforcing consistency intime

Stop list used to throw outwords that are too common


19/41

Photo Tourism Overview

Scenereconstruction

Photo Explorer

Inputphotographs

Relative camerapositions and orientations

Point cloud

Sparse correspondence

Copyright Noah Snavely


20/41

Photo Tourism Scene Recontruction

Automatically estimate

position, orientation, and focal length of cameras

3D positions of feature points

Feature detection

Pairwise

feature matching

Incrementalstructure from

motion

Correspondenceestimation

Copyright Noah Snavely


21/41

Photo Tourism

Demo
http://phototour.cs.washington.edu/applet/index.html


22/41

Photo Tourism Limitations

Matching is only performed between pairs of imagesDoes not scale to large datasets


23/41

Chum et al. Experiment

5K labeled images of sights at Oxford

1M Flickr images from popular tags (distractors)

Dictionary built from Oxford Images16M descriptions -> 1M word dictionary

Query for landmarks and calculate PR curve using differentforms of query expansion


24/41

Copyright James Philbin


25/41



26/41



27/41



28/41



29/41



30/41



31/41



32/41

Text Query Expansion

In text search, some words are similar, but they are differentin the dictionary

e.g. gray and grey

Improve results by expanding the query to include similarwords

e.g. "grey goose" -> "grey goose gray"

Similar words are found by clustering on document data

At query time, relevant clusters are found and pulled in

False positives add a lot of noise to the results


33/41

Image Query Expansion - Baseline


34/41

Transitive Closure


35/41

Average Query Expansion


36/41

Recursive Average Query Expansion


37/41

Multiple Image Resolution Expansion


38/41



39/41

Demo

http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011
http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011http://arthur.robots.ox.ac.uk:8080/search/?id=oxc1_hertford_000011


40/41

Results - PR Curves Before & After

Expansion


41/41

Results - Effect Of Distractors

My distractors: 9K images from searches like "building","cathedral", "library", "historic", "spire" etc.

Documents

Content Based Image Search