
Visual Recognition and Search, Columbia University, Spring 2013

EECS 6890 – Topics in Information Processing, Spring 2013, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch

Class 6: Fast Indexing and Image Retrieval

Liangliang Cao, Feb 28, 2013


Recall of SIFT

SIFT Helps to Find Matched Local Patches


Recall of SIFT

A Naïve Way to Match SIFT

Code from David Lowe: http://www.cs.ubc.ca/~lowe/keypoints/


Recall of SIFT

Problem of the Naïve Approach

Even with an efficient implementation:

– 128-dimensional SIFT descriptors

– Euclidean distance

– 10 nearest-neighbor search

– 1,000 queries on 1 million vectors

– 8-core CPU

Total time: 5.5 seconds, which is too slow for large-scale applications!

Numbers courtesy of Perronnin and Jégou
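As a rough sketch of what this naïve baseline amounts to (illustrative only; the toy data sizes and function name below are assumptions, not the benchmark code behind the numbers above), brute-force matching is an exhaustive Euclidean distance computation followed by a top-k selection:

```python
import numpy as np

def brute_force_knn(queries, database, k=10):
    """Exhaustive k-NN: for each query SIFT descriptor, compute the Euclidean
    distance to every database descriptor and keep the k closest (unordered)."""
    # Squared distances via the expansion ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
    q_sq = (queries ** 2).sum(axis=1)[:, None]        # (num_queries, 1)
    x_sq = (database ** 2).sum(axis=1)[None, :]       # (1, num_database)
    d2 = q_sq - 2.0 * queries @ database.T + x_sq     # (num_queries, num_database)
    # Indices of the k nearest database vectors for each query
    return np.argpartition(d2, k, axis=1)[:, :k]

# Toy data standing in for 128-D SIFT descriptors (much smaller than 1M vectors)
rng = np.random.default_rng(0)
database = rng.random((10_000, 128), dtype=np.float32)
queries = rng.random((100, 128), dtype=np.float32)
print(brute_force_knn(queries, database).shape)       # (100, 10)
```

Even fully vectorized, the cost grows linearly with the database size, which is what motivates the indexing structures in the rest of the lecture.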


Can These Problems Be Solved Efficiently?

• Problem 1

Goal: object retrieval with local features

Application: product search, person verification

• Problem 2

Goal: object retrieval with a global representation

Application: image/video search engines


Outline

• Local feature based retrieval

– Vocabulary tree

– Scoring

– Verification

• Global representation based retrieval

– Fast nearest neighbor search

– KD-tree

– Locality-sensitive hashing

– Supervised hashing


Local Feature Based Retrieval

Scalable Recognition with a Vocabulary Tree

Most of the slides in this subsection are courtesy of David Nistér and Henrik Stewénius


110,000,000 Images in 5.8 Seconds


Vocabulary Tree

A vocabulary tree is essentially a hierarchical vector quantizer.
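To make the idea concrete, here is a minimal sketch of a hierarchical k-means quantizer in the spirit of Nistér and Stewénius (branch factor k, fixed depth). The plain-NumPy k-means, the toy data, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Very small k-means (Lloyd's algorithm) used to split one node's descriptors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_tree(X, branch=10, depth=3):
    """Recursively cluster descriptors; each node stores its `branch` cluster centers."""
    if depth == 0 or len(X) < branch:
        return None
    centers, labels = kmeans(X, branch)
    children = [build_tree(X[labels == j], branch, depth - 1) for j in range(branch)]
    return {"centers": centers, "children": children}

def quantize(tree, x):
    """Descend the tree, taking the nearest center at each level; the path is the visual word."""
    path, node = [], tree
    while node is not None:
        j = int(((node["centers"] - x) ** 2).sum(axis=1).argmin())
        path.append(j)
        node = node["children"][j]
    return tuple(path)   # e.g. (3, 2, 1) identifies a leaf = visual word

# Toy SIFT-like descriptors
rng = np.random.default_rng(1)
descriptors = rng.random((5000, 128), dtype=np.float32)
tree = build_tree(descriptors, branch=5, depth=3)
print(quantize(tree, descriptors[0]))
```

With branch factor k and depth L the tree defines up to k^L leaf visual words, yet quantizing a descriptor costs only about k·L distance computations rather than k^L, which is what makes very large vocabularies practical.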


Vocabulary Tree

Dynamically Changing Dataset

[Figures: adding images to, removing images from, and querying a dynamically changing dataset; the common approach vs. our approach]


Vocabulary Tree

Inverted Index

Image courtesy of Kristen Grauman
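The inverted index simply maps each visual word to the list of database images (and counts) in which it occurs, so a query only touches images that share at least one word with it. A minimal sketch under the same assumptions as above (leaf ids from the quantizer, illustrative names and data):

```python
from collections import defaultdict

# visual_word -> list of (image_id, term_frequency); purely illustrative structure
inverted_index = defaultdict(list)

def add_image(image_id, word_counts):
    """Register one database image given its bag of visual words {word: count}."""
    for word, count in word_counts.items():
        inverted_index[word].append((image_id, count))

def candidate_images(query_words):
    """Accumulate a crude similarity score over only the posting lists the query touches."""
    scores = defaultdict(float)
    for word, q_count in query_words.items():
        for image_id, count in inverted_index.get(word, []):
            scores[image_id] += q_count * count    # unweighted here; see TF-IDF below
    return sorted(scores.items(), key=lambda kv: -kv[1])

add_image("img_001", {(3, 2, 1): 2, (0, 2, 4): 1})
add_image("img_002", {(3, 2, 1): 1})
print(candidate_images({(3, 2, 1): 1}))
```

Because most images share only a few visual words with the query, the per-query work is proportional to the lengths of the touched posting lists rather than to the database size.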


Vocabulary Tree

Performance


Vocabulary Tree

Summary of Key Techniques

• Hierarchical vector quantization

– Makes it possible and efficient to use a large vocabulary

– The larger the vocabulary, the better the accuracy

• Inverted index

– Makes retrieval scalable to millions or billions of images

– Can be seamlessly integrated with text-based search engines


Vocabulary Tree

Limitations

• The vocabulary tree can be viewed as a special bag-of-words (BoW) model and hence shares the same limitations:

– Information loss in quantization → better scoring strategies (unsupervised learning)

– No spatial information → spatial verification


Scoring Strategies

TF-IDF Weighting
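The weighting here is the standard TF-IDF scheme from text retrieval applied to visual words: a word that appears in many database images is down-weighted by its inverse document frequency. One common formulation, written as a sketch (the exact variant and normalization on the slide may differ; the names and toy counts are assumptions):

```python
import math
from collections import Counter

def tfidf_vector(word_counts, doc_freq, num_images):
    """TF-IDF weights for one image's bag of visual words.

    word_counts: {word: occurrences in this image}
    doc_freq:    {word: number of database images containing the word}
    num_images:  total number of database images
    """
    total = sum(word_counts.values())
    vec = {}
    for word, n in word_counts.items():
        tf = n / total
        idf = math.log(num_images / doc_freq[word])
        vec[word] = tf * idf
    return vec

def cosine_score(q, d):
    """Score a query against a database image by the cosine of their TF-IDF vectors."""
    dot = sum(w * d.get(word, 0.0) for word, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# Tiny example: 2 database images, a query resembling the first
doc_freq, num_images = {"a": 2, "b": 1, "c": 1}, 2
d1 = tfidf_vector(Counter(a=2, b=3), doc_freq, num_images)
d2 = tfidf_vector(Counter(a=1, c=4), doc_freq, num_images)
q  = tfidf_vector(Counter(a=1, b=1), doc_freq, num_images)
print(cosine_score(q, d1), cosine_score(q, d2))
```

In practice these weights are folded directly into the inverted-index scores shown earlier, so the scoring still touches only the posting lists of the query's words.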


Scoring Strategies

Contextual Weighting

Wang et al, Contextual Weighting for Vocabulary Tree Based Image Retrieval, ICCV’11

[Figure: spatial weighting, with weights for quantized patches in the query image and in the database]


Spatial Verification

Post-processing After Matching

• Images are NOT text documents

• Exploit spatial information to improve precision

• Post-processing is expensive:

– Only validate the top-ranked images

– (optional) Query expansion after validation

• Two models for spatial verification

– RANdom SAmple Consensus (RANSAC)

– Hough transform


Spatial Verification

RANSAC Algorithm


RANSAC Example

What do we do about the “bad” matches?

Example courtesy of Rick Szeliski


RANSAC Example

Underlying Assumption

• The key idea is not just that there are more inliers than outliers, but that the outliers are wrong in different ways.

• Inliers share the same spatial transform (a translation or an affine projection).


RANSAC Example

Select one match, count inliers (repeated for several randomly selected matches)

Find the “average” translation vector
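A compact sketch of this procedure for the translation-only case (a hypothetical toy, not the lecture's or any library's implementation): repeatedly pick one putative match, hypothesize the translation it implies, count how many other matches agree within a tolerance, and keep the best hypothesis.

```python
import numpy as np

def ransac_translation(pts_a, pts_b, iters=200, tol=5.0, seed=0):
    """Estimate a 2-D translation from putative matches (pts_a[i] <-> pts_b[i]).

    Each iteration hypothesizes t from a single randomly chosen match (a translation
    has 2 dof, so one correspondence suffices), then counts inliers within `tol` pixels.
    Returns the translation averaged over the best inlier set, plus the inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(pts_a))
        t = pts_b[i] - pts_a[i]                       # hypothesized translation
        err = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    t_refined = (pts_b[best_inliers] - pts_a[best_inliers]).mean(axis=0)
    return t_refined, best_inliers

# Toy data: 80 matches following translation (30, -12) plus 40 random outliers
rng = np.random.default_rng(1)
a = rng.uniform(0, 500, size=(120, 2))
b = a + np.array([30.0, -12.0]) + rng.normal(0, 1.0, size=(120, 2))
b[80:] = rng.uniform(0, 500, size=(40, 2))            # outliers disagree in different ways
t, mask = ransac_translation(a, b)
print(np.round(t, 1), mask.sum())
```

With roughly two thirds of the matches being inliers, a handful of iterations already finds the correct translation with high probability, which is why the cost is governed mainly by the number of matches, as the RANSAC vs. Hough comparison below notes.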


Spatial Verification

RANSAC vs. Hough Transform

RANSAC

• Vote in image space

• Computational complexity is determined by the number of matches

Hough Transform

• Vote in model space

• Computational complexity is linear in the number of correspondences and in the number of voting cells (impractical in high dimensions)


Spatial Verification

Hough Transform

[Figure: image space vs. model space. An image point (x0, y0) constrains the line parameters through y0 = x0·m + b, and a second point (x1, y1) through y1 = x1·m + b; each point votes for a line in (m, b) model space, and the votes intersect at the parameters of the line through both points.]


Spatial Verification

Generalized Hough Transform

• What if the model has more parameters than a line?

→ vote in a multi-dimensional model space instead of 2D

• What if there is more than one model?

→ cluster in model space
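As an illustration of voting in model space (a toy sketch under the same translation-only assumption as the RANSAC example above, not the lecture's code): each match casts one vote for the translation it implies, votes are accumulated in quantized cells, and the densest cells give the verified model or models.

```python
import numpy as np
from collections import Counter

def hough_translation(pts_a, pts_b, cell=10.0):
    """Vote in (tx, ty) model space: each putative match votes for one quantized translation cell."""
    votes = Counter()
    for pa, pb in zip(pts_a, pts_b):
        t = pb - pa
        key = (int(np.floor(t[0] / cell)), int(np.floor(t[1] / cell)))
        votes[key] += 1
    # The top cells correspond to consistent models; multiple peaks = multiple objects/models
    return votes.most_common(3)

rng = np.random.default_rng(2)
a = rng.uniform(0, 500, size=(120, 2))
b = a + np.array([30.0, -12.0])
b[80:] = rng.uniform(0, 500, size=(40, 2))   # outliers scatter their votes over many cells
print(hough_translation(a, b))
```

The cost is one vote per correspondence plus a scan over the occupied cells, which is why the number of voting cells becomes the bottleneck once the model has many parameters.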


Spatial Verification

Query Expansion

Chum et al., Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, ICCV'07


Can These Problems Be Solved Efficiently?

• Problem 1

Goal: object retrieval with local features

Application: product search, person verification

• Problem 2

Goal: object retrieval with a global representation

Application: image/video search engines


From local features to a global representation

Fast Nearest Neighbor Search


Fast Nearest Neighbor Search

KD-Tree


Fast Nearest Neighbor Search

KD-Tree Construction

Heuristics to make splitting decisions:

• Which dimension do we split along?

– The widest one: the axis with the highest variance

• Which value do we split at?

– The median value of the points along that dimension

• When do we stop?

– When a node contains fewer than m points
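A minimal sketch of these heuristics plus a basic nearest-neighbor search with backtracking (illustrative only; real systems would use an optimized library such as FLANN or scipy.spatial.cKDTree):

```python
import numpy as np

def build_kdtree(points, m=8):
    """Split on the highest-variance axis at the median; stop below m points (leaf)."""
    if len(points) <= m:
        return {"leaf": points}
    axis = int(points.var(axis=0).argmax())          # widest / highest-variance dimension
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"axis": axis,
            "value": float(points[mid, axis]),        # median split value
            "left": build_kdtree(points[:mid], m),
            "right": build_kdtree(points[mid:], m)}

def nn_search(node, q, best=None):
    """Descend toward q's side first, then backtrack only if the splitting plane
    is closer than the best distance found so far."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = float(np.linalg.norm(p - q))
            if best is None or d < best[0]:
                best = (d, p)
        return best
    near, far = ((node["left"], node["right"]) if q[node["axis"]] < node["value"]
                 else (node["right"], node["left"]))
    best = nn_search(near, q, best)
    if abs(q[node["axis"]] - node["value"]) < best[0]:   # the plane may hide a closer point
        best = nn_search(far, q, best)
    return best

pts = np.random.default_rng(0).random((2000, 2))
tree = build_kdtree(pts)
print(nn_search(tree, np.array([0.3, 0.7])))
```

The backtracking step is exactly what becomes expensive in high dimensions: almost every splitting plane ends up closer than the current best distance, so the search degenerates toward a full scan, which motivates the tree variants and hashing methods that follow.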


Fast Nearest Neighbor Search

Search via KD-Tree

Slides courtesy of Brigham Anderson


Fast Nearest Neighbor Search

Problem of KD-Tree

Example courtesy of Andrew Moore


Fast Nearest Neighbor Search

PCA-Tree

• A k-d tree partitions with a hyperplane perpendicular to a coordinate axis

• A PCA tree partitions with a hyperplane perpendicular to the principal direction

Verma et al., UAI 2009

Example courtesy of Rui Wang


Fast Nearest Neighbor Search

Random Projection Tree

• RP tree: overcomes the high-dimensionality problem

• Approach: pick a random direction from the unit sphere and split the data at the median value

Dasgupta and Freund, STOC 2008

[Figure: partitions produced by a KD tree vs. an RP tree]
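A sketch of the single split rule that distinguishes the RP tree, in its basic form (project onto a random unit direction, cut at the median projection); this omits the jitter and diameter-based refinements in Dasgupta and Freund, and the names and data are illustrative:

```python
import numpy as np

def rp_split(points, rng):
    """One RP-tree split: project onto a random unit direction, cut at the median projection."""
    u = rng.normal(size=points.shape[1])
    u /= np.linalg.norm(u)                 # random direction on the unit sphere
    proj = points @ u
    threshold = float(np.median(proj))
    left = points[proj <= threshold]
    right = points[proj > threshold]
    return (u, threshold), left, right

rng = np.random.default_rng(0)
X = rng.random((1000, 128))
(direction, threshold), left, right = rp_split(X, rng)
print(left.shape, right.shape)
```

Unlike a KD-tree split, the cut is not axis-aligned, which is what lets the tree adapt to data whose intrinsic dimension is much lower than the ambient 128 dimensions.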


Fast Nearest Neighbor Search

Approximate Principal Direction Trees

• More efficient than PCA-tree

• More accurate than RP-tree

McCartin-Lim, McGregor, Wang, ICML 2012


Fast Nearest Neighbor Search

Locality-Sensitive Hashing

• Sublinear search time for ε-approximate NN

• Long hash codes (>= 1k bits) and multiple hash tables

[Figure: a feature vector is mapped by random hash functions to a binary code, e.g. a query hashed to 101]

[Gionis, Indyk, and Motwani 1999] [Datar et al. 2004]

Slide courtesy of Wei Liu
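A minimal sketch of LSH using random-hyperplane (sign-of-projection) hash functions, one common family for angular/cosine-style similarity; the bit length, table count, class name, and data are assumptions for illustration (the Datar et al. scheme for Euclidean distance replaces the sign with a quantized offset projection):

```python
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    """L independent hash tables, each keyed by a b-bit sign-of-random-projection code."""
    def __init__(self, dim, bits=16, tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(bits, dim)) for _ in range(tables)]
        self.tables = [defaultdict(list) for _ in range(tables)]

    def _code(self, planes, x):
        return tuple((planes @ x > 0).astype(np.uint8))   # b-bit binary code

    def add(self, idx, x):
        for planes, table in zip(self.planes, self.tables):
            table[self._code(planes, x)].append(idx)

    def query(self, x):
        """Union of the buckets x falls into; only these candidates are checked exactly."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._code(planes, x), []))
        return candidates

rng = np.random.default_rng(1)
data = rng.normal(size=(10_000, 64)).astype(np.float32)
lsh = RandomProjectionLSH(dim=64, bits=16, tables=8)
for i, x in enumerate(data):
    lsh.add(i, x)
q = data[42] + 0.05 * rng.normal(size=64).astype(np.float32)
cands = lsh.query(q)
print(len(cands), 42 in cands)   # far fewer than 10,000 candidates; item 42 is usually among them
```

Multiple tables trade memory for recall: a true neighbor only needs to share a bucket with the query in one of the L tables to become a candidate.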


Fast Nearest Neighbor Search

Spectral Hashing

• Goal: find an embedding where Hamming distance approximates Euclidean distance

• Method: relax the binary constraint and find the projections as eigenvectors

Y. Weiss, A. Torralba, and R. Fergus, Spectral Hashing, NIPS 2008


1. Fit a multidimensional rectangle to the data

Run PCA to align axes, then bound uniform distribution

2. For each dimension, calculate the k smallest eigenfunctions.

3. This gives d·k eigenfunctions; pick the k with the smallest eigenvalues.

4. Threshold eigenfunctions at zero to give binary codes
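To give a feel for the flavor of the resulting codes, here is a deliberately simplified stand-in (PCA projections thresholded at zero, often called PCA hashing); it follows steps 1 and 4 but skips the analytical eigenfunctions of the uniform distribution that spectral hashing itself uses for out-of-sample codes, and all names and data are illustrative:

```python
import numpy as np

def pca_hash_train(X, bits=32):
    """Fit: center the data and keep the top `bits` PCA directions (step 1 of the recipe)."""
    mean = X.mean(axis=0)
    # Principal directions via SVD of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:bits].T                      # (dim, bits) projection matrix
    return mean, W

def pca_hash_encode(X, mean, W):
    """Encode: project and threshold at zero to get binary codes (step 4)."""
    return ((X - mean) @ W > 0).astype(np.uint8)

def hamming(a, B):
    """Hamming distance from one code `a` to every row of the code matrix `B`."""
    return (a[None, :] != B).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 128)).astype(np.float32)
mean, W = pca_hash_train(X, bits=32)
codes = pca_hash_encode(X, mean, W)
q = X[0] + 0.1 * rng.normal(size=128).astype(np.float32)
qc = pca_hash_encode(q[None, :], mean, W)[0]
print(hamming(qc, codes).argmin())   # ideally 0: the perturbed point's own code is nearest
```

Points that are close in Euclidean space tend to fall on the same side of most projection thresholds, so their codes differ in few bits, which is exactly the property the goal above asks for.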


Fast Nearest Neighbor Search

Supervised Information for Hashing

Jain, Kulis, & Grauman, CVPR 2008


Fast Nearest Neighbor Search

Models for Supervised Hashing

• Liu et al., CVPR 2012: solved by sequential optimization

• Jain, Kulis, & Grauman, CVPR 2008: solved by information-based metric learning

• Li et al., NIPS 2011: solved by ensemble tree learning


Discussions

Data Structure in Real Systems

Query: 1 1 0 0

Sample: 1 1 0 0

• Inverted index

– Fast speed

– Good for search engines

– Not good for long bit codes

• Compact codes

– Memory efficient

– Good for mobile applications
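As a sketch of the compact-code option (illustrative; the 64-bit code width and function names are assumptions): exhaustive Hamming ranking over packed codes is memory-light and fast, since comparing two codes is an XOR plus a population count.

```python
import numpy as np

def pack_codes(bits):
    """Pack a (n, 64) array of 0/1 bits into 8 bytes per row (64-bit codes)."""
    return np.packbits(bits, axis=1)          # shape (n, 8), dtype uint8

def hamming_rank(query_code, codes, topk=10):
    """Exhaustive Hamming ranking: XOR the packed codes, then count the set bits."""
    xor = np.bitwise_xor(codes, query_code[None, :])
    dist = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(dist)[:topk]

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(100_000, 64), dtype=np.uint8)
codes = pack_codes(bits)

# A query identical to item 123 except for two flipped bits
q_bits = bits[123].copy()
q_bits[[5, 40]] ^= 1
q = pack_codes(q_bits[None, :])[0]
print(hamming_rank(q, codes)[:3])             # item 123 should come first
```

An inverted index over code buckets avoids the exhaustive scan, but the number of buckets (and of neighboring buckets within a Hamming radius) grows quickly with code length, which is the trade-off the comparison above points at.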


Discussion

Potential Research Questions

• Many existing systems for general image retrieval

– Google Image, Bing Image, SnapTell, …

• Chances in domain-specific image retrieval

– Clothes, pets, etc.

– Face verification

– Surveillance

– Geolocation

• Interesting techniques

– Spatial verification

– Metric learning

– Mobile

– …