Visual Recognition And Search, Columbia University, Spring 2013
EECS 6890 – Topics in Information Processing Spring 2013, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch
Class 6: Fast Indexing and Image Retrieval
Liangliang Cao, Feb 28, 2013
Recall of SIFT
SIFT Helps to Find Matched Local Patches
Recall of SIFT
A Naïve Way to Match SIFT
Code from David Lowe: http://www.cs.ubc.ca/~lowe/keypoints/
Recall of SIFT
Problem of the Naïve Approach
Even with an efficient implementation:
– 128-dimensional SIFT descriptors
– Euclidean distance
– 10-nearest-neighbor search
– 1,000 queries on 1 million vectors
– 8-core CPU
Total time: 5.5 seconds --- too slow for large-scale applications!
Numbers courtesy of Perronnin and Jégou
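The naïve approach is an exhaustive scan: every query is compared against every database vector. A minimal NumPy sketch (sizes scaled down from the slide's setting; all names are illustrative) makes the cost structure explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((10000, 128)).astype(np.float32)  # SIFT-like vectors
queries = rng.standard_normal((5, 128)).astype(np.float32)

# Exhaustive search: one full distance matrix, then a sort per query.
# ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
d2 = (np.sum(queries ** 2, axis=1)[:, None]
      - 2.0 * queries @ database.T
      + np.sum(database ** 2, axis=1)[None, :])
nn10 = np.argsort(d2, axis=1)[:, :10]  # indices of the 10 nearest neighbors
print(nn10.shape)  # (5, 10)
```

Every query touches every database vector, so the work grows linearly with database size; the rest of the lecture is about avoiding this exhaustive scan.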
• Problem 1
Goal: object retrieval with local features
Application: product search, person verification
• Problem 2
Goal: object retrieval with global representation
Application: image/video search engine
Can These Problems Be Solved Efficiently?
Outline
• Local feature based retrieval
– Vocabulary tree
– Scoring
– Verification
• Global representation based retrieval
– Fast nearest neighbor search
– KD-tree
– Locality-sensitive hashing
– Supervised hashing
Local Feature Based Retrieval
Scalable Recognition with a Vocabulary Tree
Most of the slides in this subsection are courtesy of David Nistér and Henrik Stewénius
A vocabulary tree is essentially a hierarchical vector quantizer.
Vocabulary Tree
Dynamically Changing Dataset
[Figures: adding, removing, and querying images on the fly; comparison of the common approach vs. the vocabulary-tree approach]
Vocabulary Tree
Inverted Index
Vocabulary Tree
Image courtesy of Kristen Grauman
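The inverted index stores, for each visual word, the list of images containing it, so a query only touches images that share at least one word with it. A hedged sketch (scoring and structure are illustrative):

```python
from collections import defaultdict

# Inverted index: visual word id -> list of (image id, term frequency).
index = defaultdict(list)

def add_image(image_id, word_ids):
    """Index an image given the quantized visual words of its features."""
    counts = defaultdict(int)
    for w in word_ids:
        counts[w] += 1
    for w, tf in counts.items():
        index[w].append((image_id, tf))

def query(word_ids):
    """Score only the images that share at least one visual word."""
    scores = defaultdict(int)
    for w in set(word_ids):
        for image_id, tf in index[w]:
            scores[image_id] += tf
    return sorted(scores.items(), key=lambda kv: -kv[1])

add_image("img0", [3, 3, 7, 9])
add_image("img1", [7, 8])
print(query([3, 7]))  # img0 matches on two words, img1 on one
```

This is the same structure text search engines use, which is why the two integrate so naturally.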
• Hierarchical quantization
– Makes it possible and efficient to use a large vocabulary
– The larger the vocabulary, the better the accuracy
• Inverted index
– Makes the system scalable to millions or billions of images
– Can be seamlessly integrated with text-based search engines
Vocabulary Tree
Summary of Key Techniques
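The two techniques above can be sketched concretely. A vocabulary tree is hierarchical k-means: quantizing a descriptor compares it to only k centers per level, so a vocabulary of k^L leaves costs O(kL) comparisons instead of O(k^L). A hedged sketch using SciPy's `kmeans2` (tree shape and parameters are illustrative):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_tree(data, k=4, depth=2, seed=0):
    """Recursively run k-means to build a k-ary tree of cluster centers."""
    if depth == 0 or len(data) < k:
        return None
    centers, labels = kmeans2(data, k, minit='++', seed=seed)
    children = [build_tree(data[labels == i], k, depth - 1, seed)
                for i in range(k)]
    return {"centers": centers, "children": children}

def quantize(tree, x, path=()):
    """Descend the tree: only k center comparisons per level."""
    if tree is None:
        return path
    i = int(np.argmin(np.linalg.norm(tree["centers"] - x, axis=1)))
    return quantize(tree["children"][i], x, path + (i,))

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((500, 16))
tree = build_tree(descriptors, k=4, depth=2)
word = quantize(tree, descriptors[0])
print(word)  # path of branch choices = the visual word (a tree leaf)
```

Each quantized descriptor then goes into an inverted index keyed by the leaf id, as on the previous slide.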
• A vocabulary tree can be viewed as a special bag-of-words (BoW) model and hence shares the same limitations:
– Information loss in quantization
– No spatial information
Vocabulary Tree
Limitations
Remedies: better scoring strategies (unsupervised learning) for the quantization loss, and spatial verification for the missing spatial information
Scoring Strategies
TF-IDF Weighting
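TF-IDF carries over from text retrieval unchanged: a visual word is weighted up if it is frequent in the image (tf) and rare in the database (idf = log N/N_i). A toy sketch (the image contents are made up):

```python
import math
from collections import Counter

# Toy database: each image is the list of its quantized visual-word ids.
images = {
    "img0": [1, 1, 2, 5],
    "img1": [2, 3, 3, 3],
    "img2": [1, 4, 5, 5],
}

N = len(images)
df = Counter()                        # document frequency of each word
for words in images.values():
    df.update(set(words))
idf = {w: math.log(N / n) for w, n in df.items()}

def tfidf(words):
    tf = Counter(words)
    return {w: (c / len(words)) * idf.get(w, 0.0) for w, c in tf.items()}

def score(query_words, image_words):
    """Dot product of tf-idf vectors (un-normalized cosine score)."""
    q, d = tfidf(query_words), tfidf(image_words)
    return sum(qv * d.get(w, 0.0) for w, qv in q.items())

best = max(images, key=lambda name: score([3, 3, 2], images[name]))
print(best)  # img1: word 3 occurs in no other image, so its idf is high
```

Words shared by every image get idf = log(N/N) = 0 and contribute nothing, which is exactly the desired behavior for uninformative visual words.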
Scoring Strategies
Contextual Weighting
Wang et al, Contextual Weighting for Vocabulary Tree Based Image Retrieval, ICCV’11
Spatial weighting: weights for the quantized patches in the query image and for those in the dataset
Spatial Verification
• Images are NOT text documents
• Exploit spatial information to improve precision
• Post-processing is expensive, so:
– Validate only the top-ranked images
– (optional) Apply query expansion after validation
• Two models for spatial verification
– RANdom SAmple Consensus (RANSAC)
– Hough transform
Post-processing After Matching
Spatial Verification
RANSAC Algorithm
RANSAC Example
What do we do about the “bad” matches?
Example courtesy of Rick Szeliski
RANSAC Example
• The key idea is not just that there are more inliers than outliers, but that the outliers are wrong in different ways.
• Inliers share the same spatial transform (translation or affine projection).
Underlying Assumption
RANSAC Example
Select one match, count inliers
RANSAC Example
Find “average” translation vector
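For a pure translation model, RANSAC reduces to exactly the loop in this example: each sampled match proposes a translation, inliers are the matches consistent with it, and the winning hypothesis is refined by averaging its inliers. A sketch under that assumption (synthetic matches; tolerance and iteration count are illustrative):

```python
import numpy as np

def ransac_translation(src, dst, tol=1.0, iters=100, seed=0):
    """src, dst: (N, 2) matched keypoint locations; returns (t, inlier mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                          # one match -> one hypothesis
        err = np.linalg.norm(dst - (src + t), axis=1)
        inliers = err < tol                          # matches agreeing with t
        if inliers.sum() > best.sum():
            best = inliers
    # "Average" translation over the inliers of the best hypothesis.
    return (dst[best] - src[best]).mean(axis=0), best

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (30, 2))
dst = src + np.array([5.0, -3.0])                    # true translation
dst[:8] = rng.uniform(0, 100, (8, 2))                # corrupt 8 matches
t, inliers = ransac_translation(src, dst)
print(np.round(t, 2))  # close to [5, -3]
```

With richer models (affine, homography) each hypothesis needs more than one match, but the sample-score-refine structure is identical.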
Spatial Verification
RANSAC vs. Hough Transform
RANSAC:
• Votes in image space
• Computational complexity is determined by the number of matches
Hough transform:
• Votes in model space
• Computational complexity is linear in the number of correspondences and in the number of voting cells (impractical in high dimensions)
Spatial Verification
Hough Transform
[Figure: each image-space point (x_i, y_i) votes for the line b = y_i - x_i*m in (m, b) model space; e.g. y0 = x0*m + b and y1 = x1*m + b intersect at the model shared by both points]
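The voting procedure for the line case can be sketched directly: discretize (m, b), let each point vote for every model it is consistent with, and read off the peak. A small sketch (grid resolution and points are illustrative):

```python
import numpy as np

# Points on the line y = 2x + 1, plus one outlier.
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 2.0)]

# Discretize model space (m, b); each point votes for every model
# it is consistent with: b = y - m * x.
ms = np.linspace(-5, 5, 101)
bs = np.linspace(-10, 10, 201)
votes = np.zeros((len(ms), len(bs)), dtype=int)
for x, y in pts:
    for i, m in enumerate(ms):
        b = y - m * x
        j = int(round((b - bs[0]) / (bs[1] - bs[0])))
        if 0 <= j < len(bs):
            votes[i, j] += 1

i, j = np.unravel_index(np.argmax(votes), votes.shape)
print(round(ms[i], 1), round(bs[j], 1))  # peak at m = 2.0, b = 1.0
```

The outlier spreads its votes over cells nobody else votes for, so it cannot shift the peak; this robustness is the whole point of voting in model space.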
• What if the model has more parameters than a line?
Vote in a multi-dimensional model space instead of 2D.
• What if there is more than one model?
Cluster the votes in model space.
Spatial Verification
Generalized Hough Transform
Spatial Verification
Query Expansion
Chum et al., Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, ICCV’07
• Problem 1
Goal: object retrieval with local features
Application: product search, person verification
• Problem 2
Goal: object retrieval with global representation
Application: image/video search engine
Can These Problems Be Solved Efficiently?
From local features to a global representation
Fast Nearest Neighbor Search
Fast Nearest Neighbor Search
KD-Tree
Fast Nearest Neighbor Search
KD-Tree Construction
Heuristics to make splitting decisions:
• Which dimension do we split along?
– The widest one, i.e., the axis with the highest variance
• Which value do we split at?
– The median value of the points along that dimension
• When do we stop?
– When a node contains fewer than m points
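The three heuristics above translate directly into a recursive construction. A sketch (the node layout and the stopping parameter m are illustrative):

```python
import numpy as np

def build_kdtree(points, m=5):
    """Split along the highest-variance axis at the median; stop below m points."""
    if len(points) < m:
        return {"leaf": points}
    axis = int(np.argmax(points.var(axis=0)))     # widest dimension
    points = points[np.argsort(points[:, axis])]  # sort along that axis
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": float(points[mid, axis]),        # median value
        "left": build_kdtree(points[:mid], m),
        "right": build_kdtree(points[mid:], m),
    }

rng = np.random.default_rng(0)
tree = build_kdtree(rng.standard_normal((100, 2)))
print(sorted(tree))  # ['axis', 'left', 'right', 'split']
```

Splitting at the median keeps the tree balanced, so its depth is O(log n).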
Fast Nearest Neighbor Search
Search via KD-Tree
Slides courtesy of Brigham Anderson
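In practice the descend-and-backtrack search illustrated on these slides is rarely hand-rolled; SciPy's cKDTree provides it:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.standard_normal((10000, 8))
tree = cKDTree(data)                       # construction sorts and splits once

query = data[42] + 0.001                   # near-duplicate of point 42
dist, idx = tree.query(query, k=3)         # 3 nearest neighbors
print(int(idx[0]))                         # 42
```

`query` also accepts an `eps` parameter for approximate search, trading accuracy for fewer backtracks.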
Fast Nearest Neighbor Search
Problem of KD-Tree
Example courtesy of Andrew Moore
Fast Nearest Neighbor Search
PCA-Tree
A KD-tree partitions with a hyperplane perpendicular to a coordinate axis; a PCA tree partitions with a hyperplane perpendicular to the principal direction.
Verma et al., UAI 2009
Example courtesy of Rui Wang
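A sketch of one PCA split on data whose variance is not axis-aligned, exactly the case where the KD-tree's axis-parallel split is a poor fit (data and shapes are illustrative):

```python
import numpy as np

def pca_split(points):
    """Split by the hyperplane perpendicular to the principal direction."""
    centered = points - points.mean(axis=0)
    # Top right singular vector of the centered data = principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    proj = centered @ direction
    return proj <= np.median(proj), direction

rng = np.random.default_rng(0)
# Anisotropic data: most variance lies along the diagonal, not an axis.
points = rng.standard_normal((200, 2)) @ np.array([[3.0, 3.0], [-0.5, 0.5]])
mask, direction = pca_split(points)
print(mask.sum(), np.abs(direction))  # 100 on each side; direction ~ [0.7, 0.7]
```

The cost is one eigen-decomposition (or SVD) per node, which motivates the cheaper alternatives on the next slides.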
Fast Nearest Neighbor Search
Random Projection Tree
• RP tree: overcomes the weakness of KD-trees in high dimensions
• Approach: pick a random direction from the unit sphere and split the data at the median projection
Dasgupta and Freund, STOC 2008
[Figure: partitions produced by a KD tree vs. an RP tree]
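The RP-tree split is even simpler than the PCA split, since no eigen-decomposition is needed:

```python
import numpy as np

def rp_split(points, seed=0):
    """Pick a random direction on the unit sphere; split at the median projection."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(points.shape[1])
    d /= np.linalg.norm(d)          # normalizing a Gaussian vector gives a
    proj = points @ d               # uniform random direction on the sphere
    return proj <= np.median(proj)

rng = np.random.default_rng(0)
mask = rp_split(rng.standard_normal((200, 32)))
print(mask.sum())  # 100: the median split halves the data
```

Applied recursively (a different random direction per node), this yields the RP tree; the analysis shows its cells adapt to the data's intrinsic dimension.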
Fast Nearest Neighbor Search
Approximate Principal Direction Trees
• More efficient than the PCA-tree
• More accurate than the RP-tree
McCartin-Lim, McGregor, Wang, ICML 2012
Locality-Sensitive Hashing
• Sublinear search time for ε-approximate nearest-neighbor search
• Long hash codes (≥1k bits) and multiple hash tables
[Figure: a feature vector is mapped by random hash functions to a binary code (e.g., 101) that indexes a hash-table bucket for the query]
[Gionis, Indyk, and Motwani 1999] [Datar et al. 2004]
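A common instantiation for cosine similarity is random-hyperplane hashing: each bit is the sign of a projection onto a random direction, so nearby points tend to agree on most bits. A sketch (bit length and sizes are illustrative):

```python
import numpy as np

def lsh_codes(X, n_bits=16, seed=0):
    """Random-hyperplane LSH: bit i = sign of the projection onto direction i."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((X.shape[1], n_bits))  # shared random directions
    return (X @ H > 0).astype(np.uint8)

rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 64))
codes = lsh_codes(data)                            # one 16-bit code per vector

q = data[7] + 0.01 * rng.standard_normal(64)       # slightly perturbed point 7
qcode = lsh_codes(q[None, :])[0]                   # same seed -> same hyperplanes
hamming = (codes != qcode).sum(axis=1)
print(int(hamming[7]))                             # small: near points share bits
```

In a full system the code indexes a hash-table bucket, and several such tables with independent random directions are queried to boost recall.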
Fast Nearest Neighbor Search
Slide courtesy of Wei Liu
Fast Nearest Neighbor Search
Spectral Hashing
• Goal: find an embedding where Hamming distance approximates Euclidean distance
• Method: relax the binary constraint and obtain the projections as eigenvectors
Y. Weiss, A. Torralba, and R. Fergus, Spectral Hashing, NIPS 2008
1. Fit a multidimensional rectangle to the data: run PCA to align the axes, then bound a uniform distribution
2. For each dimension, calculate the k smallest eigenfunctions
3. This gives dk eigenfunctions; pick the k with the smallest eigenvalues
4. Threshold the chosen eigenfunctions at zero to give binary codes
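A much-simplified sketch of steps 1 and 4: the real method thresholds sinusoidal eigenfunctions along each PCA-aligned dimension, but the zeroth-order version, projecting onto the top principal axes and thresholding at zero, already shows the shape of the pipeline (the function name and sizes are illustrative):

```python
import numpy as np

def pca_binary_codes(X, n_bits=8):
    """Align axes with PCA, then threshold the projections at zero."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[:n_bits].T        # coordinates along the top principal axes
    return (proj > 0).astype(np.uint8)

rng = np.random.default_rng(0)
codes = pca_binary_codes(rng.standard_normal((500, 32)))
print(codes.shape)  # (500, 8)
```

Spectral hashing's eigenfunction machinery exists precisely because this simple version assigns the same single bit to an axis regardless of how much variance it carries.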
Fast Nearest Neighbor Search
Supervised Information for Hashing
Jain, Kulis, & Grauman, CVPR 2008
Fast Nearest Neighbor Search
Models for Supervised Hashing
• Liu et al., CVPR 2012: solve by sequential optimization
• Jain, Kulis, and Grauman, CVPR 2008: solve by information-based metric learning
• Li et al., NIPS 2011: solve by ensemble tree learning
Discussions
Data Structure in Real Systems
[Figure: a query code and a sample code compared bit by bit, e.g. 1 1 0 0 vs. 1 1 0 0]
Inverted index:
• Fast
• Good for search engines
• Not good for long bit codes
Compact codes:
• Memory efficient
• Good for mobile applications
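The speed of compact codes comes from the comparison primitive: with codes packed into machine words, Hamming distance is an XOR followed by a popcount. A NumPy sketch (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, (100000, 64), dtype=np.uint8)   # 100k 64-bit codes
packed = np.packbits(bits, axis=1).view(np.uint64)        # 1 word per code

query = packed[123]
xor = packed ^ query                                      # differing bits
dist = np.unpackbits(xor.view(np.uint8), axis=1).sum(axis=1)  # popcount
print(int(np.argmin(dist)))                               # 123 (distance 0)
```

A linear scan over packed codes fits 100k 64-bit codes in under 1 MB, which is why compact codes suit memory-constrained mobile settings.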
Discussion
Potential Research Questions
• Many existing systems for general image retrieval
– Google Image, Bing Image, SnapTell, …
• Opportunities in domain-specific image retrieval
– Clothes, pets, etc.
– Face verification
– Surveillance
– Geolocation
• Interesting techniques
– Spatial verification
– Metric learning
– Mobile
– …