
Visual Recognition and Search, Columbia University, Spring 2013

EECS 6890 – Topics in Information Processing, Spring 2013, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch

Class 6: Fast Indexing and Image Retrieval

Liangliang Cao, Feb 28, 2013


Recall of SIFT

SIFT Helps to Find Matched Local Patches


Recall of SIFT

A Naïve Way to Match SIFT

Code from David Lowe: http://www.cs.ubc.ca/~lowe/keypoints/


Recall of SIFT

Problem of the Naïve Approach

Even with an efficient implementation:

– 128-dimensional SIFT descriptors

– Euclidean distance

– 10 nearest-neighbor search

– 1,000 queries on 1 million vectors

– 8-core CPU

Total time: 5.5 seconds, which is too slow for large-scale applications!

Numbers courtesy of Perronnin and Jégou
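As a rough sketch of what this naïve baseline amounts to (illustrative only; the toy data sizes and function name below are assumptions, not the benchmark code behind the numbers above), brute-force matching is an exhaustive Euclidean distance computation followed by a top-k selection:

```python
import numpy as np

def brute_force_knn(queries, database, k=10):
    """Exhaustive k-NN: for each query SIFT descriptor, compute the Euclidean
    distance to every database descriptor and keep the k closest (unordered)."""
    # Squared distances via the expansion ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2
    q_sq = (queries ** 2).sum(axis=1)[:, None]        # (num_queries, 1)
    x_sq = (database ** 2).sum(axis=1)[None, :]       # (1, num_database)
    d2 = q_sq - 2.0 * queries @ database.T + x_sq     # (num_queries, num_database)
    # Indices of the k nearest database vectors for each query
    return np.argpartition(d2, k, axis=1)[:, :k]

# Toy data standing in for 128-D SIFT descriptors (much smaller than 1M vectors)
rng = np.random.default_rng(0)
database = rng.random((10_000, 128), dtype=np.float32)
queries = rng.random((100, 128), dtype=np.float32)
print(brute_force_knn(queries, database).shape)       # (100, 10)
```

Even fully vectorized, the cost grows linearly with the database size, which is what motivates the indexing structures in the rest of the lecture.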


Can These Problems Be Solved Efficiently?

• Problem 1

Goal: object retrieval with local features

Application: product search, person verification

• Problem 2

Goal: object retrieval with a global representation

Application: image/video search engines


Outline

• Local feature based retrieval

– Vocabulary tree

– Scoring

– Verification

• Global representation based retrieval

– Fast nearest neighbor search

– KD-tree

– Locality-sensitive hashing

– Supervised hashing


Local Feature Based Retrieval

Scalable Recognition with a Vocabulary Tree

Most of the slides in this subsection are courtesy of David Nistér and Henrik Stewénius


110,000,000 Images in 5.8 Seconds


Vocabulary Tree

A vocabulary tree is essentially a hierarchical vector quantizer.
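To make the idea concrete, here is a minimal sketch of a hierarchical k-means quantizer in the spirit of Nistér and Stewénius (branch factor k, fixed depth). The plain-NumPy k-means, the toy data, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Very small k-means (Lloyd's algorithm) used to split one node's descriptors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_tree(X, branch=10, depth=3):
    """Recursively cluster descriptors; each node stores its `branch` cluster centers."""
    if depth == 0 or len(X) < branch:
        return None
    centers, labels = kmeans(X, branch)
    children = [build_tree(X[labels == j], branch, depth - 1) for j in range(branch)]
    return {"centers": centers, "children": children}

def quantize(tree, x):
    """Descend the tree, taking the nearest center at each level; the path is the visual word."""
    path, node = [], tree
    while node is not None:
        j = int(((node["centers"] - x) ** 2).sum(axis=1).argmin())
        path.append(j)
        node = node["children"][j]
    return tuple(path)   # e.g. (3, 2, 1) identifies a leaf = visual word

# Toy SIFT-like descriptors
rng = np.random.default_rng(1)
descriptors = rng.random((5000, 128), dtype=np.float32)
tree = build_tree(descriptors, branch=5, depth=3)
print(quantize(tree, descriptors[0]))
```

With branch factor k and depth L the tree defines up to k^L leaf visual words, yet quantizing a descriptor costs only about k·L distance computations rather than k^L, which is what makes very large vocabularies practical.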


Vocabulary Tree

Dynamically Changing Dataset

[Figures: adding images to, removing images from, and querying a dynamically changing dataset; the common approach vs. our approach]


Vocabulary Tree

Inverted Index

Image courtesy of Kristen Grauman
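The inverted index simply maps each visual word to the list of database images (and counts) in which it occurs, so a query only touches images that share at least one word with it. A minimal sketch under the same assumptions as above (leaf ids from the quantizer, illustrative names and data):

```python
from collections import defaultdict

# visual_word -> list of (image_id, term_frequency); purely illustrative structure
inverted_index = defaultdict(list)

def add_image(image_id, word_counts):
    """Register one database image given its bag of visual words {word: count}."""
    for word, count in word_counts.items():
        inverted_index[word].append((image_id, count))

def candidate_images(query_words):
    """Accumulate a crude similarity score over only the posting lists the query touches."""
    scores = defaultdict(float)
    for word, q_count in query_words.items():
        for image_id, count in inverted_index.get(word, []):
            scores[image_id] += q_count * count    # unweighted here; see TF-IDF below
    return sorted(scores.items(), key=lambda kv: -kv[1])

add_image("img_001", {(3, 2, 1): 2, (0, 2, 4): 1})
add_image("img_002", {(3, 2, 1): 1})
print(candidate_images({(3, 2, 1): 1}))
```

Because most images share only a few visual words with the query, the per-query work is proportional to the lengths of the touched posting lists rather than to the database size.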


Vocabulary Tree

Performance


Vocabulary Tree

Summary of Key Techniques

• Hierarchical vector quantization

– Makes it possible and efficient to use a large vocabulary

– The larger the vocabulary, the better the accuracy

• Inverted index

– Makes retrieval scalable to millions or billions of images

– Can be seamlessly integrated with text-based search engines


Vocabulary Tree

Limitations

• The vocabulary tree can be viewed as a special bag-of-words (BoW) model and hence shares the same limitations:

– Information loss in quantization → better scoring strategies (unsupervised learning)

– No spatial information → spatial verification


Scoring Strategies

TF-IDF Weighting
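The weighting here is the standard TF-IDF scheme from text retrieval applied to visual words: a word that appears in many database images is down-weighted by its inverse document frequency. One common formulation, written as a sketch (the exact variant and normalization on the slide may differ; the names and toy counts are assumptions):

```python
import math
from collections import Counter

def tfidf_vector(word_counts, doc_freq, num_images):
    """TF-IDF weights for one image's bag of visual words.

    word_counts: {word: occurrences in this image}
    doc_freq:    {word: number of database images containing the word}
    num_images:  total number of database images
    """
    total = sum(word_counts.values())
    vec = {}
    for word, n in word_counts.items():
        tf = n / total
        idf = math.log(num_images / doc_freq[word])
        vec[word] = tf * idf
    return vec

def cosine_score(q, d):
    """Score a query against a database image by the cosine of their TF-IDF vectors."""
    dot = sum(w * d.get(word, 0.0) for word, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# Tiny example: 2 database images, a query resembling the first
doc_freq, num_images = {"a": 2, "b": 1, "c": 1}, 2
d1 = tfidf_vector(Counter(a=2, b=3), doc_freq, num_images)
d2 = tfidf_vector(Counter(a=1, c=4), doc_freq, num_images)
q  = tfidf_vector(Counter(a=1, b=1), doc_freq, num_images)
print(cosine_score(q, d1), cosine_score(q, d2))
```

In practice these weights are folded directly into the inverted-index scores shown earlier, so the scoring still touches only the posting lists of the query's words.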


Scoring Strategies

Contextual Weighting

Wang et al, Contextual Weighting for Vocabulary Tree Based Image Retrieval, ICCV’11

[Figure: spatial weighting, with weights for quantized patches in the query image and in the database]


Spatial Verification

Post-processing After Matching

• Images are NOT text documents

• Exploit spatial information to improve precision

• Post-processing is expensive:

– Only validate the top-ranked images

– (optional) Query expansion after validation

• Two models for spatial verification

– RANdom SAmple Consensus (RANSAC)

– Hough transform


Spatial Verification

RANSAC Algorithm


RANSAC Example

What do we do about the “bad” matches?

Example courtesy of Rick Szeliski


RANSAC Example

Underlying Assumption

• The key idea is not just that there are more inliers than outliers, but that the outliers are wrong in different ways.

• Inliers share the same spatial transform (a translation or an affine projection).


RANSAC Example

Select one match, count inliers (repeated for several randomly selected matches)

Find the “average” translation vector
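A compact sketch of this procedure for the translation-only case (a hypothetical toy, not the lecture's or any library's implementation): repeatedly pick one putative match, hypothesize the translation it implies, count how many other matches agree within a tolerance, and keep the best hypothesis.

```python
import numpy as np

def ransac_translation(pts_a, pts_b, iters=200, tol=5.0, seed=0):
    """Estimate a 2-D translation from putative matches (pts_a[i] <-> pts_b[i]).

    Each iteration hypothesizes t from a single randomly chosen match (a translation
    has 2 dof, so one correspondence suffices), then counts inliers within `tol` pixels.
    Returns the translation averaged over the best inlier set, plus the inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(pts_a))
        t = pts_b[i] - pts_a[i]                       # hypothesized translation
        err = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    t_refined = (pts_b[best_inliers] - pts_a[best_inliers]).mean(axis=0)
    return t_refined, best_inliers

# Toy data: 80 matches following translation (30, -12) plus 40 random outliers
rng = np.random.default_rng(1)
a = rng.uniform(0, 500, size=(120, 2))
b = a + np.array([30.0, -12.0]) + rng.normal(0, 1.0, size=(120, 2))
b[80:] = rng.uniform(0, 500, size=(40, 2))            # outliers disagree in different ways
t, mask = ransac_translation(a, b)
print(np.round(t, 1), mask.sum())
```

With roughly two thirds of the matches being inliers, a handful of iterations already finds the correct translation with high probability, which is why the cost is governed mainly by the number of matches, as the RANSAC vs. Hough comparison below notes.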


Spatial Verification

RANSAC vs. Hough Transform

RANSAC

• Vote in image space

• Computational complexity is determined by the number of matches

Hough Transform

• Vote in model space

• Computational complexity is linear in the number of correspondences and in the number of voting cells (impractical in high dimensions)


Spatial Verification

Hough Transform

[Figure: image space vs. model space. An image point (x0, y0) constrains the line parameters through y0 = x0·m + b, and a second point (x1, y1) through y1 = x1·m + b; each point votes for a line in (m, b) model space, and the votes intersect at the parameters of the line through both points.]


Spatial Verification

Generalized Hough Transform

• What if the model has more parameters than a line?

→ vote in a multi-dimensional model space instead of 2D

• What if there is more than one model?

→ cluster in model space
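As an illustration of voting in model space (a toy sketch under the same translation-only assumption as the RANSAC example above, not the lecture's code): each match casts one vote for the translation it implies, votes are accumulated in quantized cells, and the densest cells give the verified model or models.

```python
import numpy as np
from collections import Counter

def hough_translation(pts_a, pts_b, cell=10.0):
    """Vote in (tx, ty) model space: each putative match votes for one quantized translation cell."""
    votes = Counter()
    for pa, pb in zip(pts_a, pts_b):
        t = pb - pa
        key = (int(np.floor(t[0] / cell)), int(np.floor(t[1] / cell)))
        votes[key] += 1
    # The top cells correspond to consistent models; multiple peaks = multiple objects/models
    return votes.most_common(3)

rng = np.random.default_rng(2)
a = rng.uniform(0, 500, size=(120, 2))
b = a + np.array([30.0, -12.0])
b[80:] = rng.uniform(0, 500, size=(40, 2))   # outliers scatter their votes over many cells
print(hough_translation(a, b))
```

The cost is one vote per correspondence plus a scan over the occupied cells, which is why the number of voting cells becomes the bottleneck once the model has many parameters.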


Spatial Verification

Query Expansion

Chum et al., Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, ICCV'07


Can These Problems Be Solved Efficiently?

• Problem 1

Goal: object retrieval with local features

Application: product search, person verification

• Problem 2

Goal: object retrieval with a global representation

Application: image/video search engines


From local features to a global representation

Fast Nearest Neighbor Search


Fast Nearest Neighbor Search

KD-Tree


Fast Nearest Neighbor Search

KD-Tree Construction

Heuristics to make splitting decisions:

• Which dimension do we split along?

– The widest one: the axis with the highest variance

• Which value do we split at?

– The median value of the points along that dimension

• When do we stop?

– When a node contains fewer than m points
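A minimal sketch of these heuristics plus a basic nearest-neighbor search with backtracking (illustrative only; real systems would use an optimized library such as FLANN or scipy.spatial.cKDTree):

```python
import numpy as np

def build_kdtree(points, m=8):
    """Split on the highest-variance axis at the median; stop below m points (leaf)."""
    if len(points) <= m:
        return {"leaf": points}
    axis = int(points.var(axis=0).argmax())          # widest / highest-variance dimension
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"axis": axis,
            "value": float(points[mid, axis]),        # median split value
            "left": build_kdtree(points[:mid], m),
            "right": build_kdtree(points[mid:], m)}

def nn_search(node, q, best=None):
    """Descend toward q's side first, then backtrack only if the splitting plane
    is closer than the best distance found so far."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = float(np.linalg.norm(p - q))
            if best is None or d < best[0]:
                best = (d, p)
        return best
    near, far = ((node["left"], node["right"]) if q[node["axis"]] < node["value"]
                 else (node["right"], node["left"]))
    best = nn_search(near, q, best)
    if abs(q[node["axis"]] - node["value"]) < best[0]:   # the plane may hide a closer point
        best = nn_search(far, q, best)
    return best

pts = np.random.default_rng(0).random((2000, 2))
tree = build_kdtree(pts)
print(nn_search(tree, np.array([0.3, 0.7])))
```

The backtracking step is exactly what becomes expensive in high dimensions: almost every splitting plane ends up closer than the current best distance, so the search degenerates toward a full scan, which motivates the tree variants and hashing methods that follow.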


Fast Nearest Neighbor Search

Search via KD-Tree

Slides courtesy of Brigham Anderson


Fast Nearest Neighbor Search

Problem of KD-Tree

Example courtesy of Andrew Moore


Fast Nearest Neighbor Search

PCA-Tree

• A k-d tree partitions with a hyperplane perpendicular to a coordinate axis

• A PCA tree partitions with a hyperplane perpendicular to the principal direction

Verma et al., UAI 2009

Example courtesy of Rui Wang


Fast Nearest Neighbor Search

Random Projection Tree

• RP tree: overcomes the high-dimensionality problem

• Approach: pick a random direction from the unit sphere and split the data at the median value

Dasgupta and Freund, STOC 2008

[Figure: partitions produced by a KD tree vs. an RP tree]
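A sketch of the single split rule that distinguishes the RP tree, in its basic form (project onto a random unit direction, cut at the median projection); this omits the jitter and diameter-based refinements in Dasgupta and Freund, and the names and data are illustrative:

```python
import numpy as np

def rp_split(points, rng):
    """One RP-tree split: project onto a random unit direction, cut at the median projection."""
    u = rng.normal(size=points.shape[1])
    u /= np.linalg.norm(u)                 # random direction on the unit sphere
    proj = points @ u
    threshold = float(np.median(proj))
    left = points[proj <= threshold]
    right = points[proj > threshold]
    return (u, threshold), left, right

rng = np.random.default_rng(0)
X = rng.random((1000, 128))
(direction, threshold), left, right = rp_split(X, rng)
print(left.shape, right.shape)
```

Unlike a KD-tree split, the cut is not axis-aligned, which is what lets the tree adapt to data whose intrinsic dimension is much lower than the ambient 128 dimensions.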


Fast Nearest Neighbor Search

Approximate Principal Direction Trees

• More efficient than PCA-tree

• More accurate than RP-tree

McCartin-Lim, McGregor, Wang, ICML 2012


Fast Nearest Neighbor Search

Locality-Sensitive Hashing

• Sublinear search time for ε-approximate NN

• Long hash codes (>= 1k bits) and multiple hash tables

[Figure: a feature vector is mapped by random hash functions to a binary code, e.g. a query hashed to 101]

[Gionis, Indyk, and Motwani 1999] [Datar et al. 2004]

Slide courtesy of Wei Liu
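A minimal sketch of LSH using random-hyperplane (sign-of-projection) hash functions, one common family for angular/cosine-style similarity; the bit length, table count, class name, and data are assumptions for illustration (the Datar et al. scheme for Euclidean distance replaces the sign with a quantized offset projection):

```python
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    """L independent hash tables, each keyed by a b-bit sign-of-random-projection code."""
    def __init__(self, dim, bits=16, tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(bits, dim)) for _ in range(tables)]
        self.tables = [defaultdict(list) for _ in range(tables)]

    def _code(self, planes, x):
        return tuple((planes @ x > 0).astype(np.uint8))   # b-bit binary code

    def add(self, idx, x):
        for planes, table in zip(self.planes, self.tables):
            table[self._code(planes, x)].append(idx)

    def query(self, x):
        """Union of the buckets x falls into; only these candidates are checked exactly."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._code(planes, x), []))
        return candidates

rng = np.random.default_rng(1)
data = rng.normal(size=(10_000, 64)).astype(np.float32)
lsh = RandomProjectionLSH(dim=64, bits=16, tables=8)
for i, x in enumerate(data):
    lsh.add(i, x)
q = data[42] + 0.05 * rng.normal(size=64).astype(np.float32)
cands = lsh.query(q)
print(len(cands), 42 in cands)   # far fewer than 10,000 candidates; item 42 is usually among them
```

Multiple tables trade memory for recall: a true neighbor only needs to share a bucket with the query in one of the L tables to become a candidate.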


Fast Nearest Neighbor Search

Spectral Hashing

• Goal: find an embedding where Hamming distance approximates Euclidean distance

• Method: relax the binary constraint and find the projections as eigenvectors

Y. Weiss, A. Torralba, and R. Fergus, Spectral Hashing, NIPS 2008


1. Fit a multidimensional rectangle to the data

Run PCA to align axes, then bound uniform distribution

2. For each dimension, calculate the k smallest eigenfunctions.

3. This gives d·k eigenfunctions; pick the k with the smallest eigenvalues.

4. Threshold eigenfunctions at zero to give binary codes
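To give a feel for the flavor of the resulting codes, here is a deliberately simplified stand-in (PCA projections thresholded at zero, often called PCA hashing); it follows steps 1 and 4 but skips the analytical eigenfunctions of the uniform distribution that spectral hashing itself uses for out-of-sample codes, and all names and data are illustrative:

```python
import numpy as np

def pca_hash_train(X, bits=32):
    """Fit: center the data and keep the top `bits` PCA directions (step 1 of the recipe)."""
    mean = X.mean(axis=0)
    # Principal directions via SVD of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:bits].T                      # (dim, bits) projection matrix
    return mean, W

def pca_hash_encode(X, mean, W):
    """Encode: project and threshold at zero to get binary codes (step 4)."""
    return ((X - mean) @ W > 0).astype(np.uint8)

def hamming(a, B):
    """Hamming distance from one code `a` to every row of the code matrix `B`."""
    return (a[None, :] != B).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 128)).astype(np.float32)
mean, W = pca_hash_train(X, bits=32)
codes = pca_hash_encode(X, mean, W)
q = X[0] + 0.1 * rng.normal(size=128).astype(np.float32)
qc = pca_hash_encode(q[None, :], mean, W)[0]
print(hamming(qc, codes).argmin())   # ideally 0: the perturbed point's own code is nearest
```

Points that are close in Euclidean space tend to fall on the same side of most projection thresholds, so their codes differ in few bits, which is exactly the property the goal above asks for.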


Fast Nearest Neighbor Search

Supervised Information for Hashing

Jain, Kulis, & Grauman, CVPR 2008


Fast Nearest Neighbor Search

Models for Supervised Hashing

• Liu et al., CVPR 2012: solved by sequential optimization

• Jain, Kulis, & Grauman, CVPR 2008: solved by information-based metric learning

• Li et al., NIPS 2011: solved by ensemble tree learning


Discussions

Data Structure in Real Systems

Query: 1 1 0 0

Sample: 1 1 0 0

• Inverted index

– Fast speed

– Good for search engines

– Not good for long bit codes

• Compact codes

– Memory efficient

– Good for mobile applications
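As a sketch of the compact-code option (illustrative; the 64-bit code width and function names are assumptions): exhaustive Hamming ranking over packed codes is memory-light and fast, since comparing two codes is an XOR plus a population count.

```python
import numpy as np

def pack_codes(bits):
    """Pack a (n, 64) array of 0/1 bits into 8 bytes per row (64-bit codes)."""
    return np.packbits(bits, axis=1)          # shape (n, 8), dtype uint8

def hamming_rank(query_code, codes, topk=10):
    """Exhaustive Hamming ranking: XOR the packed codes, then count the set bits."""
    xor = np.bitwise_xor(codes, query_code[None, :])
    dist = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(dist)[:topk]

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(100_000, 64), dtype=np.uint8)
codes = pack_codes(bits)

# A query identical to item 123 except for two flipped bits
q_bits = bits[123].copy()
q_bits[[5, 40]] ^= 1
q = pack_codes(q_bits[None, :])[0]
print(hamming_rank(q, codes)[:3])             # item 123 should come first
```

An inverted index over code buckets avoids the exhaustive scan, but the number of buckets (and of neighboring buckets within a Hamming radius) grows quickly with code length, which is the trade-off the comparison above points at.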


Discussion

Potential Research Questions

• Many existing systems for general image retrieval

– Google Image, Bing Image, SnapTell, …

• Chances in domain-specific image retrieval

– Clothes, pets, etc.

– Face verification

– Surveillance

– Geolocation

• Interesting techniques

– Spatial verification

– Metric learning

– Mobile

– …