Image Retrieval with Fisher Vectors of Binary Features (MIRU'14)

KDDI R&D Laboratories, Inc. Yusuke Uchida, Shigeyuki Sakazawa

2014/8/1 2

Image retrieval using local features

• Local invariant features:
– Robust against occlusion, illumination change, viewpoint change, and so on

• Applications:
– Product search (Amazon Flow), landmark recognition (Google Goggles), augmented reality (Qualcomm Vuforia), …

Trends in image retrieval using local features

• 1999: SIFT [Lowe,ICCV’99]

• 2003: SIFT + Bag-of-visual words [Sivic+,ICCV’03]

• 2007: SIFT + Fisher vector [Perronnin+,CVPR’07,ECCV’10]

– New effective image representation

• 2011: Local binary features (ORB [Rublee+,ICCV’11], FREAK, BRISK)

– Efficient alternatives to SIFT or SURF

• In this presentation:
– Propose Fisher vector of binary features for image retrieval
– Model binary features by Bernoulli mixture model (BMM)
– Derive closed-form approximation of Fisher vector of BMM
– Apply new normalization method to Fisher vector

Pipeline of image retrieval using local features

[Figure: pipeline. Region detection → Feature description → a set of feature vectors X → Aggregation → a single vector representation of the image → Classifier (e.g., SVM) or similarity search]

Position of this research

Descriptor type \ Aggregation method | Bag-of-visual words | Fisher vector
Continuous (SIFT, SURF)              | [1]                 | [2, 3]
Binary (ORB, FREAK, BRISK)           | [4]                 | This research

Axis annotations in the original figure: Accurate, Fast

[1] J. Sivic and A. Zisserman, "Video google: A text retrieval approach to object matching in videos," in Proc. of ICCV'03.
[2] F. Perronnin and C. Dance, "Fisher kernels on visual vocabularies for image categorization," in Proc. of CVPR'07.
[3] F. Perronnin, et al., "Improving the fisher kernel for large-scale image classification," in Proc. of ECCV'10.
[4] D. Galvez-Lopez and J. D. Tardos, "Real-time loop detection with bags of binary words," in Proc. of IROS'11.

Fisher kernel [Jaakkola+, NIPS’98]

• The generation process of X is modeled by a probability density function p(X|λ) with a parameter set λ

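• Describe X by the gradient of the log-likelihood function L(X|λ) = log p(X|λ) (the Fisher score)

• Similarity between X and X’ is defined by the Fisher kernel K(X,X’):

$$K(X, X') = \nabla_\lambda L(X \mid \lambda)^{\mathsf T} \, F_\lambda^{-1} \, \nabla_\lambda L(X' \mid \lambda)$$

– Fisher score (gradient of the log-likelihood function): $\nabla_\lambda L(X \mid \lambda)$

– Fisher information matrix: $F_\lambda = \mathrm{E}_x\!\left[\nabla_\lambda L(x \mid \lambda)\, \nabla_\lambda L(x \mid \lambda)^{\mathsf T}\right]$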

[5] T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Proc. of NIPS'98.

Fisher vector [Perronnin+, CVPR’07]

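• Explicit feature mapping for Fisher kernel

– As the Fisher information matrix (FIM) F is positive semidefinite and symmetric, its inverse has a Cholesky decomposition:

$$F_\lambda^{-1} = L_\lambda^{\mathsf T} L_\lambda$$

– Thus the Fisher kernel can be rewritten as a dot product between Fisher vectors z_X and z_X’:

$$K(X, X') = \nabla_\lambda L(X \mid \lambda)^{\mathsf T} \, F_\lambda^{-1} \, \nabla_\lambda L(X' \mid \lambda) = z_X^{\mathsf T} z_{X'}, \qquad \text{where } z_X = L_\lambda \nabla_\lambda L(X \mid \lambda)$$

Here $L_\lambda$ is the decomposed FIM and $\nabla_\lambda L(X \mid \lambda)$ is the Fisher score.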
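The equivalence between the kernel form and the dot-product form can be checked numerically. The following is a minimal sketch, assuming NumPy and a small randomly generated positive-definite matrix standing in for the FIM; the function names `fisher_kernel` and `fisher_vector` and all numeric values are illustrative and do not come from the slides.

```python
import numpy as np

def fisher_kernel(g_x, g_y, fim):
    """K(X, X') = g_x^T F^{-1} g_y, computed directly from the Fisher scores."""
    return float(g_x @ np.linalg.solve(fim, g_y))

def fisher_vector(g, fim):
    """z = L g, where F^{-1} = L^T L."""
    # np.linalg.cholesky returns lower-triangular C with F^{-1} = C C^T,
    # so L = C^T satisfies L^T L = C C^T = F^{-1}.
    c = np.linalg.cholesky(np.linalg.inv(fim))
    return c.T @ g

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
fim = a @ a.T + 4.0 * np.eye(4)   # hypothetical symmetric positive-definite FIM
g_x = rng.normal(size=4)          # stand-in Fisher score of X
g_y = rng.normal(size=4)          # stand-in Fisher score of X'

k_direct = fisher_kernel(g_x, g_y, fim)
k_dot = float(fisher_vector(g_x, fim) @ fisher_vector(g_y, fim))
print(k_direct, k_dot)  # the two values should agree up to rounding
```

Once images are mapped to Fisher vectors, similarity search reduces to plain dot products, which is what makes the explicit mapping attractive for retrieval.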

Fisher vector of GMM [Perronnin+, CVPR’07]

• SIFT features are modeled by a Gaussian mixture model (GMM):

$$p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda), \qquad p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, \mathcal{N}(x_t; \mu_i, \Sigma_i)$$

• Closed-form approximation of the Fisher vector of GMM under the following assumptions:
1. The Fisher information matrix F is diagonal
2. The number of features extracted from an image is constant and equal to T
3. The posterior probability r(i) is peaky

• Compared with bag-of-visual words, Fisher vector contains higher order information

Local binary features [Rublee+, ICCV’11]

• Local binary features: ORB, BRISK, FREAK, and many others
– One or two orders of magnitude faster than SIFT or SURF
– Multi-scale FAST detector or its variants
– Binary descriptor based on binary tests on pixel luminance

• Resulting in a binary vector (0, 0, 1, 0, 1, …, 1)

[Figure: a part of the binary tests of ORB, drawn inside the local feature region; figure label: 256]

• Binary tests are defined by pairs of positions
• If the luminance at the first position is brighter than the luminance at the second position, the test generates bit ‘1’
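The binary test described above can be sketched in a few lines. This is a simplified illustration, assuming NumPy, a hypothetical 31x31 luminance patch, and randomly drawn test pairs; the actual ORB descriptor uses a specific set of test positions and orientation handling that are not reproduced here, and the name `binary_descriptor` is illustrative.

```python
import numpy as np

def binary_descriptor(patch, tests):
    """Evaluate binary tests on a luminance patch.

    patch: 2-D array of luminance values.
    tests: (D, 4) integer array; each row is (row1, col1, row2, col2).
    Bit d is 1 if the first position is brighter than the second, else 0.
    """
    first = patch[tests[:, 0], tests[:, 1]]
    second = patch[tests[:, 2], tests[:, 3]]
    return (first > second).astype(np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(31, 31))   # hypothetical luminance patch
tests = rng.integers(0, 31, size=(256, 4))    # random test pairs (assumption)
bits = binary_descriptor(patch, tests)        # binary vector of 256 bits
print(bits[:8])
```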

Modeling binary features by BMM

• Model binary features by Bernoulli mixture model (BMM)

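– Naïve Bayes assumption:

$$p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$$

– Each feature is generated from one of the N components:

$$p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, p_i(x_t \mid \lambda)$$

– Single multivariate Bernoulli distribution:

$$p_i(x_t \mid \lambda) = \prod_{d=1}^{D} \mu_{id}^{x_{td}} (1 - \mu_{id})^{1 - x_{td}}$$

$$\lambda = \{ w_i, \mu_{id} \}, \quad i = 1..N, \; d = 1..D$$

Notations
– X: a set of T binary features, X = (x1, …, xT), each with D dimensions (D bits)
– λ: a set of parameters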
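As a concrete reading of the model above, the following sketch evaluates the BMM log-likelihood and the component posteriors for a set of binary features. It assumes NumPy; the function names and the toy parameter values are illustrative, not taken from the slides.

```python
import numpy as np

def bmm_log_joint(x, w, mu):
    """log(w_i * p_i(x_t | lambda)) for every feature t and component i.

    x: (T, D) array of 0/1 bits, w: (N,) weights, mu: (N, D) Bernoulli means.
    """
    log_p = x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T   # shape (T, N)
    return np.log(w) + log_p

def bmm_log_likelihood(x, w, mu):
    """log p(X | lambda) = sum_t log sum_i w_i p_i(x_t | lambda)."""
    lj = bmm_log_joint(x, w, mu)
    m = lj.max(axis=1, keepdims=True)            # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))))

def bmm_posteriors(x, w, mu):
    """p(i | x_t, lambda) for every feature t and component i, shape (T, N)."""
    lj = bmm_log_joint(x, w, mu)
    g = np.exp(lj - lj.max(axis=1, keepdims=True))
    return g / g.sum(axis=1, keepdims=True)

# Toy example: N = 2 components, D = 2 bits, one feature (1, 0).
x_toy = np.array([[1, 0]])
w_toy = np.array([0.5, 0.5])
mu_toy = np.array([[0.9, 0.1], [0.2, 0.8]])
print(bmm_log_likelihood(x_toy, w_toy, mu_toy))   # log(0.5*0.81 + 0.5*0.04)
print(bmm_posteriors(x_toy, w_toy, mu_toy))
```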

Visualizing clustering results of BMM

• The parameters λ are estimated by the EM algorithm (for N = 32) using 1M training ORB features

• Mixture model successfully captures underlying bit correlation

[Figure: all binary tests defined in ORB, and four components (clusters) out of the N = 32 components. Legend: binary tests with the top 5 highest probabilities of generating bit “0”; binary tests with the top 5 highest probabilities of generating bit “1”]
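The slides do not give the EM update equations, so the following is only a sketch of the standard EM procedure for a Bernoulli mixture, assuming NumPy. The function name `fit_bmm`, the initialization, the iteration count, and the clipping constant `eps` are assumptions made for illustration, and the synthetic training data stands in for real ORB features.

```python
import numpy as np

def fit_bmm(x, n_components, n_iter=50, seed=0, eps=1e-3):
    """Estimate BMM weights w (N,) and means mu (N, D) from bits x (T, D)."""
    rng = np.random.default_rng(seed)
    t, d = x.shape
    w = np.full(n_components, 1.0 / n_components)
    mu = rng.uniform(0.25, 0.75, size=(n_components, d))   # random start
    for _ in range(n_iter):
        # E-step: posterior gamma_t(i) of each component for each feature.
        log_joint = np.log(w) + x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T
        log_joint -= log_joint.max(axis=1, keepdims=True)
        gamma = np.exp(log_joint)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and Bernoulli means.
        n_i = gamma.sum(axis=0) + 1e-12          # guard against empty components
        w = n_i / n_i.sum()
        mu = np.clip((gamma.T @ x) / n_i[:, None], eps, 1 - eps)
    return w, mu

# Synthetic stand-in for training features: two clusters of 8-bit vectors.
rng = np.random.default_rng(1)
true_mu = np.array([[0.9] * 8, [0.1] * 8])
labels = rng.integers(0, 2, size=400)
x_train = (rng.random((400, 8)) < true_mu[labels]).astype(float)
w_hat, mu_hat = fit_bmm(x_train, n_components=2)
print(w_hat, mu_hat.round(2))
```

Clipping the means away from 0 and 1 keeps the logarithms finite; it is a practical safeguard rather than part of the model.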

Fisher vector of BMM

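• Definition of Fisher vector:

$$z_X = L_\lambda \nabla_\lambda L(X \mid \lambda)$$

where $L_\lambda$ is the decomposed FIM and $\nabla_\lambda L(X \mid \lambda)$ is the Fisher score, with

$$L(X \mid \lambda) = \sum_{t} L(x_t \mid \lambda), \qquad L(x_t \mid \lambda) = \log p(x_t \mid \lambda)$$

$$p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, p_i(x_t \mid \lambda), \qquad p_i(x_t \mid \lambda) = \prod_{d=1}^{D} \mu_{id}^{x_{td}} (1 - \mu_{id})^{1 - x_{td}}$$

• Fisher score w.r.t. μid:

$$\frac{\partial L(x_t \mid \lambda)}{\partial \mu_{id}} = \gamma_t(i) \, \frac{(-1)^{1 - x_{td}}}{\mu_{id}^{x_{td}} (1 - \mu_{id})^{1 - x_{td}}}$$

• Posterior probability:

$$\gamma_t(i) = p(i \mid x_t, \lambda) = \frac{w_i \, p_i(x_t \mid \lambda)}{\sum_{j=1}^{N} w_j \, p_j(x_t \mid \lambda)}$$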
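The Fisher score with respect to μid can be accumulated over all features of an image with two matrix products. The sketch below assumes NumPy; the function names and the random toy inputs are illustrative. A plain log-likelihood function is included so that the analytic score can be compared against a finite-difference gradient.

```python
import numpy as np

def posteriors(x, w, mu):
    """gamma_t(i) = w_i p_i(x_t) / sum_j w_j p_j(x_t), shape (T, N)."""
    log_joint = np.log(w) + x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T
    gamma = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    return gamma / gamma.sum(axis=1, keepdims=True)

def fisher_score_mu(x, w, mu):
    """d L(X|lambda) / d mu_id summed over t, shape (N, D)."""
    gamma = posteriors(x, w, mu)
    # For x_td = 1 the per-feature term is  gamma_t(i) / mu_id,
    # for x_td = 0 it is                   -gamma_t(i) / (1 - mu_id).
    return (gamma.T @ x) / mu - (gamma.T @ (1 - x)) / (1 - mu)

def log_likelihood(x, w, mu):
    """L(X|lambda); direct form, adequate for small D."""
    log_joint = np.log(w) + x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T
    return float(np.log(np.exp(log_joint).sum(axis=1)).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(5, 4)).astype(float)   # T = 5 features, D = 4 bits
w = np.array([0.3, 0.7])                            # N = 2 components
mu = rng.uniform(0.2, 0.8, size=(2, 4))
score = fisher_score_mu(x, w, mu)
print(score)
```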

Fisher vector of BMM

• Fisher information w.r.t. μid (fμid)

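[Derivation shown as images; equations not recovered. Surviving annotations: a term marked "= 0" with the note "Posterior probability is peaky", and the label "Fisher score"]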
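Independently of the closed-form approximation derived on this slide, the diagonal of the FIM can be estimated numerically from its definition, the expected squared per-feature Fisher score under p(x|λ). The sketch below is a Monte Carlo stand-in and is not the closed form from the talk; it assumes NumPy, and the function names, sample count, and toy parameters are illustrative.

```python
import numpy as np

def sample_bmm(w, mu, n_samples, rng):
    """Draw binary features from the BMM: pick a component, then sample bits."""
    comp = rng.choice(len(w), size=n_samples, p=w)
    return (rng.random((n_samples, mu.shape[1])) < mu[comp]).astype(float)

def per_feature_scores(x, w, mu):
    """d log p(x_t|lambda) / d mu_id for each feature, shape (T, N, D)."""
    log_joint = np.log(w) + x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T
    gamma = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    term = x[:, None, :] / mu[None] - (1 - x[:, None, :]) / (1 - mu[None])
    return gamma[:, :, None] * term

def diagonal_fim_mc(w, mu, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[(d log p(x|lambda) / d mu_id)^2], shape (N, D)."""
    rng = np.random.default_rng(seed)
    x = sample_bmm(w, mu, n_samples, rng)
    return (per_feature_scores(x, w, mu) ** 2).mean(axis=0)

# Sanity check with a single component, where the exact value is 1 / (mu (1 - mu)).
f_single = diagonal_fim_mc(np.array([1.0]), np.array([[0.5, 0.2]]))
print(f_single)   # close to [[4.0, 6.25]]
```

Under the diagonal-FIM assumption, each Fisher vector entry is the corresponding Fisher score divided by the square root of its Fisher information, so an estimate like this can be used to check a closed-form expression.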

Posterior probability p(i | x_t, λ)

• Histogram of max_i p(i | x_t, λ) (N = 256)

• Peaky!

[Figure: histogram of max_i p(i | x_t, λ); both axes range from 0 to 1]

Vector normalization

• Normalization is an essential part of the Fisher vector representation [3]

• Power normalization [3] (originally proposed for FV)
– $z_{id} \leftarrow \mathrm{sgn}(z_{id}) \, |z_{id}|^{\alpha}$ ($\alpha = 0.5$)

• L2 normalization [3] (originally proposed for FV)
– $z_{id} \leftarrow z_{id} / \sqrt{\sum_{j,e} z_{je}^{2}}$

• Intra normalization [6] (originally proposed for VLAD, not for FV)
– Perform L2 normalization within each BMM component
– $z_{id} \leftarrow z_{id} / \sqrt{\sum_{e} z_{ie}^{2}}$

[3] F. Perronnin, et al., "Improving the fisher kernel for large-scale image classification," in Proc. of ECCV'10.
[6] R. Arandjelovic and A. Zisserman, "All about VLAD," in Proc. of CVPR'13.
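The three normalizations can be written compactly when the Fisher vector is stored as an (N, D) array with one row per BMM component. This is a minimal sketch assuming NumPy; the function names and the small `eps` guard against division by zero are illustrative choices, and the slides do not prescribe an order in which the steps are combined.

```python
import numpy as np

def power_normalize(z, alpha=0.5):
    """Element-wise signed power: sgn(z) * |z|^alpha."""
    return np.sign(z) * np.abs(z) ** alpha

def l2_normalize(z, eps=1e-12):
    """Divide the whole vector by its L2 norm (over all components and bits)."""
    return z / (np.linalg.norm(z) + eps)

def intra_normalize(z, eps=1e-12):
    """L2-normalize each component (row) of an (N, D) Fisher vector separately."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

z = np.array([[3.0, -4.0],
              [0.0, 0.25]])        # toy Fisher vector: N = 2 components, D = 2
print(power_normalize(z))          # [[1.732, -2.0], [0.0, 0.5]]
print(l2_normalize(z))
print(intra_normalize(z))          # rows become [0.6, -0.8] and [0.0, 1.0]
```

Intra normalization gives every component the same total energy, so a few components that absorb many features cannot dominate the dot product.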

Experimental setup

• Dataset: Stanford Mobile Visual Search
– http://www.stanford.edu/~dmchen/mvs.html
– CD class is used for evaluation
– 100 reference images, 400 query images

• Performance measure: mean average precision (MAP)

• Binary feature: ORB (OpenCV implementation, 4 scales, 900 features/image)
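For reference, mean average precision can be computed from ranked retrieval results as sketched below. This uses the common definition of average precision (precision averaged at the rank of each relevant item); the function names and the toy rankings are illustrative, and the slides do not state which exact variant of the measure was used.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at the rank k of each relevant item."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(rankings, ground_truth):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r, g) for r, g in zip(rankings, ground_truth)]
    return sum(aps) / len(aps)

# Two toy queries: ranked database ids and the ids that are actually relevant.
rankings = [["a", "b", "c"], ["a", "b", "c", "d"]]
ground_truth = [{"b"}, {"a", "c"}]
print(mean_average_precision(rankings, ground_truth))
```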

Experimental results (1)

• Compare the proposed Fisher vector with BoVW (N=1024)
• Evaluate normalization methods (P = power, In = intra normalization)

[Figure: x-axis: number of mixture components; plot labels: BoVW, Pure FV, Imp. FV, In Norm FV; annotation: "better"]

• Fisher vector without any normalization achieves poor results
• Power and/or L2 normalization significantly improves FV
• Intra normalization outperforms the others for all N

Experimental results (2)

• Add independent images to the database as distractors

[Figure residue: legend entry "= Proposed FV"]

• The Fisher vector achieves better performance at all database sizes
• The degradation of the Fisher vector is relatively small

Summary

• Proposed Fisher vector of binary features for image retrieval
– Model binary features by Bernoulli mixture model (BMM)
– Derive closed-form approximation of Fisher vector of BMM
– Apply new normalization method to Fisher vector

• Future work
– Encode Fisher vector into a compact code for efficiency (the method proposed in [7] seems promising)
– Apply proposed Fisher vector to other binary features (e.g., audio fingerprints)

[7] Y. Gong et al., "Learning Binary Codes for High-Dimensional Data Using Bilinear Projections," in Proc. of CVPR'13.

Fisher vector of BMM (Fisher score)

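• Fisher score w.r.t. μid

$$\frac{\partial p_i(x_t \mid \lambda)}{\partial \mu_{id}} = (-1)^{1 - x_{td}} \prod_{e=1,\, e \neq d}^{D} \mu_{ie}^{x_{te}} (1 - \mu_{ie})^{1 - x_{te}}$$

$$\frac{\partial L(x_t \mid \lambda)}{\partial \mu_{id}} = \frac{1}{p(x_t \mid \lambda)} \frac{\partial p(x_t \mid \lambda)}{\partial \mu_{id}} = \gamma_t(i) \, \frac{(-1)^{1 - x_{td}}}{\mu_{id}^{x_{td}} (1 - \mu_{id})^{1 - x_{td}}}$$

• Occupancy probability (posterior probability):

$$\gamma_t(i) = p(i \mid x_t, \lambda) = \frac{w_i \, p_i(x_t \mid \lambda)}{\sum_{j=1}^{N} w_j \, p_j(x_t \mid \lambda)}$$

• Definitions used above:

$$z_X = L_\lambda \nabla_\lambda L(X \mid \lambda), \qquad L(x_t \mid \lambda) = \log p(x_t \mid \lambda), \qquad p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, p_i(x_t \mid \lambda)$$

where $L_\lambda$ is the decomposed FIM and $\nabla_\lambda L(X \mid \lambda)$ is the Fisher score.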