Image Retrieval with Fisher Vectors of Binary Features
KDDI R&D Laboratories, Inc. Yusuke Uchida, Shigeyuki Sakazawa
2014/8/1 2
Image retrieval using local features
• Local invariant features:
– Robust against occlusion, illumination change, viewpoint change, and so on
• Applications:
– Product search (Amazon Flow), landmark recognition (Google Goggles), augmented reality (Qualcomm Vuforia), …
Trends in image retrieval using local features
• 1999: SIFT [Lowe,ICCV’99]
• 2003: SIFT + Bag-of-visual words [Sivic+,ICCV’03]
• 2007: SIFT + Fisher vector [Perronnin+,CVPR’07,ECCV’10]
– New effective image representation
• 2011: Local binary features (ORB [Rublee+,ICCV’11], FREAK, BRISK)
– Efficient alternatives to SIFT or SURF
• In this presentation:
– Propose a Fisher vector of binary features for image retrieval
– Model binary features by a Bernoulli mixture model (BMM)
– Derive a closed-form approximation of the Fisher vector of the BMM
– Apply a new normalization method to the Fisher vector
Pipeline of image retrieval using local features
[Figure: Region detection → Feature description (a set of feature vectors X) → Aggregation (a single vector representation of the image) → Classifier (e.g. SVM) or similarity search]
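The last stage compares single vectors. A minimal sketch of the similarity-search branch, assuming each image has already been aggregated into one vector and that cosine similarity is used for ranking (the slide does not specify the similarity measure); `rank_database` is a hypothetical helper name.

```python
import numpy as np

def rank_database(query_vec, db_vecs):
    """Rank reference images by cosine similarity to the query.

    query_vec: (K,) aggregated vector of the query image
    db_vecs:   (M, K) aggregated vectors of the M reference images
    Returns (indices sorted best-first, corresponding similarity scores).
    """
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    scores = db @ q                               # cosine similarity per reference image
    order = np.argsort(-scores, kind="stable")    # highest similarity first
    return order, scores[order]

db = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
order, scores = rank_database(np.array([0.0, 2.0]), db)
print(order)  # [2 1 0]
```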
Position of this research
Descriptor type \ Aggregation method | Bag-of-visual words | Fisher vector (accurate)
Continuous (SIFT, SURF) | [1] | [2, 3]
Binary (ORB, FREAK, BRISK) (fast) | [4] | This research
[1] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in Proc. of ICCV’03.
[2] F. Perronnin and C. Dance, "Fisher kernels on visual vocabularies for image categorization," in Proc. of CVPR’07.
[3] F. Perronnin, et al., "Improving the Fisher kernel for large-scale image classification," in Proc. of ECCV’10.
[4] D. Galvez-Lopez and J. D. Tardos, "Real-time loop detection with bags of binary words," in Proc. of IROS’11.
Fisher kernel [Jaakkola+, NIPS’98]
• The generation process of X is modeled by a probability density function p(X|λ) with a parameter set λ
• Describe X by the gradient, with respect to λ, of the log-likelihood function L(X|λ) = log p(X|λ) (the Fisher score)
• Similarity between X and X’ is defined by the Fisher kernel K(X,X’):
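$K(X,X') = \nabla_\lambda L(X|\lambda)^\top F_\lambda^{-1} \nabla_\lambda L(X'|\lambda)$
– $\nabla_\lambda L(X|\lambda)$: Fisher score (gradient of the log-likelihood function)
– $F_\lambda = E_x[\nabla_\lambda L(x|\lambda)\, \nabla_\lambda L(x|\lambda)^\top]$: Fisher information matrix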
[5] T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Proc. of NIPS'98.
Fisher vector [Perronnin+, CVPR’07]
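• Explicit feature mapping for the Fisher kernel
– As the Fisher information matrix (FIM) $F_\lambda$ is symmetric and positive definite, so is its inverse, which therefore has a Cholesky decomposition: $F_\lambda^{-1} = L_\lambda^\top L_\lambda$
– Thus the Fisher kernel $K(X,X') = \nabla_\lambda L(X|\lambda)^\top F_\lambda^{-1} \nabla_\lambda L(X'|\lambda)$ can be rewritten as a dot product between the Fisher vectors $z_X$ and $z_{X'}$: $K(X,X') = z_X^\top z_{X'}$, where $z_X = L_\lambda \nabla_\lambda L(X|\lambda)$ (decomposed FIM times Fisher score)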
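The equivalence between the Fisher kernel and the dot product of Fisher vectors can be checked numerically. A sketch, assuming a random symmetric positive definite matrix as a stand-in for the FIM and random vectors as stand-ins for Fisher scores; `fisher_kernel` and `fisher_vector` are hypothetical names. `numpy.linalg.cholesky` returns a lower-triangular C with F^{-1} = C C^T, so the slide's L_λ corresponds to C^T.

```python
import numpy as np

def fisher_kernel(g_x, g_y, F):
    """K(X, X') = g_x^T F^{-1} g_y, where g_x, g_y are Fisher scores."""
    return g_x @ np.linalg.solve(F, g_y)

def fisher_vector(g, F):
    """Explicit mapping z = L g with F^{-1} = L^T L.

    numpy's Cholesky factor C is lower-triangular with F^{-1} = C C^T,
    so L = C^T gives L^T L = C C^T = F^{-1}.
    """
    C = np.linalg.cholesky(np.linalg.inv(F))
    return C.T @ g

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
F = A @ A.T + 5 * np.eye(5)                          # stand-in FIM (symmetric positive definite)
g_x, g_y = rng.normal(size=5), rng.normal(size=5)    # stand-in Fisher scores
k_direct = fisher_kernel(g_x, g_y, F)
k_dot = fisher_vector(g_x, F) @ fisher_vector(g_y, F)
```

Once each image is mapped to z_X, the kernel evaluation reduces to a dot product, which is what makes linear classifiers and fast similarity search applicable.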
Fisher vector of GMM [Perronnin+, CVPR’07]
• SIFT features are modeled by a Gaussian mixture model (GMM):
$p(X|\lambda) = \prod_{t=1}^{T} p(x_t|\lambda)$, $p(x_t|\lambda) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x_t; \mu_i, \Sigma_i)$
• Closed-form approximation of the Fisher vector of the GMM under the following assumptions:
1. The Fisher information matrix F is diagonal
2. The number of features extracted from an image is constant and equal to T
3. The posterior probability $\gamma_t(i)$ is peaky
• Compared with bag-of-visual words, the Fisher vector contains higher-order information
Local binary features [Rublee+, ICCV’11]
• Local binary features: ORB, BRISK, FREAK, and many others
– One to two orders of magnitude faster than SIFT or SURF
– Multi-scale FAST detector or its variants
– Binary descriptor based on binary tests on pixel luminance
• Resulting in a binary vector (0, 0, 1, 0, 1, …, 1) of 256 bits for ORB
[Figure: a part of the binary tests of ORB within a local feature region]
• Binary tests are defined by pairs of positions
• If the luminance at the first position is brighter than the luminance at the second position, the test generates bit ‘1’
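The binary tests above can be sketched directly. A minimal sketch, assuming a grayscale patch given as a 2-D array and test pairs given as pixel coordinates; the random pairs below are placeholders, not ORB's learned sampling pattern (ORB also smooths and rotates the patch, which is omitted here). `binary_descriptor` is a hypothetical name, and the bit convention follows the slide (first position brighter gives 1).

```python
import numpy as np

def binary_descriptor(patch, pairs):
    """One bit per binary test on a grayscale patch.

    patch: 2-D array of pixel luminance values
    pairs: (D, 4) array of (row1, col1, row2, col2), one row per test
    Bit d is 1 if the first position is brighter than the second, else 0.
    """
    pairs = np.asarray(pairs)
    first = patch[pairs[:, 0], pairs[:, 1]]
    second = patch[pairs[:, 2], pairs[:, 3]]
    return (first > second).astype(np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(31, 31))       # toy 31x31 patch
pairs = rng.integers(0, 31, size=(256, 4))        # random placeholder test pairs
bits = binary_descriptor(patch, pairs)            # 256 bits, like an ORB descriptor
packed = np.packbits(bits)                        # packed into 32 bytes for storage
```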
Modeling binary features by BMM
• Model binary features by a Bernoulli mixture model (BMM)
– $p(X|\lambda) = \prod_{t=1}^{T} p(x_t|\lambda)$ (naïve Bayes assumption)
– $p(x_t|\lambda) = \sum_{i=1}^{N} w_i\, p_i(x_t|\lambda)$ (each feature is generated from one of the N components)
– $p_i(x_t|\lambda) = \prod_{d=1}^{D} \mu_{id}^{x_{td}} (1-\mu_{id})^{1-x_{td}}$ (single multivariate Bernoulli distribution)
• Notations
– X: a set of T binary features, X = (x_1, …, x_T), each with D dimensions (D bits)
– λ = {w_i, μ_id | i = 1..N, d = 1..D}: a set of parameters
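The BMM likelihood can be evaluated in the log domain, since a product of D = 256 Bernoulli terms easily underflows. A minimal sketch, assuming bits are stored as a (T, D) 0/1 array; the function names and the `eps` clipping are illustrative choices, not from the slides.

```python
import numpy as np

def bmm_log_component(X, mu, eps=1e-10):
    """log p_i(x_t | lambda) for every feature t and component i, shape (T, N).

    X:  (T, D) array of bits in {0, 1}
    mu: (N, D) Bernoulli parameters mu_id
    """
    mu = np.clip(mu, eps, 1 - eps)                      # keep log() finite
    return X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T

def bmm_log_likelihood(X, w, mu):
    """L(X | lambda) = sum_t log sum_i w_i p_i(x_t | lambda)."""
    a = bmm_log_component(X, mu) + np.log(w)            # log(w_i p_i(x_t)), (T, N)
    m = a.max(axis=1, keepdims=True)                    # log-sum-exp for stability
    log_px = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    return log_px.sum()

X = np.array([[1, 0]])
w = np.array([0.5, 0.5])
mu = np.array([[0.9, 0.1], [0.5, 0.5]])
L = bmm_log_likelihood(X, w, mu)   # equals log(0.5 * 0.81 + 0.5 * 0.25) = log(0.53)
```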
Visualizing clustering results of BMM
• The parameters λ are estimated by the EM algorithm (N = 32) using 1M training ORB features
[Figure: all binary tests defined in ORB, and four components (clusters) out of N = 32 components; for each component, the binary tests with the top 5 highest probabilities of generating bit “0” and of generating bit “1” are shown]
• The mixture model successfully captures the underlying bit correlation
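The EM estimation of λ can be sketched on toy data. A minimal sketch, assuming random initialization, a fixed number of iterations, and clipping of μ away from 0 and 1; the slides do not state the initialization or stopping rule, and `fit_bmm` and `responsibilities` are hypothetical names.

```python
import numpy as np

def responsibilities(X, w, mu):
    """E-step: gamma_t(i) = w_i p_i(x_t) / sum_j w_j p_j(x_t), in the log domain."""
    a = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(w)   # (T, N)
    a -= a.max(axis=1, keepdims=True)
    g = np.exp(a)
    return g / g.sum(axis=1, keepdims=True)

def fit_bmm(X, n_components, n_iter=50, seed=0, eps=1e-6):
    """Fit BMM weights w (N,) and bit probabilities mu (N, D) to bits X (T, D) by EM."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    w = np.full(n_components, 1.0 / n_components)
    mu = rng.uniform(0.25, 0.75, size=(n_components, D))   # random initialization (assumption)
    for _ in range(n_iter):
        gamma = responsibilities(X, w, mu)                  # E-step
        nk = gamma.sum(axis=0) + eps                        # soft count per component
        w = nk / nk.sum()                                   # M-step: mixture weights
        mu = np.clip(gamma.T @ X / nk[:, None], eps, 1 - eps)  # M-step: bit probabilities
    return w, mu

# Toy data: two clusters of 16-bit vectors with opposite bit tendencies.
rng = np.random.default_rng(1)
proto = np.array([[0.9] * 8 + [0.1] * 8, [0.1] * 8 + [0.9] * 8])
labels = rng.integers(0, 2, size=1000)
X = (rng.random((1000, 16)) < proto[labels]).astype(float)
w, mu = fit_bmm(X, n_components=2)
```

Each row of `mu` plays the role of one cluster in the figure: bits with μ_id near 1 or near 0 are the tests that component most reliably sets or clears.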
Fisher vector of BMM
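• Definition of the Fisher vector: $z_X = L_\lambda \nabla_\lambda L(X|\lambda)$ (decomposed FIM times Fisher score), with
$L(X|\lambda) = \sum_{t=1}^{T} L(x_t|\lambda)$, $L(x_t|\lambda) = \log p(x_t|\lambda)$,
$p(x_t|\lambda) = \sum_{i=1}^{N} w_i\, p_i(x_t|\lambda)$, $p_i(x_t|\lambda) = \prod_{d=1}^{D} \mu_{id}^{x_{td}} (1-\mu_{id})^{1-x_{td}}$
• Fisher score w.r.t. μ_id:
$\frac{\partial L(x_t|\lambda)}{\partial \mu_{id}} = \gamma_t(i)\, \frac{(-1)^{1-x_{td}}}{\mu_{id}^{x_{td}} (1-\mu_{id})^{1-x_{td}}}$
– Posterior probability: $\gamma_t(i) = p(i|x_t,\lambda) = \frac{w_i\, p_i(x_t|\lambda)}{\sum_{j=1}^{N} w_j\, p_j(x_t|\lambda)}$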
Fisher vector of BMM
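• Fisher information w.r.t. μ_id ($f_{\mu_{id}}$): [equation lost in extraction; the derivation uses the assumption that the posterior probability is peaky, under which one term equals 0]
• Fisher vector: the Fisher score w.r.t. μ_id scaled by $f_{\mu_{id}}^{-1/2}$ (diagonal FIM)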
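One way to obtain a closed form, sketched here as a derivation under the three assumptions listed for the GMM case (diagonal FIM, constant T, peaky posterior) and worth checking against the paper: squaring the per-feature score, using $E_x[\gamma(i)\, x_d] = w_i \mu_{id}$ and the peaky approximation $\gamma_t(i)^2 \approx \gamma_t(i)$, gives $f_{\mu_{id}} \approx T w_i \left(\frac{1}{\mu_{id}} + \frac{1}{1-\mu_{id}}\right) = \frac{T w_i}{\mu_{id}(1-\mu_{id})}$. The Fisher vector element is then $f_{\mu_{id}}^{-1/2} \sum_t \partial L(x_t|\lambda)/\partial \mu_{id}$. `bmm_fisher_vector` is a hypothetical name.

```python
import numpy as np

def bmm_fisher_vector(X, w, mu):
    """Closed-form Fisher vector of a BMM w.r.t. mu, under the assumptions above.

    X: (T, D) bits, w: (N,) weights, mu: (N, D) with 0 < mu < 1.
    Returns an (N, D) array (flatten it to get an N*D-dimensional vector).
    """
    T = X.shape[0]
    a = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(w)   # log(w_i p_i(x_t))
    a -= a.max(axis=1, keepdims=True)
    gamma = np.exp(a)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # posterior gamma_t(i)
    score = gamma.T @ X / mu - gamma.T @ (1 - X) / (1 - mu)          # Fisher score, (N, D)
    f = T * w[:, None] / (mu * (1 - mu))                             # approximate diagonal FIM
    return score / np.sqrt(f)                                         # z_id = f_id^{-1/2} * score_id

z = bmm_fisher_vector(np.array([[1.0, 0.0]]), np.array([1.0]), np.array([[0.5, 0.25]]))
```

For a single component and a single feature, a set bit maps to sqrt((1 − μ)/μ) and a cleared bit to −sqrt(μ/(1 − μ)), so bits that disagree with a confident μ_id receive large magnitudes.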
Posterior probability p(i|x_t, λ)
• Histogram of max_i p(i|x_t, λ) (N = 256)
[Figure: histogram of the maximum posterior probability over components; both axes range from 0 to 1]
• Peaky!
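The peakiness check can be reproduced for any fitted model by histogramming the largest posterior of each feature. A minimal sketch; `max_posterior_histogram` is a hypothetical name and the 10 bins over [0, 1] are an arbitrary choice.

```python
import numpy as np

def max_posterior_histogram(X, w, mu, bins=10):
    """Normalized histogram of max_i p(i | x_t, lambda) over all features t.

    The maximum always lies in [1/N, 1]; a peaky posterior puts most mass near 1.
    """
    a = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(w)
    a -= a.max(axis=1, keepdims=True)
    gamma = np.exp(a)
    gamma /= gamma.sum(axis=1, keepdims=True)
    counts, edges = np.histogram(gamma.max(axis=1), bins=bins, range=(0.0, 1.0))
    return counts / counts.sum(), edges

# With 256 bits, even two moderately different components give posteriors near 0 or 1.
hist, edges = max_posterior_histogram(np.ones((5, 256)), np.array([0.5, 0.5]),
                                      np.array([[0.9] * 256, [0.1] * 256]))
```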
Vector normalization
• Normalization is an essential part of the Fisher vector representation [3]
• Power normalization [3] (originally proposed for FV)
– $z_{id} \leftarrow \mathrm{sgn}(z_{id})\, |z_{id}|^{\alpha}$ ($\alpha = 0.5$)
• L2 normalization [3] (originally proposed for FV)
– $z_{id} \leftarrow z_{id} / \sqrt{\sum_{i} \sum_{d} z_{id}^2}$
• Intra normalization [6] (originally proposed for VLAD, not for FV)
– Perform L2 normalization within each BMM component: $z_{id} \leftarrow z_{id} / \sqrt{\sum_{d} z_{id}^2}$
[3] F. Perronnin, et al., "Improving the Fisher kernel for large-scale image classification," in Proc. of ECCV’10.
[6] R. Arandjelovic and A. Zisserman, "All about VLAD," in Proc. of CVPR’13.
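The three normalizations operate on the Fisher vector arranged as an (N, D) array, one row per BMM component. A minimal sketch; the function names are hypothetical, and the power → intra → L2 chaining in the usage line is one plausible order, not one the slide specifies.

```python
import numpy as np

def power_normalize(z, alpha=0.5):
    """z_id <- sgn(z_id) |z_id|^alpha."""
    return np.sign(z) * np.abs(z) ** alpha

def l2_normalize(z, eps=1e-12):
    """z_id <- z_id / sqrt(sum_i sum_d z_id^2), over the whole vector."""
    return z / max(np.linalg.norm(z), eps)

def intra_normalize(z, eps=1e-12):
    """L2-normalize each component's row separately: z_id <- z_id / sqrt(sum_d z_id^2)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)      # all-zero rows stay zero

z = np.array([[3.0, -4.0], [0.0, 0.0]])
z_norm = l2_normalize(intra_normalize(power_normalize(z)))   # assumed chaining order
```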
Experimental setup
• Dataset: Stanford Mobile Visual Search
– http://www.stanford.edu/~dmchen/mvs.html
– The CD class is used for evaluation: 100 reference images and 400 query images
• Performance measure: mean average precision (MAP)
• Binary feature: ORB (OpenCV implementation, 4 scales, 900 features/image)
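MAP averages, over all queries, the average precision of each ranked result list. A minimal sketch of the standard definition, assuming rankings and ground truth are given as lists of image ids; the function names are hypothetical. When a query has a single relevant image, its AP reduces to the reciprocal of that image's rank.

```python
def average_precision(ranked_ids, relevant):
    """Mean of precision@k over the ranks k at which a relevant image appears."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, ground_truth):
    """MAP: rankings[q] is the ranked list for query q, ground_truth[q] its relevant ids."""
    aps = [average_precision(r, g) for r, g in zip(rankings, ground_truth)]
    return sum(aps) / len(aps)

print(mean_average_precision([[3, 1, 2], [1, 2]], [{1}, {1}]))  # 0.75
```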
Experimental results (1)
• Compare the proposed Fisher vector with BoVW (N = 1024)
• Evaluate normalization methods (P = power, In = intra normalization)
[Figure: performance versus the number of mixture components for BoVW, the pure FV, Imp. FV, and In Norm FV; higher is better]
• The Fisher vector without any normalization achieves poor results
• Power and/or L2 normalization significantly improves the FV
• Intra normalization outperforms the other methods for all N!
Experimental results (2)
• Add independent images to the database as distractors
[Figure: performance versus database size; legend: Proposed FV]
• The Fisher vector achieves better performance at all database sizes
• The degradation of the Fisher vector is relatively small
Summary
• Proposed a Fisher vector of binary features for image retrieval
– Model binary features by a Bernoulli mixture model (BMM)
– Derive a closed-form approximation of the Fisher vector of the BMM
– Apply a new normalization method to the Fisher vector
• Future work
– Encode the Fisher vector into a compact code for efficiency (the method proposed in [7] seems promising)
– Apply the proposed Fisher vector to other binary features (e.g., audio fingerprints)
[7] Y. Gong et al., "Learning Binary Codes for High-Dimensional Data Using Bilinear Projections," in Proc. of CVPR'13.
Fisher vector of BMM (Fisher score)
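• Fisher score w.r.t. μ_id
– $\frac{\partial p_i(x_t|\lambda)}{\partial \mu_{id}} = (-1)^{1-x_{td}} \prod_{e=1, e \neq d}^{D} \mu_{ie}^{x_{te}} (1-\mu_{ie})^{1-x_{te}}$
– $\frac{\partial L(x_t|\lambda)}{\partial \mu_{id}} = \frac{1}{p(x_t|\lambda)} \frac{\partial p(x_t|\lambda)}{\partial \mu_{id}} = \gamma_t(i)\, \frac{(-1)^{1-x_{td}}}{\mu_{id}^{x_{td}} (1-\mu_{id})^{1-x_{td}}}$
– Occupancy probability (posterior probability): $\gamma_t(i) = p(i|x_t,\lambda) = \frac{w_i\, p_i(x_t|\lambda)}{\sum_{j=1}^{N} w_j\, p_j(x_t|\lambda)}$
– where $z_X = L_\lambda \nabla_\lambda L(X|\lambda)$ (decomposed FIM times Fisher score), $L(x_t|\lambda) = \log p(x_t|\lambda)$, and $p(x_t|\lambda) = \sum_{i=1}^{N} w_i\, p_i(x_t|\lambda)$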