Improving the Fisher Kernel for Large-Scale Image Classification

Florent Perronnin, Jorge Sanchez, and Thomas Mensink, ECCV 2010

VGG reading group, January 2011, presented by V. Lempitsky

From generative modeling to features

dataset → generative model

input sample + model → fitting → parameters of the fit → discriminative classifier

Simplest example

dataset of vectors → K-means → codebook

input vector + codebook → fitting → closest codeword → discriminative classifier
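The codebook pipeline above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names (`kmeans`, `closest_codeword`, `bow_histogram`), the fixed iteration count and the random initialization are assumptions, not something taken from the slides.

```python
import numpy as np

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with k distinct data points (an arbitrary choice).
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = closest_codeword(data, codebook)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                codebook[j] = members.mean(axis=0)
    return codebook

def closest_codeword(vectors, codebook):
    """'Fitting' step: index of the nearest codeword for each row of `vectors`."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(vectors, codebook):
    """Normalized counts of closest codewords, i.e. the classifier input."""
    labels = closest_codeword(vectors, codebook)
    counts = np.bincount(labels, minlength=len(codebook)).astype(float)
    return counts / counts.sum()
```

Only the index of the closest codeword survives the fitting step; how far the input vector lies from that codeword is discarded.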

– Codebooks

– Sparse or dense component analysis

– Deep belief networks

– Color GMMs

– ....

Fisher vector idea

input sample + generative model → fitting → parameters of the fit → discriminative classifier

Information loss (generative models are always inaccurate!)

Can we retain some of the lost information without building a better generative model?

Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. NIPS’99

Main idea: retain information about the fitting error for the best fit.

Same best fit, but different fitting errors!

Fisher vector idea

input sample + generative model → fitting → Fisher vector → discriminative classifier

Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. NIPS’99

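Main idea: retain information about the fitting error of the best fit λ for the input sample X.

Fisher vector: the gradient of the log-likelihood of X with respect to the model parameters λ = (λ1, λ2).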
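A toy illustration of the idea, assuming a one-dimensional Gaussian with parameters λ = (μ, σ) stands in for the generative model (the slides do not fix a particular model here). The Fisher vector is taken as the gradient of the log-likelihood with respect to λ, following Jaakkola and Haussler; the function names are made up for this sketch.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma):
    """Log-likelihood of i.i.d. samples x under N(mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - 0.5 * ((x - mu) / sigma) ** 2)

def gaussian_fisher_vector(x, mu, sigma):
    """Gradient of the log-likelihood with respect to (mu, sigma)."""
    x = np.asarray(x, dtype=float)
    d_mu = np.sum((x - mu) / sigma ** 2)
    d_sigma = np.sum((x - mu) ** 2 / sigma ** 3 - 1.0 / sigma)
    return np.array([d_mu, d_sigma])

# Two inputs scored against the same model (mu = 0, sigma = 1):
# the model is identical, but the gradients differ.
print(gaussian_fisher_vector([0.5], 0.0, 1.0))
print(gaussian_fisher_vector([-2.0], 0.0, 1.0))
```

The gradient says in which direction, and how strongly, the parameters would have to move to explain the input better. That is the "fitting error" information a plain closest-codeword encoding throws away.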

Fisher vector for image classification

F. Perronnin and C. Dance // CVPR 2007

• Assuming independence between the observed T features

• Encoding each visual feature (e.g. SIFT) extracted from the image to a Fisher vector

• Using N-component Gaussian mixture models with diagonal covariance matrices:

Gradient w.r.t. mixture weights: N dimensions

Gradient w.r.t. means: 128N dimensions

Gradient w.r.t. variances: 128N dimensions
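A minimal NumPy sketch of such an encoding, under stated assumptions: the GMM parameters `w`, `mu`, `sigma` are already trained, `sigma` holds per-dimension standard deviations, and the weight block is the unconstrained gradient (it ignores the sum-to-one constraint on the weights, which the paper handles differently). All function names are hypothetical, and no whitening is applied.

```python
import numpy as np

def _log_components(x, w, mu, sigma):
    """log(w_i * N(x_t | mu_i, diag(sigma_i^2))) as a (T, N) array."""
    z = (x[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    return (np.log(w)[None, :]
            - 0.5 * (z ** 2).sum(axis=2)
            - np.log(sigma).sum(axis=1)[None, :]
            - 0.5 * x.shape[1] * np.log(2 * np.pi))

def gmm_log_likelihood(x, w, mu, sigma):
    """Sum over the T features of log p(x_t); features assumed independent."""
    lc = _log_components(x, w, mu, sigma)
    m = lc.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(lc - m).sum(axis=1))).sum())

def gmm_posteriors(x, w, mu, sigma):
    """Soft assignments gamma[t, i] of each feature to each Gaussian."""
    lc = _log_components(x, w, mu, sigma)
    lc = lc - lc.max(axis=1, keepdims=True)  # numerical stabilisation
    gamma = np.exp(lc)
    return gamma / gamma.sum(axis=1, keepdims=True)

def gmm_fisher_vector(x, w, mu, sigma):
    """Concatenated gradients: N (weights) + N*D (means) + N*D (std devs)."""
    gamma = gmm_posteriors(x, w, mu, sigma)            # (T, N)
    diff = x[:, None, :] - mu[None, :, :]              # (T, N, D)
    g = gamma[:, :, None]
    d_w = gamma.sum(axis=0) / w                        # soft counts / weights
    d_mu = (g * diff / sigma[None] ** 2).sum(axis=0)
    d_sigma = (g * (diff ** 2 / sigma[None] ** 3 - 1.0 / sigma[None])).sum(axis=0)
    return np.concatenate([d_w, d_mu.ravel(), d_sigma.ravel()])
```

With D = 128 (SIFT) the output has N + 128N + 128N entries, matching the dimension counts on the slide.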

Relation to BoW

N dimensions: BoW

128N dimensions: extra info

128N dimensions: extra info

F. Perronnin and C. Dance // CVPR 2007

Whitening the data

Fisher matrix (covariance matrix for Fisher vectors): F = E[g gᵀ], where g is the Fisher vector of a sample drawn from the model.

Whitening the data (setting the covariance to identity): replace g with F^(-1/2) g.

Fisher matrix is hard to estimate. Approximations needed:

[Perronnin and Dance // CVPR07] suggest a diagonal approximation to the Fisher matrix.
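To make the diagonal approximation concrete, here is a sketch that whitens each dimension independently. Note the assumption: the diagonal is estimated empirically from a set of training Fisher vectors, which is a stand-in chosen for illustration and not the closed-form approximation derived in the cited paper. Function names are hypothetical.

```python
import numpy as np

def diagonal_fisher_estimate(fisher_vectors, eps=1e-12):
    """Per-dimension second moment of a set of Fisher vectors (one per row)."""
    return (fisher_vectors ** 2).mean(axis=0) + eps

def whiten_diagonal(fisher_vector, fisher_diag):
    """Divide each dimension by the square root of its diagonal entry."""
    return fisher_vector / np.sqrt(fisher_diag)
```

With a diagonal matrix, whitening reduces to a per-dimension rescaling, so it costs no more than the encoding itself.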

Classification with Fisher kernels

• Use whitened Fisher vectors as an input to e.g. linear SVM

• Small codebooks (e.g. 100 words) are sufficient

• Encoding runs faster than BoW with large codebooks (although with approximate NN this is not so straightforward!)

• Slightly better accuracy than “plain, linear BoW”

F. Perronnin and C. Dance // CVPR 2007

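Improvements to Fisher Kernels

Florent Perronnin, Jorge Sanchez, and Thomas Mensink, ECCV 2010

Overall very similar to how people improve regular BoW classification

Idea 1: normalization of Fisher vectors.

Justification:

Assume: the probability distribution of VW (visual words) in an image is a mixture of our GMM and image-specific “content”.

Then: the expected gradient contributed by the GMM part = 0.

Thus: the Fisher vector depends only on the image-specific “content”, scaled by its proportion in the mixture.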

Observation: image non-specific “content” affects the length of the vector, but not direction
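The argument can be written out as follows. The symbols are assumptions introduced here, since the slide equations did not survive: $u_\lambda$ is the GMM, $q$ the image-specific distribution, $\omega$ its proportion, and $G_\lambda^X$ the Fisher vector averaged over the $T$ features of image $X$.

```latex
\begin{align}
p(x) &= \omega\, q(x) + (1-\omega)\, u_\lambda(x) \\
G_\lambda^X &= \frac{1}{T}\sum_{t=1}^{T} \nabla_\lambda \log u_\lambda(x_t)
  \;\approx\; \mathbb{E}_{x\sim p}\!\left[\nabla_\lambda \log u_\lambda(x)\right] \\
 &= \omega\, \mathbb{E}_{x\sim q}\!\left[\nabla_\lambda \log u_\lambda(x)\right]
  + (1-\omega)\, \underbrace{\mathbb{E}_{x\sim u_\lambda}\!\left[\nabla_\lambda \log u_\lambda(x)\right]}_{=\,0} \\
 &= \omega\, \mathbb{E}_{x\sim q}\!\left[\nabla_\lambda \log u_\lambda(x)\right]
\end{align}
```

The underbraced term vanishes because $\int u_\lambda \nabla_\lambda \log u_\lambda \, dx = \nabla_\lambda \int u_\lambda \, dx = 0$. The proportion $\omega$ therefore only rescales the vector, which is what L2 normalization removes.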

Conclusion: normalize to remove the effect of non-specific “content”.

...also L2-normalization ensures K(x,x) = 1 and improves BoV [Vedaldi et al. ICCV’09]
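A small sketch of the normalization and its two effects, assuming a linear (dot-product) kernel; the helper names are illustrative only.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > eps else v

def linear_kernel(a, b):
    """Dot-product kernel K(a, b)."""
    return float(np.dot(a, b))

# Same direction, different lengths: after normalization they coincide.
direction = np.array([3.0, -4.0, 0.0])
a = l2_normalize(0.2 * direction)
b = l2_normalize(0.9 * direction)
```

Here `a` and `b` are equal, and `linear_kernel(a, a)` evaluates to 1 up to floating-point rounding.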

Improvement 2: power normalization

α = 0.5, i.e. square root, works well. C.f. for example [Vedaldi and Zisserman // CVPR10] or [Perronnin et al. // CVPR10] on the use of square root and Hellinger’s kernel for BoW.
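A sketch of power normalization, assuming the signed form f(z) = sign(z)|z|^α applied to each dimension; the sign is kept because Fisher vector entries can be negative. The function name is illustrative.

```python
import numpy as np

def power_normalize(v, alpha=0.5):
    """Apply sign(z) * |z|**alpha to every entry of v."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.abs(v) ** alpha
```

With α = 0.5 this is a signed square root, which compresses large entries more than small ones.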

Improvement 3: spatial pyramids

• Fully standard spatial pyramids [Lazebnik et al.] with sum-pooling
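A sketch of sum-pooling over a spatial pyramid, assuming each local descriptor has already been encoded (for a Fisher vector, its per-descriptor gradient contribution) and has a position normalized to [0, 1). The 1x1 plus 2x2 grid layout and the function name are assumptions for illustration; the slide does not state the layout used.

```python
import numpy as np

def spatial_pyramid_sum_pool(codes, positions, levels=(1, 2)):
    """Sum-pool per-descriptor codes over each cell of each pyramid level.

    codes: (T, K) encodings; positions: (T, 2) array of (x, y) in [0, 1).
    Returns the concatenation of one K-dimensional sum per cell.
    """
    pooled = []
    for g in levels:
        # Cell index of every descriptor on a g x g grid.
        cell_x = np.minimum((positions[:, 0] * g).astype(int), g - 1)
        cell_y = np.minimum((positions[:, 1] * g).astype(int), g - 1)
        for cy in range(g):
            for cx in range(g):
                mask = (cell_x == cx) & (cell_y == cy)
                pooled.append(codes[mask].sum(axis=0))  # zeros if the cell is empty
    return np.concatenate(pooled)
```

The output length is K times the total number of cells, so pyramids multiply the already large Fisher vector dimensionality.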

Results: Pascal 2007

Details: regular grid, multiple scales, SIFT and local RGB color layout, both reduced to 64 dimensions via PCA

Results: Caltech 256

PASCAL + additional training data

• Flickr groups: up to 25000 images per class

• ImageNet: up to 25000 images per class

Conclusion

• Fisher kernels are a good way to exploit your generative model

• Fisher kernels based on GMMs in SIFT space lead to state-of-the-art results (on par with the most recent BoW with soft assignments)

• Main advantage of FK over BoW is smaller dictionaries

• ...although FV are less sparse than BoV

• Perronnin et al. trained their system within a day for 20 classes for 350K images on 1 CPU
