
Spring 2019: Venue: Haag 315, Time: M/W 4-5:15pm

ECE 5582 Computer Vision
Lec 08: Feature Aggregation II

Zhu Li
Dept. of CSEE, UMKC

Office: FH560E, Ph: x 2346, http://l.web.umkc.edu/lizhu

Z. Li: ECE 5582 Computer Vision, 2019. p.1

Slides created with WPS Office Linux and the EqualX LaTeX equation editor

Outline

• ReCap of Lecture 07
• Image Retrieval System
• BoW
• VLAD
• Dense SIFT
• Fisher Vector Aggregation
• AKULA
• Summary


Precision, Recall, F-measure

• Precision = TP/(TP + FP)

• Recall (TPR) = TP/(TP + FN)

• FPR = FP/(FP + TN)

• F-measure = 2*(precision*recall)/(precision + recall)

Precision: the probability that a retrieved document is relevant.

Recall: the probability that a relevant document is retrieved in a search.
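A minimal sketch of the four measures, computed from raw confusion counts. The function name `retrieval_metrics` and the example counts are illustrative assumptions, not part of the lecture; empty denominators are mapped to 0 by choice.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall/TPR, FPR, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # recall is also the TPR
    fpr = fp / (fp + tn) if fp + tn else 0.0      # false alarms among all negatives
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, fpr, f


# 16 relevant and 84 irrelevant documents; 10 retrieved, 8 of them relevant.
p, r, fpr, f = retrieval_metrics(tp=8, fp=2, fn=8, tn=82)
print(f"precision={p:.3f} recall={r:.3f} FPR={fpr:.3f} F={f:.3f}")
```

Note that FPR divides by all negatives (FP + TN), while precision divides by all retrieved items (TP + FP).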


Why Aggregation?

• Curse of Dimensionality

• Decision Boundary / Indexing



Bag-of-Words: Histogram Coding
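Histogram coding assigns each of the n descriptors to its nearest of the k codewords and counts the votes per cell. A minimal sketch with squared Euclidean distance and optional L1 normalization; the helper names and toy data are hypothetical.

```python
def nearest_codeword(x, codebook):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def bow_histogram(descriptors, codebook, normalize=True):
    """Hard-assignment BoW: count how many descriptors fall in each codeword cell."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        hist[nearest_codeword(x, codebook)] += 1.0
    if normalize and descriptors:
        n = float(len(descriptors))
        hist = [h / n for h in hist]
    return hist


codebook = [[0.0, 0.0], [10.0, 10.0]]               # k = 2 toy codewords
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # n = 3 toy descriptors
print(bow_histogram(descriptors, codebook, normalize=False))  # [2.0, 1.0]
```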



Kernel Code Book Soft Encoding
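Soft encoding replaces the single hard vote with kernel-weighted votes over all codewords. A minimal sketch, assuming a Gaussian kernel of bandwidth `sigma` and per-descriptor normalization so that each descriptor contributes unit mass; this is one common kernel-codebook variant and not necessarily the exact one on the slide.

```python
import math


def kernel_codebook(descriptors, codebook, sigma):
    """Soft BoW: each descriptor spreads unit mass over codewords with Gaussian weights."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
        d2_min = min(d2)  # shift before exp() to avoid underflow for far-away descriptors
        w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
        total = sum(w)
        for i, wi in enumerate(w):
            hist[i] += wi / total
    n = max(len(descriptors), 1)
    return [h / n for h in hist]


codebook = [[0.0], [2.0]]
print(kernel_codebook([[1.0]], codebook, sigma=1.0))  # equidistant -> [0.5, 0.5]
```

A small `sigma` approaches hard assignment; a large one blurs the histogram toward uniform.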


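VLAD: Vector of Locally Aggregated Descriptors

[Figure: descriptors x assigned to five codewords (1 to 5), with per-cell residual sums v1 to v5]

① assign each descriptor x to its nearest codeword $c_i$

② compute the residual $x - c_i$

③ $v_i = \sum_{x:\,NN(x) = c_i} (x - c_i)$ for each cell i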
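Steps ① to ③ can be sketched directly: assign, accumulate residuals per cell, then flatten the k cells into one k×D vector. The final L2 normalization is a common choice assumed here, and the function names are hypothetical.

```python
import math


def nearest_codeword(x, codebook):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def vlad(descriptors, codebook, l2_normalize=True):
    """VLAD: per-cell sum of residuals x - c_i, flattened to a k*D vector."""
    k, dim = len(codebook), len(codebook[0])
    v = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        i = nearest_codeword(x, codebook)          # step 1: assign
        for j in range(dim):
            v[i][j] += x[j] - codebook[i][j]       # steps 2-3: accumulate residual
    flat = [val for cell in v for val in cell]
    if l2_normalize:
        norm = math.sqrt(sum(val * val for val in flat))
        if norm > 0:
            flat = [val / norm for val in flat]
    return flat


codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
print(vlad(descriptors, codebook, l2_normalize=False))  # [1.0, 1.0, -1.0, -1.0]
```

Unlike BoW, VLAD keeps the direction of each residual, so the output length is k×D rather than k.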

VLAD on SIFT

• Example of aggregating SIFT with VLAD
• K = 16 codebook entries
• Each cell is visualized as a SIFT descriptor: centroids in blue, VLAD difference in red
• Top row: left image; bottom row: right image; red: codebook, blue: encoded VLAD




One more trick

• Recall that SIFT is a powerful descriptor

• VL_FEAT: vl_dsift
  o A dense description of the image, computing SIFT descriptors on a predetermined grid (no scale-space extrema detection)

• Supplements HoG as an alternative texture descriptor
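Dense sampling simply replaces keypoint detection with a regular grid. A minimal sketch of grid placement; `dense_grid` and its `margin` parameter are hypothetical, and vl_dsift's actual frame positions also depend on its bin size and bounds options, which this ignores.

```python
def dense_grid(width, height, step, margin=0):
    """(x, y) centers of a regular sampling grid, as used by dense SIFT.

    No keypoint detection: every grid point gets a descriptor. `margin` keeps
    centers away from the border so a full patch fits (a simplification)."""
    return [(x, y)
            for y in range(margin, height - margin, step)
            for x in range(margin, width - margin, step)]


print(dense_grid(6, 4, step=2))
```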


VL_FEAT: vl_dsift

• Compute dense SIFT as a texture descriptor for the image:
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);

• There is also a FAST option:
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);

• A huge amount of SIFT data will be generated: dsift holds one 128-D descriptor per grid point
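To put "huge" in numbers, a back-of-the-envelope sketch: with step 2, a 640×480 image yields one descriptor per 2×2 block. The image size, the one-byte-per-value storage, and the function name are assumptions for illustration, and border effects are ignored.

```python
import math


def dense_sift_footprint(width, height, step, dims=128, bytes_per_value=1):
    """Approximate descriptor count and storage for dense SIFT on a width x height image."""
    n = math.ceil(width / step) * math.ceil(height / step)   # ignores border effects
    return n, n * dims * bytes_per_value


n, size = dense_sift_footprint(640, 480, step=2)
print(n, size)  # 76800 9830400  (about 9.8 MB at one byte per value)
```

Storing the same descriptors as 32-bit floats (`bytes_per_value=4`) quadruples this, which is one reason aggregation is needed before indexing.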


Fisher Vector

• Fisher Vector and variations:
  o Winning in image classification
  o Winning in MPEG object re-identification: SCFV (Scalable Compressed Fisher Vector) in CDVS


Codebook: Gaussian Mixture Model (GMM)


A bit of Theory: Fisher Kernel




Fisher Vector

• K_FK(X, Y) is a measure of similarity w.r.t. the generative model

• Similar to the Mahalanobis distance case, we can decompose this kernel as:

• This gives us a kernel feature mapping of X to the Fisher Vector:

• For observed image features {x_t}, it can be computed as:
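The decomposition in standard Fisher-kernel notation, where u_λ is the generative model (here a GMM), G the score (gradient of the log-likelihood), F_λ the Fisher information matrix and L_λ its inverse's factor. These symbol names are assumed, following the usual literature notation; F_λ plays the role of the covariance in the Mahalanobis analogy.

```latex
% Fisher kernel between feature sets X and Y under the generative model u_\lambda
K_{FK}(X, Y) = {G_\lambda^X}^{\top} F_\lambda^{-1} G_\lambda^Y,
\qquad
F_\lambda = \mathbb{E}_{x \sim u_\lambda}\!\left[\nabla_\lambda \log u_\lambda(x)\,
            \nabla_\lambda \log u_\lambda(x)^{\top}\right]

% Decompose the inverse Fisher information matrix
F_\lambda^{-1} = L_\lambda^{\top} L_\lambda
\;\Rightarrow\;
K_{FK}(X, Y) = {\mathcal{G}_\lambda^X}^{\top} \mathcal{G}_\lambda^Y,
\qquad
\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X

% Fisher Vector of observed features X = \{x_t\}_{t=1}^{T}
\mathcal{G}_\lambda^X = \frac{1}{T} \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)
```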


GMM Fisher Vector


• Gradients are taken w.r.t. each GMM parameter: weight, mean, variance

GMM Fisher Vector VL_FEAT implementation



• FV encoding: gradients w.r.t. the mean and variance of GMM component k, for each dimension j = 1..D

• In the end, the FV stacks 2K x D values: the derivatives w.r.t. the means and the variances
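A minimal sketch of this encoding for a diagonal-covariance GMM, following the per-dimension formulas as I understand them from the VLFeat Fisher vector documentation: u_jk = 1/(N√π_k) Σ_i q_ik (x_ji − μ_jk)/σ_jk and v_jk = 1/(N√(2π_k)) Σ_i q_ik [((x_ji − μ_jk)/σ_jk)² − 1]. The block ordering (all mean blocks, then all variance blocks), the absence of power/L2 normalization, and the function names are choices of this sketch and may differ from vl_fisher's output layout.

```python
import math


def soft_assign(x, means, variances, priors):
    """Posterior q_k(x) of each diagonal-covariance Gaussian, via log-sum-exp."""
    logs = []
    for mu, var, pi in zip(means, variances, priors):
        ll = math.log(pi)
        for xj, mj, vj in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vj) + (xj - mj) ** 2 / vj)
        logs.append(ll)
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [wi / s for wi in w]


def fisher_vector(X, means, variances, priors):
    """2*K*D Fisher Vector: mean-gradient blocks u_k, then variance-gradient blocks v_k."""
    K, D, N = len(means), len(means[0]), len(X)
    u = [[0.0] * D for _ in range(K)]
    v = [[0.0] * D for _ in range(K)]
    for x in X:
        q = soft_assign(x, means, variances, priors)
        for k in range(K):
            for j in range(D):
                z = (x[j] - means[k][j]) / math.sqrt(variances[k][j])  # standardized residual
                u[k][j] += q[k] * z
                v[k][j] += q[k] * (z * z - 1.0)
    fv = []
    for k in range(K):
        fv += [val / (N * math.sqrt(priors[k])) for val in u[k]]
    for k in range(K):
        fv += [val / (N * math.sqrt(2.0 * priors[k])) for val in v[k]]
    return fv


# Toy GMM: K = 2 components in D = 2 dimensions, diagonal covariances.
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [2.0, 2.0]]
priors = [0.5, 0.5]
X = [[0.5, -0.2], [4.0, 6.0], [5.5, 4.5]]
print(len(fisher_vector(X, means, variances, priors)))  # 8 = 2 * K * D
```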


VL_FEAT GMM/FV API

• Compute a GMM with VL_FEAT

• Prepare data:
  numPoints = 1000 ; dimension = 2 ;
  data = rand(dimension, numPoints) ;

• Call vl_gmm:
  numClusters = 30 ;
  [means, covariances, priors] = vl_gmm(data, numClusters) ;

• Visualize (covariances holds the diagonal variances of each component):
  figure ; hold on ;
  plot(data(1,:), data(2,:), 'r.') ;
  for i = 1:numClusters
    vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
  end


VL_FEAT API

• FV encoding:
  encoding = vl_fisher(data_to_Be_Encoded, means, covariances, priors);

• Bonus points: encode HoG features with Fisher Vector?
  o Randomly collect 2~3 images from each class
  o Stack all HoG features together into an n x 36 data matrix (vl_gmm expects one point per column, so pass its transpose)
  o Compute its GMM
  o Use this GMM to encode all image HoG features (instead of averaging them)


Super Vector Aggregation – Speaker ID

• Fisher Vector: aggregates features against a GMM
• Super Vector: aggregates a GMM against a GMM

• Ref:
  o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5): 308-311 (2006)


“Yes, We Can!”

Super Vector from MFCC

• Motivated by speaker ID work

• Speech is a continuous evolution of the vocal tract
• Need to extract a sequence of spectra or spectral coefficients
• Use a sliding window: 25 ms window, 10 ms shift
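The sliding window determines how many MFCC vectors an utterance produces. A minimal sketch of the framing arithmetic, assuming 16 kHz audio (not stated on the slide) and dropping any trailing partial frame; `frame_starts` is a hypothetical helper.

```python
def frame_starts(num_samples, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Start indices of sliding analysis windows (25 ms window, 10 ms shift by default)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if num_samples < win:
        return []
    return list(range(0, num_samples - win + 1, hop))


starts = frame_starts(16000, 16000)   # 1 s of audio at 16 kHz: 400-sample window, 160-sample hop
print(len(starts))  # 98 frames, i.e. roughly 100 feature vectors per second
```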


MFCC pipeline: |X(ω)| → Log → DCT → MFCC
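The last two stages (log, then DCT) can be sketched as below. This assumes the input is already a vector of band energies derived from |X(ω)|; the mel filterbank that a full MFCC front end applies before the log is not shown, and the DCT-II is left unnormalized. The helper name and `eps` guard are hypothetical.

```python
import math


def cepstral_coeffs(energies, num_coeffs, eps=1e-10):
    """Log-compress band energies, then take an (unnormalized) DCT-II."""
    logs = [math.log(e + eps) for e in energies]   # eps guards against log(0)
    n_bands = len(logs)
    return [sum(logs[n] * math.cos(math.pi * k * (n + 0.5) / n_bands)
                for n in range(n_bands))
            for k in range(num_coeffs)]


coeffs = cepstral_coeffs([1.0, 2.0, 4.0, 8.0, 4.0, 2.0], num_coeffs=3)
print([round(c, 3) for c in coeffs])
```

Coefficient 0 tracks overall log energy; higher coefficients describe the spectral envelope shape.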

GMM Model from MFCC

• GMM on MFCC features

• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by:

• Gaussian mixture model (GMM) for speaker s:
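A speaker GMM scores a sequence of MFCC frames by the average per-frame log-likelihood log p(x_t | λ_s), with p(x | λ_s) = Σ_i w_i N(x; μ_i, Σ_i). A minimal sketch, assuming diagonal covariances (a common choice in speaker ID, assumed here); function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log p(x_t | speaker GMM), mixing components with log-sum-exp."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


# Toy 2-component speaker model over 2-D "MFCC" frames.
w, mu, var = [0.6, 0.4], [[0.0, 0.0], [3.0, 1.0]], [[1.0, 1.0], [0.5, 2.0]]
print(gmm_avg_loglik([[0.2, -0.1], [2.8, 1.5]], w, mu, var))
```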

Universal Background Model

• UBM GMM model:

• The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):

• Parameters of the UBM:

MAP Adaptation

• Given the UBM GMM, how is a speaker model derived from the new observations?
• The adapted mean is given by:
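A minimal sketch of mean-only MAP adaptation in the Reynolds style, assuming diagonal covariances and a relevance factor r = 16 (a common choice, assumed here): n_i = Σ_t Pr(i|x_t), E_i(x) = (1/n_i) Σ_t Pr(i|x_t) x_t, α_i = n_i/(n_i + r), and the adapted mean is α_i E_i(x) + (1 − α_i) μ_i. Function names are hypothetical.

```python
import math


def posteriors(x, weights, means, variances):
    """Pr(i | x) for each UBM component (diagonal covariances), via log-sum-exp."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        logs.append(math.log(w) - 0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                                            for xi, m, v in zip(x, mu, var)))
    top = max(logs)
    e = [math.exp(l - top) for l in logs]
    s = sum(e)
    return [ei / s for ei in e]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM toward one speaker's frames."""
    K, D = len(means), len(means[0])
    n = [0.0] * K                          # soft counts n_i
    first = [[0.0] * D for _ in range(K)]  # sum_t Pr(i|x_t) x_t
    for x in frames:
        q = posteriors(x, weights, means, variances)
        for i in range(K):
            n[i] += q[i]
            for j in range(D):
                first[i][j] += q[i] * x[j]
    adapted = []
    for i in range(K):
        alpha = n[i] / (n[i] + relevance)          # data-dependent adaptation weight
        ex = [f / n[i] for f in first[i]] if n[i] > 0.0 else means[i]   # E_i(x)
        adapted.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, means[i])])
    return adapted


ubm_w, ubm_mu, ubm_var = [1.0], [[0.0]], [[1.0]]
print(map_adapt_means([[2.0]] * 16, ubm_w, ubm_mu, ubm_var))  # [[1.0]]: alpha = 16/(16+16)
```

Components that see little data (small n_i) stay close to the UBM mean, which is what makes the adapted means comparable across speakers.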


Supervector Distance
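A minimal sketch in the spirit of the Campbell et al. reference above: each utterance's MAP-adapted means are stacked into a supervector scaled by the UBM weights and variances, so that a plain Euclidean comparison realizes the weighted distance ½ Σ_i w_i (m_i^a − m_i^b)ᵀ Σ_i⁻¹ (m_i^a − m_i^b). Diagonal covariances and the function names are assumptions.

```python
import math


def supervector(weights, means, variances):
    """Stack sqrt(w_i) * Sigma_i^{-1/2} * m_i for all components (diagonal Sigma_i)."""
    sv = []
    for w, mu, var in zip(weights, means, variances):
        sv += [math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var)]
    return sv


def supervector_kernel(sv_a, sv_b):
    """Linear kernel between two utterances' supervectors."""
    return sum(a * b for a, b in zip(sv_a, sv_b))


def supervector_distance(sv_a, sv_b):
    """0.5 * sum_i w_i (m_i^a - m_i^b)^T Sigma_i^{-1} (m_i^a - m_i^b)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(sv_a, sv_b))


# Shared UBM weights/variances; adapted means differ per utterance.
weights, variances = [0.5, 0.5], [[1.0], [4.0]]
sv_a = supervector(weights, [[1.0], [2.0]], variances)
sv_b = supervector(weights, [[0.0], [0.0]], variances)
print(supervector_distance(sv_a, sv_b))
```

The linear kernel on these supervectors is what the SVM in the referenced work operates on.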


Supervector Performance in NIST Speaker ID

• System 5: Gaussian SV
• DCF (Detection Cost Function)
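The DCF combines the miss and false-alarm rates at a decision threshold into a single cost. A minimal sketch; the defaults C_miss = 10, C_fa = 1, P_target = 0.01 are assumed values of the kind used in NIST speaker-recognition evaluations of that period, not numbers from this slide.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """NIST-style detection cost: weighted sum of miss and false-alarm rates."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


print(detection_cost(p_miss=0.10, p_fa=0.01))
```

With a low target prior, false alarms are weighted by 0.99 while misses are weighted by 10 × 0.01, so both terms matter at typical operating points.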


m31491

AKULA – Adaptive KLUster Aggregation

2013/10/25

Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park


Outline

• Motivation
• Adaptive Aggregation
• Results with TM7
• Summary


Motivation

• Better Aggregation
  o Fisher Vector and VLAD-type aggregation depends on a global model
  o AKULA removes this dependence and directly codes the cluster centroids and SIFT counts
  o SCFV and RVD both have situations where clusters are turned off due to no assignment; this can be avoided in AKULA

Pipeline: SIFT detection & selection → K-means → AKULA description


Motivation

• Better Subspace Choice
  o Both SCFV and RVD apply a fixed normalization and PCA projection chosen heuristically.
  o What is the best possible subspace in which to do the aggregation?
  o Use a boosting scheme that keeps adding subspaces and aggregations iteratively, tuning TPR-FPR to the desired operating point on FPR.


CE2: AKULA – Adaptive KLUster Aggregation

• AKULA descriptor: cluster centroids + SIFT counts

  $A_1 = \{y_{c1}^{1}, y_{c1}^{2}, \dots, y_{c1}^{k};\ p_{c1}^{1}, p_{c1}^{2}, \dots, p_{c1}^{k}\}$,
  $A_2 = \{y_{c2}^{1}, y_{c2}^{2}, \dots, y_{c2}^{k};\ p_{c2}^{1}, p_{c2}^{2}, \dots, p_{c2}^{k}\}$

• Distance metric:
  o Minimum centroid distance, weighted by SIFT count
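A minimal sketch of the weighted minimum-centroid distance, written as the one-way form (the "dmin1.*wt" used in the result slides): for each centroid of A1, take the distance to its nearest centroid in A2 and weight it by A1's SIFT count. The Euclidean metric, normalizing counts to sum to 1, and the function name are assumptions.

```python
import math


def akula_distance(a1, a2):
    """One-way AKULA-style distance: for each centroid of A1, the distance to the
    nearest centroid of A2, weighted by A1's normalized SIFT count, then summed."""
    cent1, counts1 = a1
    cent2, _ = a2
    total = float(sum(counts1))
    dist = 0.0
    for y, p in zip(cent1, counts1):
        dmin = min(math.dist(y, z) for z in cent2)   # Euclidean distance (assumed)
        dist += (p / total) * dmin
    return dist


A1 = ([[0.0, 0.0], [10.0, 0.0]], [3, 1])   # (centroids, SIFT counts)
A2 = ([[0.0, 1.0], [10.0, 0.0]], [2, 2])
print(akula_distance(A1, A2))  # 0.75
```

Because it is one-way, akula_distance(A1, A2) and akula_distance(A2, A1) can differ; a symmetric variant could average both directions.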


AKULA implementation in TM7

• Inner loop aggregation
  o Dimension is fixed at 8
  o Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
  o Quantization: scale by ½ and quantize to int8; the SIFT count is 8 bits; total (nc+1)*dim bytes per aggregation
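A minimal sketch of the quantization step: centroid components are halved and rounded into int8, and counts are clipped into an unsigned 8-bit field. The clipping behaviour, the rounding mode (Python's `round` is round-half-to-even), and the function names are assumptions; the slide only states the scale factor and bit widths.

```python
def quantize_centroid(values):
    """Scale by 1/2 and round into the signed 8-bit range [-128, 127]."""
    return [max(-128, min(127, round(v / 2.0))) for v in values]


def quantize_count(count):
    """Clip a SIFT count into an unsigned 8-bit field [0, 255]."""
    return max(0, min(255, int(count)))


print(quantize_centroid([10.0, -300.0, 3.0, 255.0]), quantize_count(300))
# [5, -128, 2, 127] 255
```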



• Outer loop: subspace optimization by boosting
  o Initial set of subspace models {A_k}, computed by k-means clustering of SIFT descriptors extracted from the MIR FLICKR data set into 4096 clusters
  o Iterative search over subspaces to generate AKULA aggregations that improve precision-recall performance
  o Note that aggregation is decoupled from the subspace iteration, allowing more DoF in aggregation and helping find subspaces that provide complementary information

• The algorithm is still being debugged, so only 1st-iteration results are reported in TM7




• Indexing/hashing is required for AKULA; matching currently involves nc x dim multiplications and additions. A binarization scheme will be considered once performance is optimized in non-binary form.


GD Only TPR-FPR: AKULA vs SCFV

• Data set 1:
  o AKULA (128 bytes, dim=8, nc=16): the distance is just the 1-way dmin1.*wt
  o SCFV (512 bytes): forced to a weighted sum of Hamming distances without 2D decision fitting, i.e., compute the Hamming distance between common active clusters and sum up these distances


GD Only TPR-FPR: AKULA vs SCFV

• Data sets 2, 3:
  o AKULA distance is just the 1-way dmin1.*wt
  o AKULA = 128 bytes, SCFV = 512 bytes


3D Object Sets: 4, 5

• Data sets 4, 5:


AKULA in PM

• FPR performance:

• AKULA rates:

  PM rate   m (nc)   AKULA rate (bytes)
  512       8        64
  1K        16       128
  2K        16       128
  1K_4K     16       128
  2K_4K     16       128
  4K        16       128
  8K        32       256
  16K       32       256


TPR@1% FPR




AKULA Localization

• Notable improvement in localization: 2.7%


AKULA Summary

• Benefits:
  o Allows more DoF in aggregation optimization, through
    o an outer-loop boosting scheme for subspace projection optimization
    o an inner-loop adaptive clustering without the constraint of a global GMM model
  o Simple weighted-distance-sum metric, with no need to tune a multi-dimensional decision boundary
  o Overall pairwise matching performance matches TM7 SCFV with its 2-dimensional decision boundary
  o In GD-only matching, outperforms the TM7 GD
  o Good improvements in localization accuracy

• Limitations: light in extraction, but still heavy in pairwise matching; needs a binarization and/or indexing scheme to work for retrieval

• Future improvements:
  o Supervector AKULA?


Lec 08 Summary

• Fisher Vector
  o Aggregate features $\{X_k\}$ in $\mathbb{R}^D$ against a GMM

• Super Vector
  o Aggregate a GMM against a global GMM (UBM)

• AKULA
  o Direct aggregation, non-indexable

