Spring 2019: Venue: Haag 315, Time: M/W 4-5:15pm
ECE 5582 Computer Vision Lec 08: Feature Aggregation II
Zhu Li Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu
slides created with WPS Office Linux and the EqualX LaTeX equation editor
Outline
• Recap of Lecture 07
  o Image Retrieval System
  o BoW
  o VLAD
• Dense SIFT
• Fisher Vector Aggregation
• AKULA
• Summary
Precision, Recall, F-measure
• Precision = TP/(TP + FP)
• Recall (TPR) = TP/(TP + FN)
• FPR = FP/(FP + TN)
• F-measure = 2*(precision*recall)/(precision + recall)
Precision is the probability that a retrieved document is relevant.
Recall is the probability that a relevant document is retrieved in a search.
(A small MATLAB sketch of these measures follows below.)
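A minimal MATLAB sketch of the four measures on a toy retrieval result; the relevant/retrieved vectors are made-up example data:

  % toy example: 10 items, ground-truth relevance and retrieval decisions
  relevant  = logical([1 1 0 1 0 0 1 0 0 1]);
  retrieved = logical([1 0 0 1 1 0 1 0 1 0]);
  TP = sum( retrieved &  relevant);
  FP = sum( retrieved & ~relevant);
  FN = sum(~retrieved &  relevant);
  TN = sum(~retrieved & ~relevant);
  precision = TP/(TP + FP)          % P(relevant | retrieved)
  recall    = TP/(TP + FN)          % P(retrieved | relevant), i.e. TPR
  FPR       = FP/(FP + TN)          % false positive rate
  F_measure = 2*precision*recall/(precision + recall)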
Why Aggregation?
• Curse of Dimensionality
• Decision Boundary / Indexing
Bag-of-Words: Histogram Coding
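The hard-assignment histogram is the standard form below (reconstructed here, since the slide's equation did not survive the transcript): for local descriptors $\{x_t\}_{t=1}^{T}$ and codebook centroids $\{\mu_k\}_{k=1}^{K}$,

$h(k) = \sum_{t=1}^{T} \mathbf{1}\big[\, k = \arg\min_{j} \|x_t - \mu_j\| \,\big], \quad k = 1, \dots, K$

often L1- or L2-normalized before matching.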
Kernel Code Book Soft Encoding
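A standard kernel codebook form (assumed here, with a Gaussian kernel of bandwidth $\sigma$): each descriptor votes into all codewords with kernel weights,

$h(k) = \sum_{t=1}^{T} \frac{K_\sigma(x_t, \mu_k)}{\sum_{j=1}^{K} K_\sigma(x_t, \mu_j)}, \qquad K_\sigma(x, \mu) = \exp\!\big(-\|x - \mu\|^2 / 2\sigma^2\big)$

which reduces to the hard BoW histogram as $\sigma \to 0$.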
VLAD - Vector of Locally Aggregated Descriptors
[Slide figure: VLAD aggregation - local descriptors x assigned to 5 codebook cells, producing accumulated residual vectors v1…v5]
① assign each descriptor x to its nearest codebook centroid μi
② compute the residual x - μi
③ vi = sum of the residuals x - μi over all descriptors assigned to cell i
(A minimal MATLAB sketch of these steps follows below.)
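A minimal MATLAB sketch of the three steps above; the toy descriptor matrix X and codebook mu are made up so the snippet runs stand-alone:

  % X: D x N local descriptors (e.g. SIFT), mu: D x K codebook centroids
  D = 128; N = 500; K = 16;
  X  = rand(D, N);
  mu = rand(D, K);
  V = zeros(D, K);
  for t = 1:N
      % step 1: assign descriptor to its nearest centroid
      [~, i] = min(sum((mu - X(:,t)).^2, 1));   % implicit expansion (R2016b+)
      % steps 2-3: accumulate the residual x - mu_i into cell i
      V(:,i) = V(:,i) + (X(:,t) - mu(:,i));
  end
  vlad = V(:) / norm(V(:));                     % L2-normalized D*K vector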
VLAD on SIFT
• Example of aggregating SIFT with VLAD
• K = 16 codebook entries
• Each cell is visualized like a SIFT descriptor: codebook centroid in blue, aggregated VLAD difference in red
• Top row: left image; bottom row: right image
Outline
• Recap of Lecture 07
  o Image Retrieval System
  o BoW
  o VLAD
• Dense SIFT
• Fisher Vector Aggregation
• AKULA
• Summary
One more trick
• Recall that SIFT is a powerful descriptor
• VL_FEAT: vl_dsift
  o A dense description of the image: SIFT descriptors computed on a predetermined grid (no scale-space extrema detection)
  o Supplements HoG as an alternative texture descriptor
VL_FEAT: vl_dsift
• Compute dense SIFT as a texture descriptor for the image
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);
• There is also a FAST option
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);
• A huge amount of SIFT data will be generated (see the snippet below)
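A slightly fuller sketch, assuming VLFeat is installed; the setup path and test image are placeholders:

  run('vlfeat/toolbox/vl_setup');            % placeholder path, adjust to your install
  im = imread('peppers.png');                % any RGB test image
  I  = single(rgb2gray(im));
  % dense SIFT: 'step' = grid spacing in pixels, 'size' = SIFT spatial bin size
  [f, dsift] = vl_dsift(I, 'fast', 'step', 2, 'size', 4);
  size(dsift)                                % 128 x (number of grid points): a lot of data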
Fisher Vector
• Fisher Vector and its variations:
  o Winning approach in image classification
  o Winning approach in MPEG object re-identification: SCFV (Scalable Coded Fisher Vector) in CDVS
Codebook: Gaussian Mixture Model (GMM)
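For reference, the GMM codebook has the standard mixture form (reconstructed here; the slide's equations were images): for a descriptor $x$,

$p(x\,|\,\lambda) = \sum_{k=1}^{K} w_k\, \mathcal{N}(x;\, \mu_k, \Sigma_k), \qquad \sum_k w_k = 1$

and the soft assignment (posterior) of descriptor $x_t$ to component $k$, used by the Fisher Vector below, is

$q_{tk} = \dfrac{w_k\, \mathcal{N}(x_t;\, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j\, \mathcal{N}(x_t;\, \mu_j, \Sigma_j)}$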
A bit of Theory: Fisher Kernel
A bit of Theory: Fisher Kernel
Fisher Vector
• K_FK(X, Y) is a measure of similarity w.r.t. the generative model
  o Similar to the Mahalanobis distance case, we can decompose this kernel as shown below
• That gives us a kernel feature mapping of X to the Fisher Vector
• For observed image features {xt}, it can be computed as in the equations below
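In standard notation (reconstructed here, since the slide's equations were images; this is the usual Jaakkola-Haussler form used by Perronnin et al.):

$K_{FK}(X, Y) = {G_\lambda^X}^\top F_\lambda^{-1} G_\lambda^Y, \qquad G_\lambda^X = \nabla_\lambda \log p(X\,|\,\lambda), \qquad F_\lambda = E_X\big[\, G_\lambda^X {G_\lambda^X}^\top \big]$

Since $F_\lambda$ is symmetric positive semi-definite, $F_\lambda^{-1} = L_\lambda^\top L_\lambda$, so $K_{FK}(X,Y) = {\mathcal{G}_\lambda^X}^\top \mathcal{G}_\lambda^Y$ with the Fisher Vector $\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X$. For observed image features $\{x_t\}_{t=1}^{T}$ (assumed independent), $G_\lambda^X = \sum_{t=1}^{T} \nabla_\lambda \log p(x_t\,|\,\lambda)$.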
GMM Fisher Vector
[Slide figure: gradients of the GMM log-likelihood w.r.t. the weight, mean, and variance parameters]
GMM Fisher Vector VL_FEAT implementation
GMM Fisher Vector VL_FEAT implementation
• FV encoding
  o Gradient w.r.t. the mean and variance of GMM component k, for dimensions j = 1..D (see the formulas below)
• In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances
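A standard closed form of these two gradients, as given in the VLFeat documentation (diagonal covariances, posteriors $q_{tk}$ as above, $T$ descriptors):

$\mathcal{G}_{\mu_{jk}}^X = \dfrac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} q_{tk}\, \dfrac{x_{jt} - \mu_{jk}}{\sigma_{jk}}, \qquad \mathcal{G}_{\sigma_{jk}}^X = \dfrac{1}{T\sqrt{2 w_k}} \sum_{t=1}^{T} q_{tk} \left[ \left( \dfrac{x_{jt} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right]$

Stacking over k = 1..K and j = 1..D gives the 2KD-dimensional Fisher Vector.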
VL_FEAT GMM/FV API
• Compute GMM model with VL_FEAT
• Prepare data:
  numPoints = 1000 ; dimension = 2 ;
  data = rand(dimension, numPoints) ;
• Call vl_gmm:
  numClusters = 30 ;
  [means, covariances, priors] = vl_gmm(data, numClusters) ;
• Visualize:
  figure ; hold on ;
  plot(data(1,:), data(2,:), 'r.') ;
  for i = 1:numClusters
    vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
  end
VL_FEAT API
• FV encoding:
  encoding = vl_fisher(data_to_Be_Encoded, means, covariances, priors);
• Bonus points:
  o Encode HoG features with Fisher Vector?
  o Randomly collect 2~3 images from each class
  o Stack all HoG features together into an n x 36 data matrix
  o Compute its GMM
  o Use this GMM to encode every image's HoG features (rather than simply averaging them)
(A rough sketch of this pipeline follows below.)
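A rough sketch of that bonus pipeline, assuming the 36-D HoG cells come from VLFeat's vl_hog with the Dalal-Triggs variant; the image file names are placeholders:

  % 1) collect HoG cells from a few training images into an n x 36 matrix
  trainImgs = {'class1_a.jpg', 'class2_a.jpg', 'class3_a.jpg'};   % placeholders
  hogStack = [];
  for k = 1:numel(trainImgs)
      I   = single(rgb2gray(imread(trainImgs{k})));
      hog = vl_hog(I, 8, 'variant', 'dalaltriggs');   % H x W x 36 HoG cells
      hogStack = [hogStack; reshape(hog, [], 36)];    %#ok<AGROW>  -> n x 36
  end
  % 2) fit a GMM codebook on the stacked HoG cells (vl_gmm expects dim x n)
  numClusters = 32;
  [means, covariances, priors] = vl_gmm(hogStack', numClusters);
  % 3) Fisher Vector encoding of a query image's HoG cells
  I   = single(rgb2gray(imread('query.jpg')));        % placeholder
  hog = reshape(vl_hog(I, 8, 'variant', 'dalaltriggs'), [], 36)';  % 36 x n
  fv  = vl_fisher(hog, means, covariances, priors);   % 2*36*numClusters x 1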
Super Vector Aggregation – Speaker ID
• Fisher Vector: Aggregates Features against a GMM • Super Vector: Aggregates GMM against GMM
• Ref:
  o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5): 308-311 (2006)
[Slide graphic: speaker-ID motivation - which speaker said "Yes, We Can!"?]
Super Vector from MFCC
• Motivated by speaker ID work
• Speech is a continuous evolution of the vocal tract
• Need to extract a sequence of spectra, i.e. a sequence of spectral coefficients
• Use a sliding window: 25 ms window, 10 ms shift (a framing sketch follows below)
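A bare-bones MATLAB sketch of the 25 ms / 10 ms framing and the log-spectrum step; the signal x is synthetic here just to make the snippet runnable, and the mel filterbank and DCT that produce the final MFCCs are omitted:

  fs = 16000; x = randn(fs, 1);            % 1 s of noise as a stand-in signal
  winLen = round(0.025 * fs);              % 25 ms analysis window
  hopLen = round(0.010 * fs);              % 10 ms shift
  nfft   = 512;
  w = 0.54 - 0.46 * cos(2*pi*(0:winLen-1)'/(winLen-1));   % Hamming window
  nFrames = 1 + floor((length(x) - winLen) / hopLen);
  logSpec = zeros(nfft/2 + 1, nFrames);
  for m = 1:nFrames
      frame = x((m-1)*hopLen + (1:winLen)) .* w;
      X = fft(frame, nfft);
      logSpec(:, m) = log(abs(X(1:nfft/2+1)) + eps);      % log|X(w)| per frame
  end
  % MFCCs would follow by applying a mel filterbank and a DCT to each column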
[Slide figure: MFCC pipeline - windowed frame → |X(ω)| → log → DCT → MFCC coefficients]
GMM Model from MFCC
• GMM on MFCC features
• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by λs
• Gaussian mixture model (GMM) for speaker s (see below):
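In the standard formulation (reconstructed here; the slide's equations were images), the speaker model and the UBM share the same mixture form:

$p(x\,|\,\lambda_s) = \sum_{k=1}^{K} w_k^{(s)}\, \mathcal{N}\big(x;\, \mu_k^{(s)}, \Sigma_k^{(s)}\big), \qquad \lambda_s = \{\, w_k^{(s)}, \mu_k^{(s)}, \Sigma_k^{(s)} \,\}_{k=1}^{K}$

with the UBM being the same expression under parameters $\lambda_{UBM}$ estimated from a large general population.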
Universal Background Model
• UBM GMM Model:
• The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):
• Parameters of the UBM
MAP Adaptation
• Given the UBM GMM, how is the speaker model derived from new observations?
• The adapted mean is given by the formula below:
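A standard relevance-MAP form (Reynolds-style adaptation, given here since the slide equation did not survive; r is the relevance factor and $q_{tk}$ the UBM posteriors):

$\hat{\mu}_k = \alpha_k\, \bar{x}_k + (1 - \alpha_k)\, \mu_k^{UBM}, \qquad \alpha_k = \dfrac{n_k}{n_k + r}, \qquad n_k = \sum_t q_{tk}, \qquad \bar{x}_k = \dfrac{1}{n_k} \sum_t q_{tk}\, x_t$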
Supervector Distance
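The supervector stacks the adapted means of all K components. In the Campbell et al. reference above, the distance between two utterances a and b is an approximate KL divergence between their adapted GMMs, which linearizes to the supervector kernel (stated here from that paper; notation may differ slightly):

$d^2(a, b) = \sum_{k=1}^{K} w_k \big(\mu_k^{(a)} - \mu_k^{(b)}\big)^\top \Sigma_k^{-1} \big(\mu_k^{(a)} - \mu_k^{(b)}\big), \qquad K(a, b) = \sum_{k=1}^{K} \big(\sqrt{w_k}\, \Sigma_k^{-1/2} \mu_k^{(a)}\big)^\top \big(\sqrt{w_k}\, \Sigma_k^{-1/2} \mu_k^{(b)}\big)$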
Supervector Performance in NIST Speaker ID
• System 5: Gaussian SV
• DCF (Detection Cost Function)
m31491
AKULA – Adaptive KLUster Aggregation
2013/10/25
Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park
Outline
• Motivation
• Adaptive Aggregation
• Results with TM7
• Summary
Motivation
• Better Aggregation
  o Fisher Vector and VLAD type aggregations depend on a global model
  o AKULA removes this dependence and directly codes the cluster centroids and SIFT counts
  o SCFV/RVD both have situations where clusters are turned off due to no assignment; this is avoided in AKULA
[Pipeline: SIFT detection & selection → k-means → AKULA description]
Motivation
• Better Subspace Choice
  o Both SCFV and RVD apply a fixed normalization and PCA projection based on heuristics
  o What is the best possible subspace in which to do the aggregation?
  o Use a boosting scheme to keep adding subspaces and aggregations iteratively, tuning TPR-FPR to the desired operating points on FPR
CE2: AKULA – Adaptive KLUster Aggregation
• AKULA Descriptor: cluster centroids + SIFT counts
  A1 = {yc11, yc12, …, yc1k ; pc11, pc12, …, pc1k}
  A2 = {yc21, yc22, …, yc2k ; pc21, pc22, …, pc2k}
• Distance metric: minimum centroid distance, weighted by SIFT count (see the sketch below)
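A rough MATLAB sketch of one plausible reading of this matching rule (the exact metric is defined in contribution m31491 and may differ); Y1/Y2 are k x d centroid matrices and p1/p2 the per-cluster SIFT counts, all made up here:

  k = 8; d = 8;
  Y1 = rand(k, d); p1 = randi(50, k, 1);
  Y2 = rand(k, d); p2 = randi(50, k, 1);
  % pairwise Euclidean distances between centroids (implicit expansion, R2016b+)
  D2 = sqrt(max(0, sum(Y1.^2, 2) + sum(Y2.^2, 2)' - 2*Y1*Y2'));
  [d12, j12] = min(D2, [], 2);     % nearest A2 centroid for each A1 centroid
  [d21, j21] = min(D2, [], 1);     % nearest A1 centroid for each A2 centroid
  % weight each minimum distance by the SIFT counts of the matched pair
  w12 = p1 .* p2(j12);    w12 = w12 / sum(w12);
  w21 = p2 .* p1(j21(:)); w21 = w21 / sum(w21);
  dist = 0.5 * (sum(w12 .* d12) + sum(w21 .* d21(:)));   % symmetric AKULA-style distance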
AKULA implementation in TM7
• Inner loop aggregation
  o Dimension is fixed at 8
  o Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
  o Quantization: centroids scaled by ½ and quantized to int8; SIFT count takes 8 bits; total (nc+1)*dim bytes per aggregation