
Spring 2020: Venue: Haag 315, Time: M/W 4-5:15pm

ECE 5582 Computer Vision
Lec 08: Feature Aggregation II

Zhu Li
Dept. of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346, http://l.web.umkc.edu/lizhu

Z. Li: ECE 5582 Computer Vision, 2020 p.1

slides created with WPS Office Linux and EqualX LaTeX equation editor


Outline

• ReCap of Lecture 07
• Image Retrieval System
• BoW
• VLAD
• Dense SIFT
• Fisher Vector Aggregation
• AKULA
• Summary


Precision, Recall, F-measure

• Precision = TP/(TP + FP)

• Recall (TPR) = TP/(TP + FN)

• FPR = FP/(FP + TN)

• F-measure = 2*(precision*recall)/(precision + recall)

Precision: the probability that a retrieved document is relevant.

Recall: the probability that a relevant document is retrieved in a search.
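A minimal Python sketch of the four measures, computed from raw confusion counts (TP, FP, FN, TN). Returning 0.0 when a denominator is zero is a convention assumed by this sketch, not part of the definitions.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall/TPR, FPR, F-measure) from confusion counts."""
    # 0.0 on an empty denominator is this sketch's convention
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # TPR
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, fpr, f_measure


# 8 relevant items retrieved, 2 false alarms, 4 relevant items missed, 86 correct rejections
p, r, fpr, f = retrieval_metrics(tp=8, fp=2, fn=4, tn=86)
print(round(p, 3), round(r, 3), round(fpr, 3), round(f, 3))  # → 0.8 0.667 0.023 0.727
```

Note that FPR is normalized by all true negatives (FP + TN), so it stays small in retrieval settings where non-relevant items vastly outnumber relevant ones.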


Why Aggregation?

• Curse of Dimensionality

• Decision Boundary / Indexing


Bag-of-Words: Histogram Coding

• Codebook:
  o Feature space: R^d; run k-means to get k centroids, {c1, c2, …, ck}

• BoW Hard Encoding:
  o For n feature points {x1, x2, …, xn}, the assignment matrix is k x n, with only one non-zero entry per column
  o Aggregated dimension: k
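A minimal Python sketch of BoW hard encoding, assuming Euclidean nearest-centroid assignment and an unnormalized count histogram (many systems also divide by n). The function and variable names are illustrative.

```python
import math


def bow_hard_encode(features, codebook):
    """BoW hard encoding: count how many features fall nearest to each centroid.

    This equals the row sums of the k x n assignment matrix, which has a
    single 1 per column (the nearest centroid of that feature).
    """
    hist = [0] * len(codebook)
    for x in features:
        nearest = min(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))
        hist[nearest] += 1
    return hist


codebook = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
features = [(0.1, 0.2), (4.8, 5.1), (0.3, -0.1), (0.2, 4.7)]
print(bow_hard_encode(features, codebook))  # → [2, 1, 1]
```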


Kernel Code Book Soft Encoding

• Kernel Code Book Soft Encoding
  o Kernel affinity: $K(x_i, c_j) = e^{-\alpha \|x_i - c_j\|}$
  o Assignment matrix: $a_{ji} = K(x_i, c_j) / \sum_{l=1}^{k} K(x_i, c_l)$
  o Encoding (k-dimensional): $X(j) = \frac{1}{n} \sum_{i=1}^{n} a_{ji}$, for $j = 1, \dots, k$
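A minimal Python sketch of kernel codebook soft encoding, assuming the exponential affinity exp(-α‖x − c‖) with a user-chosen α (`alpha`). Shifting every distance by the smallest one before exponentiating is an implementation choice of this sketch; it cancels in the column normalization but avoids underflow for far-away features.

```python
import math


def kernel_codebook_encode(features, codebook, alpha=1.0):
    """Soft-assignment code: average over features of normalized kernel affinities.

    K(x, c) = exp(-alpha * ||x - c||); each column of the k x n assignment
    matrix is normalized to sum to 1, then each row is averaged over the n features.
    """
    k, n = len(codebook), len(features)
    code = [0.0] * k
    for x in features:
        dists = [math.dist(x, c) for c in codebook]
        d0 = min(dists)  # stability shift; cancels in the normalization below
        aff = [math.exp(-alpha * (d - d0)) for d in dists]
        total = sum(aff)
        for j in range(k):
            code[j] += aff[j] / total / n
    return code


codebook = [(0.0, 0.0), (2.0, 0.0)]
print(kernel_codebook_encode([(1.0, 0.0)], codebook))  # → [0.5, 0.5]
```

Unlike the hard BoW histogram, a feature lying between two centroids contributes to both bins, which makes the code less sensitive to quantization boundaries.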


VLAD: Vector of Locally Aggregated Descriptors

• Aggregate feature differences from the codebook
• Hard assignment by finding the NN of each feature {x_t} among the centroids {c_i}
• Compute aggregated differences
• L2 normalize
• Final feature: k x d


[Figure: VLAD example with descriptors x and cells v1 to v5: ① assign descriptors; ② compute x − c_i; ③ v_i = sum of x − c_i for cell i]

$v_i = \sum_{\forall x_t \ \text{s.t.}\ NN(x_t) = c_i} (x_t - c_i)$

$v_i = v_i / \|v_i\|_2$
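A minimal Python sketch of the three VLAD steps (NN assignment, residual accumulation, per-cell L2 normalization), producing a k x d vector flattened to length k*d. Euclidean NN search and leaving empty cells at zero are assumptions of this sketch.

```python
import math


def vlad_encode(features, codebook):
    """VLAD sketch: v_i = sum of (x_t - c_i) over features whose NN is c_i,
    then v_i = v_i / ||v_i||_2 per cell; returns the concatenated k*d vector."""
    k, d = len(codebook), len(codebook[0])
    v = [[0.0] * d for _ in range(k)]
    for x in features:
        i = min(range(k), key=lambda c: math.dist(x, codebook[c]))  # hard NN assignment
        for t in range(d):
            v[i][t] += x[t] - codebook[i][t]  # accumulate the residual
    out = []
    for vi in v:
        norm = math.sqrt(sum(a * a for a in vi))
        if norm > 0:
            vi = [a / norm for a in vi]
        out.extend(vi)  # cells with no assigned features stay zero
    return out


codebook = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0)]
print([round(a, 3) for a in vlad_encode(features, codebook)])  # → [0.447, 0.894, -1.0, 0.0]
```

Normalizing the concatenated k x d vector once, instead of each v_i separately, is the other common choice; only the normalization loop would change.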


VLAD on SIFT

• Example of aggregating SIFT with VLAD
• K = 16 codebook entries
• Each cell is a SIFT descriptor, visualized with centroids in blue and the VLAD difference in red
• Top row: left image; bottom row: right image; red: codebook; blue: encoded VLAD


One more trick

• Recall that SIFT is a powerful descriptor

• VL_FEAT: vl_dsift
• A dense description of the image, computed by evaluating the SIFT descriptor at a predetermined grid of locations (no scale-space extrema detection)
• Supplements HoG as an alternative texture descriptor


VL_FEAT: vl_dsift

• Compute dense SIFT as a texture descriptor for the image:
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);

• There is also a FAST option:
  [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);

• A huge amount of SIFT data will be generated
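To get a feel for how much data dense SIFT produces, the Python estimate below counts grid positions for a given step. It assumes one frame every `step` pixels across the whole image with an optional border `margin`; vl_dsift's actual frame bounds also depend on its bin size and options, so treat the result as an order-of-magnitude estimate rather than the exact size of `f`. The helper is hypothetical.

```python
def dense_grid_size(width, height, step, margin=0):
    """Approximate number of dense-SIFT frames on a regular grid (hypothetical helper)."""
    nx = max(0, (width - 1 - 2 * margin) // step + 1)
    ny = max(0, (height - 1 - 2 * margin) // step + 1)
    return nx * ny


n = dense_grid_size(640, 480, step=2)
print(n, n * 128)  # → 76800 9830400  (frames, 128-D SIFT values)
```

Halving the step roughly quadruples the number of descriptors, which is why dense SIFT is usually followed by an aggregation step.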


Fisher Vector

• Fisher Vector and variations:
  o Winning in image classification
  o Winning in MPEG object re-identification: SCFV (Scalable Coded Fisher Vector) in CDVS


Codebook: Gaussian Mixture Model (GMM)

• GMM is a generative model of the data
  o Assume the data is generated from a mixture of K Gaussians with parameters {w_k, μ_k, Σ_k}:

$x \sim \sum_{k=1}^{K} w_k \, \mathcal{N}(\mu_k, \Sigma_k)$

$\mathcal{N}(x; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \, e^{-\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$
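A minimal Python sketch that evaluates the mixture log-likelihood log Σ_k w_k N(x; μ_k, Σ_k) for one point. It assumes diagonal covariances (given as per-dimension variances), the same restriction the VL_FEAT slides in this lecture use, and applies log-sum-exp for numerical stability. The parameters in the example are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for a single d-dimensional point."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)
    return -0.5 * (len(x) * math.log(2 * math.pi) + log_det + quad)


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, Sigma_k), using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Two-component GMM in R^2 (illustrative parameters)
weights = [0.3, 0.7]
means = [(0.0, 0.0), (3.0, 3.0)]
variances = [(1.0, 1.0), (0.5, 2.0)]
print(gmm_log_likelihood((1.0, 1.0), weights, means, variances))
```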


A bit of Theory: Fisher Kernel

• Encode the deviation from the generative model
  o Observed feature set {x1, x2, …, xn} in R^d, e.g., d = 128 for SIFT
  o How do these observations deviate from the given GMM with parameter set λ = {w_k, μ_k, Σ_k}?
  o i.e., how should a parameter (e.g., the mean) move to best fit the observations?


A bit of Theory: Fisher Kernel

• Score function w.r.t. the likelihood function $u_\lambda(X)$
  o $G_\lambda^X = \nabla_\lambda \log u_\lambda(X)$: the derivative of the log likelihood
  o The dimension of the score function is m, the number of generative model parameters; a GMM has 3 parameter types (weights, means, variances), i.e., K(2d + 1) scalars for a K-component diagonal-covariance GMM in R^d
  o Given the observed data X, the score function indicates how the likelihood function parameters (e.g., the means) should move to better fit the data

• Distance/deviation of two observations X, Y w.r.t. the generative model
  o Fisher Information Matrix (roughly the covariance in the Mahalanobis distance):
    $F_\lambda = E_{x \sim u_\lambda} \left[ G_\lambda^x \, G_\lambda^{x\,T} \right]$

• Fisher Kernel distance: normalized by the Fisher Information Matrix:
  $K_{FK}(X, Y) = G_\lambda^{X\,T} F_\lambda^{-1} G_\lambda^Y$


Fisher Vector

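• K_FK(X, Y) is a measure of similarity w.r.t. the generative model
  o Similar to the Mahalanobis distance case, we can decompose this kernel, using the Cholesky decomposition $F_\lambda^{-1} = L_\lambda^T L_\lambda$, as
    $K_{FK}(X, Y) = G_\lambda^{X\,T} F_\lambda^{-1} G_\lambda^Y = \mathcal{G}_\lambda^{X\,T} \mathcal{G}_\lambda^Y$, with $\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X$

• That gives us a kernel feature mapping of X to the Fisher Vector $\mathcal{G}_\lambda^X$

• For observed image features {x_t}, t = 1..T, assumed independent, it can be computed as
  $\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)$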
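A toy Python check of the kernel decomposition, under simplifying assumptions of this sketch: the generative model is a single 1-D Gaussian N(μ, σ²) and only the mean μ is treated as a parameter. Then the score is G_X = Σ_t (x_t − μ)/σ², the per-sample Fisher information is F = 1/σ², and L = σ satisfies LᵀL = F⁻¹, so the Fisher kernel and the dot product of the two Fisher vectors must agree.

```python
def score_mean(xs, mu, sigma):
    """G_X: derivative of sum_t log N(x_t; mu, sigma^2) with respect to mu."""
    return sum((x - mu) / sigma ** 2 for x in xs)


def fisher_kernel_1d(xs, ys, mu, sigma):
    """K_FK(X, Y) = G_X * F^{-1} * G_Y with per-sample Fisher information F = 1 / sigma^2."""
    fisher_info = 1.0 / sigma ** 2
    return score_mean(xs, mu, sigma) * score_mean(ys, mu, sigma) / fisher_info


def fisher_vector_1d(xs, mu, sigma):
    """Normalized Fisher vector L * G_X, with L = sigma so that L^T L = F^{-1}."""
    return sigma * score_mean(xs, mu, sigma)


X, Y = [1.0, 3.0], [2.0]
print(fisher_kernel_1d(X, Y, 0.0, 2.0))                               # → 2.0
print(fisher_vector_1d(X, 0.0, 2.0) * fisher_vector_1d(Y, 0.0, 2.0))  # → 2.0
```

The same identity is what lets a GMM Fisher Vector be compared with a plain (linear) dot product instead of evaluating the kernel with F⁻¹ every time.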


GMM Fisher Vector

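• Encode the deviation from the generative model
  o Observed feature set {x1, x2, …, xn} in R^d, e.g., d = 128 (!) for SIFT
  o How do these observations deviate from the given GMM with parameter set λ = {w_k, μ_k, Σ_k}?

• GMM log-likelihood gradient
  o Let $w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$, so that the weights stay positive and sum to 1. Then, with diagonal covariances $\sigma_k^2$ and $q_{ik}$ the posterior probability of component k given $x_i$, we have (element-wise for vectors):
    weight: $\frac{\partial \log u_\lambda(X)}{\partial \alpha_k} = \sum_{i=1}^{n} (q_{ik} - w_k)$
    mean: $\frac{\partial \log u_\lambda(X)}{\partial \mu_k} = \sum_{i=1}^{n} q_{ik} \frac{x_i - \mu_k}{\sigma_k^2}$
    variance: $\frac{\partial \log u_\lambda(X)}{\partial \sigma_k} = \sum_{i=1}^{n} q_{ik} \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]$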


GMM Fisher Vector VL_FEAT implementation

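• GMM codebook
  o For a K-component GMM, we only allow 3K parameters, {w_k, μ_k, Σ_k | k = 1..K}, with a diagonal covariance per component, i.e., the D dimensions of each Gaussian component are independent:

$\Sigma_k = \begin{bmatrix} \sigma_{k,1}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{k,2}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{k,D}^2 \end{bmatrix}$

• Posterior probability of feature point x_i for GMM component k:

$q_{ik} = \frac{w_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}$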


GMM Fisher Vector VL_FEAT implementation

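• FV encoding
  o Gradient w.r.t. the mean and variance of GMM component k, for dimensions j = 1..D, where q_ik is the posterior probability of component k given x_i and σ_jk^2 is the j-th diagonal entry of Σ_k:
    $u_{jk} = \frac{1}{n \sqrt{w_k}} \sum_{i=1}^{n} q_{ik} \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}$
    $v_{jk} = \frac{1}{n \sqrt{2 w_k}} \sum_{i=1}^{n} q_{ik} \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^2 - 1 \right]$

• In the end, we have a 2K x D aggregation of the deviations w.r.t. the means and variances:
  $\Phi(X) = [u_1, u_2, \dots, u_K, v_1, v_2, \dots, v_K]$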
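A minimal pure-Python sketch of this FV encoding (gradients w.r.t. means and variances) for diagonal-covariance GMM parameters: priors w_k, means μ_k and per-dimension variances σ²_jk, stacked as [u_1 … u_K, v_1 … v_K]. It is written for readability, not speed; it omits the signed square-root and L2 normalization that "improved" Fisher Vector variants apply afterwards, and it is not the vl_fisher source, whose options and output layout may differ.

```python
import math


def gmm_fisher_vector(X, priors, means, variances):
    """Fisher Vector of n D-dim features against a K-component diagonal GMM.

    Returns [u_1, ..., u_K, v_1, ..., v_K], i.e. 2*K*D values, where
      u_jk = 1/(n*sqrt(w_k))   * sum_i q_ik * (x_ji - mu_jk) / sigma_jk
      v_jk = 1/(n*sqrt(2*w_k)) * sum_i q_ik * [((x_ji - mu_jk) / sigma_jk)^2 - 1]
    """
    n, K, D = len(X), len(priors), len(X[0])
    u = [[0.0] * D for _ in range(K)]
    v = [[0.0] * D for _ in range(K)]
    for x in X:
        # Posterior q_ik from log(w_k) + log N(x; mu_k, diag(var_k)), via log-sum-exp
        logp = []
        for w, m, var in zip(priors, means, variances):
            quad = sum((a - b) ** 2 / s for a, b, s in zip(x, m, var))
            logp.append(math.log(w) - 0.5 * (quad + sum(math.log(2 * math.pi * s) for s in var)))
        top = max(logp)
        q = [math.exp(lp - top) for lp in logp]
        z = sum(q)
        for k in range(K):
            qk = q[k] / z
            for j in range(D):
                diff = (x[j] - means[k][j]) / math.sqrt(variances[k][j])
                u[k][j] += qk * diff
                v[k][j] += qk * (diff * diff - 1.0)
    fv = []
    for k in range(K):
        fv.extend(u[k][j] / (n * math.sqrt(priors[k])) for j in range(D))
    for k in range(K):
        fv.extend(v[k][j] / (n * math.sqrt(2 * priors[k])) for j in range(D))
    return fv


fv = gmm_fisher_vector([(0.5, -1.0), (2.0, 1.5)],
                       priors=[0.4, 0.6],
                       means=[(0.0, 0.0), (2.0, 2.0)],
                       variances=[(1.0, 1.0), (1.0, 0.5)])
print(len(fv))  # → 8  (2 * K * D with K = 2, D = 2)
```

With K = 64 and D = 128 (SIFT), the same layout already gives 2 x 64 x 128 = 16384 dimensions, which is why compressed variants such as SCFV matter.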


VL_FEAT GMM/FV API

• Compute a GMM with VL_FEAT
  o Prepare data:
    numPoints = 1000 ; dimension = 2 ;
    data = rand(dimension, numPoints) ;
  o Call vl_gmm:
    numClusters = 30 ;
    [means, covariances, priors] = vl_gmm(data, numClusters) ;
  o Visualize:
    figure ; hold on ;
    plot(data(1,:), data(2,:), 'r.') ;
    for i = 1:numClusters
      % ellipse frame [x y S11 S12 S22] built from the diagonal covariance of cluster i
      vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
    end


VL_FEAT API

• FV encoding:
  encoding = vl_fisher(data_to_Be_Encoded, means, covariances, priors);

• Bonus points: encode HoG features with a Fisher Vector?
  o Randomly collect 2~3 images from each class
  o Stack all HoG features together into an n x 36 data matrix (vl_gmm takes one feature per column, so pass its 36 x n transpose)
  o Compute its GMM
  o Use this GMM to encode all image HoG features (instead of averaging them)


Super Vector Aggregation – Speaker ID

• Fisher Vector: aggregates features against a GMM
• Super Vector: aggregates a GMM against a GMM

• Ref:
  o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5): 308-311 (2006)


[Figure: “Yes, We Can!”]


Super Vector from MFCC

• Motivated by Speaker ID work
• Speech is a continuous evolution of the vocal tract
  o Need to extract a sequence of spectra or a sequence of spectral coefficients
  o Use a sliding window: 25 ms window, 10 ms shift

[Figure: MFCC extraction: |X(ω)| → Log → DCT → MFCC]
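The 25 ms / 10 ms sliding window fixes how many spectral frames (and hence MFCC vectors) an utterance yields. A minimal Python sketch, assuming no padding at the end of the signal (toolkits differ on this); the function name is illustrative.

```python
def num_frames(num_samples, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Count full analysis windows of win_ms, advanced by hop_ms (no end padding)."""
    win = int(round(sample_rate * win_ms / 1000))  # 25 ms -> 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000))  # 10 ms -> 160 samples at 16 kHz
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


print(num_frames(16000, 16000))  # 1 s of 16 kHz speech → 98
```

So roughly 100 MFCC vectors per second of speech feed the GMM modeling that follows.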


GMM Model from MFCC

• GMM on MFCC features
• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by $\lambda_s = \{w_k^s, \mu_k^s, \Sigma_k^s\}_{k=1}^{K}$
• Gaussian mixture model (GMM) for speaker s:
  $p(x \mid \lambda_s) = \sum_{k=1}^{K} w_k^s \, \mathcal{N}(x; \mu_k^s, \Sigma_k^s)$


Universal Background Model

• UBM GMM model:
  o The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):
    $p(x \mid \lambda_{UBM}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$
  o Parameters of the UBM: $\lambda_{UBM} = \{w_k, \mu_k, \Sigma_k\}_{k=1}^{K}$


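MAP Adaptation

• Given the UBM GMM, how does a new observation deviate from it?
• The adapted mean is given by:
  $\hat{\mu}_k = \alpha_k E_k(x) + (1 - \alpha_k) \mu_k$, with $n_k = \sum_{t=1}^{T} \Pr(k \mid x_t)$, $E_k(x) = \frac{1}{n_k} \sum_{t=1}^{T} \Pr(k \mid x_t) \, x_t$, and $\alpha_k = \frac{n_k}{n_k + r}$ for a relevance factor r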
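A minimal Python sketch of mean-only relevance-MAP adaptation, assuming the UBM posteriors Pr(k | x_t) have already been computed and passed in. The relevance factor default r = 16 is an assumed, commonly used value, not one given in this lecture.

```python
def map_adapt_means(X, post, ubm_means, r=16.0):
    """Mean-only relevance-MAP adaptation of UBM means.

    X[t]: feature vector x_t; post[t][k]: Pr(k | x_t) under the UBM;
    ubm_means[k]: UBM mean mu_k; r: relevance factor (assumed default).
      n_k = sum_t Pr(k | x_t),  E_k(x) = (1 / n_k) sum_t Pr(k | x_t) x_t
      mu_hat_k = alpha_k E_k(x) + (1 - alpha_k) mu_k,  alpha_k = n_k / (n_k + r)
    """
    adapted = []
    for k, mu_k in enumerate(ubm_means):
        n_k = sum(p[k] for p in post)
        if n_k == 0:
            adapted.append(list(mu_k))  # no soft counts: keep the UBM mean
            continue
        e_k = [sum(p[k] * x[j] for p, x in zip(post, X)) / n_k for j in range(len(mu_k))]
        alpha = n_k / (n_k + r)
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(e_k, mu_k)])
    return adapted


X = [[2.0], [4.0]]
post = [[1.0, 0.0], [1.0, 0.0]]  # both frames fully assigned to component 0
print(map_adapt_means(X, post, ubm_means=[[0.0], [10.0]], r=2.0))  # → [[1.5], [10.0]]
```

Components that see little data (small n_k) get a small α_k and stay close to the UBM mean, so only well-observed components move toward the speaker's data.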


Supervector Distance

• Assume we have a UBM GMM model $\lambda_{UBM} = \{w_k, \mu_k, \Sigma_k\}$
• Then for two utterance samples a and b, with GMM models that share the UBM priors and covariances:
  o $\lambda_a = \{w_k, \mu_k^a, \Sigma_k\}$
  o $\lambda_b = \{w_k, \mu_k^b, \Sigma_k\}$

The SV distance (a kernel, i.e., a similarity) is

$K(\lambda_a, \lambda_b) = \sum_{k=1}^{K} \left( \sqrt{w_k} \, \Sigma_k^{-1/2} \mu_k^a \right)^T \left( \sqrt{w_k} \, \Sigma_k^{-1/2} \mu_k^b \right)$

It means the means of the two models are normalized by the Mahalanobis distance metric induced by the UBM covariances. This is also a linear kernel function scaled by the UBM weights and covariances.
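A minimal Python sketch of the supervector kernel, assuming diagonal covariances so that Σ_k^{-1/2} reduces to an element-wise division by the standard deviation. The helper names and the example parameters are illustrative.

```python
import math


def supervector(means, weights, variances):
    """Stack sqrt(w_k) * Sigma_k^{-1/2} * mu_k over k (diagonal Sigma_k given as variances)."""
    sv = []
    for mu, w, var in zip(means, weights, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(s) for m, s in zip(mu, var))
    return sv


def sv_kernel(means_a, means_b, weights, variances):
    """Linear kernel between the supervectors of two adapted models a and b."""
    a = supervector(means_a, weights, variances)
    b = supervector(means_b, weights, variances)
    return sum(x * y for x, y in zip(a, b))


weights = [0.5, 0.5]
variances = [[1.0, 4.0], [1.0, 1.0]]  # UBM diagonal covariances
means_a = [[1.0, 2.0], [0.0, 1.0]]
means_b = [[2.0, 2.0], [1.0, 1.0]]
print(round(sv_kernel(means_a, means_b, weights, variances), 6))  # → 2.0
```

Because the kernel is linear in the stacked vectors, each utterance can be mapped to its supervector once and then fed to a standard linear SVM, as in the Campbell et al. reference.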


Supervector Performance in NIST Speaker ID

• System 5: Gaussian SV
• DCF: Detection Cost Function


m31491

AKULA – Adaptive KLUster Aggregation

2013/10/25

Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park


Outline

• Motivation
• Adaptive Aggregation
• Results with TM7
• Summary


Motivation

• Better Aggregation
  o Fisher Vector and VLAD type aggregation depend on a global model
  o AKULA removes this dependence, directly coding the cluster centroids and SIFT counts
  o SCFV and RVD both have situations where clusters are turned off due to no assignment; this can be avoided in AKULA

[Figure: SIFT detection & selection → K-means → AKULA description]


Motivation

• Better Subspace Choice
  o Both SCFV and RVD apply a fixed normalization and PCA projection based on heuristics
  o What is the best possible subspace in which to do the aggregation?
  o Use a boosting scheme to keep adding subspaces and aggregations iteratively, tuning the TPR-FPR trade-off to the desired operating point on FPR


CE2: AKULA – Adaptive KLUster Aggregation

• AKULA descriptor: cluster centroids + SIFT counts

$A_1 = \{y_{c1}^1, y_{c1}^2, \dots, y_{c1}^k;\ p_{c1}^1, p_{c1}^2, \dots, p_{c1}^k\}$,
$A_2 = \{y_{c2}^1, y_{c2}^2, \dots, y_{c2}^k;\ p_{c2}^1, p_{c2}^2, \dots, p_{c2}^k\}$

• Distance metric:
  o Minimum centroid distance, weighted by SIFT count
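The slides describe the metric only as a minimum centroid distance weighted by SIFT count (later referred to as the 1-way dmin1.*wt). The Python sketch below is one hypothetical reading of that description: for each centroid of A1, take the Euclidean distance to the nearest centroid of A2, weight it by A1's SIFT count normalized to sum to 1, and sum. The normalization and the Euclidean norm are assumptions, not the TM7 implementation.

```python
import math


def akula_distance(centroids1, counts1, centroids2):
    """Hypothetical 1-way AKULA distance, sum(dmin1 .* wt).

    dmin1[i]: Euclidean distance from centroid i of A1 to its nearest centroid in A2.
    wt[i]:    SIFT count of centroid i of A1, normalized to sum to 1 (assumption).
    """
    total = sum(counts1)
    wt = [c / total for c in counts1]
    dmin1 = [min(math.dist(y1, y2) for y2 in centroids2) for y1 in centroids1]
    return sum(d * w for d, w in zip(dmin1, wt))


A1_centroids, A1_counts = [(0.0, 0.0), (1.0, 1.0)], [1, 3]
A2_centroids = [(0.0, 0.0), (3.0, 1.0)]
print(round(akula_distance(A1_centroids, A1_counts, A2_centroids), 4))  # → 1.0607
```

Being 1-way, this distance is not symmetric; averaging both directions would be one way to symmetrize it if needed.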


AKULA implementation in TM7

• Inner loop aggregation
  o Dimension is fixed at 8
  o Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
  o Quantization: scale by ½ and quantize to int8; SIFT count is 8 bits; total (nc+1)*dim bytes per aggregation
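A hypothetical Python sketch of the quantization step as described (centroid values scaled by ½ and stored as int8, SIFT counts stored in 8 bits). The rounding rule, the clipping ranges and the unsigned count range are assumptions of this sketch; the exact TM7 packing is not given here.

```python
def quantize_akula(centroids, counts):
    """Hypothetical packing: centroids * 1/2 -> int8, SIFT counts -> uint8."""
    # Python's round() rounds halves to even; values are clipped to [-128, 127]
    q_centroids = [[max(-128, min(127, round(v / 2))) for v in c] for c in centroids]
    q_counts = [max(0, min(255, int(n))) for n in counts]
    return q_centroids, q_counts


# nc = 2 clusters of dim = 4 (toy sizes; TM7 uses dim = 8)
qc, qn = quantize_akula([[10.0, -300.0, 255.0, 3.0], [0.4, -0.6, 1.0, 2.0]], [300, 5])
print(qc, qn)  # → [[5, -128, 127, 2], [0, 0, 0, 1]] [255, 5]
```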


AKULA implementation in TM7

• Outer loop subspace optimization by boosting
  o Initial set of subspace models {Ak} computed by k-means clustering of the SIFT features extracted from the MIR FLICKR data set into 4096 clusters

• Iterative search over subspaces to generate AKULA aggregations that improve precision-recall performance

• Note that aggregation is decoupled from the subspace iteration, allowing more DoF in aggregation, in order to find subspaces that provide complementary information

• The algorithm is still being debugged, so only 1st-iteration results are available in TM7


• Indexing/hashing is required for AKULA: matching currently involves nc x dim multiplications and additions. A binarization scheme will be considered once its performance is optimized in non-binary form.


GD Only TPR-FPR: AKULA vs SCFV

• Data set 1:
  o AKULA (128 bytes, dim = 8, nc = 16) distance is just the 1-way dmin1.*wt
  o SCFV (512 bytes): a weighted sum of Hamming distances is forced, without 2D decision fitting, i.e., count the Hamming distance between common active clusters and sum up those distances


GD Only TPR-FPR: AKULA vs SCFV

• Data sets 2, 3:
  o AKULA distance is just the 1-way dmin1.*wt
  o AKULA = 128 bytes, SCFV = 512 bytes


3D object set: 4, 5

• Data sets 4, 5:


AKULA in PM

• FPR performance:

• AKULA rates:

  PM rate   m    AKULA rate (bytes)
  512       8    64
  1K        16   128
  2K        16   128
  1K_4K     16   128
  2K_4K     16   128
  4K        16   128
  8K        32   256
  16K       32   256


TPR@1% FPR

[Figure: TPR (%) at 1% FPR on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs AKULA; panels: bitrate 512, bitrate 1k]


TPR@1% FPR:

[Figure: TPR (%) at 1% FPR on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs AKULA; panels: bitrate 2k, bitrate 1k-4k]


TPR@1% FPR:

[Figure: TPR (%) at 1% FPR on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs AKULA; panels: bitrate 2k-4k, bitrate 4k]


TPR@1% FPR:

[Figure: TPR (%) at 1% FPR on data sets 1a, 1b, 1c, 2, 3, 4, 5, TM7 vs AKULA; panels: bitrate 8k, bitrate 16k]


AKULA Localization

• A clear improvement: 2.7%


AKULA Summary

• Benefits:
  o Allows more DoF in aggregation optimization, through an outer-loop boosting scheme for subspace projection optimization and an inner-loop adaptive clustering without the constraint of a global GMM model
  o Simple weighted-distance-sum metric, with no need to tune a multi-dimensional decision boundary
  o Overall pairwise matching performance matches that of TM7 SCFV with its 2-dimensional decision boundary
  o In GD-only matching, outperforms the TM7 GD
  o Good improvements in localization accuracy

• Light in extraction, but still heavy in pairwise matching; needs a binarization and/or indexing scheme to work for retrieval

• Future improvements:
  o Supervector AKULA?


Lec 08 Summary

• Fisher Vector
  o Aggregates features {x_k} in R^D against a GMM
• Super Vector
  o Aggregates a GMM against a global GMM (UBM)
• AKULA
  o Direct aggregation, non-indexable
