Lec 08: Feature Aggregation II

  • Spring 2019: Venue: Haag 315, Time: M/W 4-5:15pm

    ECE 5582 Computer Vision Lec 08: Feature Aggregation II

    Zhu Li Dept of CSEE, UMKC

    Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu

    Z. Li: ECE 5582 Computer Vision, 2019. p.1

    slides created with WPS Office Linux and EqualX LaTex equation editor

  • Outline

    • ReCap of Lecture 07
      • Image Retrieval System
      • BoW
      • VLAD
    • Dense SIFT
    • Fisher Vector Aggregation
    • AKULA
    • Summary

    Z. Li, Image Analysis & Retrv. Spring 2018 p.2

  • Precision, Recall, F-measure

    • Precision = TP/(TP + FP)
    • Recall (TPR) = TP/(TP + FN)
    • FPR = FP/(FP + TN)
    • F-measure = 2*(precision*recall)/(precision + recall)

    Precision is the probability that a retrieved document is relevant.

    Recall is the probability that a relevant document is retrieved in a search.
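
    A minimal sketch of these measures, computed from confusion counts. It takes FPR in its usual sense, FP/(FP + TN), and recall as the true positive rate; the function name and the example counts are made up for illustration.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall, fpr, f_measure) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # recall is the TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fpr, f_measure


# Hypothetical query: 8 relevant images retrieved, 2 irrelevant retrieved,
# 4 relevant images missed, 86 irrelevant images correctly left out.
precision, recall, fpr, f_measure = retrieval_metrics(tp=8, fp=2, fn=4, tn=86)
print(precision, recall, fpr, f_measure)
```

    Here precision is 8/10 and recall is 8/12, so the F-measure (their harmonic mean) lands between the two.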

  • Why Aggregation ?

    • Curse of Dimensionality

    • Decision Boundary / Indexing

  • Bag-of-Words: Histogram Coding
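
    A minimal sketch of histogram coding, assuming a codebook of k visual words has already been learned (for example with k-means): each of the n local descriptors votes for its nearest word, and the image is represented by the normalized k-bin histogram. The function names and toy data are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def bow_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest word and count the votes."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1.0
    n = len(descriptors)
    # Normalize so images with different numbers of descriptors are comparable.
    return [c / n for c in counts] if n else counts


# Toy 2-D example with k = 3 words and n = 4 descriptors.
codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 1.0], [0.0, 9.0], [1.0, 0.0]]
hist = bow_histogram(descriptors, codebook)
print(hist)
```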

  • Kernel Code Book Soft Encoding
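
    A hedged sketch of soft encoding: instead of one hard vote, each descriptor spreads a unit of weight over all codewords through a Gaussian kernel on its distance to each word. The Gaussian kernel, the bandwidth `sigma`, and the per-descriptor normalization are assumptions here; the slide itself does not specify them.

```python
import math


def soft_assignment(descriptor, codebook, sigma):
    """Gaussian-kernel weights of one descriptor over all codewords (sum to 1)."""
    d2 = [sum((x - c) ** 2 for x, c in zip(descriptor, word)) for word in codebook]
    closest = min(d2)
    # Subtracting the smallest distance keeps exp() from underflowing to all zeros.
    weights = [math.exp(-(d - closest) / (2.0 * sigma ** 2)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def soft_histogram(descriptors, codebook, sigma):
    """Average the soft assignments of all descriptors into one k-bin code."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        for k, w in enumerate(soft_assignment(d, codebook, sigma)):
            hist[k] += w
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


codebook = [[0.0], [2.0]]
midpoint_weights = soft_assignment([1.0], codebook, sigma=1.0)
hist = soft_histogram([[1.0], [0.0]], codebook, sigma=1.0)
print(midpoint_weights, hist)
```

    A descriptor halfway between two words contributes equally to both, which is exactly the ambiguity that hard histogram coding throws away.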

  • VLAD: Vector of Locally Aggregated Descriptors

    ① Assign each descriptor x to its nearest centroid μ_i
    ② Compute the residual x − μ_i
    ③ v_i = sum of (x − μ_i) over the descriptors in cell i
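
    The three steps above can be sketched as follows: assign each descriptor to its nearest centroid, take the residual to that centroid, and sum the residuals per cell. The final L2 normalization is an assumption borrowed from common VLAD practice rather than something stated on the slide, so it is optional here.

```python
import math


def vlad(descriptors, centroids, normalize=True):
    """Concatenate per-cell sums of residuals (descriptor minus its centroid)."""
    k, dim = len(centroids), len(centroids[0])
    v = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        # Step 1: nearest centroid.
        i = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        # Steps 2 and 3: accumulate the residual in cell i.
        for j in range(dim):
            v[i][j] += x[j] - centroids[i][j]
    flat = [val for cell in v for val in cell]  # length k * dim
    if normalize:
        norm = math.sqrt(sum(val * val for val in flat))
        if norm > 0:
            flat = [val / norm for val in flat]
    return flat


centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
raw = vlad(descriptors, centroids, normalize=False)
unit = vlad(descriptors, centroids)
print(raw)
```

    With K=16 centroids and 128-dimensional SIFT, the same code would give a 16 x 128 = 2048-dimensional vector per image.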

  • VLAD on SIFT

    • Example of aggregating SIFT with VLAD
    • K=16 codebook entries
    • Each cell is a SIFT visualized as centroids in blue, and VLAD difference in red
    • Top row: left image, bottom row: right image, red: codebook, blue: encoded VLAD

  • One more trick

    • Recall that SIFT is a powerful descriptor

    • VL_FEAT: vl_dsift
      • A dense description of the image: SIFT descriptors are computed at a predetermined grid (no scale-space extrema detection)
    • Supplements HoG as an alternative texture descriptor

  • VL_FEAT: vl_dsift

    • Compute dense SIFT as a texture descriptor for the image
      • [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);
    • There is also a FAST option
      • [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);
    • A huge amount of SIFT data will be generated
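
    A rough estimate of how much data a step of 2 produces. The grid model below (one descriptor centre every `step` pixels, kept `margin` pixels away from the border) and the 6-pixel margin are assumptions for illustration, not VLFeat's exact frame layout; the 128 bytes per descriptor assumes one byte per SIFT bin.

```python
def dense_grid_count(width, height, step, margin):
    """Number of grid points with spacing `step`, at least `margin` px from each border."""
    def along(length):
        span = length - 1 - 2 * margin
        return span // step + 1 if span >= 0 else 0
    return along(width) * along(height)


def descriptor_bytes(num_descriptors, dims=128):
    """Storage for descriptors held as one byte per dimension."""
    return num_descriptors * dims


# Hypothetical 640 x 480 image, step 2, 6-pixel margin.
count = dense_grid_count(640, 480, step=2, margin=6)
size = descriptor_bytes(count)
print(count, size)
```

    Under these assumptions a single VGA image already yields tens of thousands of descriptors and several megabytes of raw SIFT data, which is why aggregation into a compact vector matters.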

  • Fisher Vector

    • Fisher Vector and variations:
      • Winning in image classification
      • Winning in MPEG object re-identification:
        o SCFV (Scalable Compressed Fisher Vector) in CDVS

  • Codebook: Gaussian Mixture Model (GMM)

  • A bit of Theory: Fisher Kernel

  • Fisher Vector

    • K_FK(X, Y) is a measure of similarity w.r.t. the generative model
    • As in the Mahalanobis distance case, we can decompose this kernel as,
    • That gives us a kernel feature mapping of X to the Fisher Vector
    • For observed image features {x_t}, it can be computed as,
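
    For reference, the standard Fisher kernel formulation (Jaakkola and Haussler; Perronnin et al.) that these bullets describe can be written as below. The symbols λ (parameters of the generative model), F_λ (Fisher information matrix), L_λ (its decomposition factor) and T (number of features) are notation assumed here, not taken from the slides, and some formulations add a 1/T normalization.

```latex
\begin{aligned}
G_\lambda^X &= \nabla_\lambda \log p(X \mid \lambda), \\
K_{FK}(X, Y) &= {G_\lambda^X}^{\top} F_\lambda^{-1} \, G_\lambda^Y,
\qquad F_\lambda = \mathbb{E}_{x \sim p(\cdot \mid \lambda)}
\left[ \nabla_\lambda \log p(x \mid \lambda) \,
       \nabla_\lambda \log p(x \mid \lambda)^{\top} \right], \\
F_\lambda^{-1} &= L_\lambda^{\top} L_\lambda
\;\Rightarrow\;
K_{FK}(X, Y) = {\mathcal{G}_\lambda^X}^{\top} \mathcal{G}_\lambda^Y,
\qquad \mathcal{G}_\lambda^X = L_\lambda \, G_\lambda^X, \\
\mathcal{G}_\lambda^X &= \sum_{t=1}^{T} L_\lambda \,
\nabla_\lambda \log p(x_t \mid \lambda)
\quad \text{for independent features } X = \{x_t\}_{t=1}^{T}.
\end{aligned}
```

    The last line is what makes the Fisher Vector an aggregation: each local feature contributes one gradient term, and the terms are simply summed.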

  • GMM Fisher Vector

    [Equations: Fisher Vector gradients w.r.t. the GMM weight, mean, and variance]

  • GMM Fisher Vector VL_FEAT implementation

    • FV encoding
      • Gradient w.r.t. the mean and variance, for GMM component k, j = 1..D
    • In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances
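
    For reference, the VLFeat documentation defines the mean and variance components of the Fisher Vector for a diagonal-covariance GMM as below. Here N is the number of descriptors x_i, and π_k, μ_k, σ_k² are the prior, mean and variance of component k; q_ik is the posterior of component k given x_i.

```latex
\begin{aligned}
q_{ik} &= \frac{\pi_k \, \mathcal{N}(x_i;\, \mu_k, \Sigma_k)}
               {\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x_i;\, \mu_l, \Sigma_l)}, \\
u_{jk} &= \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \,
          \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}, \\
v_{jk} &= \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik}
          \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right],
\qquad j = 1, \dots, D, \\
\Phi(X) &= \left[\, u_1^{\top}, \dots, u_K^{\top},\;
                   v_1^{\top}, \dots, v_K^{\top} \right]^{\top}
\in \mathbb{R}^{2KD}.
\end{aligned}
```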

  • VL_FEAT GMM/FV API

    • Compute GMM model with VL_FEAT
    • Prepare data:
        numPoints = 1000 ;
        dimension = 2 ;
        data = rand(dimension, numPoints) ;
    • Call vl_gmm:
        numClusters = 30 ;
        [means, covariances, priors] = vl_gmm(data, numClusters) ;
    • Visualize:
        figure ; hold on ;
        plot(data(1,:), data(2,:), 'r.') ;
        for i = 1:numClusters
            vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
        end

  • VL_FEAT API

    • FV encoding
      • encoding = vl_fisher(data_to_Be_Encoded, means, covariances, priors);
    • Bonus points:
      • Encode HoG features with Fisher Vector?
        • Randomly collect 2~3 images from each class
        • Stack all HoG features together into an n x 36 data matrix
        • Compute its GMM
        • Use this GMM to encode all image HoG features (other than average)
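
    To make the FV encoding step concrete, here is a plain-Python sketch of a Fisher Vector for a diagonal-covariance GMM, following the mean and variance terms in the VLFeat documentation. It is not the vl_fisher implementation: it takes data as a list of points (rows) rather than VLFeat's column layout, skips the optional improved-FV normalizations, and all function names are illustrative.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, means, variances, priors):
    """Soft assignment q_k of one point to each GMM component."""
    logs = [math.log(p) + log_gaussian_diag(x, m, v)
            for p, m, v in zip(priors, means, variances)]
    peak = max(logs)  # subtract the peak for numerical stability
    w = [math.exp(l - peak) for l in logs]
    total = sum(w)
    return [wi / total for wi in w]


def fisher_vector(data, means, variances, priors):
    """Return the 2*K*D vector [u_1..u_K, v_1..v_K]."""
    n, k, d = len(data), len(means), len(means[0])
    u = [[0.0] * d for _ in range(k)]
    v = [[0.0] * d for _ in range(k)]
    for x in data:
        q = posteriors(x, means, variances, priors)
        for c in range(k):
            for j in range(d):
                z = (x[j] - means[c][j]) / math.sqrt(variances[c][j])
                u[c][j] += q[c] * z                # gradient w.r.t. the mean
                v[c][j] += q[c] * (z * z - 1.0)    # gradient w.r.t. the variance
    for c in range(k):
        scale_u = 1.0 / (n * math.sqrt(priors[c]))
        scale_v = 1.0 / (n * math.sqrt(2.0 * priors[c]))
        u[c] = [val * scale_u for val in u[c]]
        v[c] = [val * scale_v for val in v[c]]
    return [val for row in u for val in row] + [val for row in v for val in row]


# Toy check: one component (K=1) in D=2 dimensions, two symmetric points.
fv = fisher_vector(data=[[1.0, 0.0], [-1.0, 0.0]],
                   means=[[0.0, 0.0]], variances=[[1.0, 1.0]], priors=[1.0])
print(len(fv), fv)
```

    For the HoG exercise, the same routine would take the n x 36 HoG matrix of one image as `data` and the GMM fitted on the pooled training features, giving a 2 x K x 36 vector per image.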

  • Super Vector Aggregation – Speaker ID

    • Fisher Vector: aggregates features against a GMM
    • Super Vector: aggregates GMM against GMM
    • Ref:
      o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5): 308-311 (2006)

  • Super Vector from MFCC

    • Motivated by Speaker ID work
    • Speech is a continuous evolution of the vocal tract
    • Need to extract a sequence of spectra or a sequence of spectral coefficients
    • Use a sliding window: 25 ms window, 10 ms shift
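
    A minimal sketch of the sliding-window step, which cuts a signal into overlapping frames before any spectral analysis. The 16 kHz sample rate and the function name are assumptions for illustration; the 25 ms window and 10 ms shift come from the slide.

```python
def frame_signal(samples, sample_rate, window_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames of window_ms, every shift_ms."""
    win = int(sample_rate * window_ms / 1000)   # samples per frame
    hop = int(sample_rate * shift_ms / 1000)    # samples between frame starts
    return [samples[start:start + win]
            for start in range(0, len(samples) - win + 1, hop)]


# One second of a hypothetical 16 kHz signal (sample index used as the value).
signal = list(range(16000))
frames = frame_signal(signal, sample_rate=16000)
print(len(frames), len(frames[0]))
```

    Each frame would then yield one MFCC vector, so a second of speech becomes a sequence of roughly a hundred feature vectors to aggregate.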

    [Figure: MFCC extraction, |X(ω)| → Log → DCT → MFCC]

  • GMM Model from MFCC

    • GMM on MFCC feature
    • The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by the GMM weights, means, and covariances
    • Gaussian mixture model (GMM) for speaker s:
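
    In the usual notation, a speaker GMM with M components can be written as below. The symbols Λ^(s), λ_j, μ_j and Σ_j (mixture weights, means and covariances) are assumed here for illustration and may differ from the ones used on the original slide.

```latex
p\left(x \mid \Lambda^{(s)}\right)
  = \sum_{j=1}^{M} \lambda_j^{(s)} \,
    \mathcal{N}\!\left(x;\, \mu_j^{(s)}, \Sigma_j^{(s)}\right),
\qquad
\Lambda^{(s)} = \left\{ \lambda_j^{(s)}, \mu_j^{(s)}, \Sigma_j^{(s)} \right\}_{j=1}^{M},
\qquad
\sum_{j=1}^{M} \lambda_j^{(s)} = 1 .
```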

  • Universal Background Model

    • UBM GMM Model:
    • The acoustic vectors of a general population are modeled by another GMM called the universal background model (UBM):
    • Parameters of the UBM

  • MAP Adaptation

    • Given the UBM GMM, how is the model for a new observation derived?
    • The adapted mean is given by:
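
    A hedged sketch of mean-only MAP adaptation in the style commonly used for GMM-UBM speaker models: each UBM mean is pulled toward the posterior-weighted average of the new data, with the pull controlled by a relevance factor r, i.e. adapted mean = α E[x] + (1 − α) μ_UBM with α = n/(n + r). The relevance factor of 16, the diagonal covariances, and all names are assumptions, not values taken from the slides.

```python
import math


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance UBM component given x."""
    logs = []
    for w, m, v in zip(weights, means, variances):
        ll = -0.5 * sum(math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
                        for xi, mi, vi in zip(x, m, v))
        logs.append(math.log(w) + ll)
    peak = max(logs)
    unnorm = [math.exp(l - peak) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """Return the MAP-adapted means; weights and variances stay at the UBM values."""
    k, d = len(means), len(means[0])
    n = [0.0] * k                                  # soft counts per component
    first = [[0.0] * d for _ in range(k)]          # posterior-weighted sums of x
    for x in data:
        post = component_posteriors(x, weights, means, variances)
        for j in range(k):
            n[j] += post[j]
            for i in range(d):
                first[j][i] += post[j] * x[i]
    # alpha * E[x] + (1 - alpha) * mu with alpha = n / (n + r), written so that
    # components with no data (n = 0) simply keep the UBM mean.
    return [[(first[j][i] + relevance * means[j][i]) / (n[j] + relevance)
             for i in range(d)] for j in range(k)]


# Toy check: one-component UBM at 0, sixteen observations at 2.0.
adapted = map_adapt_means([[2.0]] * 16, weights=[1.0], means=[[0.0]],
                          variances=[[1.0]])
supervector = [val for mean in adapted for val in mean]  # stacked adapted means
print(supervector)
```

    With n equal to the relevance factor, α is 1/2 and the adapted mean sits halfway between the UBM mean and the data mean; stacking all adapted means gives the supervector.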

  • Supervector Distance

  • Supervector Performance in NIST Speaker ID

    • System 5: Gaussian SV
    • DCF (Detection Cost Function)

  • m31491

    AKULA – Adaptive KLUster Aggregation

    2013/10/25

    Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park

  • Outline

    • Motivation
    • Adaptive Aggregation
    • Results with TM7
    • Summary

  • Motivation

    • Better Aggregation
      • Fisher Vector and VLAD type aggregation depend on a global model
      • AKULA removes this dependence and directly codes the cluster centroids and SIFT counts
      • SCFV and RVD both have situations where clusters are turned off due to no assignment; this can be avoided in AKULA

    [Figure: SIFT detection & selection → K-means → AKULA description]

  • Motivation

    • Better Subspace Choice
      • Both SCFV and RVD do fixed normalization and PCA projection based on heuristics
      • What is the best possible subspace to do the aggregation?
      • Use a boosting scheme to keep adding subspaces and aggregations iteratively, and tune TPR-FPR to the desired operating points on FPR

  • CE2: AKULA – Adaptive KLUster Aggregation

    • AKULA Descriptor: cluster centroids + SIFT count
      A1 = {yc^1_1, yc^1_2, …, yc^1_k ; pc^1_1, pc^1_2, …, pc^1_k}
      A2 = {yc^2_1, yc^2_2, …, yc^2_k ; pc^2_1, pc^2_2, …, pc^2_k}
    • Distance metric:
      • Min centroids distance, weighted by SIFT count
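
    One possible reading of "min centroids distance, weighted by SIFT count", sketched under stated assumptions: for every centroid of one image, take the distance to the closest centroid of the other image, weight it by that cluster's SIFT count, and average the two directions. The Euclidean distance, the count normalization, and the symmetrization are all assumptions; the exact metric is not given on the slide.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def directed_distance(centroids_a, counts_a, centroids_b):
    """Count-weighted mean of each A-centroid's distance to its nearest B-centroid."""
    total = sum(counts_a)
    acc = 0.0
    for y, p in zip(centroids_a, counts_a):
        acc += p * min(euclidean(y, z) for z in centroids_b)
    return acc / total


def akula_distance(a1, a2):
    """Symmetric distance between two (centroids, counts) descriptors."""
    (y1, p1), (y2, p2) = a1, a2
    return 0.5 * (directed_distance(y1, p1, y2) + directed_distance(y2, p2, y1))


# Toy descriptors with k = 2 clusters in 2-D.
a1 = ([[0.0, 0.0], [4.0, 0.0]], [3, 1])
a2 = ([[0.0, 0.0], [4.0, 3.0]], [2, 2])
dist = akula_distance(a1, a2)
print(dist)
```

    Because each image carries its own centroids, no cluster can be empty, which matches the motivation of avoiding clusters that are turned off.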

  • AKULA implementation in TM7

    • Inner loop aggregation
      • Dimension is fixed at 8
      • Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
      • Quantization: scale by ½ and quantize to int8; SIFT count is 8 bits; total (nc+1)*dim bytes per aggregation
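
    A small arithmetic sketch of the byte budget and the quantization step. The 64, 128 and 256 byte targets equal nc*dim for dim = 8 (one int8 per centroid coordinate), while the quoted total of (nc+1)*dim adds dim more bytes. The rounding and clipping rule in `quantize_int8` is an assumption; the slide only says "scale by ½ and quantize to int8".

```python
def quantize_int8(value):
    """Scale by 1/2, round to the nearest integer, and clip to [-128, 127]."""
    return max(-128, min(127, round(value * 0.5)))


def payload_bytes(nc, dim=8):
    """Return (centroid bytes, total bytes as quoted on the slide)."""
    centroid_bytes = nc * dim      # one int8 per centroid coordinate
    total_bytes = (nc + 1) * dim   # the slide's total per aggregation
    return centroid_bytes, total_bytes


budgets = {nc: payload_bytes(nc) for nc in (8, 16, 32)}
print(budgets)
print(quantize_int8(300.0), quantize_int8(-40.0))
```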
