Spring 2020: Venue: Haag 315, Time: M/W 4-5:15pm
ECE 5582 Computer Vision Lec 13: Low Dimension Embedding
Zhu Li, Dept. of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346. http://l.web.umkc.edu/lizhu
slides created with WPS Office Linux and EqualX LaTeX equation editor
Outline
Recap: Part I
Linear Algebra Refresher
SVD and Principal Component Analysis (PCA)
Laplacian Eigen Map (LEM)
Stochastic Neighbor Embedding (SNE)
Handcrafted Feature Pipeline
An image retrieval pipeline (hand-crafted features):
Image Formation: homography, color space
Feature Computing: color histogram, filtering, edge detection, HoG, Harris detector, LoG scale space, SIFT
Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector
Classification: kNN, Bayesian, SVM, kernel machine
Retrieval/matching against the Knowledge/Data Base, evaluated by TPR, FPR, Precision, Recall, mAP
Vector and Matrix Notations
Vector: $x \in \mathbb{R}^n$, a column vector with elements $x_i$
Matrix: $A \in \mathbb{R}^{m \times n}$, with element $a_{ij}$ at row $i$, column $j$
Vector Products
Inner product: $x^T y = \sum_i x_i y_i \in \mathbb{R}$
Outer product: $x y^T \in \mathbb{R}^{m \times n}$, a rank-1 matrix with $(x y^T)_{ij} = x_i y_j$
Matrix-Vector Product
$y = Ax = \sum_k x_k a_k$, where $a_k$ is the $k$-th column of $A$
So $y$ is a linear combination of the basis $\{a_k\}$ (columns of $A$) with weights from $x$; see the sketch below.
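A minimal MATLAB sketch (toy values assumed) to illustrate the column view:

A = [1 2; 3 4; 5 6];                  % 3x2 example matrix
x = [10; -1];                         % weights
y1 = A*x;                             % matrix-vector product
y2 = x(1)*A(:,1) + x(2)*A(:,2);       % explicit column combination
% y1 and y2 are identical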
Matrix Product
$C = AB$, with $c_{ij} = \sum_k a_{ik} b_{kj}$
Dimensions: $A: n \times p$, $B: p \times m$, $C: n \times m$
Associative: $ABC = (AB)C = A(BC)$
Distributive: $A(B+C) = AB + AC$
Not commutative in general: $AB \neq BA$
Outer Product/Kron
Vector outer product: for $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $x y^T$ is an $m \times n$ rank-1 matrix, equivalently kron(x, y') in MATLAB
Example: see the sketch below.
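A small sketch (toy vectors assumed) relating the outer product to kron:

x = [1; 2; 3]; y = [4; 5];
M1 = x*y';            % 3x2 rank-1 outer product
M2 = kron(x, y');     % same matrix via the Kronecker product
% M1 and M2 are identical; rank(M1) is 1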
Matrix Transpose
Transpose: $(A^T)_{ij} = A_{ji}$
Properties: $(AB)^T = B^T A^T$, $(A+B)^T = A^T + B^T$, $(A^T)^T = A$
Matrix Trace and Determinant
Trace: $\mathrm{Tr}(A) = \sum_i a_{ii}$, defined only for $n \times n$ square matrices
Determinant: $\det(A)$: the signed volume spanned by the columns of $A$, i.e., the image of the unit cube under all linear combinations of $a_1, \dots, a_n$
2-D example: $\det(A) = 2 - 9 = -7$, so the spanned area is $|\det(A)| = 7$
Eigen Values and Eigen Vectors
Definition: for an $n \times n$ matrix $A$: $Ax = \lambda x$, where $x \neq 0$ is an eigenvector and $\lambda$ its eigenvalue
In MATLAB: [P, V] = eig(A); the columns of P are the eigenvectors, V is diagonal with the eigenvalues
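A quick sketch (toy matrix assumed) verifying the decomposition returned by eig:

A = [2 1; 1 2];            % small symmetric example
[P, V] = eig(A);           % P: eigenvectors in columns, V: diag of eigenvalues
err = norm(A*P - P*V);     % ~0 up to numerical precision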
Eigen Vectors of Symmetric Matrix
If a square matrix $A: n \times n$ is symmetric, $A = A^T$,
then its eigenvalues are real, and its eigenvectors are orthonormal:
$A = U S U^T$, where $S$ is a diagonal matrix with the eigenvalues of $A$ and the columns of $U$ are the orthonormal eigenvectors.
Application: the solution to the quadratic form maximization $\max_x x^T A x$, s.t. $\|x\| = 1$:
the maximum value is the largest eigenvalue, and $x^*$ is the corresponding eigenvector of $A$; see the sketch below.
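A sketch (toy matrix assumed) checking that the unit-norm maximizer of $x^T A x$ is the top eigenvector:

A = [2 1; 1 2];
[P, V] = eig(A);
[lam1, i] = max(diag(V));       % largest eigenvalue
x_star = P(:, i);               % corresponding unit eigenvector
f_star = x_star' * A * x_star;  % equals lam1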
SVD for a non-square matrix $A: m \times n$:
$A = U S V^T$, with $A_{m \times n} = U_{m \times m}\, S_{m \times n}\, V^T_{n \times n}$
SVD as Signal Expansion
The Singular Value Decomposition (SVD) of an $m \times n$ matrix $A$ is
$A = U S V^T = \sum_k \sigma_k u_k v_k^T$
where the diagonal of $S$ holds the singular values $[\sigma_1, \sigma_2, \dots]$, the square roots of the eigenvalues of $A A^T$; the columns of $U$ are eigenvectors of $A A^T$, and the columns of $V$ are eigenvectors of $A^T A$.
The rank-1 outer products $u_i v_i^T$ are the basis of $A$ in this reconstruction.
The 1st-order SVD approximation of $A$ is $\sigma_1 u_1 v_1^T$, i.e., s(1,1)*u(:,1)*v(:,1)' in MATLAB.
SVD approximation of an image
Very easy:

function [x] = svd_approx(x0, k)
% rank-k SVD approximation of x0
dbg = 0;
if dbg   % self-test with a random matrix
    x0 = fix(100*randn(4,6)); k = 2;
end
[u, s, v] = svd(x0);
[m, n] = size(s);
x = zeros(m, n);
sgm = diag(s);                       % singular values
for j = 1:k
    x = x + sgm(j)*u(:,j)*v(:,j)';   % accumulate rank-1 terms
end
SVD for Separable Filtering
Take the LoG filter for example:

h = fspecial('LoG', 11, 2.0);
[u, s, v] = svd(h);
h1 = s(1,1)*u(:,1)*v(:,1)';   % h1 is the rank-1 SVD approx of the LoG

Since h1 is the outer product of a column filter and a row filter, it can be applied as two 1-D convolutions; many implications for deep network acceleration! See the sketch below.
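A minimal sketch of the speedup (test image name assumed), filtering with the rank-1 approximation as two 1-D convolutions instead of one 2-D convolution:

img = double(imread('cameraman.tif'));   % any grayscale image
h = fspecial('LoG', 11, 2.0);
[u, s, v] = svd(h);
col = sqrt(s(1,1))*u(:,1);               % 11x1 column filter
row = sqrt(s(1,1))*v(:,1)';              % 1x11 row filter
y2d = conv2(img, col*row, 'same');       % rank-1 filter via 2-D conv: 121 MACs/pixel
y1d = conv2(conv2(img, col, 'same'), row, 'same');  % two 1-D convs: 22 MACs/pixel
% y2d and y1d agree up to numerical precision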
Norm
Vector norm: the length of a vector
Euclidean norm (L2): $\|x\|_2 = \sqrt{\sum_i x_i^2}$; in MATLAB: norm(x, 2)
Lp norm: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
Matrix norm: Frobenius norm $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\mathrm{Tr}(A^T A)}$
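For reference, the corresponding MATLAB calls:

x = [3; 4];
norm(x, 2)        % 5: Euclidean length
norm(x, 1)        % 7: L1 norm
A = [1 2; 3 4];
norm(A, 'fro')    % sqrt(1+4+9+16) = sqrt(30): Frobenius norm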
Quadratic Form
Quadratic form $f(x) = x^T A x \in \mathbb{R}$:
Positive Definite (PD): $x^T A x > 0$ for all non-zero $x$
Positive Semi-Definite (PSD): $x^T A x \geq 0$ for all non-zero $x$
Indefinite: there exist non-zero $x_1, x_2$ with $x_1^T A x_1 > 0$ but $x_2^T A x_2 < 0$
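For symmetric A, definiteness can be read off the eigenvalues; a small sketch (toy matrix assumed):

A = [2 -1; -1 2];                        % symmetric example
lam = eig(A);
isPD  = all(lam > 0);                    % positive definite
isPSD = all(lam >= 0);                   % positive semi-definite
isIND = any(lam > 0) && any(lam < 0);    % indefinite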
Matrix Calculus
Gradient of $f(A)$, for $f: \mathbb{R}^{m \times n} \to \mathbb{R}$: $(\nabla_A f)_{ij} = \partial f / \partial A_{ij}$
Matrix gradient properties: $\nabla_x (b^T x) = b$; $\nabla_x (x^T A x) = (A + A^T) x$
Hessian of f(x)
For a function $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian is the $n \times n$ symmetric matrix $(\nabla_x^2 f)_{ij} = \partial^2 f / \partial x_i \partial x_j$
Gradient & Hessian of the quadratic form $f(x) = x^T A x$:
$\nabla_x f = (A + A^T) x$, $\nabla_x^2 f = A + A^T$ (for symmetric $A$: $2Ax$ and $2A$); a numerical check follows.
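A numerical sanity check (toy values assumed) of the quadratic-form gradient:

A = [2 1; 0 3];                          % deliberately non-symmetric
x = [1; -2];
g = (A + A')*x;                          % closed-form gradient of x'*A*x
ep = 1e-6; g_num = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = ep;
    g_num(i) = ((x+e)'*A*(x+e) - (x-e)'*A*(x-e)) / (2*ep);  % central difference
end
% g_num matches g to ~1e-9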
PCA - Dimension Reduction
A typical image retrieval pipeline:
Image Formation → Feature Computing → Feature Aggregation → Classification, against a Knowledge/Data Base
e.g., dense SIFT: 12000 x 128; e.g., Fisher Vector: k=64, d=128
Dimension reduction: $\mathbb{R}^d \to \mathbb{R}^p$, $p < d$
Principal Component Analysis
The formulation: for data points $\{x_1, x_2, \dots\}$ in $\mathbb{R}^n$, find a lower-dimensional representation in $\mathbb{R}^m$ via a projection $W: m \times n$, s.t. the energy (variance) of the projected data is preserved; per direction $w$:
$\max_w \; w^T S w, \quad \text{s.t. } w^T w = 1$
where $S = \sum_k (x_k - \bar{x})(x_k - \bar{x})^T$ is the scatter matrix.
PCA solution
Take the Lagrangian of the problem:
$L(w, \lambda) = w^T S w - \lambda (w^T w - 1)$
Taking the derivative w.r.t. $w$ and applying the KKT condition gives
$S w = \lambda w$
This is an eigen problem: the optimal projections are the eigenvectors of the scatter matrix, along which the data is simply scaled.
PCA – how to compute
PCA via SVD/eigen decomposition of the covariance matrix $S$ ($n \times n$); see the sketch below.
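A minimal sketch (toy data assumed) of PCA via the covariance eigen decomposition:

X = randn(1000, 8);                       % N x n data, rows are points
Xc = X - repmat(mean(X,1), size(X,1), 1); % center the data
S = (Xc'*Xc) / size(X,1);                 % covariance, n x n
[P, V] = eig(S);
[lam, idx] = sort(diag(V), 'descend');    % sort eigenvalues
W = P(:, idx(1:3))';                      % top-3 principal directions, 3 x n
Y = (W * Xc')';                           % embedded data, N x 3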
2d Data
[Figure: scatter plot of the 2-D example data]
Principal Components
[Figure: 2-D data with the 1st and 2nd principal vectors overlaid]
Gives the best axis to project: minimum RMS error
Principal vectors are orthogonal
PCA on HoGs
MATLAB implementation of PCA: [A, s, eig_values] = princomp(hogs); (princomp is superseded by pca in newer MATLAB releases)
[Figure: HoG basis functions from PCA]
PCA Application in Aggregation
SIFT aggregation: usually a PCA is done on the SIFT features to reduce the dimension from 128 to, say, 24 or 32; then a GMM is trained in the reduced space (e.g., $\mathbb{R}^{32}$) for FV encoding
Homework-2: Fisher Vector aggregation of SIFT
load ../../dataset/cdvs_sift_aggregation_test_data.mat;
[n_sift, kd_sift] = size(gd_sift_cdvs);
offs = randperm(n_sift);
offs = offs(1:200*2^10);            % sample ~200k SIFT points
% PCA
[A1, s1, lat1] = princomp(double(gd_sift_cdvs(offs,:)));
figure(41); hold on; grid on;
stem(lat1, '.'); title('sift pca eigen values');

[Figure: SIFT PCA eigenvalues]
Most of the energy is captured by the leading eigenvalues; that is why we use kd = [24, 32, 48] for the SIFT GMM in FV aggregation.
SIFT PCA Basis Functions
Capturing the directions of maximum variation
Visualizing SIFT in lower dimensional space
Project SIFTs from 2 images to a 2-D space; see the sketch below.
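A sketch of the projection (sift1 and sift2 are assumed n x 128 SIFT matrices from the two images; A1 is the princomp basis computed above):

y1 = double(sift1) * A1(:, 1:2);   % project to the top-2 PCA dims
y2 = double(sift2) * A1(:, 1:2);   % (centering by the training mean omitted for brevity)
figure; hold on; grid on;
plot(y1(:,1), y1(:,2), 'r.');
plot(y2(:,1), y2(:,2), 'b+');
title('SIFTs from two images in 2-D PCA space');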
Laplacian Eigen Map
Directly compute an embedding $\{y_k\}$ from input $\{x_k\}$ in $\mathbb{R}^D$, without an explicit projection model $A$ (no $Y = AX$)
Objective function: $\min_Y \sum_{i,j} \|y_i - y_j\|^2 W_{ij}$
where the $n \times n$ affinity matrix $W$ reflects the relationships of the data points in the original space $X$.
M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," Advances in Neural Information Processing Systems 14, pp. 585-591, MIT Press, Cambridge, MA, 2002.
Graph Laplacian
Graph Laplacian: $L = D - W$, where $W$ is the affinity matrix and $D$ is the diagonal degree matrix, $D_{ii} = \sum_j W_{ij}$
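A minimal sketch (toy data and a heat-kernel affinity assumed; pdist2 from the Statistics Toolbox) of building L:

X = randn(50, 3);                        % 50 points in R^3
D2 = pdist2(X, X).^2;                    % pairwise squared distances
W = exp(-D2 / (2*1.0^2));                % heat-kernel affinity, sigma = 1
W(1:size(W,1)+1:end) = 0;                % zero the diagonal: no self-edges
D = diag(sum(W, 2));                     % degree matrix
L = D - W;                               % graph Laplacian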
Laplacian Eigenmap
Minimizing $\sum_{i,j} \|y_i - y_j\|^2 W_{ij}$
is equivalent to
$\min_Y \mathrm{Tr}(Y^T L Y), \quad \text{s.t. } Y^T D Y = I$
where $D$ is the (diagonal) degree matrix with $D_{ii} = \sum_j W_{ij}$; the solution is given by the bottom eigenvectors of the generalized eigen problem $L y = \lambda D y$, discarding the trivial constant eigenvector.
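Continuing the sketch above, the embedding comes from the generalized eigen problem:

[Phi, Lam] = eig(L, D);                  % solves L*y = lambda*D*y
[lam, idx] = sort(diag(Lam), 'ascend');
Y = Phi(:, idx(2:3));                    % skip the trivial constant eigenvector
% rows of Y: 2-D Laplacian eigenmap coordinates of the 50 points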