Transcript
Page 1: CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis

Linear Discriminative Image Processing Operator Analysis Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda

Most discriminative image processing operators (IPOs)

[Overview diagram — labels: recognition; feature space; LDA classifier; generating matrices (image processing operators)]

Goal

Find the most discriminative set of image processing operations for LDA.

Motivation

For small sample size problems, many studies increase the number of training samples by synthetically generating new ones. But HOW? Usually ad hoc… We do it discriminatively!

Contribution

Simultaneous estimation of both the LDA feature space and a set of discriminative generating matrices.

Linear IPO + LDA = LDA with increased samples

$$x_j = G_j x, \qquad m' = \bar{G}\, m$$

$$m'_i = \frac{1}{J n_i} \sum_{j=1}^{J} \sum_{x \in X_i} G_j x = \frac{1}{J} \sum_{j=1}^{J} G_j m_i = \bar{G}\, m_i$$

$$S'_i = \bar{G} \left(S_i - R_i\right) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_i G_j^T$$

$$S'_W = \bar{G} \left(S_W - R_W\right) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_W G_j^T$$

$$S'_B = \bar{G}\, S_B\, \bar{G}^T$$

$$R_i = \frac{1}{n_i} \sum_{x \in X_i} x x^T, \qquad R_W = \sum_i^{c} R_i$$

$$X' = \bar{G} \left(X - R_{all}\right) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_{all} G_j^T$$

$$\tilde{S}'_i = A^T P^T S'_i P A, \qquad \tilde{S}'_W = A^T P^T S'_W P A, \qquad \tilde{S}'_B = A^T P^T S'_B P A$$

$$y_j = A^T P^T x_j = A^T P^T G_j x$$

$$\tilde{m}'_i = A^T P^T m'_i = A^T P^T \bar{G}\, m_i, \qquad \tilde{m}' = A^T P^T m' = A^T P^T \bar{G}\, m$$

Generalized eigenvalue problem:

$$\left(P^T S'_W P\right)^{-1} P^T S'_B P$$

Rayleigh quotient (the objective $E$ maximized in Algorithm 1 below):

$$E = \frac{\operatorname{tr}(\tilde{S}'_B)}{\operatorname{tr}(\tilde{S}'_W)}$$

Notation:
• $x$: an original sample
• $x_j$: an increased sample
• $G_j$: a generating matrix (an image processing operator)
• $\bar{G}$: the average of the image processing operators
• $m'_i$: mean of class $i$ for the increased samples
• $m'$: mean of all increased samples
• $S'_i$: scatter matrix of class $i$ for the increased samples
• $S'_W$: within-class scatter matrix for the increased samples
• $S'_B$: between-class scatter matrix for the increased samples
• $S_W$, $S_B$: scatter matrices for the original (non-increased) samples
• $P$: PCA projection matrix
• $X$: covariance matrix

[Flow diagram: training samples → scatter matrices → PCA (dimensionality reduction) → LDA feature space]

Given $\{G_j\}$, we don't need to actually increase the training samples. But we need more memory to store the generating matrices…
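To make the bookkeeping concrete, here is a minimal NumPy sketch of the identity above: the scatter matrices of the increased samples are computed directly from the original scatter matrices and $\{G_j\}$, with no samples ever generated. The function names and the ridge regularizer are my own (hypothetical), and the PCA projection $P$ is omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh

def increased_scatters(S_W, S_B, R_W, G_list):
    """Scatter matrices of the virtually increased samples,
    following the poster's formulas:
      S'_W = Gbar (S_W - R_W) Gbar^T + (1/J) sum_j G_j R_W G_j^T
      S'_B = Gbar  S_B  Gbar^T
    """
    J = len(G_list)
    G_bar = sum(G_list) / J                        # average operator
    S_W_inc = (G_bar @ (S_W - R_W) @ G_bar.T
               + sum(Gj @ R_W @ Gj.T for Gj in G_list) / J)
    S_B_inc = G_bar @ S_B @ G_bar.T
    return S_W_inc, S_B_inc

def lda_directions(S_W_inc, S_B_inc, n_dims, ridge=1e-8):
    """LDA axes from the generalized eigenproblem S'_B a = l S'_W a.
    A small ridge keeps S'_W positive definite for eigh."""
    d = S_W_inc.shape[0]
    w, V = eigh(S_B_inc, S_W_inc + ridge * np.eye(d))  # ascending
    return V[:, ::-1][:, :n_dims]                  # top eigenvectors first
```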

Analysis of IPO: the spectral decomposition

Definition 1. Let $f(x), g(x) \in L^2(\mathbb{R}^2)$ be complex-valued 2D functions, where $x \in \mathbb{R}^2$. The inner product is defined as

$$(f, g) \equiv \int_{\mathbb{R}^2} f(x)\,\overline{g(x)}\,dx,$$

where $\overline{g}$ is the complex conjugate of $g$. An operator $G : f \mapsto g$ is linear if it satisfies $G(af + bg) = aG(f) + bG(g)$ for all $a, b \in \mathbb{R}$. $G^*$ is the adjoint operator of $G$ if it satisfies $(Gf, g) = (f, G^* g)$.

Corollary 1. Filtering and geometric transformation operators $G$ are normal operators, which satisfy $G^* G = G G^*$.

$$G = \sum_i \lambda_i P_i$$

A normal operator can be decomposed into projection operators!
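A quick numerical illustration of Corollary 1 (a toy sketch of my own, not from the poster): a 1D circulant blur matrix is a normal operator, and its eigendecomposition reconstructs it exactly from rank-one eigenprojections.

```python
import numpy as np

# Toy example: a 1D circulant blur matrix. Circulant matrices are
# normal, and this one (centered symmetric kernel) is even symmetric.
d = 64
c = np.zeros(d)
c[[0, 1, -1]] = [0.5, 0.25, 0.25]                 # 3-tap blur kernel
G = np.stack([np.roll(c, s) for s in range(d)])   # row s = shifted kernel

assert np.allclose(G @ G.T, G.T @ G)              # normal: G G* = G* G

# Spectral decomposition G = sum_i lambda_i P_i with rank-one
# eigenprojections P_i = v_i v_i^T (eigh suffices: G is symmetric).
lam, V = np.linalg.eigh(G)
G_rec = sum(l * np.outer(v, v) for l, v in zip(lam, V.T))
assert np.allclose(G, G_rec)
```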

[Images: (a) $x$, (b) $Gx$, (c) $G^T G x$, (d) $G^T x$; eigenoperator images $E_1 x, E_2 x, \ldots, E_6 x$]

[Plots: eigenvalue vs. index for (a) $H_{11}, H_{21}$, (b) $H_{12}, H_{22}$, (c) $H_{13}, H_{23}$; eigenprojection images $P_{11i}x$, $P_{21i}x$, $P_{12i}x$, $P_{22i}x$, $P_{13i}x$, $P_{23i}x$]

But is this feasible for a generating matrix? Yes!

Is a filtering operator Hermitian? $\|G - G^T\| < 10^{-6}$: almost symmetric.

Is a geometric transformation unitary? Its transpose is approximately its inverse.

$$G = H_1 + iH_2, \qquad H_1 = \frac{G + G^T}{2}, \qquad H_2 = \frac{G - G^T}{2i}, \qquad i = \sqrt{-1}$$

Are the eigenvalues complex? Use the Hermitian decomposition above. So, a two-step approximation: an operator → two Hermitian operators (which have real eigenvalues), each of which is then decomposed spectrally:

$$G \simeq \sum_j a_j E_j = \sum_j a_j \left(H_{1j} + iH_{2j}\right) \simeq \sum_j a_j \sum_i \left(\lambda_{1ji} P_{1ji} + i\,\lambda_{2ji} P_{2ji}\right)$$
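A small NumPy sketch of the two-step idea (my own illustration; `hermitian_parts` is a hypothetical name). Any real matrix splits exactly into two Hermitian parts, and each part decomposes into real-eigenvalue eigenprojections $P = vv^*$:

```python
import numpy as np

def hermitian_parts(G):
    """Split a real operator matrix into two Hermitian matrices
    (both with real eigenvalues): G = H1 + 1j*H2."""
    H1 = (G + G.T) / 2
    H2 = (G - G.T) / 2j
    return H1, H2

# sanity check on a random non-symmetric real matrix
rng = np.random.default_rng(0)
G = rng.standard_normal((32, 32))
H1, H2 = hermitian_parts(G)
assert np.allclose(G, H1 + 1j * H2)            # the split is exact

# step 2: each Hermitian part decomposes into eigenprojections v v^H
lam1, V1 = np.linalg.eigh(H1)                  # real eigenvalues
lam2, V2 = np.linalg.eigh(H2)
G_rec = sum(l * np.outer(v, v.conj()) for l, v in zip(lam1, V1.T)) \
      + 1j * sum(l * np.outer(v, v.conj()) for l, v in zip(lam2, V2.T))
assert np.allclose(G, G_rec)   # G = sum_i (l1_i P1_i + 1j * l2_i P2_i)
```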

Examples

The real eigenvalues can be small, so we can compress the operators by discarding them. Eigenprojections of eigenoperators transform images into… wavelets? Eigenoperators transform images into variants.

Q: To reduce the memory cost of generating matrices, can we use a decomposition for operators just like for images?

A: Yes.
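A sketch of that answer, continuing the block above (it reuses `hermitian_parts` and the random `G`; the `rank` parameter and function name are hypothetical): drop the small-magnitude real eigenvalues of each Hermitian part and store only the top eigenpairs instead of the full $d \times d$ matrix.

```python
def compress_operator(G, rank):
    """Low-rank spectral compression of a generating matrix: keep the
    `rank` largest-|eigenvalue| eigenprojections of each Hermitian
    part and discard the small real eigenvalues."""
    H1, H2 = hermitian_parts(G)
    out = np.zeros(G.shape, dtype=complex)
    for H, phase in ((H1, 1.0), (H2, 1j)):
        lam, V = np.linalg.eigh(H)
        keep = np.argsort(np.abs(lam))[::-1][:rank]
        for i in keep:
            out += phase * lam[i] * np.outer(V[:, i], V[:, i].conj())
    return out.real        # the original generating matrix is real

# storing 2*rank eigenpairs costs O(rank * d) instead of O(d^2)
G_compressed = compress_operator(G, rank=8)
```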

LDA + IPO = LDIPOA: find a set of discriminative IPOs

$$\bar{G}^{(k)} = \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)}, \qquad D = PA$$

$$\tilde{S}'^{(k)}_W = D^T \bar{G}^{(k)} \left(S_W - R_W\right) \bar{G}^{(k)T} D + \frac{1}{k+1} \sum_{l=0}^{k} D^T G^{(l)} R_W G^{(l)T} D$$

$$\tilde{S}'^{(k)}_B = D^T \bar{G}^{(k)} S_B \bar{G}^{(k)T} D$$

$$X'^{(k)} = \bar{G}^{(k)} \left(X - R_{all}\right) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_{all} G^{(l)T}$$

$$S'^{(k)}_W = \bar{G}^{(k)} \left(S_W - R_W\right) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_W G^{(l)T}$$

$$S'^{(k)}_B = \bar{G}^{(k)} S_B \bar{G}^{(k)T}$$

Algorithm 1 LDIPOA
1: Compute PCA $P$ and LDA $A$. $G^{(0)} \leftarrow I$.
2: for $k = 1, \ldots$ do
3:   repeat
4:     $\alpha$ step: $\alpha^{(k)} = \arg\max_\alpha E(A, P, \alpha)$
5:     PCA step: compute $P$ with $\alpha^{(k)}$.
6:     LDA step: $A = \arg\max_A E(A, P, \alpha^{(k)})$
7:   until $E$ converges
8: end for

[Loop diagram: $\alpha$ step → PCA step → LDA step]

At each step k, estimate a single generating matrix represented as a linear combination.

$$G^{(k)} = \sum_{j=1}^{J} \alpha^{(k)}_j G_j, \qquad \alpha^{(k)} = \left(\alpha^{(k)}_1, \alpha^{(k)}_2, \ldots, \alpha^{(k)}_J\right)^T$$
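The control flow of Algorithm 1 as a schematic Python skeleton (my own reading of the poster: the three steps are supplied as callables since their internals live in the paper, so the interface here is hypothetical):

```python
import numpy as np

def ldipoa(G_basis, alpha_step, pca_step, lda_step,
           n_ops=10, max_inner=50, tol=1e-6):
    """Control-flow skeleton of Algorithm 1 (LDIPOA).

    Each outer step k estimates one new generating matrix G^(k) as a
    linear combination of the fixed basis {G_j}; the alpha/PCA/LDA
    steps follow the scatter-matrix formulas above.
    """
    d = G_basis[0].shape[0]
    G_list = [np.eye(d)]                            # 1: G^(0) <- I
    P = pca_step(G_list)                            # 1: initial PCA
    A, E = lda_step(P, G_list)                      # 1: initial LDA
    for k in range(1, n_ops + 1):                   # 2: for k = 1, ...
        G_k = G_list[-1]
        for _ in range(max_inner):                  # 3: repeat
            alpha = alpha_step(A, P, G_list)        # 4: alpha step
            G_k = sum(a * Gj for a, Gj in zip(alpha, G_basis))
            P = pca_step(G_list + [G_k])            # 5: PCA step
            A, E_new = lda_step(P, G_list + [G_k])  # 6: LDA step
            converged = abs(E_new - E) < tol        # 7: until E converges
            E = E_new
            if converged:
                break
        G_list.append(G_k)
    return G_list, P, A
```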

Experiments with the FERET dataset

The proposed algorithm iteratively estimates, at the same time:
• α (coefficients of the generating matrices)
• P (PCA)
• A (LDA)

k: the number of estimated generating matrices
• 10 generating matrices are used to increase the dataset 11-fold.
• 1 generating matrix is used to double the dataset.
• No generating matrices are used (normal LDA).

The Rayleigh quotient

$$x_j = G_j x$$

Proposition 1. A filtering is defined as

$$Gf(x) = \int G(x, y) f(y)\, dy,$$

where the kernel is symmetric, $G(x, y) = G(y, x)$, and real-valued. $G$ is a Hermitian operator, which satisfies $G^* = G$.

Proposition 2. A geometric (affine) transformation $G$ is defined as

$$Gf(x) = |A|^{1/2} f(Ax + t),$$

where $|A| \neq 0$. $G$ is a unitary operator, which satisfies $G^* G = I$.
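Both propositions are easy to probe numerically (a toy sketch; the 16×16 image size, σ = 1 blur, and 10° rotation are arbitrary choices of mine). Applying an IPO to each basis image yields its generating matrix column by column; interpolation and boundary handling are why the poster says "almost" symmetric and "apparently" inverse rather than exact identities.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

n = 16                       # toy n x n images
d = n * n
I = np.eye(d)

# Generating matrix of an IPO: apply the operator to each basis image;
# the flattened results form the columns of G.
G_blur = np.stack([gaussian_filter(e.reshape(n, n), sigma=1.0).ravel()
                   for e in I], axis=1)
print(np.linalg.norm(G_blur - G_blur.T))    # near zero: almost symmetric (Prop. 1)

G_rot = np.stack([rotate(e.reshape(n, n), angle=10,
                         reshape=False, order=1).ravel()
                  for e in I], axis=1)
# deviation from I: small up to interpolation/boundary effects (Prop. 2)
print(np.linalg.norm(G_rot.T @ G_rot - I))
```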


Experimental settings:
• Size of images: 32×32
• Size of generating matrices: 1024×1024
• Number of classes: 1001 (fa)
• Training images per class: 1 (fa)
• Test images per class: 1 (fb)
• Eigen-generating matrices: 96
• Initial generating matrices: 567 (3 scalings, 7 rotations, 3 Gaussian blurs, and 9 motion blurs)
• Classifier: nearest neighbor
• PCA rates: 80% and 95%, for the eigen-generating matrices (G-PCA) and for the PCA step (LDA-PCA)
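For scale, a back-of-the-envelope estimate (assuming dense float64 storage, which the poster does not specify): one generating matrix takes $1024^2 \times 8\,\mathrm{B} \approx 8.4\,\mathrm{MB}$, so the 567 initial matrices occupy roughly $567 \times 8.4\,\mathrm{MB} \approx 4.8\,\mathrm{GB}$, while the 96 eigen-generating matrices take about $0.8\,\mathrm{GB}$. This is the memory cost flagged earlier, and the motivation for compressing the operators.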

Maximized in a few steps

A few generating matrices are enough to improve the performance.

A bad approximation of the generating matrices does not lead to any improvement…

