CVPR2012 Poster Linear Discriminative Image Processing Operator Analysis


Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda: "Linear Discriminative Image Processing Operator Analysis", Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2012), pp. 2526-2532, 2012. Providence Convention Center, Providence, Rhode Island, USA, June 16-21, 2012.


Linear Discriminative Image Processing Operator Analysis
Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda

[Overview figure: generating matrices (image processing operators), the most discriminative image processing operators (IPOs), feature space, LDA classifier, recognition]

Goal

Find the most discriminative set of image processing operations for LDA.

Motivation

For small sample size problems, many studies increase the number of training samples by synthetically generating new ones. But HOW? Usually ad hoc… here, discriminatively!

Contribution

Simultaneous estimation of both the LDA feature space and a set of discriminative generating matrices.

Linear IPO + LDA = LDA with increased samples

An increased sample: $x_j = G_j x$, where $x$ is an original sample and $G_j$ is a generating matrix (an image processing operator).

Mean of class $i$ for increased samples:
$m'_i = \frac{1}{J n_i} \sum_{j=1}^{J} \sum_{x \in X_i} G_j x = \frac{1}{J} \sum_{j=1}^{J} G_j m_i = \bar{G} m_i$,
where $\bar{G} = \frac{1}{J} \sum_{j=1}^{J} G_j$ is the average of the image processing operations.

Mean of all increased samples: $m' = \bar{G} m$.

Scatter matrix of class $i$ for increased samples:
$S'_i = \bar{G} (S_i - R_i) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_i G_j^T$,
where $R_i = \frac{1}{n_i} \sum_{x \in X_i} x x^T$ and $R_W = \sum_i^c R_i$.

Within-class scatter matrix for increased samples:
$S'_W = \bar{G} (S_W - R_W) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_W G_j^T$

Between-class scatter matrix for increased samples:
$S'_B = \bar{G} S_B \bar{G}^T$

$S_W$, $S_B$ are scatter matrices for the original (non-increased) samples.

Covariance matrix:
$X' = \bar{G} (X - R_{all}) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_{all} G_j^T$

Dimensionality reduction ($P$: PCA projection matrix, $A$: LDA):
$\tilde{S}'_i = A^T P^T S'_i P A$, $\tilde{S}'_W = A^T P^T S'_W P A$, $\tilde{S}'_B = A^T P^T S'_B P A$
$y_j = A^T P^T x_j = A^T P^T G_j x$
$\tilde{m}'_i = A^T P^T m'_i = A^T P^T \bar{G} m_i$, $\tilde{m}' = A^T P^T m' = A^T P^T \bar{G} m$

Generalized eigenvalue problem: $(P^T S'_W P)^{-1} P^T S'_B P$. Rayleigh quotient: $\mathrm{tr}(\tilde{S}'_B) / \mathrm{tr}(\tilde{S}'_W)$.

Pipeline: training samples → scatter matrices $S_W, S_B$ → PCA → LDA feature space.

Given $\{G_j\}$, we don't need to actually increase the training samples, but we do need more memory to store the generating matrices…
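To make the bookkeeping concrete, here is a minimal numpy sketch (not from the poster; all function and variable names are my own) of how $S'_W$, $S'_B$, and the Rayleigh quotient can be computed directly from the generating matrices, without materializing any synthetic samples:

```python
import numpy as np

def increased_scatters(SW, SB, RW, Gs):
    """Scatter matrices of the virtually increased samples x_j = G_j x.

    SW, SB: within-/between-class scatter of the ORIGINAL samples
    RW:     sum over classes of R_i = (1/n_i) sum_{x in X_i} x x^T
    Gs:     list of generating matrices G_j (image processing operators)
    """
    J = len(Gs)
    Gbar = sum(Gs) / J                                 # average operator G-bar
    SWp = (Gbar @ (SW - RW) @ Gbar.T
           + sum(G @ RW @ G.T for G in Gs) / J)        # S'_W
    SBp = Gbar @ SB @ Gbar.T                           # S'_B
    return SWp, SBp

def rayleigh_quotient(SWp, SBp, P, A):
    """E = tr(S~'_B) / tr(S~'_W) after PCA (P) and LDA (A) projections."""
    D = P @ A
    return np.trace(D.T @ SBp @ D) / np.trace(D.T @ SWp @ D)
```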

Analysis of IPO: the spectral decomposition

Definition 1. Let $f(x), g(x) \in L^2(\mathbb{R}^2)$ be complex-valued 2D functions, where $x \in \mathbb{R}^2$. The inner product is defined as
$(f, g) \equiv \int_{\mathbb{R}^2} f(x) \overline{g(x)} \, dx$,
where $\overline{g}$ is the complex conjugate of $g$. An operator $G : f \mapsto g$ is linear if it satisfies $G(af + bg) = aG(f) + bG(g)$ for all $a, b \in \mathbb{R}$. $G^*$ is the adjoint operator of $G$ if it satisfies $(Gf, g) = (f, G^* g)$.

Corollary 1. Filtering or geometric transformation operators $G$ are normal operators, which satisfy $G^* G = G G^*$.

$G = \sum_i \lambda_i P_i$: a normal operator can be decomposed into projection operators!
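As a quick numerical illustration of this decomposition (my own, not from the poster): eigendecompose a normal matrix and rebuild it from rank-one projections. This assumes distinct eigenvalues so that numpy's eig returns an orthonormal eigenbasis.

```python
import numpy as np

# A random orthogonal matrix is normal (G G^T = G^T G) and, almost surely,
# has distinct eigenvalues, so eig returns an orthonormal eigenbasis.
rng = np.random.default_rng(0)
G, _ = np.linalg.qr(rng.standard_normal((8, 8)))

w, V = np.linalg.eig(G)                        # complex eigenvalues
recon = sum(w[i] * np.outer(V[:, i], V[:, i].conj())   # lambda_i * P_i
            for i in range(len(w)))
print(np.allclose(G, recon))                   # True: G = sum_i lambda_i P_i
```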

[Figure: (a) $x$, (b) $Gx$, (c) $G^T G x$, (d) $G^T x$; eigenoperator responses $x$, $E_1 x$, $E_2 x$, $E_3 x$, $E_4 x$, $E_5 x$, $E_6 x$]

[Figure: eigenvalue vs. index plots for (a) $H_{11}, H_{21}$, (b) $H_{12}, H_{22}$, (c) $H_{13}, H_{23}$, with eigenprojection images $P_{11i}x$, $P_{21i}x$, $P_{12i}x$, $P_{22i}x$, $P_{13i}x$, $P_{23i}x$]

But is this feasible for a generating matrix? Yes!

Is a filtering Hermitian? $\|G - G^T\| < 10^{-6}$: almost symmetric.

Is a geometric transformation unitary? Its transpose is approximately its inverse.

Are the eigenvalues complex? Use the Hermitian decomposition, which splits an operator into two Hermitian operators (which have real eigenvalues):
$G = H_1 + iH_2, \quad H_1 = \frac{G + G^T}{2}, \quad H_2 = \frac{G - G^T}{2i}, \quad i = \sqrt{-1}$

So, a two-step approximation:
$G \approx \sum_j a_j E_j = \sum_j a_j (H_{1j} + i H_{2j}) \approx \sum_j a_j \sum_i (\lambda_{1ji} P_{1ji} + i \lambda_{2ji} P_{2ji})$
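A numpy sketch of this approximation (illustrative names; a single operator $G$ rather than a learned combination of eigenoperators $E_j$):

```python
import numpy as np

def hermitian_split(G):
    """G = H1 + i*H2 with H1, H2 Hermitian (hence real eigenvalues)."""
    H1 = (G + G.T) / 2
    H2 = (G - G.T) / 2j
    return H1, H2

def spectral_approx(G, r):
    """Keep only the r largest-magnitude eigenvalues of each Hermitian part:
    G ~ sum_i (l1_i P1_i + i * l2_i P2_i), with rank-one projections P."""
    approx = np.zeros(G.shape, dtype=complex)
    for H, unit in zip(hermitian_split(G), (1.0, 1j)):
        w, V = np.linalg.eigh(H)                   # real eigenvalues
        for i in np.argsort(-np.abs(w))[:r]:
            approx += unit * w[i] * np.outer(V[:, i], V[:, i].conj())
    return approx
```

Truncating at a small r is what makes the stored operators compressible, as the examples below note.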

Examples

Real eigenvalues can be small, so we can compress them. Eigenprojections of eigenoperators transform images into … wavelets?

Eigenoperators transform images into variants.

Q: To reduce the memory cost of generating matrices, can we use a decomposition for operators just like for images?

A: Yes.

LDA + IPO = LDIPOA: find a set of discriminative IPOs

$\bar{G}^{(k)} = \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)}, \quad D = PA$

$\tilde{S}'^{(k)}_W = D^T \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} D + \frac{1}{k+1} \sum_{l=0}^{k} D^T G^{(l)} R_W G^{(l)T} D$

$\tilde{S}'^{(k)}_B = D^T \bar{G}^{(k)} S_B \bar{G}^{(k)T} D$

$X'^{(k)} = \bar{G}^{(k)} (X - R_{all}) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_{all} G^{(l)T}$

$S'^{(k)}_W = \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_W G^{(l)T}$

$S'^{(k)}_B = \bar{G}^{(k)} S_B \bar{G}^{(k)T}$

Algorithm 1 LDIPOA
1: Compute PCA $P$ and LDA $A$. $G^{(0)} \leftarrow I$.
2: for $k = 1, \dots$ do
3:   repeat
4:     $\alpha$ step: $\alpha^{(k)} = \mathrm{argmax}_\alpha E(A, P, \alpha)$
5:     PCA step: compute $P$ with $\alpha^{(k)}$
6:     LDA step: $A = \mathrm{argmax}_A E(A, P, \alpha^{(k)})$
7:   until $E$ converges
8: end for


At each step $k$, estimate a single generating matrix represented as a linear combination:
$G^{(k)} = \sum_{j}^{J} \alpha_j^{(k)} G_j, \quad \alpha^{(k)} = (\alpha_1^{(k)}, \alpha_2^{(k)}, \dots, \alpha_J^{(k)})^T$

Experiments with FERET dataset

The proposed algorithm iteratively estimates α (the coefficients of the generating matrices), P (PCA), and A (LDA) at the same time.

k: the number of estimated generating matrices

10 generating matrices are used to increase the dataset 11 times.

1 generating matrix is used to double the dataset.

No generating matrices are used (normal LDA)

The Rayleigh quotient

$x_j = G_j x$

Proposition 1. A filtering is defined as
$Gf(x) = \int G(x, y) f(y) \, dy$,
where the kernel is symmetric, $G(x, y) = G(y, x)$, and real-valued. $G$ is a Hermitian operator which satisfies $G^* = G$.

Proposition 2. A geometric (affine) transformation $G$ is defined as
$Gf(x) = |A|^{1/2} f(Ax + t)$,
where $|A| \neq 0$. $G$ is a unitary operator which satisfies $G^* G = I$.
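A small numerical check of both propositions in matrix form (my own construction, not from the poster): circular convolution with a symmetric Gaussian kernel yields a symmetric generating matrix, and a circular shift, the simplest unitary geometric transform, yields an orthogonal one.

```python
import numpy as np

n = 64
# Proposition 1: circular convolution with a symmetric, real Gaussian kernel.
t = np.minimum(np.arange(n), n - np.arange(n))        # circular distance to 0
kernel = np.exp(-t**2 / (2 * 2.0**2))
kernel /= kernel.sum()
G_filt = np.stack([np.roll(kernel, s) for s in range(n)])  # circulant matrix

print(np.linalg.norm(G_filt - G_filt.T))              # ~0: Hermitian, G* = G

# Proposition 2: a circular shift as the simplest unitary geometric transform.
G_geo = np.roll(np.eye(n), 3, axis=1)                 # permutation matrix
print(np.linalg.norm(G_geo.T @ G_geo - np.eye(n)))    # ~0: unitary, G*G = I
```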

[Figure: real and imaginary parts of example generating matrices]

Experimental settings:
• Size of images: 32x32
• Size of generating matrices: 1024x1024
• Number of classes: 1001 (fa)
• Training images per class: 1 (fa)
• Test images per class: 1 (fb)
• Eigen-generating matrices: 96
• Initial generating matrices: 567 (3 scalings, 7 rotations, 3 Gaussian and 9 motion blurs)
• Classifier: nearest neighbor
• PCA rates: 80% and 95% for eigen-generating matrices (G-PCA) and for the PCA step (LDA-PCA)

The Rayleigh quotient is maximized in a few steps.

A few generating matrices are enough to improve the performance.

Bad approximations of the generating matrices do not lead to any improvement…

