
Source: web.eecs.umich.edu/~hero/Preprints/fsu_slides01.pdf


Divergence matching criteria for indexing and registration

Alfred O. Hero

Dept. EECS, Dept Biomed. Eng., Dept. Statistics

University of Michigan - Ann Arbor

[email protected]

http://www.eecs.umich.edu/~hero

Collaborators: Olivier Michel, Bing Ma, Huzefa Neemuchwala

Outline

1. Statistical framework: entropy measures, error exponents

2. α-MI indexing using single pixel gray levels

3. α-MI indexing via coincident features

4. α-entropy and α-MI estimation via MST

(a) Image $I_1$ (b) Image $I_0$ (c) Registration result

Figure 1: A multidate image registration example

Statistical Framework

• $I$: an image

• $Z = Z(I)$: an image feature vector over $[0,1]^d$

• $I_R$: a reference image, with feature $Z_R$

• $\{I_i\}$: a database of $K$ images, with features $Z_i$

• $f(Z_R, Z_i)$: joint feature density

For a pair $Z_R$ and $Z_i$ we have two hypotheses:

$H_0: f(Z_R, Z_i) = f(Z_R) f(Z_i)$

$H_1: f(Z_R, Z_i) \neq f(Z_R) f(Z_i)$

Divergence Measures

Refs: [Csiszar:67, Basseville:SP89]

Define densities

$f_1 = f(Z_R, Z_i), \quad f_0 = f(Z_R) f(Z_i)$

The Rényi α-divergence of fractional order $\alpha \in [0,1]$ [Renyi:61,70]

$$D_\alpha(f_1 \| f_0) = \frac{1}{\alpha-1} \ln \int f_0 \left(\frac{f_1}{f_0}\right)^{\alpha} dx = \frac{1}{\alpha-1} \ln \int f_1^{\alpha} f_0^{1-\alpha}\, dx$$

Rényi α-Divergence: Special cases

• α-Divergence vs. α-Entropy

$$H_\alpha(f_1) = \frac{1}{1-\alpha} \ln \int f_1^{\alpha}\, dx = -D_\alpha(f_1 \| f_0)\big|_{f_0 = U([0,1]^d)}$$

• α-Divergence vs. Bhattacharyya-Hellinger distance

$$D_{1/2}(f_1 \| f_0) = \ln \left(\int \sqrt{f_1 f_0}\, dx\right)^{-2}$$

$$D^2_{BH}(f_1 \| f_0) = \int \left(\sqrt{f_1} - \sqrt{f_0}\right)^2 dx = 2\left(1 - \int \sqrt{f_1 f_0}\, dx\right)$$

• α-Divergence vs. Kullback-Leibler divergence

$$\lim_{\alpha \to 1} D_\alpha(f_1 \| f_0) = \int f_1 \ln \frac{f_1}{f_0}\, dx.$$
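The special cases can be checked numerically. The sketch below is an illustration only: it replaces the densities by discrete probability vectors (an assumption, so integrals become sums), and the function names and the example vectors `p1`, `p0` are hypothetical.

```python
import math

def renyi_divergence(p1, p0, alpha):
    """Discrete analogue of D_alpha(f1 || f0) for alpha in (0, 1)."""
    s = sum((a ** alpha) * (b ** (1.0 - alpha)) for a, b in zip(p1, p0))
    return math.log(s) / (alpha - 1.0)

def kl_divergence(p1, p0):
    """Kullback-Leibler divergence, the alpha -> 1 limit."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p0) if a > 0)

def hellinger_sq(p1, p0):
    """Squared Bhattacharyya-Hellinger distance."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p1, p0))

# Hypothetical example distributions (both strictly positive).
p1 = [0.5, 0.3, 0.2]
p0 = [0.2, 0.3, 0.5]

# Bhattacharyya coefficient: sum of sqrt(p1 * p0).
bc = sum(math.sqrt(a * b) for a, b in zip(p1, p0))

d_half = renyi_divergence(p1, p0, 0.5)          # equals -2 ln(bc)
d_near_one = renyi_divergence(p1, p0, 0.999999)  # close to the KL divergence
```

With these definitions, `d_half` agrees with `-2 * math.log(bc)`, and `hellinger_sq(p1, p0)` agrees with `2 * (1 - bc)`.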

Rényi α-divergence and Error Exponents

Observe i.i.d. sample $W = [W_1, \ldots, W_n]$

$H_0: W_j \sim f_0(w)$

$H_1: W_j \sim f_1(w)$

Bayes probability of error

$P_e(n) = \beta(n) P(H_1) + \alpha(n) P(H_0)$

LDP gives Chernoff bound [Dembo&Zeitouni:98]

$$\liminf_{n \to \infty} \frac{1}{n} \log P_e(n) = -\sup_{\alpha \in [0,1]} \left\{ (1-\alpha) D_\alpha(f_1 \| f_0) \right\}.$$
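As a rough illustration of the exponent, the supremum over α can be approximated by a grid search. This is a sketch under assumptions: the densities are replaced by discrete probability vectors, and `chernoff_exponent`, the grid size, and the example vectors are hypothetical. It uses the identity $(1-\alpha) D_\alpha(f_1 \| f_0) = -\ln \int f_1^{\alpha} f_0^{1-\alpha} dx$.

```python
import math

def chernoff_exponent(p1, p0, grid=1000):
    """Grid search for sup over alpha in (0, 1) of (1 - alpha) * D_alpha(p1 || p0)."""
    best_alpha, best_val = 0.0, 0.0
    for i in range(1, grid):
        a = i / grid
        s = sum((x ** a) * (y ** (1.0 - a)) for x, y in zip(p1, p0))
        val = -math.log(s)  # equals (1 - a) * D_a(p1 || p0)
        if val > best_val:
            best_alpha, best_val = a, val
    return best_alpha, best_val

# Hypothetical example: p0 is p1 reversed, so the optimum is alpha = 1/2 by symmetry.
p1 = [0.5, 0.3, 0.2]
p0 = [0.2, 0.3, 0.5]
alpha_star, exponent = chernoff_exponent(p1, p0)
```

For this symmetric pair the maximizing α is 1/2, where the exponent reduces to minus the log of the Bhattacharyya coefficient.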

Registration via α-Mutual-Information

Ref: Viola&Wells:ICCV95

1. Reference $I_R$ and target $I_T$ images.

2. Set of rigid transformations $\{T_i\}$

3. Derived feature vectors

$Z_R = Z(I_R), \quad Z_i = Z(T_i(I_T))$

$H_0: \{Z_R, Z_i\}$ independent

$H_1: \{Z_R, Z_i\}$ dependent

Error exponent is α-MI (Neemuchwala&etal:ICIP01, Pluim&etal:SPIE01)

$$MI_\alpha(Z_R; Z_i) = \frac{1}{\alpha-1} \ln \int f^{\alpha}(Z_R, Z_i) \left(f(Z_R) f(Z_i)\right)^{1-\alpha} dZ_R\, dZ_i.$$
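For discrete features such as quantized gray levels, the α-MI integral can be approximated by a histogram plug-in. The following is a minimal sketch, not the authors' implementation: `alpha_mi` and the toy samples are hypothetical, and the joint and marginal densities are replaced by empirical frequencies of paired samples.

```python
import math
from collections import Counter

def alpha_mi(xs, ys, alpha):
    """Histogram plug-in alpha-MI from paired discrete samples, alpha in (0, 1)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # empirical joint histogram
    px = Counter(xs)               # empirical marginals
    py = Counter(ys)
    s = 0.0
    for (x, y), count in joint.items():
        p_joint = count / n
        p_prod = (px[x] / n) * (py[y] / n)
        # Cells with zero joint mass contribute nothing for alpha > 0.
        s += (p_joint ** alpha) * (p_prod ** (1.0 - alpha))
    return math.log(s) / (alpha - 1.0)

# Toy data: identical gray levels (fully dependent) vs. independent levels.
levels = [0, 1, 2, 3] * 25
mi_dependent = alpha_mi(levels, levels, 0.5)
mi_independent = alpha_mi([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25, 0.5)
```

In a registration loop, one would evaluate `alpha_mi` on the gray-level pairs produced by each candidate transformation $T_i$ and keep the transformation with the largest value.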

Registration via α-Jensen-Difference

Ref: Ma&etal:ICIP00, He&etal:SigProc01

• Jensen's difference between $f_0, f_1$:

$$\Delta J_\alpha = H_\alpha(\epsilon f_1 + (1-\epsilon) f_0) - \epsilon H_\alpha(f_1) - (1-\epsilon) H_\alpha(f_0) \geq 0$$

• $f_0, f_1$ are two densities, $\epsilon$ satisfies $0 \leq \epsilon \leq 1$

• Let $X, Y$ be i.i.d. features extracted from two images

$X_m = \{X_1, \ldots, X_m\}, \quad Y_n = \{Y_1, \ldots, Y_n\}$

• Each realization in the unordered sample $Z = \{X_m, Y_n\}$ has marginal

$$f_Z(z) = \epsilon f_X(z) + (1-\epsilon) f_Y(z), \quad \epsilon = \frac{m}{n+m}$$
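The α-Jensen difference can be evaluated directly for discrete distributions. This sketch assumes probability vectors in place of densities; `renyi_entropy`, `jensen_difference`, and the example values are hypothetical names chosen for illustration.

```python
import math

def renyi_entropy(p, alpha):
    """Discrete Renyi alpha-entropy for alpha in (0, 1)."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def jensen_difference(p1, p0, eps, alpha):
    """H_alpha of the mixture minus the mixture of the H_alpha values."""
    mix = [eps * a + (1.0 - eps) * b for a, b in zip(p1, p0)]
    return (renyi_entropy(mix, alpha)
            - eps * renyi_entropy(p1, alpha)
            - (1.0 - eps) * renyi_entropy(p0, alpha))

# Hypothetical feature distributions from two images; eps = m / (m + n).
p1 = [0.5, 0.3, 0.2]
p0 = [0.2, 0.3, 0.5]
m, n = 100, 100
eps = m / (m + n)

dj_different = jensen_difference(p1, p0, eps, 0.5)
dj_identical = jensen_difference(p1, p1, eps, 0.5)
```

The difference vanishes when the two distributions coincide and is positive otherwise, which is why it is minimized, not maximized, during registration.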


Ultrasound Registration Example

Figure 2: Three ultrasound breast scans. From top to bottom are: case 151,

case 142 and case 162.

(a) Image $X_1$ (b) Image $X_0$

Figure 3: Single Pixel Coincidences (Left and right: 0° and 8° rotations)

Grey Level Scatterplots

[Plots: four grey level scatterplots, both axes running from 50 to 250]

Figure 4: Grey level scatterplots. 1st Col: target = reference slice. 2nd Col: target = reference+1 slice.

[Plot: α-divergence (α-MI) versus angle of rotation (deg) for single pixel intensity, one curve per α = 0, 0.1, ..., 0.9]

Figure 5: α-Divergence as function of angle for ultrasound image registration of image 142

[Plot: curvature of α-MI versus α for single pixel intensity]

Figure 6: Resolution of α-Divergence as function of alpha

Higher Level Features

Disadvantages of gray level features:

• Depend only on the histogram of single pixel pairs

• Insensitive to spatial reordering of pixels in each image

• Difficult to select out grey level anomalies (shadows, speckle)

• Spatial discriminants fall outside of single pixel domain

• Alternative: spatial-feature-based indexing

Local Tags

(a) Image $X_0$ (b) Image $X_i$

Figure 7: Local Tag Coincidences

Spatial Relations Between Local Tags

(a) Image $X_0$ (b) Image $X_i$ (each panel labelled with relations N, E, SE, S, W around reference tag R)

Figure 8: Spatial Relation Coincidences

Feature: spatial tags via feature trees

Figure 9: Part of feature tree data structure (root node, depth 1, depth 2; some branches not examined further).

Figure 10: Leaves of feature tree data structure (terminal nodes, depth 16).

Feature: projection-coefficient wrt ICA basis

Figure 11: Estimated ICA basis set for ultrasound breast image database

Simple Example

[Plot: MST length and Shannon MI versus relative rotation between images; plot title: Performance of higher order features v/s single pixels]

Figure 12: 10 basis composite images with noise and MI vs Feature divergence objective functions

US Registration Comparisons

151 142 162 151/8 151/16 151/32
pixel 0.6/0.9 0.6/0.3 0.6/0.3
tag 0.5/3.6 0.5/3.8 0.4/1.4
spatial-tag 0.99/14.6 0.99/8.4 0.6/8.3
ICA 0.7/4.1 0.7/3.9 0.99/7.7

Table 1: Numerator = optimal value of α and Denominator = maximum resolution of mutual α-information for registering various images (Cases 151, 142, 162) using various features (pixel, tag, spatial-tag, ICA). 151/8, 151/16, 151/32 correspond to the ICA algorithm with 8, 16 and 32 basis elements run on case 151.

Feature-based Indexing: Challenges

• How to best select discriminating features?

– Require training database of images to learn feature set

– Apply cross-validation...

– ...bagging, boosting, or randomized selection?

• How to compute α-MI for multi-dimensional features?

– Tag space is of high cardinality: $256^{16} \geq 10^{32}$

– ICA projection-coefficient space is multi-dimensional continuum

– Soln 1: partition feature space and count coincidences...

– Soln 2: apply kernel density estimation and plug in to the α-MI or α-Jensen formula

– Soln 3: estimate α-MI or α-Jensen directly via MST.

Methods of Entropy/Divergence Estimation

• $Z = (Z_R, Z_T)$: a statistic (feature pair)

• $\{Z_i\}$: $n$ i.i.d. realizations from $f(Z)$

Objective: Estimate

$$H_\alpha(f) = \frac{1}{1-\alpha} \ln \int f^{\alpha}(x)\, dx$$

1. Parametric density estimation methods

2. Non-parametric density estimation methods

3. Non-parametric minimal-graph estimation methods

Non-parametric estimation methods

Given i.i.d. sample $X = \{X_1, \ldots, X_n\}$

Density "plug-in" estimator

$$H_\alpha(\hat{f}_n) = \frac{1}{1-\alpha} \ln \int_{\mathbb{R}^d} \hat{f}_n^{\alpha}(x)\, dx$$

Previous work limited to Shannon entropy $H(f) = -\int f(x) \ln f(x)\, dx$

• Histogram plug-in [Gyorfi&VanDerMeulen:CSDA87]

• Kernel density plug-in [Ahmad&Lin:IT76]

• Sample-spacing plug-in [Hall:JMS86] ($d = 1$)

• Performance degrades as density $f$ becomes non-smooth

• Unclear how to robustify $\hat{f}$ against outliers

• $d$-dimensional integration might be difficult

• $\Rightarrow$ function $\{f(x) : x \in \mathbb{R}^d\}$ over-parameterizes entropy functional

Direct α-entropy estimation

• MST estimator of α-entropy [Hero&Michel:IT99]:

$$\hat{H}_\alpha = \frac{1}{1-\alpha} \ln \left( L_\gamma(X_n) / n^{\alpha} \right)$$

(up to the additive constant $-\ln \beta_{L_\gamma,d} / (1-\alpha)$, cf. the BHH theorem)

• Direct entropy estimator: faster convergence for nonsmooth densities

• Parameter α is varied by varying interpoint distance measure

• Optimally pruned k-MST graphs robustify $\hat{f}$ against outliers

• Greedy multi-scale MST approximations reduce combinatorial complexity

Minimal Graphs: Minimal Spanning Tree (MST)

Let $\mathcal{M}_n = \mathcal{M}(X_n)$ denote the possible sets of edges in the class of acyclic graphs spanning $X_n$ (spanning trees).

The Euclidean Power Weighted MST achieves

$$L_{MST}(X_n) = \min_{\mathcal{M}_n} \sum_{e \in \mathcal{M}_n} \|e\|^{\gamma}.$$
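A minimal sketch of the power-weighted MST length follows, using Prim's algorithm in $O(n^2)$ time (an assumption for brevity; the slides discuss Kruskal's algorithm). Because $t \mapsto t^{\gamma}$ is increasing, the tree that minimizes total Euclidean length also minimizes the power-weighted sum. The helper `entropy_statistic` is hypothetical: it returns $\frac{1}{1-\alpha} \ln (L_\gamma / n^{\alpha})$ with $\alpha = (d-\gamma)/d$, which omits the additive constant involving $\beta_{L_\gamma,d}$.

```python
import math
import random

def mst_length(points, gamma=1.0):
    """Sum of ||e||^gamma over the Euclidean MST edges (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def entropy_statistic(points, gamma=1.0):
    """(1 / (1 - alpha)) * ln(L_gamma / n^alpha), constant term omitted."""
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d
    return math.log(mst_length(points, gamma) / n ** alpha) / (1.0 - alpha)

random.seed(0)
sample = [(random.random(), random.random()) for _ in range(50)]
shrunk = [(0.5 * x, 0.5 * y) for x, y in sample]   # same sample, scaled by 1/2
```

Scaling the sample by 1/2 in $d = 2$ lowers the statistic by $2 \ln 2$, matching the shift of entropy under a change of scale.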

[Plots: two sets of 128 random samples on the unit square (axes $z_1$, $z_2$), each shown next to its MST]

Figure 13:

Convergence of MST

[Plots: MST length versus N for uniform (red) and triangular (blue) distributions; MST normalized compensated length versus N]

Figure 14:

Figure 15: Continuous quasi-additive Euclidean functional satisfies "self-similarity" property on any scale.

Asymptotics: the BHH Theorem and entropy estimation

Theorem 1 [Beardwood&etal:Camb59, Steele:95, Redmond&Yukich:SPA96] Let $L$ be a continuous quasi-additive Euclidean functional with power-exponent $\gamma$, and let $X_n = \{X_1, \ldots, X_n\}$ be an i.i.d. sample drawn from a distribution on $[0,1]^d$ with an absolutely continuous component having (Lebesgue) density $f(x)$. Then

$$\lim_{n \to \infty} L_\gamma(X_n) / n^{(d-\gamma)/d} = \beta_{L_\gamma,d} \int f(x)^{(d-\gamma)/d}\, dx, \quad (a.s.) \tag{1}$$

Or, letting $\alpha = (d-\gamma)/d$,

$$\lim_{n \to \infty} L_\gamma(X_n) / n^{\alpha} = \beta_{L_\gamma,d} \exp\left((1-\alpha) H_\alpha(f)\right), \quad (a.s.)$$

Asymptotics: Plug-in estimation of $H_\alpha(f)$

Class of Hölder continuous functions over $[0,1]^d$

$$\Sigma_d(\kappa, c) = \left\{ f(x) : |f(x) - p_x^{\lfloor \kappa \rfloor}(z)| \leq c\, \|x - z\|^{\kappa} \right\}$$

Proposition 1 (Hero&Ma:IT01) Assume that $f^{\alpha} \in \Sigma_d(\kappa, c)$. Then, if $\hat{f}^{\alpha}$ is a minimax estimator,

$$\sup_{f^{\alpha} \in \Sigma_d(\kappa,c)} E^{1/p}\left[ \left| \int \hat{f}^{\alpha}(x)\, dx - \int f^{\alpha}(x)\, dx \right|^{p} \right] = O\left(n^{-\kappa/(2\kappa+d)}\right)$$

Asymptotics: Minimal-graph estimation of $H_\alpha(f)$

Proposition 2 (Hero&Ma:IT01) Let $d \geq 2$ and $\alpha = (d-\gamma)/d \in [1/2, (d-1)/d]$. Assume that $f^{\alpha} \in \Sigma_d(\kappa, c)$ where $\kappa \geq 1$ and $c < \infty$. Then for any continuous quasi-additive Euclidean functional $L_\gamma$

$$\sup_{f^{\alpha} \in \Sigma_d(\kappa,c)} E^{1/p}\left[ \left| \frac{L_\gamma(X_1, \ldots, X_n)}{n^{\alpha}} - \beta_{L_\gamma,d} \int f^{\alpha}(x)\, dx \right|^{p} \right] \leq O\left(n^{-2/(3d)}\right)$$

Conclude: minimal-graph estimator converges faster for

$$\kappa < \frac{2d}{3d-4}$$
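The threshold on $\kappa$ follows from comparing the two rate exponents. A short derivation, assuming both bounds are taken at face value as the convergence rates of the respective estimators:

```latex
% Plug-in rate:        n^{-\kappa/(2\kappa+d)}
% Minimal-graph rate:  n^{-2/(3d)}
% The minimal-graph bound decays faster when its exponent is larger:
\frac{2}{3d} > \frac{\kappa}{2\kappa + d}
\iff 2(2\kappa + d) > 3d\,\kappa
\iff 2d > \kappa\,(3d - 4)
\iff \kappa < \frac{2d}{3d-4}, \qquad d \geq 2 .
```

For example, $d = 2$ gives $\kappa < 2$ and $d = 3$ gives $\kappa < 6/5$; the threshold decreases toward $2/3$ as $d$ grows.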

Observations

• Minimal graph rates valid for MST, k-NN graph, TSP, Steiner Tree, etc.

• Analogous rate bound holds for progressive-resolution algorithm

$$L^{G}_{\gamma}(X_n) = \sum_{i=1}^{m^d} L_\gamma(X_n \cap Q_i)$$

where $\{Q_i\}$ is a uniform partition of $[0,1]^d$ into cells of volume $1/m^d$

• Optimal sequence of cell volumes is:

$$m^{-d} = n^{-1/3}$$

Computational Acceleration of MST

[Plot: selection of nearest neighbors for MST using a disc of radius r = 0.15 (axes $z_0$, $z_1$)]

Figure 16: Acceleration of Kruskal's MST algorithm from $n^2 \log n$ to $n \log n$.
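The disc idea can be sketched as follows: only pairs of points closer than a radius are kept as candidate edges before running Kruskal's algorithm, so far fewer edges need sorting. This is an illustrative sketch for 2D points, not the authors' code; the grid bucketing, the function names, and the radius value are assumptions. If the radius is too small the candidate graph becomes disconnected and the result is only a spanning forest.

```python
import math
import random

def candidate_edges(points, radius=None):
    """All pairs if radius is None, else only pairs within `radius` (grid lookup)."""
    n = len(points)
    if radius is None:
        return [(math.dist(points[i], points[j]), i, j)
                for i in range(n) for j in range(i + 1, n)]
    cells = {}
    for idx, (x, y) in enumerate(points):
        cells.setdefault((int(x // radius), int(y // radius)), []).append(idx)
    edges = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # each unordered pair is visited once
                            d = math.dist(points[i], points[j])
                            if d <= radius:
                                edges.append((d, i, j))
    return edges

def kruskal_length(points, radius=None, gamma=1.0):
    """Kruskal's algorithm with union-find; returns (length, number of edges used)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    total, used = 0.0, 0
    for d, i, j in sorted(candidate_edges(points, radius)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += d ** gamma
            used += 1
    return total, used

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(200)]
full_len, full_used = kruskal_length(pts)
disc_len, disc_used = kruskal_length(pts, radius=0.3)
```

When every MST edge is shorter than the radius, the restricted run returns the same tree as the full run while sorting fewer candidate edges.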

[Plot: execution time in seconds versus number of points N (uniform distribution), standard Kruskal vs. modified Kruskal (radius = 0.09)]

Figure 17: Comparison of Kruskal's MST to our $n \log n$ MST algorithm.

[Plot: MST length versus radius of disc, 10000 uniformly distributed points]

Figure 18: Bias of $n \log n$ MST algorithm as function of radius parameter.

Application of MST to US image Registration

1. Extract features from reference and transformed target images:

$X_m = \{X_i\}_{i=1}^{m}$ and $Y_n = \{Y_i\}_{i=1}^{n}$

2. Construct MST on union of $X_m$ and $Y_n$

$L_\gamma(X_m \cup Y_n)$

3. Minimize $L_\gamma$ over transformations producing $Y_n$.

Note: This minimizer converges to minimizer of α-Jensen difference

$$L_\gamma(X_m \cup Y_n) / (m+n)^{\alpha} \to \beta_{L_\gamma,d} \exp\left((1-\alpha) H_\alpha(\epsilon f_x + (1-\epsilon) f_y)\right), \quad (a.s.)$$

where $\epsilon = \frac{m}{m+n}$
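The three steps can be sketched on a toy problem. Everything here is a simplifying assumption made for illustration: the "features" are plain 2D point coordinates, the transformations are rotations about the centre of the unit square, and `mst_length`, `rotate`, and `register` are hypothetical helper names.

```python
import math
import random

def mst_length(points, gamma=1.0):
    """Sum of ||e||^gamma over the Euclidean MST edges (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def rotate(points, degrees, center=(0.5, 0.5)):
    """Rotate 2D points about `center` by the given angle in degrees."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
            for x, y in points]

def register(reference, target, angles, gamma=1.0):
    """Step 2 and 3: MST length on the union, minimized over candidate angles."""
    scores = {a: mst_length(reference + rotate(target, a), gamma) for a in angles}
    return min(scores, key=scores.get), scores

random.seed(0)
reference = [(random.random(), random.random()) for _ in range(40)]
target = rotate(reference, 10.0)   # target is the reference rotated by 10 degrees
best_angle, scores = register(reference, target, range(-20, 21, 5))
```

In this toy setting the MST on the union is shortest when the candidate rotation undoes the misalignment, because aligned feature points nearly coincide and add almost no edge length.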

[Plots (axes X, Y, Intensity): (a) misaligned points, (b) MST demonstration]

Figure 19: MST demonstration for misaligned images

[Plots (axes X, Y, Intensity): (a) aligned points, (b) MST demonstration]

Figure 20: MST demonstration for aligned images

Illustration for Case 142 and Single Pixels

[Plot: normalized MST length and MI (alpha = 0.5) versus relative rotation between images (degrees); single pixel domain, vol. 142]

Figure 21: Single-pixel objective function profiles for MST estimator of α-Jensen difference vs histogram plug-in estimator ($\alpha = 1/2$).

Illustration for Case 142 and 8-D ICA Features

[Plot: MST length and α-Jensen (alpha = 0.5, 4 bins/dimension) versus relative rotation between images (degrees); 8D ICA feature vector, vol. 142]

Figure 22: 8-basis ICA objective function profiles for MST estimator of α-Jensen difference vs histogram plug-in estimator of α-MI ($\alpha = 1/2$).

Illustration for Case 142 and 64-D ICA Features

[Plot: normalized MST length versus relative rotation between images (degrees); 64D ICA feature vector, vol. 142]

Figure 23: 64-basis ICA objective function profiles for MST estimator of α-Jensen difference.

Extension of MST to divergence estimation

1. Let i.i.d. $\{Z_i\}_{i=1}^{n}$ have marginal density $f_1$ on $[0,1]^d$

2. Let $f_0$ dominate density $f_1$

3. Define measure transformation $M$ on $[0,1]^d$ which takes $f_0$ to uniform density

Then $S_i \overset{\mathrm{def}}{=} M(Z_i)$ has α-entropy:

$$(1-\alpha) H_\alpha(S_i) = \ln \int f_S^{\alpha}(s)\, ds = \ln \int \left(f_1(z)/f_0(z)\right)^{\alpha} f_0(z)\, dz$$

Conclude, for $S_n = \{S_1, \ldots, S_n\}$:

$$\hat{D}_\alpha(f_1 \| f_0) = \frac{1}{\alpha-1} \left[ \ln L_\gamma(S_n)/n^{\alpha} - \ln \beta_{L_\gamma,d} \right] \tag{2}$$

is a consistent estimator of α-divergence
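The link between the transformed sample and the divergence is a change of variables. A brief derivation, assuming $M$ is a smooth invertible map whose Jacobian determinant equals $f_0$ (which is what mapping $f_0$ to the uniform density requires):

```latex
% With s = M(z) and ds = f_0(z)\,dz, the density of S = M(Z), Z \sim f_1, is
f_S(s) = \frac{f_1(z)}{f_0(z)} \Big|_{z = M^{-1}(s)} .
% Hence
\int f_S^{\alpha}(s)\, ds
  = \int \left( \frac{f_1(z)}{f_0(z)} \right)^{\alpha} f_0(z)\, dz
  = \int f_1^{\alpha}(z)\, f_0^{1-\alpha}(z)\, dz ,
% and, comparing with the definitions of H_\alpha and D_\alpha,
(1-\alpha)\, H_\alpha(f_S) = (\alpha - 1)\, D_\alpha(f_1 \| f_0)
\quad\Longrightarrow\quad
D_\alpha(f_1 \| f_0) = -H_\alpha(f_S) .
```

An MST-based estimate of the α-entropy of the transformed sample therefore gives, after a sign change, an estimate of the α-divergence.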

Example

[Plots (axes $z_1$, $z_2$): original data; exact inverse transform; transformed data; transformed data with estimated 2D cdf]

Figure 24:

MST robustification against outliers: Pruned MST

Fix $k$, $1 \leq k \leq n$.

Let $\mathcal{M}_{n,k} = \mathcal{M}(x_{i_1}, \ldots, x_{i_k})$ be a minimal graph connecting $k$ distinct vertices $x_{i_1}, \ldots, x_{i_k}$.

The k-MST $T^{*}_{n,k} = T^{*}(x_{i^{*}_1}, \ldots, x_{i^{*}_k})$ is the minimum of all k-point MST's

$$L^{*}_{n,k} = L^{*}(X_{n,k}) = \min_{i_1, \ldots, i_k} \min_{\mathcal{M}_{n,k}} \sum_{e \in \mathcal{M}_{n,k}} \|e\|^{\gamma}$$
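For very small samples the k-MST definition can be evaluated by brute force over all k-point subsets. This is only a sketch of the definition: the enumeration is exponential in $n$ and is not the greedy approximation mentioned elsewhere in these slides, and `k_mst` and the toy points are hypothetical.

```python
import itertools
import math

def mst_length(points, gamma=1.0):
    """Sum of ||e||^gamma over the Euclidean MST edges (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def k_mst(points, k, gamma=1.0):
    """Exhaustive k-MST: the shortest MST over all subsets of k points."""
    best_len, best_subset = math.inf, None
    for subset in itertools.combinations(range(len(points)), k):
        length = mst_length([points[i] for i in subset], gamma)
        if length < best_len:
            best_len, best_subset = length, subset
    return best_len, best_subset

# Six clustered points plus one distant outlier (index 6).
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (0.05, 0.05), (0.15, 0.05), (1.0, 1.0)]
length, kept = k_mst(pts, k=6)
```

With $k = n - 1$ the minimizing subset drops the distant point, which is the outlier-rejection behaviour the pruned MST is meant to provide.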

[Plots (axes $z_1$, $z_2$): k-MST with k = 99 (1 outlier rejected), k = 98 (2 rejected), k = 62 (38 rejected), k = 25 (75 rejected)]

Figure 25: k-MST for 2D annulus density with and without the addition of uniform "outliers".

Extension of BHH to Pruned Graphs

Fix $\alpha \in [0,1]$ and let $k = \lfloor \alpha n \rfloor$. Then as $n \to \infty$ (Hero&Michel:IT99)

$$L(X^{*}_{n,k}) / (\lfloor \alpha n \rfloor)^{\nu} \to \beta_{L_\gamma,d} \min_{A : P(A) \geq \alpha} \int f^{\nu}(x \mid x \in A)\, dx \quad (a.s.)$$

or, alternatively, with

$$H_\nu(f \mid x \in A) = \frac{1}{1-\nu} \ln \int f^{\nu}(x \mid x \in A)\, dx$$

$$L(X^{*}_{n,k}) / (\lfloor \alpha n \rfloor)^{\nu} \to \beta_{L_\gamma,d} \exp\left( (1-\nu) \min_{A : P(A) \geq \alpha} H_\nu(f \mid x \in A) \right) \quad (a.s.)$$

Water Pouring Construction

[Plots versus x: density of X, $f(x)$; water filled density; asymptotic density of X seen by k-MST, $f(x \mid A)$]

Figure 26: Water pouring construction of $f(x \mid A_o)$. Arrows indicate the high water marks indicating high probability regions which are not truncated in the influence function.

k-MST Influence Function for Gaussian Density

[Plots: influence curves IC(x, F, L) for the MST and for the k-MST, planar Gaussian]

Figure 27: MST and k-MST influence curves for Gaussian density on the plane.

k-MST Stopping Rule

[Plots: k-MST length as function of k; selection criterion versus k]

Figure 28: Left: k-MST curve for 2D annulus density with addition of uniform "outliers" has a knee in the vicinity of $n - k = 35$. This knee can be detected using residual analysis from a linear regression line fitted to the left-most part of the curve. Right: error residual of linear regression line.


Conclusions

1. α-divergence for indexing can be justified via decision theory

2. Applicable to feature-based image registration

3. Non-parametric estimation is possible without density estimation via MST

4. MST outperforms plug-in estimation for non-smooth densities

5. Robustified MST can be defined via optimal pruning of MST: k-MST

Divergence vs. Jensen: Asymptotic Comparison

For $\epsilon \in [0,1]$ and $g$ a p.d.f. define

$$f_\epsilon = \epsilon f_1 + (1-\epsilon) f_0, \quad E_g[Z] = \int Z(x) g(x)\, dx, \quad \tilde{f}^{\alpha}_{1/2} = \frac{f^{\alpha}_{1/2}}{\int f^{\alpha}_{1/2}\, dx}$$

Then

$$\Delta J_\alpha = \frac{\alpha \epsilon (1-\epsilon)}{2} \left[ E_{\tilde{f}^{\alpha}_{1/2}}\left( \left[ \frac{f_1 - f_0}{f_{1/2}} \right]^2 \right) + \frac{\alpha}{1-\alpha} \left( E_{\tilde{f}^{\alpha}_{1/2}}\left[ \frac{f_1 - f_0}{f_{1/2}} \right] \right)^2 \right] + O(\Delta)$$

$$D_\alpha(f_1 \| f_0) = \frac{\alpha}{4} \int f_{1/2} \left[ \frac{f_1 - f_0}{f_{1/2}} \right]^2 dx + O(\Delta)$$