Source: web.eecs.umich.edu/~hero/Preprints/fsu_slides01.pdf

Divergence matching criteria for indexing and registration

Alfred O. Hero

Dept. EECS, Dept Biomed. Eng., Dept. Statistics

University of Michigan - Ann Arbor

[email protected]

http://www.eecs.umich.edu/~hero

Collaborators: Olivier Michel, Bing Ma, Huzefa Neemuchwala

Outline

1. Statistical framework: entropy measures, error exponents

2. α-MI indexing using single pixel gray levels

3. α-MI indexing via coincident features

4. α-entropy and α-MI estimation via MST

(a) Image $I_1$ (b) Image $I_0$ (c) Registration result

Figure 1: A multidate image registration example

Statistical Framework

• $I$: an image

• $Z = Z(I)$: an image feature vector over $[0,1]^d$

• $I_R$: a reference image, with feature $Z_R$

• $\{I_i\}$: a database of $K$ images, with features $Z_i$

• $f(Z_R, Z_i)$: joint feature density

For a pair $Z_R$ and $Z_i$ we have two hypotheses:

$$H_0: f(Z_R, Z_i) = f(Z_R)\, f(Z_i)$$

$$H_1: f(Z_R, Z_i) \neq f(Z_R)\, f(Z_i)$$

Divergence Measures

Refs: [Csiszar:67,Basseville:SP89]

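Define densities

$$f_1 = f(Z_R, Z_i), \qquad f_0 = f(Z_R)\, f(Z_i)$$

The Renyi α-divergence of fractional order $\alpha \in [0,1]$ [Renyi:61,70]

$$D_\alpha(f_1 \| f_0) = \frac{1}{\alpha-1} \ln \int f_0 \left(\frac{f_1}{f_0}\right)^{\alpha} dx = \frac{1}{\alpha-1} \ln \int f_1^{\alpha} f_0^{1-\alpha}\, dx$$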

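Renyi α-Divergence: Special cases

• α-Divergence vs. α-Entropy

$$H_\alpha(f_1) = \frac{1}{1-\alpha} \ln \int f_1^{\alpha}\, dx = -D_\alpha(f_1 \| f_0)\Big|_{f_0 = U([0,1]^d)}$$

• α-Divergence vs. Bhattacharyya-Hellinger distance

$$D_{1/2}(f_1 \| f_0) = -\ln \left( \int \sqrt{f_1 f_0}\, dx \right)^{2}$$

$$D_{BH}^2(f_1 \| f_0) = \int \left( \sqrt{f_1} - \sqrt{f_0} \right)^{2} dx = 2 \left( 1 - \int \sqrt{f_1 f_0}\, dx \right)$$

• α-Divergence vs. Kullback-Leibler divergence

$$\lim_{\alpha \to 1} D_\alpha(f_1 \| f_0) = \int f_1 \ln \frac{f_1}{f_0}\, dx.$$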
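The special cases above can be checked numerically on discrete distributions. The following is a minimal sketch, not part of the original slides: the function names and the example distributions `p1` and `p0` are illustrative assumptions, and sums over atoms stand in for the integrals. In the discrete setting the entropy relation picks up a $\ln K$ offset for a uniform reference on $K$ atoms; that offset is zero for the uniform density on $[0,1]^d$.

```python
import math

def renyi_divergence(p1, p0, alpha):
    """Discrete Renyi alpha-divergence D_alpha(p1 || p0), for 0 < alpha < 1."""
    s = sum((a ** alpha) * (b ** (1.0 - alpha))
            for a, b in zip(p1, p0) if a > 0 and b > 0)
    return math.log(s) / (alpha - 1.0)

def renyi_entropy(p, alpha):
    """Discrete Renyi alpha-entropy H_alpha(p), for 0 < alpha < 1."""
    return math.log(sum(a ** alpha for a in p if a > 0)) / (1.0 - alpha)

def kl_divergence(p1, p0):
    """Kullback-Leibler divergence, the alpha -> 1 limit of D_alpha."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p0) if a > 0)

# Hypothetical example distributions on three atoms.
p1 = [0.5, 0.3, 0.2]
p0 = [0.2, 0.3, 0.5]

# alpha = 1/2: D_{1/2} = -2 ln(sum sqrt(p1 p0)), and D_BH^2 = 2 (1 - sum sqrt(p1 p0)).
bc = sum(math.sqrt(a * b) for a, b in zip(p1, p0))
d_half = renyi_divergence(p1, p0, 0.5)
hellinger_sq = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p1, p0))

# alpha close to 1: D_alpha approaches the Kullback-Leibler divergence.
d_near_one = renyi_divergence(p1, p0, 1.0 - 1e-6)
```

Comparing `d_half` with `-2 * math.log(bc)`, `hellinger_sq` with `2 * (1 - bc)`, and `d_near_one` with `kl_divergence(p1, p0)` reproduces the three identities on this example.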

Renyi α-divergence and Error Exponents

Observe i.i.d. sample $W = [W_1, \ldots, W_n]$

$$H_0: W_j \sim f_0(w)$$

$$H_1: W_j \sim f_1(w)$$

Bayes probability of error

$$P_e(n) = \beta(n) P(H_1) + \alpha(n) P(H_0)$$

LDP gives Chernoff bound [Dembo&Zeitouni:98]

$$\liminf_{n \to \infty} \frac{1}{n} \log P_e(n) = -\sup_{\alpha \in [0,1]} \left\{ (1-\alpha) D_\alpha(f_1 \| f_0) \right\}.$$
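Since $(1-\alpha) D_\alpha(f_1 \| f_0) = -\ln \int f_1^{\alpha} f_0^{1-\alpha} dx$, the exponent in the Chernoff bound can be approximated by a one-dimensional search over α. The sketch below is an illustration only: it assumes discrete distributions, uses a plain grid search, and the function name `chernoff_exponent` and the example distributions are hypothetical.

```python
import math

def chernoff_exponent(p1, p0, grid=1000):
    """Grid search for sup over alpha in (0, 1) of (1 - alpha) * D_alpha(p1 || p0).

    Uses the identity (1 - alpha) * D_alpha = -ln(sum p1^alpha * p0^(1 - alpha)).
    Returns (exponent, maximizing alpha).
    """
    best_val, best_alpha = 0.0, 0.0
    for k in range(1, grid):
        alpha = k / grid
        s = sum((a ** alpha) * (b ** (1.0 - alpha))
                for a, b in zip(p1, p0) if a > 0 and b > 0)
        val = -math.log(s)
        if val > best_val:
            best_val, best_alpha = val, alpha
    return best_val, best_alpha

# Hypothetical example: p0 is p1 reversed, so the optimum sits at alpha = 1/2 by symmetry.
p1 = [0.5, 0.3, 0.2]
p0 = [0.2, 0.3, 0.5]
exponent, alpha_star = chernoff_exponent(p1, p0)
```

A larger exponent means the Bayes error decays faster with the sample size, which is the decision-theoretic motivation for using α-divergence as a matching criterion.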

Registration via α-Mutual-Information

Ref: Viola&Wells:ICCV95

1. Reference $I_R$ and target $I_T$ images.

2. Set of rigid transformations $\{T_i\}$

3. Derived feature vectors

$$Z_R = Z(I_R), \qquad Z_i = Z(T_i(I_T))$$

$H_0$: $\{Z_R, Z_i\}$ independent

$H_1$: $\{Z_R, Z_i\}$ dependent

Error exponent is α-MI (Neemuchwala&etal:ICIP01, Pluim&etal:SPIE01)

$$MI_\alpha(Z_R; Z_i) = \frac{1}{\alpha-1} \ln \int f^{\alpha}(Z_R, Z_i) \left( f(Z_R) f(Z_i) \right)^{1-\alpha} dZ_R\, dZ_i.$$
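For single-pixel gray-level features, the α-MI can be approximated by plugging a joint histogram of coincident pixel pairs into the formula above. The following is a minimal sketch under that assumption; the function name `alpha_mi` and the input layout (a list of `(reference_level, target_level)` pairs) are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def alpha_mi(pairs, alpha=0.5):
    """Histogram plug-in estimate of alpha-MI from coincident gray-level pairs.

    `pairs` is a list of (z_r, z_i) tuples of discrete gray levels taken at the
    same pixel locations in the reference and the transformed target image.
    """
    n = len(pairs)
    joint = Counter(pairs)
    marg_r = Counter(r for r, _ in pairs)
    marg_i = Counter(i for _, i in pairs)
    s = 0.0
    for (r, i), count in joint.items():
        p_joint = count / n
        p_prod = (marg_r[r] / n) * (marg_i[i] / n)
        s += (p_joint ** alpha) * (p_prod ** (1.0 - alpha))
    return math.log(s) / (alpha - 1.0)

# Perfectly dependent pairs over 4 equiprobable levels, and an independent pairing.
dependent = [(k, k) for k in range(4)]
independent = [(a, b) for a in range(2) for b in range(2)]
```

In a registration loop one would evaluate `alpha_mi` once per candidate transformation $T_i$ and keep the transformation with the largest value.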

Registration via α-Jensen-Difference

Ref: Ma&etal:ICIP00, He&etal:SigProc01

• Jensen's difference between $f_0$ and $f_1$:

$$\Delta J_\alpha = H_\alpha(\epsilon f_1 + (1-\epsilon) f_0) - \epsilon H_\alpha(f_1) - (1-\epsilon) H_\alpha(f_0) \geq 0$$

• $f_0, f_1$ are two densities, $\epsilon$ satisfies $0 \leq \epsilon \leq 1$

• Let $X, Y$ be i.i.d. features extracted from two images

$$X_m = \{X_1, \ldots, X_m\}, \qquad Y_n = \{Y_1, \ldots, Y_n\}$$

• Each realization in the unordered sample $Z = \{X_m, Y_n\}$ has marginal

$$f_Z(z) = \epsilon f_X(z) + (1-\epsilon) f_Y(z), \qquad \epsilon = \frac{m}{n+m}$$
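The Jensen difference is easy to evaluate once the two densities are available as normalized histograms. The sketch below assumes discrete histograms on a common set of bins; the names `renyi_entropy` and `jensen_difference` are illustrative and not from the slides.

```python
import math

def renyi_entropy(p, alpha):
    """Discrete Renyi alpha-entropy, for 0 < alpha < 1."""
    return math.log(sum(q ** alpha for q in p if q > 0)) / (1.0 - alpha)

def jensen_difference(f1, f0, eps, alpha=0.5):
    """alpha-Jensen difference between two histograms with mixing weight eps."""
    mixture = [eps * a + (1.0 - eps) * b for a, b in zip(f1, f0)]
    return (renyi_entropy(mixture, alpha)
            - eps * renyi_entropy(f1, alpha)
            - (1.0 - eps) * renyi_entropy(f0, alpha))

# Identical histograms give zero; disjoint point masses mixed equally give ln 2.
same = jensen_difference([0.5, 0.3, 0.2], [0.5, 0.3, 0.2], eps=0.3)
disjoint = jensen_difference([1.0, 0.0], [0.0, 1.0], eps=0.5)
```

With feature sets of sizes $m$ and $n$ one would set `eps = m / (m + n)`, matching the mixture weight of the pooled sample.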

Ultrasound Registration Example

Figure 2: Three ultrasound breast scans. From top to bottom: case 151, case 142 and case 162.

(a) Image $X_1$ (b) Image $X_0$

Figure 3: Single Pixel Coincidences (left and right: 0° and 8° rotations)

Grey Level Scatterplots

[Figure: four grey level scatterplot panels; both axes span grey levels 50 to 250.]

Figure 4: Grey level scatterplots. 1st Col: target = reference slice. 2nd Col: target = reference + 1 slice.

[Plot: "α-DIVERGENCE v/s α CURVES FOR α ∈ [0,1] FOR SINGLE PIXEL INTENSITY". x-axis: angle of rotation (deg), −5 to 5; y-axis: α-divergence (α-MI); one curve per α = 0, 0.1, ..., 0.9.]

Figure 5: α-Divergence as a function of angle for ultrasound image registration of image 142

[Plot: "CURVATURE OF: α-MI v/s α CURVE FOR SINGLE PIXEL INTENSITY". x-axis: α, 0 to 0.9; y-axis: curvature of α-MI.]

Figure 6: Resolution of α-Divergence as a function of α

Higher Level Features

Disadvantages of gray level features:

• Only depends on histogram of single pixel pairs

• Insensitive to spatial reordering of pixels in each image

• Difficult to select out grey level anomalies (shadows, speckle)

• Spatial discriminants fall outside of single pixel domain

• Alternative: spatial-feature-based indexing

Local Tags

(a) Image $X_0$ (b) Image $X_i$

Figure 7: Local Tag Coincidences

Spatial Relations Between Local Tags

[Figure: spatial relation labels (N, E, SE, S, W, R) overlaid on (a) Image $X_0$ and (b) Image $X_i$.]

Figure 8: Spatial Relation Coincidences

Feature: spatial tags via feature trees

[Figure: tree diagram with levels labeled Root Node, Depth 1 and Depth 2; some branches are marked "Not examined further".]

Figure 9: Part of feature tree data structure.

[Figure: terminal nodes (Depth 16).]

Figure 10: Leaves of feature tree data structure.

Feature: projection-coefficient wrt ICA basis

Figure 11: Estimated ICA basis set for ultrasound breast image database

Simple Example

[Figure: two image panels (axes 50 to 350 pixels) and a plot titled "Performance of higher order features v/s single pixels". x-axis: relative rotation between images, −8 to 8; y-axis: MST length and Shannon MI, 0 to 1; curves: MST Length, Shannon MI.]

Figure 12: 10 basis composite images with noise and MI vs Feature divergence objective functions

US Registration Comparisons

Feature       151         142        162        151/8   151/16   151/32
pixel         0.6/0.9     0.6/0.3    0.6/0.3
tag           0.5/3.6     0.5/3.8    0.4/1.4
spatial-tag   0.99/14.6   0.99/8.4   0.6/8.3
ICA           0.7/4.1     0.7/3.9    0.99/7.7

Table 1: Numerator = optimal value of α; denominator = maximum resolution of mutual α-information for registering various images (cases 151, 142, 162) using various features (pixel, tag, spatial-tag, ICA). Columns 151/8, 151/16 and 151/32 correspond to the ICA algorithm with 8, 16 and 32 basis elements run on case 151.

Feature-based Indexing: Challenges

• How to best select discriminating features?

– Require training database of images to learn feature set

– Apply cross-validation...

– ...bagging, boosting, or randomized selection?

• How to compute α-MI for multi-dimensional features?

– Tag space is of high cardinality: $256^{16} = 2^{128} \approx 3.4 \times 10^{38}$

– ICA projection-coefficient space is a multi-dimensional continuum

– Soln 1: partition feature space and count coincidences...

– Soln 2: apply kernel density estimation and ...

– ...plug in to the α-MI or α-Jensen formula

– Soln 3: estimate α-MI or α-Jensen directly via MST.

Methods of Entropy/Divergence Estimation

• $Z = (Z_R, Z_T)$: a statistic (feature pair)

• $\{Z_i\}$: $n$ i.i.d. realizations from $f(Z)$

Objective: Estimate

$$H_\alpha(f) = \frac{1}{1-\alpha} \ln \int f^{\alpha}(x)\, dx$$

1. Parametric density estimation methods

2. Non-parametric density estimation methods

3. Non-parametric minimal-graph estimation methods

Non-parametric estimation methods

Given i.i.d. sample $X = \{X_1, \ldots, X_n\}$

Density "plug-in" estimator

$$H_\alpha(\hat{f}_n) = \frac{1}{1-\alpha} \ln \int_{\mathbb{R}^d} \hat{f}_n^{\alpha}(x)\, dx$$

Previous work limited to Shannon entropy $H(f) = -\int f(x) \ln f(x)\, dx$

• Histogram plug-in [Gyorfi&VanDerMeulen:CSDA87]

• Kernel density plug-in [Ahmad&Lin:IT76]

• Sample-spacing plug-in [Hall:JMS86] ($d = 1$)

• Performance degrades as density $f$ becomes non-smooth

• Unclear how to robustify $f$ against outliers

• $d$-dimensional integration might be difficult

• ⇒ the function $\{f(x) : x \in \mathbb{R}^d\}$ over-parameterizes the entropy functional

Direct α-entropy estimation

• MST estimator of α-entropy [Hero&Michel:IT99]:

$$\hat{H}_\alpha = \frac{1}{1-\alpha} \ln \frac{L_\gamma(X_n)}{n^{\alpha}}$$

(up to the additive constant $-\frac{1}{1-\alpha} \ln \beta_{L_\gamma,d}$ that appears in the BHH limit)

• Direct entropy estimator: faster convergence for non-smooth densities

• Parameter α is varied by varying interpoint distance measure

• Optimally pruned k-MST graphs robustify $f$ against outliers

• Greedy multi-scale MST approximations reduce combinatorial complexity

Minimal Graphs: Minimal Spanning Tree (MST)

Let $M_n = M(X_n)$ denote the possible sets of edges in the class of acyclic graphs spanning $X_n$ (spanning trees).

The Euclidean Power Weighted MST achieves

$$L_{MST}(X_n) = \min_{M_n} \sum_{e \in M_n} \|e\|^{\gamma}.$$
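The power-weighted MST length can be computed with any standard MST algorithm, because raising edge lengths to a power γ > 0 does not change their ordering. The following is a minimal sketch using Prim's algorithm in O(n²) time; the function name `mst_length` and the point format (a list of coordinate tuples) are assumptions made for illustration.

```python
import math

def mst_length(points, gamma=1.0):
    """Sum of ||e||^gamma over the edges of the Euclidean MST (Prim, O(n^2)).

    Assumes gamma > 0, so the tree minimizing sum ||e||^gamma is the ordinary MST.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    # best[v] = distance from v to the nearest vertex already in the tree
    best = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        # attach the closest vertex that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

# Corners of the unit square: the MST uses three unit-length sides.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

For the three collinear points `(0, 0)`, `(1, 0)`, `(3, 0)` the tree has edges of length 1 and 2, so γ = 2 gives 1 + 4 = 5.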

Figure 13: [four panels on the unit square with axes z1 and z2: "128 random samples" and the corresponding "MST", shown for two different sample sets]

Convergence of MST

Figure 14: [left panel: "MST length, Unif. dist. (red), Triang. dist (blue)", MST length vs. N from 0 to 1000; right panel: "MST normalized compensated length" vs. N]

Figure 15: Continuous quasi-additive Euclidean functional satisfies "self-similarity" property on any scale.

Asymptotics: the BHH Theorem and entropy estimation

Theorem 1 [Beardwood&etal:Camb59, Steele:95, Redmond&Yukich:SPA96] Let $L$ be a continuous quasi-additive Euclidean functional with power-exponent $\gamma$, and let $X_n = \{X_1, \ldots, X_n\}$ be an i.i.d. sample drawn from a distribution on $[0,1]^d$ with an absolutely continuous component having (Lebesgue) density $f(x)$. Then

$$\lim_{n \to \infty} L_\gamma(X_n) / n^{(d-\gamma)/d} = \beta_{L_\gamma,d} \int f(x)^{(d-\gamma)/d}\, dx \quad (a.s.) \tag{1}$$

Or, letting $\alpha = (d-\gamma)/d$,

$$\lim_{n \to \infty} L_\gamma(X_n) / n^{\alpha} = \beta_{L_\gamma,d} \exp\left( (1-\alpha) H_\alpha(f) \right) \quad (a.s.)$$
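Solving the limit above for $H_\alpha(f)$ suggests the estimator $\hat{H}_\alpha = \frac{1}{1-\alpha} \left[ \ln (L_\gamma(X_n)/n^{\alpha}) - \ln \beta_{L_\gamma,d} \right]$. The sketch below illustrates this for the MST in $d = 2$ with $\gamma = 1$ (so $\alpha = 1/2$). The constant $\beta$ is not given in the slides, so the code calibrates it by Monte Carlo on uniform samples, for which $H_\alpha = 0$; this calibration step, the function names and the sample sizes are all assumptions of this example.

```python
import math
import random

def mst_length(points, gamma=1.0):
    """Power-weighted Euclidean MST length via Prim's algorithm (O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def estimate_beta(n, d, gamma, trials, rng):
    """Monte Carlo stand-in for beta: mean of L / n^alpha over uniform samples."""
    alpha = (d - gamma) / d
    total = 0.0
    for _ in range(trials):
        pts = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        total += mst_length(pts, gamma) / n ** alpha
    return total / trials

def mst_entropy(points, gamma, beta):
    """Entropy estimate obtained by inverting the BHH limit."""
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d
    return (math.log(mst_length(points, gamma) / n ** alpha) - math.log(beta)) / (1.0 - alpha)

rng = random.Random(0)
beta_hat = estimate_beta(n=300, d=2, gamma=1.0, trials=5, rng=rng)
# The uniform density on [0, 0.5]^2 has H_alpha = ln(0.25) for every alpha.
sample = [(0.5 * rng.random(), 0.5 * rng.random()) for _ in range(300)]
h_hat = mst_entropy(sample, gamma=1.0, beta=beta_hat)
```

Calibrating $\beta$ at the same sample size as the data also absorbs part of the finite-sample boundary effect, which is one reason to prefer it over an asymptotic constant in a small experiment like this one.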

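Asymptotics: Plug-in estimation of $H_\alpha(f)$

Class of Hölder continuous functions over $[0,1]^d$

$$\Sigma_d(\kappa, c) = \left\{ f(x) : \left| f(x) - p_x^{\lfloor \kappa \rfloor}(z) \right| \leq c\, \|x - z\|^{\kappa} \right\}$$

Proposition 1 (Hero&Ma:IT01) Assume that $f^{\alpha} \in \Sigma_d(\kappa, c)$. Then, if $\hat{f}^{\alpha}$ is a minimax estimator

$$\sup_{f^{\alpha} \in \Sigma_d(\kappa, c)} E^{1/p} \left[ \left| \int \hat{f}^{\alpha}(x)\, dx - \int f^{\alpha}(x)\, dx \right|^{p} \right] = O\left( n^{-\kappa/(2\kappa+d)} \right)$$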

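Asymptotics: Minimal-graph estimation of $H_\alpha(f)$

Proposition 2 (Hero&Ma:IT01) Let $d \geq 2$ and $\alpha = (d-\gamma)/d \in [1/2, (d-1)/d]$. Assume that $f^{\alpha} \in \Sigma_d(\kappa, c)$ where $\kappa \geq 1$ and $c < \infty$. Then for any continuous quasi-additive Euclidean functional $L_\gamma$

$$\sup_{f^{\alpha} \in \Sigma_d(\kappa, c)} E^{1/p} \left[ \left| \frac{L_\gamma(X_1, \ldots, X_n)}{n^{\alpha}} - \beta_{L_\gamma,d} \int f^{\alpha}(x)\, dx \right|^{p} \right] \leq O\left( n^{-2/(3d)} \right)$$

Conclude: minimal-graph estimator converges faster for

$$\kappa < \frac{2d}{3d-4}$$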
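The threshold follows from comparing the rate exponents in Propositions 1 and 2: the minimal-graph bound decays faster when its exponent $2/(3d)$ exceeds the plug-in exponent $\kappa/(2\kappa+d)$. A short worked derivation, using $3d - 4 > 0$ for $d \geq 2$:

```latex
\frac{2}{3d} > \frac{\kappa}{2\kappa + d}
\iff 2(2\kappa + d) > 3d\,\kappa
\iff 2d > \kappa\,(3d - 4)
\iff \kappa < \frac{2d}{3d - 4}.
```

For example, $d = 2$ gives $\kappa < 2$ and $d = 3$ gives $\kappa < 6/5$.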

Observations

• Minimal graph rates valid for MST, k-NN graph, TSP, Steiner Tree, etc.

• Analogous rate bound holds for progressive-resolution algorithm

$$L^G_\gamma(X_n) = \sum_{i=1}^{m^d} L_\gamma(X_n \cap Q_i)$$

where $\{Q_i\}$ is a uniform partition of $[0,1]^d$ into cells of volume $1/m^d$

• Optimal sequence of cell volumes is:

$$m^{-d} = n^{-1/3}$$

Computational Acceleration of MST

[Figure: scatter of points on the unit square (axes z0, z1), titled "Selection of nearest neighbors for MST using disc", with disc radius r = 0.15.]

Figure 16: Acceleration of Kruskal's MST algorithm from $n^2 \log n$ to $n \log n$.
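The idea in Figure 16 is to feed Kruskal's algorithm only the edges between points that lie within a disc of radius r of each other, instead of all n(n−1)/2 pairs. The sketch below is one possible realization for 2D points, not the authors' implementation: it assumes a grid of cells of side `radius` to find near neighbors and a union-find structure for the merge step, and the name `kruskal_disc` is hypothetical. If the radius is too small the candidate graph is disconnected and the result is a spanning forest, which is the source of the bias shown in Figure 18.

```python
import math
from collections import defaultdict

def kruskal_disc(points, radius, gamma=1.0):
    """Kruskal's algorithm restricted to candidate edges of length <= radius (2D).

    Returns (total power-weighted length, number of connected components).
    """
    # Bucket points into square cells of side `radius`; neighbors within
    # distance `radius` can only be in the same or an adjacent cell.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // radius), int(y // radius))].append(idx)
    edges = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # each unordered pair is generated once
                            d = math.dist(points[i], points[j])
                            if d <= radius:
                                edges.append((d, i, j))
    edges.sort()

    parent = list(range(len(points)))

    def find(a):
        # union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    total, components = 0.0, len(points)
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            total += d ** gamma
            components -= 1
    return total, components

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

With `radius=1.2` the four sides of the square are candidates and the full MST of length 3 is recovered; with `radius=0.5` no candidate edges exist and every point remains its own component.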

[Plot: "Comparison: Standard and Modified Kruskal Algorithms". x-axis: number of points N with uniform distribution, 0 to 10000; y-axis: execution time in seconds; curves: Standard Kruskal, Modified Kruskal (radius = 0.09).]

Figure 17: Comparison of Kruskal's MST to our $n \log n$ MST algorithm.

[Plot: "Effect of disc radius on MST Length: 10000 unif. dist. points". x-axis: radius of disc, 0 to 0.1; y-axis: MST length.]

Figure 18: Bias of $n \log n$ MST algorithm as function of radius parameter.

Application of MST to US image Registration

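1. Extract features from reference and transformed target images:

$$X_m = \{X_i\}_{i=1}^{m} \quad \text{and} \quad Y_n = \{Y_i\}_{i=1}^{n}$$

2. Construct MST on union of $X_m$ and $Y_n$

$$L_\gamma(X_m \cup Y_n)$$

3. Minimize $L_\gamma$ over transformations producing $Y_n$.

Note: This minimizer converges to the minimizer of the α-Jensen difference

$$L_\gamma(X_m \cup Y_n) / (m+n)^{\alpha} \to \beta_{L_\gamma,d} \exp\left( (1-\alpha) H_\alpha(\epsilon f_x + (1-\epsilon) f_y) \right) \quad (a.s.)$$

where $\epsilon = \frac{m}{m+n}$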
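The three steps above can be sketched on toy data. In the example below the features are plain 2D point coordinates and the candidate transformations are horizontal shifts; both are simplifying assumptions (the slides use rotations of ultrasound images and richer features), and `best_shift` is a hypothetical helper. Because $m + n$ is the same for every candidate, minimizing the raw pooled MST length is equivalent to minimizing the normalized one.

```python
import math

def mst_length(points, gamma=1.0):
    """Power-weighted Euclidean MST length via Prim's algorithm (O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

def best_shift(reference, target, shifts, gamma=1.0):
    """Return the x-shift of `target` that minimizes the pooled MST length."""
    def pooled_length(t):
        moved = [(x + t, y) for x, y in target]
        return mst_length(reference + moved, gamma)
    return min(shifts, key=pooled_length)

# Toy data: the target is the reference lattice itself, so the true shift is 0.
lattice = [(float(x), float(y)) for x in range(4) for y in range(4)]
shift = best_shift(lattice, lattice, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

At the correct alignment the two point sets overlap, the pooled sample looks like a draw from a single density, and the pooled MST is shortest; misalignment spreads the pooled points out and lengthens the tree.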

[Figure: (a) 3-D scatter titled "misaligned points"; (b) "MST demonstration". Axes: X, Y, Intensity.]

Figure 19: MST demonstration for misaligned images

[Figure: (a) 3-D scatter titled "Aligned points"; (b) "MST demonstration". Axes: X, Y, Intensity.]

Figure 20: MST demonstration for aligned images

Illustration for Case 142 and Single Pixels

[Plot: "MST Length, alphaMI profiles. Single Pixel domain, vol. 142". x-axis: relative rotation between images (degrees), −8 to 8; y-axis: normalized MST length and MI, 0 to 1; curves: MI (alpha=0.5), MST Length.]

Figure 21: Single-pixel objective function profiles for MST estimator of α-Jensen difference vs. histogram plug-in estimator ($\alpha = 1/2$).

Illustration for Case 142 and 8-D ICA Features

[Plot: "MST & alphaJensen profile for 8D ICA feature vector, vol. 142". x-axis ticks: −8 to 8; y-axis: MST length and alphaJensen (4 bins/dimension), 0 to 1; curves: Jensen (alpha=0.5), MST Length.]

Figure 22: 8-basis ICA objective function profiles for MST estimator of α-Jensen difference vs. histogram plug-in estimator of α-MI ($\alpha = 1/2$).

Illustration for Case 142 and 64-D ICA Features

[Plot: "Norm. MST Length profile for 64D ICA feature vector, vol. 142". x-axis ticks: −8 to 8; y-axis: MST length, 0 to 1.]

Figure 23: 64-basis ICA objective function profiles for MST estimator of α-Jensen difference.

Extension of MST to divergence estimation

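1. Let i.i.d. $\{Z_i\}_{i=1}^{n}$ have marginal density $f_1$ on $[0,1]^d$

2. Let $f_0$ dominate density $f_1$

3. Define measure transformation $M$ on $[0,1]^d$ which takes $f_0$ to the uniform density

Then $S_i \stackrel{\text{def}}{=} M(Z_i)$ has α-entropy:

$$(1-\alpha) H_\alpha(S_i) = \ln \int f_S^{\alpha}(s)\, ds = \ln \int \left( f_1(z)/f_0(z) \right)^{\alpha} f_0(z)\, dz$$

Conclude, for $S_n = \{S_1, \ldots, S_n\}$:

$$\hat{D}_\alpha(f_1 \| f_0) = \frac{1}{\alpha-1} \left[ \ln L_\gamma(S_n)/n^{\alpha} - \ln \beta_{L,\gamma} \right] \tag{2}$$

is a consistent estimator of the α-divergence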

Example

Figure 24: [four panels on the unit square with axes z1 and z2: "Original data", "exact inverse transform", "transformed data", "transformed data, 2D cdf estimated"]

MST robustification against outliers: Pruned MST

Fix $k$, $1 \leq k \leq n$.

Let $M_{n,k} = M(x_{i_1}, \ldots, x_{i_k})$ be a minimal graph connecting $k$ distinct vertices $x_{i_1}, \ldots, x_{i_k}$.

The k-MST $T^*_{n,k} = T^*(x_{i^*_1}, \ldots, x_{i^*_k})$ is the minimum of all k-point MST's

$$L^*_{n,k} = L^*(X_{n,k}) = \min_{i_1, \ldots, i_k} \min_{M_{n,k}} \sum_{e \in M_{n,k}} \|e\|^{\gamma}$$
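Computing the exact k-MST requires a search over all k-point subsets, so in practice an approximation is used. The sketch below is a simple heuristic chosen for illustration and is not the pruning algorithm of the cited work: it builds the full MST and then repeatedly removes the leaf attached by the longest edge until k points remain. The names `mst_edges` and `pruned_mst` are hypothetical, and the result is only an upper bound on $L^*_{n,k}$.

```python
import math

def mst_edges(points):
    """Edges (length, i, j) of the Euclidean MST via Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(points[0], p) for p in points]
    link = [0] * n  # link[v] = tree vertex currently closest to v
    edges = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def pruned_mst(points, k, gamma=1.0):
    """Heuristic k-point tree: prune the leaf with the longest edge until k points remain.

    Assumes 1 <= k <= len(points). Returns (power-weighted length, kept indices).
    """
    edges = mst_edges(points)
    kept = set(range(len(points)))
    while len(kept) > k:
        degree = {}
        for _, a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        # candidate edges are those that attach a leaf (a degree-1 vertex)
        leaf_edges = [e for e in edges if degree[e[1]] == 1 or degree[e[2]] == 1]
        worst = max(leaf_edges)
        edges.remove(worst)
        _, a, b = worst
        kept.discard(a if degree[a] == 1 else b)
    return sum(d ** gamma for d, _, _ in edges), sorted(kept)

# Four clustered points plus one distant outlier (index 4).
cloud = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (10.0, 10.0)]
length, kept = pruned_mst(cloud, k=4)
```

On this example the long edge to the outlier is pruned first, leaving the three unit edges of the cluster.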

[Figure: four panels with axes z1 and z2 over [0, 1]: "k-MST (k=99): 1 outlier rejection", "(k=98): 2 outlier rejection", "k-MST (k=62): 38 outlier rejection", "(k=25): 75 outlier rejection".]

Figure 25: k-MST for 2D annulus density with and without the addition of uniform "outliers".

Extension of BHH to Pruned Graphs

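Fix $\alpha \in [0,1]$ and let $k = \lfloor \alpha n \rfloor$. Then as $n \to \infty$ (Hero&Michel:IT99)

$$L(X^*_{n,k}) / (\lfloor \alpha n \rfloor)^{\nu} \to \beta_{L_\gamma,d} \min_{A : P(A) \geq \alpha} \int f^{\nu}(x \mid x \in A)\, dx \quad (a.s.)$$

or, alternatively, with

$$H_\nu(f \mid x \in A) = \frac{1}{1-\nu} \ln \int f^{\nu}(x \mid x \in A)\, dx$$

$$L(X^*_{n,k}) / (\lfloor \alpha n \rfloor)^{\nu} \to \beta_{L,\gamma} \exp\left( (1-\nu) \min_{A : P(A) \geq \alpha} H_\nu(f \mid x \in A) \right) \quad (a.s.)$$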

Water Pouring Construction

[Figure: three panels over x from 0 to 50: "Density of X" (y-axis f(x)); "Water filled density"; "Asymptotic density of X seen by k-MST" (y-axis f(x|A)).]

Figure 26: Water pouring construction of $f(x \mid A_o)$. Arrows indicate the high-water marks, which identify the high-probability regions that are not truncated in the influence function.

k-MST Influence Function for Gaussian Density

[Figure: two surface plots over the plane (both horizontal axes −20 to 20) with vertical axis IC(x, F, L): "MST for planar Gaussian" and "k-MST for planar Gaussian".]

Figure 27: MST and k-MST influence curves for Gaussian density on the plane.

k-MST Stopping Rule

[Figure: left panel "k-MST length as function of k" (x-axis: k, 0 to 100; y-axis: k-MST length); right panel "selection criterion" (x-axis: k; y-axis: criterion(k)).]

Figure 28: Left: k-MST curve for 2D annulus density with addition of uniform "outliers" has a knee in the vicinity of $n - k = 35$. This knee can be detected using residual analysis from a linear regression line fitted to the left-most part of the curve. Right: error residual of linear regression line.
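The stopping rule described in the caption can be sketched as follows: fit a least-squares line to the first part of the k-MST length curve and report the first position where the curve departs from that line by more than a tolerance. The function name `find_knee`, the choice of how many points to fit (`fit_count`) and the residual `threshold` are assumptions of this example; the slides do not specify them.

```python
def find_knee(lengths, fit_count, threshold):
    """Index of the first point whose residual from a line fitted to the
    first `fit_count` points exceeds `threshold`, or None if there is none.

    `lengths[k]` is the curve value at position k (for example, k-MST length).
    """
    xs = range(fit_count)
    mean_x = sum(xs) / fit_count
    mean_y = sum(lengths[:fit_count]) / fit_count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, lengths))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    for k, y in enumerate(lengths):
        if abs(y - (slope * k + intercept)) > threshold:
            return k
    return None

# Synthetic curve: slope 0.1 up to position 50, then slope 1.0 (the knee).
curve = [0.1 * k for k in range(50)] + [5.0 + 1.0 * (k - 50) for k in range(50, 100)]
knee = find_knee(curve, fit_count=20, threshold=0.5)
```

On real k-MST curves the threshold would need to be set relative to the noise level of the residuals in the fitted region.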


Conclusions

1. α-divergence for indexing can be justified via decision theory

2. Applicable to feature-based image registration

3. Non-parametric estimation is possible without density estimation via MST

4. MST outperforms plug-in estimation for non-smooth densities

5. Robustified MST can be defined via optimal pruning of MST: k-MST

Divergence vs. Jensen: Asymptotic Comparison

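For $\epsilon \in [0,1]$ and $g$ a p.d.f. define

$$f_\epsilon = \epsilon f_1 + (1-\epsilon) f_0, \qquad E_g[Z] = \int Z(x)\, g(x)\, dx, \qquad \tilde{f}^{\alpha}_{1/2} = \frac{f^{\alpha}_{1/2}}{\int f^{\alpha}_{1/2}\, dx}$$

Then

$$\Delta J_\alpha = \frac{\alpha \epsilon (1-\epsilon)}{2} \left[ E_{\tilde{f}^{\alpha}_{1/2}} \left( \left[ \frac{f_1 - f_0}{f_{1/2}} \right]^{2} \right) + \frac{\alpha}{1-\alpha} \left( E_{\tilde{f}^{\alpha}_{1/2}} \left[ \frac{f_1 - f_0}{f_{1/2}} \right] \right)^{2} \right] + O(\Delta)$$

$$D_\alpha(f_1 \| f_0) = \frac{\alpha}{4} \int f_{1/2} \left[ \frac{f_1 - f_0}{f_{1/2}} \right]^{2} dx + O(\Delta)$$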
