Probabilistic Tracking in a Metric Space

Kentaro Toyama and Andrew Blake, Microsoft Research

Presentation prepared by: Linus Luotsinen

Outline

Introduction
Modelling of images and observations
Pattern theoretic tracking
Learning
– Learn mixture centers (exemplars)
– Learn kernel parameters (observational likelihood)
– Learn dynamic model (transition probabilities)
Practical tracking
Results
– Human motion using curve based exemplars
– Mouth using exemplars from raw image
Conclusions

Introduction

Metric Mixture, M2
– Combine exemplars in metric space with probabilistic treatments
– Models easily created directly from training set
– Dynamic model to deal with occlusion

Problems with other probabilistic approaches
– Complex models
– Training required for each object to be tracked
– Difficult to fully automate

Pattern Theoretic Tracking - Notation

Test images: $Z = \{z_1, z_2, \ldots, z_T\}$

Train images: $Z^* = \{z^*_1, z^*_2, \ldots, z^*_T\}$

Class is defined by a set of exemplars: $\{x_k,\ k = 1, 2, \ldots, K\}$

Geometrical transformation: $T_\alpha$, $\alpha \in A$ (known in advance)

Pattern theoretic tracking: $z = T_\alpha x_k$

State vector defined by: $X = (\alpha, k)$

Metric Functions

1) $\rho(a, b) \ge 0,\ \forall a, b$
2) $\rho(a, b) = 0$ iff $a = b$
3) $\rho(a, b) = \rho(b, a)$
4) $\rho(a, b) + \rho(b, c) \ge \rho(a, c)$

True metric function
– All constraints

Distance function
– Without 3 and 4

[Figure: points a, b and c illustrating the constraints]
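The difference between a true metric and a looser distance function can be checked numerically on a small sample. The sketch below is only an illustration: the helper `check_metric_axioms`, the sample points and the two example functions are assumptions introduced here, not part of the presentation. It tests the four constraints by brute force and shows that the squared Euclidean distance fails the triangle inequality.

```python
from itertools import product


def check_metric_axioms(points, rho, tol=1e-12):
    """Report which of the four metric constraints hold for rho on a finite sample."""
    results = {"non_negative": True, "identity": True,
               "symmetric": True, "triangle": True}
    for a, b in product(points, repeat=2):
        d = rho(a, b)
        if d < -tol:                          # 1) rho(a, b) >= 0
            results["non_negative"] = False
        if (abs(d) <= tol) != (a == b):       # 2) rho(a, b) = 0 iff a = b
            results["identity"] = False
        if abs(d - rho(b, a)) > tol:          # 3) rho(a, b) = rho(b, a)
            results["symmetric"] = False
    for a, b, c in product(points, repeat=3):
        if rho(a, b) + rho(b, c) < rho(a, c) - tol:   # 4) triangle inequality
            results["triangle"] = False
    return results


def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def squared_euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical sample points, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(check_metric_axioms(points, euclidean))
print(check_metric_axioms(points, squared_euclidean))
```

A check on a finite sample can only refute a constraint, never prove it in general, so a passing result should be read as "no counterexample found".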

Modelling of Images and Observations

Patches
– Image sub-region
– Shuffle distance function: distance with the most similar pixel in its neighborhood

Curves
– Edge maps
– Chamfer distance function: distance to the nearest pixel in the binary images (see next slide)
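As a rough illustration of the shuffle distance idea for patches, the sketch below compares each pixel with the most similar pixel in a small neighbourhood of the other image. The neighbourhood `radius`, the use of absolute differences and the summation over pixels are assumptions made for this example; the slides only state the "most similar pixel in its neighborhood" principle.

```python
def shuffle_distance(img_a, img_b, radius=1):
    """Sum, over pixels of img_a, of the smallest absolute difference to any
    pixel of img_b lying within `radius` of the same position (assumed form)."""
    rows, cols = len(img_a), len(img_a[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += min(
                abs(img_a[r][c] - img_b[rr][cc])
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
    return total


def l1_distance(img_a, img_b):
    """Plain pixel-wise absolute difference, for comparison."""
    return sum(abs(p - q) for ra, rb in zip(img_a, img_b) for p, q in zip(ra, rb))


# A single bright pixel, and the same image shifted right by one pixel.
img = [[0, 0, 0, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 0, 9, 0],
           [0, 0, 0, 0]]

print(l1_distance(img, shifted))       # pixel-wise comparison penalises the shift
print(shuffle_distance(img, shifted))  # neighbourhood search absorbs the shift
```

The point of the comparison is that a one-pixel misalignment is costly under a pixel-wise distance but is tolerated by the neighbourhood search.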

Probabilistic Modelling of Images and Observations

Curves with Chamfer distance

$\rho_{\text{chamfer}}(T, I) = \frac{1}{|T|} \sum_{t \in T} d_I(t)$

Not a true metric:
3) $\rho(I, T) \ne \rho(T, I)$
4) $\rho(A, B) + \rho(B, C) \ngeq \rho(A, C)$

[Figure: original image; feature image $I$; distance image $d_I$; exemplar $T$]
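The chamfer distance averages, over the points of the exemplar curve, the distance to the nearest edge point of the feature image. The sketch below is a minimal illustration with hypothetical point sets; it computes nearest distances by brute force instead of looking them up in a precomputed distance image, which is a simplification assumed here for brevity. Swapping the arguments shows why the function is not symmetric.

```python
from math import hypot


def chamfer(template, image_edges):
    """Mean distance from each template point to its nearest image edge point."""
    return sum(
        min(hypot(tx - ix, ty - iy) for ix, iy in image_edges)
        for tx, ty in template
    ) / len(template)


# Hypothetical edge points: the image contains the template plus one extra point.
template = [(0, 0), (1, 0)]
image_edges = [(0, 0), (1, 0), (5, 0)]

print(chamfer(template, image_edges))  # every template point has an exact match
print(chamfer(image_edges, template))  # the extra image point has no close match
```

In practice the inner minimum would normally be read from a distance transform of the edge map, so that each exemplar evaluation costs one lookup per curve point.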

Pattern Theoretic Tracking

$z = T_\alpha x$

Observation: $z$

Geometrical transform: $T_\alpha$
– Translation
– Affine
– Projective…

Exemplar (from a training set): $x$
– Intensity images (“patches”)
– Feature images (edges, corners…) (“curves”)

Pattern Theoretic Tracking

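[Figure: exemplars $x_k$ ($k = 1 \ldots K$), shown as $x_1, \ldots, x_4$, mapped by geometric transforms $T_\alpha$ ($\alpha \in A$), e.g. $T_{\alpha_1}, T_{\alpha_2}, T_{\alpha_3}$, onto the given image $z$]

1) $\tilde{x}_k$ - A set of K exemplars.

2) $p(z \mid X)$, $X = (\alpha, k)$ - Distribution of observations around $T_\alpha \tilde{x}_k$ – Likelihood

3) $p(X_t \mid X_{t-1})$ - Prior about dependency between states – Dynamics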

Pattern Theoretic Tracking - Learning

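Goal - given M images ($z_m$), find K exemplars.

1) Find “central” exemplar:
$z_0 = \arg\min_{z} \max_{z'} \rho(z, z')$

2) “Align” other images to $z_0$:
$\alpha_m = \arg\min_{\alpha} \rho(T_\alpha^{-1} z_m, z_0)$
$x_m = T_{\alpha_m}^{-1} z_m$

[Figure: central exemplar $z_0$ with its maximum distance (MAX) to the other images; images $z_m$ aligned to $x_m$, $m = 1 \ldots M$]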

Learning Mixture Centers

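3) Find K “distinct” exemplars $\tilde{x}_k$, separated from one another in $\rho$ (the slide marks a separation constant $c$)

4) Cluster the rest by minimal distance:
$k_m = \arg\min_{k} \rho(x_m, \tilde{x}_k)$

5) Find new representatives:
$\tilde{x}_k = \arg\min_{x} \max_{x'} \rho(x, x')$, with $x$ and $x'$ ranging over the members of cluster $k$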
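The minimax choice of a central exemplar, the nearest-exemplar clustering and the choice of new representatives can be sketched together as a small medoid-style loop. This is an illustration under stated assumptions: the items are taken to be already aligned, the initial exemplars are supplied by hand, the fixed iteration cap is arbitrary, and the function names (`minimax_center`, `assign`, `refine`) are introduced here only for the example.

```python
def minimax_center(items, rho):
    """Item whose largest distance to any other item is smallest."""
    return min(items, key=lambda x: max(rho(x, y) for y in items))


def assign(items, exemplars, rho):
    """Index of the nearest exemplar for each item."""
    return [min(range(len(exemplars)), key=lambda k: rho(x, exemplars[k]))
            for x in items]


def refine(items, exemplars, rho, iterations=10):
    """Alternate clustering and re-centring until the exemplars stop changing."""
    for _ in range(iterations):
        labels = assign(items, exemplars, rho)
        updated = []
        for k, old in enumerate(exemplars):
            members = [x for x, lab in zip(items, labels) if lab == k]
            # Keep the old exemplar if its cluster happens to be empty.
            updated.append(minimax_center(members, rho) if members else old)
        if updated == exemplars:
            break
        exemplars = updated
    return exemplars, assign(items, exemplars, rho)


# Hypothetical one-dimensional "images" with absolute difference as the distance.
rho = lambda a, b: abs(a - b)
aligned = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

center = minimax_center(aligned, rho)
exemplars, labels = refine(aligned, [0.0, 12.0], rho)
print(center, exemplars, labels)
```

Because only pairwise distances are used, the same loop works for any distance function, including ones that are not true metrics.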

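Learning Kernel Parameters

[Figure: exemplar $\tilde{x}_k$, image $z$, and the distance $\rho(z, T_\alpha \tilde{x}_k)$ between them]

1) Using a “validation set”, find distances between images and their exemplars.

2) Approximate the distances as chi-square (to find $\sigma$ and $d$):
$\rho(\cdot) \sim \sigma^2 \chi^2_d$

3) Then the observation likelihood is:
$p(z \mid X) = \frac{1}{Z} \exp(-\lambda \rho(\cdot))$, with $\lambda = \frac{1}{2\sigma^2}$ and normalising constant $Z$ set by $\sigma$ and $d$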
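One simple way to obtain $\sigma$ and $d$ from validation distances is moment matching, sketched below. This is an assumption made for illustration, since the slides do not say how the chi-square fit is performed. It uses the facts that a $\sigma^2 \chi^2_d$ variable has mean $d\sigma^2$ and variance $2d\sigma^4$; the function names and sample values are hypothetical.

```python
from statistics import mean, pvariance


def fit_scaled_chi_square(distances):
    """Moment-matching fit of sigma^2 * chi^2_d to a list of distances.

    mean = d * sigma^2 and variance = 2 * d * sigma^4, hence
    sigma^2 = variance / (2 * mean) and d = 2 * mean^2 / variance.
    """
    m = mean(distances)
    v = pvariance(distances)
    sigma2 = v / (2.0 * m)
    d = 2.0 * m * m / v
    return sigma2, d


def unnormalised_log_likelihood(rho_value, sigma2):
    """log p(z | X) up to the constant -log Z, with lambda = 1 / (2 sigma^2)."""
    lam = 1.0 / (2.0 * sigma2)
    return -lam * rho_value


# Hypothetical validation distances between images and their exemplars.
validation = [2.0, 4.0, 6.0, 8.0]
sigma2, d = fit_scaled_chi_square(validation)
print(sigma2, d, unnormalised_log_likelihood(3.0, sigma2))
```

The fitted $d$ need not be an integer; it acts as an effective dimensionality of the exemplar kernel.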

Learning Dynamics

Learn a Markov matrix $M$ for $p(k_t \mid k_{t-1})$ by histogramming transitions

Run a first order auto-regressive process (ARP) for $p(\alpha_t \mid \alpha_{t-1})$, with coefficients calculated using the Yule-Walker algorithm
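Both parts of the dynamic model can be sketched in a few lines. The example below is a minimal illustration with assumed details: exemplar labels are integers in `0..K-1`, unseen rows of the transition matrix fall back to a uniform distribution, and the transformation parameter is treated as a single scalar series whose first-order coefficient comes from the Yule-Walker relation `a = r1 / r0`. None of these choices are stated on the slides.

```python
def markov_matrix(labels, K):
    """Row-stochastic matrix M[i][j] = p(k_t = j | k_{t-1} = i) from transition counts."""
    counts = [[0] * K for _ in range(K)]
    for prev, cur in zip(labels, labels[1:]):
        counts[prev][cur] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a state never left in training gets a uniform row.
        matrix.append([c / total for c in row] if total else [1.0 / K] * K)
    return matrix


def ar1_yule_walker(series):
    """Mean, AR(1) coefficient and driving-noise variance via Yule-Walker."""
    n = len(series)
    mu = sum(series) / n
    centred = [v - mu for v in series]
    r0 = sum(c * c for c in centred) / n                           # lag-0 autocovariance
    r1 = sum(a * b for a, b in zip(centred, centred[1:])) / n      # lag-1 autocovariance
    a = r1 / r0
    noise_var = r0 * (1.0 - a * a)
    return mu, a, noise_var


# Hypothetical training data: exemplar labels and one transformation parameter.
M = markov_matrix([0, 0, 1, 0, 1, 1], K=2)
mu, a, noise_var = ar1_yule_walker([1.0, -1.0, 1.0, -1.0])
print(M)
print(mu, a, noise_var)
```

For a vector-valued transformation parameter the same relation would be solved per component or in matrix form.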

Practical Tracking

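Forward algorithm
– $p_t(X_t) \equiv p(X_t \mid z_1, z_2, \ldots, z_t)$
– $p_t(X_t) \propto p(z_t \mid X_t) \sum_{X_{t-1}} p(X_t \mid X_{t-1})\, p_{t-1}(X_{t-1})$, where the sum runs over the previous exemplar index $k_{t-1}$ and transformation $\alpha_{t-1}$

Results are chosen by
– $\hat{X}_t = \arg\max_{X_t} p_t(X_t)$

[Figure: trellis of states $X_1, \ldots, X_4$ against observations $z_1, z_2, z_3, \ldots, z_T$ at $t = 1, 2, 3, \ldots, T$, with example state probabilities at each time step]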
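The forward recursion and the per-frame maximum can be sketched for a small discrete state space. This is an illustration under assumptions: the states are enumerated as indices (a discretised transformation would simply enlarge the state list), the transition matrix is indexed as `transition[previous][current]`, and all numbers are hypothetical. The middle observation is deliberately uninformative to mimic occlusion, so the estimate at that frame is carried by the dynamics alone.

```python
def forward_step(prior, transition, likelihood):
    """One update: p_t(X) proportional to p(z_t | X) * sum_X' p(X | X') p_{t-1}(X')."""
    n = len(prior)
    predicted = [sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]
    unnorm = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnorm)  # assumes at least one state has non-zero likelihood
    return [u / total for u in unnorm]


def track(initial, transition, likelihoods):
    """Run the forward algorithm and pick the most probable state at each frame."""
    posterior = initial
    estimates = []
    for lik in likelihoods:
        posterior = forward_step(posterior, transition, lik)
        estimates.append(max(range(len(posterior)), key=posterior.__getitem__))
    return estimates, posterior


# Hypothetical two-state example; likelihoods[t][j] stands for p(z_t | X = j).
initial = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]
likelihoods = [[0.8, 0.2],   # clear evidence for state 0
               [0.5, 0.5],   # uninformative frame, e.g. occlusion
               [0.1, 0.9]]   # clear evidence for state 1

estimates, posterior = track(initial, transition, likelihoods)
print(estimates, posterior)
```

Because only the filtered distribution from the previous frame is kept, the cost per frame is quadratic in the number of states and independent of the sequence length.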

Results

Tracking human motion
– Based on contour edges
– Dynamics learned on 5 sequences of 100 frames each

[Figures: exemplars; same person, motion not seen in training sequence]

[Figures: different person; different person with occlusion (power of dynamic model)]

Results

Tracking person’s mouth motion
– Based on raw pixel values
– Training sequence was 210 frames captured at 30 Hz
– Exemplar set was 30 (K=30)

Left image shows the test sequence; right image shows the maximum a posteriori estimate

[Figures: using L2 distance; using shuffle distance]

Results

Tracking ballerina
– Larger exemplar sets (K=300)

Conclusions

Metric Mixture (M2) Model
– Easier to fully automate learning
– Avoid explicit parametric models to describe target objects

Generality
– Metrics can be chosen without significant restrictions

Temporal fusion of information for occlusion recovery
– Bayesian networks

References

[1] Kentaro Toyama, Andrew Blake, Probabilistic Tracking with Exemplars in a Metric Space, International Journal of Computer Vision, Volume 48, Issue 1, Marr Prize Special Issue, Pages: 9–19, 2002, ISSN:0920-5691.

[2] Jongwoo Lim, CSE 252C: Selected Topics in Vision & Learning. http://www-cse.ucsd.edu/classes/fa02/cse252c/

[3] Eli Schechtman and Neer Saad, Advanced topics in computer and human vision. http://www.wisdom.weizmann.ac.il/~armin/AdvVision02/course.html
