
Visual Inference by Data-Driven Markov Chain Monte Carlo

Zhuowen Tu and Song-Chun Zhu

Statistics and Computer Science, University of California, Los Angeles

Los Alamos National Lab, 12-2-2002

Parsing an Image into Various Stochastic Patterns

[Figure: an input image parsed into a point process, a curve process, a color region, texture regions, and objects.]

Depending on the types of patterns it focuses on, image parsing subsumes conventional vision tasks: perceptual organization, image segmentation, object recognition, etc.


A Bayesian Formulation

Let I be an image and W be a semantic representation of the world.

A basic assumption, dating back to Helmholtz (1860), is that biological and machine vision compute the most probable interpretation(s) from input images:

W^* = \arg\max_{W \in \Omega} p(W | I) = \arg\max_{W \in \Omega} p(I | W) \, p(W)

In statistics, we sample from the posterior probability to preserve ambiguities:

(W_1, W_2, \ldots, W_k) \sim p(W | I)


Problems

1. Representational or modeling problems:

What are W, p(W), and p(I | W)?

2. Computational problems:

Can MCMC run in seconds on a PC for parsing images?

a) What are the structures of the search space, which we call Ω?

b) How do we explore the search space for globally optimal solutions? --- reversible MC jumps + diffusion (PDEs).

c) How do we compute and preserve ambiguities?


Ideas to Improve MCMC Speed in the Literature

A main idea is to introduce auxiliary random variables. Augment x ~ π(x) by variables:

T --- temperature (simulated tempering; Marinari and Parisi, 1992; Geyer and Thompson, 1995)

s --- scale (multigrid sampling; Goodman and Sokal, 1988; Liu et al., 1994)

w --- weight (dynamic weighting; Liang and Wong, 1996)

b --- bond (clustering; Swendsen-Wang, 1987)

u --- energy level (slice sampling; Edwards and Sokal, 1988; ...)

The common problem is: the Markov chain moves are designed a priori, without looking at the data.
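To make the auxiliary-variable idea concrete, here is a minimal slice-sampling sketch in Python (the energy-level variable u of Edwards and Sokal); the 1-D target pi, the starting point x0, and the step width w are illustrative assumptions, not from the talk:

```python
import math
import random

def slice_sample(pi, x0, w=1.0, n=1000):
    """Augment x with an auxiliary level u, then sample x uniformly
    from the slice {x : pi(x) >= u}, bracketed by stepping out."""
    xs, x = [], x0
    for _ in range(n):
        u = random.uniform(0.0, pi(x))     # auxiliary energy level
        l = x - w * random.random()        # random bracket containing x
        r = l + w
        while pi(l) > u:                   # step out to cover the slice
            l -= w
        while pi(r) > u:
            r += w
        while True:                        # shrink until a point is accepted
            x1 = random.uniform(l, r)
            if pi(x1) > u:
                x = x1
                break
            if x1 < x:
                l = x1
            else:
                r = x1
        xs.append(x)
    return xs

# e.g., sampling a standard Gaussian through the auxiliary variable u
samples = slice_sample(lambda t: math.exp(-t * t / 2.0), 0.0)
```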


What is Data-Driven Markov Chain Monte Carlo?

W \sim p(W | I). The complexity of sampling the posterior lies in the Metropolis-Hastings jumps.

Consider a reversible jump W_A \leftrightarrow W_B, accepted with probability

\alpha(W_A \to W_B) = \min\Big(1, \frac{p(W_B | I) \, G(W_B \to W_A)}{p(W_A | I) \, G(W_A \to W_B)}\Big), or equivalently \min\Big(1, \frac{p(W_B | I) \, q(W_A | W_B)}{p(W_A | I) \, q(W_B | W_A)}\Big).

Without looking at the data, the pre-designed proposal probabilities are often uniform distributions; thus it is a blind (exhaustive) search!

In DDMCMC, the proposals are conditioned on the image:

\alpha(W_A \to W_B) = \min\Big(1, \frac{p(W_B | I) \, q(W_A | W_B, I)}{p(W_A | I) \, q(W_B | W_A, I)}\Big)

If q(W_B | W_A, I) \approx p(W_B | I) and q(W_A | W_B, I) \approx p(W_A | I), then the chain may converge in a small number of steps!
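A minimal Python sketch of one such jump; propose, p_post, and q are placeholder callables (the names are ours, not the talk's), with the DDMCMC point being that the proposal density is conditioned on the image I:

```python
import random

def ddmcmc_jump(W, propose, p_post, q):
    """One data-driven Metropolis-Hastings jump.
    propose(W)      -- draw W' ~ q(W' | W, I), informed by bottom-up cues
    p_post(W)       -- unnormalized posterior p(W | I)
    q(W_to, W_from) -- proposal density q(W_to | W_from, I)
    """
    W_new = propose(W)
    ratio = (p_post(W_new) * q(W, W_new)) / (p_post(W) * q(W_new, W))
    if random.random() < min(1.0, ratio):
        return W_new   # accept the jump
    return W           # reject; stay at the current solution
```

The better q approximates the posterior (the condition above), the closer the acceptance ratio stays to 1 and the faster the chain moves.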


Basic Ideas

The proposal probabilities q(·) focus on a tiny portion of the search space and thus narrow the search exponentially in a probabilistic fashion. Thus the Markov chain converges and mixes very fast.

[Figure: the search space, with the posterior p and the proposal q.]


Intuitive Idea: Divide-and-Conquer

Let W = (w_1, w_2, ..., w_n); usually these variables are divided into several types: partition, labels of models, model parameters, order, ...

Consequently, the search space is made of a few types of "atomic spaces" --- one for each type of variable --- through union and product.

Then we can compute discriminative probabilities in each atomic space, which are then composed into the proposal probabilities.


Example: Image Segmentation

W = (n, \{(R_i, l_i, \Theta_i) : i = 1, 2, \ldots, n\}) \in \Omega

An n-partition of the lattice: \pi_n = (R_1, R_2, \ldots, R_n), with \bigcup_{i=1}^{n} R_i = \Lambda and R_i \cap R_j = \emptyset for all i \neq j.

The partition space is \varpi_\pi = \bigcup_{n=1}^{|\Lambda|} \varpi_{\pi_n}, each \varpi_{\pi_n} taken modulo a permutation group (the ordering of the regions does not matter).

[Figure: \pi_7 = (R_1, R_2, \ldots, R_7), a 7-partition of the lattice with regions labeled 1-7.]
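As one concrete encoding (an illustrative assumption, not the talk's notation), a label map makes the disjointness constraint automatic, leaving only coverage to check:

```python
import numpy as np

def is_n_partition(label_map, n):
    """Each pixel carries exactly one region label, so the regions
    R_1..R_n are disjoint by construction; an n-partition additionally
    requires that the n labels cover the whole lattice."""
    labels = np.unique(label_map)
    return labels.size == n and labels.min() == 1 and labels.max() == n

pi_3 = np.array([[1, 1, 2],
                 [3, 3, 2]])   # a toy 3-partition of a 2x3 lattice
assert is_n_partition(pi_3, 3)
```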


Some Image Models

Some families of image models:

ϖ_g1: iid Gaussian for pixel intensities
ϖ_g2: non-parametric histograms
ϖ_g3: Markov random fields for texture
ϖ_g4: spline model for lighting variations

ϖ_c1: iid Gaussian for color (LUV)
ϖ_c2: mixture of Gaussians for color
ϖ_c3: spline model for smooth color variations (e.g., sky, lake, ...)

[Figure: example patches for ϖ_g1, ϖ_g2, ϖ_g3, ϖ_g4.]


The Search Space

[Figure: a) the solution space Ω; b) a sub-space of 7 regions (the 7-partition space Ω_π7); c) an atomic space with atomic particles; atomic spaces Ω_C1, Ω_C2, Ω_C3.]


Designing Markov Chain Dynamics

Type I: Diffusion of region boundary -- region competition.

Type II: Splitting of a region into two.

Type III: Merging two regions into one.

Type IV: Switching the family of models for a region.

Type V: Model adaptation for a region.

[Figure: the 7-partition example, regions labeled 1-7.]
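A DDMCMC sweep mixes these five dynamics at random; below is a minimal Python scheduler sketch (the mixing weights are made-up illustrative values):

```python
import random

# Hypothetical mixing weights over the five dynamics (illustrative values).
DYNAMICS = {
    "diffusion":    0.40,  # Type I:  boundary diffusion / region competition
    "split":        0.20,  # Type II: split a region into two
    "merge":        0.20,  # Type III: merge two regions into one
    "switch_model": 0.10,  # Type IV: switch the model family of a region
    "adapt_model":  0.10,  # Type V:  adapt the model parameters of a region
}

def pick_dynamic():
    """Choose which move type to attempt next; the chosen jump or
    diffusion is then accepted via the Metropolis-Hastings ratio."""
    names, weights = zip(*DYNAMICS.items())
    return random.choices(names, weights=weights)[0]
```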


Edges in the Partition Space ϖ_π

Edge detection and tracing at three scales of detail:


Clustering in Color Space ϖ_c1

Mean-shift clustering (Cheng, 1995; Meer et al., 2001):

q(\Theta | I) = \sum_{i=1}^{K} \alpha_i \, g(\Theta - \Theta_i)

[Figure: input image and saliency maps 1-6; the brightness represents how likely a pixel belongs to a cluster.]
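A minimal mean-shift sketch in Python; the Gaussian kernel, bandwidth, and iteration count are illustrative assumptions:

```python
import numpy as np

def mean_shift(points, bandwidth=0.1, iters=30):
    """Every point climbs toward a mode of the kernel density
    estimate; points that land on the same mode form one cluster
    (here, pixel colors in LUV space)."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i in range(len(modes)):
            d2 = np.sum((points - modes[i]) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian weights
            modes[i] = w @ points / w.sum()            # weighted mean shift
    return modes
```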


Walking in the Partition Space

An adjacency graph: each vertex is a basic element (pixels, small regions, edges, ...); each link e = <a, b> is associated with a probability/ratio for similarity:

q(e = \text{"on"} | F(I_a), F(I_b)), \quad q(e = \text{"off"} | F(I_a), F(I_b))

[Figure: two vertices a and b joined by a link e.]


Walking in the Partition Space

Sampling the edges independently, we get connected components. These connected sub-graphs are the clusters in the partition space:

sampling c ~ q(C | F(I)) on ϖ_π
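A small Python sketch of this sampling step, using union-find to read off the connected components (the edge-list format and probabilities are illustrative):

```python
import random

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def sample_clusters(n_vertices, edges):
    """Turn each link <a, b> on independently with its data-driven
    probability q_on; the resulting connected components are the
    candidate clusters c ~ q(C | F(I))."""
    parent = list(range(n_vertices))
    for a, b, q_on in edges:
        if random.random() < q_on:
            parent[find(parent, a)] = find(parent, b)
    return [find(parent, v) for v in range(n_vertices)]

# three vertices, two links with similarity-based 'on' probabilities
print(sample_clusters(3, [(0, 1, 0.9), (1, 2, 0.1)]))
```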


Graph Partitioning --- Generalizing SW

The red edges are the bridges (the cut edges between the cluster V_c and the rest of the graphs G_A, G_B).

Theorem. Accepting the label-change proposal with probability

\alpha = \min\Big(1, \frac{\prod_{e \in E(V_c, V_{l'} - V_c)} (1 - q_e)}{\prod_{e \in E(V_c, V_l - V_c)} (1 - q_e)} \cdot \frac{p(W' | I)}{p(W | I)}\Big)

results in an ergodic and reversible Markov chain.
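The transcript garbles the exact expression, so the following Python sketch assumes the cut-edge form written above (products of (1 - q_e) over the two bridge sets, times the posterior ratio):

```python
def sw_acceptance(q_cut_old, q_cut_new, posterior_ratio):
    """q_cut_old -- q_e on edges E(V_c, V_l - V_c), the current label's cut
    q_cut_new -- q_e on edges E(V_c, V_l' - V_c), the proposed label's cut
    posterior_ratio -- p(W' | I) / p(W | I)"""
    num = 1.0
    for q in q_cut_new:
        num *= (1.0 - q)
    den = 1.0
    for q in q_cut_old:
        den *= (1.0 - q)
    return min(1.0, (num / den) * posterior_ratio)
```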


Diffusion Components by PDEs

The Markov chains realize reversible jumps between sub-spaces of varying dimensions. Within a subspace of fixed dimension, there are various diffusion processes expressed as partial differential equations.

For example, region competition for curve evolution (Zhu, Lee, and Yuille, 95): let v(s) = (x(s), y(s)) be a point on the boundary between two regions R_a and R_b; its motion is governed by the region-competition equation

\frac{d\vec{v}(s)}{dt} = \Big[\mu \kappa(s) + \log \frac{p(I(x, y) | \Theta_a)}{p(I(x, y) | \Theta_b)}\Big] \vec{n}(s)

where κ(s) is the curvature and n(s) the normal to the boundary.
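A minimal explicit-Euler step of this motion on a discretized boundary, as a sketch (the array layout, mu, and dt are illustrative choices):

```python
import numpy as np

def region_competition_step(v, normals, curvature, loglik_a, loglik_b,
                            mu=1.0, dt=0.1):
    """v: (N, 2) polyline on the boundary between R_a and R_b;
    curvature: kappa(s) per vertex; loglik_a, loglik_b:
    log p(I(x, y) | Theta_a), log p(I(x, y) | Theta_b) per vertex."""
    speed = mu * curvature + (loglik_a - loglik_b)   # scalar speed per vertex
    return v + dt * speed[:, None] * normals         # move along the normal
```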


Results by DDMCMC

[Figure: snapshot of a solution W sampled by DDMCMC: segmentation and synthesis.]


Running DDMCMC

[Figure: input image; three Markov chains MC 1, MC 2, MC 3 starting with 3 different initial segmentations; energy plots of the three MCMCs; two solutions W_1, W_2 with syntheses I_1 ~ p(I | W_1) and I_2 ~ p(I | W_2).]


Performance Comparison

DDMCMC is 2-3 orders of magnitude faster than traditional MCMC.

Open problem: analyze performance bounds of the DDMCMC paradigm.


Experiments: Color Image Segmentation

Input, segmentation π*, and synthesis I ~ p(I | W*)


Experiments: Color Image Segmentation

Input, segmentation π*, and synthesis I ~ p(I | W*)


Experiments: Color Image Segmentation

Input, segmentation π*, and synthesis I ~ p(I | W*)


a. Input image b. segmented regions c. synthesis I ~ p( I | W*)


Image Segmentation

Input, segmentation π*, and synthesis I ~ p(I | W*)


The Berkeley Benchmark Study (David Martin et al., 2001)

[Figure: test images with DDMCMC and manual segmentations; "error" measures 0.3082, 0.5627, 0.1083.]


Examples of Failure

a. Input image, b. segmented regions, c. synthesis I ~ p(I | W*)


AdaBoost in the Label Space

AdaBoost is a learning algorithm which makes decisions by combining a number of simple features. As T and the training samples become large enough, it weakly converges to the log ratio of the posterior probability:

y = \text{sign}(\alpha_1 h_1(I) + \ldots + \alpha_T h_T(I)) \to \text{sign}\Big(\log \frac{p(y = 1 | I)}{p(y = -1 | I)}\Big)

[Figure: a. the first two face features, b. an example of face detection --- an example from Viola and Jones, 2001.]
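A minimal sketch of the strong classifier; weak_learners (each h_t returning +/-1) and alphas stand in for the outputs of a trained booster:

```python
def adaboost_classify(I, weak_learners, alphas):
    """Strong classifier F(I) = sum_t alpha_t h_t(I): sign(F) gives the
    label, and per the slide F approximates the posterior log-ratio
    log p(y=1 | I) / p(y=-1 | I) (up to a constant factor) as T grows."""
    F = sum(a * h(I) for a, h in zip(alphas, weak_learners))
    return (1 if F >= 0 else -1), F
```

Used inside DDMCMC, such a score supplies a data-driven proposal probability for object labels rather than a hard decision.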


Ambiguities in Visual Inference

[Figure: Necker cube; vase vs. faces; bikini vs. martini.]


Ambiguity in Visual Inference

a. Input image, b. segmented texture regions, c. synthesis by texture models; d. curve processes + background region, e. synthesis by curve models


Computing Multiple Solutions

To faithfully preserve the posterior probability p(W | I), we compute a set of weighted scene particles W_1, W_2, ..., W_M.

A mathematical principle: choose the particle set to minimize the Kullback-Leibler divergence between the posterior and its particle approximation.


Pursuit of Multiple Solutions

The Kullback-Leibler divergence can be computed if we assume mixtures of Gaussian distributions --- a simple fact: the KL-divergence of two Gaussians is the signal-to-noise ratio.

Intuition: S includes the global maximum and local modes that are far apart from each other.
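As a worked instance of that fact: for two Gaussians with a common variance,

D\big(N(\mu_1, \sigma^2) \,\|\, N(\mu_2, \sigma^2)\big) = \frac{(\mu_1 - \mu_2)^2}{2\sigma^2},

a squared mean separation over the noise variance, i.e., a signal-to-noise ratio. (For unequal variances the general form is D = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.)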


A k-adventurer algorithm
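The algorithm's details did not survive the transcript; the Python sketch below captures the k-adventurer idea as described by the surrounding slides (keep the K particles whose mixture stays closest in KL terms to the full set; kl_approx is a hypothetical helper, e.g. under the mixture-of-Gaussians assumption above):

```python
def k_adventurer(particles, weights, new_particle, new_weight, kl_approx):
    """Given K current particles plus one new arrival, return the
    K-subset whose weighted mixture best preserves the posterior:
    drop the particle whose removal costs the least in KL divergence."""
    pool = list(particles) + [new_particle]
    ws = list(weights) + [new_weight]
    best_cost, best = float("inf"), None
    for drop in range(len(pool)):            # try removing each particle
        keep = [j for j in range(len(pool)) if j != drop]
        sub = ([pool[j] for j in keep], [ws[j] for j in keep])
        cost = kl_approx((pool, ws), sub)    # ~ D(p_full || p_subset)
        if cost < best_cost:
            best_cost, best = cost, sub
    return best                              # (particles, weights) of size K
```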


Preserving Distinct Particles

[Figure: particles x_1, x_2, x_3, x_4.]


An Example of Keeping Multiple Solutions

[Figure: an illustrative example with solutions labeled 1-4.]


Preserving Distinct Particles

An example of illustration: a model p with 50 particles and two approximate models, q_1 and q_2, with 6 particles each: q_1 obtained by min D(p || q_1) and q_2 obtained by min |p - q_2|.

[Figure: p, q_1, q_2.]


General Search Space for Image Parsing

[Figure: a) the solution space Ω with atomic particles; b) sub-spaces Ω_i (partition spaces π_1, π_2, π_3); c) an atomic space with atomic particles; atomic spaces ϖ_c1^1, ϖ_c2^1, ..., ϖ_cm^n.]


Parsing Images into Regions and Curves

W = (W_r, W_c, W_p, W_t)


Curve Models

Curve shape: \Gamma = (U(s), H(s)), where U(s) is the center line and H(s) is the curve width.

A curve: C = (\Gamma, \theta).

W_c = (K_c, \{(C_i, \alpha_i) : i = 1, \ldots, K_c\})


Parse Image into Regions and Curves

[Figure: input; regions W_r with synthesis I_syn^r ~ p(I | W_r); curves W_c with synthesis I_syn^c ~ p(I | W_c); curve group 1.]


Parsing Images with Trees

[Figure: input; regions W_r with synthesis I_syn^r ~ p(I | W_r); curves W_c with synthesis I_syn^c ~ p(I | W_c); a tree.]


Parse Image into Regions and Curves

[Figure: input; regions W_r with synthesis I_syn^r ~ p(I | W_r); curves W_c with synthesis I_syn^c ~ p(I | W_c); curve groups 1 and 2.]


Parse Image into Regions and Curves

[Figure: input; regions W_r with synthesis I_syn^r ~ p(I | W_r); curves W_c with synthesis I_syn^c ~ p(I | W_c); curve groups 1, 2, and 3.]


Parsing Images with Trees

[Figure: input; regions W_r with synthesis I_syn^r ~ p(I | W_r); curves W_c with synthesis I_syn^c ~ p(I | W_c); trees 1 and 2.]


Segmenting Laser Range Images

[Figure: input range image; input reflectance image; our segmentation; manual segmentation.]


Segmenting Laser Range Images

[Figure: input range image; input reflectance image; our segmentation; manual segmentation.]


Segmenting Laser Range Images

[Figure: input range image; input reflectance image; our segmentation; manual segmentation.]


Two Computing Paradigms in Vision

1. Generative methods --- "top-down": explicitly model the visual patterns. General but quite slow.
-- Bayesian framework
-- Markov random fields
-- Markov chain Monte Carlo
-- partial differential equations for diffusion, evolution, ...

2. Discriminative methods --- "bottom-up": explore "intra-class" vs. "inter-class" differences. Fast but not reliable.
-- feature extraction, on/off, e.g. edge detection
-- data clustering
-- AdaBoost
-- decision trees, ...


Summary

1. DDMCMC is a systematic way of integrating "top-down" and "bottom-up". The discriminative methods approximate local posterior probabilities (ratios) in various atomic spaces. These probabilities/ratios are used as importance proposal probabilities and drive the Markov chain to search for globally optimal solutions.

2. Fast Markov chain convergence and mixing at low temperature. In contrast to simulated annealing, the SW-type algorithm can move fast at low temperature.

3. Ensemble complexity vs. worst-case complexity. Though one can always construct worst cases and prove NP-completeness, the average-case computational complexity can be much lower.


When the bottom-up proposal probabilities fail!