Shrinivas Pundlik Committee members Dr. Stan Birchfield (chair) Dr. Adam Hoover Dr. Ian Walker

Motion Segmentation from Clustering of

Sparse Point Features Using Spatially

Constrained Mixture Models

Shrinivas Pundlik

Committee members

Dr. Stan Birchfield (chair)

Dr. Adam Hoover

Dr. Ian Walker

Dr. Damon Woodard

Motion Segmentation

Gestalt insight: grouping forms the basis of human perception

Gestalt laws: factors that affect the grouping process (cues)
• similarity
• proximity
• common motion (common fate)
• continuity

Motion segmentation: segmenting images based on common motion (points moving together are grouped together)

Typically, motion segmentation uses common motion + proximity

Applications of Motion Segmentation

• object detection (pedestrian detection)
• tracking (vehicle tracking)
• robotics
• surveillance
• image and video compression
• scene reconstruction
• video manipulation / editing (video matting, video annotation, motion magnification)

Examples: video editing (Criminisi et al., 2006), vehicle tracking (Kanhere et al., 2005), pedestrian detection (Viola et al., 2003)

Previous Work

Approach

Wang and Adelson 1994

Xiao and Shah 2005

Ayer and Sawhney 1995

Willis et al. 2003

Motion Layer Estimation

Multi Body Factorization

Object Level Grouping

Miscellaneous

Costeira and Kanade 1995

Sivic et al. 2004; Kanhere et al. 2005

Ke and Kanade 2002

Black and Fleet 1998

Birchfield 1999; Levine and Weiss 2006

Vidal and Sastry 2003; Yan and Pollefeys 2006; Gruber and Weiss 2006

Jojic and Frey 2001

Algorithm

Shi and Malik 1998

Expectation Maximization

Graph Cuts

Belief Propagation

Normalized Cuts

Jojic and Frey 2001

Smith et al. 2004; Kokkinos and Maragos 2004

Xiao and Shah 2005

Willis et al. 2003

Criminisi et al. 2006

Kumar et al. 2005

Variational Methods

Cremers and Soatto 2005

Brox et al. 2005

Nature of Data

Dense Motion

Motion + Image Cues

Sparse Features

Sivic et al. 2004; Kanhere et al. 2005; Rothganger et al. 2004

Cremers and Soatto 2005

Brox et al. 2005

Kumar et al. 2005

Criminisi et al. 2006

Xiao and Shah 2005

Challenges: Short Term

Figure: scene with labeled regions: 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian

• computation of motion in the scene
• influence of the neighboring motion
• number of objects / regions in the scene
• initialization of motion parameters
• description of complex motions (articulated human motion)

Challenges: Long Term

Figure: feature position x plotted against time t for slow, medium, fast, and crawling motions, with a displacement threshold and a time window

• batch processing vs. incremental processing
• updating the reference frame
• maintaining existing groups: growing existing regions, splitting
• adding new groups (new objects), deleting invisible groups

Objectives

Figure: block diagram linking feature tracking (motion computation), motion segmentation (two-frame clustering, long-term maintenance of groups), the mixture model framework (observed data, parameter estimation, group assignment), and motion models (translation, affine, and complex models such as articulated human motion models)

• motion segmentation using sparse point features
• automatically determine the number of groups
• handling dynamic sequences
• real time performance
• handling complex motions

Overview of the Topics

Feature Tracking: tracking sparse point features for computation of image motion, and its extension to joint feature tracking.
• S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and Edges”, CVPR, 2008.

Motion Segmentation: clustering point features in videos based on their motion and spatial connectivity.
• S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any Speed”, BMVC, 2006.
• S. J. Pundlik and S. T. Birchfield, “Real Time Motion Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans. on Systems, Man, and Cybernetics, 2008.

Articulated Human Motion Models: learning human walking motion from various pose and view angles for segmentation and pose estimation (special handling of a complex motion model).

Iris Segmentation: texture- and intensity-based segmentation of non-ideal iris images.
• S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.

Point Features

Popular features:
• Harris corner feature [Harris & Stephens 1987, Schmid et al. 2000]
• Shi-Tomasi feature [Shi & Tomasi 1994]
• Forstner corner feature [Forstner 1994]
• Scale invariant feature transform (SIFT) [Lowe 2000]
• Gradient Location and Orientation Histogram (GLOH) [Mikolajczyk and Schmid 2005]
• Features from accelerated segment test (FAST) [Rosten and Drummond 2005]
• Speeded up robust features (SURF) [Bay et al. 2006]
• DAISY [Tola et al. 2008]

Figure: input image, its gradients, and the detected point features (capturing the information content)

Utility of Point Features

Advantages:
• highly repeatable and extensible (work for a variety of images)
• efficient to compute (real-time implementations available)
• local methods for processing (tracking through multiple frames)

Tracking multiple point features = sparse optical flow: sparse point feature tracks yield the image motion

Tracking Point Features: Lucas-Kanade

Assume constant brightness. This gives the optic flow constraint equation, which relates the image pixel displacement to the image spatial derivatives and the image temporal derivative.

Estimate the pixel displacement u = (u, v)^T by minimizing the optic flow constraint error, weighted by a convolution kernel, over a local window.

Differentiating with respect to u and v and setting the derivatives to zero leads to a linear system built on the gradient covariance matrix.

Iterate using the Newton-Raphson method.
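For reference, the steps on this slide match the standard Lucas-Kanade formulation, sketched below. The notation is an assumption of this sketch rather than a transcription of the slide: I_x, I_y, I_t denote the spatial and temporal derivatives, K_\rho the convolution kernel, and Z the gradient covariance matrix.

```latex
% Optic flow constraint equation (constant brightness)
I_x u + I_y v + I_t = 0

% Energy minimized over a local window weighted by the kernel K_\rho
E_{LK}(u, v) = K_\rho * \left( I_x u + I_y v + I_t \right)^2

% Setting dE/du = dE/dv = 0 gives the 2 x 2 linear system Z u = e
\underbrace{\begin{bmatrix}
  K_\rho * I_x^2     & K_\rho * I_x I_y \\
  K_\rho * I_x I_y   & K_\rho * I_y^2
\end{bmatrix}}_{Z}
\begin{bmatrix} u \\ v \end{bmatrix}
=
\underbrace{-\begin{bmatrix}
  K_\rho * I_x I_t \\
  K_\rho * I_y I_t
\end{bmatrix}}_{e}
```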

Detection of Point Features

Figure: intensity surfaces over (x, y) for three image patches

1. no feature: low intensity variation; emax = 5.15, emin = 3.13 (two small eigenvalues)
2. edge feature: unidirectional intensity variation; emax = 1026.9, emin = 29.9 (a small and a large eigenvalue)
3. good feature: bidirectional intensity variation; emax = 1672.44, emin = 932.4 (two large eigenvalues)

Gradient covariance matrix: Z = convolution kernel applied to the products of the image gradients

Good feature: eigenvalues of Z > threshold
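The eigenvalue test above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the work: the function names, the uniform (box) window standing in for the convolution kernel, the synthetic patches, and the threshold value are all assumptions.

```python
import numpy as np

def gradient_covariance(patch):
    """Return the 2x2 gradient covariance matrix Z of a grayscale patch.

    Assumption: a uniform (box) window stands in for the convolution kernel.
    """
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows (y) and columns (x)
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

def classify_patch(patch, threshold):
    """Label a patch from the eigenvalues of Z (hypothetical helper)."""
    e_min, e_max = np.linalg.eigvalsh(gradient_covariance(patch))  # ascending order
    if e_min > threshold:
        return "good feature"   # two large eigenvalues
    if e_max > threshold:
        return "edge feature"   # one small and one large eigenvalue
    return "no feature"         # two small eigenvalues

# Synthetic 9x9 patches: constant intensity, a vertical step edge, and a corner.
flat = np.full((9, 9), 100.0)
edge = np.tile(np.where(np.arange(9) < 4, 0.0, 255.0), (9, 1))
corner = np.zeros((9, 9))
corner[4:, 4:] = 255.0

labels = [classify_patch(p, threshold=1000.0) for p in (flat, edge, corner)]
```

Thresholding the smaller eigenvalue is what separates corners from edges: an edge has strong gradients in only one direction, so its smaller eigenvalue stays near zero.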

Dense Optical Flow: Horn-Schunck

Horn-Schunck: find global displacement functions u(x, y) and v(x, y) by minimizing an energy made of a data term (optical flow constraint) and a smoothness term, balanced by a regularization parameter.

Solve using the Euler-Lagrange equations, which involve the Laplacian of u and v.

Approximating the Laplacian leads to a sparse system in which each displacement is tied to the average displacement in the neighborhood, scaled by a constant.
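The Horn-Schunck steps can be written out in their standard textbook form, given here as a hedged reference. The symbols \lambda (regularization parameter), \bar{u}, \bar{v} (average displacement in the neighborhood), and h (the constant) are assumed names, not taken from the slide.

```latex
% Energy: data term + lambda * smoothness term
E_{HS}(u, v) = \iint \left( I_x u + I_y v + I_t \right)^2
             + \lambda \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \, dx \, dy

% Euler-Lagrange equations
\lambda \nabla^2 u = I_x \left( I_x u + I_y v + I_t \right), \qquad
\lambda \nabla^2 v = I_y \left( I_x u + I_y v + I_t \right)

% Discrete approximation of the Laplacian, which makes the system sparse
\nabla^2 u \approx h \left( \bar{u} - u \right), \qquad
\nabla^2 v \approx h \left( \bar{v} - v \right)
```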

Need for a Joint Approach

Lucas-Kanade (1981):
• local method (local smoothing)
• pixel displacement: constant within a small neighborhood
• robust under noise
• produces sparse optical flow

Horn-Schunck (1981):
• global method (global smoothing)
• pixel displacement: a smooth function over the image domain
• sensitive to noise
• produces dense optical flow

Two ways to combine them:
• use global smoothing to improve feature tracking
• use local smoothing to improve dense optical flow

Joint Feature Tracking

Combined Local-Global approach (Bruhn et al., 2004)

Joint Lucas-Kanade (JLK)

Joint Lucas-Kanade energy functional: a sum over the N feature points (N = number of feature points) of a data term (optical flow constraint) and a smoothness term (regularization) that ties each displacement to its expected values.

Differentiating E_JLK w.r.t. (u, v) gives a 2N x 2N system, whose (2i-1)th and (2i)th rows correspond to feature i.

The sparse system is solved using Jacobi iterations.
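A minimal Python sketch of such a Jacobi-style solve follows. It assumes the per-feature system takes the form (Z_i + λI) u_i = e_i + λ û_i, which results from adding a quadratic penalty λ‖u_i − û_i‖² to each Lucas-Kanade energy; it also assumes that the expected value û_i is the plain mean of the neighbors' current displacements. The function name, the neighbor lists, and the toy data are hypothetical. In a full tracker Z_i and e_i would be recomputed after each warp; here they are held fixed.

```python
import numpy as np

def jlk_jacobi(Z, e, neighbors, lam, iters=50):
    """Jacobi-style iterations for a joint Lucas-Kanade-like system.

    Z: (N, 2, 2) gradient covariance matrices, e: (N, 2) right-hand sides,
    neighbors: list of index lists, lam: regularization weight.
    Assumption: the expected displacement of feature i is the mean of its
    neighbors' displacements from the previous iteration.
    """
    N = len(Z)
    u = np.zeros((N, 2))
    for _ in range(iters):
        u_new = np.empty_like(u)
        for i in range(N):
            u_hat = u[neighbors[i]].mean(axis=0) if neighbors[i] else u[i]
            A = Z[i] + lam * np.eye(2)   # data term plus regularization
            b = e[i] + lam * u_hat       # pulled toward the expected value
            u_new[i] = np.linalg.solve(A, b)
        u = u_new                        # Jacobi: update all features at once
    return u

# Toy data: two well-textured features and one texture-less feature (Z = 0).
d_true = np.array([1.0, 0.5])
Z = np.array([100 * np.eye(2), 100 * np.eye(2), np.zeros((2, 2))])
e = np.array([Z[0] @ d_true, Z[1] @ d_true, np.zeros(2)])
neighbors = [[1], [0], [0, 1]]
u = jlk_jacobi(Z, e, neighbors, lam=1.0)
```

The third feature has no gradient information of its own, so standard Lucas-Kanade would face a singular system; with the smoothness term its displacement is filled in from its neighbors.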

Results of JLK

Figure panels: low texture, repetitive texture











Mixture Models Basics

Setup: 3 bins (components), Red, Green, and Blue, and a sample drawn from their mixture

Posterior: P(Red | sample) ∝ P(sample | Red) P(Red)
• P(Red | sample): posterior probability that the drawn sample is Red
• P(sample | Red): likelihood of the sample being Red (measurement); how Red is the drawn sample?
• P(Red): prior probability of the Red bin; how big is the Red bin?

Probability of drawing a sample from a mixture of three bins:

P(sample) = P(sample | Red) P(Red) + P(sample | Green) P(Green) + P(sample | Blue) P(Blue)

Challenge: the only available information is the drawn sample!

Mixture model: likelihoods and priors for all the components
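The three-bin example can be checked numerically. In the sketch below the priors and likelihoods are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
def posteriors(likelihoods, priors):
    """Bayes' rule over mixture components: posterior = likelihood * prior / evidence."""
    evidence = sum(likelihoods[c] * priors[c] for c in priors)  # P(sample)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Illustrative values only: how big each bin is, and how well the sample fits it.
priors = {"Red": 0.5, "Green": 0.3, "Blue": 0.2}
likelihoods = {"Red": 0.1, "Green": 0.6, "Blue": 0.3}

post = posteriors(likelihoods, priors)
best = max(post, key=post.get)
```

Although Red is the largest bin here, the sample fits Green far better, so Green has the highest posterior.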

Mixture Model Example: GMM

Parameters of a Gaussian density, θ: mean (μ) and variance (σ²)

Figure: distribution of grayscale values modeled by four Gaussian components with parameters θ1 = {μ1, σ1}, θ2 = {μ2, σ2}, θ3 = {μ3, σ3}, θ4 = {μ4, σ4}

Gaussian density for the jth component: the density of the ith pixel conditioned on the parameters of the jth Gaussian density

Learning Mixture Models

Mixture model defined as a weighted sum of component densities evaluated at each observed data point, with:
• observed data point (known)
• number of components (known)
• mixing weights (unknown)
• component density
• density parameters (unknown)

Learning mixture models (parameter estimation): estimate the mixing weights and the component density parameters

Circular nature of the problem: parameter estimation requires the class association (segmentation), and class association requires the estimated parameters

Expectation Maximization

EM: an iterative two-step algorithm for parameter estimation

1. Initialize:
   a. number of components K
   b. component density parameters θ for all components
   c. mixing weights π
   d. convergence criterion

2. Repeat until convergence:
   E STEP
   a. for all N data points:
      i. compute likelihood from the component density
      ii. estimate weights, w
   M STEP
   b. estimate mixing weights
   c. estimate component density parameters

E Step: find the expectation of the likelihood function (segmentation / label assignment)

M Step: maximize the likelihood function (parameter estimation based on segmentation)

Convergence: when the likelihood cannot be further maximized (when estimates do not change between successive iterations)
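The EM steps above can be sketched for a one-dimensional Gaussian mixture, for example over grayscale values. This is a generic textbook EM, not the greedy variant introduced later in the deck; the function name, the synthetic data, and the initial parameters are assumptions.

```python
import numpy as np

def em_gmm_1d(x, mu, sigma, pi, tol=1e-6, max_iter=200):
    """EM for a 1-D Gaussian mixture with K = len(mu) components."""
    x = np.asarray(x, float)
    mu, sigma, pi = (np.array(a, float) for a in (mu, sigma, pi))
    prev = -np.inf
    for _ in range(max_iter):
        # E step: likelihood of each point under each component, then weights w
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        joint = dens * pi
        total = joint.sum(axis=1, keepdims=True)
        w = joint / total
        # M step: re-estimate mixing weights and component density parameters
        nk = w.sum(axis=0)
        pi = nk / len(x)
        mu = (w * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Convergence: stop when the log-likelihood no longer improves
        loglik = np.log(total).sum()
        if loglik - prev < tol:
            break
        prev = loglik
    return mu, sigma, pi, w

# Synthetic "grayscale" data drawn from two Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(50, 5, 300), rng.normal(150, 10, 300)])
mu, sigma, pi, w = em_gmm_1d(x, mu=[40, 160], sigma=[20, 20], pi=[0.5, 0.5])
labels = w.argmax(axis=1)  # segmentation: most probable component per data point
```

Note that K and the initial θ and π must be supplied up front, which is the limitation the greedy variant is meant to address.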

Various Mixture Models

Terms:
• data term: how closely the data follow the models
• smoothness term: spatial interaction of the data elements

Finite Mixture Model (FMM): one prior for each component (mixing weights); EM algorithm

Spatially Variant Finite Mixture Model, ML (ML-SVFMM [1]): prior distribution for each data element (label probabilities); EM algorithm

Spatially Variant Finite Mixture Model, MAP (MAP-SVFMM [1]): neighbors mostly have similar labels (loose constraint); EM algorithm

Spatially Constrained Finite Mixture Model (SCFMM): enforces spatial connectivity of labels; Greedy EM algorithm

1. S. Sanjay-Gopal and T. Hebert, “Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM Algorithm”, IEEE Trans. on Image Processing, 1998.

Greedy-EM (Iterative Region Growing)

Figure: region growing on a 4-connected grid from start locations 1, 2, and 3

Properties of Greedy EM:
• enforces spatial connectivity of labels (SCFMM)
• automatically determines the number of groups
• local initialization of parameters
• primary user-defined parameters: inclusion criterion, minimum number of elements in a group

Grouping Point Features

Between two frames,
Repeat
    Randomly select seed feature
    Fit motion model to neighbors
    Repeat until group does not change:
        Discard all features except the one near the centroid
        Grow group by recursively including neighboring features with similar motion
        Update the motion model
Until all features have been considered

Figure: grouping features from a single seed point, showing the original seed and the group centroid at successive iterations
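The grouping loop can be sketched in Python under simplifying assumptions: a translation motion model (the group's mean displacement), neighbors defined by a fixed radius, and a Euclidean tolerance `tol` as the inclusion criterion. All function and parameter names are hypothetical, and the feature data are synthetic.

```python
import numpy as np
from collections import deque

def neighbors_within(points, radius):
    """Neighbor lists from a distance threshold (assumed neighborhood definition)."""
    d = np.linalg.norm(points[:, None] - points[None], axis=2)
    return [[int(j) for j in np.flatnonzero((row <= radius) & (row > 0))] for row in d]

def grow_group(seed, points, motion, nbrs, allowed, tol, max_rounds=20):
    """Grow one group from a seed feature using a translation model."""
    group = {seed} | {j for j in nbrs[seed] if j in allowed}
    for _ in range(max_rounds):
        idx = sorted(group)
        model = motion[idx].mean(axis=0)       # fit / update the motion model
        centroid = points[idx].mean(axis=0)
        # discard all features except the one nearest the centroid
        start = min(idx, key=lambda i: np.linalg.norm(points[i] - centroid))
        grown, queue = {start}, deque([start])
        while queue:                           # include neighbors with similar motion
            i = queue.popleft()
            for j in nbrs[i]:
                if j in allowed and j not in grown and np.linalg.norm(motion[j] - model) <= tol:
                    grown.add(j)
                    queue.append(j)
        if grown == group:                     # group did not change
            break
        group = grown
    return group

def group_features(points, motion, radius, tol, min_size=2, rng_seed=0):
    """Repeat seeding and growing until all features have been considered."""
    rng = np.random.default_rng(rng_seed)
    nbrs = neighbors_within(points, radius)
    remaining, groups = set(range(len(points))), []
    while remaining:
        seed = int(rng.choice(sorted(remaining)))
        group = grow_group(seed, points, motion, nbrs, remaining, tol)
        remaining -= group | {seed}            # the seed always counts as considered
        if len(group) >= min_size:
            groups.append(sorted(group))
    return groups

# Two adjacent 3x3 blocks of features moving differently.
xs, ys = np.meshgrid(np.arange(6), np.arange(3))
points = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
motion = np.where(points[:, :1] < 3, [1.0, 0.0], [0.0, 2.0])
groups = group_features(points, motion, radius=1.5, tol=0.1)
```

The number of groups is not specified anywhere; it falls out of how many seeds are needed before every feature has been considered.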

Grouping Consistent Features

• input: point features tracked between two frames
• output: groups of point features
• for N seed points, group point features
• gather sets of features always grouped together

Figure: groups obtained from seed 1, seed 2, and seed 3, and the resulting consistent feature group

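Grouping Consistent Features

Figure: worked example with features a, b, c, d. The 4 x 4 co-occurrence matrices obtained from two seed points are added, and entries equal to 2 mark feature pairs that were grouped together for both seed points.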

Consistency check: Features that are always grouped together, no matter the seed point

In practice, we use 7 seed points
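One way to implement the consistency check is to count, for every pair of features, in how many runs the two features received the same label, and keep the pairs that were together in all runs. The sketch below makes that assumption; the function name and the toy labelings are hypothetical.

```python
import numpy as np

def consistent_groups(labelings):
    """Find sets of features that share a group in every run.

    labelings: (S, N) array; labelings[s][i] is the group label of feature i
    in the run started from seed point s. Labels need not match across runs.
    """
    labelings = np.asarray(labelings)
    S, N = labelings.shape
    together = np.zeros((N, N), dtype=int)
    for labels in labelings:
        # co-occurrence matrix of this run: 1 where two features share a label
        together += labels[:, None] == labels[None, :]
    always = together == S                  # grouped together for every seed point
    groups, seen = [], set()
    for i in range(N):
        if i not in seen:
            members = [int(j) for j in np.flatnonzero(always[i])]
            groups.append(members)
            seen.update(members)
    return groups

# Features a, b, c, d (indices 0..3) grouped from three different seed points.
runs = [[0, 0, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 2, 2]]
groups = consistent_groups(runs)
```

Here a and b are together in all three runs, while c and d are split in the second run, so they end up as singleton groups.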


Consistent Features: Multiple Groups

Feature groups obtained for various iterations

consistent feature groups

Maintaining Groups Over Time

Figure: numbered features and their groups in frame k and in frame k + n

• track features from frame k to frame k + n, accounting for lost features and newly added features
• find consistent groups
• if the χ² test fails, either the features are regrouped or multiple groups are found

Experimental Results

Sequences: mobile-calendar, freethrow, car-map, robots, statue

Videos

statue sequence, mobile-calendar sequence

Results Over Time

Sequences: freethrow, mobile-calendar, statue, car-map, robots, vehicles

Algorithm dynamically determines the number of feature groups

Comparison with Other Approaches

Algorithm                          Run time (sec/frame)   Max. number of groups
Xiao and Shah (PAMI, 2005)         520                    4
Kumar et al. (ICCV, 2005)          500                    6
Smith et al. (PAMI, 2004)          180                    3
Rothganger et al. (CVPR, 2004)     30                     3
Jojic and Frey (CVPR, 2001)        1                      3
Cremers and Soatto (IJCV, 2005)    40                     4
Our algorithm (TSMC, 2008)         0.16                   8

Effect of Joint Feature Tracking

input

standard Lucas-Kanade

Joint Lucas-Kanade











Articulated Motion Models

Objectives:
• learn articulated human motion models
• motion only, no appearance
• viewpoint and scale invariant detection
• varying lighting conditions (day and night time sequences)
• detection in presence of camera and background motion
• pose estimation

Theme: sparse motion alone captures a wealth of information

Purpose of human motion analysis:
• pedestrian detection / surveillance
• action recognition
• pose estimation

Traditional approaches use:
• appearance
• frame differencing

Use of Motion Capture Data

motion capture (mocap) data in 3D

Top-Down Approach: train high-level descriptors (appearance or motion based) that describe articulated motion at a global level for detection

Bottom-Up Approach: learn the motion of individual joints from the training data and aggregate the information to detect human motion

Figure: hand, foot 1, foot 2, and body center; displacement of the limbs w.r.t. the body center

Approach Overview

Training: 3D motion capture points, angular viewpoints, walking poses

Motion Descriptor

Gaussian weight maps for the various means and orientations that constitute the motion descriptor

spatial arrangement of the descriptor bins w.r.t. the body center

bin values of the motion descriptor describing human subjects from various viewpoints and pose configurations

views

poses

confusion matrix for 64 training descriptors

Segmentation Results

View-invariant segmentation of articulated motion using a motion descriptor (right profile, left profile, angular, front)

Segmentation of articulated motion in a challenging sequence involving camera and background motion

Pose Estimation Results

front view nighttime sequence

right-profile view angular view

Videos of Detection and Pose Estimation











Iris Image Segmentation

Non-ideal iris image segmentation using texture and intensity

Ideas:
• local intensity variations (computed from gradient magnitude and point features) can be used for texture representation that segments eyelash and non-eyelash regions
• possible segments based on image intensity: iris, pupil, and background

Figure: input image, point features, and gradient magnitude. Textured regions show a higher density of point features and higher gradient magnitude; un-textured regions show a lower density of point features and lower gradient magnitude. Intensity-based segments: background, iris, pupil.

Coarse Texture Computation

Figure: the eye is split into eyelash (textured regions) and non-eyelash (un-textured regions); the non-eyelash part is further split into iris, pupil, and background (four regions in total)

Iris Segmentation and Recognition

Iris segmentation:

Figure: pipeline stages: input iris image, preprocessed input, iris segmentation, iris refinement. Related labels: iris mask, iris ellipse, specular reflections.

Iris recognition:
• unwrap and normalize the iris mask
• generate iris signature from iris mask (using texture in the iris)
• compare iris signatures using Hamming distance
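The comparison step can be illustrated with a fractional Hamming distance between two binary signatures. The sketch assumes the signatures are equal-length bit arrays and that optional validity masks flag usable bits (for example, bits not covered by eyelashes); the function name, the masks, and the bit patterns are hypothetical.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of disagreeing bits among bits valid in both signatures."""
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    valid = np.ones(code_a.shape, dtype=bool)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no valid bits to compare")
    return np.count_nonzero((code_a ^ code_b) & valid) / n_valid

# Illustrative 8-bit signatures; real signatures are much longer.
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 1, 1]
d_all = hamming_distance(a, b)                                   # all bits compared
d_masked = hamming_distance(a, b, mask_b=[1, 1, 1, 1, 1, 1, 1, 0])  # last bit ignored
```

A distance near 0 suggests the same iris, while signatures from different irises tend toward 0.5; the decision threshold is application specific.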

Image Segmentation Results

Figure: input image, segmentation (iris, pupil, eyelashes, background), and iris mask

Iris Recognition

Iris recognition using our segmentation algorithm

Database                             Images   Classes   Images/class
West Virginia Non-Ideal Database     1868     467       4
West Virginia Off-Axis Database      584      146       4

Conclusions and Future Work

Motion segmentation based on sparse feature clustering
• spatially constrained mixture model and greedy EM algorithm
• automatically determines number of groups
• real-time performance
• ability to handle long, dynamic sequences and arbitrary number of feature groups

Joint feature tracking
• incorporation of neighboring feature motion
• improved performance in areas of low texture or repetitive texture

Detection of articulated motion
• motion-based approach for learning high-level human motion models
• segment and track human motion in varying pose, scale, and lighting conditions
• view-invariant pose estimation

Iris segmentation
• graph-cuts-based dense segmentation using texture and intensity
• combines appearance and eye geometry
• handles non-ideal iris images with occlusion, illumination changes, and eye rotation

Future Work
• integration of motion segmentation, joint feature tracking, and articulated motion segmentation
• dense segmentation from the sparse feature groups
• handling non-rigid motions, non-textured regions, and occlusions
• combining sparse feature groups, discontinuities, and image contours for a novel representation of video

Questions?
