Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models
Shrinivas Pundlik
Committee members: Dr. Stan Birchfield (chair), Dr. Adam Hoover, Dr. Ian Walker, Dr. Damon Woodard
Motion Segmentation
• Gestalt insight: grouping forms the basis of human perception
• Gestalt laws: factors that affect the grouping process (cues): similarity, proximity, common motion (common fate), continuity
• Motion segmentation: segmenting images based on common motion (points moving together are grouped together)
• Typically, motion segmentation uses common motion + proximity
Applications of Motion Segmentation
• object detection
• pedestrian detection
• tracking
• vehicle tracking
• robotics
• surveillance
• image and video compression
• scene reconstruction
• video manipulation / editing
• video matting
• video annotation
• motion magnification
[Figure examples: video editing (Criminisi et al., 2006); vehicle tracking (Kanhere et al., 2005); pedestrian detection (Viola et al., 2003)]
Previous Work
Approach
Wang and Adelson 1994
Xiao and Shah 2005
Ayer and Sawhney 1995
Willis et al. 2003
Motion Layer Estimation
Multi Body Factorization
Object Level Grouping
Miscellaneous
Costeira and Kanade 1995
Sivic et al. 2004; Kanhere et al. 2005
Ke and Kanade 2002
Black and Fleet 1998
Birchfield 1999; Levine and Weiss 2006
Vidal and Sastry 2003; Yan and Pollefeys 2006; Gruber and Weiss 2006
Jojic and Frey 2001
Algorithm
Shi and Malik 1998
Expectation Maximization
Graph Cuts
Belief Propagation
Normalized Cuts
Jojic and Frey 2001
Smith et al. 2004; Kokkinos and Maragos 2004
Xiao and Shah 2005
Willis et al. 2003
Criminisi et al. 2006
Kumar et al. 2005
Variational Methods
Cremers and Soatto 2005
Brox et al. 2005
Nature of Data
Dense Motion
Motion + Image Cues
Sparse Features
Sivic et al. 2004; Kanhere et al. 2005; Rothganger et al. 2004
Cremers and Soatto 2005
Brox et al. 2005
Kumar et al. 2005
Criminisi et al. 2006
Xiao and Shah 2005
Challenges: Short Term
[Figure: example scene with labeled regions: 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian]
• computation of motion in the scene
• influence of the neighboring motion
• number of objects / regions in the scene
• initialization of motion parameters
• description of complex motions (articulated human motion)
Challenges: Long Term
[Figure: feature position x plotted against time t for slow, medium, fast, and crawling motion relative to a threshold, showing the time window used by batch processing and by incremental processing]
• batch processing vs. incremental processing
• updating the reference frame
• maintaining existing groups: growing existing regions, splitting
• adding new groups (new objects), deleting invisible groups
Objectives
[Diagram: observed data → motion computation (Feature Tracking) → clustering (two-frame) and long-term maintenance of groups (Motion Segmentation); Mixture Model Framework: parameter estimation and group assignment; motion models: translation, affine, complex models (Articulated Human Motion Models)]
• motion segmentation using sparse point features
• automatically determine the number of groups
• handling dynamic sequences
• real time performance
• handling complex motions
Overview of the Topics
• Feature Tracking: Tracking sparse point features for computation of image motion and its extension to joint feature tracking.
  S. T. Birchfield and S. J. Pundlik, “Joint Tracking of Features and Edges”, CVPR, 2008.
• Motion Segmentation: Clustering point features in videos based on their motion and spatial connectivity.
  S. J. Pundlik and S. T. Birchfield, “Motion Segmentation at Any Speed”, BMVC 2006.
  S. J. Pundlik and S. T. Birchfield, “Real Time Motion Segmentation of Sparse Feature Points at Any Speed”, IEEE Trans. on Systems, Man, and Cybernetics, 2008.
• Articulated Human Motion Models: Learning human walking motion from various pose and view angles for segmentation and pose estimation (a special handling of a complex motion model).
• Iris Segmentation: Texture and intensity based segmentation of non-ideal iris images.
  S. J. Pundlik, D. L. Woodard and S. T. Birchfield, “Non Ideal Iris Segmentation Using Graph Cuts”, CVPR Workshop on Biometrics, 2008.
Point Features
Popular features:
• Harris corner feature [Harris & Stephens 1987, Schmid et al. 2000]
• Shi-Tomasi feature [Shi & Tomasi 1994]
• Forstner corner feature [Forstner 1994]
• Scale invariant feature transform (SIFT) [Lowe 2000]
• Gradient Location and Orientation Histogram (GLOH) [Mikolajczyk and Schmid 2005]
• Features from accelerated segment test (FAST) [Rosten and Drummond 2005]
• Speeded up robust features (SURF) [Bay et al. 2006]
• DAISY [Tola et al. 2008]
[Figure: input, gradients, point features: capturing the information content]
Utility of Point Features
Advantages:
• highly repeatable and extensible (work for a variety of images)
• efficient to compute (real time implementations available)
• local methods for processing (tracking through multiple frames)
tracking multiple point features = sparse optical flow
sparse point feature tracks yield the image motion
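Tracking Point Features: Lucas-Kanade
Assume constant brightness: $I(x + u, y + v, t + 1) = I(x, y, t)$
Linearizing gives the optic flow constraint equation: $I_x u + I_y v + I_t = 0$
• $(u, v)$: image pixel displacement
• $I_x, I_y$: image spatial derivatives
• $I_t$: image temporal derivative
Estimate the pixel displacement $\mathbf{u} = (u, v)^T$ by minimizing: $E_{LK}(u, v) = K_\rho * (I_x u + I_y v + I_t)^2$, where $K_\rho$ is a convolution kernel
Differentiating with respect to $u$ and $v$ and setting the derivatives to zero leads to a linear system: $Z \mathbf{u} = \mathbf{e}$, with
$Z = \begin{bmatrix} K_\rho * I_x^2 & K_\rho * I_x I_y \\ K_\rho * I_x I_y & K_\rho * I_y^2 \end{bmatrix}$ (gradient covariance matrix), $\mathbf{e} = -\begin{bmatrix} K_\rho * I_x I_t \\ K_\rho * I_y I_t \end{bmatrix}$
Iterate using the Newton-Raphson method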
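A single Lucas-Kanade update for one feature window can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes a box-shaped (uniform) convolution kernel, central-difference gradients from NumPy, and a simple frame difference for the temporal derivative. The function name `lucas_kanade_step` and the `half` window parameter are hypothetical.

```python
import numpy as np

def lucas_kanade_step(I0, I1, cx, cy, half=7):
    """One Lucas-Kanade update for the window centred at column cx, row cy.

    Assumptions (not from the slides): uniform window weights,
    central-difference spatial gradients, frame-difference temporal derivative.
    """
    I0 = I0.astype(float)
    I1 = I1.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    ix, iy, it = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    # Gradient covariance matrix Z and right-hand side e of Z u = e.
    Z = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    e = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(Z, e)  # estimated displacement (u, v)
```

In a full tracker this step would be repeated, warping the second image by the current estimate each time, until the update becomes negligible.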
Detection of Point Features
[Figure: intensity surfaces over (x, y) for three image patches]
1. no feature: low intensity variation; emax = 5.15, emin = 3.13 (two small eigenvalues)
2. edge feature: unidirectional intensity variation; emax = 1026.9, emin = 29.9 (a small and a large eigenvalue)
3. good feature: bidirectional intensity variation; emax = 1672.44, emin = 932.4 (two large eigenvalues)
Gradient covariance matrix: $Z = K_\rho * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$ (convolution kernel $K_\rho$ applied to products of the image gradients)
Good feature: $\min(e_{max}, e_{min}) > \text{threshold}$, where $e_{max}$ and $e_{min}$ are the eigenvalues of $Z$
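The eigenvalue test for a good feature can be illustrated with a small sketch. It assumes uniform weighting over the patch and NumPy central-difference gradients; the function names and the threshold value are hypothetical, chosen only for the example.

```python
import numpy as np

def gradient_covariance_eigs(patch):
    """Return (emax, emin) of the gradient covariance matrix of a patch.

    Assumption: uniform weights over the patch (a box kernel).
    """
    gy, gx = np.gradient(patch.astype(float))
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    emin, emax = np.linalg.eigvalsh(Z)  # eigvalsh returns ascending order
    return emax, emin

def is_good_feature(patch, threshold):
    """A patch is a good feature when its smaller eigenvalue exceeds the threshold."""
    return gradient_covariance_eigs(patch)[1] > threshold

# Three synthetic 16x16 patches mirroring the three cases on the slide.
flat = np.full((16, 16), 100.0)                      # no feature
edge = np.zeros((16, 16)); edge[:, 8:] = 100.0       # edge feature
corner = np.zeros((16, 16)); corner[8:, 8:] = 100.0  # good feature
```

With a hypothetical threshold of 1000, only the `corner` patch passes: the flat patch has two zero eigenvalues, and the edge patch has one large and one zero eigenvalue.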
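Dense Optical Flow: Horn-Schunck
Horn-Schunck: find global displacement functions $u(x, y)$ and $v(x, y)$ by minimizing:
$E_{HS}(u, v) = \int_\Omega \left[ (I_x u + I_y v + I_t)^2 + \lambda \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx\, dy$
• $(I_x u + I_y v + I_t)^2$: data term (optical flow constraint)
• $\|\nabla u\|^2 + \|\nabla v\|^2$: smoothness term
• $\lambda$: regularization parameter
Solve using Euler-Lagrange: $\lambda \nabla^2 u = I_x (I_x u + I_y v + I_t)$ and $\lambda \nabla^2 v = I_y (I_x u + I_y v + I_t)$, where $\nabla^2$ is the Laplacian
Approximating $\nabla^2 u \approx h(\bar{u} - u)$ leads to a sparse system:
$(\lambda h + I_x^2)\, u + I_x I_y\, v = \lambda h\, \bar{u} - I_x I_t$
$I_x I_y\, u + (\lambda h + I_y^2)\, v = \lambda h\, \bar{v} - I_y I_t$
where $(\bar{u}, \bar{v})$ is the average displacement in the neighborhood and $h$ is a constant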
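The sparse Horn-Schunck system is commonly solved by iterating a pointwise update that uses the neighborhood average of the current flow. The sketch below is one such Jacobi-style iteration under stated assumptions: a 4-neighbor average with replicated borders, central-difference gradients, and the constant from the Laplacian approximation folded into the single parameter `lam`. The function names are hypothetical.

```python
import numpy as np

def neighbor_average(f):
    """Average of the four axis-aligned neighbours (image borders replicated)."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I0, I1, lam=1.0, n_iter=100):
    """Dense flow (u, v) from Jacobi-style Horn-Schunck iterations.

    Assumption: lam plays the role of the regularization parameter
    multiplied by the Laplacian-approximation constant.
    """
    I0 = I0.astype(float)
    I1 = I1.astype(float)
    Iy, Ix = np.gradient(I0)  # rows (y) first, then columns (x)
    It = I1 - I0
    u = np.zeros_like(I0)
    v = np.zeros_like(I0)
    for _ in range(n_iter):
        ub, vb = neighbor_average(u), neighbor_average(v)
        # Closed-form solution of the per-pixel 2x2 system given (ub, vb).
        t = (Ix * ub + Iy * vb + It) / (lam + Ix ** 2 + Iy ** 2)
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

Larger values of `lam` give smoother flow fields at the cost of slower convergence and blurred motion boundaries.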
Need for a Joint Approach
Lucas-Kanade (1981):
• local method (local smoothing)
• pixel displacement: constant within a small neighborhood
• robust under noise
• produces sparse optical flow
Horn-Schunck (1981):
• global method (global smoothing)
• pixel displacement: a smooth function over the image domain
• sensitive to noise
• produces dense optical flow
Combining the two:
• use global smoothing to improve feature tracking
• use local smoothing to improve dense optical flow
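Joint Feature Tracking
• Combined Local-Global approach (Bruhn et al., 2004)
• Joint Lucas-Kanade (JLK)
Joint Lucas-Kanade energy functional: $E_{JLK} = \sum_{i=1}^{N} \left( E_D(i) + \lambda_i E_S(i) \right)$, where $N$ is the number of feature points
• data term (optical flow constraint): $E_D(i) = K_\rho * (I_x u_i + I_y v_i + I_t)^2$
• smoothness term (regularization): $E_S(i) = (u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2$, where $(\hat{u}_i, \hat{v}_i)$ are the expected values of the displacement
Differentiating $E_{JLK}$ w.r.t. $(u_i, v_i)$ gives a $2N \times 2N$ system whose $(2i-1)$th and $(2i)$th rows are given by:
$(\lambda_i + K_\rho * I_x^2)\, u_i + (K_\rho * I_x I_y)\, v_i = \lambda_i \hat{u}_i - K_\rho * I_x I_t$
$(K_\rho * I_x I_y)\, u_i + (\lambda_i + K_\rho * I_y^2)\, v_i = \lambda_i \hat{v}_i - K_\rho * I_y I_t$
Sparse system is solved using Jacobi iterations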
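The Jacobi solution of the joint system can be sketched as below. Each feature i contributes a 2x2 gradient covariance matrix Z_i and a right-hand side e_i (as in standard Lucas-Kanade), and the smoothness term pulls its displacement toward an expected value computed from the other features. How the expected value is computed is an assumption here: the sketch uses a Gaussian-weighted average of the other features' current displacements, with a single `lam` shared by all features. The function name and parameters are hypothetical.

```python
import numpy as np

def joint_lk(Z, e, positions, lam=1.0, sigma=10.0, n_iter=50):
    """Jacobi iterations for (Z_i + lam*I) u_i = e_i + lam * u_hat_i.

    Z: (N, 2, 2) gradient covariance matrices, e: (N, 2) right-hand sides,
    positions: (N, 2) feature locations.
    Assumption: u_hat_i is a Gaussian-weighted mean of the other features'
    displacements; the slides do not specify this choice.
    """
    N = len(e)
    d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)            # a feature does not predict itself
    W /= W.sum(axis=1, keepdims=True)   # each row sums to one
    A = Z + lam * np.eye(2)             # broadcasts over the N features
    u = np.zeros((N, 2))
    for _ in range(n_iter):
        u_hat = W @ u                   # expected displacements
        rhs = e + lam * u_hat
        u = np.linalg.solve(A, rhs[..., None])[..., 0]
    return u
```

A feature whose Z is singular (for example, one lying on a straight edge) cannot be tracked by standard Lucas-Kanade, but here `Z + lam*I` is invertible and the unconstrained component is filled in from the neighbors.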
Results of JLK
[Figure panels: low texture; repetitive texture]
Mixture Models Basics
3 bins (components): Red, Green, Blue
Probability of drawing a sample from a mixture of three bins:
P(sample) = P(sample|Red) P(Red) + P(sample|Green) P(Green) + P(sample|Blue) P(Blue)
Posterior probability that the drawn sample is Red:
P(Red|sample) ∝ P(sample|Red) P(Red)
• P(sample|Red): likelihood of the sample being Red (measurement): how Red is the drawn sample?
• P(Red): prior probability of the Red bin: how big is the Red bin?
Challenge: the only available information is the drawn sample!
Mixture Model: likelihoods and priors for all the components
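The three-bin example can be worked through numerically. The sketch below applies Bayes' rule with made-up likelihoods and priors (the numbers are illustrative only, not from the slides); the function name `posteriors` is hypothetical.

```python
def posteriors(likelihoods, priors):
    """Posterior P(component | sample) for every component via Bayes' rule."""
    # P(sample) is the mixture: sum over components of likelihood * prior.
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Illustrative numbers: the sample looks very Red, and the Red bin is the largest.
likelihoods = {"Red": 0.8, "Green": 0.1, "Blue": 0.1}
priors = {"Red": 0.5, "Green": 0.3, "Blue": 0.2}
post = posteriors(likelihoods, priors)
```

Here P(sample) = 0.8·0.5 + 0.1·0.3 + 0.1·0.2 = 0.45, so the posterior for Red is 0.40 / 0.45, roughly 0.89.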
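Mixture Model Example: GMM
Parameters of a Gaussian density, θ: mean (μ) and variance (σ²)
[Figure: histogram of grayscale values modeled by four Gaussian components with θ1 = {μ1, σ1}, θ2 = {μ2, σ2}, θ3 = {μ3, σ3}, θ4 = {μ4, σ4}]
Gaussian density for the jth component (ith pixel conditioned on the parameters of the jth Gaussian density):
$\phi(x_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)$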
Learning Mixture Models
Mixture model defined as: $g(x_i) = \sum_{j=1}^{K} \pi_j\, \phi(x_i \mid \theta_j)$
• $K$: number of components (known)
• $\pi_j$: mixing weights (unknown)
• $\phi$: component density
• $\theta_j$: density parameters (unknown)
• $x_i$: observed data point (known)
Learning mixture models (parameter estimation): estimate mixing weights and component density parameters
Circular nature of the problem: parameter estimation requires class association (segmentation), and class association requires the estimated parameters
Expectation Maximization
EM: an iterative two-step algorithm for parameter estimation
1. Initialize:
   a. number of components K
   b. component density parameters θ for all components
   c. mixing weights π
   d. convergence criterion
2. Repeat until convergence:
   E STEP
   a. for all N data points:
      i. compute likelihood from the component density
      ii. estimate weights, w
   M STEP
   b. estimate mixing weights
   c. estimate component density parameters
E Step: find expectation of the likelihood function (segmentation / label assignment)
M Step: maximize the likelihood function (parameter estimation based on segmentation)
Convergence: when the likelihood cannot be further maximized (when estimates do not change between successive iterations)
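The EM loop for a one-dimensional Gaussian mixture can be sketched as below. The initialization (means at evenly spaced quantiles, shared variance, equal mixing weights) and the log-likelihood stopping rule are assumptions made for this example; the function names are hypothetical.

```python
import numpy as np

def gaussian(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, K, n_iter=100, tol=1e-6):
    """Fit a K-component 1-D Gaussian mixture with EM.

    Returns mixing weights, means, variances, and per-point weights w (N x K).
    """
    x = np.asarray(x, dtype=float)
    # 1. Initialize (assumed scheme: quantile means, global variance).
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: likelihood of each point under each component, then weights.
        lik = pi * gaussian(x[:, None], mu, var)
        total = lik.sum(axis=1, keepdims=True)
        w = lik / total
        # M step: re-estimate mixing weights and density parameters.
        Nk = w.sum(axis=0)
        pi = Nk / len(x)
        mu = (w * x[:, None]).sum(axis=0) / Nk
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-9
        # Convergence: stop when the log-likelihood no longer improves.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var, w
```

Taking `w.argmax(axis=1)` gives the label assignment (segmentation) implied by the fitted model.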
Various Mixture Models
• data term: how closely the data follow the models
• smoothness term: spatial interaction of the data elements
Models:
• Finite Mixture Model (FMM): one prior for each component (mixing weights)
• Spatially Variant Finite Mixture Model, ML (ML-SVFMM) [1]: prior distribution for each data element (label probabilities)
• Spatially Variant Finite Mixture Model, MAP (MAP-SVFMM) [1]: neighbors mostly have similar labels (loose constraint)
• Spatially Constrained Finite Mixture Model (SCFMM): enforce spatial connectivity of labels
Learning: EM algorithm; greedy EM algorithm (SCFMM)
1. S. Sanjay-Gopal and T. Hebert, “Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM Algorithm”, IEEE Trans. on Image Processing, 1998.
Greedy-EM (Iterative Region Growing)
[Figure: region growing from start locations 1, 2, and 3; consider a 4-connected grid]
Properties of Greedy EM:
• enforces spatial connectivity of labels (SCFMM)
• automatically determines the number of groups
• local initialization of parameters
• primary user defined parameters: inclusion criterion, minimum number of elements in a group
Grouping Point Features
Between two frames, repeat:
1. Randomly select seed feature
2. Fit motion model to neighbors
3. Repeat until group does not change:
   a. Discard all features except the one near the centroid
   b. Grow group by recursively including neighboring features with similar motion
   c. Update the motion model
until all features have been considered
[Figure: grouping features from a single seed point, showing the original seed and the group centroid over successive iterations]
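Growing one group from a single seed can be sketched as follows. This is a simplified illustration under several assumptions that are not in the slides: the motion model is a pure translation (the mean displacement of the group), neighbors are features within a fixed radius, and "similar motion" means the displacement lies within `tau` of the model. All names and parameter values are hypothetical.

```python
import numpy as np
from collections import deque

def grow_group(pos, mot, seed, model, radius, tau):
    """Breadth-first growth: add neighbouring features whose motion fits the model."""
    group = {seed}
    queue = deque([seed])
    while queue:
        i = queue.popleft()
        near = np.flatnonzero(np.linalg.norm(pos - pos[i], axis=1) <= radius)
        for j in near:
            j = int(j)
            if j not in group and np.linalg.norm(mot[j] - model) <= tau:
                group.add(j)
                queue.append(j)
    return group

def group_from_seed(pos, mot, seed, radius=1.1, tau=0.5, max_iter=20):
    """Group features around a seed, re-seeding at the feature nearest the centroid."""
    near = np.flatnonzero(np.linalg.norm(pos - pos[seed], axis=1) <= radius)
    model = mot[near].mean(axis=0)        # fit translation model to the neighbours
    group = set()
    for _ in range(max_iter):
        new_group = grow_group(pos, mot, seed, model, radius, tau)
        if new_group == group:            # group did not change: done
            break
        group = new_group
        idx = sorted(group)
        model = mot[idx].mean(axis=0)     # update the motion model
        centroid = pos[idx].mean(axis=0)
        # keep only the feature nearest the centroid as the next seed
        seed = idx[int(np.argmin(np.linalg.norm(pos[idx] - centroid, axis=1)))]
    return group

# Toy data: a 6x4 grid whose left half moves right and right half moves left.
pts = [(x, y) for y in range(4) for x in range(6)]
pos = np.array(pts, dtype=float)
mot = np.array([(1.0, 0.0) if x < 3 else (-1.0, 0.0) for x, y in pts])
left_group = group_from_seed(pos, mot, seed=0)
```

On the toy grid, the group grown from the top-left feature contains exactly the twelve features in the left half, because growth stops at the motion boundary.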
Grouping Consistent Features
input: point features tracked between two frames
output: groups of point features
• for N seed points, group point features
• gather sets of features always grouped together
[Figure: groups obtained from seed 1, seed 2, and seed 3, and the resulting consistent feature group]
Grouping Consistent Features
[Figure: for features a, b, c, d, the pairwise grouping matrices obtained from two seed points are added; feature pairs grouped together for both seed points receive a count of 2]
Consistency check: features that are always grouped together, no matter the seed point
In practice, we use 7 seed points
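The consistency check can be sketched with a pairwise count matrix: for every seed point, each pair of features that ended up in the same group gets one vote, and only pairs with a vote from every seed are kept. The label encoding (one integer label per feature per run, with -1 for ungrouped) and the function name are assumptions for this example.

```python
import numpy as np

def consistent_groups(groupings, n_features):
    """Return the pairwise count matrix and the sets of always-co-grouped features.

    groupings: one label array per seed point; labels[i] is the group of
    feature i in that run, or -1 if the feature was left ungrouped.
    """
    C = np.zeros((n_features, n_features), dtype=int)
    for labels in groupings:
        labels = np.asarray(labels)
        together = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
        C += together
    always = C == len(groupings)
    groups, seen = [], set()
    for i in range(n_features):
        if i in seen or not always[i, i]:
            continue
        # "always together" is transitive, so row i lists the whole group
        members = [int(j) for j in np.flatnonzero(always[i])]
        seen.update(members)
        groups.append(members)
    return C, groups

# Features a, b, c, d (indices 0..3) grouped from two different seed points.
run1 = [0, 0, 0, 1]   # seed 1: {a, b, c} and {d}
run2 = [0, 0, 1, 1]   # seed 2: {a, b} and {c, d}
C, groups = consistent_groups([run1, run2], 4)
```

In this toy case only a and b are grouped together in both runs, so the consistent groups are {a, b}, {c}, and {d}.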
Consistent Features: Multiple Groups
Feature groups obtained for various iterations
consistent feature groups
Maintaining Groups Over Time
[Figure: numbered feature groups at frame k and at frame k + n, with lost features and newly added features marked]
• track features
• find consistent groups
• if the χ² test fails, either features are regrouped or multiple groups are found
Experimental Results
[Figure: results on the mobile-calendar, freethrow, car-map, robots, and statue sequences]
Videos
[Videos: statue sequence; mobile-calendar sequence]
Results Over Time
freethrow mobile-calendar statue
car-map robots vehicles
Algorithm dynamically determines the number of feature groups
Comparison with Other Approaches
Algorithm                          | Run Time (sec/frame) | Max. number of groups
Xiao and Shah (PAMI, 2005)         | 520                  | 4
Kumar et al. (ICCV, 2005)          | 500                  | 6
Smith et al. (PAMI, 2004)          | 180                  | 3
Rothganger et al. (CVPR, 2004)     | 30                   | 3
Jojic and Frey (CVPR, 2001)        | 1                    | 3
Cremers and Soatto (IJCV, 2005)    | 40                   | 4
Our algorithm (TSMC, 2008)         | 0.16                 | 8
Effect of Joint Feature Tracking
input
standard Lucas-Kanade
Joint Lucas-Kanade
Articulated Motion Models
Objectives:
• learn articulated human motion models
• motion only, no appearance
• viewpoint and scale invariant detection
• varying lighting conditions (day and night time sequences)
• detection in presence of camera and background motion
• pose estimation
Theme: sparse motion alone captures a wealth of information
Purpose of human motion analysis:
• pedestrian detection / surveillance
• action recognition
• pose estimation
Traditional approaches use:
• appearance
• frame differencing
Use of Motion Capture Data
motion capture (mocap) data in 3D
• Top-Down Approach: train high-level descriptors (appearance or motion based) that describe articulated motion at a global level for detection
• Bottom-Up Approach: learn the motion of individual joints from the training data and aggregate the information to detect human motion
[Figure: hand, foot 1, foot 2, and center points: displacement of the limbs w.r.t. the body center]
Approach Overview
Training
3D motion capture points angular viewpoints
walking poses
Motion Descriptor
Gaussian weight maps for the various means and orientations that constitute the motion descriptor
spatial arrangement of the descriptor bins w.r.t. the body center
bin values of the motion descriptor describing human subjects from various viewpoints and pose configurations
views
poses
confusion matrix for 64 training descriptors
Segmentation Results
View-invariant segmentation of articulated motion using a motion descriptor (panels: right profile, left profile, angular, front)
Segmentation of articulated motion in a challenging sequence involving camera and background motion
Pose Estimation Results
front view nighttime sequence
right-profile view angular view
Videos of Detection and Pose Estimation
Iris Image Segmentation
non-ideal iris image segmentation using texture and intensity
Ideas:
• local intensity variations (computed from gradient magnitude and point features) can be used for texture representation that segments eyelash and non-eyelash regions
• possible segments based on image intensity: iris, pupil and background
[Figure: input image, point features, and gradient magnitude; regions annotated with higher density of point features and higher gradient magnitude vs. lower density of point features and lower gradient magnitude; intensity segments: background, iris, pupil]
Coarse Texture Computation
[Diagram: eye → textured regions (eyelash) and un-textured regions (non-eyelash: iris, pupil, background)]
(Four Regions)
Iris Segmentation and Recognition
Iris segmentation: input iris image → preprocessed input (specular reflections removed) → iris segmentation → iris refinement → iris mask and iris ellipse
Iris recognition:
• unwrap and normalize the iris mask
• generate iris signature from iris mask (using texture in the iris)
• compare iris signatures using Hamming distance
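Comparing two binary iris signatures with the Hamming distance can be sketched as below. The use of validity masks (so that occluded bits are ignored) and the normalization by the number of jointly valid bits are assumptions based on common practice in iris recognition; the slides only state that Hamming distance is used. The function name is hypothetical.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of differing bits among bits that are valid in both signatures."""
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    valid = np.ones(a.shape, dtype=bool)
    if mask_a is not None:
        valid &= np.asarray(mask_a, dtype=bool)
    if mask_b is not None:
        valid &= np.asarray(mask_b, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no jointly valid bits to compare")
    return np.count_nonzero((a ^ b) & valid) / n_valid

sig_a = [1, 0, 1, 1]
sig_b = [1, 1, 1, 0]
d_all = hamming_distance(sig_a, sig_b)                           # 2 of 4 bits differ
d_masked = hamming_distance(sig_a, sig_b, mask_b=[1, 1, 1, 0])   # 1 of 3 valid bits
```

A distance near 0 suggests the two signatures come from the same iris, while unrelated signatures are expected to differ in about half of their bits.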
Image Segmentation Results
iris
background
pupil
eyelashes
Input Image Segmentation Iris Mask
Iris Recognition
Iris recognition using our segmentation algorithm
• West Virginia Non-Ideal Database: 1868 images (467 classes, 4 images/class)
• West Virginia Off-Axis Database: 584 images (146 classes, 4 images/class)
Conclusions and Future Work
Motion segmentation based on sparse feature clustering
• spatially constrained mixture model and greedy EM algorithm
• automatically determines number of groups
• real-time performance
• ability to handle long, dynamic sequences and arbitrary number of feature groups
Joint feature tracking
• incorporation of neighboring feature motion
• improved performance in areas of low-texture or repetitive texture
Detection of articulated motion
• motion based approach for learning high-level human motion models
• segment and track human motion in varying pose, scale, and lighting conditions
• view invariant pose estimation
Iris segmentation
• graph cuts based dense segmentation using texture and intensity
• combines appearance and eye geometry
• handles non-ideal iris images with occlusion, illumination changes, and eye rotation
Future Work
• integration of motion segmentation, joint feature tracking, and articulated motion segmentation
• dense segmentation from the sparse feature groups
• handling non-rigid motions, non-textured regions, and occlusions
• combining sparse feature groups, discontinuities, and image contours for a novel representation of video
Questions?