Video Clustering

7/31/2019 Video Clustering

1/44

From videos to verbs: Mining videos for activities using a cascade of dynamical systems

Pavan Turaga, Ashok Veeraraghavan, Rama Chellappa


2/44

Outline

What is video mining ?

Challenges

Prior Work

Overview of proposed algorithms

Experiments


3/44

Videos galore ..


4/44

Video Mining: What is it?

Isolate activities of interest from long videos

Identify repetitive activities

Aid analyst by presenting clusters


5/44

Challenges

Unsupervised: we don't know what we are looking for

Do not know temporal boundaries of an activity

Do not know how many clusters to find

Need to be invariant to affine changes, view, execution rate


6/44

Related Work

HMMs, stochastic grammars, time-series clustering

Shot boundary detection: newscasts, sports videos

Switched linear dynamic systems, subspace angles


7/44

Where do we fit in ?

Mining Videos for Events using a cascade of dynamical systems

Clustering, recognition

View invariance, rate invariance

Learning the model, distance metrics


8/44

Tiers of processing

Find repetitive sequences of action elements

Extract action elements (temporal segmentation)

Low Level Features


9/44

Tier I: Low-level features

Any of a wide choice of features depending on domain

Silhouettes

Point trajectories

Optical flow

Kendall's shape


10/44

Tier II: Segmentation

Break video into segments such that each segment can be modelled by a linear dynamic system

How to segment?

Curvature in space-time

Affine motion model

Shape deformation

Texture


11/44

Tier III: Sequence of LTI

Simpler case of SLDS

Activity composed of segments of consistent motion

Each segment modeled as an LTI system

\[ z(t+1) = A z(t) + v(t), \quad v(t) \sim N(0, Q) \]

\[ f(t) = C z(t) + w(t), \quad w(t) \sim N(0, R) \]
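To make the state-space model concrete, the sketch below draws a feature sequence from it with NumPy. The function name `simulate_lti`, the 2-state rotation system, and all noise levels are hypothetical values chosen for illustration; none of them come from the talk.

```python
import numpy as np

def simulate_lti(A, C, Q, R, z0, num_steps, seed=0):
    """Sample z(t+1) = A z(t) + v(t), f(t) = C z(t) + w(t) for num_steps frames."""
    rng = np.random.default_rng(seed)
    d, p = A.shape[0], C.shape[0]
    z = np.asarray(z0, dtype=float)
    states = np.empty((num_steps, d))
    features = np.empty((num_steps, p))
    for t in range(num_steps):
        states[t] = z
        # Observation noise w(t) ~ N(0, R)
        features[t] = C @ z + rng.multivariate_normal(np.zeros(p), R)
        # Process noise v(t) ~ N(0, Q)
        z = A @ z + rng.multivariate_normal(np.zeros(d), Q)
    return states, features

# Hypothetical system: a slowly decaying rotation seen through 3 features.
angle = 0.2
A = 0.98 * np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q = 1e-4 * np.eye(2)
R = 1e-4 * np.eye(3)
states, features = simulate_lti(A, C, Q, R, z0=[1.0, 0.0], num_steps=50)
```

Each row of `features` plays the role of one frame's feature vector f(t); `states` holds the hidden z(t).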


12/44

Learning the Model: Prediction Error Methods

Maximum likelihood solution difficult to compute.

Instead, use Minimum prediction error criterion.

Solution can be obtained in closed form.
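The slides do not spell out the closed-form solution. One commonly used closed-form estimate, shown here only as a sketch under that assumption, takes the observation matrix from an SVD of the feature matrix and then fits the transition matrix by least squares on the one-step state prediction error. The name `fit_lti` and the demo system are hypothetical.

```python
import numpy as np

def fit_lti(features, order):
    """Estimate (A, C) from a (T, p) feature array: SVD for C, least squares for A."""
    F = np.asarray(features, dtype=float).T        # p x T, one column per frame
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    C = U[:, :order]                                # orthonormal observation matrix
    Z = s[:order, None] * Vt[:order]                # order x T estimated states
    # Minimize sum_t ||z(t+1) - A z(t)||^2 in closed form.
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    return A, C

# Noise-free demo on a hypothetical 2-state system seen through 4 features.
rng = np.random.default_rng(0)
A_true = 0.95 * np.array([[np.cos(0.3), -np.sin(0.3)],
                          [np.sin(0.3),  np.cos(0.3)]])
C_true = rng.standard_normal((4, 2))
z = np.array([1.0, 0.0])
frames = []
for _ in range(40):
    frames.append(C_true @ z)
    z = A_true @ z
A_est, C_est = fit_lti(np.array(frames), order=2)
```

The estimate is only defined up to a change of state basis, so `A_est` should be compared with `A_true` through basis-independent quantities such as eigenvalues, trace, or determinant.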


13/44


14/44

Is the learnt model any good ?

"A very useful test for a class of generative models is to synthesize from it."

Ulf Grenander, father of pattern theory


15/44

Is the learnt model any good ?


16/44

Distance Metric for ARMA models

Principal angles between column spaces of observability matrices of the two models

Three types of distances, each a function of the principal angles $\{\theta_i\}$
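A sketch of how principal angles between two models can be computed with NumPy. The slide does not name its three distances; the Martin, gap, and Frobenius distances used below are three that are commonly defined from principal angles, and their use here is an assumption. The observability matrix is truncated to a finite number of blocks, and all function names are hypothetical.

```python
import numpy as np

def observability_matrix(A, C, num_blocks):
    """Stack C, CA, CA^2, ... (finite truncation of the observability matrix)."""
    blocks, M = [], C
    for _ in range(num_blocks):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def principal_angles(O1, O2):
    """Principal angles between the column spaces of O1 and O2."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def subspace_distances(theta):
    """Assumed distance definitions; Martin is infinite for orthogonal subspaces."""
    sin2 = np.sin(theta) ** 2
    cos2 = np.cos(theta) ** 2
    return {
        "martin": float(-np.sum(np.log(cos2))),   # squared Martin distance
        "gap": float(np.sin(theta.max())),
        "frobenius": float(2.0 * np.sum(sin2)),   # squared Frobenius distance
    }

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [-0.2, 0.9]])
C = rng.standard_normal((6, 2))
P = np.array([[2.0, 1.0], [0.5, 1.5]])            # change of state basis
P_inv = np.linalg.inv(P)

O_ref = observability_matrix(A, C, 4)
# Same system in a different state basis: distances should be (numerically) zero.
same = subspace_distances(principal_angles(
    O_ref, observability_matrix(P @ A @ P_inv, C @ P_inv, 4)))
# A different, randomly chosen system: distances should be clearly positive.
other = subspace_distances(principal_angles(
    O_ref, observability_matrix(A.T, rng.standard_normal((6, 2)), 4)))
```

The first comparison illustrates why distances are taken between column spaces: re-expressing a model in another state basis leaves the column space of its observability matrix unchanged.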


17/44

Clustering

Do not know number of clusters

Multibody factorization approach

Perform row-column permutations


18/44

Guessing the number of clusters

Spectral Graph Theory

Construct normalized Laplacian

Multiplicity of zero eigenvalue = number of connected components in graph (idealized case)

Practical scenario: eigenvalues are not exactly zero, so look for the elbow in the sorted eigenvalues
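A sketch of the eigenvalue-based guess, assuming a Gaussian similarity built from pairwise distances and taking the largest gap between consecutive Laplacian eigenvalues as the elbow. The similarity kernel, the `sigma` value, the gap rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric similarity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def estimate_num_clusters(distances, sigma=1.0, max_clusters=10):
    """Guess the cluster count from the largest gap among the smallest eigenvalues."""
    W = np.exp(-np.asarray(distances, dtype=float) ** 2 / (2.0 * sigma ** 2))
    eigvals = np.linalg.eigvalsh(normalized_laplacian(W))   # ascending order
    gaps = np.diff(eigvals[:max_clusters + 1])
    return int(np.argmax(gaps)) + 1, eigvals

# Toy data: three well-separated groups of four points on a line.
x = np.array([0.0, 0.1, 0.2, 0.3,
              10.0, 10.1, 10.2, 10.3,
              20.0, 20.1, 20.2, 20.3])
pairwise = np.abs(x[:, None] - x[None, :])
num_clusters, eigvals = estimate_num_clusters(pairwise)
```

In this toy case three eigenvalues sit near zero and the rest near one, so the elbow is unambiguous; with real segment distances the gap is usually much less sharp.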


19/44


20/44

So far

Model Activities as sequence of LTI

Segment video stream into subsequences

Learn model parameters for each segment

Cluster the segments

Identify repetitive sequences of labels
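The last step can be illustrated with a simple window count over the cluster labels. This is only a sketch of one straightforward way to find repeated label sequences, not necessarily the procedure used in the talk; the function name and the label sequence are hypothetical.

```python
from collections import Counter

def repeated_label_sequences(labels, length, min_count=2):
    """Count every contiguous run of `length` labels and keep those seen repeatedly."""
    windows = (tuple(labels[i:i + length])
               for i in range(len(labels) - length + 1))
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical cluster labels for consecutive segments of one video.
labels = [0, 1, 2, 5, 0, 1, 2, 7, 3, 0, 1, 2]
repeats = repeated_label_sequences(labels, length=3)
# repeats == {(0, 1, 2): 3}
```

Here the label run (0, 1, 2) occurs three times, so it would be reported as a candidate repetitive activity.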


21/44

View and rate variations


22/44

Building Invariances: Motivation

Feature transforms

Reflected in observation matrix C

Estimate transform parameters from segments C1, C2

Change in execution rate

Reflected in state transition matrix A

Estimate relative sampling rate between segments A1, A2


23/44

Spatial transforms and the Observation matrix

\[ f_1(t) = C_1 z(t) + w(t), \quad w(t) \sim N(0, R) \]

\[ f_2(t) = C_2 z(t) + w(t), \quad w(t) \sim N(0, R) \]

Transform T?


24/44

Invariances


25/44

Implication

If two sequences are related by an affine transform, then the corresponding principal components are also related by the same affine transform

Thus, affine transforms can be estimated from the C matrices
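A sketch of estimating the transform from the C matrices, under several assumptions that are not stated on the slides: each feature vector stacks n two-dimensional landmark points as [x1, y1, x2, y2, ...], only the 2x2 linear part M of the affine map is estimated (a translation would shift the feature mean rather than the columns of C), and both C matrices are expressed in the same state basis. The function names are hypothetical.

```python
import numpy as np

def apply_linear_part(C, M):
    """Apply the 2x2 matrix M to every 2-D point encoded in each column of C."""
    p, d = C.shape
    points = C.T.reshape(d, p // 2, 2)            # (column, landmark, xy)
    return (points @ M.T).reshape(d, p).T

def estimate_linear_part(C1, C2):
    """Least-squares M such that apply_linear_part(C1, M) is close to C2."""
    X = C1.T.reshape(-1, 2)                       # all landmarks of all columns
    Y = C2.T.reshape(-1, 2)
    M_transposed, *_ = np.linalg.lstsq(X, Y, rcond=None)   # solves X @ M.T ~ Y
    return M_transposed.T

rng = np.random.default_rng(0)
C1 = rng.standard_normal((10, 3))                 # 5 landmarks, 3 state dimensions
M_true = np.array([[1.2, 0.3], [-0.1, 0.9]])
C2 = apply_linear_part(C1, M_true)
M_est = estimate_linear_part(C1, C2)
```

When the two models are learned independently, their state bases generally differ, which is one reason a search over transform parameters with a basis-independent distance may be preferable to this direct fit.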


26/44

Affine Transforms


27/44

View Invariance

Result may be extended to view changes in a limited way

Valid when perspective distortion can be approximated by an affinity


28/44

Compensating

Let S1 = (A1, C1) and S2 = (A2, C2)

where T(S1) = (A1, T(C1)), and T belongs to an appropriate transformation group.


29/44

Performing the minimization

Optimization Procedures

Gradient based

Direct search

Stochastic approaches

Direct Methods: Used when gradients cannot be computed

Nelder-Mead (Simplex) procedure is extremely popular
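A sketch of the compensation step using SciPy's Nelder-Mead implementation (`scipy.optimize.minimize` with `method="Nelder-Mead"`). The cost function is an assumption: a squared Frobenius-type distance between principal angles of truncated observability matrices, minimized over a 2x2 linear transform applied to landmark features stacked as [x1, y1, x2, y2, ...]. All function names and demo values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def observability(A, C, num_blocks=5):
    """Stack C, CA, CA^2, ... up to num_blocks blocks."""
    blocks, M = [], C
    for _ in range(num_blocks):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def subspace_distance_sq(O1, O2):
    """2 * sum(sin^2 theta_i) over the principal angles between column spaces."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return float(2.0 * np.sum(1.0 - cosines ** 2))

def transform_C(C, params):
    """Apply the 2x2 matrix encoded in params to each 2-D point in the columns of C."""
    M = np.asarray(params, dtype=float).reshape(2, 2)
    p, d = C.shape
    return (C.T.reshape(d, p // 2, 2) @ M.T).reshape(d, p).T

def compensated_distance(A1, C1, A2, C2):
    """Minimize the distance between T(S1) = (A1, T(C1)) and S2 over T."""
    O2 = observability(A2, C2)

    def cost(params):
        return subspace_distance_sq(observability(A1, transform_C(C1, params)), O2)

    x0 = np.eye(2).ravel()                        # start from the identity transform
    result = minimize(cost, x0, method="Nelder-Mead")
    return result.fun, result.x.reshape(2, 2), cost(x0)

rng = np.random.default_rng(1)
A = 0.95 * np.array([[np.cos(0.3), -np.sin(0.3)],
                     [np.sin(0.3),  np.cos(0.3)]])
C1 = rng.standard_normal((8, 2))
M_true = np.array([[0.8, -0.6], [0.6, 0.9]])
C2 = transform_C(C1, M_true.ravel())
d_min, M_hat, d_init = compensated_distance(A, C1, A, C2)
```

Because the cost depends only on column spaces, any nonzero scalar multiple of a minimizing matrix gives the same distance, so `M_hat` need not equal `M_true` even when the compensated distance is close to zero.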


30/44

Time warp and the transition matrix

\[ z_1(t+1) = A_1 z_1(t) + v(t), \quad v(t) \sim N(0, Q) \]

Warp factor q?

\[ z_2(t+1) = A_2 z_2(t) + v(t), \quad v(t) \sim N(0, Q) \]
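The slide only poses the question of the warp factor. As a sketch under a stated assumption, suppose both segments sample the same underlying continuous-time system at rates that differ by a factor q, so that A2 is approximately A1 raised to the power q. Under that assumption q can be found by a one-dimensional search. The function names, the candidate grid, and the assumption that both matrices share a state basis are all hypothetical.

```python
import numpy as np

def matrix_power_real(A, q):
    """A**q for a real exponent q via eigendecomposition.

    Assumes A is diagonalizable with no negative real eigenvalues, so that the
    principal branch of the power gives a real result.
    """
    eigvals, V = np.linalg.eig(A)
    powered = V @ np.diag(eigvals.astype(complex) ** q) @ np.linalg.inv(V)
    return powered.real

def estimate_warp_factor(A1, A2, candidates):
    """Pick the candidate q that minimizes ||A1**q - A2|| (Frobenius norm)."""
    errors = [np.linalg.norm(matrix_power_real(A1, q) - A2) for q in candidates]
    return float(candidates[int(np.argmin(errors))])

A1 = 0.95 * np.array([[np.cos(0.3), -np.sin(0.3)],
                      [np.sin(0.3),  np.cos(0.3)]])
A2 = A1 @ A1                                      # same motion executed twice as fast
candidates = np.linspace(0.5, 3.0, 251)           # grid with step 0.01
q_hat = estimate_warp_factor(A1, A2, candidates)
```

For independently learned models the state bases differ, so in practice the comparison would need a basis-independent quantity (for example eigenvalues of the transition matrices) rather than the matrices themselves.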


31/44

Invariance to Execution rate


32/44

Some experiments


33/44

Visualizing the clusters

Clusters shown: Bend, Throw, Phone, Bat, Squat


34/44

A recognition experiment


35/44

Far-field experiment

          Left   Right
Top       5/6    6/6
Bottom    4/6    5/6


36/44

Model Order Selection: USF gait experiment


37/44

Change of View


38/44

Thank You !

Questions welcome.


