
Gestures Recognition

Image acquisition

• Image acquisition at BBC R&D studios in London using eight different viewpoints.

• Frame-by-frame segmentation of each sequence is provided by a chroma keying system using a non-blue screen.

Special camera provided with blue LED lights.

Room equipped with image acquisition systems

Image acquisition

• Example of eight shots from synchronized cameras:

Image segmentation

• Chroma keying segmentation enhanced by the Shen algorithm.

Original image

Chroma keying application featuring the Shen algorithm

Final result

3D Reconstruction

• Frame-by-frame 3D reconstruction using shots from different viewpoints.

• Software developed at ISPG by Federico Lupica specifically to achieve this goal:

[Screenshot of the silhouette]

A screenshot of the 3D reconstruction system

3D Reconstruction

This method is performed for each frame of a gesture action sequence.

• Example of voxelset creation by 3D intersection of the visual hulls projected from the segmented edges.
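The intersection step can be illustrated with a minimal voxel-carving sketch. It is not the ISPG software: the function name `carve_voxels`, the 3x4 projection matrices and the boolean silhouette masks are all assumptions made for illustration, and every grid point is assumed to lie in front of every camera.

```python
import numpy as np

def carve_voxels(grid_points, cameras, silhouettes):
    """Return the grid points whose projection lies inside every silhouette.

    grid_points: (M, 3) candidate voxel centres.
    cameras:     list of hypothetical 3x4 projection matrices, one per view.
    silhouettes: list of boolean masks (rows = v, columns = u), one per view.
    """
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    keep = np.ones(len(grid_points), dtype=bool)
    for P, mask in zip(cameras, silhouettes):
        uvw = homog @ P.T                               # homogeneous pixel coordinates
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        rows, cols = mask.shape
        in_image = (u >= 0) & (u < cols) & (v >= 0) & (v < rows)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[in_image] = mask[v[in_image], u[in_image]]
        keep &= hit                                     # intersect the viewing cones
    return grid_points[keep]
```

A voxel survives only if all views agree that it projects onto the segmented body, which is why more viewpoints give a tighter hull.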

Feature extraction

• Global body barycentre, height and horizontal plane projection are derived from each voxelset representing an action shot.

Barycentre

Horizontal Projection

Height
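As a rough illustration of these three global features, the sketch below computes them from an array of occupied voxel centres. The function name, the choice of z as the vertical axis and the definition of height as the vertical extent of the voxelset are assumptions, not details given in the slides.

```python
import numpy as np

def global_features(voxels):
    """voxels: (M, 3) occupied voxel centres, columns x, y, z (z assumed vertical)."""
    barycentre = voxels.mean(axis=0)                    # global body barycentre
    height = voxels[:, 2].max() - voxels[:, 2].min()    # vertical extent
    footprint = np.unique(voxels[:, :2], axis=0)        # projection onto the horizontal plane
    return barycentre, height, footprint
```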

Feature extraction

• A first rough estimate of the motion direction is the major axis of the ellipse fitted to the shape previously projected onto the horizontal plane.

Estimated motion direction
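One common way to obtain such an axis is sketched below, under the assumption that the fitted ellipse is the covariance ellipse of the projected points. The slides do not say which fitting method was used, and `major_axis` is a hypothetical name.

```python
import numpy as np

def major_axis(points_2d):
    """Unit vector along the major axis of the covariance ellipse of 2D points."""
    centred = points_2d - points_2d.mean(axis=0)
    cov = centred.T @ centred / len(points_2d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
```

The result fixes an orientation only: the sign of the vector is arbitrary, so it cannot by itself distinguish forward from backward motion.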

Feature extraction

• Voxelsets are divided into six fundamental body regions, each marked with its own barycentre. The K-means algorithm is used to perform the clustering.

Head/Shoulders

Right arm

Left arm

Abdomen

Left leg

Right leg
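The clustering step can be sketched with a plain Lloyd-style K-means. This is a generic version, not the project's code: the initial centres are passed in explicitly (an assumption, since the slides do not describe the initialisation), and six of them would be supplied for the six body regions.

```python
import numpy as np

def assign(points, centres):
    """Index of the nearest centre for every point."""
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, init_centres, n_iter=100):
    """Return (labels, centres); each centre is the barycentre of its region."""
    centres = np.asarray(init_centres, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign(points, centres)
        new_centres = centres.copy()
        for k in range(len(centres)):
            members = points[labels == k]
            if len(members):                 # an empty cluster keeps its old centre
                new_centres[k] = members.mean(axis=0)
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break
    return assign(points, centres), centres
```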

Feature extraction

• Another, more accurate estimate of the motion direction is computed through an LDA (Linear Discriminant Analysis) based method, developed specifically for this project:

Vectors defining the LDA plane: the maximum ratio of between-class variance to within-class variance is guaranteed for the leg and abdomen datasets when they are projected onto this plane.

Note: this plane represents a slice of maximum separability of the legs, hence it can be seen as orthogonal to the motion direction. The abdomen dataset is included in the LDA computation in order to keep the plane vertical; otherwise it could suffer from leg obliquity.

LDA main formula:

\[
(\text{vectors defining the LDA plane}) = \operatorname{eigenvectors}\!\left(S_w^{-1} S_b\right)
\]

where $S_b$ is the between-cluster variance matrix and $S_w$ is the within-cluster variance matrix.
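The LDA formula can be sketched numerically as follows. This is a generic scatter-matrix implementation, not the project's code. The function name `lda_plane`, and the choice of the two eigenvectors with the largest eigenvalues as the vectors spanning the plane, are assumptions.

```python
import numpy as np

def lda_plane(classes):
    """classes: list of (M_k, 3) point arrays (e.g. left leg, right leg, abdomen).

    Returns two vectors spanning the LDA plane and the unit normal to it.
    """
    mean_all = np.vstack(classes).mean(axis=0)
    S_w = np.zeros((3, 3))                       # within-cluster scatter
    S_b = np.zeros((3, 3))                       # between-cluster scatter
    for pts in classes:
        mu = pts.mean(axis=0)
        centred = pts - mu
        S_w += centred.T @ centred
        diff = (mu - mean_all)[:, None]
        S_b += len(pts) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]       # largest eigenvalues first
    plane = eigvecs[:, order[:2]].real
    normal = np.cross(plane[:, 0], plane[:, 1])
    return plane, normal / np.linalg.norm(normal)
```

With three classes, $S_b$ has rank at most two, so at most two eigenvalues are non-zero and the corresponding eigenvectors define a plane.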

Feature extraction

• The normal to the LDA plane, projected onto the horizontal plane, can be taken as an estimate of the motion direction.

• Finally, the set of features used for the subsequent statistical computations consists of the 3D coordinates of each cluster barycentre in a reference frame defined by the motion direction, the normal to the motion direction (in the horizontal plane) and the original vertical axis. This reference frame is centred on the global barycentre.
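The change of reference frame described above amounts to a translation followed by a rotation about the vertical axis. The sketch below assumes z is the vertical axis and picks one of the two possible orientations for the horizontal normal. Both choices, and the name `to_body_frame`, are illustrative.

```python
import numpy as np

def to_body_frame(cluster_barycentres, global_barycentre, motion_dir_xy):
    """Express cluster barycentres in the frame (motion direction, horizontal normal, vertical)."""
    d = np.array([motion_dir_xy[0], motion_dir_xy[1], 0.0])
    d /= np.linalg.norm(d)                     # unit motion direction in the horizontal plane
    up = np.array([0.0, 0.0, 1.0])             # original vertical axis
    side = np.cross(up, d)                     # horizontal normal to the motion direction
    R = np.vstack([d, side, up])               # rows are the new axes
    return (cluster_barycentres - global_barycentre) @ R.T
```

Stacking the six transformed barycentres for one frame gives one feature vector per frame.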

Feature extraction (conclusions)

• Features are now trajectory-independent: the reference frame moves rigidly with the body.

• The forward-backward alternation of the legs during walking causes unavoidable alterations of the motion direction estimate: it rotates from front to back and vice versa. The consequence is a rigid rotation of the patterns traced by the features in the new reference frame. The subsequent recognition system could take advantage of this: it is another degree of freedom that conveys important information to the statistical modelling process.

Gestures modeling

• The HMM (Hidden Markov Model) method is used to obtain a model from a set of features representing the evolution of an entire action in time.

• The E.M. (Expectation-Maximization) algorithm is run to build the most likely HMM for an action.

• Each HMM is characterized by a set of matrices:

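\[
\begin{aligned}
A_{N \times N} &: \quad A_{ij} = \Pr(s_i \mid s_j) && \text{state-to-state probability} \\
E_{F \times N} &: \quad E_{ij} = \mathrm{E}(o_i \mid s_j) && \text{observable-to-state mean value} \\
C_{F \times N} &: \quad C_{ij} = \operatorname{var}(o_i \mid s_j) && \text{observable-to-state variance}
\end{aligned}
\]

where $N$ is the number of available states (in the Markov model) and $F$ is the number of features for a single voxelset (one frame of a sequence).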
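To show how these three matrices describe a gesture, the sketch below draws a random sequence from such a model. Several choices are assumptions made for illustration: Gaussian observations with independent features, a uniformly chosen initial state, and the convention that `A[i, j]` is the probability of moving from state `i` to state `j`, so that rows sum to 1 (transpose `A` if the model stores the opposite convention).

```python
import numpy as np

def sample_hmm(A, E, C, T, rng):
    """Draw T frames from an HMM with transitions A (N, N), means E (F, N), variances C (F, N)."""
    N = A.shape[0]
    states = np.empty(T, dtype=int)
    obs = np.empty((T, E.shape[0]))
    state = rng.integers(N)                    # assumption: uniform initial state
    for t in range(T):
        states[t] = state
        # each feature is drawn independently around the mean of the current state
        obs[t] = rng.normal(E[:, state], np.sqrt(C[:, state]))
        state = rng.choice(N, p=A[state])      # move to the next state
    return states, obs
```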

Gestures modeling

• Steps followed by the E.M. algorithm:

Estimation of the a priori and a posteriori probability for a given set of features along time to be the output of this HMM

Computation of the expectation values of each model matrix, assuming that the given set of features along time is the model output

If the evaluated probabilities are over a given threshold: STOP; otherwise repeat from the first step

Gestures modeling

• Example of a military marching action composed of 110 frames, interpreted by a three-state HMM:

[Plot of HMM states against frame number]

Model clustering

• HMMs are clustered by defining a metric between them, which turns the set of models into a model space.

• Kullback-Leibler distance is used as a metric between HMM models. It is defined as:

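\[
J(\lambda_1, \lambda_2) = \frac{1}{T} \log \frac{p(O_1 \mid \lambda_1)}{p(O_1 \mid \lambda_2)} + \frac{1}{T} \log \frac{p(O_2 \mid \lambda_2)}{p(O_2 \mid \lambda_1)}
\]

where:

$\lambda_1, \lambda_2$ are the HMM models;

$O_i$ is the observation sequence generated by model $\lambda_i$ (Monte Carlo method);

$T$ is the length of the generated observation sequence;

$p(O \mid \lambda)$ is the likelihood of the observation sequence given the model.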
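A minimal sketch of this distance is given below. It assumes Gaussian observations with independent features, a strictly positive initial state distribution `pi`, and the row convention `A[i, j]` for the transition from state `i` to state `j`. Each model is a hypothetical tuple `(A, E, C, pi)`. The two sequences are passed in, assumed to have been generated beforehand by the respective models, and the function evaluates the symmetric form (1/T) log[p(O1|model1)/p(O1|model2)] + (1/T) log[p(O2|model2)/p(O2|model1)].

```python
import numpy as np

def log_likelihood(obs, A, E, C, pi):
    """log p(O | model) by the forward algorithm, computed in the log domain."""
    # log of the Gaussian density of every frame under every state: shape (T, N)
    diff = obs[:, :, None] - E[None, :, :]
    log_b = -0.5 * np.sum(diff**2 / C + np.log(2 * np.pi * C), axis=1)
    log_alpha = np.log(pi) + log_b[0]
    for t in range(1, len(obs)):
        m = log_alpha.max()                    # log-sum-exp over the previous states
        log_alpha = m + np.log(np.exp(log_alpha - m) @ A) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

def kl_distance(O1, O2, model1, model2):
    """Symmetric K-L style distance; O1 is generated by model1, O2 by model2."""
    d12 = (log_likelihood(O1, *model1) - log_likelihood(O1, *model2)) / len(O1)
    d21 = (log_likelihood(O2, *model2) - log_likelihood(O2, *model1)) / len(O2)
    return d12 + d21
```

Because the sequences come from random sampling, the value fluctuates from run to run, so longer sequences give a more stable estimate.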

Model clustering

• Example of iterative computation of the K-L distance between two HMM models (two instances of the military marching action).

Recognition

• Gesture recognition is performed by computing a new HMM from a set of features along time and then classifying it in the space containing the HMM clusters.

• If the distance between the new model and every cluster barycentre is over a threshold, the gesture is considered a new one and the system begins to build a new cluster to hold it.
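The decision rule above can be sketched as a thresholded nearest-cluster search. The names `classify`, `distance` and `threshold` are hypothetical. Any model-to-model metric can be plugged in as `distance`, and seeding a new cluster with the model itself is an assumption.

```python
def classify(new_model, cluster_barycentres, distance, threshold):
    """Return (cluster index, is_new) for a freshly trained model.

    cluster_barycentres is a list that is extended in place when no existing
    cluster lies within the threshold.
    """
    dists = [distance(new_model, centre) for centre in cluster_barycentres]
    if dists and min(dists) <= threshold:
        return dists.index(min(dists)), False       # known gesture
    cluster_barycentres.append(new_model)            # seed a new cluster
    return len(cluster_barycentres) - 1, True        # new gesture
```

Testing the smallest distance is equivalent to requiring that the distance to each barycentre exceeds the threshold before a new cluster is opened.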

Future work

• Define a consistent measure to quantify the SNR (signal-to-noise ratio) involved in action recognition, distinguishing what is important in the feature signals for the recognition system from what can be considered noise.

• Design an efficient filtering system to maximize this SNR, probably using constraints given by the body joints.

• Enhance the HMM-based recognition system and, if necessary, look for another recognition engine.

• Implement a more efficient software version of the system, possibly achieving real-time recognition.
