Gestures Recognition


Image acquisition

• Images acquired at the BBC R&D studios in London from eight different viewpoints.

• Frame-by-frame segmentation of each sequence provided by a chroma-keying system using a non-blue screen.

Special cameras fitted with blue LED lights.

Room equipped with image acquisition systems


Image acquisition

• Example of eight shots from synchronized cameras:


Image segmentation

• Chroma-keying segmentation enhanced by the Shen algorithm.

[Figure panels: original image; chroma keying application featuring the Shen algorithm; final result]
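The basic chroma-keying test can be sketched as follows, assuming a single fixed key colour and a Euclidean RGB distance threshold. This illustrates plain chroma keying only: the Shen-algorithm enhancement and the BBC system's actual keying rule are not reproduced, and `chroma_key_mask`, `key_rgb` and `threshold` are hypothetical names.

```python
import numpy as np


def chroma_key_mask(image, key_rgb, threshold):
    """Boolean foreground mask: True where a pixel is far from the key colour.

    image:     (H, W, 3) RGB array (any numeric dtype).
    key_rgb:   backdrop colour as seen by the camera (assumed fixed).
    threshold: Euclidean RGB distance above which a pixel counts as foreground.
    """
    # Work in float to avoid uint8 wrap-around when subtracting.
    diff = image.astype(float) - np.asarray(key_rgb, dtype=float)
    distance = np.linalg.norm(diff, axis=-1)  # per-pixel distance to the key colour
    return distance > threshold
```

Pixels close to the backdrop colour are marked as background (False); everything else is kept as part of the silhouette.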


3D Reconstruction

• Frame-by-frame 3D reconstruction from the shots taken from different viewpoints.

• Software developed at ISPG by Federico Lupica especially to achieve this goal:

[Figure: silhouette screenshot. A screenshot of the 3D reconstruction system]


3D Reconstruction

This method is applied to each frame of a gesture action sequence.

• Example of voxelset creation by 3D intersection of the visual hulls projected from the segmented edges.
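A minimal voxel-carving sketch of this step follows. It assumes calibrated 3×4 projection matrices for the cameras and one boolean silhouette mask per view. A voxel is kept only if its centre projects inside every silhouette, which is the usual way to build a voxelised visual hull. The function and parameter names are hypothetical, and this is not the ISPG software.

```python
import numpy as np


def carve_visual_hull(voxels, projections, silhouettes):
    """Keep the voxels whose centres project inside every silhouette.

    voxels:      (V, 3) voxel-centre coordinates in world units.
    projections: list of (3, 4) camera projection matrices, one per view.
    silhouettes: list of (H, W) boolean masks from the segmentation step.
    Returns a boolean array of length V marking the voxels in the visual hull.
    """
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])  # (V, 4) homogeneous points
    inside = np.ones(len(voxels), dtype=bool)
    for P, mask in zip(projections, silhouettes):
        uvw = homog @ P.T                                  # (V, 3) homogeneous image coords
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)    # column index
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)    # row index
        h, w = mask.shape
        in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxels), dtype=bool)
        hit[in_image] = mask[v[in_image], u[in_image]]
        inside &= hit                                      # carve voxels outside this silhouette
    return inside
```

Testing only voxel centres is the cheapest option. A stricter variant would project the whole voxel footprint into each view.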


Feature extraction

• The global body barycentre, the height and the projection onto the horizontal plane are derived from each voxelset representing an action shot.

[Figure panels: barycentre; horizontal projection; height]
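The three global features could be computed as below. The sketch assumes the voxelset is given as voxel-centre coordinates with z as the vertical axis. Height is taken here as the distance between the highest and lowest voxel centres, and the horizontal projection as the set of occupied (x, y) cells. Both definitions are assumptions, and `global_features` is a hypothetical name.

```python
import numpy as np


def global_features(voxels):
    """Barycentre, height and horizontal-plane projection of one frame's voxelset.

    voxels: (V, 3) voxel-centre coordinates; the z axis is assumed vertical.
    Returns (barycentre, height, footprint), where footprint holds the distinct
    occupied (x, y) cells, i.e. the projection onto the horizontal plane.
    """
    barycentre = voxels.mean(axis=0)
    height = voxels[:, 2].max() - voxels[:, 2].min()  # highest minus lowest voxel centre
    footprint = np.unique(voxels[:, :2], axis=0)       # distinct (x, y) columns, sorted
    return barycentre, height, footprint
```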


Feature extraction

• A first, rough estimate of the motion direction is the major axis of the ellipse fitted to the shape previously projected onto the horizontal plane.

[Figure: estimated motion direction]
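One common way to fit the ellipse uses second-order moments. The ellipse with the same covariance as the projected shape has its major axis along the covariance eigenvector with the largest eigenvalue. The sketch below uses that moment-based fit, which may differ from the fitting method used in the original work. `major_axis_direction` is a hypothetical name.

```python
import numpy as np


def major_axis_direction(footprint):
    """Unit vector along the major axis of the moment-equivalent ellipse.

    footprint: (M, 2) array of occupied (x, y) cells on the horizontal plane.
    """
    centred = footprint - footprint.mean(axis=0)
    cov = centred.T @ centred / len(footprint)  # 2x2 second-order central moments
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix: real eigenpairs
    return eigvecs[:, np.argmax(eigvals)]       # direction of largest spread
```

The axis has no preferred sign, so this estimate fixes the motion direction only up to 180 degrees.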


Feature extraction

• Voxelsets are divided into six fundamental body regions, each marked with its barycentre. The K-means algorithm is used to perform the clustering.

[Figure: the six body regions: head/shoulders, right arm, left arm, abdomen, left leg, right leg]
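The sketch below shows the six-region clustering with plain Lloyd's K-means (k = 6) on the voxel coordinates of one frame. The slides do not say how the clusters are seeded, so the farthest-point initialisation is an assumption. The function name is hypothetical.

```python
import numpy as np


def kmeans(points, k=6, n_iter=100):
    """Lloyd's K-means with farthest-point initialisation; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    # Farthest-point seeding (assumption): start from the first voxel, then
    # repeatedly add the voxel farthest from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - np.array(centroids)[None], axis=-1)
        centroids.append(points[d.min(axis=1).argmax()])
    centroids = np.array(centroids)

    def assign(c):
        # Index of the nearest centroid for every voxel.
        return np.linalg.norm(points[:, None, :] - c[None], axis=-1).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the barycentre of its voxels (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids
```

Farthest-point seeding spreads the initial centroids over the body, but isolated outlier voxels can mislead it. k-means++ is a common alternative.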


Feature extraction

• Another, more accurate, estimate of the motion direction is computed with a method based on LDA (Linear Discriminant Analysis), specifically devised for this project:

Vectors defining the LDA plane: when the leg and abdomen datasets are projected onto this plane, the ratio of between-class variance to within-class variance is maximized.

Note: this plane represents a slice of maximum separability between the legs, so it can be seen as orthogonal to the motion direction. The abdomen dataset is included in the LDA computation to keep the plane vertical; otherwise the obliquity of the legs could tilt it.

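Main LDA formula:

$$\mathbf{w} = \operatorname{eigenvectors}\left(S_w^{-1} S_b\right)$$

where $\mathbf{w}$ are the vectors defining the LDA plane, $S_b$ is the between-cluster variance matrix and $S_w$ is the within-cluster variance matrix.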
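The main LDA formula can be sketched as follows. The sketch assumes the three classes are the left-leg, right-leg and abdomen voxel clusters from the K-means step. It uses scatter matrices (unnormalised variance matrices); scaling either matrix by a constant only scales $S_w^{-1} S_b$ and leaves its eigenvectors unchanged. With three classes, $S_b$ has rank at most 2. The two leading eigenvectors therefore span the LDA plane, and their cross product gives its normal. Function and variable names are hypothetical.

```python
import numpy as np


def lda_plane(classes):
    """Two vectors spanning the LDA plane and the plane's unit normal.

    classes: list of (n_c, 3) arrays, e.g. [left_leg, right_leg, abdomen].
    """
    mu = np.vstack(classes).mean(axis=0)
    S_w = np.zeros((3, 3))
    S_b = np.zeros((3, 3))
    for X in classes:
        mu_c = X.mean(axis=0)
        centred = X - mu_c
        S_w += centred.T @ centred              # within-cluster scatter
        d = (mu_c - mu)[:, None]
        S_b += len(X) * (d @ d.T)               # between-cluster scatter
    # eigenvectors(S_w^-1 S_b); eigenvalues are real for this product of SPD/PSD matrices.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    w1 = eigvecs[:, order[0]].real
    w2 = eigvecs[:, order[1]].real
    normal = np.cross(w1, w2)
    return w1, w2, normal / np.linalg.norm(normal)
```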


Feature extraction

• The normal to the LDA plane, projected onto the horizontal plane, can be taken as an estimate of the motion direction.

• Finally, the feature set used in the subsequent statistical computations consists of the 3D coordinates of each cluster barycentre in a reference frame defined by the motion direction, the normal to the motion direction (in the horizontal plane) and the original vertical axis. This reference frame is centred on the global barycentre.
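The change of reference frame can be sketched as below. The sketch assumes z is the original vertical axis and that the input direction is, for instance, the LDA-plane normal. The horizontal projection of that direction becomes the first axis, the horizontal normal to it the second, and the vertical the third. The origin is the global barycentre. The function name and argument layout are hypothetical.

```python
import numpy as np


def body_frame_features(region_barycentres, global_barycentre, direction_3d):
    """Express the region barycentres in a body-centred reference frame.

    region_barycentres: (K, 3) barycentres of the body regions (world frame, z vertical).
    global_barycentre:  (3,) barycentre of the whole voxelset, used as the origin.
    direction_3d:       (3,) e.g. the LDA-plane normal; its horizontal projection
                        is taken as the motion direction.
    Returns a flat vector with 3 coordinates per region.
    """
    up = np.array([0.0, 0.0, 1.0])                         # original vertical axis
    motion = direction_3d - np.dot(direction_3d, up) * up  # project onto horizontal plane
    motion = motion / np.linalg.norm(motion)
    lateral = np.cross(up, motion)                         # horizontal normal to motion
    R = np.vstack([motion, lateral, up])                   # rows are the new axes
    local = (region_barycentres - global_barycentre) @ R.T
    return local.ravel()
```

With six regions this yields 18 coordinates per frame, and the frame is right-handed (motion × lateral = up).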


Feature extraction (conclusions)

• The features are now independent of the trajectory, since they are attached to the moving body.

• The forward-backward alternation of the legs during walking causes unavoidable fluctuations in the motion-direction estimate, which rotates from front to back and vice versa. As a result, the patterns traced by the features in the new reference frame undergo rigid rotations. The subsequent recognition system could take advantage of this, since it is another degree of freedom that conveys important information to the statistical modelling process.


Gestures modeling

• An HMM (Hidden Markov Model) is used to obtain a model from a set of features representing the time evolution of an entire action.

• The E.M. (Expectation-Maximization) algorithm is used to build the most likely HMM for an action.

• Each HMM is characterized by a set of matrices:

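$$A_{ij} = \Pr(s_j \mid s_i), \qquad E_{ij} = \mathrm{E}(o_i \mid s_j), \qquad C_{ij} = \operatorname{var}(o_i \mid s_j), \qquad A \in \mathbb{R}^{N \times N},\; E, C \in \mathbb{R}^{F \times N}$$

where A is the state-to-state probability matrix, E the observable-to-state mean value matrix, C the observable-to-state variance matrix, N the number of available states (in the Markov model) and F the number of features for a single voxelset (frame of a sequence).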
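The sketch below shows how the three matrices could be used to evaluate the likelihood of a feature sequence. It assumes diagonal-Gaussian emissions (one mean in E and one variance in C per feature and state). It also assumes an initial state distribution `pi`, which the slides do not list. The scaled forward algorithm returns log p(O | model). The names are hypothetical.

```python
import numpy as np


def log_emissions(O, E, C):
    """log p(o_t | s_j) for diagonal-Gaussian emissions.

    O: (T, F) feature sequence; E, C: (F, N) per-state means and variances.
    Returns a (T, N) array.
    """
    diff = O[:, :, None] - E[None, :, :]  # (T, F, N)
    return -0.5 * np.sum(np.log(2 * np.pi * C)[None] + diff**2 / C[None], axis=1)


def log_likelihood(O, pi, A, E, C):
    """log p(O | model) by the scaled forward algorithm.

    pi: (N,) initial state probabilities (assumption, not in the slides).
    A:  (N, N) with A[i, j] = Pr(s_j | s_i).
    """
    logB = log_emissions(O, E, C)
    shift = logB.max(axis=1)              # per-frame offset against underflow
    B = np.exp(logB - shift[:, None])
    alpha = pi * B[0]
    log_p = shift[0] + np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(O)):
        alpha = (alpha @ A) * B[t]        # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] b_j(o_t)
        s = alpha.sum()
        log_p += shift[t] + np.log(s)
        alpha = alpha / s                 # rescale so alpha sums to 1
    return float(log_p)
```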


Gestures modeling

• Steps followed by the E.M. algorithm:

Step 1: estimate the a priori and a posteriori probabilities that the given set of features along time is the output of this HMM.

Step 2: compute the expected values of each model matrix, assuming that the given set of features along time is the model output.

Step 3: if the evaluated probabilities are above a given threshold, STOP; otherwise repeat from step 1.
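These steps correspond to the Baum-Welch form of E.M. for HMMs. The sketch below fits `pi`, A, E and C to a single feature sequence, assuming diagonal-Gaussian emissions. It replaces the slide's probability-threshold test with the common convergence test: stop when the log-likelihood gain falls below `tol`. The quantile-based initialisation, the variance floor and all names are assumptions.

```python
import numpy as np


def log_emissions(O, E, C):
    """log p(o_t | s_j); O is (T, F), E and C are (F, N)."""
    diff = O[:, :, None] - E[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * C)[None] + diff**2 / C[None], axis=1)


def forward_backward(O, pi, A, E, C):
    """E-step: scaled forward-backward. Returns posteriors, expected transitions, log p(O)."""
    T, N = len(O), len(pi)
    logB = log_emissions(O, E, C)
    shift = logB.max(axis=1)
    B = np.exp(logB - shift[:, None])            # per-frame rescaling avoids underflow
    alpha, beta, scale = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta                          # a posteriori state probabilities
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi_sum = np.zeros((N, N))                     # expected state-to-state transition counts
    for t in range(T - 1):
        xi_sum += alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :] / scale[t + 1]
    return gamma, xi_sum, float(np.sum(np.log(scale) + shift))


def baum_welch(O, N, n_iter=100, tol=1e-4, var_floor=1e-3):
    """Fit pi (N,), A (N, N), E (F, N), C (F, N) to one sequence O of shape (T, F)."""
    pi = np.full(N, 1.0 / N)
    A = np.full((N, N), 1.0 / N)
    E = np.quantile(O, np.linspace(0.1, 0.9, N), axis=0).T   # spread initial means
    C = np.tile(O.var(axis=0)[:, None], (1, N)) + var_floor
    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma, xi_sum, ll = forward_backward(O, pi, A, E, C)
        # M-step: expected values of each model matrix given the posteriors.
        pi = gamma[0]
        A = xi_sum + 1e-12
        A /= A.sum(axis=1, keepdims=True)
        occupancy = np.maximum(gamma.sum(axis=0), 1e-12)
        E = (O.T @ gamma) / occupancy
        diff = O[:, :, None] - E[None, :, :]
        C = np.einsum("tn,tfn->fn", gamma, diff**2) / occupancy + var_floor
        if ll - prev_ll < tol:                    # stop when the likelihood stops improving
            break
        prev_ll = ll
    return pi, A, E, C, ll                        # ll is from the last E-step
```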


Gestures modeling

• Example of a military marching action composed of 110 frames, interpreted by a three-state HMM:

[Figure: HMM state versus frame number]
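A state-versus-frame plot like this one is typically obtained by decoding the most likely state for each frame. The Viterbi sketch below uses the same diagonal-Gaussian assumptions and a hypothetical initial distribution `pi`. The slides do not say whether the original figure was produced this way, and the names are hypothetical.

```python
import numpy as np


def log_emissions(O, E, C):
    """log p(o_t | s_j); O is (T, F), E and C are (F, N)."""
    diff = O[:, :, None] - E[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * C)[None] + diff**2 / C[None], axis=1)


def viterbi(O, pi, A, E, C):
    """Most likely state sequence: one state index per frame."""
    T, N = len(O), len(pi)
    logB = log_emissions(O, E, C)
    with np.errstate(divide="ignore"):        # log(0) -> -inf for forbidden moves
        logA, logpi = np.log(A), np.log(pi)
    delta = logpi + logB[0]                   # best log score ending in each state
    back = np.zeros((T, N), dtype=int)        # best predecessor of each state
    for t in range(1, T):
        scores = delta[:, None] + logA        # scores[i, j]: best path in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):             # backtrack from the last frame
        path[t - 1] = back[t, path[t]]
    return path
```

For a left-to-right model (A zero below the diagonal and beyond the first superdiagonal), the decoded path can only stay in a state or move to the next one.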


Model clustering

• HMMs are clustered by defining a metric between models, which creates a model space.

• The Kullback-Leibler distance is used as the metric between HMMs. It is defined as:

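$$J(\lambda_1, \lambda_2) = \frac{1}{T}\log\frac{p_{\lambda_1}\left(O^{(\lambda_1)}\right)}{p_{\lambda_2}\left(O^{(\lambda_1)}\right)} + \frac{1}{T}\log\frac{p_{\lambda_2}\left(O^{(\lambda_2)}\right)}{p_{\lambda_1}\left(O^{(\lambda_2)}\right)}$$

where $\lambda_1, \lambda_2$ are HMM models, $O^{(\lambda)}$ is an observation sequence generated by model $\lambda$ (Monte Carlo method), $T$ is the length of the generated observation sequence and $p_\lambda(O)$ is the likelihood of the observation sequence $O$ given model $\lambda$.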
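The Monte Carlo estimate of J can be sketched as follows. The sketch assumes diagonal-Gaussian HMMs stored as `(pi, A, E, C)` tuples. Each model generates one sequence of length T, and both sequences are scored under both models with the forward algorithm. `sample`, `log_likelihood` and `j_distance` are hypothetical names.

```python
import numpy as np


def sample(model, T, rng):
    """Generate an observation sequence of length T from a diagonal-Gaussian HMM."""
    pi, A, E, C = model
    F, N = E.shape
    O = np.zeros((T, F))
    s = rng.choice(N, p=pi)
    for t in range(T):
        O[t] = rng.normal(E[:, s], np.sqrt(C[:, s]))  # emit from the current state
        s = rng.choice(N, p=A[s])                     # then move to the next state
    return O


def log_likelihood(O, model):
    """log p_lambda(O) by the scaled forward algorithm."""
    pi, A, E, C = model
    diff = O[:, :, None] - E[None, :, :]
    logB = -0.5 * np.sum(np.log(2 * np.pi * C)[None] + diff**2 / C[None], axis=1)
    shift = logB.max(axis=1)
    B = np.exp(logB - shift[:, None])
    alpha = pi * B[0]
    log_p = shift[0] + np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(O)):
        alpha = (alpha @ A) * B[t]
        s = alpha.sum()
        log_p += shift[t] + np.log(s)
        alpha = alpha / s
    return float(log_p)


def j_distance(model_1, model_2, T=1000, seed=0):
    """Monte Carlo estimate of J(lambda_1, lambda_2), one generated sequence per model."""
    rng = np.random.default_rng(seed)
    O1 = sample(model_1, T, rng)
    O2 = sample(model_2, T, rng)
    term_1 = log_likelihood(O1, model_1) - log_likelihood(O1, model_2)
    term_2 = log_likelihood(O2, model_2) - log_likelihood(O2, model_1)
    return (term_1 + term_2) / T
```

The estimate is random. A larger T reduces its variance, at a run-time cost that grows linearly with T.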


Model clustering

• Example of K-L distance iterative computation between two HMM models (two instances of the military marching action).


Recognition

• Gesture recognition is performed by computing a new HMM from a set of features along time and then classifying it in the space containing the HMM clusters.

• If the distance between the new model and every cluster barycentre is above a threshold, the gesture is considered a new one and the system starts building a new cluster to hold it.
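The decision rule can be sketched independently of how models are compared, by passing the distance (for instance J) as a function. The cluster structure, the use of one representative model per cluster and all names are assumptions. The slides do not say how a cluster barycentre is updated in model space, so this sketch leaves it unchanged.

```python
def recognise(new_model, clusters, distance, threshold):
    """Assign a new HMM to the nearest cluster, or open a new cluster.

    clusters: list of dicts {"centre": model, "members": [models]} (hypothetical layout).
    distance: a model-space distance function, e.g. the K-L distance J.
    Returns the index of the cluster that received the model.
    """
    if clusters:
        dists = [distance(new_model, c["centre"]) for c in clusters]
        best = min(range(len(clusters)), key=dists.__getitem__)
        if dists[best] <= threshold:
            clusters[best]["members"].append(new_model)
            return best
    # Every cluster is too far (or none exists yet): treat the gesture as new.
    clusters.append({"centre": new_model, "members": [new_model]})
    return len(clusters) - 1
```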


Future work

• Define a consistent measure to quantify the SNR involved in action recognition, distinguishing what in the feature signals is important for the recognition system from what can be considered noise.

• Design an efficient filtering system to maximize this SNR, possibly using constraints given by the body joints.

• Improve the HMM-based recognition system and, if needed, look for another recognition engine.

• Implement a more efficient software version of the system, possibly achieving real-time recognition.