Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct. 2010

Structural Human Action Recognition from Still Images

Moin Nabi

Computer Vision Lab.

©IPM - Oct. 2010

Problem Definition

How can we recognize human action from a single Image?

?

Problem Definition

Pose as a Latent Valiable

Application

• News/sports image retrieval and analysis• An important cue for video-based action

recognition

Previous Works• Global template-based representation

HOG by Dalal and Triggs. And , Ikizler-Cinbis et al. ICCV09

5

Action Label

Action Label

• Pose estimation -> action recognitione.g. Ramanan and Forsyth NIPS03, Ferrari et al. CVPR09

Our Work

• Examplar based representation

Using Poselet as a new definition of a part

6

• Pose estimation + action recognition

Discriminative Pose

Golfing?

Walking?

• All elements of pose are not equally important• Develop integrated learning framework to

estimate pose for action recognition

Pose Representation

8

• We use a coarse non-parametric pose representation– An action-specific variant of the poselet[Bourdev&Malik ICCV09]

• A poselet is a set of patches not only with similar pose configuration, but also from the same action class.

Poselets

• Poselets obtained by clustering ground-truth joint positions of body parts for each action

PoseletsVisualization of the Poselets for Running images

For Every Action Class:

1. Devide annotation to 4 parts2. Cluster on normalized x,y3. Remove small clusters4. Crop that part of image

Learn SVM for every Poselet with HoG+: From that action -:same part from other action

5 (Actions) x 4 (Parts) x 5 (Clusters) = 100 – 10 (Remove) = 90 = 26 (leg) + 20 (L-arm) + 20 (R-arm) + 24 (Upper body)

Model Formation

⌂ Using Pictorial Structure Model of Pedro Felzenswalb

Training: Test:

Model Formation• Develop a scoring function– Should have high score for correct action label– Low score for other action labels– Model parameters

Model Formation

13

Pose

Action Label

Image

Choose best pose L

Model Formation

14

Pose

Action Label

Image

Running

Large score for

Model Formation

15

Pose

Action Label

Image

Sitting

Small score for

Model Formulation

Pairwise Relation

Part Appearance

17

Part Appearance Potential

Pose

Action Label

Image

Poselet matching

18

Pairwise Potential

Pose

Action Label

Image

Relative body part locations

19

Full Model

Pose

Action Label

Image

Model parameters learned using max-margin

Learning and Inference

Latent SVM

We Should Minimize Loss Function !

Latent SVM

Results

23

• Still image action dataset (Internet Image)

– Five action categories– 2458 images total– Train using 1/3 of images from each category

Visualization of latent pose

24

Successful classification examples

Unsuccessful classification examples

Any question

?

Documents

Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct. 2010