
Learning an Attribute Dictionary for Human Action Classification

Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, "Sparse Dictionary-based Representation and Recognition of Action Attributes", ICCV 2011

Qiang Qiu

Action Feature Representation


Shape

Motion

HOG

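Action Sparse Representation

Each action is a sparse linear combination of atoms from the action dictionary (K-SVD); its coefficient vector is the sparse code. Atom indices follow the nonzero positions of each code.

$y_1 = 0.43\,d_1 + 0.63\,d_2 - 0.33\,d_5 - 0.36\,d_7$

Sparse code: $x_1 = [0.43,\ 0.63,\ 0,\ 0,\ -0.33,\ 0,\ -0.36,\ 0,\ 0]$

$y_2 = 0.64\,d_3 + 0.53\,d_4 - 0.40\,d_5 + 0.35\,d_6$

Sparse code: $x_2 = [0,\ 0,\ 0.64,\ 0.53,\ -0.40,\ 0.35,\ 0,\ 0,\ 0]$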
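As a rough illustration of the decomposition on this slide, the sketch below rebuilds two signals from the sparse codes shown. The dictionary `D` is a random stand-in (an assumption; the real atoms are learned from action features), and its dimension of 20 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dictionary: 9 unit-norm atoms of dimension 20 (random placeholders).
D = rng.standard_normal((20, 9))
D /= np.linalg.norm(D, axis=0)

# Sparse codes copied from the slide (9 coefficients each, 4 nonzero).
x1 = np.array([0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0])
x2 = np.array([0, 0, 0.64, 0.53, -0.40, 0.35, 0, 0, 0])

# A signal is the dictionary times its sparse code.
y1 = D @ x1
y2 = D @ x2

# Only the atoms on the support of the code contribute (0-based indices).
support1 = np.flatnonzero(x1)
support2 = np.flatnonzero(x2)
print(support1, support2)
```

Both codes share the fifth atom (index 4), which is why that atom carries information about more than one signal in the later slides.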

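K-SVD

$Y = DX$, with input signals $Y = [y_1\ y_2]$, dictionary $D = [d_1\ d_2\ d_3\ \dots]$, and sparse codes $X = [x_1\ x_2]$:

x1 = [0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0]
x2 = [0, 0, 0.64, 0.53, -0.40, 0.35, 0, 0, 0]

Input: signals Y, dictionary size, sparsity T
Output: dictionary D, sparse codes X

$\arg\min_{D,X} \|Y - DX\|_2^2 \quad \text{s.t.}\ \forall i,\ \|x_i\|_0 \le T$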

[1] M. Aharon, M. Elad, and A. Bruckstein, K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Trans. on Signal Processing, 2006
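The alternation behind this objective can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation from [1]: the sparse-coding step uses a basic orthogonal matching pursuit, the dictionary is initialized from randomly chosen signals (an assumption), and the names `omp` and `ksvd` are chosen here for illustration.

```python
import numpy as np

def omp(D, y, T):
    """Greedy sparse coding: pick at most T atoms, refit by least squares."""
    residual, support = y.copy(), []
    x, coef = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(T):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        x[support] = coef
    return x

def ksvd(Y, n_atoms, T, n_iter=10, seed=0):
    """Alternate sparse coding and rank-1 atom updates (simplified K-SVD)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize atoms from randomly chosen signals.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    for _ in range(n_iter):
        X = np.column_stack([omp(D, y, T) for y in Y.T])
        for k in range(n_atoms):
            users = np.flatnonzero(X[k])  # signals that use atom k
            if users.size == 0:
                continue
            # Error without atom k's contribution, restricted to those signals.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0]
    # Final sparse coding so the codes match the final dictionary.
    X = np.column_stack([omp(D, y, T) for y in Y.T])
    return D, X

# Toy usage on random data (hypothetical sizes).
Y = np.random.default_rng(1).standard_normal((15, 60))
D, X = ksvd(Y, n_atoms=8, T=3, n_iter=5)
print(np.linalg.norm(Y - D @ X))
```

Each column of `X` has at most `T` nonzeros, matching the constraint in the objective.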


Objective

Learn a compact and discriminative dictionary.

Probabilistic Model for Sparse Representation

• A Gaussian Process
• Dictionary Class Distribution

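More Views of Sparse Representation

$Y = DX$ for four signals $y_1, y_2, y_3, y_4$ with class labels l1, l1, l2, l2, dictionary $D = [d_1\ d_2\ d_3\ \dots]$, and sparse codes $X = [x_1\ x_2\ x_3\ x_4]$:

x1 = [0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0, 0, 0, 0, 0]   (l1)
x2 = [0, 0, 0.64, 0.53, -0.40, 0.35, 0, 0, 0, 0, 0, 0, 0]   (l1)
x3 = [0, 0, 0, 0, 0, 0, 0, 0, -0.28, 0.698, 0.37, 0.25, 0]   (l2)
x4 = [0, 0, 0, 0, -0.42, 0, 0, 0, 0, 0.42, 0.47, 0, 0.32]   (l2)

Besides the column view (one sparse code per signal), each row of X, e.g. xd1, collects the coefficients of one atom across all signals, which carry the labels l1, l1, l2, l2.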

A Gaussian Process

Row xdi of X holds the coefficients of atom di across the signals:

         x1      x2      x3      x4
label    l1      l1      l2      l2
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…        …

• Covariance function entry: K(i,j) = cov(xdi, xdj)
• P(Xd*|XD*) is a Gaussian with a closed-form conditional variance
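A minimal sketch of these two bullets, using the five rows shown on the slide. The conditional variance below is the standard Gaussian conditioning identity; the small `jitter` term is an assumption added for numerical stability (with only four signals the sample covariance is rank-deficient), and `conditional_variance` is a name chosen for illustration.

```python
import numpy as np

# Rows xd1..xd5 over signals x1..x4, copied from the slide.
Xd = np.array([
    [0.43, 0.00, 0.0, 0.00],
    [0.63, 0.00, 0.0, 0.00],
    [0.00, 0.64, 0.0, 0.00],
    [0.00, 0.53, 0.0, 0.00],
    [-0.33, -0.40, 0.0, -0.42],
])

# Covariance function: K[i, j] = cov(xd_i, xd_j), with rows treated as variables.
K = np.cov(Xd)

def conditional_variance(K, i, S, jitter=1e-8):
    """Var(xd_i | xd_S) = K_ii - K_iS (K_SS + jitter*I)^-1 K_Si."""
    S = list(S)
    if not S:
        return K[i, i]
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
    K_iS = K[i, S]
    return K[i, i] - K_iS @ np.linalg.solve(K_SS, K_iS)

v_alone = conditional_variance(K, 4, [])      # variance of xd5 on its own
v_given = conditional_variance(K, 4, [0, 2])  # variance of xd5 given xd1 and xd3
v_dup = conditional_variance(K, 1, [0])       # xd2 given xd1 (proportional rows)
print(v_alone, v_given, v_dup)
```

Rows xd1 and xd2 are proportional in this toy table, so knowing one leaves almost no variance in the other; that redundancy is what the selection criteria on the later slides exploit.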

Dictionary Class Distribution

         x1      x2      x3      x4
label    l1      l1      l2      l2
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…        …

• P(L|di), L ∈ [1, M]
• Aggregate |xdi| based on class labels to obtain an M-sized vector
• P(L=l1|d5) = (0.33+0.40)/(0.33+0.40+0.42) = 0.6348
• P(L=l2|d5) = (0+0.42)/(0.33+0.40+0.42) = 0.3652
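The aggregation in these bullets can be checked with a short sketch. Labels are encoded as 0 for l1 and 1 for l2; the function name `class_distribution` and the uniform fallback for an atom that no signal uses are assumptions made for illustration.

```python
import numpy as np

def class_distribution(xd, labels, n_classes):
    """P(L | d): sum |coefficients| per class label, then normalize."""
    weights = np.abs(np.asarray(xd, dtype=float))
    totals = np.bincount(labels, weights=weights, minlength=n_classes)
    s = totals.sum()
    # Assumption: fall back to a uniform distribution for an unused atom.
    return totals / s if s > 0 else np.full(n_classes, 1.0 / n_classes)

labels = np.array([0, 0, 1, 1])     # l1, l1, l2, l2
xd5 = [-0.33, -0.40, 0.0, -0.42]    # row of atom d5 from the slide
p_d5 = class_distribution(xd5, labels, n_classes=2)
print(p_d5)
```

The two entries sum to one, so the second probability is 0.42/1.15, about 0.3652.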

Dictionary Learning Approaches

• Maximization of Joint Entropy (ME)
• Maximization of Mutual Information (MMI)
  - Unsupervised Learning (MMI-1)
  - Supervised Learning (MMI-2)

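Maximization of Joint Entropy (ME)

- Initialize dictionary $D^o$ using K-SVD
- Start with $D^* = \emptyset$
- Until $|D^*| = k$, iteratively choose $d^*$ from $D^o \setminus D^*$:

  $d^* = \arg\max_{d} H(d \mid D^*)$

- A good approximation to the ME criterion, whose solution is the ME dictionary:

  $\arg\max_{D} H(D)$

  where the conditional entropy $H(d \mid D^*)$ is evaluated under the Gaussian process model.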
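The greedy ME loop can be sketched as follows, assuming the atoms are jointly Gaussian with covariance `K`, so that the conditional entropy is 0.5 * log(2*pi*e * conditional variance). The helper names, the `jitter` regularizer, and the random stand-in codes are assumptions for illustration, not the authors' code.

```python
import numpy as np

def conditional_entropy(K, i, S, jitter=1e-8):
    """H(d_i | D_S) for jointly Gaussian atoms with covariance K."""
    S = list(S)
    var = K[i, i]
    if S:
        K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
        K_iS = K[i, S]
        var = var - K_iS @ np.linalg.solve(K_SS, K_iS)
    # Floor the variance so the logarithm stays finite.
    return 0.5 * np.log(2 * np.pi * np.e * max(var, jitter))

def select_me(K, k):
    """Greedily add the atom with the largest entropy given those already chosen."""
    selected, remaining = [], list(range(K.shape[0]))
    while len(selected) < k:
        best = max(remaining, key=lambda i: conditional_entropy(K, i, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical sparse-code rows: 8 atoms x 50 signals with different scales.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50)) * rng.uniform(0.5, 2.0, size=(8, 1))
K = np.cov(X)
me_atoms = select_me(K, k=3)
print(me_atoms)
```

With nothing selected yet, the first pick is simply the atom with the largest variance; later picks favor atoms that the current selection predicts poorly.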

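Maximization of Mutual Information for Unsupervised Learning (MMI-1)

- Initialize dictionary $D^o$ using K-SVD
- A near-optimal approximation to MMI, within $(1 - 1/e)$ of the optimum, whose solution is the MMI dictionary:

  $\arg\max_{D} I(D;\ D^o \setminus D)$

- Closed form:

  $d^* = \arg\max_{d} \left[ H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d)) \right]$

  The first term promotes diversity; the second promotes coverage.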
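A sketch of the MMI-1 selection rule under the same Gaussian assumption: each candidate is scored by its entropy given the selected atoms minus its entropy given all other unselected atoms. Function names, the `jitter` term, and the random stand-in codes are assumptions for illustration.

```python
import numpy as np

def conditional_entropy(K, i, S, jitter=1e-8):
    """H(d_i | D_S) for jointly Gaussian atoms with covariance K."""
    S = list(S)
    var = K[i, i]
    if S:
        K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
        K_iS = K[i, S]
        var = var - K_iS @ np.linalg.solve(K_SS, K_iS)
    return 0.5 * np.log(2 * np.pi * np.e * max(var, jitter))

def select_mmi(K, k):
    """Greedy choice of d maximizing H(d | D*) - H(d | Do minus (D* and d))."""
    n = K.shape[0]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for d in range(n):
            if d in selected:
                continue
            # Atoms that are neither selected nor the candidate itself.
            rest = [j for j in range(n) if j != d and j not in selected]
            score = (conditional_entropy(K, d, selected)
                     - conditional_entropy(K, d, rest))
            if score > best_score:
                best, best_score = d, score
        selected.append(best)
    return selected

# Hypothetical sparse-code rows: 8 atoms x 50 signals.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))
K = np.cov(X)
mmi_atoms = select_mmi(K, k=3)
print(mmi_atoms)
```

Compared with the ME rule, the subtracted term penalizes atoms that the remaining pool cannot explain, so outlier atoms are less likely to be chosen.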

Revisit: Dictionary Class Distribution

         x1      x2      x3      x4
label    l1      l1      l2      l2
xd1      0.43    0       0       0
xd2      0.63    0       0       0
xd3      0       0.64    0       0
xd4      0       0.53    0       0
xd5     -0.33   -0.40    0      -0.42
…        …

• P(L|di), L ∈ [1, M]
• Aggregate |xdi| based on class labels to obtain an M-sized vector
• P(l1|d5) = (0.33+0.40)/(0.33+0.40+0.42) = 0.6348
• P(l2|d5) = (0+0.42)/(0.33+0.40+0.42) = 0.3652
• $P(L_d) = P(L \mid d)$
• $P(L_D) = P(L \mid D)$, where

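Maximization of Mutual Information for Supervised Learning (MMI-2)

- Initialize dictionary $D^o$ using K-SVD

  $d^* = \arg\max_{d} \left[ H(d \mid D^*) - H(d \mid D^o \setminus (D^* \cup d)) \right] + \lambda \left[ H(L_d \mid L_{D^*}) - H(L_d \mid L_{D^o \setminus (D^* \cup d)}) \right]$

- MMI-1 is a special case of MMI-2 with $\lambda = 0$.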

Keck gesture dataset


Representation Consistency

[1] J. Liu and M. Shah, Learning Human Actions via Information Maximization, CVPR 2008

Recognition Accuracy


The recognition accuracy using initial dictionary Do: (a) 0.23 (b) 0.42 (c) 0.71

Recognizing Realistic Actions


• 150 broadcast sports videos.
• 10 different actions.
• Average recognition rate: 83.6%
• Best reported result: 86.6%
