
Recognizing Actions Across Cameras by Exploring the Correlation Subspace

4th International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR), in conjunction with ECCV 2012

Chun-Hao Huang, Yi-Ren Yeh, and Yu-Chiang Frank Wang
Research Center for IT Innovation, Academia Sinica, Taiwan

Oct 12th, 2012

Outline

• Introduction
• Our Proposed Framework
  – Learning Correlation Subspaces via CCA
  – Domain Transfer Ability of CCA
  – SVM with a Novel Correlation Regularizer
• Experiments
• Conclusion


Representing an Action

• Actions are represented as high-dimensional vectors.
• Bag of spatio-temporal visual words model.
• State-of-the-art classifiers (e.g., SVM) are applied to address the recognition task.
• Spatio-temporal interest points [Laptev, IJCV, 2005] [Dollár et al., ICCV WS on VS-PETS, 2005]
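As a concrete illustration of the bag-of-words step above, the sketch below quantizes the local interest-point descriptors of one video against a visual-word codebook (the `bow_descriptor` helper and NumPy usage are my own assumptions, not the slides' implementation; the codebook itself would come from k-means over training descriptors):

```python
import numpy as np

def bow_descriptor(descriptors, codebook):
    """Quantize the local spatio-temporal descriptors of one action video
    (n x p array) against a k-word codebook (k x p array) and return an
    L1-normalized k-bin histogram, i.e. the bag-of-words vector."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                    # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)           # L1 normalization
```

An SVM is then trained on these fixed-length histograms, as described above.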

Cross-Camera Action Recognition

• Models learned at source views typically do not generalize well at target views.

[Figure: labeled actions (check watch, punch, kick) in the source view 𝒳𝑠 ∈ ℝ^{𝑑𝑠} and the target view 𝒳𝑡 ∈ ℝ^{𝑑𝑡}; colored: labeled data, hollowed: test data]

Cross-Camera Action Recognition (cont’d)

• An unsupervised strategy (one branch of transfer learning): only unlabeled data are available at the target view; they are exploited to learn the relationship between data at the source and target views.

[Figure: as above, plus gray: unlabeled data]

Approaches based on Transfer Learning

• To learn a common feature representation (e.g., a joint subspace) for both source- and target-view data.
• Training/testing can then be performed in terms of such representations.
• How to exploit unlabeled data from both views for determining this joint subspace is the key issue.
• Previous approaches:
  1. Splits-based feature transfer [Farhadi and Tabrizi, ECCV ’08]: requires frame-wise correspondence.
  2. Bag of bilingual words (BoBW) model [Liu et al., CVPR ’11]: considers each dimension of the derived representation to be equally important.


Overview of Our Proposed Method

1. Learn a joint subspace via canonical correlation analysis (CCA).
2. Project the labeled source-view data onto it.
3. Learn a new SVM with constraints on domain transfer ability.
4. Prediction.

[Figure: source-view data 𝒳𝑠 and target-view data 𝒳𝑡 are both projected into the correlation subspace 𝒳𝑐 ⊂ ℝ^𝑑]

Requirements of CCA

• CCA requires unlabeled data pairs: the same actions observed simultaneously by both cameras.

[Figure: source and target views; colored: labeled data, hollowed: test data, gray: unlabeled data pairs]

Learning the Correlation Subspace via CCA

• CCA aims at maximizing the correlation between two variable sets.
• Given two sets of $n$ centered unlabeled observations $X_s = [\mathbf{x}_s^1, \ldots, \mathbf{x}_s^n] \in \mathbb{R}^{d_s \times n}$ and $X_t = [\mathbf{x}_t^1, \ldots, \mathbf{x}_t^n] \in \mathbb{R}^{d_t \times n}$,
• CCA learns two projection vectors $\mathbf{u}_s$ and $\mathbf{u}_t$, maximizing the correlation coefficient $\rho$ between projected data, i.e.,

$$\max_{\mathbf{u}_s,\mathbf{u}_t} \rho = \frac{\mathbf{u}_s^\top X_s X_t^\top \mathbf{u}_t}{\sqrt{\mathbf{u}_s^\top X_s X_s^\top \mathbf{u}_s}\,\sqrt{\mathbf{u}_t^\top X_t X_t^\top \mathbf{u}_t}} = \frac{\mathbf{u}_s^\top \Sigma_{st}\,\mathbf{u}_t}{\sqrt{\mathbf{u}_s^\top \Sigma_{ss}\,\mathbf{u}_s}\,\sqrt{\mathbf{u}_t^\top \Sigma_{tt}\,\mathbf{u}_t}},$$

where $\Sigma_{st} = X_s X_t^\top$, $\Sigma_{ss} = X_s X_s^\top$, and $\Sigma_{tt} = X_t X_t^\top$ are covariance matrices.
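A minimal numerical sketch of this CCA step (assuming NumPy/SciPy; the function name and the small ridge term `reg` are illustrative additions, not from the slides). It solves the generalized-eigenvalue form of the objective and returns the top-d projection pairs with their correlation coefficients:

```python
import numpy as np
from scipy.linalg import eigh

def cca(Xs, Xt, d, reg=1e-3):
    """Xs: (ds, n) and Xt: (dt, n) hold n paired observations as columns.
    Returns Ps (ds, d), Pt (dt, d) and the correlations rho (d,)."""
    Xs = Xs - Xs.mean(axis=1, keepdims=True)     # center both views
    Xt = Xt - Xt.mean(axis=1, keepdims=True)
    Sss = Xs @ Xs.T + reg * np.eye(Xs.shape[0])  # regularized covariances
    Stt = Xt @ Xt.T + reg * np.eye(Xt.shape[0])
    Sst = Xs @ Xt.T
    # generalized eigenproblem: Sst Stt^{-1} Sst^T u_s = eta Sss u_s, eta = rho^2
    M = Sst @ np.linalg.solve(Stt, Sst.T)
    eta, U = eigh(M, Sss)                        # eigenvalues in ascending order
    idx = np.argsort(eta)[::-1][:d]
    Ps = U[:, idx]
    # u_t is recovered from u_s via Stt^{-1} Sst^T u_s (up to scale)
    Pt = np.linalg.solve(Stt, Sst.T @ Ps)
    Pt /= np.linalg.norm(Pt, axis=0, keepdims=True)
    rho = np.sqrt(np.clip(eta[idx], 0.0, 1.0))
    return Ps, Pt, rho
```

The columns of `Ps` and `Pt` play the role of the projection pairs (u_s, u_t), and `rho` carries the per-dimension correlation coefficients used later by the correlation regularizer.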

CCA Subspace as Common Feature Representation

• With projection matrices $P_s = [\mathbf{u}_s^1, \ldots, \mathbf{u}_s^d] \in \mathbb{R}^{d_s \times d}$ and $P_t = [\mathbf{u}_t^1, \ldots, \mathbf{u}_t^d] \in \mathbb{R}^{d_t \times d}$, source- and target-view data are mapped into the correlation subspace $\mathcal{X}_c \subset \mathbb{R}^d$ as $P_s^\top \mathbf{x}_s$ and $P_t^\top \mathbf{x}_t$.
• Each dimension $\mathbf{v}_i^{s,t}$ of $\mathcal{X}_c$ is associated with a correlation coefficient $\rho_i$.


Domain Transfer Ability of CCA

• Learn SVMs in the derived CCA subspace... problem solved? Yes and no!
• Domain transfer ability: in the CCA subspace, each dimension $\mathbf{v}_i^{s,t}$ is associated with a different correlation coefficient $\rho_i$.
• How well can classifiers learned in this subspace from the projected source-view data generalize to the projected target-view data?


Our Proposed SVM with Domain Transfer Ability

• Proposed SVM formulation:

$$\min_{\mathbf{w},b}\ \frac{1}{2}\|\mathbf{w}\|_2^2 - \frac{1}{2}\,\mathrm{Abs}(\mathbf{w})^\top\mathbf{r} + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.}\quad y_i\big(\mathbf{w}^\top(P_s^\top\mathbf{x}_i^s) + b\big) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad \forall\,\mathbf{x}_i^s \in \mathcal{D}_l,$$

where the introduced correlation regularizer uses $\mathrm{Abs}(\mathbf{w}) = [\,|w_1|, |w_2|, \ldots, |w_d|\,]^\top$ and $\mathbf{r} = [\rho_1, \rho_2, \ldots, \rho_d]^\top$.

• Larger/smaller $\rho_i$ → stronger/weaker correlation between source- and target-view data → the SVM model $w_i$ is more/less reliable at that dimension of the CCA space.
• Our regularizer favors SVM solutions that are dominant in reliable CCA dimensions (i.e., larger correlation coefficients $\rho_i$ imply larger $|w_i|$ values).
• Classification of (projected) target-view test data: $f(\mathbf{x}_t) = \mathrm{sgn}\big(\mathbf{w}^\top(P_t^\top\mathbf{x}_t) + b\big)$.

An Approximation for the Proposed SVM

• It is not straightforward to solve the previous formulation with $\mathrm{Abs}(\mathbf{w})$.
• An approximate solution can be derived by relaxing $\mathrm{Abs}(\mathbf{w})$ to $\mathbf{w} \odot \mathbf{w}$, where $\odot$ indicates element-wise multiplication:

$$\min_{\mathbf{w},b}\ \frac{1}{2}\,\mathbf{w}^\top\mathbf{w} - \frac{1}{2}\,\mathbf{r}^\top(\mathbf{w}\odot\mathbf{w}) + C\sum_{i=1}^{N}\xi_i$$
$$\text{s.t.}\quad y_i\big(\mathbf{w}^\top(P_s^\top\mathbf{x}_i^s) + b\big) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad \forall\,\mathbf{x}_i^s \in \mathcal{D}_l.$$

• We can further simplify the approximated problem as:

$$\min_{\mathbf{w},b}\ \frac{1}{2}\sum_{j=1}^{d}(1-\rho_j)\,w_j^2 + C\sum_{i=1}^{N}\xi_i,\quad \text{subject to the same constraints.}$$

• We apply SSVM* to solve the above optimization problem.

*: Lee et al., Computational Optimization and Applications, 2001
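After the simplification, each CCA dimension j is simply penalized by (1 − ρj), so any convex solver applies. The slides use SSVM; as an illustrative stand-in (plain subgradient descent, with a hypothetical function name and hyperparameters of my choosing), a sketch of the primal problem is:

```python
import numpy as np

def correlation_svm(Z, y, rho, C=1.0, lr=1e-3, epochs=2000):
    """Subgradient descent on
        min_{w,b}  1/2 * sum_j (1 - rho_j) w_j^2 + C * sum_n max(0, 1 - y_n (w.z_n + b)).
    Z: (N, d) labeled source-view data already projected by Ps; y: (N,) in {-1, +1};
    rho: (d,) correlation coefficients of the CCA dimensions."""
    N, d = Z.shape
    w, b = np.zeros(d), 0.0
    D = 1.0 - rho                           # per-dimension penalty weights (1 - rho_j)
    for _ in range(epochs):
        active = y * (Z @ w + b) < 1.0      # samples with non-zero hinge loss
        grad_w = D * w - C * (y[active, None] * Z[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Dimensions with large ρj are penalized less, so |wj| can grow there; a projected target-view test sample z_t = Ptᵀx_t is then classified as sgn(w·z_t + b).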


Dataset

• IXMAS multi-view action dataset:
  – Action videos of eleven action classes.
  – Each action is performed three times by twelve actors.
  – The actions are captured simultaneously by five cameras.

Experiment Setting

• 2/3 of the data are used as unlabeled data for learning correlation subspaces via CCA.
• 1/3 of the data are used as labeled data for training and testing.
• Leave-one-class-out (LOCO) protocol: the labeled training set excludes one action class (e.g., without the Kick action).

[Figure: paired source-/target-view videos of check-watch, scratch-head, sit-down, and kick]

Experimental Results

• A: BoW from source view directly
• B: BoBW + SVM [Liu et al., CVPR ’11]
• C: BoBW + our SVM
• D: CCA + SVM
• E: our proposed framework (CCA + our SVM)

Recognition rates (%); rows c0–c4, one column block per camera:

      camera0                        camera1                        camera2
      A     B     C     D     E      A     B     C     D     E      A     B     C     D     E
c0    -     -     -     -     -     9.29 60.96 63.03 63.18 64.90  11.62 41.21 50.76 56.97 60.61
c1  10.71 58.08 59.70 66.72 70.25    -     -     -     -     -     7.12 33.54 38.03 57.83 59.34
c2   8.79 52.63 49.34 57.37 62.47   6.67 50.86 45.79 59.19 61.87    -     -     -     -     -
c3   6.31 40.35 44.44 65.30 66.01   9.75 33.59 33.27 46.77 52.68   5.96 41.26 43.99 61.36 61.36
c4   5.35 38.59 40.91 54.39 55.76   9.44 37.53 37.00 53.59 55.00   9.19 34.80 38.28 57.88 60.15
avg  7.79 47.41 48.60 60.95 63.62   8.79 45.73 44.77 55.68 58.61   8.47 37.70 42.77 58.51 60.37

      camera3                        camera4
      A     B     C     D     E      A     B     C     D     E
c0   7.78 39.65 41.36 63.64 62.17   7.12 24.60 37.02 43.69 48.23
c1  12.02 35.91 39.14 48.59 54.85   8.89 26.87 22.22 44.24 49.29
c2   6.46 41.46 42.78 60.00 61.46  10.35 28.03 33.43 45.05 51.82
c3    -     -     -     -     -     8.89 27.53 28.28 40.66 41.06
c4   9.60 27.68 34.60 48.03 48.89    -     -     -     -     -
avg  8.96 36.17 39.47 55.06 56.84   8.81 26.76 30.24 43.41 47.60

Effects on the Correlation Coefficient ρ

• Example: source: camera 3, target: camera 2, left-out action: get-up.
• Recognition rates for the two models (standard SVM vs. our SVM) were 47.22% and 77.78%, respectively.
• We successfully suppress the SVM model weights |wi| at the dimensions where a lower ρ results.

[Figure: (a) averaged |wi| of the standard SVM and (b) averaged |wi| of our SVM, plotted against the CCA dimension index]


Conclusions

• We presented a transfer-learning based approach to cross-camera action recognition.

• We considered the domain transfer ability of CCA, and proposed a novel SVM formulation with a correlation regularizer.

• Experimental results on the IXMAS dataset confirmed performance improvements using our proposed method.


Thank You!

Representing an action

• Human body model [Mikić et al., IJCV, 2003] [Junejo et al., TPAMI, 2010]

Representing an action

• Spatio-temporal volumes
  – Space-time shapes [Blank et al., ICCV, 2005]
  – Motion history volume [Weinland et al., CVIU, 2006]

Split-based Feature Transfer (ECCV ’08)

[Figure: ℝ^276 frame descriptors are quantized via K-means into 𝒳𝑠 ∈ ℝ^40 and 𝒳𝑡 ∈ ℝ^40; a target instance is mapped into the source representation by matching the frames of an action video according to their split-based features]

How to Construct the Split-based Feature

• Source view: project each ℝ^276 frame descriptor into ℝ^30 with 1000 different random projections, apply max-margin clustering to each projection, and pick the best 25 random projections; the resulting 25 binary splits form the split-based feature of a frame.
• Target view: apply the same best 25 random projections to unlabeled frames, and train SVMs using the split-based features as labels, so that target frames can be described consistently with the source view.

Bag of Bilingual Words (CVPR ’11)

1. Exploit unlabeled data to model the two codebooks (source and target visual words) as a bipartite graph.
2. Perform spectral clustering.
3. Construct the codebook of bilingual words.
4. Train models and predict with this representation.
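The bipartite-graph plus spectral-clustering steps can be sketched as follows. This is a Dhillon-style SVD co-clustering sketch under my own assumptions (the co-occurrence matrix `C`, the function name, and the use of `scipy.cluster.vq.kmeans2`), not Liu et al.'s exact algorithm:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def bilingual_words(C, k, seed=0):
    """C: (ks, kt) co-occurrence counts between source and target codewords,
    accumulated over unlabeled action videos observed by both cameras.
    Returns a bilingual-word label for every source and target codeword."""
    eps = 1e-12
    # normalize the bipartite adjacency matrix by row/column degrees
    Dr = np.sqrt(C.sum(axis=1) + eps)
    Dc = np.sqrt(C.sum(axis=0) + eps)
    Cn = C / Dr[:, None] / Dc[None, :]
    U, s, Vt = np.linalg.svd(Cn, full_matrices=False)
    # jointly embed source rows and target columns (skip the trivial first
    # singular pair), then cluster them together into k bilingual words
    Z = np.vstack([U[:, 1:k], Vt.T[:, 1:k]])
    _, labels = kmeans2(Z, k, minit='++', seed=seed)
    return labels[:C.shape[0]], labels[C.shape[0]:]
```

Source and target codewords sharing a cluster label form one bilingual word; videos from either view can then be re-encoded as histograms over the k bilingual words.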

Learning Correlation Subspace via CCA

• The projection vector $\mathbf{u}_s$ can be solved by a generalized eigenvalue decomposition problem:

$$\Sigma_{st}\,\Sigma_{tt}^{-1}\,\Sigma_{st}^\top\,\mathbf{u}_s = \eta\,\Sigma_{ss}\,\mathbf{u}_s,$$

or, in regularized form,

$$\Sigma_{st}\,(\Sigma_{tt} + \lambda I)^{-1}\,\Sigma_{st}^\top\,\mathbf{u}_s = \eta\,(\Sigma_{ss} + \lambda I)\,\mathbf{u}_s.$$

• The eigenvalues $\eta_1 \ge \cdots \ge \eta_d$ correspond to the correlation coefficients $\rho_1 \ge \cdots \ge \rho_d$ (with $\eta = \rho^2$, the largest $\eta$ corresponds to the largest $\rho$).
• Once $\mathbf{u}_s$ is obtained, $\mathbf{u}_t$ can be calculated by

$$\mathbf{u}_t = \frac{1}{\rho}\,\Sigma_{tt}^{-1}\,\Sigma_{st}^\top\,\mathbf{u}_s.$$

• Stacking the top $d$ solutions yields $P_s = [\mathbf{u}_s^1, \ldots, \mathbf{u}_s^d] \in \mathbb{R}^{d_s \times d}$ and $P_t = [\mathbf{u}_t^1, \ldots, \mathbf{u}_t^d] \in \mathbb{R}^{d_t \times d}$.

LOCO Protocol in Real Applications: A New Action Class

[Figure: source and target views; colored: labeled data, hollowed: test data, gray: unlabeled data]