Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla Machine Intelligence Laboratory, Engineering Department, University of C

Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest


Page 1: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla

Machine Intelligence Laboratory, Engineering Department, University of Cambridge

Page 2: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Introduction and Motivations

• A novel real-time solution for action recognition that utilises local-appearance and structural information.

Main objective: efficiency

Main features / major contributions:

• High run-time performance
• Local appearance + structural information
• Short response time
• Real-time feature extraction and classification
• Continuous / frame-by-frame recognition
• Pyramidal spatiotemporal relationship match (PSRM)

Page 3: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

A short demo

Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo video.

Page 4: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Related Work

• Many current methods focus on:
• Accuracy
• Action representation model (feature design)

Typical pipeline components: sophisticated spatiotemporal features, k-means codebook, “bag of words” model, learned classifier [Schuldt et al. ICPR2004, Niebles et al. BMVC06, Ryoo and Aggarwal ICCV09, Willems BMVC09, Riemenschneider et al. BMVC09]

• Some achieve high accuracies, but take a long time to recognise
• How can we improve efficiency?
• Can we improve codebook learning and feature matching?

Page 5: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Related Work

• Vector quantisation by random forest [Moosmann et al. NIPS06]
• For image segmentation [Shotton et al. CVPR08]
• Can we apply it in video analysis?
• Pyramid match kernel [Grauman and Darrell ICCV05]
• Image recognition [Grauman and Darrell ICCV05], scene classification [Lazebnik et al. CVPR06], etc.
• Spatiotemporal relationship match [Ryoo and Aggarwal ICCV09]

Figures: Grauman and Darrell ICCV05; Moosmann et al. NIPS2006; Ryoo and Aggarwal ICCV09

References:

S. Lazebnik, C. Schmid and J. Ponce. “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories” CVPR 2006
K. Grauman and T. Darrell. “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features” ICCV 2005
F. Moosmann, B. Triggs and F. Jurie. “Fast discriminative visual codebooks using randomized clustering forests” NIPS 2006
J. Shotton, M. Johnson and R. Cipolla. “Semantic texton forests for image categorization and segmentation” CVPR 2008
M. S. Ryoo and J. K. Aggarwal. “Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities” ICCV 2009

Page 6: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Our Contributions

• Our contribution is three-fold:

• Efficient codebook learning: spatiotemporal texton forest; image segmentation (2D) → action recognition (3D)
• Local appearance + structural information: SRM → PSRM (pyramidal spatiotemporal relationship match)
• High run-time performance:

1. V-FAST corner detector

2. Random forest classifiers

3. Continuous action recognition

Page 7: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Comparison with existing approaches

Typical Approaches

• Feature encoding: k-means clustering (slow for large codebooks)
• Feature matching: the “bag of words” (BOW) model (lacks structural information, quantisation error)

Our Method

• Feature encoding: semantic texton forest (efficient)
• Feature matching: PSRM (structural information, hierarchical matching, robust)

Page 8: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Overview

[Pipeline diagram]

• Feature detection: V-FAST corner
• Feature extraction: spatio-temporal cuboids, spatiotemporal semantic texton forest
• Feature matching: PSRM, BOST
• Classification: k-means forest, random forest classifier
• Results

Page 9: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Feature detection


Page 10: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

V-FAST: Spatiotemporal Feature Detection

• A novel spatiotemporal interest point detector

• Inspired by FAST [Rosten and Drummond ECCV2006]

• A cascade of three FAST detectors.

• Considers three orthogonal Bresenham circles

• Features:

• Very fast!

E. Rosten and T. Drummond. “Machine learning for high-speed corner detection” ECCV 2006
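The cascade described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the video is a nested list indexed as `video[t][y][x]`, uses the usual 16-point radius-3 circle from FAST on the XY, XT and YT planes, and assumes a point is kept only when all three segment tests pass (the slide does not state the exact combination rule). The names `segment_test` and `is_vfast_corner`, the threshold and the arc length are all hypothetical.

```python
# 16-point Bresenham circle of radius 3, as used by the FAST detector.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def segment_test(centre, ring, threshold=20, arc=9):
    """True if `arc` contiguous ring values are all brighter than
    centre + threshold, or all darker than centre - threshold."""
    for sign in (1, -1):
        flags = [sign * (value - centre) > threshold for value in ring]
        run = 0
        for flag in flags + flags[:arc - 1]:  # wrap around the circle
            run = run + 1 if flag else 0
            if run >= arc:
                return True
    return False


def is_vfast_corner(video, t, y, x, threshold=20, arc=9):
    """Cascade of three tests (assumed rule: all must pass), rejecting early."""
    centre = video[t][y][x]
    planes = (
        lambda a, b: video[t][y + b][x + a],  # XY plane (spatial)
        lambda a, b: video[t + b][y][x + a],  # XT plane
        lambda a, b: video[t + b][y + a][x],  # YT plane
    )
    for sample in planes:
        ring = [sample(a, b) for a, b in CIRCLE]
        if not segment_test(centre, ring, threshold, arc):
            return False
    return True
```

Because the tests run in sequence, most pixels are discarded after the first plane, which is where a cascade gets its speed. Under the assumed rule, a bright dot that never moves passes the XY test but fails the XT test, so it is not reported.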

Page 11: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Feature extraction


Page 12: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Building a codebook using STF

• Extract small video cuboids at detected keypoints

• Visual codebook using STF:

• Efficient visual codebook
• One feature → multiple codewords
• Quantisation and partial matching

Random forest based codebook

• Works on pixels directly
• Hierarchical splits

“Textonises” patches recursively
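A forest-based codebook of this kind can be sketched as below. This is only an illustration under stated assumptions: the trees are random rather than trained, each split compares the difference of two cuboid pixels with a threshold, and the cuboid is a flat list of pixel values. The names `make_random_tree`, `leaf_index` and `codewords` are hypothetical.

```python
import random


def make_random_tree(depth, cuboid_size, rng):
    """Complete binary tree in heap layout; one split per internal node.
    Each split is (pixel i, pixel j, threshold) on the raw cuboid pixels."""
    n_internal = 2 ** depth - 1
    return [(rng.randrange(cuboid_size), rng.randrange(cuboid_size),
             rng.uniform(-50, 50)) for _ in range(n_internal)]


def leaf_index(tree, cuboid):
    """Route a flattened cuboid from the root to a leaf; return the leaf id."""
    node = 0
    while node < len(tree):  # indices below len(tree) are internal nodes
        i, j, threshold = tree[node]
        go_right = cuboid[i] - cuboid[j] > threshold
        node = 2 * node + (2 if go_right else 1)
    return node - len(tree)  # leaves are numbered 0 .. 2**depth - 1


def codewords(forest, cuboid):
    """One feature -> multiple codewords: one (tree, leaf) pair per tree."""
    return [(t, leaf_index(tree, cuboid)) for t, tree in enumerate(forest)]


rng = random.Random(0)
forest = [make_random_tree(depth=3, cuboid_size=27, rng=rng) for _ in range(4)]
cuboid = [rng.randrange(256) for _ in range(27)]  # e.g. a flattened 3x3x3 cuboid
print(codewords(forest, cuboid))
```

Quantising a cuboid costs only `depth` pixel comparisons per tree, whereas a flat k-means codebook needs a distance computation against every codeword.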

Page 13: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Feature matching

Page 14: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Pyramidal Spatiotemporal Relationship Match (PSRM)

PSRM: a multi-codeword, multi-resolution SRM

• Old method: SRM [Ryoo and Aggarwal ICCV09]
• PSRM: a multi-codebook, multi-resolution version
• Natural combination: local appearance + action structure
• Evaluate each pair of codewords using a set of association rules

A set of “rules” (in different colours) is designed to describe the spatiotemporal structure of features.
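Evaluating codeword pairs against a set of association rules can be sketched as follows. The three rules and their thresholds are hypothetical placeholders, since the slide text does not list the actual rules; each feature is assumed to be a tuple `(x, y, t, codeword)`.

```python
from collections import Counter
from itertools import permutations

# Hypothetical association rules over ordered feature pairs (x, y, t, codeword).
RULES = {
    "before": lambda a, b: a[2] < b[2],
    "near_time": lambda a, b: abs(a[2] - b[2]) <= 5,
    "near_space": lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 10,
}


def relationship_histograms(features):
    """One histogram per rule, counting the codeword pairs that satisfy it."""
    hists = {name: Counter() for name in RULES}
    for a, b in permutations(features, 2):  # every ordered pair of features
        for name, rule in RULES.items():
            if rule(a, b):
                hists[name][(a[3], b[3])] += 1
    return hists


features = [(0, 0, 0, 1), (2, 3, 4, 2), (50, 50, 30, 1)]
for name, hist in relationship_histograms(features).items():
    print(name, dict(hist))
```

Each rule yields its own histogram over codeword pairs, so two videos can be compared rule by rule instead of through a single bag-of-words count.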

Page 15: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Pyramidal Spatiotemporal Relationship Match (PSRM)

[Figure: panels labelled “TREE N”]

Page 16: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Pyramidal Spatiotemporal Relationship Match (PSRM)

Pyramid match kernel:

• Applied to each “association rule”
• Applied to each tree in the STF
• We apply it semantically rather than spatially
• Assumption: neighbouring codewords are similar
• Adjacent nodes are merged, instead of adjacent spatial bins

Typical pyramid match kernel: adjacent bins are merged

Our pyramid match kernel: children are merged into their parents
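Merging children into parents instead of merging spatial bins can be sketched as below. This is a minimal illustration, assuming leaf histograms over a complete binary tree (so the length is a power of two) and the common pyramid-match weighting of 1 / 2^level for matches first found at each level. The weighting used in the presented system is not given on the slide, so treat it as an assumption.

```python
def merge_children(hist):
    """Coarsen by one tree level: each parent bin sums its two children."""
    return [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]


def intersection(h1, h2):
    """Histogram intersection: number of matches at the current level."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def semantic_pyramid_match(h1, h2):
    """Pyramid match from the leaves (level 0) up to the root.
    Matches that first appear at level i are weighted by 1 / 2**i."""
    score, previous, level = 0.0, 0, 0
    while True:
        matches = intersection(h1, h2)
        score += (matches - previous) / 2 ** level
        if len(h1) == 1:  # reached the root
            return score
        previous, level = matches, level + 1
        h1, h2 = merge_children(h1), merge_children(h2)


print(semantic_pyramid_match([2, 0, 1, 1], [0, 2, 1, 0]))
```

In this example one match is found at the leaves and two more appear once sibling leaves are merged, so features that fall into neighbouring leaves still contribute, at half weight.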

Page 17: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Multiple Structural Relationship Histograms

Pyramid Match Kernel (PMK)

Pyramidal Spatiotemporal Relationship Match (PSRM)

Page 18: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Continuous action recognition

[Figure: timelines for “Typical Methods” and “Our Approach”, each showing alternating “Features” and “Classification” stages]

Page 19: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Classification


Page 20: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Combined Classification

• PSRM and BOST (bag of spatiotemporal textons) are classified independently:

• PSRM: k-means forest
• Originally used for NN approximation
• Uses PSRM as the matching kernel
• Combined with the BOST model for the final results

K-means tree:

• Data points are clustered using k-means at the root
• For each cluster, another k-means is performed recursively
• At each terminal cluster, a posterior probability distribution is assigned

M. Muja and D. G. Lowe. “Fast approximate nearest neighbors with automatic algorithm” VISAPP 2009

K-means tree figure courtesy of David Aldavert Miró: http://www.cvc.uab.cat/~aldavert/plor/
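The recursive clustering can be sketched as a single k-means tree. This is an illustration only: it uses squared Euclidean distance on small tuples in place of the PSRM kernel, builds one tree rather than a forest, and the names `build` and `query` and the `min_size` stopping rule are hypothetical.

```python
import random
from collections import Counter


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def assign(points, centres):
    """Indices of the points nearest to each centre."""
    groups = [[] for _ in centres]
    for idx, p in enumerate(points):
        nearest = min(range(len(centres)), key=lambda c: dist2(p, centres[c]))
        groups[nearest].append(idx)
    return groups


def kmeans(points, k, rng, iters=10):
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centres)
        centres = [mean([points[i] for i in g]) if g else centres[c]
                   for c, g in enumerate(groups)]
    return centres, assign(points, centres)


def leaf(labels):
    """Terminal cluster: store the class posterior of its training points."""
    return {"posterior": {c: n / len(labels) for c, n in Counter(labels).items()}}


def build(points, labels, k, rng, min_size=4):
    """Cluster at this node, then recurse into each non-empty cluster."""
    if len(points) < max(min_size, k):
        return leaf(labels)
    centres, groups = kmeans(points, k, rng)
    kept = [(c, g) for c, g in zip(centres, groups) if g]
    if len(kept) < 2:  # the data could not be split any further
        return leaf(labels)
    return {"centres": [c for c, _ in kept],
            "children": [build([points[i] for i in g], [labels[i] for i in g],
                               k, rng, min_size) for _, g in kept]}


def query(node, p):
    """Descend to the nearest centre at each level; return the leaf posterior."""
    while "children" in node:
        c = min(range(len(node["centres"])), key=lambda i: dist2(p, node["centres"][i]))
        node = node["children"][c]
    return node["posterior"]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["walk"] * 4 + ["run"] * 4
tree = build(points, labels, k=2, rng=random.Random(0))
print(query(tree, (0.5, 0.5)), query(tree, (10.5, 10.5)))
```

A query touches only `k` centres per level rather than every training sample, which is why this structure was originally used for approximate nearest-neighbour search.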

Page 21: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Experiments

• Short video sequences (50 frames, about 2 seconds) are extracted from the input video.

• The sampling interval is 5 frames for the experiments and 1 frame for the laptop demo (so the demo performs frame-by-frame recognition).

• Two datasets are used for performance evaluation:

KTH dataset

• The standard benchmark
• Six classes, with viewpoint changes, illumination changes, zoom, etc.

UT interaction dataset (for the ICPR 2010 contest on Semantic Description of Human Activities)

• Human interactions, 6 classes of actions, cluttered background

Hardware specifications

• Intel Core i7 920 (for accuracy and speed tests)
• Core 2 Duo P9400 (for laptop demo)

Page 22: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Experiments: Results (KTH dataset)

Comparison with recent state-of-the-art (accuracy, %)

• Mined features (ICCV2009): 96.7
• CCA (CVPR2007): 95.33
• Neighbourhood (CVPR2010): 94.53
• Info. Max. (CVPR2008): 94.15
• Shape-motion tree (ICCV2009): 93.43
• Vocabulary Forest (CVPR2008): 93.17
• Point clouds (CVPR2009): 93.17
• Our method (sequence): 95.67
• Our method (snippets): 93.55

• Comparable to most state-of-the-art methods.

• About 3% lower in accuracy than the top performer

• Is it a sensible trade-off?

• Useful for many more practical applications (surveillance, robotics, etc.)

snippet: subsequence-level recognition

sequence: majority voting of subsequence labels

Evaluation: leave-one-out cross-validation
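The two evaluation levels can be sketched as follows, assuming 50-frame snippets taken every 5 frames (the settings quoted for the experiments) and a plain majority vote. The function names are hypothetical.

```python
from collections import Counter


def snippets(n_frames, length=50, step=5):
    """(start, end) frame ranges of the overlapping subsequences."""
    return [(s, s + length) for s in range(0, n_frames - length + 1, step)]


def sequence_label(snippet_labels):
    """Sequence-level label: majority vote over the snippet-level labels."""
    return Counter(snippet_labels).most_common(1)[0][0]


print(snippets(60))
print(sequence_label(["walk", "run", "walk"]))
```

With `step=1` the same function yields one snippet per frame, which corresponds to the frame-by-frame setting used for the laptop demo.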

Page 23: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Experiments: Results

• Results: UT interaction dataset

PSRM and BOST gave low accuracies when applied separately.

Performance improved by ~20% simply by combining the class labels!

• Run-time performance

Below 25 fps, but enough for most real-time applications

Can be further optimised (e.g. GPU, multi-core processing)

Page 24: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Demo video

• Frame-level recognition

• Potential improvement:

• Delay (~1s) in recognition results (depends on the subsequence length)

• Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo video.

Page 25: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Conclusions

A novel action recognition system

Main strength: run time performance

A re-design of the traditional “bag of words” model:

• k-means codebook → spatiotemporal semantic forest

• Histogram → PSRM

• Traditional classifiers (e.g. SVM) → k-means forest classifier / random forest

Page 26: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

THE END

THANK YOU VERY MUCH

Page 27: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Formulation of V-FAST

Page 28: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Formulation of STF

• Split function model:

• Split criteria --- Information gain:
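A minimal sketch of an information-gain split criterion, assuming the standard definition (parent entropy minus the size-weighted entropy of the two children) with Shannon entropy over class labels. Whether the slide uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, go_left):
    """Gain of a binary split; go_left[i] says which child sample i goes to."""
    left = [label for label, g in zip(labels, go_left) if g]
    right = [label for label, g in zip(labels, go_left) if not g]
    if not left or not right:
        return 0.0  # a split that sends everything one way gains nothing
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


labels = ["walk", "walk", "run", "run"]
print(information_gain(labels, [True, True, False, False]))
print(information_gain(labels, [True, False, True, False]))
```

During training, candidate splits would typically be sampled at random for each node and the one with the highest gain kept.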

Page 29: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Formulation of STF

Page 30: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Formulation of PSRM

• Step 1 Feature matching:

• Step 2 Semantic PMK over histogram

Page 31: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Formulation of Classifier training

• Optimise the clusters of features to maximise the PMK with the cluster mean.

Page 32: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Experiment parameters

Page 33: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

• Confusion matrix:

Page 34: Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest

Extra slide

Kernel k-means forest

Random forest

PSRM BOST

Action recognition results (class labels)

Weighted combination
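The final step of this diagram can be sketched as a weighted sum of the two classifiers' class posteriors. The weight `alpha`, the function name `combine` and the example class names are hypothetical; the slide does not give the weighting that was actually used.

```python
def combine(p_psrm, p_bost, alpha=0.5):
    """Weighted combination of two class posteriors; returns (label, scores)."""
    classes = set(p_psrm) | set(p_bost)
    scores = {c: alpha * p_psrm.get(c, 0.0) + (1 - alpha) * p_bost.get(c, 0.0)
              for c in classes}
    return max(scores, key=scores.get), scores


# Hypothetical posteriors from the two classifiers for one snippet.
label, scores = combine({"hug": 0.6, "kick": 0.4}, {"hug": 0.2, "kick": 0.8})
print(label, scores)
```

Here the PSRM branch alone would choose "hug", but the combined score favours "kick", showing how the second classifier can overturn a weak decision by the first.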