The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features

Yu-Gang Jiang*, Qi Dai*, Chun Chet Tan**, Xiangyang Xue*, Chong-Wah Ngo**

*School of Computer Science, Fudan University, Shanghai

**Department of Computer Science, City University of Hong Kong, HK

MediaEval 2012 Workshop, Oct 4-5, Pisa, Italy

Outline

• Introduction

• Framework

• Feature Extraction

• Classifiers

• Temporal Smoothing

• Results

• Discussions

• First 20 clips retrieved

Introduction

• Violent Scene Detection task [1]: a practical challenge with great potential in applications.

• Focus on novel features.

• Top performance in mAP@20, runner-up in mAP@100

[1] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. The MediaEval 2012 Affect Task: Violent Scenes Detection. In MediaEval 2012 Workshop, Pisa, Italy, 2012.

Framework

The circled numbers indicate the 5 submitted runs

[Diagram: video shots pass through feature extraction (trajectory-based, 7 features; spatial-temporal interest points; MFCC audio feature; concept-based) and χ2 kernel SVM classifiers, followed by detection score-level temporal smoothing. A second path applies temporal feature smoothing to all features except the concept-based ones and feeds a χ2 kernel SVM (run 2).]

Feature Extraction

• Trajectory-based features [2]:

- dense trajectory, HOG, HOF, MBH [5]

- TrajMF (relative locations and motions between trajectory pairs)

- Trajectory shape feature

• Advantages: robust to camera movement, rich in information, and implicitly capture object-object and object-background relationships.
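As an illustration of the trajectory shape feature, the sketch below follows the description in the dense trajectory paper [5]: the frame-to-frame displacements of one tracked point are concatenated and divided by the sum of their magnitudes. The function name `trajectory_shape` and the zero-vector handling of static trajectories are assumptions made for this example, not details taken from the slides.

```python
import math

def trajectory_shape(points):
    """Normalized displacement descriptor for a single trajectory.

    points: list of (x, y) positions of one tracked point over consecutive frames.
    Returns [dx_0, dy_0, dx_1, dy_1, ...] divided by the total displacement length.
    """
    if len(points) < 2:
        raise ValueError("a trajectory needs at least two points")
    # Displacement between each pair of consecutive positions.
    deltas = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in deltas)
    if total == 0.0:
        # Assumption: a static trajectory maps to the zero vector.
        return [0.0] * (2 * len(deltas))
    return [component / total for delta in deltas for component in delta]

shape = trajectory_shape([(0, 0), (3, 4), (3, 4)])
```

Because of the normalization, the descriptor depends on the shape of the motion rather than on its overall magnitude.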

[2] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In ECCV, 2012.

[5] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.

Feature Extraction

• SIFT [4]

• STIP [3]

• MFCC

• Concept-based Features (10 concepts: blood, carchase, coldarms, fights, fire, firearms, gore, explosions, gunshots, screams)

[3] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64:107-123, 2005.

[4] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110, 2004.

Classifiers

• BoW representation

• Chi-squared kernel SVMs

• Kernel level early fusion is used to combine multiple features
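The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's implementation: the helper names are hypothetical, the kernel uses the common form exp(-d/A) with A set to the mean pairwise χ2 distance, and fusion is done by averaging the per-feature kernel matrices.

```python
import math

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, word)) for word in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def chi2_distance(h1, h2):
    """Chi-squared distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def chi2_kernel_matrix(hists):
    """Kernel matrix exp(-d / A); A is the mean pairwise distance (assumed normalizer)."""
    n = len(hists)
    dist = [[chi2_distance(hists[i], hists[j]) for j in range(n)] for i in range(n)]
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    mean_d = sum(off_diag) / len(off_diag) if off_diag else 1.0
    if mean_d == 0.0:
        mean_d = 1.0  # all histograms identical; avoid division by zero
    return [[math.exp(-d / mean_d) for d in row] for row in dist]

def fuse_kernels(kernels):
    """Kernel-level fusion by averaging the per-feature kernel matrices (assumption)."""
    n = len(kernels[0])
    return [[sum(k[i][j] for k in kernels) / len(kernels) for j in range(n)]
            for i in range(n)]

# Toy example: three shots described by two feature types.
hist = bow_histogram([[0, 0], [1, 1], [0.9, 1.0]], codebook=[[0, 0], [1, 1]])
feature_a = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
feature_b = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = fuse_kernels([chi2_kernel_matrix(feature_a), chi2_kernel_matrix(feature_b)])
```

The fused matrix can then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.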

Temporal Smoothing

• Feature Smoothing – averaged features over a three-shot window.

• Score Smoothing – averaged prediction scores over a three-shot window.
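Both smoothing variants amount to a moving average over neighboring shots. The sketch below assumes a centered window (previous, current, and next shot) that is truncated at the start and end of a video; the slides do not state how the window is positioned or how boundaries are handled, and the function names are hypothetical.

```python
def smooth_scores(scores, window=3):
    """Average each shot's prediction score with its neighbors in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def smooth_features(features, window=3):
    """Average each shot's feature vector with its neighbors, dimension by dimension."""
    half = window // 2
    smoothed = []
    for i in range(len(features)):
        lo, hi = max(0, i - half), min(len(features), i + half + 1)
        neighbors = features[lo:hi]
        smoothed.append([sum(column) / len(neighbors) for column in zip(*neighbors)])
    return smoothed

scores = smooth_scores([0.0, 0.9, 0.0, 0.3])
feats = smooth_features([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Score smoothing operates on classifier outputs after prediction, whereas feature smoothing changes the classifier inputs, so the two are applied at different points in the pipeline.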

[Bar chart, x-axis run labels: r3, r2, r5, r4, r1]

Results (mAP@20)

• Run 5: 7 dense trajectory features

• Run 4: Run 5 + SIFT + STIP + MFCC

• Run 3: Run 4 + concept scores

• Run 2: Run 4 + feature smoothing

• Run 1: Run 4 + score smoothing
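For readers unfamiliar with the metric, the following sketch shows one common way to compute average precision over the top k ranked shots and to average it over several movies. The normalization (dividing by the number of violent shots found in the top k) is an assumption; the official task evaluation described in [1] may normalize differently. The function names are hypothetical.

```python
def average_precision_at_k(ranked_labels, k):
    """Average precision over the top-k items of a ranked list.

    ranked_labels: ground-truth labels (1 = violent, 0 = non-violent),
    ordered by decreasing detection score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_ap_at_k(ranked_lists, k):
    """Mean of the per-movie average precision at k."""
    return sum(average_precision_at_k(r, k) for r in ranked_lists) / len(ranked_lists)

ap = average_precision_at_k([1, 0, 1, 0], k=4)
m = mean_ap_at_k([[1, 0, 1, 0], [0, 0, 0, 0]], k=4)
```

With a small k such as 20, the measure rewards placing violent shots at the very top of the ranking, which is why a run can rank differently under mAP@20 and mAP@100.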

Results (mAP@100)

• Run 5: 7 dense trajectory features

• Run 4: Run 5 + SIFT + STIP + MFCC

• Run 3: Run 4 + concept scores

• Run 2: Run 4 + feature smoothing

• Run 1: Run 4 + score smoothing

[Bar chart, x-axis run labels: r3, r4, r5, r2, r1]

Discussions

• Adding SIFT + STIP + MFCC brings no significant improvement, since TrajMF already encodes the rich information of SIFT and STIP.

• Concept-based scores do not improve performance, which we attribute to the SVMs overfitting on insufficient training data. Using mid-level concept detectors nevertheless remains a promising direction.

• Score smoothing boosts performance. Feature smoothing, which “blurs” the features across shots, might not be a good option.

First 20 clips retrieved

Thank You
