TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization with one-vs-all classifiers & MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks

05-10-2012

Challenge the future Delft University of Technology

TUD MediaEval 2012 Tagging Task

Reporter: Martha A. Larson Multimedia Information Retrieval Lab Delft University of Technology

Visual similarity measures for semantic video retrieval

Outline

•  TUD-MM: Multi-modality video categorization with one-vs-all classifiers

•  Peng Xu, Yangyang Shi, Martha A. Larson

•  MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks

•  Yangyang Shi, Martha A. Larson, Catholijn M. Jonker


TUD-MM: Multi-modality video categorization with one-vs-all classifiers
Peng Xu, Yangyang Shi, Martha A. Larson


Introduction

•  Features from different modalities

•  Visual features

•  Visual Words based representation & Global video representation

•  Text features

•  ASR, Metadata

•  Term-frequency, LDA

•  Classification and Fusion

•  One-vs-all linear SVMs

•  Reciprocal Rank Fusion

•  Post-processing procedure to assign one category label for each video


Visual representations

•  Visual-words-based video representation

•  SIFT features are extracted from each key-frame

•  The visual vocabulary is built by hierarchical k-means clustering

•  Videos are represented by the normalized term frequency of visual words over the entire video

•  Global video representation

•  Edit features

•  Content features
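The visual-words representation above can be sketched as follows. This is an illustrative sketch, not the authors' code: flat k-means stands in for the hierarchical variant, and random arrays stand in for real SIFT descriptors extracted from key-frames.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, vocab_size=10, seed=0):
    """Cluster local descriptors into a visual vocabulary (flat k-means
    here; the slides use hierarchical k-means)."""
    km = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed)
    km.fit(descriptors)
    return km

def video_histogram(km, video_descriptors):
    """Normalized term frequency of visual words over an entire video."""
    words = km.predict(video_descriptors)
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / hist.sum()

# Toy example: random 128-d "SIFT-like" descriptors
rng = np.random.default_rng(0)
vocab = build_vocabulary(rng.normal(size=(500, 128)), vocab_size=10)
h = video_histogram(vocab, rng.normal(size=(80, 128)))
```

The resulting histogram sums to one, matching the normalized term-frequency representation described on the slide.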


Classification and Fusion

•  One-vs-all linear SVM

•  C is determined by 5-fold cross-validation

•  Reciprocal Rank Fusion (RRF)*

•  k = 60 balances the contribution of lower-ranked items

•  The weights w(r) are determined by the cross-validation error of each modality

•  Post-processing procedure

•  * G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pages 758–759.
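Weighted Reciprocal Rank Fusion with k = 60 can be sketched as below; the document identifiers and the per-modality weights are hypothetical placeholders, not values from the runs.

```python
def rrf(ranked_lists, weights, k=60):
    """Fuse ranked lists with weighted Reciprocal Rank Fusion:
    score(d) = sum over rankings r of w(r) / (k + rank_r(d))."""
    scores = {}
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    # Sort documents by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a visual and a text classifier
visual = ["v1", "v3", "v2"]
text = ["v2", "v1", "v3"]
fused = rrf([visual, text], weights=[0.4, 0.6])  # → ["v1", "v2", "v3"]
```

With k = 60 the denominator dampens differences among top ranks, which is what lets lower-ranked items still contribute to the fused score.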


Result analysis

•  MAP of different runs

•  Run_1 to Run_5 are official runs

•  Run_6 is the visual-only run without post-processing

•  Run_7 is the visual-only run with global features

       Run_1   Run_2   Run_3   Run_4   Run_5   *Run_6  *Run_7
MAP    0.0061  0.3127  0.2279  0.3675  0.2157  0.0577  0.0047


Performance of visual features

[Bar chart: MAP of the random baseline, visual-words (VW), and global-feature runs; all values lie between 0 and 0.025.]


MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
Yangyang Shi, Martha A. Larson, Catholijn M. Jonker


Models for One-best list and Confusion Networks

ASR output is fed to three models:

•  Support vector machine

•  Dynamic Bayesian networks

•  Conditional random fields


One-best List SVM

•  Vocabulary with a frequency cut-off of 3; TF-IDF weighting

•  Linear-kernel multi-class SVM (C = 0.5)
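A minimal scikit-learn sketch of this setup. The transcripts and genre labels are toy data, and reading "cut-off 3" as a minimum document frequency is an assumption; `min_df` is lowered to 2 here only so the toy vocabulary stays non-trivial.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical ASR one-best transcripts and genre labels
transcripts = [
    "goal match team score", "goal team win match",
    "election vote party debate", "party vote debate election",
]
genres = ["sports", "sports", "politics", "politics"]

# min_df plays the role of the vocabulary cut-off (min_df=3 on real data);
# TF-IDF weighting feeds a linear multi-class SVM with C = 0.5
model = make_pipeline(TfidfVectorizer(min_df=2), LinearSVC(C=0.5))
model.fit(transcripts, genres)

pred = model.predict(["team score goal"])
```

Terms below the document-frequency cut-off (here `score` and `win`) are dropped from the vocabulary before TF-IDF weighting, which keeps the feature space compact.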


One-best List DBN

[Graphical model: a dynamic Bayesian network unrolled over three time slices; each slice t contains the nodes Wt, Tt, and Et.]



Results on the ASR-Only Run

Model               MAP
Run2-one-best SVM   0.23
Run2-one-best DBN   0.25
Run2-one-best CRF   0.10
Run2-CN-CRF         0.09


Average Precision on Each Genre

[Bar chart: average precision per genre for the DBN and SVM runs; values range from 0 to 0.8.]


Discussion and Future Work

•  Discussion

•  Visual-only methods can be improved in several ways

•  Feature selection or dimensionality-reduction methods can be applied

•  Genre-level video representation

•  CRF failure

•  A document is treated as an item rather than as one word

•  The feature set is too large for training to converge

•  DBN outperforms SVM: sequence-order information probably helps prediction

•  Potentials

•  Generate clear and useful labels


Thank you!
