KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Page 1: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Anthropomatics

www.kit.edu

KIT at MediaEval 2012 - Content-based Genre Classification with Visual Cues

Tomas Semela, Makarand Tapaswi

MediaEval 2012 Workshop

Page 2: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Motivation

Rapid growth of digital multimedia data in the broadcast and web video domain

Need for efficient automated indexing and content search

Page 3: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Challenges

Broadcast TV domain

Channel archives

Digital distribution

Web offerings

Arrangement in genres:

Highly characteristic

Low variance

Clear boundaries

Web video domain

Video portals like YouTube (user content)

Arrangement in categories:

Resemblance to topics (Autos – Animals – Travel)

Variation in production values and style

Not limited to single-genre characteristics

Page 4: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Related work

System from University of Torino, Italy

Extract video features from aural, visual, cognitive and structural cues

Neural network for classification

M. Montagnuolo, A. Messina, “Parallel Neural Networks for Multimodal Video Genre Classification”, Multimedia Tools and Applications, 41(1):125–159, 2009.

Page 5: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

KIT System

Visual feature extraction from keyframes

SVM classification system

Fusion of results with majority voting

Page 6: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Low-level visual features

Color: color moments, HSV histogram, color auto-correlogram

Texture: wavelet texture, edge histogram, co-occurrence texture

H. K. Ekenel, T. Semela, and R. Stiefelhagen, “Content-based video genre classification using multiple cues”, AIEMPro'10, pages 21-26, 2010.

Global features for each video
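
To make one of the listed color features concrete, here is a minimal sketch of color moments for a single keyframe. It assumes a common definition (mean, standard deviation, and the signed cube root of the third central moment per channel); the slides do not give the exact formulation used in the KIT system, and the function name `color_moments` is hypothetical.

```python
import math


def color_moments(pixels):
    """Return [mean, std, skew] for each of the three channels (9 values).

    `pixels` is a list of 3-tuples, for example HSV or RGB values of one keyframe.
    This follows a common definition of color moments; it is an assumption,
    not necessarily the exact variant used in the slides.
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(variance)
        # Signed cube root keeps the skew in the same units as the channel.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, std, skew])
    return features


if __name__ == "__main__":
    toy_keyframe = [(0, 0, 0), (0, 0, 0), (3, 3, 3)]  # three toy pixels
    print(color_moments(toy_keyframe))
```

A per-video global feature could then be formed from the keyframe vectors, for instance by averaging them; that aggregation step is likewise an assumption here.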

Page 7: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

SIFT – For each keyframe

Interest point detection: dense sampling

Spatial pyramid: 1x1 – 2x2 – 1x3

SIFT descriptors: SIFT, rgbSIFT, opponentSIFT

Bag-of-visual-words: codebook (500-dim.), codeword histograms

K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek, “Empowering Visual Categorization with the GPU”, IEEE Transactions on Multimedia, 13(1):60-70, 2011.
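
The bag-of-visual-words step can be sketched as follows. This is an illustration only: it assumes hard assignment of each descriptor to its nearest codeword (squared Euclidean distance) and a histogram normalized to unit sum, neither of which is stated on the slide. The toy codebook has 3 two-dimensional words instead of 500 SIFT-sized ones, and the function names are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def codeword_histogram(descriptors, codebook):
    """Count codeword assignments for one keyframe and normalize to unit sum."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


if __name__ == "__main__":
    toy_codebook = [(0, 0), (10, 10), (0, 10)]          # assumed, for illustration
    toy_descriptors = [(1, 1), (9, 9), (0, 1), (1, 9)]  # assumed, for illustration
    print(codeword_histogram(toy_descriptors, toy_codebook))
```

With a spatial pyramid, one such histogram would be computed per region (1x1, 2x2, 1x3) and the results concatenated.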

Page 8: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Classification

Training of one support vector machine (SVM) for each genre and each feature

Binary classification (one vs. all)

RBF kernel

Cross-validation

Fusion at decision level

Majority voting (probability output)

SIFT: keyframes classified individually, output averaged over video
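
The decision-level fusion can be sketched as below. The sketch assumes that each per-feature, per-genre SVM has already produced a probability, and it reads "majority voting (probability output)" as summing those probabilities across features; that reading is an assumption, since the slide does not spell out the rule. All function names and toy scores are hypothetical.

```python
def average_keyframe_scores(keyframe_scores):
    """Average per-keyframe genre probabilities into one score list per video."""
    n = len(keyframe_scores)
    num_genres = len(keyframe_scores[0])
    return [sum(frame[g] for frame in keyframe_scores) / n for g in range(num_genres)]


def fuse_features(per_feature_scores):
    """Sum the genre probabilities over all feature-specific classifiers."""
    num_genres = len(per_feature_scores[0])
    return [sum(scores[g] for scores in per_feature_scores) for g in range(num_genres)]


def predict_genre(per_feature_scores, genres):
    """Return the genre with the highest fused score, plus the fused scores."""
    fused = fuse_features(per_feature_scores)
    best = max(range(len(genres)), key=lambda g: fused[g])
    return genres[best], fused


if __name__ == "__main__":
    genres = ["autos", "health"]
    color = [0.7, 0.2]                         # toy output of a color-feature SVM
    texture = [0.4, 0.5]                       # toy output of a texture-feature SVM
    sift_frames = [[0.6, 0.3], [0.8, 0.1]]     # toy per-keyframe SIFT outputs
    sift = average_keyframe_scores(sift_frames)
    print(predict_genre([color, texture, sift], genres))
```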

Page 9: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Domain Knowledge

Video distribution in the development set:

Autos: 8 videos

Technology: ~500 videos

Use this information in the final category prediction, as an estimate of the category distribution on blip.tv:

1. SVM scores for each video normalized to unit sum

2. Divide these probabilities by the square root of the number of videos in the development set for each category to include the a-priori knowledge of the class distribution

3. Finally, step one is repeated to obtain unit sum
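
The three steps above can be sketched as follows. The code follows the wording of the slide literally (division by the square root of the development-set count); the function names and the toy numbers are hypothetical.

```python
import math


def normalize(scores):
    """Scale scores so that they sum to one."""
    total = sum(scores)
    return [s / total for s in scores]


def apply_prior(svm_scores, dev_counts):
    """Re-weight per-category scores of one video with development-set counts."""
    probs = normalize(svm_scores)                                      # step 1
    weighted = [p / math.sqrt(n) for p, n in zip(probs, dev_counts)]   # step 2
    return normalize(weighted)                                         # step 3


if __name__ == "__main__":
    toy_scores = [0.2, 0.3, 0.5]   # assumed SVM scores for three categories
    toy_counts = [4, 16, 100]      # assumed development-set video counts
    print(apply_prior(toy_scores, toy_counts))
```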

Page 10: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Evaluation

Blip.tv data with ~ 9550 clips

Two configurations with/without prior domain knowledge

Configuration  Run   Features       MAP
No prior       run1  Visual         0.3008
No prior       run2  SIFT           0.2329
No prior       run3  Visual + SIFT  0.3499
Prior          run4  Visual         0.3461
Prior          run5  SIFT           0.1448
Prior          run6  Visual + SIFT  0.3581
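
For readers unfamiliar with the metric, here is a minimal sketch of mean average precision (MAP). It assumes the standard definition over ranked lists with binary relevance, and that each ranked list contains all relevant videos of its category; the slides do not say which evaluation tool or variant produced the reported numbers. Function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one category.

    `ranked_relevance` is a list of booleans in ranked order: True where the
    video at that rank truly belongs to the category.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-category average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    toy_rankings = [[True, False, True], [False, True]]  # two toy categories
    print(mean_average_precision(toy_rankings))
```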

Page 11: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Evaluation – Run 6

Page 12: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Evaluation

Run6 (MAP):

Top 4 categories               Worst 4 categories
autos and vehicles (0.812)     citizen journalism (0.158)
health (0.668)                 documentary (0.119)
movies and television (0.602)  videoblogging (0.100)
religion (0.578)               travel (0.010)

Page 13: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Conclusions & Future Work

Conclusions

Visual-based classification shows limitations for category tagging

Few categories with satisfactory results

SIFT: only slight improvement of overall results

Prior domain knowledge improves results overall

Future Work

Temporal features

Mid-level semantics

Action Detection, Audio segmentation

ASR & Metadata integration

Individual classification approach & features for each genre

Page 14: KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues

Thank you
