KIT at MediaEval 2012 – Content-based Genre Classification with Visual Cues

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Anthropomatics

www.kit.edu

Tomas Semela, Makarand Tapaswi

MediaEval 2012 Workshop


Motivation

Rapid growth of digital multimedia data in the broadcast and web video domains

Need for efficient automated indexing and content search



Challenges

Broadcast TV domain

Channel archives

Digital distribution

Web offerings

Arrangement in genres:

Highly characteristic

Low variance

Clear boundaries

Web video domain

Video portals like YouTube (user content)

Arrangement in categories:

Resemblance to topics (Autos – Animals – Travel)

Variation in production values and style

Not limited to single-genre characteristics


Related work

System from University of Torino, Italy

Extract video features from aural, visual, cognitive and structural cues

Neural network for classification

M. Montagnuolo, A. Messina, “Parallel Neural Networks for Multimodal Video Genre Classification”, Multimedia Tools and Appl., 41(1):125–159, 2009.



KIT System

Visual feature extraction from keyframes

SVM classification system

Fusion of results with majority voting



Low-level visual features

Color

Color moments

HSV histogram

Color auto correlogram

Texture

Wavelet texture

Edge histogram

Co-occurrence texture


H. K. Ekenel, T. Semela, and R. Stiefelhagen, “Content-based video genre classification using multiple cues”, AIEMPro'10, pages 21-26, 2010.

Global features for each video
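As an illustration of one of the color features listed above, the following is a minimal sketch of color moments for a single keyframe. It assumes the common definition (mean, standard deviation, and cube root of the third central moment per channel); the slides do not state which variant or color space the KIT system uses, and the function name `color_moments` is hypothetical.

```python
import math


def color_moments(pixels):
    """Return [mean, std, skew] for each of the three channels (9 values).

    pixels: list of (c1, c2, c3) tuples, e.g. the HSV values of one keyframe.
    Assumption: skew is the signed cube root of the third central moment.
    """
    n = len(pixels)
    moments = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(variance)
        # Signed cube root keeps the sign of the third central moment.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.extend([mean, std, skew])
    return moments
```

How the per-keyframe values are aggregated into a global per-video feature is not specified on the slide, so the sketch stops at the keyframe level.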


SIFT – For each keyframe

Interest point detection

Dense sampling

Spatial pyramid

1x1 – 2x2 – 1x3

SIFT descriptors

SIFT

rgbSIFT

opponentSIFT

Bag-of-visual-words

Codebook (500-dim.)

Codeword histograms

K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek, “Empowering Visual Categorization with the GPU”, IEEE Transactions on Multimedia, 13(1):60-70, 2011.
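The bag-of-visual-words step with a spatial pyramid can be sketched as follows. This is a simplified illustration, not the KIT implementation: it assumes hard assignment to the nearest codeword, L1 normalization per pyramid cell, and that "1x3" means one column by three rows. The names `nearest_codeword`, `pyramid_histogram`, and `PYRAMID` are hypothetical.

```python
# Pyramid levels as (columns, rows); the 1x3 orientation is an assumption.
PYRAMID = [(1, 1), (2, 2), (1, 3)]


def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def pyramid_histogram(points, descriptors, codebook, width, height):
    """Concatenate one codeword histogram per pyramid cell for a keyframe.

    points: (x, y) sample positions; descriptors: one vector per point.
    The result has 8 * len(codebook) entries (1 + 4 + 3 cells).
    """
    k = len(codebook)
    words = [nearest_codeword(d, codebook) for d in descriptors]
    feature = []
    for cols, rows in PYRAMID:
        cells = [[0.0] * k for _ in range(cols * rows)]
        for (x, y), word in zip(points, words):
            cx = min(int(x * cols / width), cols - 1)
            cy = min(int(y * rows / height), rows - 1)
            cells[cy * cols + cx][word] += 1.0
        for hist in cells:
            total = sum(hist)
            # Assumed L1 normalization; empty cells stay all-zero.
            feature.extend(v / total if total else 0.0 for v in hist)
    return feature
```

With the 500-word codebook from the slide, this layout would yield a 4000-dimensional vector per keyframe and descriptor type.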



Classification

Training of one support vector machine (SVM) for each genre and each feature

Binary classification (one vs. all)

RBF kernel

Cross-validation

Fusion at the decision level

Majority voting (probability output)

SIFT: keyframes classified individually, output averaged over video
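The decision-level fusion can be illustrated with a short sketch. It assumes that "majority voting (probability output)" means averaging the per-genre probabilities of the individual SVMs and picking the highest-scoring genre; the slides do not spell out the exact rule. The names `column_mean` and `fuse_features` are hypothetical.

```python
def column_mean(rows):
    """Element-wise mean of equally long lists of genre probabilities."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def fuse_features(per_feature_probs):
    """Fuse per-feature genre probabilities for one video.

    per_feature_probs: one list of genre probabilities per feature type.
    Returns (index of the winning genre, fused probabilities).
    Assumption: fusion is a plain average over the feature-specific outputs.
    """
    fused = column_mean(per_feature_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# SIFT: keyframes are classified individually, then averaged over the video.
sift_keyframes = [[0.25, 0.75], [0.75, 0.25], [0.5, 0.5]]
sift_video = column_mean(sift_keyframes)

# Fuse the averaged SIFT output with a (made-up) global visual feature output.
winner, fused = fuse_features([sift_video, [0.25, 0.75]])
```

The probability values above are invented for illustration only.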



Domain Knowledge

Video distribution in the development set:

Autos: 8 videos

Technology: ~500 videos

Use this information in the final category prediction, treating it as the likelihood of the category distribution on blip.tv:

1. SVM scores for each video normalized to unit sum

2. Divide these probabilities by the square root of the number of videos in the development set for each category to include the a-priori knowledge of the class distribution

3. Finally, step one is repeated to obtain unit sum
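The three steps above can be sketched as follows. The code follows the wording of the slide literally (division by the square root of the development-set count); the function name `apply_prior` and the example numbers are hypothetical.

```python
import math


def apply_prior(scores, dev_counts):
    """Reweight per-category SVM scores with development-set counts.

    scores: non-negative SVM scores for one video, one per category.
    dev_counts: number of development-set videos per category.
    """
    # Step 1: normalize the scores to unit sum.
    total = sum(scores)
    probs = [s / total for s in scores]
    # Step 2: divide by the square root of the category size (as on the slide).
    weighted = [p / math.sqrt(n) for p, n in zip(probs, dev_counts)]
    # Step 3: normalize again to unit sum.
    total_weighted = sum(weighted)
    return [w / total_weighted for w in weighted]


# Two categories with equal scores but 4 vs. 16 development videos.
example = apply_prior([1.0, 1.0], [4, 16])
```

In this toy case the equal scores become roughly 0.667 and 0.333 after reweighting.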



Evaluation

Blip.tv data with ~ 9550 clips

Two configurations with/without prior domain knowledge

No prior   run1     run2     run3
           Visual   SIFT     Visual + SIFT
MAP        0.3008   0.2329   0.3499

Prior      run4     run5     run6
           Visual   SIFT     Visual + SIFT
MAP        0.3461   0.1448   0.3581
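For readers unfamiliar with the metric, mean average precision (MAP) can be sketched as below. This is the textbook definition, assuming every relevant video appears somewhere in each ranked list; the official scoring used for the task may differ in details. The function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one category.

    ranked_relevance: booleans in ranking order, True if the video at that
    rank belongs to the category.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_category_rankings):
    """Mean of the per-category average precision values."""
    values = [average_precision(r) for r in per_category_rankings]
    return sum(values) / len(values)
```

For example, a ranking with relevant videos at ranks 1 and 3 gives an average precision of (1/1 + 2/3) / 2.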



Evaluation – Run 6



Evaluation

Run 6 (MAP):

Top 4 categories:              Worst 4 categories:
autos and vehicles (0.812)     citizen journalism (0.158)
health (0.668)                 documentary (0.119)
movies and television (0.602)  videoblogging (0.100)
religion (0.578)               travel (0.010)



Conclusions & Future Work

Conclusions

Visual-based classification shows limitations for category tagging

Few categories with satisfactory results

SIFT: only slight improvement of overall results

Prior domain knowledge improves results overall

Future Work

Temporal features

Mid-level semantics

Action Detection, Audio segmentation

ASR & Metadata integration

Individual classification approach & features for each genre



Thank you

