KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
Institute for Anthropomatics
www.kit.edu
KIT at MediaEval 2012 - Content-based Genre Classification with Visual Cues
Tomas Semela, Makarand Tapaswi
MediaEval 2012 Workshop
Motivation
Rapid growth of digital multimedia data in the broadcast and web video domain
Need for efficient automated indexing and content search
Challenges
Broadcast TV domain
Channel archives
Digital distribution
Web offerings
Arrangement in genres:
Highly characteristic
Low variance
Clear boundaries
Web video domain
Video portals like YouTube (user content)
Arrangement in categories:
Resemblance to topics (Autos – Animals – Travel)
Variation in production values and style
Not limited to the characteristics of a single genre
Related work
System from University of Torino, Italy
Extract video features from aural, visual, cognitive and structural cues
Neural network for classification
M. Montagnuolo, A. Messina, “Parallel Neural Networks for Multimodal Video Genre Classification”, Multimedia Tools and Appl., 41(1):125–159, 2009.
KIT System
Visual feature extraction from keyframes
SVM classification system
Fusion of results with majority voting
Low-level visual features
Color
Color moments
HSV histogram
Color auto correlogram
Texture
Wavelet texture
Edge histogram
Co-occurrence texture
H. K. Ekenel, T. Semela, and R. Stiefelhagen, “Content-based video genre classification using multiple cues”, AIEMPro'10, pages 21-26, 2010.
Global features for each video
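Two of the color features named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the histogram bin counts, the cube-root form of the third moment, and the averaging of per-keyframe vectors into one vector per video are all assumptions, and the function names are hypothetical.

```python
import colorsys
import math


def color_moments(pixels):
    """Mean, standard deviation and skewness per channel.

    pixels: list of (r, g, b) tuples with values in [0, 1].
    Returns a 9-dimensional list (3 moments x 3 channels).
    """
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # Signed cube root of the third central moment (assumed form).
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feats.extend([mean, math.sqrt(var), skew])
    return feats


def hsv_histogram(pixels, bins=(8, 3, 3)):
    """Normalized joint HSV histogram; the bin counts are an assumption."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    return [x / len(pixels) for x in hist]


def global_feature(keyframes, extractor):
    """Average a per-keyframe feature over a video (assumed aggregation)."""
    per_frame = [extractor(frame) for frame in keyframes]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]
```

For example, `global_feature(keyframes, color_moments)` yields one 9-dimensional vector per video, where `keyframes` is a list of pixel lists.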
SIFT – For each keyframe
Interest point detection
Dense sampling
Spatial pyramid
1x1 – 2x2 – 1x3
SIFT descriptors
SIFT
rgbSIFT
opponentSIFT
Bag-of-visual-words
Codebook (500-dim.)
Codeword histograms
K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek, “Empowering Visual Categorization with the GPU”, IEEE Transactions on Multimedia, 13(1):60-70, 2011.
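The bag-of-visual-words step can be sketched as follows, assuming the SIFT descriptors and a trained codebook are already available. The function names are hypothetical, the nearest-codeword (hard) assignment is an assumption, and reading "1x3" as one column by three rows is also an assumption.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def codeword_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments (all zeros if empty)."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def pyramid_histogram(points, descriptors, codebook, width, height,
                      grids=((1, 1), (2, 2), (1, 3))):
    """Concatenate one codeword histogram per spatial-pyramid cell.

    grids holds (columns, rows) pairs; (1, 3) as three horizontal
    bands is an assumed reading of "1x3".
    """
    feature = []
    for cols, rows in grids:
        cells = [[] for _ in range(cols * rows)]
        for (x, y), desc in zip(points, descriptors):
            cx = min(int(x * cols / width), cols - 1)
            cy = min(int(y * rows / height), rows - 1)
            cells[cy * cols + cx].append(desc)
        for cell in cells:
            feature.extend(codeword_histogram(cell, codebook))
    return feature
```

With a 500-word codebook and the three grids above, this layout gives 8 cells and therefore a 4000-dimensional vector per keyframe and descriptor type.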
Classification
Training of one support vector machine (SVM) for each genre and each feature
Binary classification (one vs. all)
RBF kernel
Cross-validation
Decision-level fusion
Majority voting (probability output)
SIFT: keyframes classified individually, output averaged over video
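The fusion stage can be sketched as below, assuming the per-genre SVM probability outputs have already been computed. Averaging SIFT keyframe scores over the video follows the slide; averaging across features is only one possible reading of "majority voting (probability output)" and is an assumption, as are the function names.

```python
def mean_scores(score_dicts):
    """Average a list of {genre: probability} dicts genre by genre."""
    genres = score_dicts[0].keys()
    n = len(score_dicts)
    return {g: sum(s[g] for s in score_dicts) / n for g in genres}


def fuse_video(global_feature_scores, sift_keyframe_scores):
    """Combine per-feature scores for one video.

    global_feature_scores: list of {genre: prob}, one per global feature.
    sift_keyframe_scores: list of {genre: prob}, one per keyframe.
    """
    # SIFT: keyframes are classified individually, then averaged per video.
    sift_video_scores = mean_scores(sift_keyframe_scores)
    # Assumed fusion rule: mean of the probability outputs of all features.
    return mean_scores(global_feature_scores + [sift_video_scores])


def predict_genre(fused_scores):
    """Pick the genre with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)
```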
Domain Knowledge
Video distribution in the development set:
Autos 8 videos
Technology ~ 500 videos
Use this distribution in the final category prediction as an estimate of how likely each category is on blip.tv:
1. Normalize the SVM scores for each video to unit sum
2. Divide these probabilities by the square root of the number of development-set videos in each category, to include the a priori knowledge of the class distribution
3. Repeat step 1 so that the scores again sum to one
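The three steps can be written out directly. The sketch below follows the steps as stated on the slide; the function name and the dictionary-based data layout are assumptions.

```python
import math


def apply_prior(svm_scores, dev_counts):
    """Reweight per-category scores of one video with development-set counts.

    svm_scores: {category: non-negative SVM score} for a single video.
    dev_counts: {category: number of development-set videos}.
    """
    # Step 1: normalize the scores to unit sum.
    total = sum(svm_scores.values())
    probs = {c: s / total for c, s in svm_scores.items()}
    # Step 2: divide by the square root of the category's video count.
    weighted = {c: p / math.sqrt(dev_counts[c]) for c, p in probs.items()}
    # Step 3: normalize again so the result sums to one.
    norm = sum(weighted.values())
    return {c: w / norm for c, w in weighted.items()}
```

With the counts from the slide, the call would look like `apply_prior(scores, {"autos": 8, "technology": 500, ...})`, with one entry per category.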
Evaluation
Blip.tv data with ~ 9550 clips
Two configurations with/without prior domain knowledge
Configuration   Run    Features        MAP
No prior        run1   Visual          0.3008
No prior        run2   SIFT            0.2329
No prior        run3   Visual + SIFT   0.3499
Prior           run4   Visual          0.3461
Prior           run5   SIFT            0.1448
Prior           run6   Visual + SIFT   0.3581
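For reference, mean average precision (MAP) in its common textbook form can be sketched as follows. The exact evaluation tool used for the task is not stated here, so this is an assumption about the metric's form; it also assumes that each ranked list contains every relevant video for its category.

```python
def average_precision(ranked_relevance):
    """Average precision for one category.

    ranked_relevance: list of booleans, best-ranked video first,
    True where the video belongs to the category.
    """
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-category average precision values.

    rankings: {category: ranked_relevance list}.
    """
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)
```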
Evaluation – Run 6
Evaluation
Run6 (MAP):
Top 4 categories:
autos and vehicles (0.812)
health (0.668)
movies and television (0.602)
religion (0.578)
Worst 4 categories:
citizen journalism (0.158)
documentary (0.119)
videoblogging (0.100)
travel (0.010)
Conclusions & Future Work
Conclusions
Visual-based classification shows limitations for category tagging
Few categories with satisfactory results
SIFT: only slight improvement of overall results
Prior domain knowledge improves the Visual and Visual + SIFT runs, but lowers MAP for SIFT alone
Future Work
Temporal features
Mid-level semantics
Action Detection, Audio segmentation
ASR & Metadata integration
Individual classification approach & features for each genre
Thank you