1
Business Unit or Product Name
© 2003 IBM Corporation
IBM Research
© 2004 IBM Corporation
Large-Scale Video Semantic
Filtering
November 16, 2005
Ching-Yung Lin
Exploratory Stream Processing Systems,
IBM T. J. Watson Research Center
Business Unit or Product Name
© 2003 IBM Corporation2
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Outline -- Large-Scale Video Semantic Filtering
iphttp
ntp
udp
tcpftp
rtp
rtsp
video
audio
• Introduction and Motivation
• Prior Arts on Video Semantic Classification
• Compressed-Domain Feature Extraction
• Complexity-Accuracy Trade-Off
• Speed Up SVM Classification
• Speech-Text Complexity-Accuracy Analysis
• Conclusions
• Prospective Projects
2
Video Semantic Understanding and Filtering
© 2004 IBM CorporationIBMPage 3
System Overview
Create breakthrough technology to enable
…large-scale production, analysis and management of
…information and knowledge from
…thousands of disparate input sources
…using an autonomic system that
…continuously adapts processing to satisfy demands of
…changing environment, content, and requests.
Video Semantic Understanding and Filtering
© 2004 IBM CorporationIBMPage 4
� 10Gbit/s Continuous Feed Coming into System
� Types of Data
• Speech, text, moving images, still images, coded application
data, machine-to-machine binary communication
� System Mechanisms
• Telephony: 9.6Gbit/sec (including VoIP)
• Internet
� Email: 250Mbit/sec (about 500 pieces per second)
� Dynamic web pages: 50Mbit/sec
� Instant Messaging: 200Kbit/sec
� Static web pages: 100Kbit/sec
� Transactional data: TBD
• TV: 40Mb/sec (equivalent to about 10 stations)
• Radio: 2Mb/sec (equivalent to about 20 stations)
What is Large-scale?
3
Business Unit or Product Name
© 2003 IBM Corporation5
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Example IP Packet Stream Instantiation
ip http
ntp
udp
tcp ftp
rtp
rtsp
video
audio
200-500MB/s ~100MB/sper PE
rates
10 MB/s
InputsDataflow
Graph
Business Unit or Product Name
© 2003 IBM Corporation6
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Semantic MM Routing and Filtering
200-500MB/s ~100MB/sper PE
rates
10 MB/s
InputsDataflow
Graph
ip http
ntp
udp
tcp ftp
rtp
rtsp
sessvideo
sessaudio Interest
Routing
keywords id
Packet content
analysis
Advanced content
analysis
Interest
Filtering
Interested
MM streams
4
Business Unit or Product Name
© 2003 IBM Corporation7
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Objective
� Configurable Parameters of Processing Elements to maximize relevant information:
– Y’’(X | q, R) > Y’(X | q, R),
with resource constraint.
� Required resource-efficient algorithms for:
– Classification, routing and filtering of signal-oriented data: (audio, video and, possibly, sensor data)
X
R
Y(X|q)Y’’(X|q,R)X’
� Input data X – Queries q – Resource R
– Y(X | q): Relevant information
– Y’(X | q, R) � Y(X | q): Achievable subset given R
Business Unit or Product Name
© 2003 IBM Corporation8
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Video Concept Classification/Routing� Objective:
– Build Moderate Amount of Real-Time Concept Classifiers, e.g.,
Human, Outdoors, Face, Indoors, etc.
� Test-bed Corpus:
– NIST TRECVID 2003 corpus: 128 hours of MPEG-1 videos, 62
hours of them were manually annotated by 23 worldwide groups
using IBM VideoAnnEx Collaborative Annotation System.
– 38 hours of video for training; 24 hours (partitioned to 6-, 6-, 12-
hour sets) for internal evaluation; 65 hours for NIST benchmarking.
Classification
VideoLAN
Networked
Player
RTP Streamer
Outdoors
IP Port 1234
MPEG-1 Video
Over RTP/UDP
MPEG-1 files
VideoLAN
Networked
Player
Outer Space
IndoorsIP Port 2346
Intranet
Office Desktop: 9.2.10.85 Laptop T40: 9.2.89.55
Compressed-
Domain
Feature
Extraction
Concept
Model
Office Desktop: 9.2.64.86
5
Business Unit or Product Name
© 2003 IBM Corporation9
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Shot Segmentation
Semi-ManualAnnotation
Feature Extraction
BuildingClassifier Detectors
Region Segmentation
Training Video Corpus
Lexicon
• Regions– Object (motion, Camera registration)
– Background (5 lg regions / shot)
Features• Color:
– Color histograms (72 dim, 512 dim), Auto-Correlograms (72 dim)
• Structure & Shape:
– Edge orientation histogram (32 dim), Dudani Moment Invariants (6 dim), Aspect ratio of bounding box (1dim)
• Texture:
– Co-occurrence texture (48 dim), Coarseness (1 dim), Contrast (1 dim), Directionality (1 dim), Wavelet (12 dim)
• Motion:
– Motion vector histogram (6 dim)
Prior Art: Supervised Learning for Building Generic Concept Detectors
Supervised learning: Classification and Fusion:• Support Vector Machines (SVM)
• Ensemble Fusion
• Other Fusions (Hierarchical, SVM, Multinet, etc.)
Business Unit or Product Name
© 2003 IBM Corporation10
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
IBM Research Video Concept Detection Systems (Aug. 2003)
EF
EF2
Post-
processing
BOU BOF BOBO
Uni-models Multi-Modality
Fusion
Multi-
Model Fusion
MLP
DMF17
MN
ONT
DMF64
V1
V2
V1
V2
Best Uni-model
Ensemble Run
Best Multi-modal
Ensemble Run
Best-of-the-Best
Ensemble Run
Annotation and
Data
Preparation
Videos (128 hours
MPEG-1 )
Collabor
ative
Annotati
on
Fe
atu
re
Ex
trac
tion
CH
WT
EH
MI
CT
MV
TAM
CLG
AUD
CC
Re
gio
n
Ex
trac
tion
Filtering
VW
MLP
: training only: training and testing
Feature Extraction
SD/A
By IBM TRECVID
2003 Team
6
Business Unit or Product Name
© 2003 IBM Corporation11
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
MPEG-7 XML
Shot Segmentation
RegionSegmentation
Feature Extraction
Classifi-cation
MPEG-1 sequence
• Key-Frame Extraction• Story Boards• Video Editing
Detectors
• Content-Based Retrieval• Relevance Feedback
• Model-based applications
• Object-based compression, manipulation
XML Rendering
• detectors
“Text-overlay” “Outdoors” “People”
…
Prior Art: Classification Steps
Business Unit or Product Name
© 2003 IBM Corporation12
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Novel Visual Semantic Concept Filters
� 100 visual filters were built (v.s.
64 in 2003).
� Concepts are arranged
hierarchically.
� Based on novel compressed-
domain sliced features.
� Neither segmentation nor fusion
is used.
7
13
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
An Example of Video Sensor Network System
Face
Outdoors
Indoors
PE1: 9.2.63.66: 1220
PE2: 9.2.63.67
PE3: 9.2.63.68
Female
Male
Airplane
Chair
Clock
PE4: 9.2.63.66:1235
PE5: 9.2.63.66: 1240
PE6: 9.2.63.66
PE7: 9.2.63.66
PE100: 9.2.63.66
(Server) Concept Detection PEs
60 Mbps
TV broadcast, VCR,
DVD discs, Video
File Database,
Webcam
CDS
Features
(Client) Feature Extraction PEs
& Display Modules
Meta-
data
600
bps
Control
Modules
Resource
Constraints
User
Interests
Display and
Information
Aggregation
Modules
1.5
Mbps
MPEG-
1/2GOP
Extraction
Event/Shot
Extraction 2.8
Kbps
Feature
Extraction320
Kbps
22.4
Kbps
Encod-
ing
Sensor 1
Sensor 2
Sensor NSensor 3
14
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Main Process
�This feature set is extracted as follows:1. Parse the MPEG-1/2 packets and get the beginning of an I-frame or the closest I-frame of a pre-specified shot keyframe.2. Using the VLC maps to map variable-length codes to the DCT-domain coefficients.3. Within an MPEG slice or a union of slices, truncate selected DCT coefficients and calculate the histogram of these DCT coefficients.4. Form a feature vector of the frame based on the histogram coefficients with multiple slices.
�In the above procedure, we can see that no multiplication operation is required to get these feature vectors. Only addition is needed for getting the histogram.
�In a typical situation, we partition a frame into three slices, and use 3 DCT coefficient histogram (1 DC and 2 lowest frequency AC coefficients) on all YCbCr domains. This forms a 576-dimensional feature vector.
�After feature extraction, SVM is used to train models and classification.
8
Business Unit or Product Name
© 2003 IBM Corporation15
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Three Measurements of Multimedia Semantic Concept Detectors
Concept Model
Accuracy
Concept Model EfficiencyConcept Coverage
Business Unit or Product Name
© 2003 IBM Corporation16
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Progress on Novel Concept Filters for Distillery MM Semantic Routing
Concept Model
Accuracy
Concept Model EfficiencyConcept Coverage
� Baseline: (state-of-the-art) IBM TREC ’03 Visual Detectors
56% increase
21% increase
More than 10 times of
Increase on
classification
9
Business Unit or Product Name
© 2003 IBM Corporation17
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Performance Comparison of the Compressed-Domain Detectors and IBM 2003 Visual Detectors
0.0000
0.1000
0.2000
0.3000
0.4000
0.5000
0.6000
0.7000
0.8000
0.9000
Hum
anFa
ce
Out
doors
Stud
io_S
ettin
g
Indo
ors
Male_
Face
Wea
ther_N
ews
Spor
t_Eve
ntSk
y
Female_
Face
Natur
e_Veg
etation
Cro
wd
Mou
ntain
Peop
le_E
vent
Water_B
ody
Peop
le
Veh
icle (T
rans
porta
tion)
Person
Roc
k
Building
Airp
lane
Graph
ics
Snow
Tree
Land
Bill
_Clin
ton
Bea
ch Car
Roa
dRiot
Flow
er
Des
ert
Cloud
City
scap
e
Ani
mal
Truck
Cartoon
Podium
Smok
e
Phys
ical_V
iolenc
e
Train
Fire
IBM 2003 Visual Models
New CDS Models
� Improved Mean Average Precision: 21.48%
� Improved Efficiency:
� >2M multiplication operations needed for generating features for IBM-03 classification per key frame.
� no multiplication operations needed for new CDS models, only 6K addition operations.
Business Unit or Product Name
© 2003 IBM Corporation18
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Demo -- Novel Semantic Concept Filters
� http://www.research.ibm.com/VideoDIG
� E.g.:
10
Business Unit or Product Name
© 2003 IBM Corporation19
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Outline -- Large-Scale Video Semantic Filtering
iphttp
ntp
udp
tcpftp
rtp
rtsp
video
audio
• Introduction and Motivation
• Prior Arts on Video Semantic Classification
• Compressed-Domain Feature Extraction
• Complexity-Accuracy Trade-Off
• Speed Up SVM Classification
• Speech-Text Complexity-Accuracy Analysis
• Conclusions
• Prospective Projects
Business Unit or Product Name
© 2003 IBM Corporation20
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Complexity Reduction Introduction
� Objective: Real-time classification of instances using Support Vector Machines (SVMs)
� Computationally efficient and reasonably accurate solutions
� Techniques capable of adjusting tradeoff between accuracy and speed based on available computational resources
SVM
Accu
racy
Resource
Accu
racy
Resource
Accu
racy
Resource
Objective Achieved
11
Business Unit or Product Name
© 2003 IBM Corporation21
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
SVMs
� Developed over the past decade
� Strong theoretical background
– Attempt to achieve Structural Risk Minimization
– Minimize upper bound on generalization error instead of
MSE
� Impressive empirical results, handles noisy data
� Effective over diverse domains
– Text (character/word based), Image, Surveillance
(Video/Network traffic), Bio (Gene/Protein/Disease) etc.
Business Unit or Product Name
© 2003 IBM Corporation22
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
SVM formulation
� Given :
– Training instances with labels
� Objective :
– Find maximum margin hyperplane separating positive
and negative training instances
{xi} yi
SVM
12
Business Unit or Product Name
© 2003 IBM Corporation23
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Salient Features
� Formulation:
� Optimization problem with quadratic objective function, linear
constraints (QPP)
� Hyperplane separator in projected space
� Implicit projection of instances
minw,b1
2‖ w ‖2
s.t. yi(w · φ(xi) + b) ≥ 1
Business Unit or Product Name
© 2003 IBM Corporation24
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Kernel Projection
� Implicit projection to feature space
� Measures similarity between instances in
projected space
� Examples:
– Gaussian :
– Laplacian :
– Polynomial : (1 + xi · xj)p
K(xi, xj) =< φ(xi), φ(xj) >
Exp(-γ ‖ xi−xj ‖2)Exp(-γ ‖ xi − xj ‖)
13
Business Unit or Product Name
© 2003 IBM Corporation25
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Decision
� Score of unseen instance :
� In terms of Lagrangian multipliers
� Computational Cost : O( )
– : Number of support vectors
– : Dimensionality of each data instance
uj w ·φ(uj)
∑i αiyik(xi, uj)
nsvd
nsv
d
Business Unit or Product Name
© 2003 IBM Corporation26
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Complexity Analysis on SVM classifiers
( , )ix x
rik x x e
−−
=
1
( ) ( , )S
i i
i
f x a k x x b=
= ⋅ +∑
� Complexity c: operation (multiplication, addition) required for classification
� SVM Classifier:
where S is the number of support vectors, k(.,.) is a kernel function. E.g.,
c S D∝ ⋅ where D is the dimensionality of the feature vector
support
vectors
� Support Vector Machine
– Largest margin hyperplane in the projected
feature space
– With good kernel choices, all operations can
be done in low-dimensional input feature
space
14
Business Unit or Product Name
© 2003 IBM Corporation27
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Problems
� Number of support vectors grows quasi-linearly
with size of training set [Tipping 2000]
� Inner product with each support vector of
dimensionality expensive
– Example TREC2003
• Human : 19745 support vectors
• Face : 18090
� High data rates(10Gbits/sec) means large number
of abandoned data
d
Business Unit or Product Name
© 2003 IBM Corporation28
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Example
� Processing Power 1 Ghz
� 10000 support vectors
� 1000 / 2 features per instance
� Order of at least 10^7 operations required per
stream per sec
� Translates to less than 100 instances evaluated
per sec with only one classifier
15
Business Unit or Product Name
© 2003 IBM Corporation29
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Naïve Approach I – Feature Dimension Reduction
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
00.20.40.60.81
Complexity -- Feature Dimension Ratio
Accura
cy -- A
vera
ge P
recis
ion
0.01230.037037037111
0.070.074074074211
0.01730.074074074121
0.03180.074074074112
0.070.111111111311
0.03140.111111111131
0.32190.111111111113
0.10650.148148148221
0.34570.148148148212
0.06990.148148148122
0.10650.222222222321
0.16840.222222222231
0.34570.222222222312
0.12410.222222222132
0.65810.222222222213
0.4270.222222222123
0.52350.296296296222
0.16840.333333333331
0.65810.333333333313
0.46850.333333333133
0.52350.444444444322
0.58220.444444444232
0.77570.444444444223
0.58220.666666667332
0.77570.666666667323
0.78610.666666667233
0.78611333
AP
Feature Dimension RatioTextureColorSlice
� Experimental Results for Weather_News
Detector
� Model Selection based on the Model
Validation Set
� E.g., for Feature Dimension Ratio 0.22, (the
best selection of features are: 3 slices, 1
color, 2 texture selections), the accuracy is
decreased by 17%.
Business Unit or Product Name
© 2003 IBM Corporation30
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Naïve Approach II – Reduction on the Number of Support Vectors
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
00.20.40.60.81
Complexity -- Number of Support Vectors
Accu
racy - A
vera
ge
Pre
cisi
on
� Proposed Novel Reduction Methods:
– Ranked Weighting
– P/N Cost Reduction
– Random Selection
– Support Vector Clustering and Centralization
� Experimental Results on Weather_News Detectors show that complexity can be
at 50% for the cost of 14% decrease on accuracy
16
Business Unit or Product Name
© 2003 IBM Corporation31
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Approach Overview
� Reduced set methods
� Focus on support vectors and weights already obtained (Same SVM formulation)
� Accommodate effect due to all support vectors in approximation
� Amenable to
– multiple kernels
– , norms
� Approximate support vector scoring with consideration for overall error
L1 L2
Business Unit or Product Name
© 2003 IBM Corporation32
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Approaches
� Weighted Clustering Approach:
– Approximate effect of support vector(s) by (possibly
hypothetical) instance with weight adjustment
� Error Radius Approach:
– Determine error radius and approximate effect outside
same
17
Business Unit or Product Name
© 2003 IBM Corporation33
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Weighted Clustering Approach
� Basic steps
– Cluster support vectors
– Use cluster center as representative for all support
vectors in cluster
– Determine scalar weight associated with each cluster
center
– Use only cluster centers to score new instances
Business Unit or Product Name
© 2003 IBM Corporation34
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Weighted Clustering Approach (contd.)
� Clustering
– Cluster positive and negative support vectors
separately
– Ratio of +ve clusters to -ve clusters same as ratio of
+ve support vectors to -ve support vectors
– Attempts to avoid bloated clusters because of
imbalance
– Manual decision on number of clusters
18
Business Unit or Product Name
© 2003 IBM Corporation35
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Weighted Clustering Approach (contd.)
� Cluster center weight
– Given :
• Support vector in cluster with cluster center
• Weight associated with
– Squared distance between and
– Unknown instance at squared distance
• from
• from
xcxiαi xi
∆i xi xc
xcxi
d
d− δi
Business Unit or Product Name
© 2003 IBM Corporation36
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Cluster center weight (contd.)
� Difference in scores
� Optimal
� Problem:
– Determined specific to instance
– Impossible to obtain speedup if computed on a per-instance basis
� Solution:
– Find minimizing error on average
= αiexp(δip)γi
γi exp(−dp)− αiexp(−
(d−δi)p)
γi
γi
γi
19
Business Unit or Product Name
© 2003 IBM Corporation37
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Cluster center weight (contd.)
� Choose minimizing square of difference in scores
over all and d
� Sub-cases :
d < ∆id ≥ ∆i
c
��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
∆ i
∆i
xc
������������������������������������������������������������������������������������������
������������������������������������������������������������������������������������������
γ iδi $
d
$
Business Unit or Product Name
© 2003 IBM Corporation38
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Cluster center weight (contd.)
� Case 1: Distance to unseen instance
� Optimal
� Minima verification:
� Optimal obtained for
2∆ipexp(−2∆i
p) ≥ 0
i
d d − δ i
x
∆id
x icza
∆ i
x xc i za
d − δ
d ≥ ∆i
d ≥ ∆i
γiupper = αipexp(
∆ip)−exp(−
∆ip)
2∆i
γiupper
∫∞
∆i
∫∆i
−∆i(γiexp(−
dp)− αiexp(−
(d−δi)p))2dδidd.
20
Business Unit or Product Name
© 2003 IBM Corporation39
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Cluster center weight (contd.)
i
d d − δ i
x
∆id
x icza
∆ i
x xc i za
d − δ
� Case 2:
� Optimal
� Minima :
� Optimal obtained for
exp( 2∆ip ) > (1 +
2∆ip )
d < ∆i
d < ∆i
∫ ∆i
0
∫ ∆i
∆i−2d(γiexp(−
dp) − αiexp(−
(d+δi)p
))2dδidd
γilower =αiexp(−
∆ip)[2∆ip−1+exp(−
2∆ip)]
1−exp(−2∆ip)−
2∆ipexp(−
2∆ip)
γilower
Business Unit or Product Name
© 2003 IBM Corporation40
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Using the weights
� For every support vector in cluster
– Distance known
– Two weights computed
� Cumulative effect of all support vectors in clusters additive
– because of various support vectors added up at center
to simulate effect of all support vectors
� sorted, weight arrays rearranged
∆i
∆i
∆i
21
Business Unit or Product Name
© 2003 IBM Corporation41
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Using the weights (contd.)
� Given distance of unseen instance
– Binary search on array
– For support vectors with less than , used, for others
� Sum associated with cluster center not computed per-instance
� Pre-computed at every position of sorted array
∆id ∆i
d
������
������
������
������
∆ι
γilower
γi upper Sum upper
Sum lower
d
γ iupper γ ilower
42
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Experiments
� Datasets
� TREC video datasets (2003 and 2005)
• 576 features per instance
• > 20000 test instances overall
� MNist handwritten digit dataset (RBF kernel)
• 576 features
• 60000 training instances, 10000 test instances
� Performance metrics
� Speedup achieved over evaluation with all support vectors
� Average precision achieved
RECALL
1.0
1.00.5
0.5
PR
EC
ISIO
N
22
43
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Results (Mnist 0-4)
0.96
0.965
0.97
0.975
0.98
0.985
0.99
0.995
1
1.005
0 50 100 150 200 250
Speedup Ratio
Average Precision 0
1
2
3
4
44
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Results (Mnist 5-9)
0.86
0.88
0.9
0.92
0.94
0.96
0.98
1
1.02
1 10 100 1000
Speedup Ratio
Average Precision
5
6
7
8
9
23
45
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Results (TREC 2003)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Hum
an
Face
Outd
oors
Studio
-Setti
ng
Sport-Event
Natu
re-V
egetatio
n
Cro
wd
Mounta
in
People-E
vent
Concept
Average Precision
AP_fast
AP_original
1
10
100
1000
10000
Hum
an
Outd
oors
Sport-
Event
Cro
wd
People
-Eve
nt
Concept
Speedup
46
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
TREC 2005
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Bus
Pris
oner
Truck
Walk
ing_R
unning
Anim
al
Outd
oor
Court
Pers
onFace
Concept
Average Precision
AP_original
AP_fast
24
47
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Summary of Complexity Reduction
� Techniques presented demonstrate reasonable performance in terms of both speedup and average precision over multiple concepts in datasets
� Speedups� MNist : All concepts at least 50 times faster with AP
within 0.04 of original
� TREC 2003: Eight out of nine concepts speedup greater than 80 times with AP within 0.05 of original
� TREC 2005: APs in some cases along with speedup respectable
� APs of most concepts close to original APs
48
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
VideoDIG Demo– configurable and scalable video semantic concept detection/filtering
Face
Outdoors
Indoors
PE1: 9.2.63.66: 1220
PE2: 9.2.63.67
PE3: 9.2.63.68
Female
Male
Airplane
Chair
Clock
PE4: 9.2.63.66:1235
PE5: 9.2.63.66: 1240
PE6: 9.2.63.66
PE7: 9.2.63.66
PE100: 9.2.63.66
(Server) Concept Detection PEs
60 Mbps
TV broadcast, VCR,
DVD discs, Video
File Database,
Webcam
1.5
Mbps
MPEG-
1/2GOP
Extraction
Shot
Segmen-
tation
CDS
Features
2.8
Kbps
(Client) Feature Extraction PEs
& Display Modules
Meta-
data
600
bps
Control
Modules
Resource
Constraints
User
Interests
Feature
Extraction320
Kbps
22.4
Kbps
Display
Controller
On/Off
Encod-
ing
25
49
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Outline -- Large-Scale Video Semantic Filtering
iphttp
ntp
udp
tcpftp
rtp
rtsp
video
audio
• Introduction and Motivation
• Prior Arts on Video Semantic Classification
• Compressed-Domain Feature Extraction
• Complexity-Accuracy Trade-Off
• Speed Up SVM Classification
• Speech-Text Complexity-Accuracy Analysis
• Conclusion
• Prospective Projects
Business Unit or Product Name
© 2003 IBM Corporation50
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Other Issues: Speech and Text-based Topic Detection
Query ConceptWeather News
weather, weather condition, atmosphericcondition
=> cold weather => fair weather, sunshine
=> hot weather
Weather conditionAtmospheric condition
Cold weatherFair weather
Weather forecastWarm weather
Speech-basedRetrieval
Weather
atmospheric
cold weather
freeze
frost
sunshine
temper
scorcher
sultry
thaw
warm
downfall
hail
rain
building
edifice
walk-up
butchery
apart build
tenement
architecture
call center
sanctuary
Bathhouse
animal
beast
brute
critter
darter
peeper
creature
Fauna
microorganism
poikilotherm
airplane
aeroplane
plane
airline
airbus
twin-aisle
airplane
amphibian
biplane
passenger
fighter
aircraft
Weather newsBuildingAnimalAirplane
� Unsupervised learning from WordNet:
Example of Topic – Related Keywords
26
Business Unit or Product Name
© 2003 IBM Corporation51
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Concept Detection based on Automatic Speech Recognition
Concept Detection based on ASR
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Airpla
ne
Animal
Buildin
gFace
Madele
ine_Alb
right
Outd
oors
Na ture
_Vegetatio
nPeople
Phys ical_Vio
lence
Road
Sport_Event
Weath
er_N
ewsM
AP
Concept
Av
era
ge
Pre
cis
ion
Baseline (HJN)
Supervised
Revised Supervised
WordNet
weather, weather condition,
atmospheric condition, cold weather,
freeze, frost, fair weather, sunshine,
temperateness, hot weather,
scorcher, sultriness, thaw, thawing,
warming, precipitation, downfall, fine
spray, hail, rain, rainfall, monsoon,
rainstorm, line storm, equinoctial
storm, thundershower, downpour,
cloudburst, …..
rain 334, temperatur 327, continu 287,
weather 284, forecast 269, coast 254,
shower 254, storm 245, california 242,
northern 226, thunderstorm 225, snow
224, west 208, move 206, lake 195,
southeast 187, northeast 175,
northwest 172, heavi 170, eastern 161,
plain 158, southern 157, expect 153,
rocki 153, high 151, warm 147, gulf 143,
headlin 142, look 142, great 140,
caribbean 140, todai 139, seventi 137,
east 135, eighti 134, thirti 133, hawaii
129, meteorologist 125, midwest 125
Supervised:
WordNet:
Business Unit or Product Name
© 2003 IBM Corporation52
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Complexity-Accuracy on Concept Detection based on Automatic Speech Recognition
Tradeoff of Complexity and Accuracy
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99
Number of Expanded Keywords
Av
era
ge
Pre
cis
ion
Face
Madeleine_Albright
Outdoors
Nature_Vegetation
People
Physical_Violence
Road
Sport_Event
Weather_News
27
Business Unit or Product Name
© 2003 IBM Corporation53
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Conclusion
� Novel Challenge:
� Current Status:
– Classifying 40 video streams of 100 concepts in real-time by using a Pentium III PC.
– SVM classification can be speed up in a factor of 10s to 1000s with average precision of 90% or above of the original.
� Future Works:
– Learning and Classifying in noisy environment
– Frameworks for other types of media (e.g., speech, email, web access, ….) and other types of machine learning/pattern recognition framework (e.g., dynamic Bayesian Net..)
Business Unit or Product Name
© 2003 IBM Corporation54
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Revisited Video Semantic Filtering Framework
Face
Outdoors
Indoors
PE1: 9.2.63.66: 1220
PE2: 9.2.63.67
PE3: 9.2.63.68
Female
Male
Airplane
Chair
Clock
PE4: 9.2.63.66:1235
PE5: 9.2.63.66: 1240
PE6: 9.2.63.66
PE7: 9.2.63.66
PE100: 9.2.63.66
(Server) Concept Detection
Processing Elements
Camera
Input
60 Mbps
CDS
Features
(Distributed Smart Sensors) Block diagram of the
smart sensors
Meta-
data
600
bps
Control
Modules
Resource
Constraints
User
Interests
Display and
Information
Aggregation
Modules
1.5
Mbps
MPEG-
1/2GOP
Extraction
Event
Extraction 2.8
Kbps
Feature
Extraction320
Kbps
22.4
Kbps
Encod-
ing
Sensor 1
Sensor 2
Sensor NSensor 3
28
Business Unit or Product Name
© 2003 IBM Corporation55
IBM Research
Multimedia Semantic Routing © 2004 IBM Corporation
Projects Available (Spring 2006)
� Developing Distributed Smart Video Cameras
(Phase II):
� Large-Scale Data Mining, Community Modeling,
Human Behavior Modeling
� Also Course: EE6886 Multimedia Security Systems
56
IBM Research
Ching-Yung Lin, IBM T. J. Watson Research Center © 2004 IBM Corporation
Acknowledgements
� Navneet Panda (UCSB), Xiaodan Song (UW), Victor Sutan (CU), Jason Cardillo (CU)
� Olivier Verscheure (IBM), Lisa Amini (IBM)
Questions?
Contact: [email protected]://www.research.ibm.com/people/c/cylin