

Business Unit or Product Name

© 2003 IBM Corporation

IBM Research

© 2004 IBM Corporation

Large-Scale Video Semantic Filtering

November 16, 2005

Ching-Yung Lin

Exploratory Stream Processing Systems,

IBM T. J. Watson Research Center


Multimedia Semantic Routing © 2004 IBM Corporation

Outline -- Large-Scale Video Semantic Filtering


• Introduction and Motivation

• Prior Arts on Video Semantic Classification

• Compressed-Domain Feature Extraction

• Complexity-Accuracy Trade-Off

• Speed Up SVM Classification

• Speech-Text Complexity-Accuracy Analysis

• Conclusions

• Prospective Projects


Video Semantic Understanding and Filtering


System Overview

Create breakthrough technology to enable large-scale production, analysis, and management of information and knowledge from thousands of disparate input sources, using an autonomic system that continuously adapts processing to satisfy the demands of a changing environment, content, and requests.


What is Large-scale?

• 10 Gbit/s continuous feed coming into the system
• Types of data
  – Speech, text, moving images, still images, coded application data, machine-to-machine binary communication
• System mechanisms
  – Telephony: 9.6 Gbit/s (including VoIP)
  – Internet
    • Email: 250 Mbit/s (about 500 pieces per second)
    • Dynamic web pages: 50 Mbit/s
    • Instant messaging: 200 Kbit/s
    • Static web pages: 100 Kbit/s
    • Transactional data: TBD
  – TV: 40 Mbit/s (equivalent to about 10 stations)
  – Radio: 2 Mbit/s (equivalent to about 20 stations)
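As a quick sanity check on the "What is Large-scale?" figures, the per-source rates can be summed to confirm the roughly 10 Gbit/s aggregate. The sketch below is illustrative only: the variable names are invented here, transactional data is left out because the slide lists it as TBD, and the per-item averages are plain divisions of the stated totals rather than measured values.

```python
# Rates from the slide, in bits per second. Transactional data is "TBD" and omitted.
RATES_BPS = {
    "telephony": 9_600_000_000,
    "email": 250_000_000,
    "dynamic_web": 50_000_000,
    "instant_messaging": 200_000,
    "static_web": 100_000,
    "tv": 40_000_000,
    "radio": 2_000_000,
}

total_bps = sum(RATES_BPS.values())

# Averages implied by the stated totals (simple division, not measurements).
email_bits_per_piece = RATES_BPS["email"] // 500   # about 500 pieces per second
tv_bps_per_station = RATES_BPS["tv"] // 10         # about 10 stations
radio_bps_per_station = RATES_BPS["radio"] // 20   # about 20 stations

print(f"aggregate feed: {total_bps / 1e9:.4f} Gbit/s")
print(f"average email size: {email_bits_per_piece / 8 / 1000:.1f} kB")
print(f"TV per station: {tv_bps_per_station / 1e6:.1f} Mbit/s")
print(f"radio per station: {radio_bps_per_station / 1e3:.0f} Kbit/s")
```

Telephony dominates the total; video and audio broadcast together account for well under 1% of the feed.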


Example IP Packet Stream Instantiation

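[Figure: dataflow graph over the input packet stream with nodes ip, udp, tcp, http, ntp, ftp, rtp, rtsp, video, and audio. Labels: Inputs, Dataflow Graph; per-PE rates of 200-500 MB/s, ~100 MB/s, and 10 MB/s.]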


Semantic MM Routing and Filtering

[Figure: the same dataflow graph (ip, udp, tcp, http, ntp, ftp, rtp, rtsp; per-PE rates of 200-500 MB/s, ~100 MB/s, and 10 MB/s), extended with video and audio session nodes, keyword identification, packet content analysis, advanced content analysis, interest routing, and interest filtering. Output: the MM streams of interest.]


Objective

• Configure the parameters of the processing elements to maximize relevant information:
  – Y''(X | q, R) > Y'(X | q, R), subject to the resource constraint.
• This requires resource-efficient algorithms for:
  – Classification, routing, and filtering of signal-oriented data (audio, video, and possibly sensor data)
• Input data X, queries q, resource R
  – Y(X | q): relevant information
  – Y'(X | q, R) ⊆ Y(X | q): achievable subset given R

[Figure: block diagram with input X, resource R, and outputs Y(X | q), Y''(X | q, R), and X'.]


Video Concept Classification/Routing

• Objective:
  – Build a moderate number of real-time concept classifiers, e.g., Human, Outdoors, Face, Indoors, etc.
• Test-bed corpus:
  – NIST TRECVID 2003 corpus: 128 hours of MPEG-1 video, of which 62 hours were manually annotated by 23 groups worldwide using the IBM VideoAnnEx Collaborative Annotation System.
  – 38 hours of video for training; 24 hours (partitioned into 6-, 6-, and 12-hour sets) for internal evaluation; 65 hours for NIST benchmarking.

[Figure: demo setup over the intranet. Labels: MPEG-1 files; RTP Streamer (MPEG-1 video over RTP/UDP); Compressed-Domain Feature Extraction; Concept Model; Classification; VideoLAN Networked Players on IP ports 1234 and 2346; example concepts Outdoors, Indoors, Outer Space. Hosts: Office Desktop 9.2.10.85, Laptop T40 9.2.89.55, Office Desktop 9.2.64.86.]


Prior Art: Supervised Learning for Building Generic Concept Detectors

[Figure: training pipeline. Stages shown: Training Video Corpus, Shot Segmentation, Region Segmentation, Semi-Manual Annotation (Lexicon), Feature Extraction, Building Classifier Detectors.]

• Regions
  – Object (motion, camera registration)
  – Background (5 large regions per shot)
• Features
  – Color: color histograms (72 dim, 512 dim), auto-correlograms (72 dim)
  – Structure & shape: edge orientation histogram (32 dim), Dudani moment invariants (6 dim), aspect ratio of bounding box (1 dim)
  – Texture: co-occurrence texture (48 dim), coarseness (1 dim), contrast (1 dim), directionality (1 dim), wavelet (12 dim)
  – Motion: motion vector histogram (6 dim)
• Supervised learning: classification and fusion
  – Support Vector Machines (SVM)
  – Ensemble fusion
  – Other fusions (hierarchical, SVM, Multinet, etc.)


IBM Research Video Concept Detection Systems (Aug. 2003)

[Figure: system block diagram by the IBM TRECVID 2003 Team. Stages: Annotation and Data Preparation (Videos, 128 hours MPEG-1; Collaborative Annotation), Region Extraction, Feature Extraction (CH, WT, EH, MI, CT, MV, TAM, CLG, AUD, CC), Uni-models (SD/A, VW, MLP), Multi-Modality Fusion (EF, EF2), Multi-Model Fusion (MLP, DMF17, MN, ONT, DMF64), and Post-processing/Filtering (V1, V2). Outputs: Best Uni-model Ensemble Run (BOU), Best Multi-modal Ensemble Run (BOF), Best-of-the-Best Ensemble Run (BOBO). Legend: components used in training only vs. in training and testing.]


Prior Art: Classification Steps

[Figure: processing pipeline. Stages shown: MPEG-1 sequence, Shot Segmentation, Region Segmentation, Feature Extraction, Classification (Detectors), XML Rendering (MPEG-7 XML). Example detectors: "Text-overlay", "Outdoors", "People".]

Applications annotated along the pipeline:
• Key-frame extraction, story boards, video editing
• Content-based retrieval, relevance feedback
• Model-based applications
• Object-based compression, manipulation


Novel Visual Semantic Concept Filters

• 100 visual filters were built (vs. 64 in 2003).
• Concepts are arranged hierarchically.
• Based on novel compressed-domain sliced features.
• Neither segmentation nor fusion is used.


An Example of Video Sensor Network System

[Figure: system diagram. Sources (Sensor 1 to Sensor N: TV broadcast, VCR, DVD discs, video file database, webcam) feed client-side Feature Extraction PEs and Display Modules (Encoding, MPEG-1/2 GOP Extraction, Event/Shot Extraction, Feature Extraction producing CDS features). Server-side Concept Detection PEs run one concept each (Face, Outdoors, Indoors, Female, Male, Airplane, Chair, Clock, ...) as PE1 through PE100 on hosts 9.2.63.66 to 9.2.63.68 (ports such as 1220, 1235, 1240). Control Modules take Resource Constraints and User Interests; Display and Information Aggregation Modules receive the metadata. Data rates labeled along the pipeline: 60 Mbps, 1.5 Mbps, 320 Kbps, 22.4 Kbps, 2.8 Kbps, and 600 bps.]


Main Process

• This feature set is extracted as follows:
  1. Parse the MPEG-1/2 packets and find the beginning of an I-frame, or the I-frame closest to a pre-specified shot keyframe.
  2. Use the VLC maps to map variable-length codes to DCT-domain coefficients.
  3. Within an MPEG slice or a union of slices, truncate the selected DCT coefficients and calculate their histogram.
  4. Form a feature vector for the frame from the histogram coefficients of the multiple slices.

• The procedure above requires no multiplication to obtain these feature vectors; only additions are needed to build the histograms.

• In a typical configuration, we partition a frame into three slices and use three DCT coefficient histograms (1 DC and the 2 lowest-frequency AC coefficients) on each of the Y, Cb, and Cr channels. This forms a 576-dimensional feature vector.

• After feature extraction, SVMs are used for model training and classification.
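The histogram step of the extraction procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the DC and two lowest-frequency AC coefficients of every block have already been decoded from the I-frame (bitstream parsing and VLC decoding are omitted), and the function name `cds_features`, the bin counts in `BINS`, and the truncation range `CLIP` are hypothetical. The slides do not give bin counts; the values here were chosen only so that three slices, three channels, and three coefficients add up to the stated 576 dimensions.

```python
import numpy as np

# Hypothetical histogram sizes for (DC, AC1, AC2); not stated in the slides.
BINS = (32, 16, 16)
CLIP = 256  # hypothetical truncation range for coefficient values


def cds_features(coeffs, n_slices=3, bins=BINS, clip=CLIP):
    """Build a slice-based DCT coefficient histogram feature vector.

    coeffs: integer array of shape (channels, block_rows, block_cols, 3)
    holding, per 8x8 block, the DC and two lowest-frequency AC coefficients.
    """
    coeffs = np.asarray(coeffs)
    channels, rows, _, n_coef = coeffs.shape
    # Split the block rows into horizontal slices of nearly equal height.
    edges = np.linspace(0, rows, n_slices + 1).astype(int)
    parts = []
    for c in range(channels):
        for s in range(n_slices):
            block = coeffs[c, edges[s]:edges[s + 1]]
            for k in range(n_coef):
                # Truncate the coefficient values, then count them per bin.
                vals = np.clip(block[..., k], -clip, clip - 1)
                hist, _ = np.histogram(vals, bins=bins[k], range=(-clip, clip))
                parts.append(hist)
    return np.concatenate(parts)


rng = np.random.default_rng(0)
demo = rng.integers(-300, 300, size=(3, 30, 40, 3))  # synthetic Y, Cb, Cr blocks
features = cds_features(demo)
print(features.shape)
```

Building each histogram is a matter of incrementing counters, which is consistent with the slide's point that feature generation needs additions only.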


Three Measurements of Multimedia Semantic Concept Detectors

• Concept Model Accuracy
• Concept Model Efficiency
• Concept Coverage


Progress on Novel Concept Filters for Distillery MM Semantic Routing

• Baseline: (state-of-the-art) IBM TREC '03 visual detectors
• Concept coverage: 56% increase
• Concept model accuracy: 21% increase
• Concept model efficiency: more than a 10-fold increase on classification


Performance Comparison of the Compressed-Domain Detectors and IBM 2003 Visual Detectors

[Figure: bar chart of average precision per concept (y-axis 0 to 0.9) for two series, "IBM 2003 Visual Models" and "New CDS Models". Concepts: Human, Face, Outdoors, Studio_Setting, Indoors, Male_Face, Weather_News, Sport_Event, Sky, Female_Face, Nature_Vegetation, Crowd, Mountain, People_Event, Water_Body, People, Vehicle (Transportation), Person, Rock, Building, Airplane, Graphics, Snow, Tree, Land, Bill_Clinton, Beach, Car, Road, Riot, Flower, Desert, Cloud, Cityscape, Animal, Truck, Cartoon, Podium, Smoke, Physical_Violence, Train, Fire.]

• Improved mean average precision: 21.48%
• Improved efficiency:
  – More than 2M multiplication operations are needed per key frame to generate features for IBM-03 classification.
  – No multiplication operations are needed for the new CDS models, only 6K addition operations.


Demo -- Novel Semantic Concept Filters

• http://www.research.ibm.com/VideoDIG
• E.g.: [Figure: demo example, image not preserved.]


Outline -- Large-Scale Video Semantic Filtering


• Introduction and Motivation

• Prior Arts on Video Semantic Classification

• Compressed-Domain Feature Extraction

• Complexity-Accuracy Trade-Off

• Speed Up SVM Classification

• Speech-Text Complexity-Accuracy Analysis

• Conclusions

• Prospective Projects


Complexity Reduction Introduction

• Objective: real-time classification of instances using Support Vector Machines (SVMs)
• Computationally efficient and reasonably accurate solutions
• Techniques capable of adjusting the tradeoff between accuracy and speed based on available computational resources

[Figure: three plots of accuracy vs. resource for an SVM, the last labeled "Objective Achieved".]


SVMs

• Developed over the past decade
• Strong theoretical background
  – Attempt to achieve Structural Risk Minimization
  – Minimize an upper bound on generalization error instead of MSE
• Impressive empirical results; handles noisy data
• Effective over diverse domains
  – Text (character/word based), image, surveillance (video/network traffic), bio (gene/protein/disease), etc.


SVM formulation

• Given:
  – Training instances $\{x_i\}$ with labels $y_i$
• Objective:
  – Find the maximum-margin hyperplane separating positive and negative training instances

[Figure: SVM separating hyperplane.]


Salient Features

• Formulation:

  $\min_{w,b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i (w \cdot \phi(x_i) + b) \ge 1$

• Optimization problem with a quadratic objective function and linear constraints (QPP)
• Hyperplane separator in the projected space
• Implicit projection of instances


Kernel Projection

• Implicit projection to feature space: $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$
• Measures similarity between instances in the projected space
• Examples:
  – Gaussian: $\exp(-\gamma \lVert x_i - x_j \rVert^2)$
  – Laplacian: $\exp(-\gamma \lVert x_i - x_j \rVert)$
  – Polynomial: $(1 + x_i \cdot x_j)^p$
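The three example kernels on this slide (Gaussian, Laplacian, polynomial) translate directly into code. The sketch below uses NumPy; the function names are chosen here for illustration, and the Laplacian is written with the Euclidean norm because the slide shows an unsubscripted norm (some libraries use the L1 norm instead).

```python
import numpy as np


def gaussian_kernel(xi, xj, gamma=1.0):
    """exp(-gamma * ||xi - xj||^2)"""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def laplacian_kernel(xi, xj, gamma=1.0):
    """exp(-gamma * ||xi - xj||), Euclidean norm assumed."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.linalg.norm(diff)))


def polynomial_kernel(xi, xj, p=2):
    """(1 + xi . xj)^p"""
    return float((1.0 + np.dot(xi, xj)) ** p)


a, b = [0.0, 0.0], [3.0, 4.0]
print(gaussian_kernel(a, b, gamma=0.1))
print(laplacian_kernel(a, b, gamma=0.5))
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], p=2))
```

Each call costs on the order of d operations for d-dimensional inputs, which is what makes the number of support vectors the dominant factor at classification time.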


Decision

• Score of unseen instance $u_j$: $w \cdot \phi(u_j)$
• In terms of Lagrangian multipliers: $\sum_i \alpha_i y_i \, k(x_i, u_j)$
• Computational cost: $O(n_{sv} \, d)$
  – $n_{sv}$: number of support vectors
  – $d$: dimensionality of each data instance
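A minimal sketch of the scoring rule on this slide, the sum over support vectors of alpha_i * y_i * k(x_i, u), with a Gaussian kernel. The function name `svm_score`, the kernel choice, and the optional bias `b` are assumptions made for illustration; the point is that every unseen instance touches all n_sv support vectors in all d dimensions.

```python
import numpy as np


def svm_score(u, support_vectors, alpha, y, gamma=1.0, b=0.0):
    """Score one unseen instance u against all support vectors.

    support_vectors: array of shape (n_sv, d); alpha, y: arrays of length n_sv.
    A Gaussian kernel exp(-gamma * ||x_i - u||^2) is assumed.
    """
    sv = np.asarray(support_vectors, dtype=float)
    u = np.asarray(u, dtype=float)
    # n_sv * d subtractions and multiplications: the O(n_sv * d) cost.
    sq_dist = ((sv - u) ** 2).sum(axis=1)
    weights = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)
    return float(np.dot(weights, np.exp(-gamma * sq_dist)) + b)


sv = np.array([[0.0, 0.0], [1.0, 0.0]])
score = svm_score([0.0, 0.0], sv, alpha=[0.5, 0.25], y=[1, -1], gamma=1.0)
print(score)
```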


Complexity Analysis on SVM classifiers

• Support Vector Machine
  – Largest-margin hyperplane in the projected feature space
  – With good kernel choices, all operations can be done in the low-dimensional input feature space
• Complexity c: operations (multiplication, addition) required for classification
• SVM classifier:

  $f(x) = \sum_{i=1}^{S} a_i \cdot k(x, x_i) + b$

  where S is the number of support vectors and k(.,.) is a kernel function, e.g., $k(x, x_i) = e^{-r \lVert x - x_i \rVert}$
• $c \propto S \cdot D$, where D is the dimensionality of the feature vector

[Figure: separating hyperplane with the support vectors marked.]


Problems

• The number of support vectors grows quasi-linearly with the size of the training set [Tipping 2000]
• The inner product with each support vector of dimensionality d is expensive
  – Example: TREC 2003
    • Human: 19745 support vectors
    • Face: 18090 support vectors
• High data rates (10 Gbit/s) mean that a large amount of data is abandoned


Example

• Processing power: 1 GHz
• 10000 support vectors
• 1000 / 2 features per instance
• On the order of at least 10^7 operations required per stream per second
• Translates to fewer than 100 instances evaluated per second with only one classifier
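The arithmetic behind this example can be reproduced in a few lines. The sketch rests on assumptions that the slide does not state: "1000 / 2 features" is read as 500 features, each support-vector dimension is charged one multiplication and one addition, and the 1 GHz processor is idealized as one operation per cycle.

```python
CLOCK_HZ = 1_000_000_000   # 1 GHz, idealized as one operation per cycle (assumption)
N_SUPPORT_VECTORS = 10_000
N_FEATURES = 500           # one reading of "1000 / 2 features per instance" (assumption)

# One multiply and one add per support vector per feature dimension.
ops_per_instance = 2 * N_SUPPORT_VECTORS * N_FEATURES
instances_per_second = CLOCK_HZ // ops_per_instance

print(f"operations per instance: {ops_per_instance:.0e}")
print(f"upper bound on instances per second: {instances_per_second}")
```

Kernel evaluation (for example the exponential) and memory traffic are ignored here, so the real throughput falls below this bound, in line with the slide's "less than 100 instances" per second.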


Naïve Approach I – Feature Dimension Reduction

[Figure: accuracy (average precision, 0 to 0.9) plotted against complexity (feature dimension ratio, from 1 down to 0).]

AP      Feature Dimension Ratio  Texture  Color  Slice
0.0123  0.037                    1        1      1
0.0700  0.074                    2        1      1
0.0173  0.074                    1        2      1
0.0318  0.074                    1        1      2
0.0700  0.111                    3        1      1
0.0314  0.111                    1        3      1
0.3219  0.111                    1        1      3
0.1065  0.148                    2        2      1
0.3457  0.148                    2        1      2
0.0699  0.148                    1        2      2
0.1065  0.222                    3        2      1
0.1684  0.222                    2        3      1
0.3457  0.222                    3        1      2
0.1241  0.222                    1        3      2
0.6581  0.222                    2        1      3
0.4270  0.222                    1        2      3
0.5235  0.296                    2        2      2
0.1684  0.333                    3        3      1
0.6581  0.333                    3        1      3
0.4685  0.333                    1        3      3
0.5235  0.444                    3        2      2
0.5822  0.444                    2        3      2
0.7757  0.444                    2        2      3
0.5822  0.667                    3        3      2
0.7757  0.667                    3        2      3
0.7861  0.667                    2        3      3
0.7861  1.000                    3        3      3

• Experimental results for the Weather_News detector
• Model selection based on the model validation set
• E.g., for a feature dimension ratio of 0.22 (the best selection of features is 3 slices, 1 color, and 2 texture selections), the accuracy is decreased by 17%.
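Model selection under a complexity budget can be sketched from the tabulated Weather_News results: the feature dimension ratio equals texture x color x slice / 27, and for a given budget the configuration with the highest AP is chosen. The rows below are transcribed from the table; the helper names `ratio` and `best_under` are hypothetical, and the selection here uses the listed APs directly rather than a separate validation set.

```python
# (AP, texture, color, slice) for each configuration; each setting ranges over 1..3.
ROWS = [
    (0.0123, 1, 1, 1), (0.0700, 2, 1, 1), (0.0173, 1, 2, 1), (0.0318, 1, 1, 2),
    (0.0700, 3, 1, 1), (0.0314, 1, 3, 1), (0.3219, 1, 1, 3), (0.1065, 2, 2, 1),
    (0.3457, 2, 1, 2), (0.0699, 1, 2, 2), (0.1065, 3, 2, 1), (0.1684, 2, 3, 1),
    (0.3457, 3, 1, 2), (0.1241, 1, 3, 2), (0.6581, 2, 1, 3), (0.4270, 1, 2, 3),
    (0.5235, 2, 2, 2), (0.1684, 3, 3, 1), (0.6581, 3, 1, 3), (0.4685, 1, 3, 3),
    (0.5235, 3, 2, 2), (0.5822, 2, 3, 2), (0.7757, 2, 2, 3), (0.5822, 3, 3, 2),
    (0.7757, 3, 2, 3), (0.7861, 2, 3, 3), (0.7861, 3, 3, 3),
]


def ratio(texture, color, n_slice):
    """Fraction of the full 3 x 3 x 3 feature set that a configuration uses."""
    return texture * color * n_slice / 27


def best_under(budget):
    """Highest-AP configuration whose feature dimension ratio fits the budget."""
    feasible = [row for row in ROWS if ratio(*row[1:]) <= budget + 1e-9]
    return max(feasible, key=lambda row: row[0])


ap, texture, color, n_slice = best_under(0.23)
print(ap, texture, color, n_slice, round(ratio(texture, color, n_slice), 3))
```

Note that the full-accuracy AP of 0.7861 is already reached at a ratio of 0.667, so a third of the feature dimensions can be dropped at no cost for this detector.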


Naïve Approach II – Reduction on the Number of Support Vectors

[Figure: accuracy (average precision, 0 to 0.9) plotted against complexity (number of support vectors as a ratio, from 1 down to 0).]

• Proposed novel reduction methods:
  – Ranked weighting
  – P/N cost reduction
  – Random selection
  – Support vector clustering and centralization
• Experimental results on the Weather_News detectors show that complexity can be reduced to 50% at the cost of a 14% decrease in accuracy


Approach Overview

• Reduced set methods
• Focus on support vectors and weights already obtained (same SVM formulation)
• Accommodate the effect of all support vectors in the approximation
• Amenable to
  – multiple kernels
  – $L_1$, $L_2$ norms
• Approximate support vector scoring with consideration for overall error


Approaches

• Weighted Clustering Approach:
  – Approximate the effect of support vector(s) by a (possibly hypothetical) instance with weight adjustment
• Error Radius Approach:
  – Determine an error radius and approximate the effect outside it


Weighted Clustering Approach

• Basic steps
  – Cluster the support vectors
  – Use the cluster center as the representative for all support vectors in the cluster
  – Determine the scalar weight associated with each cluster center
  – Use only the cluster centers to score new instances


Weighted Clustering Approach (contd.)

• Clustering
  – Cluster positive and negative support vectors separately
  – The ratio of positive clusters to negative clusters is the same as the ratio of positive support vectors to negative support vectors
  – This attempts to avoid bloated clusters caused by class imbalance
  – The number of clusters is decided manually
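The clustering step (separate clustering of positive and negative support vectors, with cluster counts split in proportion to the class sizes) can be sketched as below. The slides do not name the clustering algorithm; plain Lloyd's k-means is used here as a stand-in, and `kmeans` and `cluster_support_vectors` are hypothetical names.

```python
import numpy as np


def _assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, _assign(points, centers)


def cluster_support_vectors(sv, y, n_clusters, seed=0):
    """Cluster positive and negative support vectors separately.

    The manually chosen total n_clusters is split in proportion to the
    number of positive and negative support vectors.
    """
    sv = np.asarray(sv, dtype=float)
    y = np.asarray(y)
    pos, neg = sv[y > 0], sv[y <= 0]
    k_pos = int(round(n_clusters * len(pos) / len(sv)))
    k_pos = min(max(k_pos, 1), n_clusters - 1)  # at least one cluster per class
    k_neg = n_clusters - k_pos
    return kmeans(pos, k_pos, seed=seed), kmeans(neg, k_neg, seed=seed)


rng = np.random.default_rng(1)
demo_sv = rng.normal(size=(60, 4))
demo_y = np.array([1] * 20 + [-1] * 40)
(pos_centers, pos_labels), (neg_centers, neg_labels) = cluster_support_vectors(
    demo_sv, demo_y, n_clusters=6
)
print(pos_centers.shape, neg_centers.shape)
```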


Weighted Clustering Approach (contd.)

• Cluster center weight
  – Given:
    • Support vector $x_i$ in a cluster with cluster center $x_c$
    • Weight $\alpha_i$ associated with $x_i$
  – Squared distance $\Delta_i$ between $x_i$ and $x_c$
  – Unknown instance at squared distance
    • $d$ from $x_c$
    • $d - \delta_i$ from $x_i$


Cluster center weight (contd.)

• Difference in scores: $\gamma_i \exp(-d/p) - \alpha_i \exp(-(d - \delta_i)/p)$
• Optimal $\gamma_i = \alpha_i \exp(\delta_i / p)$
• Problem:
  – $\gamma_i$ is determined specific to the instance
  – Impossible to obtain a speedup if computed on a per-instance basis
• Solution:
  – Find $\gamma_i$ minimizing the error on average


Cluster center weight (contd.)

• Choose $\gamma_i$ minimizing the square of the difference in scores over all $\delta_i$ and $d$
• Sub-cases: $d < \Delta_i$ and $d \ge \Delta_i$

[Figure: regions around the cluster center $x_c$ with radius $\Delta_i$, annotated with $\gamma_i$, $\delta_i$, and $d$.]


Cluster center weight (contd.)

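• Case 1: distance to unseen instance $d \ge \Delta_i$
• Optimal $\gamma_i^{upper}$ minimizes

  $\int_{\Delta_i}^{\infty} \int_{-\Delta_i}^{\Delta_i} \left( \gamma_i \exp(-d/p) - \alpha_i \exp(-(d - \delta_i)/p) \right)^2 \, d\delta_i \, dd$

  which gives

  $\gamma_i^{upper} = \alpha_i \, p \, \dfrac{\exp(\Delta_i/p) - \exp(-\Delta_i/p)}{2 \Delta_i}$

• Minima verification: $2 \Delta_i \, p \, \exp(-2\Delta_i/p) \ge 0$
• Optimal $\gamma_i^{upper}$ obtained for $d \ge \Delta_i$

[Figure: geometry of the cluster center $x_c$, the support vector $x_i$, and the unseen instance, with distances $\Delta_i$, $d$, and $d - \delta_i$.]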


Cluster center weight (contd.)

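• Case 2: $d < \Delta_i$
• Optimal $\gamma_i^{lower}$ minimizes

  $\int_{0}^{\Delta_i} \int_{\Delta_i - 2d}^{\Delta_i} \left( \gamma_i \exp(-d/p) - \alpha_i \exp(-(d + \delta_i)/p) \right)^2 \, d\delta_i \, dd$

  which gives

  $\gamma_i^{lower} = \alpha_i \exp(-\Delta_i/p) \, \dfrac{2\Delta_i/p - 1 + \exp(-2\Delta_i/p)}{1 - \exp(-2\Delta_i/p) - (2\Delta_i/p) \exp(-2\Delta_i/p)}$

• Minima: $\exp(2\Delta_i/p) > 1 + 2\Delta_i/p$
• Optimal $\gamma_i^{lower}$ obtained for $d < \Delta_i$

[Figure: geometry of the cluster center $x_c$, the support vector $x_i$, and the unseen instance, with distances $\Delta_i$, $d$, and $d - \delta_i$.]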
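The two closed-form cluster-center weights can be evaluated directly. The sketch below writes them as reconstructed from the slides, gamma_upper = alpha * p * (exp(D/p) - exp(-D/p)) / (2D) for d >= D and gamma_lower = alpha * exp(-D/p) * (x - 1 + exp(-x)) / (1 - exp(-x) - x * exp(-x)) with x = 2D/p for d < D, where D stands for the squared distance Delta_i between the support vector and its cluster center. The function names are hypothetical, and the numeric check at the end only confirms that gamma_upper is the average of the per-instance optimum alpha * exp(delta/p) over delta in [-D, D].

```python
import math


def gamma_upper(alpha, delta, p):
    """Cluster-center weight used when the instance distance d >= delta."""
    x = delta / p
    return alpha * p * (math.exp(x) - math.exp(-x)) / (2.0 * delta)


def gamma_lower(alpha, delta, p):
    """Cluster-center weight used when the instance distance d < delta."""
    x = 2.0 * delta / p
    numerator = x - 1.0 + math.exp(-x)
    denominator = 1.0 - math.exp(-x) - x * math.exp(-x)
    return alpha * math.exp(-delta / p) * numerator / denominator


def mean_optimal_gamma(alpha, delta, p, steps=10_000):
    """Midpoint-rule average of alpha * exp(t / p) for t in [-delta, delta]."""
    width = 2.0 * delta / steps
    total = 0.0
    for k in range(steps):
        t = -delta + (k + 0.5) * width
        total += alpha * math.exp(t / p)
    return total / steps


print(gamma_upper(1.0, 2.0, 1.5), mean_optimal_gamma(1.0, 2.0, 1.5))
print(gamma_lower(1.0, 2.0, 1.5))
```

Both weights tend to alpha as the support vector approaches its cluster center, so a tight cluster reproduces the original score almost exactly.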


Using the weights

• For every support vector in a cluster
  – Distance $\Delta_i$ known
  – Two weights computed ($\gamma_i^{upper}$ and $\gamma_i^{lower}$)
• The cumulative effect of all support vectors in a cluster is additive
  – The weights due to the individual support vectors are added up at the center to simulate the effect of all support vectors
• $\Delta_i$ sorted, weight arrays rearranged


Using the weights (contd.)

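• Given the distance $d$ of an unseen instance
  – Binary search on the sorted $\Delta_i$ array
  – For support vectors with $\Delta_i$ less than $d$, $\gamma_i^{upper}$ is used; for the others, $\gamma_i^{lower}$
• The sum associated with a cluster center is not computed per instance
• It is pre-computed at every position of the sorted array

[Figure: sorted $\Delta_i$ array with the position of $d$ marked; entries before it contribute $\gamma_i^{upper}$ (Sum upper), entries after it contribute $\gamma_i^{lower}$ (Sum lower).]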
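The lookup described on this slide, a binary search on the sorted distance array plus sums pre-computed at every position, can be sketched as follows. The names `build_cluster_table` and `cluster_weight` are hypothetical; the upper weight is applied to support vectors whose distance to the center is at most d, following the case split d >= Delta_i used in the derivation.

```python
import numpy as np


def build_cluster_table(deltas, gamma_upper, gamma_lower):
    """Sort one cluster's support vectors by distance and pre-compute the sums."""
    order = np.argsort(deltas)
    d_sorted = np.asarray(deltas, dtype=float)[order]
    upper = np.asarray(gamma_upper, dtype=float)[order]
    lower = np.asarray(gamma_lower, dtype=float)[order]
    # prefix_upper[k]: sum of upper weights over the k smallest distances.
    prefix_upper = np.concatenate(([0.0], np.cumsum(upper)))
    # suffix_lower[k]: sum of lower weights from sorted position k onward.
    suffix_lower = np.concatenate((np.cumsum(lower[::-1])[::-1], [0.0]))
    return d_sorted, prefix_upper, suffix_lower


def cluster_weight(table, d):
    """Total cluster-center weight for an instance at squared distance d."""
    d_sorted, prefix_upper, suffix_lower = table
    k = int(np.searchsorted(d_sorted, d, side="right"))  # entries with delta <= d
    return float(prefix_upper[k] + suffix_lower[k])


table = build_cluster_table(
    deltas=[3.0, 1.0, 2.0],
    gamma_upper=[30.0, 10.0, 20.0],
    gamma_lower=[3.0, 1.0, 2.0],
)
print(cluster_weight(table, 1.5))
```

Each query costs one binary search per cluster instead of one kernel evaluation per support vector, and the result is then multiplied by a single kernel value at the cluster center.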


Experiments

• Datasets
  – TREC video datasets (2003 and 2005)
    • 576 features per instance
    • > 20000 test instances overall
  – MNIST handwritten digit dataset (RBF kernel)
    • 576 features
    • 60000 training instances, 10000 test instances
• Performance metrics
  – Speedup achieved over evaluation with all support vectors
  – Average precision achieved

[Figure: precision-recall curve, with both axes running from 0 to 1.0.]
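Average precision, the accuracy metric used throughout these experiments, summarizes a precision-recall curve in one number. A minimal sketch of the non-interpolated form follows; it assumes the ranked list contains every relevant item (otherwise the sum should be divided by the total number of relevant items in the collection), and the function name is chosen here for illustration.

```python
def average_precision(ranked_relevance):
    """Non-interpolated average precision of a ranked list.

    ranked_relevance: sequence of 0/1 flags, best-scored item first.
    Precision is sampled at the rank of every relevant item and averaged.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


print(average_precision([1, 0, 1]))
print(average_precision([0, 0, 1]))
```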


Results (Mnist 0-4)

[Figure: average precision (0.96 to 1.005) vs. speedup ratio (0 to 250), one curve per digit 0 to 4.]


Results (Mnist 5-9)

[Figure: average precision (0.86 to 1.02) vs. speedup ratio (1 to 1000, logarithmic axis), one curve per digit 5 to 9.]


Results (TREC 2003)

[Figure, left panel: average precision (0 to 0.9) per concept for two series, AP_fast and AP_original. Concepts: Human, Face, Outdoors, Studio-Setting, Sport-Event, Nature-Vegetation, Crowd, Mountain, People-Event.]

[Figure, right panel: speedup (1 to 10000, logarithmic axis) per concept; labeled ticks include Human, Outdoors, Sport-Event, Crowd, People-Event.]


TREC 2005

[Figure: average precision (0 to 1) per concept for two series, AP_original and AP_fast. Concepts: Bus, Prisoner, Truck, Walking_Running, Animal, Outdoor, Court, Person, Face.]


Summary of Complexity Reduction

• The techniques presented demonstrate reasonable performance in terms of both speedup and average precision over multiple concepts in the datasets
• Speedups
  – MNIST: all concepts at least 50 times faster, with AP within 0.04 of the original
  – TREC 2003: eight of nine concepts sped up by more than 80 times, with AP within 0.05 of the original
  – TREC 2005: respectable APs and speedups in some cases
• APs of most concepts are close to the original APs


VideoDIG Demo– configurable and scalable video semantic concept detection/filtering

[Figure: system diagram. Sources (TV broadcast, VCR, DVD discs, video file database, webcam) feed client-side Feature Extraction PEs and Display Modules (Encoding, MPEG-1/2 GOP Extraction, Shot Segmentation, Feature Extraction producing CDS features). Server-side Concept Detection PEs run one concept each (Face, Outdoors, Indoors, Female, Male, Airplane, Chair, Clock, ...) as PE1 through PE100 on hosts 9.2.63.66 to 9.2.63.68 (ports such as 1220, 1235, 1240). Control Modules take Resource Constraints and User Interests; a Display Controller switches displays on and off. Data rates labeled along the pipeline: 60 Mbps, 1.5 Mbps, 320 Kbps, 22.4 Kbps, 2.8 Kbps, and 600 bps.]


Outline -- Large-Scale Video Semantic Filtering


• Introduction and Motivation

• Prior Arts on Video Semantic Classification

• Compressed-Domain Feature Extraction

• Complexity-Accuracy Trade-Off

• Speed Up SVM Classification

• Speech-Text Complexity-Accuracy Analysis

• Conclusion

• Prospective Projects


Other Issues: Speech and Text-based Topic Detection

[Figure: a query concept ("Weather News") is expanded through WordNet (weather, weather condition, atmospheric condition => cold weather => fair weather, sunshine => hot weather) into keywords such as weather condition, atmospheric condition, cold weather, fair weather, weather forecast, and warm weather, which drive speech-based retrieval.]

• Unsupervised learning from WordNet: example of topic-related keywords

Topic         Related keywords
Weather news  weather, atmospheric, cold weather, freeze, frost, sunshine, temper, scorcher, sultry, thaw, warm, downfall, hail, rain
Building      building, edifice, walk-up, butchery, apart build, tenement, architecture, call center, sanctuary, bathhouse
Animal        animal, beast, brute, critter, darter, peeper, creature, fauna, microorganism, poikilotherm
Airplane      airplane, aeroplane, plane, airline, airbus, twin-aisle airplane, amphibian, biplane, passenger, fighter, aircraft


Concept Detection based on Automatic Speech Recognition

[Figure: "Concept Detection based on ASR". Bar chart of Average Precision (y-axis, 0 to 0.7) per Concept (x-axis): Airplane, Animal, Building, Face, Madeleine_Albright, Outdoors, Nature_Vegetation, People, Physical_Violence, Road, Sport_Event, Weather_News, MAP. Series: Baseline (HJN), Supervised, Revised Supervised, WordNet.]

WordNet: weather, weather condition, atmospheric condition, cold weather, freeze, frost, fair weather, sunshine, temperateness, hot weather, scorcher, sultriness, thaw, thawing, warming, precipitation, downfall, fine spray, hail, rain, rainfall, monsoon, rainstorm, line storm, equinoctial storm, thundershower, downpour, cloudburst, …

Supervised: rain 334, temperatur 327, continu 287, weather 284, forecast 269, coast 254, shower 254, storm 245, california 242, northern 226, thunderstorm 225, snow 224, west 208, move 206, lake 195, southeast 187, northeast 175, northwest 172, heavi 170, eastern 161, plain 158, southern 157, expect 153, rocki 153, high 151, warm 147, gulf 143, headlin 142, look 142, great 140, caribbean 140, todai 139, seventi 137, east 135, eighti 134, thirti 133, hawaii 129, meteorologist 125, midwest 125
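The chart reports average precision per concept and a mean (MAP) across concepts. The slides do not define the metric, so the sketch below assumes the standard non-interpolated definition: the mean of the precision values at each rank where a relevant item appears. The function names and the sample rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Non-interpolated average precision for one ranked list.

    ranked_relevance is a list of booleans in rank order; True means the
    item at that rank is relevant to the concept.
    """
    num_relevant = sum(ranked_relevance)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant


def mean_average_precision(runs):
    """Mean of the per-concept average precisions (runs maps concept -> ranked list)."""
    return sum(average_precision(r) for r in runs.values()) / len(runs)


if __name__ == "__main__":
    # Hypothetical ranked results for two concepts.
    runs = {
        "Weather_News": [True, False, True, False],
        "Airplane": [False, True],
    }
    for concept, ranking in runs.items():
        print(concept, round(average_precision(ranking), 3))
    print("MAP", round(mean_average_precision(runs), 3))
```

This version normalizes by the number of relevant items in the ranked list itself; evaluations that truncate the list usually normalize by the total number of relevant items in the collection instead.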

Complexity-Accuracy on Concept Detection based on Automatic Speech Recognition

[Figure: "Tradeoff of Complexity and Accuracy". Line chart of Average Precision (y-axis, 0 to 0.7) against Number of Expanded Keywords (x-axis, 1 to 99). Series: Face, Madeleine_Albright, Outdoors, Nature_Vegetation, People, Physical_Violence, Road, Sport_Event, Weather_News.]
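A curve of this kind can be produced by sweeping the number of expanded keywords and re-measuring average precision at each setting. The sketch below does this on toy data. The keyword order, transcripts, labels, count-based scoring rule and function names are all assumptions for illustration; only the idea of plotting average precision against keyword count comes from the slide.

```python
import re


def average_precision(ranked_relevance):
    """Non-interpolated average precision over a ranked list of booleans."""
    num_relevant = sum(ranked_relevance)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant


def ap_with_k_keywords(keywords, transcripts, labels, k):
    """Rank shots by keyword-hit count using only the first k keywords."""
    vocab = set(keywords[:k])

    def score(text):
        return sum(1 for token in re.findall(r"[a-z]+", text.lower()) if token in vocab)

    # Highest score first; ties are broken by shot id for determinism.
    order = sorted(transcripts, key=lambda sid: (-score(transcripts[sid]), sid))
    return average_precision([labels[sid] for sid in order])


def tradeoff_curve(keywords, transcripts, labels, ks):
    """Return (k, average precision) pairs, one per keyword budget."""
    return [(k, ap_with_k_keywords(keywords, transcripts, labels, k)) for k in ks]


if __name__ == "__main__":
    # Hypothetical data: keywords in priority order, transcripts, ground truth.
    keywords = ["weather", "rain", "storm"]
    transcripts = {
        "a": "rain rain later",
        "b": "weather update",
        "c": "sports scores",
        "d": "storm warning",
    }
    labels = {"a": True, "b": True, "c": False, "d": True}
    for k, ap in tradeoff_curve(keywords, transcripts, labels, [1, 2, 3]):
        print(k, round(ap, 3))
```

On this toy data the third keyword is what pulls shot "d" above the irrelevant shot "c", so precision improves only once the keyword budget reaches three.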

Conclusion

• Novel Challenge:

• Current Status:

– Classifying 40 video streams against 100 concepts in real time on a Pentium III PC.

– SVM classification can be sped up by a factor of tens to thousands while retaining 90% or more of the original average precision.

• Future Work:

– Learning and classifying in noisy environments

– Frameworks for other types of media (e.g., speech, email, web access, …) and other types of machine learning/pattern recognition frameworks (e.g., dynamic Bayesian networks)
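To make the SVM speed-up claim under Current Status concrete: the cost of evaluating a kernel SVM grows with the number of support vectors, so one generic way to trade accuracy for speed is to evaluate only a subset of them. The sketch below keeps the support vectors with the largest absolute coefficients. This pruning rule is a stand-in chosen for illustration, not necessarily the technique behind the reported numbers, and the toy model values are made up.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, support_vectors, coeffs, bias, gamma):
    """SVM decision value: weighted sum of kernel evaluations plus bias."""
    return sum(c * rbf(x, sv, gamma) for sv, c in zip(support_vectors, coeffs)) + bias


def prune(support_vectors, coeffs, keep):
    """Keep the `keep` support vectors with the largest |coefficient|."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))[:keep]
    return [support_vectors[i] for i in order], [coeffs[i] for i in order]


def speedup_factor(n_full, n_kept):
    """Ratio of kernel evaluations per sample before and after pruning."""
    return n_full / n_kept


if __name__ == "__main__":
    # Hypothetical toy model: two dominant and two negligible support vectors.
    svs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.1, 0.0)]
    coeffs = [1.0, -0.8, 0.01, 0.02]
    small_svs, small_coeffs = prune(svs, coeffs, keep=2)
    x = (0.0, 0.0)
    print(decision(x, svs, coeffs, 0.0, 1.0))
    print(decision(x, small_svs, small_coeffs, 0.0, 1.0))
    print(speedup_factor(len(svs), len(small_svs)))
```

In practice the retained accuracy would be checked the way the slide states it: by comparing average precision of the pruned classifier against the original.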

Revisited Video Semantic Filtering Framework

[Figure: block diagram of the revisited video semantic filtering framework. Recoverable labels follow.]

(Distributed Smart Sensors) Sensor 1, Sensor 2, Sensor 3, Sensor N

Block diagram of the smart sensors: Camera Input (60 Mbps); MPEG-1/2 (1.5 Mbps); Encoding; GOP Extraction; Event Extraction (2.8 Kbps); Feature Extraction (320 Kbps); CDS Features; Meta-data (600 bps); Control Modules; Resource Constraints; User Interests. A further rate of 22.4 Kbps also appears in the diagram.

(Server) Concept Detection Processing Elements:

– PE1: 9.2.63.66:1220

– PE2: 9.2.63.67

– PE3: 9.2.63.68

– PE4: 9.2.63.66:1235

– PE5: 9.2.63.66:1240

– PE6: 9.2.63.66

– PE7: 9.2.63.66

– PE100: 9.2.63.66

Concept labels shown: Face, Outdoors, Indoors, Female, Male, Airplane, Chair, Clock

Display and Information Aggregation Modules
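The data rates in the diagram imply large bandwidth reductions between the camera input and what a sensor forwards. The sketch below works through that arithmetic. It assumes 1 Kbps = 1,000 bps and 1 Mbps = 1,000,000 bps, and it omits the 22.4 Kbps figure because the label it belongs to is not recoverable from the extracted text. The stage keys, function names and the example link capacity are hypothetical.

```python
# Rates as labeled in the diagram, in bits per second (decimal prefixes assumed).
RATES_BPS = {
    "camera_input": 60_000_000,
    "mpeg_1_2": 1_500_000,
    "feature_extraction": 320_000,
    "event_extraction": 2_800,
    "metadata": 600,
}


def reduction_factor(source, target):
    """How many times smaller the target stage's rate is than the source's."""
    return RATES_BPS[source] / RATES_BPS[target]


def max_streams(link_bps, stage):
    """Number of whole streams at a given stage's rate that fit on a link."""
    return link_bps // RATES_BPS[stage]


if __name__ == "__main__":
    for stage in RATES_BPS:
        factor = reduction_factor("camera_input", stage)
        print(f"{stage}: {factor:,.0f}x reduction relative to camera input")
    # Hypothetical question: how many sensors' metadata fit on a 1.5 Mbps link?
    print(max_streams(1_500_000, "metadata"))
```

Sending metadata instead of raw video is what makes a large number of sensors feasible on modest links, which is the scaling argument the framework diagram appears to make.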

Projects Available (Spring 2006)

• Developing Distributed Smart Video Cameras (Phase II)

• Large-Scale Data Mining, Community Modeling, Human Behavior Modeling

• Also, course: EE6886 Multimedia Security Systems

Acknowledgements

• Navneet Panda (UCSB), Xiaodan Song (UW), Victor Sutan (CU), Jason Cardillo (CU)

• Olivier Verscheure (IBM), Lisa Amini (IBM)

Questions?

Contact: [email protected], http://www.research.ibm.com/people/c/cylin