Video Search Engines and Content-Based Retrieval Steven C.H. Hoi CUHK, CSE 18-Sept, 2006


Page 1

Video Search Engines and Content-Based Retrieval

Steven C.H. Hoi

CUHK, CSE

18-Sept, 2006

Page 2

Outline

Video Search Engines

Content-Based Video Retrieval

Page 3

Video Search Engines

A survey of the state of the art

Page 4

Introduction

Who is building video search engines?

Top text search engines: 5.6 billion searches (07/2006)

Page 5

Introduction: Google

Page 6

Introduction: Yahoo

Page 7

Introduction: MSN/Live Search

Page 8

Introduction: YouTube

Page 9

Business Models

  • Web Advertising
    • Site volume, or keyword customized
    • Video ads
      • Disable controls (MSN)
  • Subscription
    • MLB, Real
  • Download to own
    • iTunes
  • Movie Rental
    • Limited time, number of plays
  • Other
    • Desktop Media Search
    • Media player (jukebox)
    • Media Monitoring
    • Media Asset Management

Page 10

Types of Video Sites

  • Content Originators
    • Major Broadcasters
    • Affiliates, Local News
    • Major League Baseball
  • Syndication, Aggregation, “Internet Broadcasters”
    • Rental, purchase, advertising, subscription
    • MSN, Google, iTunes
    • ROO Media, FeedRoom
  • Movie and Video Download
  • Share portals
    • Consumer content, blogs
    • YouTube, Putfile, Vsocial, Google, Akimbo
  • Traditional Search Engines (Crawl) / “RSS”
    • Yahoo, Blinkx
  • Other
    • Public (Internet Archive)
    • Media Monitoring, asset management systems

Page 11

Video Search Challenges

Page 12

Current Video Search Engines

  • Metadata
    • File type and context
    • Media file attributes
      • Size, length
    • Structured global metadata
      • RSS content description
  • Content
    • Content Indexing
      • Search within a video
      • Full text of dialog
      • Image or video content
    • Automated Content Indexing

Page 13

Current Video Search Engines

Content Search Engines

Keyword search with transcripts from speech recognition

Page 14

Content-Based Video Search Engine

Architecture

Page 15

Content-Based Video Search Engine

Video Processing

Page 16

Content-Based Video Search Engine

Research Challenges

  • Speech Recognition
  • Shot Boundary Detection
  • Video Story Segmentation
  • Concept Detection
  • Multi-modal Fusion for Ranking
    • Text/ASR, Audio/Speech, Visual, etc.

Page 17

Content-Based Retrieval

Our Research Problem

  • Learning to rank video shots for automatic content-based search tasks!

Challenges

  • Multi-Modal Information Fusion
  • Small Sample Learning (a few positive and no negative examples)
  • Learning on large-scale datasets

Page 18

Multi-modal and Multi-scale Ranking Framework

Main Ideas

  • Representing video structures by graphs
  • Using semi-supervised learning to address the small labeled sample learning problem
  • Fusing multi-modal information by harmonic learning over graphs
  • Multi-scale ranking for achieving efficient performance on large-scale datasets

Page 19

Multi-modal and Multi-scale Ranking Framework

Graph-based Modeling

[Figure: graph model of video structure; labels: Story, Text, Shot]

Page 20

Multi-modal and Multi-scale Ranking Framework

Semi-Supervised Learning on Graph

  • To find an optimal real-valued function g: V → R on the graph G
  • To minimize a quadratic energy function: [equation not preserved]
  • Using the Gaussian field and harmonic property from spectral graph theory (J. Zhu’s ICML’03), a harmonic function g can be found: [equation not preserved]
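The equations on this slide did not survive extraction. For reference, the Gaussian-field and harmonic-function formulation cited here is usually written as below. The notation is an assumption rather than the slide's own: w_{ij} is the edge weight between nodes i and j, d_j is the degree of node j, and y_i is the given label of a labeled node.

```latex
% Quadratic energy, minimized subject to g matching the given labels
E(g) = \frac{1}{2} \sum_{i,j} w_{ij} \bigl( g(i) - g(j) \bigr)^2,
\qquad g(i) = y_i \ \text{for labeled nodes } i

% Harmonic property of the minimizer on every unlabeled node j
g(j) = \frac{1}{d_j} \sum_{i} w_{ij} \, g(i),
\qquad d_j = \sum_{i} w_{ij}
```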

Page 21

Multi-modal and Multi-scale Ranking Framework

Semi-Supervised Learning on Graph

  • Let [definitions not preserved]
  • The solution of the harmonic function g can be expressed in matrix operations: [equation not preserved]
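The matrix form on this slide is missing. A minimal sketch of the usual closed form, g_u = (D_uu - W_uu)^{-1} W_ul g_l, is given below, assuming a symmetric weight matrix W and a known set of labeled nodes. The function and argument names are illustrative and do not come from the slides.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, g_l):
    """Harmonic function on a graph (a sketch, assumed notation).

    W           : (n, n) symmetric, non-negative weight matrix
    labeled_idx : indices of the labeled nodes
    g_l         : label values, in the same order as labeled_idx
    Returns the value of g on all n nodes.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    D = np.diag(W.sum(axis=1))
    L = D - W  # combinatorial graph Laplacian

    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]  # equals D_uu - W_uu
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]

    g = np.empty(n)
    g[labeled_idx] = g_l
    # Solve (D_uu - W_uu) g_u = W_ul g_l instead of forming the inverse.
    g[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ np.asarray(g_l, dtype=float))
    return g

# Toy chain graph 0 - 1 - 2 - 3 with node 0 labeled 1 and node 3 labeled 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
g = harmonic_solution(W, labeled_idx=[0, 3], g_l=[1.0, 0.0])
print(g)
```

On the chain, the two unlabeled nodes interpolate linearly between the labeled endpoints, which is the averaging behaviour the harmonic property describes.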

Page 22

Multi-modal and Multi-scale Ranking Framework

Multi-Modal Fusion over Graph

  • To combine text information into SSL on the visual modality, we consider the text inputs as attached nodes on the visual graph:

[Figure: visual graph with attached text nodes; labels: Visual - g, Text - f]
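The fusion formula itself is not preserved. One construction that fits the "attached nodes" description is the external-classifier extension of the harmonic function, in which each unlabeled visual node gets an extra attached node carrying its text score. Whether the authors used exactly this form is an assumption, and the mixing weight `eta` and all names below are hypothetical.

```python
import numpy as np

def fuse_text_into_visual_graph(W, labeled_idx, g_l, f_text, eta=0.1):
    """Harmonic solution with one attached text node per unlabeled node.

    W           : (n, n) visual similarity matrix, every row sum > 0
    labeled_idx : indices of labeled shots
    g_l         : their label values, in the same order
    f_text      : length-n text relevance scores (f in the slide's figure)
    eta         : hypothetical weight of the attached text node, in [0, 1]
    Solves g_u = (I - (1-eta) P_uu)^{-1} ((1-eta) P_ul g_l + eta f_u).
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    u = np.setdiff1d(np.arange(n), labeled_idx)

    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transitions
    P_uu = P[np.ix_(u, u)]
    P_ul = P[np.ix_(u, labeled_idx)]
    f_u = np.asarray(f_text, dtype=float)[u]

    A = np.eye(len(u)) - (1.0 - eta) * P_uu
    b = (1.0 - eta) * (P_ul @ np.asarray(g_l, dtype=float)) + eta * f_u

    g = np.empty(n)
    g[labeled_idx] = g_l
    g[u] = np.linalg.solve(A, b)
    return g
```

With `eta = 0` this reduces to the purely visual harmonic solution, and with `eta = 1` the unlabeled shots simply take their text scores, so `eta` trades off the two modalities.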

Page 23

Multi-modal and Multi-scale Ranking Framework

Challenges

  • Number of examples in the database: N is large
  • For example:
    • TRECVID 2005: Rep. Key-Frames N = 45,765
    • TRECVID 2006: Rep. Key-Frames N = 79,487
  • How to do Semi-Supervised Learning?!
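To see why N at this scale is a problem for graph-based learning, the sketch below estimates the memory needed just to store a dense N x N weight matrix. Dense storage with 8-byte floats is an assumption made for illustration; the slides do not say how the graph was stored.

```python
def dense_matrix_gib(n, bytes_per_entry=8):
    """Memory for a dense n-by-n matrix, in GiB (assumes float64 entries)."""
    return n * n * bytes_per_entry / 2**30

for name, n in [("TRECVID 2005", 45_765), ("TRECVID 2006", 79_487)]:
    print(f"{name}: N = {n:,}, dense N x N matrix = {dense_matrix_gib(n):.1f} GiB")
```

This comes to roughly 15.6 GiB for TRECVID 2005 and about 47 GiB for TRECVID 2006, before any linear system is solved.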

Page 24

Multi-modal and Multi-scale Ranking Framework

Multi-Scale Ranking

  • Learning ranking through multi-scale reranking
  • Each stage is associated with different computational costs
  • In our solution, the four ranking stages are:
    • Ranking by text retrieval using language models
    • Re-ranking by NN fusing text and visual
    • Re-ranking by SVM fusing text and visual
    • Re-ranking by multi-modal semi-supervised learning
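The staged re-ranking above can be sketched as a generic cascade: each stage scores only the survivors of the previous, cheaper stage and keeps a smaller set. The scorers and cut-offs below are toy placeholders, not the language-model, NN, SVM, or semi-supervised rankers used by the authors.

```python
def multi_scale_rank(query, candidates, stages, top_k):
    """Run a re-ranking cascade.

    stages : list of (scorer, keep) pairs ordered from cheapest to most
             expensive; scorer(query, item) returns a relevance score and
             keep is how many items survive that stage.
    """
    current = list(candidates)
    for scorer, keep in stages:
        current = sorted(current, key=lambda s: scorer(query, s), reverse=True)[:keep]
    return current[:top_k]

# Toy scores standing in for text and visual relevance (hypothetical values).
text_score = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
visual_score = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}

stages = [
    (lambda q, s: text_score[s], 3),                    # cheap: text only
    (lambda q, s: text_score[s] + visual_score[s], 2),  # costlier: text + visual
]
print(multi_scale_rank("query", ["a", "b", "c", "d"], stages, top_k=2))
```

Shot "d" has the best visual score but is dropped by the cheap text stage, which is the trade-off a cascade accepts in exchange for running the expensive stages on far fewer candidates.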

Page 25

[Figure: multi-modal fusion and multi-scale ranking pipeline. Labels: Raw Video Clips / Streams; Video Processing; Image Processing; Text Processing; Video Stories; Video Shots; User’s Query; Text; Top M related Stories; Top N1 related Shots; Text + Visual NN; Top N2 related Shots; SVM/KLR (Supervised Ranking); Top N3 related Shots; SSR (Semi-Supervised Ranking); Top N4 related Shots; return top K shots]

Page 26

Benchmark Evaluations

  • Dataset: TRECVID 2005
    • Test: 140 video clips, 45,765 rep. key frames
    • 24 queries
    • A query example:

<videoTopic num="0152">
  <textDescription text="Find shots of Hu Jintao, president of the People's Republic of China" />
</videoTopic>
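As an illustration, the query above can be read with Python's standard XML parser. This sketch handles only the two fields shown on the slide; any other elements a full topic file may contain are ignored.

```python
import xml.etree.ElementTree as ET

TOPIC_XML = """
<videoTopic num="0152">
  <textDescription text="Find shots of Hu Jintao, president of the People's Republic of China" />
</videoTopic>
"""

def parse_topic(xml_text):
    """Return (topic number, text description) from one videoTopic element."""
    root = ET.fromstring(xml_text.strip())
    description = root.find("textDescription")
    return root.get("num"), description.get("text")

topic_num, topic_text = parse_topic(TOPIC_XML)
print(topic_num, "->", topic_text)
```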

Page 27

Benchmark Evaluations: Text-only Retrieval

  • No Pseudo-Relevance Feedback (No-PRF)
  • With Pseudo-Relevance Feedback (PRF)

[Figure: Evaluation of Language Models. MAP (0 to 0.1) of TF-IDF, Okapi, KL-JM, KL-DIR, and KL-ABS under No-PRF and PRF]

[Figure: Text-only Results. MAP (0 to 0.1) for IBM, Columbia, TRECVID-Max, and CUHK]
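The charts report MAP (mean average precision). A minimal sketch of the textbook definition follows; the official TRECVID scoring tool and its result-depth cut-off are not reproduced here, so values from this sketch may differ from the reported ones.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two toy queries with hypothetical shot ids.
runs = [(["a", "x", "b"], {"a", "b"}), (["x", "a"], {"a"})]
print(mean_average_precision(runs))
```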

Page 28

Benchmark Evaluations: Visual Features

  • Color: Grid Color Moment
    • 3*3 grid, 81 dimensions
  • Edge: Edge Direction Histogram
    • 36 bins + 1, 37 dimensions
  • Texture: Gabor Moments
    • 5*8 = 40, 3 moments, 120 dimensions
  • 238 dimensions in total
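A minimal sketch of a grid color moment descriptor is given below, to show where 81 dimensions can come from: 3*3 cells, 3 color channels, and 3 moments per channel. The choice of mean, standard deviation, and cube root of the third central moment is a common convention and an assumption here; the slides do not state which moments or color space were used.

```python
import numpy as np

def grid_color_moments(image, grid=3):
    """Color moments per grid cell for an (H, W, C) image array.

    Returns grid * grid * 3 * C values; 81 for a 3-channel image
    and the default 3*3 grid.
    """
    features = []
    for band in np.array_split(image, grid, axis=0):      # split rows
        for cell in np.array_split(band, grid, axis=1):   # split columns
            pixels = cell.reshape(-1, cell.shape[-1]).astype(np.float64)
            mean = pixels.mean(axis=0)
            std = pixels.std(axis=0)
            skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
            features.extend([mean, std, skew])
    return np.concatenate(features)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(60, 90, 3))  # stand-in for a key frame
print(grid_color_moments(frame).shape)
```

Together with the 37-dimensional edge histogram and the 120-dimensional Gabor features, this gives the 81 + 37 + 120 = 238 dimensions listed above.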

[Figure: Normalized Comparison on COREL Benchmark Photos. Curves for GCM, EDH, Gabor, and GCM+Gabor+EDH; x-axis 0 to 120, y-axis 0 to 0.6]

Page 29

Benchmark Evaluations

Multi-modal Retrieval (Text + Visual)

  • Text-only retrieval
  • Text + NN (Text + Visual)
  • Text + SVM (Text + Visual)
  • MMMS (Text + Visual)

Page 30

Benchmark Evaluations

Evaluation Results

Method     MAP      Num_Ret   Improvement
Text       0.0903   1669      0%
Text+NN    0.1034   1705      +14.51%
Text+SVM   0.1083   1764      +19.93%
MMMS       0.1157   1764      +28.13%

Average Performance on TRECVID 2005 Dataset
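The Improvement column is the relative MAP gain over the text-only baseline. The short sketch below recomputes it from the MAP values in the table; the function name is illustrative.

```python
def relative_improvement(map_score, baseline):
    """Relative gain over the baseline, in percent."""
    return (map_score - baseline) / baseline * 100.0

baseline = 0.0903  # text-only MAP from the table
for name, score in [("Text+NN", 0.1034), ("Text+SVM", 0.1083), ("MMMS", 0.1157)]:
    print(f"{name}: {relative_improvement(score, baseline):+.2f}%")
```

The recomputed values match the table: +14.51%, +19.93%, and +28.13%.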

Page 31

Benchmark Evaluations

Comparison with other approaches

[Figure: MAP (0.095 to 0.12), average performance of 24 queries, for IBM (T+V+M), CUHK-MMMS, Columbia (V+T+M), and IBM (V+T)]

Page 32

Related Work

  • IBM Solution
    • SVM + NN + Multiple Instance Learning
  • Columbia Solution
    • Information-Theoretical Clustering Approach
  • CMU Solution
    • Query-Class Dependent Weighting Ranking

Page 33

Conclusion

  • A tutorial on video search engines
  • Research contributions
    • A unified framework of Multi-Modal and Multi-Scale Ranking for video retrieval
      • Graph-based modeling of video structures
      • Semi-supervised learning for multimodal ranking
      • Making SSL practical for large-scale problems
      • Promising empirical results…

Page 34

Future Work

Research is in progress, with tough challenges ahead…

Any suggestions or comments?