Multimedia Retrieval

Multimedia Retrieval. Outline Audio Retrieval Spoken information Music Document Image Analysis and Retrieval Video Retrieval

Multimedia Retrieval

Outline

• Audio Retrieval
  • Spoken information
  • Music
• Document Image Analysis and Retrieval
• Video Retrieval

A Taxonomy of Audio

• Sound
  • Music
    • Classical: Orchestra, String Quartet, Choir, Piano
    • Country
    • Disco
    • Hip Hop
    • Jazz
    • Rock
  • Speech
    • Sports Announcer
    • Female
    • Male
  • Other?

Spoken Document Retrieval

Speech Recognition Knowledge Sources

• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken

All three feed into Speech Recognition.
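
To make the language model's role concrete, here is a minimal sketch of an unsmoothed bigram model. The training sentences, the `<s>` start marker, and the function names are illustrative assumptions, not taken from the slides; real recognizers use far larger corpora and smoothing.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs

    def prob(prev, word):
        # Unseen contexts get probability 0 in this unsmoothed sketch.
        return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

    return prob

def sequence_probability(prob, sentence):
    """Multiply the bigram probabilities along the word sequence."""
    words = ["<s>"] + sentence.lower().split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= prob(prev, word)
    return p

# Hypothetical toy corpus.
prob = train_bigram_model(["recognize speech", "recognize speech", "wreck a nice beach"])
p_likely = sequence_probability(prob, "recognize speech")
p_unlikely = sequence_probability(prob, "wreck a nice beach")
```

With this toy corpus the model prefers "recognize speech" over the acoustically similar "wreck a nice beach", which is the kind of preference a decoder relies on.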

Speech Recognition in Brief

[Figure: Speech → Signal Processing → Phonetic Probability Estimator (Acoustic Model) → Decoder (Language Model) → Words]

Additional knowledge sources shown in the diagram: Pronunciation Lexicon, Grammar

Hints For Better Recognition

• Goal: improve the estimation p(word | acoustic_signal)
• Main idea: p(word | acoustic_signal) → p(word | acoustic_signal, X)

What could be X?

• Topical information
• News of the day
• Image information?

Hints For Better Recognition

• Goal: improve the estimation p(word | acoustic_signal)
• Main idea: p(word | acoustic_signal) → p(word | acoustic_signal, X)

What could be X?

• Topical information
• News of the day
• Image information
  • Lip reading
  • Video Optical Character Recognition (VOCR)
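
One way to read the main idea on this slide, as a sketch under a stated assumption: write W for the word, A for the acoustic signal, and assume (this is not stated on the slide) that A is conditionally independent of the side information X once the word is known. Bayes' rule then gives

```latex
p(W \mid A, X)
  = \frac{p(A \mid W, X)\, p(W \mid X)}{p(A \mid X)}
  \;\propto\; p(A \mid W)\, p(W \mid X)
```

Under that assumption X leaves the acoustic model untouched and only sharpens the prior over words, for example by favoring names that appear in the news of the day or in on-screen text.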

Speech Recognition Accuracy

[Figure: bar chart of word error rate (axis from 0 to 100) by type of speech data: Benchmark Lab, TV Studio, Dialog News, Documentary, Commercials]
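
For reference, word error rate is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is one straightforward way to compute it; the function name and the example strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical example: two of five reference words are substituted.
example_wer = word_error_rate("find pictures of harry hertz",
                              "find picture of harry hurts")
```

Because insertions are counted, the rate can exceed 1 when the hypothesis is much longer than the reference.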

Information Retrieval Precision vs. Speech Accuracy

[Figure: relative precision (% of text IR, axis from 30 to 100) plotted against word error rate (axis from 0 to 80)]

Indexing and Search of Multimodal Information, Hauptmann, A., Wactlar, H. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP-97), Munich, Germany, April 1997.

Retrieval degrades only slightly when the word error rate is below 30%.

Spoken Document Retrieval

• Segmentation issue
  • Continuous speech data without story boundaries
• Typical segmentation approaches
  • Overlapping windows (30 sec for each segment)
  • Automatic detection of speaker changes
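
The overlapping-window approach can be sketched as follows. The 30-second window comes from the slide; the 15-second step (and hence 50% overlap) is an assumption chosen for illustration, as is the function name.

```python
def overlapping_windows(total_seconds: float, window: float = 30.0, step: float = 15.0):
    """Return (start, end) times of fixed-length windows covering a recording.

    Consecutive windows overlap whenever step < window. The step value is an
    assumption; the slides only give the window length.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    segments = []
    start = 0.0
    while True:
        end = min(start + window, total_seconds)
        segments.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return segments

# A hypothetical 70-second recording.
segments = overlapping_windows(70.0)
```

Each window would then be indexed as its own pseudo-document, so a story that straddles one window boundary still falls entirely inside a neighboring window.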

Spoken Document Retrieval: Document Expansion

• Motivation: documents are erroneous
• Goal: apply expansion techniques to reduce the impact of recognition errors in spoken documents
• Similar to query expansion

[Figure: the speech-recognized transcript is matched against a clean document collection (web docs); common words are found in the top-ranked documents doc1, doc2, doc3, doc4]

• Treat each speech document as a query
• Find clean documents that are relevant to speech documents
• Expand each speech document with the common words in the top-ranked clean documents
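
The three steps above can be sketched in a few lines. This is a deliberately simplified illustration: the cosine ranking over raw term counts, the `top_k` and `min_docs` parameters, and the sample texts are all assumptions, and no claim is made that this matches the weighting used in any published system.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def expand_document(transcript, clean_docs, top_k=2, min_docs=2):
    """Append words shared by at least min_docs of the top_k retrieved clean docs."""
    query = Counter(tokenize(transcript))  # step 1: the transcript is the query
    ranked = sorted(clean_docs,            # step 2: rank the clean documents
                    key=lambda d: cosine(query, Counter(tokenize(d))),
                    reverse=True)
    doc_freq = Counter()
    for doc in ranked[:top_k]:
        doc_freq.update(set(tokenize(doc)))
    # Step 3: add the common words that the transcript does not already contain.
    added = sorted(t for t, n in doc_freq.items() if n >= min_docs and t not in query)
    return transcript + " " + " ".join(added) if added else transcript

# Hypothetical data: the recognizer heard "hurts" instead of "hertz".
clean_docs = [
    "harry hertz director of the quality program",
    "quality program director hertz spoke",
    "jazz piano concert tonight",
]
expanded = expand_document("quality program director harry hurts", clean_docs)
```

In this toy case the expanded transcript gains the word "hertz", so a query for the correctly spelled name can now match the spoken document despite the recognition error.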

Document Expansion (Singhal & Pereira, 1999)

Music Information Retrieval

Music Retrieval

• A textual retrieval approach
  • Using metadata: titles, artists, genres, …
• Content-based music retrieval
  • Query by audio
  • Query by score document/segment

Content-based Music Retrieval

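[Figure: block diagram of a query-by-audio system]

On-line processing: Microphone signal input → Sampling (11 kHz) → Center Clipping → Short-term Autocorrelation → Note Segmentation → Mid-level Representation

Off-line processing: Songs Database → MIDI message extraction → Mid-level Representation

Both paths feed Similarity Comparison → Query results (ranked song list)

Mid-level representation example: the MIDI note sequence 67 64 65 62 60 becomes the interval sequence -3 1 -3 -2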
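
The step from MIDI note numbers to the interval sequence is a difference between consecutive notes. A minimal sketch (the function name is an assumption) that reproduces the numbers on this slide:

```python
def pitch_intervals(midi_notes):
    """Return the semitone difference between each note and the one before it."""
    return [later - earlier for earlier, later in zip(midi_notes, midi_notes[1:])]

# The example sequence from the slide.
intervals = pitch_intervals([67, 64, 65, 62, 60])

# The same melody sung two semitones higher yields identical intervals.
transposed = pitch_intervals([69, 66, 67, 64, 62])
```

Working with intervals rather than absolute pitches makes the representation independent of the key in which the query is sung or hummed.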

Content-based Music Retrieval

Two example interval sequences:

Sequence 1: 1 1 2 0 -2 0 1 2 0
Sequence 2: -3 1 1 2

• N-gram representation

N-gram     Term   Count in sequence 1   Count in sequence 2
1 1 2      C1     1                     1
1 2 0      C2     2                     0
2 0 -2     C3     1                     0
0 -2 0     C4     1                     0
-3 1 1     C5     0                     1

• A vector representation for each music document
• A typical information retrieval problem
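
The n-gram counts listed on this slide can be reproduced with a sliding window of length three over each interval sequence. The function and variable names below are illustrative assumptions.

```python
from collections import Counter

def ngram_counts(intervals, n=3):
    """Count every run of n consecutive intervals in the sequence."""
    return Counter(tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1))

seq1 = [1, 1, 2, 0, -2, 0, 1, 2, 0]
seq2 = [-3, 1, 1, 2]

# The five terms C1..C5 shown on the slide, in order.
terms = [(1, 1, 2), (1, 2, 0), (2, 0, -2), (0, -2, 0), (-3, 1, 1)]

counts1 = ngram_counts(seq1)
counts2 = ngram_counts(seq2)
vec1 = [counts1[t] for t in terms]
vec2 = [counts2[t] for t in terms]
```

The first sequence also contains the trigrams (-2, 0, 1) and (0, 1, 2), which the slide does not list; a full index would give them their own dimensions. Once every song is a count vector, standard text retrieval machinery such as term weighting and cosine similarity applies directly.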

Document Image Analysis and Retrieval

Document Image Analysis

• Recognize text (OCR)
  • convert page images to Unicode
  • machine-printed, handwritten
• Analyze page layout geometry
  • a 2-D problem (unlike speech, text)
  • good ‘language-free’ algorithms
• Capture logical structure
  • output marked-up text (XML, etc)
  • exploit non-textual clues

Video/Image OCR Block Diagram

[Figure: Video or Image → Text Area Detection → Text Area Preprocessing → Commercial OCR → UTF8 Text]

Text Detection

Video OCR

• Low resolution (as low as 10 pixel height/character)
  • limited by NTSC (352x248) / PAL / SECAM TV standards
• Complex background
• Character hue and brightness similar to background

VOCR Preprocessing Problems

[Figure: Video Frames (1/2 s intervals) → Filtered Frames → AND-ed Frames]

OCR Document Retrieval

• Task: find OCR-recognized documents relevant to an information need
• Challenge: erroneous documents
  • retrieval needs to cope with word errors

OCR Document Retrieval

• Correction-based approaches
  • Find potential word errors and replace each with the most likely correct one
• Partial matching approaches
  • Word → a set of n-grams
  • Word matches → n-gram matches
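
A minimal sketch of the partial matching idea: each word becomes a set of character bigrams, and two words are compared by how many bigrams they share. The `#` boundary padding, the Dice coefficient as the similarity measure, and the example words are assumptions made for illustration.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, with '#' marking its boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=2):
    """Dice coefficient over the two n-gram sets: 1.0 for identical words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

# Hypothetical OCR error: "t" misread as "l".
sim_ocr_error = ngram_similarity("hertz", "herlz")
sim_unrelated = ngram_similarity("hertz", "jazz")
```

A single misread character only disturbs the bigrams that contain it, so the corrupted word still scores far higher against the intended query term than an unrelated word does.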

Video Retrieval

Video Retrieval - Application of Diverse Technologies

• Speech understanding for automatically derived transcripts

• Image understanding for video “paragraphing”; face, text and other object recognition

• Natural language for query expansion, topic detection and content summarization

• Human computer interaction for video display, navigation and reuse

• Integration overcomes the limitations of each

Introduction to TREC Video Retrieval Track

• NIST TREC Video Track web site: http://www-nlpir.nist.gov/projects/trecvid/

• Video Retrieval Track started in 2001
  • Investigation of content-based retrieval from digital video
  • Focus on the shot as the unit of information retrieval rather than the scene or story/segment/clip

The TRECVID Collections

2001 - 11 hours, 74 queries, 8000 shots

2002 - 40 hours, 25 queries, 14000 shots
  Video from the Internet Archive between the ’50s and ’70s
  Advertising, educational, industrial and amateur films
  Common shot boundaries

2003 - 56 hours, 25 queries, 32000 shots
  1998 Broadcast News (CNN, ABC, C-SPAN)
  + Common Speech Recognition
  + Common Annotations

2004 - 61 hours, 24 queries, 33000 shots
  More 1998 Broadcast News

Sample Query and Target

Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST

Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …

OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director

System Architecture (TREC Video Track 2001)

• Combine video, audio and text retrieval scores

[Figure: Query → Retrieval Agents (Text, Image, Audio) → Text Score, Image Score, Audio Score → Final Score]
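
The slide does not say how the per-agent scores are merged into the final score. One common choice, shown here purely as an assumption, is a weighted sum per shot; the weights, shot identifiers, and scores below are made up for illustration.

```python
def combine_scores(agent_scores, weights):
    """Weighted sum of per-agent scores for each shot, ranked best first.

    agent_scores maps agent name -> {shot id -> score}. A shot that an agent
    did not return contributes 0 for that agent.
    """
    final = {}
    for agent, scores in agent_scores.items():
        weight = weights.get(agent, 0.0)
        for shot_id, score in scores.items():
            final[shot_id] = final.get(shot_id, 0.0) + weight * score
    return sorted(final.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores from the three retrieval agents.
agent_scores = {
    "text":  {"shot1": 0.2, "shot2": 0.9},
    "image": {"shot1": 0.8, "shot3": 0.5},
    "audio": {"shot2": 0.1},
}
weights = {"text": 0.5, "image": 0.4, "audio": 0.1}  # assumed, not from the slides
ranking = combine_scores(agent_scores, weights)
ranked_shots = [shot_id for shot_id, _ in ranking]
```

A weighted sum presumes the agents' scores are on comparable scales; in practice they would typically be normalized first.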

Results for TREC01

                   ARR       Recall
ASR Transcripts    1.84%     13.2%
VOCR               5.93%     7.52%
Image Retrieval    14.99%    24.45%
Combined           18.9%     28.25%