WEB + TV = Multimedia Information Fusion Victor Kulesh and Valery Petrushin



Outline

• Information Retrieval on the Web and Event Tracking

• Multimedia data analysis

• Story matching

• Story synchronization across multimedia streams

• Demos:

– Video story extraction

– Event tracking in TV broadcasts

• Applications of the technology

Information Retrieval on the Web

• The goal of IR is to find documents that are relevant to the user’s query

• IR is a well-studied field

• A branch of IR called Information Filtering (IF) deals with static queries, called ‘user profiles’, over dynamic information

• Event Tracking is a sub-problem of Information Filtering for tracking specific events rather than general topics

Text document representation

• Newswire articles are converted from HTML to ASCII text

• The most common words are removed (stop-list operation)

• Stemming is applied

• Term-frequency-inverse-document-frequency (TFIDF) vectors are computed

• When more than one document is available as seeds for an event, their vectors are averaged

• Event tracking is done by computing the similarity between documents
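The representation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original system: the stop list and the suffix-stripping `crude_stem` are hypothetical stand-ins (a real system would use a full stop list and a proper stemmer such as Porter's), HTML-to-ASCII conversion is assumed to have been done already, and the `tf * log(N / df)` weighting is one common TFIDF variant.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop list (the slides do not give the real one).
STOP_WORDS = {"a", "an", "and", "for", "he", "his", "in", "is", "of", "on", "the", "to"}


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase plain text, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document, weighted tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(docs)
    return [
        {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def average_vectors(vectors):
    """Average several seed-document vectors into a single event profile."""
    profile = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            profile[term] += weight / len(vectors)
    return dict(profile)


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "McVeigh lawyers seek stay of execution",         # seed 1
    "McVeigh execution delayed after FBI documents",  # seed 2
    "Stock markets rallied on Wednesday",             # unrelated story
    "Lawyers for McVeigh file for a stay",            # on-topic story
]
vectors = tfidf_vectors(docs)
profile = average_vectors(vectors[:2])  # two seeds averaged into one profile
print(cosine(profile, vectors[3]), cosine(profile, vectors[2]))
```

A new document is then tracked by comparing its similarity to the event profile against a decision threshold.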

Detection Error Tradeoff (DET) Curves in 5 Minutes

[Figure: distributions of detection scores and the corresponding DET curves for two systems, A and B]
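A DET curve plots the miss probability against the false-alarm probability as the decision threshold sweeps across the detection scores, usually with both axes on a normal-deviate scale. The following minimal Python sketch assumes that higher scores mean "on topic" and that a document is accepted when its score is at or above the threshold. The function names are hypothetical.

```python
from statistics import NormalDist


def det_points(target_scores, nontarget_scores):
    """Return (false_alarm_rate, miss_rate) pairs, one per candidate threshold.

    A trial is accepted when its score is >= the threshold, so a target scoring
    below the threshold is a miss and a non-target at or above it is a false alarm.
    """
    points = []
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        misses = sum(1 for s in target_scores if s < t)
        false_alarms = sum(1 for s in nontarget_scores if s >= t)
        points.append((false_alarms / len(nontarget_scores), misses / len(target_scores)))
    return points


def normal_deviate(p, eps=1e-6):
    """Map a probability to the normal-deviate scale used on DET plot axes."""
    p = min(max(p, eps), 1 - eps)  # clip so 0 and 1 stay finite
    return NormalDist().inv_cdf(p)


on_topic = [0.9, 0.8, 0.7, 0.4]   # scores of documents about the tracked event
off_topic = [0.6, 0.3, 0.2, 0.1]  # scores of unrelated documents
for fa, miss in det_points(on_topic, off_topic):
    print(f"FA={fa:.2f} miss={miss:.2f} -> ({normal_deviate(fa):.2f}, {normal_deviate(miss):.2f})")
```

The more the two score distributions overlap, the farther the curve lies from the lower-left corner. The curve for a better system therefore sits closer to the origin.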

Web-based Event Tracking Performance Results

• The first two figures show the performance of the tracking system for the Cosine (red), Dice (green), and Jaccard (blue) measures when 1 and 4 seed documents are used in the profile for each event

Similarity Measures

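Cosine:

$$\cos(d_j, d_k) = \frac{\sum_{i=1}^{n} w_{ij}\, w_{ik}}{\sqrt{\sum_{i=1}^{n} w_{ij}^2}\,\sqrt{\sum_{i=1}^{n} w_{ik}^2}}$$

Dice:

$$\mathrm{dice}(d_j, d_k) = \frac{2\, d_j \cdot d_k}{\lVert d_j \rVert^2 + \lVert d_k \rVert^2}$$

Jaccard Extended:

$$\mathrm{jaccard\_ext}(d_j, d_k) = \frac{d_j \cdot d_k}{\lVert d_j \rVert^2 + \lVert d_k \rVert^2 - d_j \cdot d_k}$$

where $w_{ij}$ is the weight of term $i$ in document $d_j$, $d_j \cdot d_k = \sum_{i=1}^{n} w_{ij} w_{ik}$, and $\lVert d_j \rVert^2 = \sum_{i=1}^{n} w_{ij}^2$.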
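The three measures map directly onto code. The following minimal Python sketch works on dense term-weight lists and assumes both vectors are non-zero, since otherwise the denominators vanish. The Dice and Jaccard forms are the standard extensions of those coefficients to real-valued weights.

```python
import math


def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def dice(u, v):
    """Extended Dice coefficient: 2 u.v / (|u|^2 + |v|^2)."""
    return 2 * dot(u, v) / (dot(u, u) + dot(v, v))


def jaccard_ext(u, v):
    """Extended Jaccard coefficient: u.v / (|u|^2 + |v|^2 - u.v)."""
    uv = dot(u, v)
    return uv / (dot(u, u) + dot(v, v) - uv)


u, v = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(cosine(u, v), dice(u, v), jaccard_ext(u, v))
```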

Web-based Event Tracking Performance Results

• Performance of Cosine measure with 1, 2, 3, and 4 seed documents

[Figure: DET curves for 1, 2, 3, and 4 seed documents]

Multimedia data analysis

• Three types of data:

– Closed captioning (CC)

– Audio

– Video

• Commercial detection and recognition

• All three carry the same message, but in different formats and poorly synchronized

• CC is easy to work with, but AV is what we really want

Commercial Detection/Recognition

• Commercial Detection

– Video: frequent change of shots

– Audio: music + speech

• Commercial Recognition

– Video: Hidden Markov Model (HMM)

– Audio: Gaussian Mixture Model (GMM)

• Composite score: HMM + GMM
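The slide does not say how the HMM and GMM scores are combined. The following minimal Python sketch makes several assumptions, none of which come from the slides:

- The video HMM scores a sequence of discrete (quantized) shot features.
- The audio GMM scores 1-D frame features.
- "HMM + GMM" means adding the two average log-likelihoods.

All model parameters would come from training on a known commercial, and all probabilities are assumed strictly positive. The function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_score(frames, weights, means, variances):
    """Average per-frame log-likelihood of 1-D audio features under a GMM."""
    total = 0.0
    for x in frames:
        total += logsumexp([
            math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for w, mu, var in zip(weights, means, variances)
        ])
    return total / len(frames)


def hmm_score(observations, start, trans, emit):
    """Average per-observation log-likelihood of a discrete video-feature
    sequence under an HMM, computed with the forward algorithm in log space."""
    states = range(len(start))
    alpha = [math.log(start[s]) + math.log(emit[s][observations[0]]) for s in states]
    for o in observations[1:]:
        alpha = [
            logsumexp([alpha[p] + math.log(trans[p][s]) for p in states]) + math.log(emit[s][o])
            for s in states
        ]
    return logsumexp(alpha) / len(observations)


def composite_score(hmm_value, gmm_value):
    """Fuse the two scores by simple addition (an assumption, see above)."""
    return hmm_value + gmm_value
```

A clip would be recognized as a given commercial when its composite score exceeds a threshold chosen from a DET curve.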

Commercial Recognition Results

• These are preliminary results; we are working on improving the algorithms for commercial recognition

[Figure: DET curves for commercial recognition using audio, video, and composite approaches]

Approach          Miss Probability (%)   False Alarm (%)
Video only        14.29                  6.73
Audio only        3.57                   1.24
Video and Audio   3.57                   0.16

Optimal operating points for ad recognition

Fusing multimedia data streams

[Figure: Closed Captioning, ASR, Audio, and Video streams]

Processing CC streams

• Closed captioning data has two good properties:

– It is already divided into stories and speaker turns by the inclusion of special markers (>>> and >>)

– It is more accurate than ASR data

• The drawback is that CC is only loosely synchronized with the AV streams

• CC serves as the link between text documents from the Internet and the AV streams: it is used for finding relevant stories and for extracting separate stories from the AV streams
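Because the markers are plain text, segmenting a CC stream into stories and speaker turns reduces to string splitting. The following minimal Python sketch assumes, as described above, that ">>>" marks a story boundary and ">>" a speaker turn. It also assumes that caption timing codes have already been stripped. The function name is hypothetical.

```python
def split_cc(cc_text):
    """Split closed-caption text into stories (">>>" markers), and each story
    into speaker turns (">>" markers). Returns a list of lists of turns."""
    stories = []
    # Split on ">>>" first, so any remaining ">>" can only be a speaker turn.
    for story in cc_text.split(">>>"):
        turns = [turn.strip() for turn in story.split(">>") if turn.strip()]
        if turns:
            stories.append(turns)
    return stories


cc = ">>> First story, anchor speaking. >> Reporter on location. >>> Second story."
print(split_cc(cc))
```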

Story matching

• Given a list of text documents and a list of stories from the CC stream, classical IR techniques are applied to find pairs of documents that describe the same events

First -- lawyers for timothy mcveigh spoke with the condemned oklahoma city bomber today. They say mcveigh did not get a fair trial and he will seek a stay of execution. Mcveigh's lawyers accused the federal government of committing "a fraud upon the court." They say the supreme court has an old doctrine that when such fraud occurs any judgment that court has made is void. Attorney robert nigh says it's mcveigh's belief a stay of the june 11th scheduled execution is necessary in order to promote the integrity of the criminal justice system. Mr. Mcveigh has given us permission to seek a stay of execution on his behalf. This decision was not easy for mr. Mcveigh. He has prepared to die. He had previously indicated he preferred death to life in prison without the possibility of release. The attorneys are basing their claim on the basis of documents that were inadvertently withheld from them by the fbi. It is their contention the fbi is knowingly withholding further documents.

McVeigh to Decide About Appeal

DENVER (AP) - Oklahoma City bomber Timothy McVeigh will meet with lawyers this week and is likely to file a request to block his execution, his attorneys said Wednesday.

The request would be based on about 4,000 documents the FBI turned over to McVeigh's attorneys earlier this month, just days before he had been scheduled to be executed for carrying out the 1995 blast that killed 168 people and injured hundreds more.

At his office in Tulsa, Okla., Rob Nigh said he plans to meet with McVeigh on Thursday at the federal penitentiary in Terre Haute, Ind., and will seek his approval on a request to block the execution.

Nigh declined to comment on the contents of the group of documents he would show McVeigh.

But he added: ``You can certainly anticipate it will request a stay.''

Nathan Chambers, McVeigh's Denver-based attorney, said McVeigh believes the information is worthy of judicial review.

``If he gives us permission to file something, we'll probably file something tomorrow,'' he said Wednesday. ``We're in the process of drafting the paperwork.''

McVeigh told a federal judge in December that he would not appeal his death sentence.

In early May, the FBI gave McVeigh's attorneys thousands of documents that it said had accidentally not been turned over to the defense. Attorney General John Ashcroft then postponed McVeigh's execution from May 16 to June 11.

In a statement Wednesday, Ashcroft reiterated that the government would fight any further delay, saying that failure to carry out the sentence ``would deny justice for the victims of this crime and for the American people.''

Meanwhile, a former FBI agent who worked on the case reportedly told a Republican member of the Senate Judiciary Committee last year that the FBI ignored evidence that might have helped the defense.

Ricardo Ojeda, a former special agent in Oklahoma City, wrote Sen. Charles Grassley, R-Iowa, in March 2000, complaining of corruption and discrimination in the FBI's field office, according to CBS' ``60 Minutes II.''

``I am also aware of instances in other cases, including the Oklahoma City bombing, where exculpatory evidence was ignored and not documented. Including exculpatory information I personally gathered from leads assigned me in the case,'' Ojeda wrote.

Nigh said Ojeda's allegations should have an impact on the case.

``That information should, at minimum, change the course of this case in the near future,'' Nigh said on ``60 Minutes II.''

The FBI said Ojeda's records were turned over to McVeigh's lawyers, but that none of his investigation was used at trial. Ojeda said he was fired from the FBI after testifying in a discrimination hearing against FBI management.

``Because he is no longer on the rolls, former Agent Ojeda would not know that his concerns are unfounded,'' FBI Deputy Director Tom Pickard said in a statement. ``Thousands of agents worked on this case but, in the end, most did not have their work presented at trial.''

Ojeda could not be reached by The Associated Press; there was no answer at his home in Oklahoma, and a message left at his wife's business was not returned.

On the Net:

Justice Department: http://www.usdoj.gov

Bombing memorial: http://www.oklahomacitynationalmemorial.org

[Figure: a CC story and the matching newswire article]
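Pairing CC stories with web documents, as in the example above, can be sketched in Python as a best-match search with a similarity threshold. This is a minimal illustration under stated assumptions. It uses raw term counts and no stop list for brevity, whereas a real system would reuse the TFIDF weighting and stemming of the tracking component. The threshold of 0.2 is a hypothetical value, not one reported in the slides.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words term counts for lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def match_stories(cc_stories, web_docs, threshold=0.2):
    """Pair each CC story with its most similar web document, keeping only
    pairs whose similarity reaches the (hypothetical) threshold."""
    if not web_docs:
        return []
    web_vectors = [term_counts(d) for d in web_docs]
    pairs = []
    for i, story in enumerate(cc_stories):
        story_vector = term_counts(story)
        scores = [cosine(story_vector, w) for w in web_vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs


cc_stories = ["lawyers for timothy mcveigh will seek a stay of execution",
              "weather tomorrow sunny and warm"]
web_docs = ["McVeigh lawyers to seek stay of execution, attorneys said",
            "Stock markets rallied on Wednesday"]
pairs = match_stories(cc_stories, web_docs)
print(pairs)
```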

Audio stream analysis

• Silence detection is applied; non-silent parts lasting at least 4 seconds are processed further

• Non-silent parts of the audio stream are classified as speech or music using a 1-second window

• Segments containing loud music are either commercials or previews

• Speech segments are transcribed using the ViaVoice ASR engine
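The first step, finding non-silent parts that last at least 4 seconds, can be sketched in Python with a per-window energy test. The slides do not say which silence detector was used. This minimal sketch assumes mono PCM samples given as floats and an RMS-energy threshold on each 1-second window; the threshold value and the function name are hypothetical. Each returned span would then be classified as speech or music, one 1-second window at a time.

```python
import math


def nonsilent_segments(samples, rate, threshold, min_seconds=4):
    """Return (start_s, end_s) spans, in whole seconds, made of consecutive
    1-second windows whose RMS energy exceeds `threshold`, keeping only spans
    that last at least `min_seconds`."""
    n_windows = len(samples) // rate
    loud = []
    for w in range(n_windows):
        window = samples[w * rate:(w + 1) * rate]
        rms = math.sqrt(sum(x * x for x in window) / rate)
        loud.append(rms > threshold)
    segments, start = [], None
    for w, is_loud in enumerate(loud + [False]):  # sentinel closes a trailing run
        if is_loud and start is None:
            start = w
        elif not is_loud and start is not None:
            if w - start >= min_seconds:
                segments.append((start, w))
            start = None
    return segments


rate = 10  # unrealistically low sample rate, to keep the example small
audio = [0.0] * 20 + [1.0] * 50 + [0.0] * 10 + [1.0] * 30 + [0.0] * 10
print(nonsilent_segments(audio, rate, threshold=0.1))
```

In this example the 5-second loud span is kept, and the 3-second span is dropped as too short.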

Audio processing examples

Speech segments and their ASR transcripts*

Lawyers for convicted Oklahoma City bomber Timothy McVeigh said in a motion for a stay of execution

Of a final decision will be made until they meet with him tomorrow if and his attorneys said they didn't have enough time to review four thousand pages of withheld FBI documents

The Justice Department's says the documents don't change any thank attorney-general John Ashcroft is not support granting another the lack

*Text in red is transcribed incorrectly

CC and Audio alignment

• For each CC stream we know the story boundaries (in words)

• For each transcribed segment we know its location in the audio stream (with 1-second precision)

• By aligning the CC and the transcripts with a local alignment algorithm, we find the story boundaries in the audio stream

• The beginning of the left-most audio segment and the end of the right-most audio segment are used for locating the video boundaries
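The slides do not name the local alignment algorithm or its scoring scheme. Smith-Waterman is the classic local alignment method, and the following minimal Python sketch uses it with hypothetical scores: match +2, mismatch -1, and a linear gap penalty of -1. It accepts any two sequences, so it can align characters, as the example output appears to do, or words.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment with a linear gap penalty.

    Works on any two sequences (strings or word lists). Returns the best score
    and the aligned half-open spans (start, end) in `a` and in `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the running score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


asr = "of a final decision will be made until they meet".split()
cc = "no final decision will be made until they meet with him".split()
score, asr_span, cc_span = local_align(asr, cc)
print(score, asr[asr_span[0]:asr_span[1]])
```

Once the span in the ASR sequence is known, the times of the first and last transcribed segments that it covers give the story boundaries in the audio stream.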

CC and ASR text alignment example

Local Alignment results for: ASR story3.txt
Local Alignment Number: 1
Similarity Score: 3444
Match Percentage: 75%
Number of Matches: 380
Number of Mismatches: 19
Total Length of Gaps: 107
Begins at: 878 1
Ends at: 1295 487

ASR 878: Lawyers for convicted Oklahoma City bomber Timothy McVeigh said in a motion for a stay of execution
CC 1: Lawyers for convicted oklahoma city bomber timothy mcveigh say a motion for a stay of execution

ASR 978: O f a final decision w
CC 97: could be filed as early as tomorrow. Right now he 's scheduled to die june 11th. No final decision w

ASR 1000: ill be made until they meet with him tomorrow if and his attorneys said they didn't have enoug
CC 197: ill be made until they meet with him tomorrow. Mcveig h's attorneys have said they didn't have enoug

ASR 1094: h time to review four thousand pages of withheld FBI documents The Justice Department's says the do
CC 296: h time to review 4000 pages of withheld fbi documents. The justice department says the do

ASR 1193: cuments don't change any thank attorney-general John Ashcroft i s not support granting another the
CC 385: cuments don't change any thing. Attorney general john ashcroft does not support granting another d e

ASR 1290: lack
CC 483: lay.

Extracting AV clips

• Audio is synchronized with video fairly well

• Observation: video slightly precedes audio

• Shot cuts in video are detected using the intensity histogram difference of neighboring frames

• The true boundaries of the AV clip are computed by searching for the video shot cuts closest to the left and right audio story boundaries, within a local neighborhood of 10-frame radius
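Both steps, shot-cut detection and snapping a boundary to a nearby cut, can be sketched in Python. This minimal sketch assumes frames arrive as flat lists of 8-bit grayscale intensities and uses a normalized L1 difference between intensity histograms. The 10-frame search radius comes from the slide; the bin count, the threshold, and the function names are hypothetical.

```python
def histogram(frame, bins=16, max_value=256):
    """Intensity histogram of a frame given as a flat list of 8-bit pixels."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    return counts


def shot_cuts(frames, threshold):
    """Indices i where frame i starts a new shot: the L1 distance between the
    histograms of frames i-1 and i, divided by the pixel count, exceeds
    `threshold` (the normalized distance lies in [0, 2])."""
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(frames[i])
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def snap_to_cut(boundary, cuts, radius=10):
    """Move an audio-derived boundary (in frames) to the nearest shot cut within
    `radius` frames; leave it unchanged if no cut is that close."""
    nearby = [c for c in cuts if abs(c - boundary) <= radius]
    return min(nearby, key=lambda c: abs(c - boundary)) if nearby else boundary


dark, bright = [10] * 64, [200] * 64
frames = [dark] * 5 + [bright] * 5
cuts = shot_cuts(frames, threshold=0.5)
print(cuts, snap_to_cut(12, cuts), snap_to_cut(30, cuts))
```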

Preliminary Accuracy Results

Date (clip #)     True video cuts    Predicted video cuts
                  Start   End        Start   End
May 30 (clip 1)   1364    1787       1365    1787
May 31 (clip 2)   500     1545       509     1545
Jun 01 (clip 3)   1647    2226       1649    2388

Miss error rate: 0.58%
False alarm error rate: 7.41%

Perseus: Personal Multimedia News Portal

• Collects news on the Web

– User’s profile

– Event tracking

• Extracts relevant video clips

• Creates personalized video summaries (for the grandma)

Agent-based view of the system architecture

Perseus demo

How about commercialization?

• Web service companies

– Providing video clips on demand

– Personalized video news summaries

• Electronics manufacturers

– Embedding Web and AV analysis into a product (WebTV + TiVo box)

• Broadcast companies

– AV indexing

Future work

• Algorithms

– AV story extraction based on transcripts ONLY

– Commercial recognition improvement (CHMM)

– Dynamic commercial learning

– Speaker identification and learning

• Applications

– Perseus implementation à la TiVo box

– Home video analysis

– Multimedia DB indexing and retrieval