
Focus on spoken content in multimedia retrieval


Invited talk at the University of Texas at El Paso


Maria Eskevich

Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland

April 16, 2013


Outline

- Spoken Content Retrieval: historical perspective
- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
  - Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
  - Interesting observations on results: segmentation methods, evaluation metrics, numbers



Towards Effective Retrieval of Spontaneous Conversational Spoken Content

[Diagram: a standard Information Retrieval (IR) system compared with a Spoken Content Retrieval (SCR) system. In standard IR, an information request is expressed as queries that the IR system (IR model) matches against indexed documents to produce results. In SCR, the collection consists of audio files: speech processing with an Automatic Speech Recognition (ASR) system turns the audio data collection into transcripts, the transcripts are indexed, and retrieval then runs over the indexed transcripts.]
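To make the SCR side of the diagram concrete, here is a minimal, illustrative sketch of the pipeline: ASR transcript segments are indexed and then ranked for a query. The toy transcripts, segment IDs, and the simple TF-IDF-style scoring are assumptions for the example, not the systems discussed in the talk.

    # Minimal sketch of the SCR pipeline: index ASR transcript segments,
    # then rank them for a query. Illustrative only -- the segment IDs,
    # toy transcripts, and TF-IDF-style scoring are assumptions.
    import math
    from collections import Counter

    segments = {                      # segment_id -> ASR transcript text
        "vid1_000": "welcome to the show today we talk about irish castles",
        "vid1_060": "the castle near dublin was built in the twelfth century",
        "vid2_000": "uh so this recipe needs eh two eggs and some flour",
    }

    def tokenize(text):
        return text.lower().split()

    # Build per-segment term frequencies and document frequencies.
    index = {sid: Counter(tokenize(txt)) for sid, txt in segments.items()}
    df = Counter(term for tf in index.values() for term in tf)
    N = len(segments)

    def score(query, tf):
        """TF-IDF dot product between the query and one indexed segment."""
        return sum(tf[t] * math.log(1 + N / df[t]) for t in tokenize(query) if t in tf)

    def search(query, k=3):
        ranked = sorted(segments, key=lambda sid: score(query, index[sid]), reverse=True)
        return ranked[:k]

    print(search("castle in dublin"))   # -> ['vid1_060', 'vid1_000', 'vid2_000']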



Spoken Content Retrieval (SCR)

[Diagram: the SCR experimental pipeline. Spoken content data is passed through an ASR system to produce an ASR transcript; the transcript is indexed, retrieval runs over the indexed transcript, and the ranked result list is scored with evaluation metrics. The four research questions attach to the indexing, retrieval, and evaluation stages of this pipeline.]

Research Question 1: How does segmentation of spoken data affect retrieval performance? What are the characteristics of a segmentation method that maximizes SCR effectiveness? (A minimal segmentation sketch follows below.)

Research Question 2: What is the relationship between ASR errors in the transcript and retrieval behaviour?

Research Question 3: How can regions of poor speech recognition be identified and processed in order to improve overall speech retrieval performance (detection, special treatment in the speech retrieval process)?

Research Question 4: Can we implement a meaningful approach to SCR of conversational content, incorporating task-specific segmentation and special treatment of regions with unreliable ASR output?
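Research Question 1 concerns how the transcript is segmented before indexing. A common baseline, sketched below, cuts the timed ASR transcript into fixed-length, possibly overlapping windows; the word timings and the window/step lengths are assumptions for the example, not the talk's method.

    # Illustrative fixed-length segmentation of a timed ASR transcript (RQ1 baseline).
    # Each word carries its start time in seconds; windows of `length` seconds are
    # emitted every `step` seconds (overlapping when step < length).
    def sliding_windows(timed_words, length=60.0, step=30.0):
        """timed_words: list of (start_time_sec, word) sorted by time."""
        if not timed_words:
            return []
        end_of_speech = timed_words[-1][0]
        segments, start = [], 0.0
        while start <= end_of_speech:
            words = [w for t, w in timed_words if start <= t < start + length]
            if words:
                segments.append({"start": start, "end": start + length,
                                 "text": " ".join(words)})
            start += step
        return segments

    # Made-up timings, just to show the overlapping windows.
    transcript = [(0.4, "welcome"), (1.1, "to"), (1.3, "the"), (1.6, "lecture"),
                  (62.0, "today"), (63.2, "we"), (63.9, "discuss"), (65.0, "segmentation")]
    for seg in sliding_windows(transcript, length=60.0, step=30.0):
        print(f"{seg['start']:6.1f}-{seg['end']:6.1f}  {seg['text']}")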




Spoken Content Retrieval: historical perspective

[Diagram: a taxonomy of spoken content, from prepared speech to informal conversational speech: Broadcast News, Lectures, Meetings, and Informal Content (Internet TV, podcasts, interviews).]

Broadcast News:

- Data:
  - High quality recordings: often a soundproof studio; the speaker is a professional presenter
  - Well defined structure
  - Query is on a certain topic: the user is ready to listen to the whole section
- Experiments: TREC SDR (1997-2000)
  - Known-item search and ad-hoc retrieval
  - Search with and without fixed story boundaries
  - Evaluation: interest in rank position

HIGHLIGHT: "Success story" (Garofolo et al., 2000): performance on ASR transcripts ≈ manual transcripts
- ASR good: large amounts of training data
- Data structure

CHALLENGE: Speech data in broadcast news is close to written text, and differs from the informal content of spontaneous speech.

Lectures:

- Data:
  - Prepared presentations containing conversational style features: hesitations, mispronunciations
  - Specialized vocabulary: Out-Of-Vocabulary words; lecture-specific words may have low probability scores in the ASR language model
  - Additional information available: presentation slides, textbooks
- Experiments:
  - Lecture browsing: e.g. TalkMiner, MIT lectures, eLectures
  - SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10: e.g. IR experiments, evaluation metrics that assess topic segmentation methods

HIGHLIGHT/CHALLENGE:
- Focus on segmentation methods, jump-in points

Meetings:

- Data features:
  - Mixture of semi-formal and prepared spoken content
  - Additional data: slides, minutes
- Possible real-life motivated scenarios:
  - Jump-in points where discussion of a topic started or a decision point is reached
  - Opinion of a certain person, or of a person with a certain role
  - Search for all relevant (parts of) meetings where a topic was discussed
- Experiments: topic segmentation, browsing, summarization

HIGHLIGHT/CHALLENGE:
- No unified search scenario
- We created a test retrieval collection on the basis of the AMI corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):

- Data features:
  - Varying quality: semi- and non-professional data creators
  - Additional data: professionally or user-generated metadata
- Experiments:
  - CLEF CL-SR: MALACH collection; un/known boundaries, ad-hoc task
  - MediaEval '11, '12, '13: retrieval of semi-professional multimedia content; known-item task, unknown boundaries
  - Metrics: focus on ranking and penalize distance from the jump-in point (see the sketch below)

HIGHLIGHT/CHALLENGE:
- The metric does not always take into account how much time the user needs to spend listening to access the relevant content
- Diversity of the informal multimedia content
- Search scenario no longer limited to factual information
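The jump-in-point metrics mentioned above reward a result both for ranking high and for pointing close to the moment where the relevant content starts. Below is a generic, illustrative version of that idea: a reciprocal-rank score discounted by the time offset to the true jump-in point. The linear penalty and the tolerance value are assumptions for this sketch, not the specific metrics defined for these tasks.

    # Illustrative time-penalized known-item metric: reciprocal rank of the first
    # result from the correct recording, discounted by how far its start time is
    # from the true jump-in point. The linear penalty and 3-minute tolerance are
    # assumptions for this sketch.
    def penalized_reciprocal_rank(results, relevant_video, relevant_start,
                                  tolerance_sec=180.0):
        """results: ranked list of (video_id, start_time_sec)."""
        for rank, (video_id, start) in enumerate(results, start=1):
            if video_id == relevant_video:
                offset = abs(start - relevant_start)
                # Full credit at the exact jump-in point, none beyond the tolerance.
                penalty = max(0.0, 1.0 - offset / tolerance_sec)
                return penalty / rank
        return 0.0

    ranked = [("vid7", 300.0), ("vid3", 45.0), ("vid3", 120.0)]
    print(penalized_reciprocal_rank(ranked, relevant_video="vid3", relevant_start=60.0))
    # First correct hit at rank 2, 15 s off the true point -> (1 - 15/180) / 2 ≈ 0.458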

Review of the challenges / our work for informal SCR:

- A framework for retrieval experiments has to be set up: retrieval collections must be created.
  Our work: we collected new multimodal retrieval collections via crowdsourcing.
- ASR errors decrease IR results.
  Our work: we examined the deeper relationship between ASR performance and result ranking (a WER sketch follows below).
- Suitable segmentation is vital.
  Our work: we carry out experiments with varying methods.
- We need metrics that reflect all aspects of user experience.
  Our work: we created a new set of metrics.
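ASR quality in such studies is usually summarized as word error rate (WER): the word-level edit distance between the ASR hypothesis and a reference transcript, divided by the reference length. Relating per-item WER to where the item lands in the ranking is one way to examine the ASR-versus-retrieval relationship noted above; the function below is a standard WER computation, shown only as an illustration rather than the exact measure used in this work.

    # Standard word error rate: minimum number of substitutions, insertions and
    # deletions needed to turn the reference into the hypothesis, divided by the
    # reference length. Used here only to illustrate how "ASR errors" are measured.
    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit distance over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution / match
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("the castle near dublin was built", "the council near dublin is built"))
    # 2 errors over 6 reference words -> 0.333...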

Page 18: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast NewsBroadcast News

LecturesLectures

MeetingsMeetings

Informal ContentInformal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 19: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

LecturesLectures

MeetingsMeetings

Informal ContentInformal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 20: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

Lectures

Lectures

MeetingsMeetings

Informal ContentInformal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 21: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

Lectures

Lectures

Meetings

Meetings

Informal ContentInformal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 22: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

Lectures

Lectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 23: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

Lectures

Lectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 24: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast NewsBroadcast News

Lectures

Lectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 25: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast NewsBroadcast News

Lectures

Lectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 26: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

LecturesLectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 27: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

LecturesLectures

Meetings

Meetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 28: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 7/48

Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech

InformalConversational

Speech

Broadcast News

Broadcast News

Lectures

Lectures

MeetingsMeetings

Informal Content

Informal Content

Internet TV,Podcast, Interview

Internet TV,Podcast, Interview

Broadcast News:

I DataI High quality recordings:

I Often soundproof studioI Speaker - professional presenter

I Well defined structureI Query is on a certain topic:

User is ready to listen to the whole section

I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries

I Evaluation: interest in rank position

HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript

I ASR good: large amounts of training dataI Data structure

CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech

Lectures:I Data:

I Prepared presentations containingconversational style features:hesitations, mispronunciations

I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low

probability scores in the ASR languagemodel

I Additional information available:presentation slides, textbooks

I Experiments:I Lectures browsing:

e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:

e.g. IR experiments, evaluation metrics thatassess topic segmentation methods

HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in

points

Meetings:

I Data features:I Mixture of semi-formal and prepared spoken

contentI Additional data: slides, minutes

I Possible real life motivated scenario:I Jump-in points where discussion on topic

started or a decision point is reachedI Opinion of a certain person or person with a

certain roleI Search for all relevant (parts of) meetings

where topic was discussed

I Experiments:I topic segmentation, browsingI summarization

HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI

corpus and set up a task scenario ourselves

Informal Content (Interviews, Internet TV):I Data features:

I Varying quality: semi- andnon-professional data creators

I Additional data: professionally oruser-generated metadata

I Experiments:I CLEF CL-SR: MALACH collection

I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of

semi-professional multimedia contentI known-item task, unknown

boundariesI Metrics: focus on ranking and penalize

distance from the jump-in point

HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the

user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information

Review of the challenges/our work for Informal SCR:

I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing

I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking

I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods

I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics

Page 33: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 8/48

Outline

- Spoken Content Retrieval: historical perspective

- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks

- Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect

- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers

Page 34: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 9/48

MediaEval: Multimedia Evaluation benchmarking initiative

- Evaluates new algorithms for multimedia access and retrieval.

- Emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.

- Innovates new tasks and techniques focusing on the human and social aspects of multimedia content.

Page 35: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 10/48

MediaEval 2011: Rich Speech Retrieval (RSR) Task

- Task Goal:
  - Information to be found: a combination of required audio and visual content, and the speaker's intention

Page 38: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 12/48

MediaEval 2011: Rich Speech Retrieval (RSR) Task

Conventional retrieval (diagram): Transcript 1 ≠ Transcript 2, Meaning 1 ≠ Meaning 2

Page 42: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 13/48

MediaEval 2011: Rich Speech Retrieval (RSR) Task

Extended speech retrieval (diagram): Transcript 1 = Transcript 2, but Meaning 1 ≠ Meaning 2 and Speech act 1 ≠ Speech act 2

Page 45: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 14/48

MediaEval 2012-2013: Search and Hyperlinking (S&H) Task Background

Page 46: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 15/48

MediaEval 2012-2013: S&H Task

Page 47: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 16/48

MediaEval 2012-2013: S&H Task and Crowdsourcing

Page 48: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 17/48

Outline

- Spoken Content Retrieval: historical perspective

- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks

- Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect

- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers

Page 49: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 18/48

What is crowdsourcing?

- Crowdsourcing is a form of human computation.
- Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task.
- A crowdsourcing system facilitates a crowdsourcing process.

- Factors to take into account:
  - Sufficient number of workers
  - Level of payment
  - Clear instructions
  - Possible cheating

Page 55: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 19/48

Results assessment

- Number of accepted HITs ≠ number of collected queries
- No overlap of workers in dev and test sets
- Creative work - creative cheating (a possible screening pass is sketched below):
  - Copy and paste the provided examples -> examples should be pictures, not texts
  - Choose the option "no speech act found in the video" -> manual assessment by the requester needed
- Workers rarely find noteworthy content later than the third minute from the start of the playback point in the video
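
A hypothetical screening pass for such submissions might flag answers that merely copy the provided examples, or that claim no speech act was found, for manual review. This is only an illustrative sketch; the names, threshold, and logic are assumptions, not the pipeline actually used for the MediaEval collections.

# Hypothetical screening pass for crowdsourced query submissions: flag answers
# that near-duplicate the provided examples or claim "no speech act found".
# Names and threshold are illustrative; flagged HITs still need manual review.

from difflib import SequenceMatcher

def needs_manual_review(answer_text, no_speech_act_selected, example_texts, sim_threshold=0.8):
    """Return True if the submission should not be auto-accepted."""
    if no_speech_act_selected:
        return True                      # the requester has to verify this claim
    for example in example_texts:
        similarity = SequenceMatcher(None, answer_text.lower(), example.lower()).ratio()
        if similarity >= sim_threshold:  # near-duplicate of a provided example
            return True
    return False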

Page 65: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 20/48

Crowdsourcing issues for multimedia retrieval collection creation

- It is possible to crowdsource extensive and complex tasks to support speech and language resources
- Use concepts and vocabulary familiar to the workers
- Pay attention to technical issues of watching the video
- Video preprocessing into smaller segments
- Creative work demands a higher reward level, or just a more flexible system
- High level of wastage due to task complexity

Page 71: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 21/48

Outline

- Spoken Content Retrieval: historical perspective

- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks

- Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect

- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers

Page 72: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 22/48

Dataset segment representation

Page 73: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 23/48

Approach 1: Fixed length segmentation

- Fixed length segmentation:
  - Number of words (including/excluding stop words)
  - Time slots
- Fixed length segmentation with sliding window (see the sketch below)
- Post-processing
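
A minimal sketch of fixed-length segmentation with a sliding window, assuming the transcript is available as a list of (token, start_time) pairs; the function and parameter names are illustrative, not the actual MediaEval tooling.

# Minimal sketch of fixed-length segmentation with a sliding window over a
# time-aligned transcript given as (token, start_time_in_seconds) pairs.
# Names and default sizes are illustrative assumptions.

def sliding_window_segments(words, length=60, step=30):
    """Cut a time-aligned transcript into overlapping fixed-length segments.

    words  : list of (token, start_time) pairs in temporal order
    length : segment size in tokens
    step   : shift between segment starts; step < length gives overlapping windows
    """
    segments = []
    for start in range(0, len(words), step):
        chunk = words[start:start + length]   # the last windows may be shorter
        if not chunk:
            break
        segments.append({
            "start_time": chunk[0][1],        # candidate jump-in point for this segment
            "end_time": chunk[-1][1],
            "text": " ".join(token for token, _ in chunk),
        })
    return segments

# Post-processing would typically merge or re-rank overlapping windows retrieved
# for the same query, so the user is not shown near-duplicate results.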

Page 82: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 28/48

Approach 2: Flexible length segmentation

- Speech or video units of varying length:
  - Speech: sentence, speech segment, silence points, changes of speakers
  - Video: shots
- Topical segmentation:
  - Lexical cohesion - C99, TextTiling (see the sketch below)
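
A minimal sketch of lexical-cohesion segmentation in the spirit of TextTiling: compare adjacent blocks of sentences and hypothesise topic boundaries at similarity minima. This is an illustration under assumed parameter names, not the C99 or TextTiling reference implementation.

# Minimal lexical-cohesion segmentation sketch (TextTiling-style): place topic
# boundaries where the vocabulary of adjacent sentence blocks stops overlapping.

from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def cohesion_boundaries(sentences, block=3, threshold=0.1):
    """Return sentence indices after which a topic boundary is hypothesised.

    sentences : list of tokenised sentences (lists of lowercased words)
    block     : number of sentences on each side of a candidate gap
    threshold : gaps with similarity below this value become boundaries
    """
    boundaries = []
    for gap in range(block, len(sentences) - block):
        left = Counter(w for s in sentences[gap - block:gap] for w in s)
        right = Counter(w for s in sentences[gap:gap + block] for w in s)
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries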

Page 85: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 29/48

Outline

- Spoken Content Retrieval: historical perspective

- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks

- Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect

- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers

Page 86: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 30/48

Evaluation: Search sub-task

Page 90: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 34/48

Evaluation: Search sub-task

- Mean Reciprocal Rank (MRR):

  RR = 1 / RANK

- Mean Generalized Average Precision (mGAP):

  GAP = (1 / RANK) · PENALTY
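
A minimal sketch of these two per-query scores, assuming a linear penalty on the distance (in seconds) between the returned jump-in point and the true one; the exact penalty function used in the benchmark may differ, and the names below are illustrative.

# Minimal sketch of RR and GAP for one query. The linear penalty shape and the
# 60-second cut-off are assumptions for illustration only.

def reciprocal_rank(rank):
    """RR = 1 / RANK of the first relevant segment (rank is 1-based)."""
    return 1.0 / rank

def generalized_ap(rank, returned_start, true_start, max_offset=60.0):
    """GAP = (1 / RANK) * PENALTY, where PENALTY shrinks linearly to 0
    as the jump-in point error grows towards max_offset seconds."""
    offset = abs(returned_start - true_start)
    penalty = max(0.0, 1.0 - offset / max_offset)
    return reciprocal_rank(rank) * penalty

# Example: a result at rank 2 whose start point is 15 s away from the true
# jump-in point gives RR = 0.5 and GAP = 0.5 * (1 - 15/60) = 0.375.
# MRR and mGAP are the means of these scores over all queries.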

Page 92: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 35/48

Evaluation: Search sub-task

- Mean Average Segment Precision (MASP): ranking + length of (ir)relevant content

  Segment Precision SP[r] at rank r

  Average Segment Precision:

  ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r)

  where rel(s_r) = 1 if relevant content is present, otherwise rel(s_r) = 0
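
A minimal sketch of ASP, under the assumption that SP[r] is the fraction of playback time that is relevant among all segment time retrieved up to rank r, and that n is the number of relevant segments for the query. This interpretation is illustrative; the official MASP definition may differ in detail.

# Minimal ASP sketch: combines rank with the length of (ir)relevant content
# the user has to listen to. Time-based SP[r] interpretation is an assumption.

def average_segment_precision(results, n_relevant):
    """results   : list of (segment_length_sec, relevant_length_sec) in rank order,
                   where relevant_length_sec > 0 means the segment hits relevant content.
    n_relevant : number of relevant segments for the query (normaliser n)."""
    total_len = 0.0   # playback time of all segments seen so far
    total_rel = 0.0   # playback time of relevant content seen so far
    asp = 0.0
    for seg_len, rel_len in results:
        total_len += seg_len
        total_rel += rel_len
        sp_r = total_rel / total_len if total_len else 0.0   # SP[r]
        if rel_len > 0:                                      # rel(s_r) = 1
            asp += sp_r
    return asp / n_relevant if n_relevant else 0.0

# MASP is the mean of ASP over all queries.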

Page 96: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 36/48

Evaluation: Search sub-task

Focus on Precision/Recall of the relevant content within the retrieved segment.
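
A minimal sketch of time-based precision and recall of relevant content for one retrieved segment, given the relevant time span from the ground truth. It is plain interval-overlap arithmetic; the names and the single-relevant-span assumption are illustrative.

# Time-based precision/recall of one retrieved segment against one relevant span.

def segment_precision_recall(seg_start, seg_end, rel_start, rel_end):
    """Precision: share of the retrieved segment that is relevant.
    Recall   : share of the relevant span covered by the retrieved segment."""
    overlap = max(0.0, min(seg_end, rel_end) - max(seg_start, rel_start))
    seg_len = max(seg_end - seg_start, 0.0)
    rel_len = max(rel_end - rel_start, 0.0)
    precision = overlap / seg_len if seg_len else 0.0
    recall = overlap / rel_len if rel_len else 0.0
    return precision, recall

# A 120 s segment starting at 300 s against relevant content in [330 s, 390 s]:
# precision = 60/120 = 0.5, recall = 60/60 = 1.0.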

Page 97: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 37/48

Outline

- Spoken Content Retrieval: historical perspective

- MediaEval Benchmark:
  - 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks

- Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect

- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers

Page 98: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 38/48

Experiments (RSR): Spontaneous Speech Search
Relationship Between Retrieval Effectiveness and Segmentation Methods

Segment:
- 100% Recall of the relevant content
- High Precision (30, 56%) of the relevant content
- Topic consistency

Page 106: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 46/48

Experiments (S&H)

- Fixed length segmentation with sliding window
- 2 transcripts (LIMSI, LIUM)

(Results figure comparing the LIMSI and LIUM transcripts)

Page 107: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 47/48

Segmentation requirements for effective SCR

- Segmentation plays a significant role in retrieving relevant content.
- High recall and precision of the relevant content within the segment leads to good segment ranking.
- Related metadata can be useful to improve the ranking of a segment with high recall that also contains non-relevant content.
- Influence of ASR quality:
  - The effect of errors is not straightforward; it can be smoothed by the use of context and query-dependent treatment of the transcript.
  - ASR system vocabulary variability: longer segments have higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM).
- Multimodal queries: adding visual information decreases performance.

Page 114: Focus on spoken content in multimedia retrieval

Focus on spoken content in multimedia retrieval 48/48

Thank you for your attention!

Questions?