Invited talk at the University of Texas at El Paso
Focus on spoken content in multimedia retrieval

Maria Eskevich
Centre for Next Generation Localisation, School of Computing,
Dublin City University, Dublin, Ireland

April 16, 2013
Outline

- Spoken Content Retrieval: historical perspective
- MediaEval benchmark: 3 years of Spoken Content Retrieval experiments
  (Rich Speech Retrieval and Search and Hyperlinking tasks)
- Dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect
- Interesting observations on results:
  - Segmentation methods
  - Evaluation metrics
  - Numbers
Towards Effective Retrieval of Spontaneous Conversational Spoken Content

Information Retrieval (IR) vs. Spoken Content Retrieval (SCR)

[Diagram: in a standard IR system, queries formulated from an information request are matched by an IR model against indexed documents to produce results. An SCR system differs at the front end: the audio files first pass through a speech-processing step (Automatic Speech Recognition, ASR) that turns the audio data collection into transcripts, and retrieval then runs over the indexed transcripts.]
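To make the SCR branch of the diagram concrete, here is a minimal sketch in Python, assuming toy invented transcripts and using plain TF-IDF with cosine similarity as a stand-in for whatever IR model a real system would plug in: ASR output is indexed like ordinary text and ranked against a query.

```python
# Minimal SCR sketch: ASR transcript segments are indexed like ordinary
# documents, and a standard IR model (here TF-IDF + cosine) ranks them.
# All data below is invented for illustration.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(transcripts):
    """Turn {segment_id: transcript_text} into unit-normalized TF-IDF vectors."""
    docs = {sid: Counter(tokenize(text)) for sid, text in transcripts.items()}
    n = len(docs)
    df = Counter(term for tf in docs.values() for term in tf)  # document freq
    idf = {t: math.log(n / df[t]) for t in df}
    index = {}
    for sid, tf in docs.items():
        vec = {t: freq * idf[t] for t, freq in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        index[sid] = {t: w / norm for t, w in vec.items()}
    return index, idf

def retrieve(query, index, idf):
    """Rank indexed segments by cosine similarity to the query."""
    qvec = {t: f * idf.get(t, 0.0) for t, f in Counter(tokenize(query)).items()}
    scores = {sid: sum(qvec.get(t, 0.0) * w for t, w in vec.items())
              for sid, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy ASR output; in a real system these strings come from an ASR decoder.
transcripts = {
    "seg1": "the committee discussed the budget for next year",
    "seg2": "weather report heavy rain expected in dublin",
    "seg3": "budget cuts were announced during the meeting",
}
index, idf = build_index(transcripts)
print(retrieve("budget meeting", index, idf))  # seg3 ranks first
```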
Spoken Content Retrieval (SCR)

[Diagram: spoken content (the data) goes through an ASR system; the transcript is segmented and indexed; retrieval over the indexed transcript produces a ranked result list, which is scored with evaluation metrics. The research questions attach to these stages: RQ1 to indexing, RQ2 to RQ4 to the retrieval experiments.]

Research Question 1: How does segmentation of spoken data affect retrieval performance? What are the characteristics of a segmentation method that maximizes SCR effectiveness?

Research Question 2: What is the relationship between ASR errors in the transcript and retrieval behaviour?

Research Question 3: How can regions of poor speech recognition be identified and processed in order to improve overall speech retrieval performance (detection, special treatment in the speech retrieval process)?

Research Question 4: Can we implement a meaningful approach to SCR of conversational content, incorporating task-specific segmentation and special treatment of regions with unreliable ASR output?
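RQ1 asks how the segmentation of spoken data affects retrieval. One common baseline, sketched below under the assumption of a time-aligned transcript with invented word timings, cuts the transcript into fixed-length, overlapping windows and treats each window as a retrieval unit; this is the kind of method the segmentation experiments later in the talk vary.

```python
# Illustrative baseline for RQ1: fixed-length, overlapping time windows
# over a time-aligned ASR transcript. Word timings below are invented.

def fixed_length_segments(words, length=60.0, step=30.0):
    """Group (start_time, token) pairs, assumed sorted by time, into
    windows of `length` seconds, advancing by `step` seconds
    (step < length gives overlapping windows)."""
    if not words:
        return []
    segments = []
    t0, end = words[0][0], words[-1][0]
    while t0 <= end:
        tokens = [w for t, w in words if t0 <= t < t0 + length]
        if tokens:
            segments.append((t0, " ".join(tokens)))
        t0 += step
    return segments

# Toy time-aligned ASR output: (start time in seconds, recognized word).
words = [(0.0, "welcome"), (1.2, "to"), (1.5, "the"), (1.8, "budget"),
         (35.0, "meeting"), (70.0, "next"), (71.0, "topic"), (72.5, "weather")]
for start, text in fixed_length_segments(words, length=60.0, step=30.0):
    print(f"{start:6.1f}s  {text}")
```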
Outline: Spoken Content

[The same SCR pipeline diagram as above, now highlighting the spoken content that feeds it.]
Spoken Content Retrieval: historical perspective

Spoken content spans a spectrum from prepared speech to informal conversational speech:
Broadcast News -> Lectures -> Meetings -> Informal Content (Internet TV, podcasts, interviews)
Broadcast News:

- Data:
  - High-quality recordings: often a soundproof studio; the speaker is a professional presenter
  - Well-defined structure
  - A query is on a certain topic: the user is ready to listen to the whole section
- Experiments: TREC SDR (1997-2000)
  - Known-item search and ad-hoc retrieval
  - Search with and without fixed story boundaries
  - Evaluation: interest in rank position

HIGHLIGHT: "Success story" (Garofolo et al., 2000):
performance on ASR transcripts ≈ performance on manual transcripts,
thanks to good ASR (large amounts of training data) and the structure of the data.

CHALLENGE: speech in broadcast news is close to written text,
and differs from the informal content of spontaneous speech.
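The "ASR transcript ≈ manual transcript" highlight is conventionally quantified with word error rate (WER), the word-level edit distance between the ASR hypothesis and the manual reference. A minimal sketch of the standard computation (example strings invented):

```python
# Word error rate: (substitutions + deletions + insertions) / reference length,
# computed with a standard Levenshtein dynamic programme over words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the budget was approved today",
          "a budget was proved today"))  # 2 errors / 5 words = 0.4
```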
Lectures:

- Data:
  - Prepared presentations containing conversational-style features: hesitations, mispronunciations
  - Specialized vocabulary: Out-Of-Vocabulary words; lecture-specific words may have low probability scores in the ASR language model
  - Additional information available: presentation slides, textbooks
- Experiments:
  - Lecture browsing: e.g. TalkMiner, MIT lectures, eLectures
  - SpokenDoc(2) tasks at NTCIR-9 and NTCIR-10: e.g. IR experiments, evaluation metrics that assess topic segmentation methods

HIGHLIGHT/CHALLENGE:
- Focus on segmentation methods and jump-in points
Meetings:

- Data features:
  - Mixture of semi-formal and prepared spoken content
  - Additional data: slides, minutes
- Possible real-life scenarios:
  - Jump-in points where discussion of a topic started or a decision point was reached
  - Opinion of a certain person, or of a person with a certain role
  - Search for all relevant (parts of) meetings where a topic was discussed
- Experiments: topic segmentation, browsing, summarization

HIGHLIGHT/CHALLENGE:
- No unified search scenario
- We created a test retrieval collection on the basis of the AMI corpus and set up a task scenario ourselves
Informal Content (interviews, Internet TV):

- Data features:
  - Varying quality: semi- and non-professional data creators
  - Additional data: professionally or user-generated metadata
- Experiments:
  - CLEF CL-SR: MALACH collection; known- and unknown-boundary conditions, ad-hoc task
  - MediaEval '11, '12, '13: retrieval of semi-professional multimedia content; known-item task, unknown boundaries
  - Metrics: focus on ranking and penalize distance from the jump-in point (a toy version is sketched after this list)

HIGHLIGHT/CHALLENGE:
- The metrics do not always take into account how much time the user needs to spend listening before reaching the relevant content
- Diversity of the informal multimedia content
- The search scenario is no longer limited to factual information
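As a toy illustration of a metric that penalizes distance from the jump-in point (an illustrative score in the spirit of the metrics named above, not the exact MediaEval or CLEF definition):

```python
# Known-item scoring sketch: the first hit on the relevant item contributes
# 1/rank, discounted linearly by how far the returned entry point lies from
# the true jump-in point. All names and numbers below are invented.

def penalized_reciprocal_rank(results, relevant_item, true_start,
                              tolerance=30.0):
    """`results` is a ranked list of (item_id, start_time) pairs.
    Offsets beyond `tolerance` seconds from `true_start` score 0."""
    for rank, (item, start) in enumerate(results, start=1):
        if item == relevant_item:
            offset = abs(start - true_start)
            penalty = max(0.0, 1.0 - offset / tolerance)
            return penalty / rank
    return 0.0

ranked = [("videoB", 10.0), ("videoA", 95.0), ("videoC", 0.0)]
# Relevant content starts at 80.0s in videoA; our entry point is 15s off.
print(penalized_reciprocal_rank(ranked, "videoA", true_start=80.0))
# -> (1 - 15/30) / 2 = 0.25
```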
Review of the challenges and our work for informal SCR:

- The framework of a retrieval experiment has to be set up: retrieval collections must be created.
  Our work: we collected new multimodal retrieval collections via crowdsourcing.
- ASR errors decrease IR results.
  Our work: we examined the deeper relationship between ASR performance and result ranking.
- Suitable segmentation is vital.
  Our work: we carry out experiments with varying methods.
- There is a need for metrics that reflect all aspects of the user experience.
  Our work: we created a new set of metrics.
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast NewsBroadcast News
LecturesLectures
MeetingsMeetings
Informal ContentInformal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
LecturesLectures
MeetingsMeetings
Informal ContentInformal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
MeetingsMeetings
Informal ContentInformal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
Meetings
Meetings
Informal ContentInformal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast NewsBroadcast News
Lectures
Lectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast NewsBroadcast News
Lectures
Lectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
LecturesLectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
LecturesLectures
Meetings
Meetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
MeetingsMeetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 7/48
Spoken Content Retrieval: historical perspectiveSpoken ContentPrepared Speech
InformalConversational
Speech
Broadcast News
Broadcast News
Lectures
Lectures
MeetingsMeetings
Informal Content
Informal Content
Internet TV,Podcast, Interview
Internet TV,Podcast, Interview
Broadcast News:
I DataI High quality recordings:
I Often soundproof studioI Speaker - professional presenter
I Well defined structureI Query is on a certain topic:
User is ready to listen to the whole section
I Experiments: TREC SDR (1997-2000)I Known-item search and ad-hoc retrievalI Search with and without fixed story boundaries
I Evaluation: interest in rank position
HIGHLIGHT: ”Success story” (Garofolo et al., 2000):Performance on ASR Transcript ≈ Manual Transcript
I ASR good: large amounts of training dataI Data structure
CHALLENGE:Speech data in broadcast news is close to the written text,and differs from the informal content of spontaneous speech
Lectures:I Data:
I Prepared presentations containingconversational style features:hesitations, mispronunciations
I Specialized vocabularyI Out-Of-Vocabulary wordsI Lecture specific words may have low
probability scores in the ASR languagemodel
I Additional information available:presentation slides, textbooks
I Experiments:I Lectures browsing:
e.g. TalkMiner, MIT lectures, eLecturesI SpokenDoc(2) Tasks at NTCIR-9, NTCIR-10:
e.g. IR experiments, evaluation metrics thatassess topic segmentation methods
HIGHLIGHT/CHALLENGE:I Focus on segmentation methods, jump-in
points
Meetings:
I Data features:I Mixture of semi-formal and prepared spoken
contentI Additional data: slides, minutes
I Possible real life motivated scenario:I Jump-in points where discussion on topic
started or a decision point is reachedI Opinion of a certain person or person with a
certain roleI Search for all relevant (parts of) meetings
where topic was discussed
I Experiments:I topic segmentation, browsingI summarization
HIGHLIGHT/CHALLENGE:I No unified search scenarioI We created a test retrieval collection on the basis of AMI
corpus and set up a task scenario ourselves
Informal Content (Interviews, Internet TV):I Data features:
I Varying quality: semi- andnon-professional data creators
I Additional data: professionally oruser-generated metadata
I Experiments:I CLEF CL-SR: MALACH collection
I un/known-boundaries, ad-hoc taskI MediaEval’11,’12,’13: retrieval of
semi-professional multimedia contentI known-item task, unknown
boundariesI Metrics: focus on ranking and penalize
distance from the jump-in point
HIGHLIGHT/CHALLENGE:I Metric does not always take into account how much time the
user needs to spend listening to access the relevant contentI Diversity of the informal multimedia contentI Search scenario no longer limited to factual information
Review of the challenges/our work for Informal SCR:
I Framework of retrieval experiment has to be setup: retrieval collections to be createdOur work: We collected new multimodal retrievalcollections via crowdsourcing
I ASR errors decrease IR resultsOur work: We examined deeper relationshipbetween ASR performance and results ranking
I Suitable segmentation is vitalOur work: We carry out experiments with varyingmethods
I Need for metrics that reflect all aspects of userexperienceOur work: We created a new set of metrics
Focus on spoken content in multimedia retrieval 8/48
Outline
I Spoken Content Retrieval: historical perspective
I MediaEval Benchmark:
I 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
I Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
I Interesting observations on results:
I Segmentation methods
I Evaluation metrics
I Numbers
Focus on spoken content in multimedia retrieval 9/48
MediaEval
Multimedia Evaluation benchmarking initiative
I Evaluates new algorithms for multimedia access and retrieval.
I Emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
I Innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
Focus on spoken content in multimedia retrieval 10/48
MediaEval 2011
Rich Speech Retrieval (RSR) Task
I Task Goal:
I Information to be found - a combination of the required audio and visual content, and the speaker's intention
Focus on spoken content in multimedia retrieval 12/48
MediaEval 2011
Rich Speech Retrieval (RSR) Task
Conventional retrieval:
Transcript 1 ≠ Transcript 2
Meaning 1 ≠ Meaning 2
Focus on spoken content in multimedia retrieval 13/48
MediaEval 2011
Rich Speech Retrieval (RSR) Task
Extended speech retrieval:
Transcript 1 = Transcript 2
Meaning 1 ≠ Meaning 2
Speech act 1 ≠ Speech act 2
Focus on spoken content in multimedia retrieval 14/48
MediaEval 2012-2013: Search and Hyperlinking (S&H) Task Background
Focus on spoken content in multimedia retrieval 15/48
MediaEval 2012-2013: S&H Task
Focus on spoken content in multimedia retrieval 16/48
MediaEval 2012-2013: S&H Task and Crowdsourcing
Focus on spoken content in multimedia retrieval 17/48
Outline
I Spoken Content Retrieval: historical perspective
I MediaEval Benchmark:
I 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
I Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
I Interesting observations on results:
I Segmentation methods
I Evaluation metrics
I Numbers
Focus on spoken content in multimedia retrieval 18/48
What is crowdsourcing?
I Crowdsourcing is a form of human computation.
I Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task.
I A crowdsourcing system facilitates a crowdsourcing process.
I Factors to take into account:
I Sufficient number of workers
I Level of payment
I Clear instructions
I Possible cheating
Focus on spoken content in multimedia retrieval 19/48
Results assessment
I Number of accepted HITs ≠ number of collected queries
I No overlap of workers in dev and test sets
I Creative work - Creative Cheating:
I Copy and paste provided examples
→ Examples should be pictures, not texts
I Choose the option of no speech act found in the video
→ Manual assessment by requester needed
I Workers rarely find noteworthy content later than the third minute from the start of the playback point in the video
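These screening heuristics can be scripted before the manual pass; below is a minimal sketch in Python, assuming hypothetical record fields (worker_id, query_text, no_speech_act_found) and an illustrative example-text list — not the actual pipeline used for the MediaEval collections.

# Minimal sketch of the screening heuristics above. Field names and the
# example-text list are hypothetical; not the actual MediaEval pipeline.

EXAMPLE_TEXTS = {
    "how do i bake bread at home",  # illustrative example query from the HIT instructions
}

def screen_hits(hits, dev_workers, test_workers):
    """Split HIT records into accepted queries and records needing manual review."""
    overlap = dev_workers & test_workers  # workers must not appear in both sets
    accepted, needs_review = [], []
    for hit in hits:
        query = hit["query_text"].strip().lower()
        if query in EXAMPLE_TEXTS:
            needs_review.append((hit, "copied a provided example"))
        elif hit.get("no_speech_act_found"):
            needs_review.append((hit, "'no speech act found': requester must verify"))
        elif hit["worker_id"] in overlap:
            needs_review.append((hit, "worker appears in both dev and test sets"))
        else:
            accepted.append(hit)
    return accepted, needs_review

# Usage: accepted, review = screen_hits(hits, {"w1", "w2"}, {"w3"})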
Focus on spoken content in multimedia retrieval 20/48
Crowdsourcing issues
for multimedia retrieval collection creation
I It is possible to crowdsource extensive and complex tasks to support speech and language resources
I Use concepts and vocabulary familiar to the workers
I Pay attention to technical issues of watching the video
I Video preprocessing into smaller segments
I Creative work demands a higher reward level, or just a more flexible system
I High level of wastage due to task complexity
Focus on spoken content in multimedia retrieval 21/48
Outline
I Spoken Content Retrieval: historical perspective
I MediaEval Benchmark:
I 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
I Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
I Interesting observations on results:
I Segmentation methods
I Evaluation metrics
I Numbers
Focus on spoken content in multimedia retrieval 22/48
Dataset segment representation
Focus on spoken content in multimedia retrieval 23/48
Approach 1: Fixed length segmentation
I Fixed length segmentation
I Number of words (including/excluding stop words)
I Time slots
I Fixed length segmentation with sliding window (see the sketch below)
I Post-processing
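As referenced above, a minimal sketch of fixed-length segmentation with a sliding window, assuming the transcript is given as (start_time_in_seconds, word) pairs; the 60 s window and 30 s step are illustrative values, not those used in the experiments.

# Minimal sketch: fixed-length segmentation with a sliding window over a
# time-stamped transcript, given as (start_time_in_seconds, word) pairs.
# Window and step sizes are illustrative, not the values used in the talk.

def sliding_window_segments(transcript, window=60.0, step=30.0):
    """Yield (segment_start, segment_end, words) for overlapping windows."""
    if not transcript:
        return
    end_of_audio = transcript[-1][0]
    start = 0.0
    while start <= end_of_audio:
        stop = start + window
        words = [w for t, w in transcript if start <= t < stop]
        if words:  # skip windows that fall entirely in silence
            yield (start, stop, words)
        start += step

# Usage: each yielded segment becomes one indexable "document" for the IR system.
transcript = [(0.5, "welcome"), (1.2, "to"), (1.4, "the"), (1.9, "show")]
for seg_start, seg_stop, words in sliding_window_segments(transcript):
    print(seg_start, seg_stop, " ".join(words))

Overlapping windows trade a larger index for a better chance that some segment starts close to the true jump-in point.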
Focus on spoken content in multimedia retrieval 28/48
Approach 2: Flexible length segmentation
I Speech or video units of varying length
I Speech: sentences, speech segments, silence points, changes of speakers
I Video: shots
I Topical segmentation
I Lexical cohesion - C99, TextTiling (see the sketch below)
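For the lexical-cohesion family, here is a minimal sketch using NLTK's TextTiling implementation; the toy transcript and the w/k parameters are illustrative. TextTiling expects blank-line paragraph breaks, reasonably long input, and NLTK's stopword list to be downloaded.

# Minimal sketch of topical segmentation by lexical cohesion (TextTiling),
# using NLTK's implementation; the toy transcript is a stand-in for real
# ASR output joined into paragraph-separated text.
# Requires: pip install nltk, then nltk.download('stopwords')
from nltk.tokenize import TextTilingTokenizer

transcript = "\n\n".join([
    "flour eggs butter pancake recipe kitchen breakfast batter " * 12,
    "pancake frying pan syrup breakfast batter kitchen recipe " * 12,
    "football match referee goal penalty league season team " * 12,
    "players championship stadium league football season match " * 12,
])

tt = TextTilingTokenizer(w=20, k=5)  # pseudo-sentence size, block comparison size
segments = tt.tokenize(transcript)   # list of topically coherent chunks
for i, segment in enumerate(segments, 1):
    print(f"segment {i}: {segment[:60]!r}")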
Focus on spoken content in multimedia retrieval 29/48
Outline
I Spoken Content Retrieval: historical perspective
I MediaEval Benchmark:
I 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
I Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
I Interesting observations on results:
I Segmentation methods
I Evaluation metrics
I Numbers
Focus on spoken content in multimedia retrieval 30/48
Evaluation: Search sub-task
Focus on spoken content in multimedia retrieval 34/48
Evaluation: Search sub-task
I Mean Reciprocal Rank (MRR):
RR = 1 / RANK
I Mean Generalized Average Precision (mGAP):
GAP = 1 / (RANK · PENALTY)
where the penalty grows with the distance between the retrieved entry point and the relevant jump-in point
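A minimal sketch of the two per-query scores follows; the exponential penalty over the start-time offset is an illustrative choice, not the official task definition, and averaging over all queries yields MRR and mGAP.

# Minimal sketch of per-query RR and GAP. The exponential penalty over the
# start-time offset (in seconds) is an illustrative choice, not the official
# MediaEval definition; MRR and mGAP are the means over all queries.

def reciprocal_rank(rank):
    """RR = 1 / RANK of the first relevant result; 0 if nothing relevant found."""
    return 1.0 / rank if rank else 0.0

def generalized_ap(rank, offset_seconds, half_life=60.0):
    """GAP = 1 / (RANK * PENALTY), with PENALTY >= 1 growing with the offset."""
    if not rank:
        return 0.0
    penalty = 2.0 ** (offset_seconds / half_life)  # doubles every half_life seconds
    return 1.0 / (rank * penalty)

# One query: relevant item at rank 3, entry point 90 s from the jump-in point.
print(reciprocal_rank(3))       # 0.3333...
print(generalized_ap(3, 90.0))  # ~0.1179, discounted for the 90 s offset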
Focus on spoken content in multimedia retrieval 35/48
Evaluation: Search sub-task
I Mean Average Segment Precision (MASP): ranking + length of (ir)relevant content
Segment Precision (SP[r]) at rank r: time-based precision, i.e. the length of relevant content retrieved down to rank r over the total length retrieved
Average Segment Precision:
ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r)
where rel(s_r) = 1 if segment s_r contains relevant content, otherwise rel(s_r) = 0, and n is the number of segments containing relevant content
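A minimal sketch under the reading above, with each ranked segment given as (segment length, length of relevant content within it); the input format is illustrative, not the official scorer's. MASP is then the mean of ASP over all queries.

# Minimal sketch of Average Segment Precision under the reading above:
# each ranked result is (segment_length_s, relevant_length_s); SP[r] is the
# time-based precision over ranks 1..r. Illustrative, not the official scorer.

def average_segment_precision(ranked):
    """ranked: list of (segment_length_s, relevant_length_s) in rank order."""
    total_time = 0.0
    relevant_time = 0.0
    sp_sum = 0.0
    n_relevant = sum(1 for _, rel in ranked if rel > 0)
    if n_relevant == 0:
        return 0.0
    for seg_len, rel_len in ranked:
        total_time += seg_len
        relevant_time += rel_len
        sp = relevant_time / total_time  # SP[r]
        if rel_len > 0:                  # rel(s_r) = 1
            sp_sum += sp
    return sp_sum / n_relevant           # ASP = (1/n) * sum of SP[r] * rel(s_r)

# Two relevant segments ranked 1 and 3; rank 2 is irrelevant filler.
print(average_segment_precision([(60, 30), (60, 0), (120, 60)]))  # 0.4375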
Focus on spoken content in multimedia retrieval 36/48
Evaluation: Search sub-task
Focus on Precision/Recall of the relevant content within the retrieved segment.
Focus on spoken content in multimedia retrieval 37/48
Outline
I Spoken Content Retrieval: historical perspective
I MediaEval Benchmark:
I 3 years of Spoken Content Retrieval experiments: Rich Speech Retrieval and Search and Hyperlinking tasks
I Dataset collection creation issues for multimedia retrieval: crowdsourcing aspect
I Interesting observations on results:
I Segmentation methods
I Evaluation metrics
I Numbers
Focus on spoken content in multimedia retrieval 38/48
Experiments (RSR): Spontaneous Speech Search
Relationship Between Retrieval Effectiveness and Segmentation Methods
Segment:
I 100 % Recall of the relevant content
I High Precision (30 %, 56 %) of the relevant content
I Topic consistency
Focus on spoken content in multimedia retrieval 39/48
Experiments (RSR): Spontaneous Speech Search
Relationship Between Retrieval Effectiveness and Segmentation Methods
[Results figures omitted in this transcript]
Focus on spoken content in multimedia retrieval 46/48
Experiments (S&H)
I Fixed length segmentation with sliding window
I 2 transcripts (LIMSI, LIUM)
[Results figures omitted: LIMSI and LIUM transcripts]
Focus on spoken content in multimedia retrieval 47/48
Segmentation requirements for effective SCR
I Segmentation plays a significant role in retrieving relevant content
I High recall and precision of the relevant content within the segment lead to good segment ranking.
I Related metadata can be useful to improve the ranking of a segment with high recall that also contains non-relevant content.
I Influence of ASR quality:
I The effect of errors is not straightforward; it can be smoothed by the use of context and query-dependent treatment of the transcript.
I ASR system vocabulary variability: longer segments have higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM).
I Multimodal queries: the addition of visual information decreases performance.
Focus on spoken content in multimedia retrieval 48/48
Thank you for your attention!
Questions?