Description
We describe Dublin City University (DCU)'s participation in the Search sub-task of the Search and Hyperlinking Task at MediaEval 2014. Exploratory experiments were carried out to investigate the utility of prosodic prominence features in the task of retrieving relevant video segments from a collection of BBC videos. Normalised acoustic correlates of loudness, pitch, and duration were incorporated in a standard TF-IDF weighting scheme to increase weights for terms that were prominent in speech. Prosodic models outperformed a text-based TF-IDF baseline on the training set but failed to surpass the baseline on the test set.
DCU Search Runs at MediaEval 2014 Search and Hyperlinking
David N. Racca, Maria Eskevich, Gareth J.F. Jones
CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University
Dublin, Ireland
Overview
Novelty of Approach for MediaEval 2014:
● What: integrate prosodic prominence into the IR weighting scheme
● How: increase the weight of prosodically prominent terms
● Effect: promote the rank of retrieved segments containing prominent terms
Motivation
● Speech is much more than a “bag of words”
● Prosodic information: e.g. intonation, duration, and loudness
● Shown to be useful in many speech processing tasks:
● emotion recognition, discourse structure, speech acts, speaker ID, focus, contrast, topic shifting
Related Work
● Possible correlation between acoustic stress and TF-IDF scores in English (Crestani 2001)
● Use of signal amplitude and duration in a spoken content retrieval task in Chinese (Chen et al. 2001)
● Use of pitch and intensity in a topic tracking task in French (Guinaudeau 2011)
Approach
● We followed Guinaudeau's method
● We implemented a spoken content retrieval (SCR) system that gives greater importance to terms that are prosodically prominent in the spoken content
● Prosodically prominent terms stand out from their surroundings by means of their acoustic characteristics
Approach: Indexing
[Indexing pipeline diagram]
● LVCSR transcribes the video collection into automatic text transcripts (XML)
● Feature extraction computes prosodic features (F0, loudness) every ~10 ms (CSV)
● Transcripts are merged with the prosodic information: normalised max and min of F0 and loudness are attached to terms (XML)
● Non-overlapping fixed-length time segmentation produces segments with prosodic information (XML)
● Terrier indexing (stopword removal, stemming) builds the segments index
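The per-term feature step of this pipeline can be illustrated with a minimal sketch. The data layout (frame-level F0 and loudness values at ~10 ms steps) and the function names are assumptions for illustration, not the actual DCU implementation:

```python
# Hypothetical sketch of the per-term prosodic feature step:
# frame-level F0 and loudness values (~10 ms frames) are min-max
# normalised over a context, and each term keeps the max and min
# of the normalised values across the frames it spans.

def min_max_normalise(frames):
    """Scale a list of frame values into [0, 1] over the given context."""
    lo, hi = min(frames), max(frames)
    if hi == lo:                    # flat contour: avoid division by zero
        return [0.0 for _ in frames]
    return [(v - lo) / (hi - lo) for v in frames]

def term_prosody(f0_frames, loud_frames, start, end):
    """Return (max F0, min F0, max loudness, min loudness), normalised,
    for the frames [start, end) covering one spoken term."""
    f0_n = min_max_normalise(f0_frames)
    loud_n = min_max_normalise(loud_frames)
    f0_span, loud_span = f0_n[start:end], loud_n[start:end]
    return (max(f0_span), min(f0_span), max(loud_span), min(loud_span))
```

The choice of the context over which `min_max_normalise` runs matters; the slides consider both a per-speech-segment and a full-transcript context.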
Approach: Indexing
[Example of the prosody-augmented segments index]
Each entry stores a term's idf and a posting list of (docid, tf, one [F0, L, D] vector per occurrence):

car   350   (1, 3, [.78, .34, .20], [.56, .23, .25], [.66, 1.0, .33]) (4, 1, [.23, .10, .15]) ⋯
race   45   (1, 1, [.81, .98]) (9, 2, [.23, .10], [.54, .27]) ⋯
[Diagram: contexts for prosodic normalisation]
● Speech segment context: features normalised within each speech segment (Speech Seg 0, Speech Seg 1, …, Speech Seg N)
● Full transcript context: features normalised over the full transcript
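The prosody-augmented posting list can be sketched as a simple in-memory structure, here populated with the "car" entry from the index example. The class and field names are hypothetical, not the actual DCU code:

```python
# Hypothetical representation of the prosody-augmented inverted index:
# besides docid and tf, each posting keeps one [F0, loudness, duration]
# triple per occurrence of the term in that document.
from dataclasses import dataclass, field

@dataclass
class Posting:
    docid: int
    tf: int
    prosody: list  # one [F0, L, D] triple per term occurrence

@dataclass
class TermEntry:
    idf: float
    postings: list = field(default_factory=list)

# Values copied from the slide's "car" example.
index = {
    "car": TermEntry(idf=350, postings=[
        Posting(docid=1, tf=3,
                prosody=[[.78, .34, .20], [.56, .23, .25], [.66, 1.0, .33]]),
        Posting(docid=4, tf=1, prosody=[[.23, .10, .15]]),
    ]),
}
```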
Approach: Retrieval
Terrier matching over the segments index scores term $t$ in segment $s$ as:

$w_s(t) = \dfrac{\theta_{ir} \cdot (idf_t \cdot tf_t) + \theta_{ac} \cdot ac_t}{\theta_{ir} + \theta_{ac}}$
Query: “car race” → Result list (rank, programme, start, end, score):
1   Traffic cops    0.00    1.30   3.35
2   Top gear        1.30    3.00   2.82
3   Top gear       48.00   49.30   2.73
...
Acoustic score $ac_t$:
● $ac_t = \max(dur_t)$   (G-dur)
● $ac_t = \max(loud_t) \cdot \max(F0_t)$   (G-pr, G-lp)
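The interpolated matching score can be sketched as follows. Function names are hypothetical; the interpolation weights $\theta_{ir}$, $\theta_{ac}$ take the values shown in the run tables (e.g. TF-IDF = (1, 0), G-lp = (3, 1)):

```python
# Sketch of the interpolated segment score: a weighted mix of the
# TF-IDF score and the acoustic prominence score ac_t, normalised
# by the sum of the interpolation weights.

def w_s(tf: float, idf: float, ac: float,
        theta_ir: float, theta_ac: float) -> float:
    """Score of a term in a segment."""
    return (theta_ir * (idf * tf) + theta_ac * ac) / (theta_ir + theta_ac)

def ac_g_dur(durations):
    # G-dur: maximum normalised duration over the term's occurrences.
    return max(durations)

def ac_g_lp(loudness, f0):
    # Loudness-pitch variant: product of the maximum normalised
    # loudness and the maximum normalised F0.
    return max(loudness) * max(f0)

# With theta_ac = 0 the score reduces to plain TF-IDF (the baseline runs).
baseline = w_s(tf=3, idf=2.0, ac=0.9, theta_ir=1, theta_ac=0)
```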
Results: Development Set
● Known-item task: 50 queries, 1335 hours of video

Transcript      Model    θ_ir  θ_ac   MRR
Subtitles       TF-IDF    1     0    .428
Subtitles       G-pr      2     3    .126
Subtitles       G-lp      3     1    .281
Subtitles       G-dur     1     3    ---
NST/Sheffield   TF-IDF    1     0    .313
NST/Sheffield   G-pr      2     3    .329
NST/Sheffield   G-lp      3     1    .319
NST/Sheffield   G-dur     1     3    ---
LIMSI           TF-IDF    1     0    .259
LIMSI           G-pr      2     3    .264
LIMSI           G-lp      3     1    .285
LIMSI           G-dur     1     3    .279
LIUM            TF-IDF    1     0    .296
LIUM            G-pr      2     3    .253
LIUM            G-lp      3     1    ---
LIUM            G-dur     1     3    ---
Results: Test Set
● Ad-hoc retrieval task: 36 queries, 2686 hours of video
● MAP is Overlap MAP

Transcript      Model    θ_ir  θ_ac   MAP
Subtitles       TF-IDF    1     0    .639
Subtitles       G-pr      2     3    .599
Subtitles       G-lp      3     1    .533
Subtitles       G-dur     1     3    .345
NST/Sheffield   TF-IDF    1     0    .440
NST/Sheffield   G-pr      2     3    .434
NST/Sheffield   G-lp      3     1    .435
NST/Sheffield   G-dur     1     3    .404
LIMSI           TF-IDF    1     0    .525
LIMSI           G-pr      2     3    .508
LIMSI           G-lp      3     1    .428
LIMSI           G-dur     1     3    .505
LIUM            TF-IDF    1     0    .451
LIUM            G-pr      2     3    .444
LIUM            G-lp      3     1    .436
LIUM            G-dur     1     3    .358
Conclusion
● We explored the potential of prosodic information in the Search and Hyperlinking task
● Prosodic models outperformed a text-based baseline on the development set but failed to do so on the test set. Possible reasons:
● known-item vs ad-hoc task
● differences in query length
● other differences in query style
References
● F. Crestani. Towards the use of prosodic information for spoken document retrieval. SIGIR '01.
● B. Chen, H.-M. Wang, and L.-S. Lee. Improved spoken document retrieval by exploring extra acoustic and linguistic cues. INTERSPEECH '01.
● C. Guinaudeau and J. Hirschberg. Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news. INTERSPEECH '11.