“Semi-automated” – e.g. FAVE (fave.ling.upenn.edu)
• Alignment: automated with dynamic programming
• Formant extraction: automated with LPC
• Transcription: manual
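As a rough illustration of LPC-based formant extraction, the sketch below fits a linear predictor by least squares and reads resonance frequencies off the predictor's poles. It is not FAVE's code: the function names `lpc_formants` and `synthetic_vowel`, the 10 kHz sample rate, and the two-resonance test signal are all assumptions made for the example. Real speech would also need pre-emphasis, windowing, and a higher model order.

```python
import numpy as np

def lpc_formants(signal, sample_rate, order):
    """Estimate resonance frequencies (Hz) by least-squares linear prediction."""
    x = np.asarray(signal, dtype=float)
    # Column k holds x[n-k]; each row lists the `order` samples before x[n].
    past = np.column_stack(
        [x[order - k:len(x) - k] for k in range(1, order + 1)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    # Poles of the all-pole model: roots of z^p - a1*z^(p-1) - ... - ap.
    poles = np.roots(np.concatenate(([1.0], -coeffs)))
    # Keep one pole from each complex-conjugate pair.
    poles = poles[poles.imag > 1e-9]
    freqs = np.angle(poles) * sample_rate / (2 * np.pi)
    return np.sort(freqs)

def synthetic_vowel(f1, f2, sample_rate=10000, n_samples=400, decay=0.99):
    """Hypothetical test signal: two damped cosines standing in for F1 and F2."""
    n = np.arange(n_samples)
    envelope = decay ** n
    return envelope * (
        np.cos(2 * np.pi * f1 * n / sample_rate)
        + np.cos(2 * np.pi * f2 * n / sample_rate)
    )

if __name__ == "__main__":
    estimates = lpc_formants(synthetic_vowel(500, 1500), 10000, order=4)
    print(estimates)
```

Two resonances need a predictor of order 4, one conjugate pole pair per resonance, which is why the example uses `order=4`.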
We now have access to thousands of hours of speech – manual transcription is impossible.
A Web Application for Automated Dialect Analysis
Sravana Reddy & James Stanford, Dartmouth College
• Socio-phoneticians study accents and social variables.
• Quantify accent with formants (resonance frequencies), F1 & F2.
• Accents = systematic shifts in formant space.
• Common task: audio → formants of each vowel.
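A minimal sketch of what "shifts in formant space" could look like in code: average F1 and F2 per vowel for each speaker, then subtract. The function names and the token values in the usage example are made up for illustration; they are not measurements from the poster.

```python
from collections import defaultdict
from statistics import mean

def vowel_means(tokens):
    """Map each vowel to its mean (F1, F2) in Hz.

    `tokens` is an iterable of (vowel, f1_hz, f2_hz) tuples.
    """
    by_vowel = defaultdict(list)
    for vowel, f1, f2 in tokens:
        by_vowel[vowel].append((f1, f2))
    return {
        vowel: (mean(f1 for f1, _ in pairs), mean(f2 for _, f2 in pairs))
        for vowel, pairs in by_vowel.items()
    }

def formant_shift(means_a, means_b):
    """Per-vowel (delta F1, delta F2) from speaker A to speaker B."""
    shared = means_a.keys() & means_b.keys()
    return {
        vowel: (
            means_b[vowel][0] - means_a[vowel][0],
            means_b[vowel][1] - means_a[vowel][1],
        )
        for vowel in sorted(shared)
    }

if __name__ == "__main__":
    # Hypothetical tokens for two speakers.
    speaker_a = [("AY", 500, 3000), ("AY", 700, 2800), ("IY", 300, 2300)]
    speaker_b = [("AY", 1000, 2000), ("IY", 320, 2250)]
    print(formant_shift(vowel_means(speaker_a), vowel_means(speaker_b)))
```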
Problem: Vowel Formant Extraction
Figure: waveform, spectrogram, and phone labels.
Northern speaker: F1 500 Hz, F2 3000 Hz
Southern speaker: F1 1000 Hz, F2 2000 Hz
Transcription “paper”
Alignment
Formant Extraction
DARLA!
darla.dartmouth.edu
Existing Tools
Automate transcription with speech recognition… but isn’t speech recognition inaccurate?
Insight: stressed vowels are usually correct
• Filter out vowels with low acoustic confidence.
• Result: formants from the completely automated system ≅ formants from the semi-automated one.
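The filtering idea can be sketched as below, assuming vowels carry ARPAbet stress digits (e.g. "AY1" for primary stress) and that the recognizer exposes a per-vowel confidence score. The `VowelToken` class, the score range, and the 0.5 threshold are hypothetical; the poster does not say how DARLA represents or thresholds confidence.

```python
from dataclasses import dataclass

@dataclass
class VowelToken:
    phone: str         # ARPAbet vowel with stress digit, e.g. "AY1"
    confidence: float  # hypothetical recognizer score in [0, 1]
    f1: float          # Hz
    f2: float          # Hz

def keep_reliable(tokens, min_confidence=0.5):
    """Keep primary-stressed vowels whose confidence meets the threshold."""
    return [
        t for t in tokens
        if t.phone.endswith("1") and t.confidence >= min_confidence
    ]

if __name__ == "__main__":
    # Made-up tokens for illustration only.
    tokens = [
        VowelToken("AY1", 0.9, 650.0, 1500.0),  # stressed, confident: kept
        VowelToken("AH0", 0.9, 500.0, 1300.0),  # unstressed: dropped
        VowelToken("IY1", 0.2, 300.0, 2300.0),  # low confidence: dropped
    ]
    for token in keep_reliable(tokens):
        print(token.phone, token.f1, token.f2)
```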
Our Idea
REF: no it’s it’s wood turning
HYP: no it it would turn it
REF: a real dog and cat and all the others
HYP: a real docking tap and on the others
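One common way to quantify how far a HYP line is from its REF line is word-level edit distance, computed with dynamic programming. The sketch below is a generic implementation with assumed function names, not a scoring script from the poster. Note that it compares whole words, so a hypothesis can score badly here even when its stressed vowels match the reference.

```python
def word_edit_distance(ref, hyp):
    """Minimum substitutions, insertions, and deletions turning ref into hyp."""
    ref_words, hyp_words = ref.split(), hyp.split()
    # prev[j] = distance between the ref prefix so far and the first j hyp words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]

def word_error_rate(ref, hyp):
    """Edit distance divided by the number of reference words."""
    return word_edit_distance(ref, hyp) / len(ref.split())

if __name__ == "__main__":
    ref = "a real dog and cat and all the others"
    hyp = "a real docking tap and on the others"
    print(word_edit_distance(ref, hyp), word_error_rate(ref, hyp))
```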
Figure: Vowel Space. Axes: F2 (ticks 2000 to 1000) and F1 (ticks 400 to 700). Legend: Obama_Manual, Obama_Automated. Vowels plotted in each series: IY, IH, EY, EH, AE, AA, AO, AH, OW, UH, UW, ER, AY, AW, OY.
Where is Obama from?
Semi-Automated
Completely Automated
• Speech recognition with CMU Pocketsphinx.
• Generic English acoustic models trained on LibriSpeech (400 hours); language models trained on WSJ and Fisher transcripts.
• Alignment and formant extraction with FAVE.
• Web interface accepts files or YouTube links.
• Processing time is about 3x the audio length.
Implementation