SIPCom 8-4 Speech Processing, MM7
- Speech Synthesis
- Speech Recognition (Part 1 of 3)
Børge [email protected]
Text-to-speech synthesis
Text → Text analysis → Prosody generation → Sound generation → Synthetic speech
• Text analysis: Lexicon & Rules
• Prosody generation: Pitch & duration (stød)
• Sound generation: Diphone database
• Why is it so difficult?
– Text normalisation
  • “kl 12-14”, “8-3=5”, “8-4-1997”, “mio”, “USA”
– Morphological analysis
  • “periferien” (the periphery) vs. “skoleferien” (the school holiday), “hul” (hole / hollow)
– Syntactic analysis
  • “en mand med hul røst dør bag en dør med hul i” (a man with a hollow voice dies behind a door with a hole in it)
– Semantic analysis
  • “The man fed her dog biscuits”
– Sound generation
  • Transitions, time- and pitch scaling
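The normalisation examples above look alike on the surface but must be read out very differently. A minimal sketch of a first-pass token classifier, assuming hypothetical category names and regular expressions that are not taken from the slides:

```python
import re

# Hypothetical patterns, ordered from most to least specific.
# A real normaliser would also need context (for example a preceding "kl")
# and a lexicon of abbreviations such as "mio".
PATTERNS = [
    ("date", re.compile(r"\d{1,2}-\d{1,2}-\d{4}")),    # e.g. 8-4-1997
    ("arithmetic", re.compile(r"\d+[-+]\d+=\d+")),     # e.g. 8-3=5
    ("range", re.compile(r"\d{1,2}-\d{1,2}")),         # e.g. 12-14
    ("acronym", re.compile(r"[A-Z]{2,}")),             # e.g. USA
]


def classify_token(token: str) -> str:
    """Return the first category whose pattern matches the whole token."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "word"


for tok in ["12-14", "8-3=5", "8-4-1997", "mio", "USA"]:
    print(tok, "->", classify_token(tok))
```

Note that "mio" falls through to "word" here: surface patterns alone cannot tell an abbreviation from an ordinary word, which is part of why the step is hard.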
Concatenative synthesis
test = /tEsd/ = /#t/ + /tE/ + /Es/ + /sd/ + /d#/
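The decomposition of "test" into diphone units can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes one character per phoneme symbol with "#" marking silence at the word boundaries, as in the example.

```python
def to_diphones(phonemes, boundary="#"):
    """Split a phoneme sequence into overlapping two-phoneme units.

    Assumes each element of `phonemes` is one phoneme symbol and that
    `boundary` stands for the silence before and after the word.
    """
    padded = [boundary, *phonemes, boundary]
    # Pair every symbol with its successor: (#,t), (t,E), (E,s), ...
    return [left + right for left, right in zip(padded, padded[1:])]


print(to_diphones("tEsd"))  # ['#t', 'tE', 'Es', 'sd', 'd#']
```

A word of n phonemes yields n + 1 diphones, each of which would then be looked up in the unit database and concatenated.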
Di-(tri)phone Database
• Database of a male speaker
• Approx. 2600 subword units (di- & triphones)
• Requires pitch-, di- and triphone segmentation
Input to the sound generator
[Figure: F0 contour in Hz (axis ticks 90 to 120) plotted against time, with the phoneme labels "j a j A v e h E l C h a O l", for the sentence "Ja, jeg vil hellere ha' øl" ("Yes, I would rather have beer")]
Effect of scaling
• No scaling
• Time scaled
• + pitch scaled
• + energy + stød
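One way to picture time and pitch scaling is as multiplication of the per-phoneme duration and F0 targets that are handed to the sound generator. The sketch below assumes a hypothetical `Segment` record and made-up target values; the signal processing that applies such targets to the diphone waveforms is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical prosodic target for one phoneme."""
    phoneme: str
    duration_ms: float
    f0_hz: float


def scale(segments, time_factor=1.0, pitch_factor=1.0):
    """Return new segments with durations and F0 targets multiplied."""
    return [
        replace(
            seg,
            duration_ms=seg.duration_ms * time_factor,
            f0_hz=seg.f0_hz * pitch_factor,
        )
        for seg in segments
    ]


# Made-up values for illustration only.
targets = [Segment("j", 60.0, 110.0), Segment("a", 120.0, 100.0)]

fast = scale(targets, time_factor=0.5)    # higher speaking rate, same pitch
light = scale(targets, pitch_factor=1.5)  # same rate, higher pitch
print(fast[1].duration_ms, light[1].f0_hz)
```

Keeping the two factors independent mirrors the slide: the speaking rate can change while the pitch stays normal, and vice versa.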
More examples
• Normal (aalb.wav)
• High speaking rate, normal pitch (fast.wav)
• Low speaking rate, normal pitch (slow.wav)
• Normal speaking rate, high pitch (light.wav)
• Normal speaking rate, low pitch (dark.wav)
Evaluation - intelligibility
• 32 test persons
• 156 stimuli in carrier sentence: “Det er <keyword>, de siger”
Intelligibility test    DST-demo    Natural speech
# answers               1600        1600
# errors                18          3
% error                 1.1         0.2
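The error percentages in the intelligibility results follow directly from the answer and error counts. A small check, with a hypothetical helper name:

```python
def error_rate_percent(errors: int, answers: int) -> float:
    """Share of wrong answers, in percent."""
    return 100.0 * errors / answers


# Counts taken from the intelligibility test results.
for system, errors in [("DST-demo", 18), ("Natural speech", 3)]:
    print(f"{system}: {error_rate_percent(errors, 1600):.1f} %")
# DST-demo: 1.1 %
# Natural speech: 0.2 %
```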
Evaluation - naturalness
• 32 test persons
• 155 stimuli
Naturalness test
Category          MOS
Natural speech    4.63
DST-demo          2.29
INFOVOX           1.11
GSM               3.99
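A mean opinion score (MOS) is the arithmetic mean of the listeners' ratings. The sketch below assumes a 1 to 5 rating scale, which the slides do not state explicitly, and uses made-up ratings rather than data from the test.

```python
from statistics import mean


def mean_opinion_score(ratings, lowest=1, highest=5):
    """Average listener ratings, assuming a `lowest`..`highest` scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < lowest or r > highest for r in ratings):
        raise ValueError("rating outside the assumed scale")
    return mean(ratings)


# Hypothetical ratings for one category, for illustration only.
example_ratings = [5, 4, 5, 4, 5]
mos = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f}")
```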