chayanmshah
Speech Processing.ppt
Why Speech?
• No visual contact required
• No special equipment required
• Can be done while doing other things
• Telephones – AT&T
• Mobile Phones (1G and 2G)
Speech Processing
• Speech Coding
• Speech Synthesis
• Speech Recognition
• Speaker Recognition/Verification
• Dyslexia and Auditory problems
• Audio Engineering
Speech Coding
• Compress a Speech File
• Why not use standard compression techniques?
• MP3 Format
  – Perceptual Coding
  – Exploits sensory organ biases
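One way to see why speech gets its own coders rather than a general-purpose compressor is to compare a lossless byte compressor with a lossy scheme matched to hearing. The sketch below is only an illustration under stated assumptions. The slides do not name a specific speech codec, so mu-law companding (the scheme used in G.711 telephony) is swapped in as a simple example. The "speech" is a synthetic signal, not a real recording, and the names `mulaw_encode` and `mulaw_decode` are hypothetical helpers.

```python
import math
import random
import struct
import zlib

MU = 255  # companding constant used by 8-bit mu-law telephony


def mulaw_encode(x):
    """Map a sample in [-1, 1] to an 8-bit code (0..255)."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round((y + 1) / 2 * 255)


def mulaw_decode(code):
    """Invert mulaw_encode, returning an approximate sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


# Synthetic stand-in for one second of speech at 8 kHz (an assumption:
# two steady tones plus a little noise, not a real recording).
random.seed(0)
RATE = 8000
samples = []
for n in range(RATE):
    t = n / RATE
    value = 0.6 * math.sin(2 * math.pi * 150 * t)
    value += 0.3 * math.sin(2 * math.pi * 900 * t)
    value += random.gauss(0, 0.01)
    samples.append(max(-1.0, min(1.0, value)))

# Reference representation: 16-bit little-endian PCM.
pcm16 = struct.pack("<%dh" % len(samples), *(int(s * 32767) for s in samples))

# General-purpose lossless compression of the raw bytes.
lossless = zlib.compress(pcm16, 9)

# Lossy, hearing-motivated coding: one byte per sample, finer steps near zero.
companded = bytes(mulaw_encode(s) for s in samples)
max_error = max(abs(s - mulaw_decode(c)) for s, c in zip(samples, companded))

print("16-bit PCM bytes:", len(pcm16))
print("zlib bytes:      ", len(lossless))
print("mu-law bytes:    ", len(companded))
print("max mu-law error:", round(max_error, 4))
```

The lossless coder has to preserve every bit, including inaudible noise, while the companded version always halves the size by spending precision where the ear is more sensitive. Perceptual coders such as MP3 push the same idea much further.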
Speech Synthesis
• Construct Speech waveform from words
• Speaker Quality and Accent
• Prosody?
• http://www.research.att.com/~ttsweb/tts/demo.php
Speech Recognition
• Convert a sound waveform to words
• The most relevant and important task in the industry
• About 90% accuracy in lab conditions, much lower in factory conditions
• Sphinx by CMU, ViaVoice by IBM & SDK by Microsoft
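The slides quote a recognition figure without saying how it is measured. A common measure, assumed here rather than taken from the slides, is word error rate: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


reference = "convert a sound waveform to words"
hypothesis = "convert the sound wave form to words"
print(word_error_rate(reference, hypothesis))
```

Here the hypothesis has two substitutions and one insertion against six reference words, so the error rate is 0.5. Word accuracy is often reported as one minus this value.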
Speaker Recognition
• Concerned with Biometrics
• Acceptable as a verification technique
• How would this be different from Speech recognition?
  – Speaker Quality
  – Prosody
  – Pitch, Accent etc.
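Pitch is one of the speaker cues listed above, and it can be estimated directly from the waveform. The sketch below uses a plain autocorrelation search, chosen here for illustration and not taken from the slides. The test frame is a synthetic 200 Hz tone with one harmonic, and the search range of 50 to 400 Hz is an assumed range for voiced speech.

```python
import math


def estimate_pitch(frame, rate, fmin=50.0, fmax=400.0):
    """Return the pitch in Hz whose lag maximizes the autocorrelation."""
    min_lag = int(rate / fmax)  # shortest period considered
    max_lag = int(rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag


RATE = 8000
# Synthetic voiced frame (assumption): 200 Hz fundamental plus one harmonic.
frame = [
    math.sin(2 * math.pi * 200 * n / RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / RATE)
    for n in range(800)
]
print(estimate_pitch(frame, RATE))
```

A speaker recognition system would track such a value over many frames and combine it with other features. A single frame estimate like this one is only a starting point.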
Dyslexia & Auditory Problems
• Study Voice and Ear defects
• Detect and correct Speech Disfluencies – CMU
• Development of better Ear substitutes – Cochlear Implants
Audio Engineering
• Adding effects to sound
• Clarity of reproduction
• A big industry with players like Dolby, Bose, Philips, etc.
• Voice Morphing!
[Audio samples: Source, Target, Conv 1, Conv 2]
Courtesy: Hui Ye & Steve Young, Cambridge
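As a small illustration of "adding effects to sound", the sketch below mixes a signal with a delayed, attenuated copy of itself, which is a basic echo. This is a generic textbook effect, not the voice morphing method credited above. The function name `add_echo` and its parameters are hypothetical.

```python
def add_echo(samples, delay, gain):
    """Mix samples with a copy delayed by `delay` samples and scaled by `gain`.

    `delay` is assumed to be a non-negative number of samples.
    """
    out = list(samples) + [0.0] * delay  # room for the echo tail
    for n, value in enumerate(samples):
        out[n + delay] += gain * value
    return out


impulse = [1.0, 0.0, 0.0, 0.0]
print(add_echo(impulse, delay=2, gain=0.5))
```

Feeding in a single impulse shows the effect directly: the output contains the original click followed two samples later by a copy at half amplitude.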
Automatic Speech Recognition
• Most Important Task
• Hardest Task
  – Co-articulation: neighboring sounds alter each other's pronunciation
  – Overlapping speech: two speakers speaking at the same time
  – Speaker Variation
  – Spontaneity
  – Language Modeling
  – Noise Robustness
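Language modeling, one of the difficulties listed above, helps a recognizer choose between word sequences that sound alike. The sketch below is a minimal bigram model with add-one smoothing, a deliberately simple stand-in for the models real recognizers use. The toy corpus, the helper names `train_bigrams` and `log_prob`, and the two candidate phrases are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count word pairs and their left contexts over a list of sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


# Toy training text (assumption); a real model needs far more data.
corpus = [
    "recognize speech",
    "we recognize speech",
    "it is hard to recognize speech",
    "a nice beach",
]
model = train_bigrams(corpus)

for candidate in ("recognize speech", "wreck a nice beach"):
    print(candidate, round(log_prob(candidate, model), 3))
```

With this toy corpus the model scores "recognize speech" higher than the similar-sounding "wreck a nice beach", which is the kind of preference a recognizer combines with its acoustic scores.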
ASR: Problems
© James Glass, MIT
ASR: Method
© James Glass, MIT
ASR: Application
© James Glass, MIT
Automatic Speech Recognition
© James Glass, MIT
Speech Production