Speech Processing.ppt

SPEECH PROCESSING

BINIT MOHANTY


Why Speech?

• No visual contact required

• No special equipment required

• Can be done while doing other things

• Telephones – AT&T

• Mobile Phones (1G and 2G)

Speech Processing

• Speech Coding

• Speech Synthesis

• Speech Recognition

• Speaker Recognition/Verification

• Dyslexia and Auditory Problems

• Audio Engineering

Speech Coding

• Compress a Speech File

• Why not use standard compression techniques?

• MP3 Format
  – Perceptual Coding
  – Exploits the limits of human hearing
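Part of the answer to the question above is that speech coders exploit properties of the signal and of hearing, which general-purpose compressors do not. A classic, simple example is μ-law companding from the G.711 telephone standard: each sample goes through a logarithmic curve before quantization, so quiet sounds get finer steps than loud ones. The sketch below uses the continuous μ-law formula with μ = 255 and a plain uniform 8-bit quantizer. The function names are hypothetical, and the sketch is not bit-exact with the G.711 codec, whose segmented encoding differs.

```python
import math

MU = 255.0  # mu value used by mu-law in G.711

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] along a logarithmic curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def encode_8bit(x: float) -> int:
    """Clip to [-1, 1], compress, then quantize uniformly to a code 0..255."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))

def decode_8bit(code: int) -> float:
    """Map a code 0..255 back to [-1, 1] and undo the compression."""
    return mulaw_expand(code / 255.0 * 2.0 - 1.0)


# Quiet and loud samples come back with a similar *relative* error.
for x in (0.01, 0.1, 0.9):
    x_hat = decode_8bit(encode_8bit(x))
    print(x, x_hat, abs(x_hat - x) / x)
```

At 8 bits per sample and 8 kHz sampling, this gives the 64 kbit/s rate of G.711.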

Speech Synthesis

• Construct a speech waveform from words

• Speaker Quality and Accent

• Prosody?

• http://www.research.att.com/~ttsweb/tts/demo.php

Speech Recognition

• Convert a sound waveform to words

• The most relevant and important task in the industry

• About 90% accuracy in lab conditions, much lower in noisy (e.g. factory) conditions

• Sphinx by CMU, ViaVoice by IBM and the Speech SDK by Microsoft
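Recognition figures like "90%" are commonly reported as word accuracy, i.e. one minus the word error rate (WER); the slide does not name its metric, so treat this as an assumption. WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, with a hypothetical function name:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of six reference words gives WER = 1/6 (accuracy about 83%).
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

The dynamic-programming table is the standard Levenshtein recurrence applied to words instead of characters.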

Speaker Recognition

• Concerned with Biometrics

• Acceptable as a verification technique

• How would this be different from Speech recognition?
  – Speaker Quality
  – Prosody
  – Pitch, Accent etc.
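Pitch is one of the speaker-dependent cues listed above. A common textbook way to estimate it is to find the lag at which the signal's autocorrelation peaks within the plausible range of voice fundamental frequencies. The sketch below is a bare-bones version of that idea; the 60 to 400 Hz search range and the function name are assumptions, and practical pitch trackers add framing, voicing decisions and octave-error checks.

```python
import math

def estimate_pitch(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    n = len(samples)
    lag_min = max(1, int(sample_rate / f_max))      # shortest period considered
    lag_max = min(n - 1, int(sample_rate / f_min))  # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0  # 0.0 means no peak found


# Example: a synthetic 150 Hz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 150 * i / sr) for i in range(2048)]
f0 = estimate_pitch(tone, sr)
```

Because the lag is an integer number of samples, the estimate is only accurate to within a few hertz at this sampling rate.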

Dyslexia & Auditory Problems

• Study Voice and Ear defects

• Detect and correct Speech Disfluencies – CMU

• Development of better Ear substitutes – Cochlear Implants

Audio Engineering

• Adding effects to sound

• Clarity of reproduction

• A big industry with players like Dolby, Bose, Philips, etc.

• Voice Morphing!

[Audio examples: Source, Target, Conversion 1, Conversion 2]

Courtesy: Hui Ye & Steve Young, Cambridge
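One of the simplest effects of the kind listed on this slide is a single echo: a delayed, attenuated copy of the signal added back to it, y[n] = x[n] + g * x[n - D]. A sketch on a plain list of samples; the function name and the default delay and gain are illustrative choices, not taken from the slides.

```python
def add_echo(samples, sample_rate, delay_s=0.25, gain=0.5):
    """Return x[n] + gain * x[n - D], with D = delay_s * sample_rate samples."""
    delay = int(round(delay_s * sample_rate))
    out = list(samples) + [0.0] * delay  # extra room for the echo's tail
    for i, x in enumerate(samples):
        out[i + delay] += gain * x
    return out


# An impulse followed by its echo two samples later at half amplitude.
print(add_echo([1.0, 0.0, 0.0], sample_rate=4, delay_s=0.5, gain=0.5))
# [1.0, 0.0, 0.5, 0.0, 0.0]
```

Feeding the output back into the delay line instead of the input would turn this feedforward echo into a repeating (feedback) echo.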

Automatic Speech Recognition

• Most Important Task

• Hardest Task
  – Co-articulation: the sound of each phoneme is shaped by its neighbors
  – Overlapping speech: two speakers speaking at the same time
  – Speaker Variation
  – Spontaneity
  – Language Modeling
  – Noise Robustness
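A common way to probe noise robustness is to add noise to clean speech at a controlled signal-to-noise ratio, SNR = 10 * log10(P_speech / P_noise), and compare recognition results across SNR levels. A sketch of the mixing step, with a hypothetical function name and plain Python lists standing in for audio:

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # use only the segment that is actually added
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Noise power needed for the target ratio: P_speech / 10^(SNR/10)
    target = p_speech / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target / p_noise)
    return [s + scale * n for s, n in zip(speech, noise)]


# Example: a 200 Hz tone at 8 kHz mixed with Gaussian noise at 10 dB SNR.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(8000)]
noisy = mix_at_snr(tone, [rng.gauss(0.0, 1.0) for _ in range(8000)], 10.0)
```

Lower snr_db values give noisier mixtures; 0 dB means speech and noise have equal power.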

ASR: Problems

© James Glass, MIT

ASR: Method

© James Glass, MIT

ASR: Application

© James Glass, MIT


Speech Production
