SPEECH PROCESSING BINIT MOHANTY [email protected]

Page 1: Speech Processing.ppt

SPEECH PROCESSING

BINIT MOHANTY

[email protected]

Page 2: Speech Processing.ppt

Why Speech?

• No visual contact required

• No special equipment required

• Can be done while doing other things

• Telephones – AT&T

• Mobile Phones (1G and 2G)

Page 3: Speech Processing.ppt

Speech Processing

• Speech Coding

• Speech Synthesis

• Speech Recognition

• Speaker Recognition/Verification

• Dyslexia and Auditory problems

• Audio Engineering

Page 4: Speech Processing.ppt

Speech Coding

• Compress a Speech File

• Why not use standard compression techniques?

• MP3 Format
  – Perceptual Coding
  – Exploits sensory organ biases
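The idea of exploiting how the ear works can be illustrated with something much simpler than MP3's psychoacoustic model. The sketch below uses mu-law companding as a stand-in (this technique is an assumption chosen for illustration and is not named on the slide). It spends more of an 8-bit code range on quiet samples, where hearing is more sensitive to error. The function names are hypothetical, and the code follows the continuous mu-law formula rather than the exact bit layout of any telephone standard.

```python
import math

MU = 255  # companding constant; 255 is the value commonly used for 8-bit telephony


def mulaw_encode(x: float) -> int:
    """Compress a sample in [-1.0, 1.0] into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: small amplitudes get a larger share of the range.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code (0..255) back to a sample in [-1.0, 1.0]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def uniform_roundtrip(x: float) -> float:
    """Plain 8-bit uniform quantization, for comparison only."""
    code = int(round((x + 1.0) / 2.0 * 255))
    return code / 255 * 2.0 - 1.0


if __name__ == "__main__":
    quiet = 0.01
    print("mu-law error :", abs(mulaw_decode(mulaw_encode(quiet)) - quiet))
    print("uniform error:", abs(uniform_roundtrip(quiet) - quiet))
```

For a quiet sample such as 0.01, the companded round trip should land noticeably closer to the input than uniform quantization does with the same number of bits. A general-purpose compressor has no such model of the listener.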

Page 5: Speech Processing.ppt

Speech Synthesis

• Construct Speech waveform from words

• Speaker Quality and Accent

• Prosody?

• http://www.research.att.com/~ttsweb/tts/demo.php

Page 6: Speech Processing.ppt

Speech Recognition

• Convert a sound waveform to words

• The most relevant and important task in the industry

• About 90% accuracy in lab conditions, much lower in factory conditions

• Sphinx by CMU, ViaVoice by IBM, and the Speech SDK by Microsoft
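The slide does not say how recognition accuracy is measured. One common convention, assumed here, is word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Word accuracy is then roughly 1 minus WER. The function name and the example sentences below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many spurious words, so "accuracy" computed this way can be negative in very noisy conditions.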

Page 7: Speech Processing.ppt

Speaker Recognition

• Concerned with Biometrics

• Acceptable as a verification technique

• How would this be different from Speech recognition?
  – Speaker Quality
  – Prosody
  – Pitch, Accent etc.
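Pitch is one of the speaker-dependent cues listed above. As a rough illustration of how it might be measured, the sketch below estimates the fundamental frequency of a short frame by finding the lag at which the signal best matches a shifted copy of itself (autocorrelation). This method, the function name, and the 50 to 400 Hz search range are assumptions made for the example. Practical speaker recognition systems use richer features than a single pitch value.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Return the autocorrelation-peak pitch in Hz, or None if no peak is found."""
    min_lag = max(1, int(sample_rate / fmax))              # shortest period searched
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)  # longest period searched
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


if __name__ == "__main__":
    rate = 8000
    # Synthetic 0.1 s test tone at 200 Hz standing in for a voiced speech frame.
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(estimate_pitch_hz(tone, rate))
```

On the synthetic tone the estimate should come out close to 200 Hz. Real speech is noisier and only partly periodic, so a usable version would need windowing, normalization, and a voiced/unvoiced decision.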

Page 8: Speech Processing.ppt

Dyslexia & Auditory Problems

• Study Voice and Ear defects

• Detect and correct Speech Disfluencies – CMU

• Development of better Ear substitutes – Cochlear Implants

Page 9: Speech Processing.ppt

Audio Engineering

• Adding effects to sound

• Clarity of reproduction

• A big industry with players like Dolby, Bose, Philips, etc.

• Voice Morphing!

[Audio demo: Source, Target, Conversion 1, Conversion 2]

Courtesy: Hui Ye & Steve Young, Cambridge

Page 10: Speech Processing.ppt

Automatic Speech Recognition

• Most Important Task

• Hardest Task
  – Co-articulation: neighboring sounds influence each other's pronunciation
  – Overlapping speech: two speakers speaking at the same time
  – Speaker Variation
  – Spontaneity
  – Language Modeling
  – Noise Robustness

Page 11: Speech Processing.ppt

ASR: Problems

© James Glass, MIT

Page 12: Speech Processing.ppt

ASR: Method

© James Glass, MIT

Page 13: Speech Processing.ppt

ASR: Application

© James Glass, MIT

Page 14: Speech Processing.ppt

Automatic Speech Recognition

© James Glass, MIT

Page 15: Speech Processing.ppt

Automatic Speech Recognition

© James Glass, MIT

Page 16: Speech Processing.ppt

Speech Production