Speech Recognition (Dr. M. Sabarimalai Manikandan)

CN 711 Speech Recognition

Course Instructor: Dr. M. Sabarimalai Manikandan E-mail: msm.sabari@gmail.com

CN 711: Speech Recognition Course Topics

Course Objectives:

This course provides an introduction to the field of digital speech processing and its applications. Speech processing offers a practical and theoretical understanding of how human speech can be processed by computers. The course covers speech analysis and synthesis, speech features, speech and speaker recognition, and applications. It includes practical work in which students build a working text-to-speech system in their native language, build speech recognition systems, create their own synthetic voice, and build a complete telephone spoken dialog system.

A. Review of Some Basic DSP Concepts

B. Introduction to Speech Signals

• Speech production mechanism

• Types of Sounds, Vowels and consonants

• Loudness, Sound Pressure

• Nature of speech signal, models of speech production

• Silence, Voiced and Unvoiced Speech

• Naturalness and Intelligibility

• Speech data acquisition system

• Why speech processing

• Speech perception model

C. Speech Analysis and Synthesis

• Short-time Fourier Analysis, Spectrogram

• Autocorrelation and cross-correlation

• Human speech production model

• Temporal and spectral characteristics

• Linear prediction (LP) filter theory

• All-pole Filter, Inverse Filtering

• Formants and Pitch Determination

• LP Residuals and Hilbert Transform

• Vocal tract length normalization

D. Speech Features for Recognition

• Temporal and Short-Time Fourier Transform Features

• Teager Energy Based Features, Entropy

• Cepstral Coefficients

• Linear Prediction-based Cepstral coefficients (LPCC)

• Mel Frequency Cepstral Coefficients (MFCCs)

• AM-FM Features, Time-Frequency Analysis

• Wavelet Octave Coefficients of Residues (WOCR)

• Voice Activity Detection

• Silence, Voiced, and Unvoiced Speech Classification

E. Speech Enhancement, Coding and Quality Assessment

• Acoustic echo cancellation

• Reverberant speech enhancement

• Removal of Different Types of noise and artifacts

• Speech Coding

• Subjective and Objective Metrics

F. Speaker Recognition

• Basic ASR System

• Closed-set and Open-set ASR System

• Speaker Identification and Verification

• Text-Independent and Text-Dependent Recognition

• Mean Normalization, Feature Smoothing

• Dynamic Time Warping (DTW), Vector Quantization

• Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)

• Log-Likelihood Ratio (LLR)

• False Acceptance Probability, False Rejection Probability

• Detection Error Trade-off (DET) curve

• Equal Error Rate (EER)

G. Speech Recognition

• Signal Processing, Template matching

• Phoneme Recognition

• HMMs, Acoustic Modeling, Language Modeling

• Continuous and Emotional Speech Recognition

• Performance Evaluation

H. Speech Preprocessing Applications

• Voice Conversion, Text-to-Speech Synthesis

• Spoken Dialogue System

• Interactive Voice Response (IVR) System

• Identify Your ID

Textbooks and Materials

[1]. Li Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.

[2]. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs, NJ: Prentice Hall, 1984. ISBN 0132119137.

[3]. L. R. Rabiner and B. Juang, Fundamentals of Speech Recognition, Englewood Cliffs, NJ: Prentice Hall, 1993. ISBN 0130151572.

[4]. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.

[5]. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.

[6]. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.

[7]. Jurafsky & Martin. Speech and Language Processing: An Introduction to NLP, CL, and Speech Recognition, Prentice Hall, 2000.

[8]. T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice-Hall, 2001.

[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.

[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.

[11]. X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice-Hall, 2001.

[12]. Instructor's Notes

Programming Languages: MATLAB and Java Media Framework

Important Standard Journals in the Field of Audio and Speech Processing

• IEEE Transactions on Audio, Speech and Language Processing

• IEEE Transactions on Signal Processing

• IEEE Signal Processing Magazine

• IEEE Transactions on Information Forensics and Security

• ACM Transactions on Speech and Language Processing

• IEEE Multimedia

• Speech Communication (by Elsevier)

• IEEE Signal Processing Letters

• Signal Processing (by Elsevier)

• Digital Signal Processing (by Elsevier)

• International Journal of Speech Technology (by Springer)

• Signal, Image and Video Processing (by Springer)

• Computer Speech and Language

• EURASIP Journal on Audio, Speech, and Music Processing

• Journal of the Acoustical Society of America (JASA)

• Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing

• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

• Eurospeech

• Int. Conf. on Spoken Language Processing (ICSLP)

• Acoustical Society of America
