CN 711: Speech Recognition
Course Instructor: Dr. M. Sabarimalai Manikandan E-mail: msm.sabari@gmail.com
CN 711: Speech Recognition Course Topics
Course Objectives:
This course provides an introduction to the field of
digital speech processing and its applications. It offers
a practical and theoretical understanding of how human
speech can be processed by computers. It covers speech
analysis and synthesis, speech features, speech and
speaker recognition, and applications. The course
includes practical work in which students build a working
text-to-speech system in their native language, build
speech recognition systems and their own synthetic voice,
and build a complete telephone spoken dialog system.
A. Review of Basic DSP Concepts
B. Introduction to Speech Signals
• Speech production mechanism
• Types of Sounds, Vowels and consonants
• Loudness, Sound Pressure
• Nature of speech signal, models of speech production
• Silence, Voiced and Unvoiced Speech
• Naturalness and Intelligibility
• Speech data acquisition system
• Why speech processing
• Speech perception model
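Loudness is a perceptual quantity, but it is normally related to the physical sound pressure through the sound pressure level, SPL = 20 log10(p_rms / p_ref) dB, where p_ref = 20 µPa is the standard reference pressure in air. The following is a minimal sketch, assuming the samples are already calibrated pressure values in pascals. The helper names `rms` and `spl_db` are illustrative only.

```python
import math

P_REF = 20e-6  # standard reference pressure in air: 20 micropascals


def rms(samples):
    """Root-mean-square value of a sequence of pressure samples (Pa)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spl_db(samples, p_ref=P_REF):
    """Sound pressure level in dB SPL: 20 * log10(p_rms / p_ref)."""
    return 20.0 * math.log10(rms(samples) / p_ref)


# An RMS pressure of 1 Pa corresponds to roughly 94 dB SPL
print(round(spl_db([1.0, -1.0, 1.0, -1.0]), 1))  # → 94.0
```

Because the scale is logarithmic, each doubling of RMS pressure adds about 6 dB (20 log10 2 ≈ 6.02).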
C. Speech Analysis and Synthesis
• Short-time Fourier Analysis, Spectrogram
• Autocorrelation and cross-correlation
• Human speech production model
• Temporal and spectral characteristics
• Linear prediction (LP) filter theory
• All-pole Filter, Inverse Filtering
• Formants and Pitch Determination
• LP Residuals and Hilbert Transform
• Vocal tract length normalization
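The autocorrelation, linear prediction, all-pole filter and inverse filtering topics above fit together as follows. The autocorrelation method estimates the predictor coefficients a_k by solving the normal equations with the Levinson-Durbin recursion. G/A(z), with A(z) = 1 - sum_k a_k z^-k, is then the all-pole vocal-tract model, and passing the frame through A(z) (inverse filtering) gives the LP residual. The following is a minimal pure-Python sketch that operates on a single frame. The function names are illustrative, and it omits the pre-emphasis and windowing normally applied to real speech.

```python
def autocorr(x, max_lag):
    """Short-time autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..max_lag (unnormalized)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation-method normal equations for an order-p predictor.

    Returns (a, err): x[n] is predicted as sum_{k=1..p} a[k-1] * x[n-k],
    and err is the minimum prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient of stage i + 1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Inverse filtering with A(z) = 1 - sum_k a_k z^-k; samples before x[0] are taken as zero."""
    p = len(a)
    return [
        x[n] - sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


# Toy frame: impulse response of the all-pole filter 1 / (1 - 0.9 z^-1)
frame = [0.9 ** n for n in range(64)]
a, err = levinson_durbin(autocorr(frame, 2), 2)
residual = lp_residual(frame, a)
```

This toy frame is exactly described by a first-order predictor, so `a` comes out very close to [0.9, 0] and the residual is essentially a single impulse at n = 0. Inverse filtering has removed the all-pole "vocal tract" and left the excitation.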
D. Speech Features for Recognition
• Temporal and Short-Time Fourier Transform Features
• Teager Energy Based Features, Entropy
• Cepstral Coefficients
• Linear Prediction-based Cepstral coefficients (LPCC)
• Mel Frequency Cepstral Coefficients (MFCCs)
• AM-FM Features, Time-Frequency Analysis
• Wavelet Octave Coefficients of Residues (WOCR)
• Voice Activity Detection
• Silence, Voiced, and Unvoiced Speech Classification
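Short-time energy and zero-crossing rate are the simplest temporal features and the classic basis for silence/voiced/unvoiced classification. The Teager energy operator listed above is a three-sample nonlinear energy measure. The following is a minimal sketch, assuming frames have already been extracted. The decision rule and threshold values are hypothetical and would need tuning on real recordings, and practical voice activity detectors are considerably more robust.

```python
import math


def short_time_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def teager_energy(x):
    """Discrete Teager energy operator psi[n] = x[n]^2 - x[n-1] * x[n+1], for n = 1..N-2."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def classify_frame(frame, energy_thr, zcr_thr):
    """Toy silence/voiced/unvoiced rule; both thresholds are hypothetical and data-dependent."""
    if short_time_energy(frame) < energy_thr:
        return "silence"
    # Noise-like unvoiced sounds cross zero far more often than periodic voiced sounds
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thr else "voiced"


# 30 ms of a 100 Hz tone sampled at 8 kHz: high energy, few zero crossings
tone = [math.cos(2 * math.pi * 100 * n / 8000) for n in range(240)]
print(classify_frame(tone, energy_thr=1.0, zcr_thr=0.3))  # → voiced
```

For a pure tone A cos(Ωn + φ), the Teager operator returns the constant A² sin²(Ω). It therefore reflects both amplitude and frequency, which is why it is used as an energy-like feature.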
E. Speech Enhancement, Coding and Quality Assessment
• Acoustic echo cancellation
• Reverberant speech enhancement
• Removal of Different Types of Noise and Artifacts
• Speech Coding
• Subjective and Objective Metrics
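The simplest objective metrics are the signal-to-noise ratio and the segmental SNR. Both compare a processed (enhanced or decoded) signal with the clean reference, sample by sample. The following is a minimal sketch, assuming the two signals are time-aligned and of equal length. The clamping range of -10 to 35 dB for segmental SNR is a commonly used convention and is taken here as an assumption. Subjective metrics require listening tests and are outside what this sketch shows.

```python
import math


def snr_db(clean, processed):
    """Global SNR in dB: 10*log10(sum clean^2 / sum (clean - processed)^2)."""
    sig = sum(s * s for s in clean)
    err = sum((s - y) ** 2 for s, y in zip(clean, processed))
    if err == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(sig / err)


def segmental_snr_db(clean, processed, frame_len, lo=-10.0, hi=35.0):
    """Average of per-frame SNRs over non-overlapping frames, each clamped to [lo, hi] dB."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(s * s for s in c)
        err = sum((s - y) ** 2 for s, y in zip(c, p))
        if err == 0.0:
            v = hi
        elif sig == 0.0:
            v = lo
        else:
            v = min(max(10.0 * math.log10(sig / err), lo), hi)
        values.append(v)
    return sum(values) / len(values)


clean = [1.0, -1.0] * 50
processed = [0.9 * s for s in clean]        # uniform 10% amplitude error
print(round(snr_db(clean, processed), 2))   # → 20.0
```

Segmental SNR is usually preferred for speech because a global SNR is dominated by the loud frames, while averaging per-frame values gives quiet frames equal weight.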
F. Speaker Recognition
• Basic ASR System
• Closed-set and Open-set ASR System
• Speaker Identification and Verification
• Text-Independent and Text-Dependent Recognition
• Mean Normalization, Feature Smoothing
• Dynamic Time Warping (DTW), Vector Quantization
• Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)
• Log-Likelihood Ratio (LLR)
• False Acceptance Probability, False Rejection Probability
• Detection Error Trade-off (DET) curve
• Equal Error Rate (EER)
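The false acceptance and false rejection probabilities, and the EER at which they are equal, can be estimated from verification scores (for example LLRs) of genuine and impostor trials. The following is a minimal sketch, assuming that a higher score means "more likely the claimed speaker" and that a trial is accepted when score >= threshold. With a finite number of trials, the EER is only approximated at the closest crossing point; some evaluation tools interpolate instead.

```python
def far_frr(genuine, impostor, threshold):
    """Error rates at one threshold, accepting a trial when score >= threshold.

    FAR = fraction of impostor trials accepted; FRR = fraction of genuine trials rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping the threshold over every observed score.

    Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest,
    reporting the mean of FAR and FRR there.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


genuine = [2.0, 3.0, 4.0, 5.0]    # toy scores for true-speaker trials
impostor = [0.0, 1.0, 2.5, 6.0]   # toy scores for impostor trials
print(equal_error_rate(genuine, impostor))  # → (0.25, 3.0)
```

Sweeping the threshold and plotting FRR against FAR on normal-deviate axes gives the DET curve. The EER is the point where that curve crosses the diagonal.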
G. Speech Recognition
• Signal Processing, Template matching
• Phoneme Recognition
• HMMs, Acoustic Modeling, Language Modeling
• Continuous and Emotional Speech Recognition
• Performance Evaluation
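Template matching compares the feature sequence of an unknown utterance (e.g. MFCC frames) against stored reference templates, using dynamic time warping (DTW, listed under Section F) to align utterances spoken at different rates. The following is a minimal sketch with several assumptions: features are already extracted as tuples of floats, the local distance is Euclidean, and the step pattern is the basic symmetric one without slope constraints or path-length normalization. The vocabulary labels are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature vectors.

    Allowed steps: (i-1, j), (i, j-1), (i-1, j-1); local cost = Euclidean distance.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(query, templates):
    """Return the label of the reference template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical one-dimensional "feature" templates for a two-word vocabulary
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(2.0,), (1.0,), (0.0,)],
}
query = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # a time-stretched "yes"
print(recognize(query, templates))  # → yes
```

The stretched query still matches "yes" with zero cost because DTW lets one template frame absorb several query frames. This time-warping tolerance is what makes template matching workable for isolated words.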
H. Speech Processing Applications
• Voice Conversion, Text-to-Speech Synthesis
• Spoken Dialogue System
• Interactive Voice Response (IVR) System
• Identify Your ID
Textbooks and Materials
[1]. Li Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.
[2]. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, 1984, ISBN 0132119137.
[3]. L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993, ISBN 0130151572.
[4]. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5]. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.
[6]. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
[7]. D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2000.
[8]. T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2001.
[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.
[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.
[11]. X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.
[12]. Instructor's Notes
Programming Languages: MATLAB and Java Media Framework
Important Standard Journals in the Field of Audio and Speech Processing
• IEEE Transactions on Audio, Speech and Language Processing
• IEEE Transactions on Signal Processing
• IEEE Signal Processing Magazine
• IEEE Transactions on Information Forensics and Security
• ACM Transactions on Speech and Language Processing
• IEEE Multimedia
• Speech Communication (by Elsevier)
• IEEE Signal Processing Letters
• Signal Processing (by Elsevier)
• Digital Signal Processing (by Elsevier)
• International Journal of Speech Technology (by Springer)
• Signal, Image and Video Processing (by Springer)
• Computer Speech and Language
• EURASIP Journal on Audio, Speech, and Music Processing
• Journal of the Acoustical Society of America (JASA)
• Audio Engineering Society
Important Conferences in the Field of Audio and Speech Processing
• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
• Eurospeech
• Int. Conf. on Spoken Language Processing (ICSLP)
• Acoustical Society of America
Recommended