CN 711: Speech Recognition

Course Instructor: Dr. M. Sabarimalai Manikandan E-mail: [email protected]

CN 711: Speech Recognition Course Topics

Course Objectives:

This course provides an introduction to the field of digital speech processing and its applications. It offers a practical and theoretical understanding of how human speech can be processed by computers. It covers speech analysis and synthesis, speech features, speech and speaker recognition, and applications. The course includes practical sessions in which students will build a working text-to-speech system in their native language, build speech recognition systems, build their own synthetic voice, and build a complete telephone spoken dialog system.

A. Review Some Basic DSP Concepts

B. Introduction to Speech Signals

• Speech production mechanism

• Types of Sounds, Vowels and consonants

• Loudness, Sound Pressure

• Nature of speech signal, models of speech production

• Silence, Voiced and Unvoiced Speech

• Naturalness and Intelligibility

• Speech data acquisition system

• Why speech processing

• Speech perception model
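Loudness is a perceptual quantity. Its usual physical correlate is the sound pressure level (SPL), expressed in decibels relative to a reference pressure of 20 µPa. The standard definition and one worked value:

```latex
L_p = 20\log_{10}\!\left(\frac{p_{\mathrm{rms}}}{p_0}\right)\ \text{dB SPL}, \qquad p_0 = 20\,\mu\mathrm{Pa}

p_{\mathrm{rms}} = 0.02\ \mathrm{Pa} \;\Rightarrow\; L_p = 20\log_{10}\!\left(\frac{0.02}{2\times 10^{-5}}\right) = 20\log_{10}(1000) = 60\ \text{dB SPL}
```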

C. Speech Analysis and Synthesis

• Short-time Fourier Analysis, Spectrogram

• Autocorrelation and cross-correlation

• Human speech production model

• Temporal and spectral characteristics

• Linear prediction (LP) filter theory

• All-pole Filter, Inverse Filtering

• Formants and Pitch Determination

• LP Residuals and Hilbert Transform

• Vocal tract length normalization
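The LP topics above rely on the predictor coefficients of an all-pole model. One standard way to obtain them is the autocorrelation method, solved with the Levinson-Durbin recursion. The course lists MATLAB, but the sketch below uses Python/NumPy. Several details are illustrative assumptions rather than the course's notation: the function name, the Hamming window, and the sign convention x̂[n] = Σ a_k x[n−k] (so that H(z) = G / (1 − Σ a_k z^−k)).

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """LP coefficients of one frame via the autocorrelation method + Levinson-Durbin.

    Convention (assumed): x_hat[n] = sum_{k=1..p} a[k] * x[n-k].
    Returns (a, err): a has length `order`; err is the final prediction-error energy.
    """
    # Assumption: taper the frame with a Hamming window before analysis
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(x)
    # Autocorrelation r[0..p] of the windowed frame
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)  # a[0] unused; a[1..p] hold the predictor coefficients
    err = r[0]               # zeroth-order prediction error = frame energy
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        # Update lower-order coefficients: a_j = a_j - k * a_{i-j}
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k   # prediction error shrinks at every stage
    return a[1:], err
```

For a typical 20 to 30 ms frame at 8 kHz, a call such as `lpc_autocorrelation(frame, order=10)` is a common starting point. Filtering the frame with 1 − Σ a_k z^−k (inverse filtering) then yields the LP residual.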

D. Speech Features for Recognition

• Temporal and Short-Time Fourier Transform Features

• Teager Energy Based Features, Entropy

• Cepstral Coefficients

• Linear Prediction-based Cepstral coefficients (LPCC)

• Mel Frequency Cepstral Coefficients (MFCCs)

• AM-FM Features, Time-Frequency Analysis

• Wavelet Octave Coefficients of Residues (WOCR)

• Voice Activity Detection

• Silence, Voiced, and Unvoiced Speech Classification
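As an illustration of the Teager-energy features, the discrete Teager-Kaiser operator ψ[x(n)] = x²(n) − x(n−1)x(n+1) can be computed directly on a signal. For a pure tone A cos(Ωn + φ), the operator returns the constant A² sin²(Ω), so it responds to both amplitude and frequency. The sketch below is minimal and assumes a mono signal in a NumPy array. The frame-averaging helper and its `frame_len`/`hop` parameters are hypothetical choices for turning the operator output into a per-frame feature.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Defined for the interior samples n = 1 .. N-2, so the output has N-2 values.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def frame_teager_energy(x, frame_len, hop):
    """Hypothetical frame-level feature: mean Teager energy per analysis frame."""
    psi = teager_energy(x)
    starts = range(0, len(psi) - frame_len + 1, hop)
    return np.array([psi[s:s + frame_len].mean() for s in starts])
```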

E. Speech Enhancement, Coding and Quality Assessment

• Acoustic echo cancellation

• Reverberant speech enhancement

• Removal of Different Types of noise and artifacts

• Speech Coding

• Subjective and Objective Metrics
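Among the objective metrics, SNR and segmental SNR are the simplest. Both compare an enhanced or decoded signal against a time-aligned clean reference. The sketch assumes equal-length, sample-aligned signals. The 256-sample frame length and the per-frame clipping to [−10, 35] dB are common choices, used here as assumptions rather than values from the course.

```python
import numpy as np

def snr_db(clean, processed):
    """Global SNR (dB): clean-signal energy over error energy."""
    clean = np.asarray(clean, dtype=float)
    error = clean - np.asarray(processed, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))

def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs (dB), each clipped to [lo, hi] (assumed defaults)."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    eps = 1e-12  # avoids log(0) on silent or perfectly reconstructed frames
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = c - processed[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(c ** 2) + eps) / (np.sum(e ** 2) + eps))
        frame_snrs.append(min(max(snr, lo), hi))  # clip each frame's SNR
    if not frame_snrs:
        raise ValueError("signal shorter than one frame")
    return float(np.mean(frame_snrs))
```

Segmental SNR averages in the dB domain frame by frame. As a result, loud segments do not dominate the score the way they do in the global SNR.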

F. Speaker Recognition

• Basic ASR System

• Closed-set and Open-set ASR System

• Speaker Identification and Verification

• Text-Independent and Text-Dependent Recognition

• Mean Normalization, Feature Smoothing

• Dynamic Time Warping (DTW), Vector Quantization

• Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)

• Log-Likelihood Ratio (LLR)

• False Acceptance Probability, False Rejection Probability

• Detection Error Trade-off (DET) curve

• Equal Error Rate (EER)
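FAR, FRR, and EER can be computed from two lists of verification scores, for example GMM-UBM log-likelihood ratios, by sweeping a decision threshold. The minimal sketch below makes two assumptions: higher scores mean "same speaker", and the EER is taken at the observed threshold where FAR and FRR are closest, with no interpolation. Plotting FAR against FRR on normal-deviate axes gives the DET curve.

```python
import numpy as np

def far_frr_eer(genuine, impostor):
    """Threshold sweep over all observed scores; a trial is accepted if score >= t.

    FAR(t) = fraction of impostor scores >= t (false acceptances)
    FRR(t) = fraction of genuine scores  <  t (false rejections)
    Returns (thresholds, far, frr, eer).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))  # sorted ascending
    far = np.array([np.mean(impostor >= t) for t in thresholds])
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    # Approximate EER at the threshold where the two error rates are closest
    i = int(np.argmin(np.abs(far - frr)))
    eer = 0.5 * (far[i] + frr[i])
    return thresholds, far, frr, eer
```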

G. Speech Recognition

• Signal Processing, Template matching

• Phoneme Recognition

• HMMs, Acoustic Modeling, Language Modeling

• Continuous and Emotional Speech Recognition

• Performance Evaluation
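Template matching for isolated-word recognition is classically done with dynamic time warping (DTW). DTW aligns a test utterance to each stored template despite differences in speaking rate. The minimal sketch below assumes that each utterance is already a sequence of feature vectors (for example, MFCC frames) and uses a Euclidean local distance. It also uses the basic step pattern with no slope constraints or path-length normalization. The `recognize` helper and the template dictionary are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best DTW alignment between sequences a and b.

    a, b: arrays of shape (frames,) or (frames, dims).
    Local cost: Euclidean distance between frames.
    Steps allowed: (i-1, j), (i, j-1), (i-1, j-1), all with weight 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as a sequence of scalar frames
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = distance between frame i of a and frame j of b
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

def recognize(test, templates):
    """Hypothetical template matcher: label whose template has the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

For example, `recognize(test_frames, {"yes": yes_frames, "no": no_frames})` returns the closer word. In practice the accumulated cost is often divided by a path-length term so that templates of different lengths compare fairly.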

H. Speech Processing Applications

• Voice Conversion, Text-to-Speech Synthesis

• Spoken Dialogue System

• Interactive Voice Response (IVR) System

• Identify Your ID

Textbooks and Materials

[1]. Li Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.

[2]. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, Englewood Cliffs, NJ, 1984. ISBN 0132119137.

[3]. L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993. ISBN 0130151572.

[4]. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.

[5]. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.

[6]. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.

[7]. D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2000.

[8]. T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice-Hall, 2001.

[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.

[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.

[11]. X. Huang, A. Acero, and H.-W. Hon (foreword by R. Reddy), Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.

[12]. Instructor's Notes

Programming Languages: MATLAB and Java Media Framework

Important Standard Journals in the Field of Audio and Speech Processing

• IEEE Transactions on Audio, Speech and Language Processing

• IEEE Transactions on Signal Processing

• IEEE Signal Processing Magazine

• IEEE Transactions on Information Forensics and Security

• ACM Transactions on Speech and Language Processing

• IEEE Multimedia

• Speech Communication (by Elsevier)

• IEEE Signal Processing Letters

• Signal Processing (by Elsevier)

• Digital Signal Processing (by Elsevier)

• International Journal of Speech Technology (by Springer)

• Signal, Image and Video Processing (by Springer)

• Computer Speech and Language

• EURASIP Journal on Audio, Speech, and Music Processing

• Journal of the Acoustical Society of America (JASA)

• Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing

• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)

• Eurospeech

• Int. Conf. on Spoken Language Processing (ICSLP)

• Acoustical Society of America