Application of HMMs: Speech recognition


• “Noisy channel” model of speech
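In the noisy-channel view, the recognizer searches for the word sequence most likely to have generated the observed acoustics. A standard way to write this (notation assumed here, not taken from the slide: W is a candidate word sequence, O the sequence of acoustic feature vectors):

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}
```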

Speech feature extraction

• Acoustic waveform: amplitude over time, sampled at 8 kHz and quantized to 8–12 bits
• Spectrogram: frequency content of the signal over time
• The waveform is split into frames (10 ms, i.e., 80 samples at 8 kHz)
• Each frame is converted into a feature vector of ~39 dimensions

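As a minimal sketch of the framing step above (assuming non-overlapping 10 ms frames at 8 kHz, i.e., 80 samples per frame; real front ends typically use overlapping windows, and the function name is hypothetical):

```python
import numpy as np

def frame_signal(signal, sample_rate=8000, frame_ms=10):
    """Split a 1-D waveform into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 80 samples at 8 kHz
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# Example: one second of quantized waveform -> 100 frames of 80 samples each.
# Each frame would later be turned into a ~39-dimensional feature vector.
waveform = np.random.randint(-128, 128, size=8000).astype(np.float32)
frames = frame_signal(waveform)
print(frames.shape)  # (100, 80)
```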

Phonetic model

• Phones: speech sounds
• Phonemes: groups of speech sounds that have a unique meaning/function in a language (e.g., there are several different ways to pronounce “t”, but they all realize the same phoneme)

HMM models for phones

• HMM states in most speech recognition systems correspond to subphones
  – There are around 60 phones and as many as 60³ context-dependent triphones

HMM models for words
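The slide presents word models graphically; the underlying idea is that a word HMM is built by chaining phone (or subphone) HMMs in a left-to-right topology. A rough sketch with made-up phone labels and probabilities (not taken from the slides):

```python
# Illustrative word HMM for "need" (/n iy d/), built by chaining simple
# left-to-right phone models. Each phone here has a single state with a
# self-loop; real systems typically use ~3 subphone states per phone.
phones = ["n", "iy", "d"]

states = ["start"] + phones + ["end"]
transitions = {("start", "n"): 1.0, ("d", "end"): 0.3}
for i, p in enumerate(phones):
    transitions[(p, p)] = 0.7                     # self-loop: stay in the phone
    if i + 1 < len(phones):
        transitions[(p, phones[i + 1])] = 0.3     # advance to the next phone

# transitions[("n", "n")] == 0.7, transitions[("n", "iy")] == 0.3, etc.
```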

Putting words together

• Given a sequence of acoustic features, how do we find the corresponding word sequence?

Decoding with the Viterbi algorithm
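As a hedged, generic implementation sketch (the dictionary-based HMM representation is assumed here, not prescribed by the slides), the Viterbi algorithm finds the single most probable state sequence by dynamic programming:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for an observation sequence (all probabilities > 0 assumed).

    obs     : list of observation symbols
    states  : list of hidden states
    start_p : dict  state -> initial probability
    trans_p : dict  (prev_state, state) -> transition probability
    emit_p  : dict  (state, observation) -> emission probability
    """
    # Work in log space so long utterances do not underflow.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[(s, obs[0])]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[(p, s)]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[(prev, s)]) + math.log(emit_p[(s, obs[t])])
            back[t][s] = prev
    # Trace back from the best final state.
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, V[-1][best]
```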

Limitations of Viterbi decoding

• Number of states may be too large
  – Beam search: at each time step, maintain a short list of the most probable words and only extend transitions from those words into the next time step (see the pruning sketch below)

• Words with multiple pronunciation variants may get a smaller probability than incorrect words with fewer pronunciation paths
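A minimal sketch of the beam-pruning step referenced above (names are hypothetical; the scores would come from the per-frame Viterbi update):

```python
def beam_prune(hypotheses, beam_width=10):
    """Keep only the beam_width best-scoring hypotheses at the current time step.

    hypotheses: list of (score, word_sequence) pairs; higher score is better.
    Only the survivors are extended with transitions into the next time step.
    """
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)[:beam_width]

# Per frame (extend_hypotheses is a hypothetical helper applying the HMM transitions):
# hypotheses = beam_prune(extend_hypotheses(hypotheses, frame_features))
```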

Word model for “tomato”

Limitations of Viterbi decoding

• Number of states may be too large
  – Beam search: at each time step, maintain a short list of the most probable words and only extend transitions from those words into the next time step

• Words with multiple pronunciation variants may get a smaller probability than incorrect words with fewer pronunciation paths
  – Use the forward algorithm instead of the Viterbi algorithm (sketched below)

• The Markov assumption is too weak to capture the constraints of real language
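The forward algorithm sums the probability over all state paths rather than keeping only the best one, so a word whose probability mass is split across several pronunciation variants is not penalized. A minimal sketch, using the same dictionary-based HMM representation assumed in the Viterbi example above:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of the observation sequence, summed over all state paths."""
    alpha = {s: start_p[s] * emit_p[(s, obs[0])] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[(p, s)] for p in states) * emit_p[(s, o)]
                 for s in states}
    return sum(alpha.values())
```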


Advanced techniques

• Multiple-pass decoding
  – Let the Viterbi decoder return multiple candidate utterances, then re-rank them using a more sophisticated language model, e.g., an n-gram model (see the rescoring sketch below)

• A* decoding
  – Build a search tree whose nodes are words and whose paths are possible utterances
  – Path cost is given by the likelihood of the acoustic features given the words inferred so far
  – A heuristic function estimates the best-scoring extension until the end of the utterance
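A sketch of the re-ranking step in multiple-pass decoding: each first-pass candidate keeps its acoustic score, gets a language-model score (here from a hypothetical bigram table), and the combined scores decide the final ranking:

```python
import math

def rescore(candidates, bigram_p, lm_weight=1.0):
    """Re-rank (acoustic_log_prob, word_list) candidates with a bigram language model.

    bigram_p: dict (prev_word, word) -> probability; unseen bigrams fall back to a small floor.
    """
    rescored = []
    for acoustic_logp, words in candidates:
        lm_logp = sum(math.log(bigram_p.get((prev, w), 1e-6))
                      for prev, w in zip(["<s>"] + words, words + ["</s>"]))
        rescored.append((acoustic_logp + lm_weight * lm_logp, words))
    return sorted(rescored, reverse=True)

# Example: best_words = rescore(n_best_list, bigram_p)[0][1]
```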

Reference

• D. Jurafsky and J. Martin, “Speech and Language Processing,” 2nd ed., Prentice Hall, 2008
