

Hidden Markov Model and Speech Recognition

Nirav S. Uchat

1 Dec, 2006


Outline

1 Introduction

2 Motivation - Why HMM ?

3 Understanding HMM

4 HMM and Speech Recognition

5 Isolated Word Recognizer


Introduction

What is Speech Recognition?

Understanding what is being said

Mapping speech data to textual information

Speech recognition is indeed challenging:

due to the presence of noise in the input data

variation in voice data with the speaker's physical condition, mood, etc.

difficulty in identifying word boundaries


Different types of Speech Recognition

Type of Speaker

Speaker Dependent (SD)

- relatively easy to construct
- requires less training data (only from the particular speaker)
- not to be confused with speaker recognition, which identifies who is speaking

Speaker Independent (SI)

- requires huge amounts of training data (from various speakers)
- difficult to construct

Type of Data

Isolated Word Recognizer

- recognizes single words
- easy to construct (a starting point for more difficult speech recognition)
- may be speaker dependent or speaker independent

Continuous Speech Recognition

- most difficult of all
- the problem of finding word boundaries


Use of Signal Model

A signal model helps us characterize the properties of a given signal.

It provides a theoretical basis for a signal processing system.

It is a way to understand how the system works.

We can simulate the source, which helps us learn as much as possible about the signal source.

Why Hidden Markov Model (HMM)?

It is very rich in mathematical structure.

When applied properly, it works very well in practical applications.


Components of HMM [2]

1. Number of states $N$

2. Number of distinct observation symbols per state $M$, $V = \{V_1, V_2, \cdots, V_M\}$

3. State transition probabilities $a_{ij}$

4. Observation symbol probability distribution in state $j$, $b_j(k) = P[V_k \text{ at } t \mid q_t = S_j]$

5. Initial state distribution $\pi_i = P[q_1 = S_i]$, $1 \le i \le N$
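To make these five components concrete, here is a minimal sketch in Python; the DiscreteHMM class and the example numbers are illustrative, not part of the original slides.

```python
import numpy as np

class DiscreteHMM:
    """Container for the five components of a discrete-observation HMM."""
    def __init__(self, A, B, pi):
        self.A = np.asarray(A, dtype=float)    # N x N transition matrix, a_ij
        self.B = np.asarray(B, dtype=float)    # N x M emission matrix, b_j(k)
        self.pi = np.asarray(pi, dtype=float)  # initial distribution, pi_i
        self.N = self.A.shape[0]               # number of states
        self.M = self.B.shape[1]               # number of observation symbols

# A toy 2-state, 3-symbol model (numbers chosen arbitrarily)
hmm = DiscreteHMM(A=[[0.7, 0.3], [0.4, 0.6]],
                  B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
                  pi=[0.6, 0.4])
```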


Problems for HMM: Problem 1 [2]

Problem 1: Evaluation. Given the observation sequence $O = O_1 O_2 \cdots O_T$ and the model $\lambda = (A, B, \pi)$, how do we efficiently compute $P(O \mid \lambda)$, the probability of the observation sequence given the model?

Figure: Evaluation Problem


Problem 2 and 3 [2]

Problem 2: Hidden State Determination (Decoding). Given the observation sequence $O = O_1 O_2 \cdots O_T$ and the model $\lambda = (A, B, \pi)$, how do we choose the "BEST" state sequence $Q = q_1 q_2 \cdots q_T$, i.e., one that is optimal in some meaningful sense? (In speech recognition this can be viewed as finding the states that emit the correct phonemes.)

Problem 3: Learning. How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$? Problem 3 is the one in which we try to optimize the model parameters so as to best describe how a given observation sequence comes about.


Solution for Problem 1: Forward Algorithm

A direct computation sums over all state sequences:

$$P(O \mid \lambda) = \sum_{q_1, \cdots, q_T} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T}\, b_{q_T}(O_T)$$

This is an $O(N^T)$ algorithm: at every time step there are $N$ states to choose from, and the total length is $T$.

The forward algorithm uses dynamic programming to reduce the time complexity to $O(N^2 T)$. It uses the forward variable $\alpha_t(i)$, defined as

$$\alpha_t(i) = P(O_1, O_2, \cdots, O_t,\; q_t = S_i \mid \lambda)$$

i.e., the probability of the partial observation sequence $O_1, O_2, \ldots$ up to $O_t$, and state $S_i$ at time $t$, given the model $\lambda$.


Figure: Forward Variable

$$\alpha_{t+1}(j) = \left[\, \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1}), \quad 1 \le t \le T-1,\; 1 \le j \le N$$
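As an illustration of this recursion, here is a hedged sketch reusing the DiscreteHMM container from the components section (the observation sequence is arbitrary):

```python
import numpy as np

def forward(hmm, O):
    """P(O | lambda) via the forward algorithm.
    O is a list of 0-based observation symbol indices."""
    T = len(O)
    alpha = np.zeros((T, hmm.N))
    alpha[0] = hmm.pi * hmm.B[:, O[0]]          # initialization
    for t in range(1, T):
        # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
        alpha[t] = (alpha[t - 1] @ hmm.A) * hmm.B[:, O[t]]
    return alpha[-1].sum()                      # termination: sum_i alpha_T(i)

print(forward(hmm, [0, 2, 1]))  # probability of observing symbols 0, 2, 1
```

Each induction step costs $O(N^2)$, so the full computation is $O(N^2 T)$ instead of $O(N^T)$.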


Solution for Problem 2: Decoding Using the Viterbi Algorithm [1]

Viterbi Algorithm: to find the single best state sequence we define the quantity

$$\delta_t(i) = \max_{q_1, q_2, \cdots, q_{t-1}} P[q_1 q_2 \cdots q_t = i,\; O_1 O_2 \cdots O_t \mid \lambda]$$

i.e., $\delta_t(i)$ is the best score along a single path, at time $t$, which accounts for the first $t$ observations and ends in state $S_i$. By induction,

$$\delta_{t+1}(j) = \left[\, \max_i \delta_t(i)\, a_{ij} \right] b_j(O_{t+1})$$

The key point is that the Viterbi algorithm is similar in implementation to the forward algorithm (except for the backtracking part). The major difference is a maximization over previous states in place of the summing procedure of the forward calculation.
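A matching sketch of the Viterbi recursion in the same illustrative setting; compare it with the forward sketch above: max and argmax replace the sum, and a backtracking pass recovers the state sequence.

```python
import numpy as np

def viterbi(hmm, O):
    """Single best state sequence for observations O, with its score."""
    T = len(O)
    delta = np.zeros((T, hmm.N))           # best path score ending in each state
    psi = np.zeros((T, hmm.N), dtype=int)  # backpointers
    delta[0] = hmm.pi * hmm.B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * hmm.A  # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)          # best predecessor for each state j
        delta[t] = scores.max(axis=0) * hmm.B[:, O[t]]
    # Backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

states, score = viterbi(hmm, [0, 2, 1])  # best state sequence and its score
```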


Solution for Problem 3: Learning (Adjusting Model Parameters)

Uses the Baum-Welch learning algorithm. The core quantities are:

$\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)$, i.e., the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, given the model and the observation sequence.

$\gamma_t(i)$ = the probability of being in state $S_i$ at time $t$, given the observation sequence and the model.

We can relate the two:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

The re-estimated parameters are:

$$\bar{\pi}_i = \text{expected number of times in state } S_i \text{ at time } t = 1 = \gamma_1(i)$$


$$\bar{a}_{ij} = \frac{\text{expected number of transitions from state } S_i \text{ to } S_j}{\text{expected number of transitions from state } S_i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing symbol } v_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1,\, \text{s.t. } O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
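Below is a sketch of one re-estimation step built directly from these formulas, again using the illustrative DiscreteHMM container. It handles a single observation sequence and omits the scaling needed to avoid numerical underflow on long sequences.

```python
import numpy as np

def baum_welch_step(hmm, O):
    """One Baum-Welch re-estimation step for one observation sequence."""
    T = len(O)
    # Forward variables
    alpha = np.zeros((T, hmm.N))
    alpha[0] = hmm.pi * hmm.B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ hmm.A) * hmm.B[:, O[t]]
    # Backward variables, with beta_T(i) = 1
    beta = np.ones((T, hmm.N))
    for t in range(T - 2, -1, -1):
        beta[t] = hmm.A @ (hmm.B[:, O[t + 1]] * beta[t + 1])
    P = alpha[-1].sum()                       # P(O | lambda)
    # xi_t(i, j) and gamma_t(i) exactly as defined above
    xi = np.zeros((T - 1, hmm.N, hmm.N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * hmm.A *
                 hmm.B[:, O[t + 1]] * beta[t + 1]) / P
    gamma = alpha * beta / P
    # Re-estimated parameters
    hmm.pi = gamma[0]
    hmm.A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    O = np.asarray(O)
    for k in range(hmm.M):
        hmm.B[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)

baum_welch_step(hmm, [0, 2, 1, 1, 0])  # parameters move toward this data
```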


Block Diagram of ASR using HMM

Figure: Block Diagram of ASR using HMM


Basic Structure

Phoneme

The smallest unit of information in a speech signal (on the order of 10 ms) is the phoneme.

"ONE": W AH N

The English language has approximately 56 phonemes.

HMM structure for a Phoneme

This model is a first-order Markov model.

Transitions go from the previous state to the next state (no jumping).


Questions to be Asked

What does a state represent in the HMM?

There is one HMM for each phoneme.

Each HMM has 3 states.

The states are: start, mid, and end.

"ONE" has 3 HMMs, one for each of the phonemes W, AH, and N; each HMM has 3 states (a sketch of this structure follows below).

What is the output symbol?

Symbols from vector quantization are used as the output symbols emitted by the states.

Concatenating symbols gives a phoneme.
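A hypothetical sketch of that structure: a left-to-right, no-jump transition matrix for "ONE" built from three 3-state phoneme models; the 0.6/0.4 self-loop and advance probabilities are invented for illustration.

```python
import numpy as np

phonemes = ["W", "AH", "N"]          # "ONE" = W AH N
n_states = 3 * len(phonemes)         # start, mid, end for each phoneme
A = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    A[s, s] = 0.6                    # self-loop: stay in the current state
    A[s, s + 1] = 0.4                # advance: previous state to next state
A[-1, -1] = 1.0                      # final state absorbs (no jumping back)
```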


Front-End

The purpose is to parameterize an input signal (e.g., audio) into a sequence of feature vectors. Methods for feature vector extraction are:

MFCC - Mel Frequency Cepstral Coefficients

LPC Analysis - Linear Predictive Coding
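For instance, a minimal MFCC front-end, assuming the librosa library (the slides do not prescribe a tool, and "one.wav" is a placeholder file name):

```python
import librosa

# Parameterize an audio file into a sequence of 13-dimensional MFCC vectors.
y, sr = librosa.load("one.wav", sr=16000)            # placeholder input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
features = mfcc.T                                    # one feature vector per frame
```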


Acoustic Modeling [1]

Uses vector quantization to map feature vectors to symbols:

create a training set of feature vectors

cluster them into a small number of classes

represent each class by a symbol

for each class $V_k$, compute the probability that it is generated by a given HMM state
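A sketch of this quantization step using scikit-learn's k-means, an assumed tool; the random matrices stand in for real feature vectors from the front-end.

```python
import numpy as np
from sklearn.cluster import KMeans

train_features = np.random.randn(5000, 13)   # stand-in for real MFCC vectors
codebook = KMeans(n_clusters=256, random_state=0).fit(train_features)
# Each new feature vector maps to the symbol (index) of its nearest class
symbols = codebook.predict(np.random.randn(40, 13))
```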


Creation Of Search Graph [3]

The search graph represents the vocabulary under consideration.

The acoustic model, language model, and lexicon (used by the decoder during recognition) work together to produce the search graph.

The language model represents how words are related to each other (which word follows another).

It uses a bi-gram model (a toy sketch follows this list).

The lexicon is a file containing WORD - PHONEME pairs.

So we have the whole vocabulary represented as a graph.
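A toy illustration of bi-gram estimation (the corpus is invented): $P(w_2 \mid w_1)$ is estimated as the count of the pair $w_1 w_2$ divided by the count of $w_1$.

```python
from collections import Counter

corpus = "one two one three one two one".split()   # invented toy corpus
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("one", "two"))   # 2 of the 4 "one" tokens precede "two"
```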


Complete Example


Training

Training adjusts the model parameters to maximize the probability of correct recognition.

Audio data from various sources is collected.

It is given to a prototype HMM.

The HMM adjusts its parameters using the Baum-Welch algorithm.

Once the model is trained, unknown data is given to it for recognition.


Decoding

It uses the Viterbi algorithm to find the "BEST" state sequence.


Decoding Continued

This covers just a single word.

During decoding the whole graph is searched.

Each HMM has two non-emitting states for connecting it to other HMMs.


Isolated Word Recognizer [4]


Problems with Continuous Speech Recognition

Identifying word boundaries

Large vocabulary size

Training time

Efficient search graph creation


References

[1] Dan Jurafsky. CS 224S / LINGUIST 181: Speech Recognition and Synthesis. http://www.stanford.edu/class/cs224s/.

[2] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.

[3] Willie Walker, Paul Lamere, and Philip Kwok. Sphinx-4: A Flexible Open Source Framework for Speech Recognition. Sun Microsystems, 2004.

[4] Steve Young and Gunnar Evermann. The HTK Book. Microsoft Corporation, 2005.
