
Page 1:

Automatic Speech Recognition II: Hidden Markov Models and Neural Networks

Page 2:

Hidden Markov Model

DTW and VQ recognize patterns by using distance measurements.

An HMM is a statistical method for characterizing the properties of the frames of a pattern.

Page 3:

Discrete-time Markov Processes

Consider a system with:

- N distinct states
- A set of probabilities associated with each state => the probabilities of changing from one state to another
- Discrete time instants at which state transitions occur

Page 4:

Discrete-time Markov Processes

$$P[q_t = j \mid q_{t-1} = i, q_{t-2} = k, \ldots] = P[q_t = j \mid q_{t-1} = i]$$

First-order Markov chain: the probability depends only on the preceding state.

The set of state-transition probabilities $a_{ij}$:

$$a_{ij} = P[q_t = j \mid q_{t-1} = i], \qquad 1 \le i, j \le N$$

with the properties

$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1 \quad \forall\, i$$

Page 5:

Discrete-time Markov Processes

Ex. Consider a simple three-state Markov model of the weather.

What is the probability that the weather for the next seven consecutive days is "sun-sun-snow-snow-sun-cloudy-sun", given that the weather today is "sun" and the weather condition on each day depends only on the condition of the previous day?

O = (sun, sun, sun, snow, snow, sun, cloudy, sun) => O = (3, 3, 3, 1, 1, 3, 2, 3)

State 1: snow, State 2: cloudy, State 3: sunny

$$A = \{a_{ij}\} = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

$$\begin{aligned}
P(O \mid \text{Model}) &= P(3, 3, 3, 1, 1, 3, 2, 3 \mid \text{Model}) \\
&= P(3)\,P(3 \mid 3)\,P(3 \mid 3)\,P(1 \mid 3)\,P(1 \mid 1)\,P(3 \mid 1)\,P(2 \mid 3)\,P(3 \mid 2) \\
&= \pi_3 \, a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{23} \\
&= (1)(0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2) \\
&= 1.536 \times 10^{-4}
\end{aligned}$$

where $\pi_3 = P(q_1 = 3) = 1$ is the probability of the initial state.
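A minimal sketch of this computation in Python (the matrix and observation sequence are from the slide; indices are shifted to 0-based internally):

```python
# Three-state weather Markov chain: 1 = snow, 2 = cloudy, 3 = sunny.
# States are 0-indexed here, so state 3 (sunny) is index 2.
A = [
    [0.4, 0.3, 0.3],  # from snow
    [0.2, 0.6, 0.2],  # from cloudy
    [0.1, 0.1, 0.8],  # from sunny
]

# Observation sequence O = (3, 3, 3, 1, 1, 3, 2, 3), shifted to 0-based.
O = [2, 2, 2, 0, 0, 2, 1, 2]

# P(O | Model) = pi_3 * a_33 * a_33 * a_31 * a_11 * a_13 * a_32 * a_23
p = 1.0  # pi_3 = 1: we are told today is "sun"
for prev, cur in zip(O, O[1:]):
    p *= A[prev][cur]

print(p)  # 1.536e-04
```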

Page 6:

Discrete-time Markov Processes

Ex. Given a single fair coin, i.e., P(Heads) = P(Tails) = 0.5:

- What is the probability that the next 10 tosses will produce the sequence (H H T H T T H T T H)?
- What is the probability that 5 of the next 10 tosses will be tails?
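A quick check of both answers in Python: any specific sequence of 10 tosses has probability $(1/2)^{10}$, and the number of tails in 10 tosses is binomially distributed:

```python
from math import comb

p_sequence = 0.5 ** 10                  # any one specific sequence of 10 tosses
p_five_tails = comb(10, 5) * 0.5 ** 10  # binomial: choose which 5 tosses are tails

print(p_sequence)    # 0.0009765625  (~9.77e-4)
print(p_five_tails)  # 0.24609375
```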

Page 7:

Coin-Toss Models

You are in a room with a barrier through which you cannot see what is happening.

On the other side of the barrier is another person who is performing a coin-tossing experiment (using one or more coins).

The person will not tell you which coin he selects at any time; he will only tell you the result of each coin flip.

How do we build an HMM to explain the observed sequence of heads and tails?
- What do the states in the model correspond to?
- How many states should be in the model?

Page 8:

Coin-Toss Models

Single coin:
- Two states: heads or tails
- Observable Markov model => not hidden

[Diagram: two states, heads and tails; every transition into heads has probability P(H) and every transition into tails has probability 1 - P(H).]

Page 9:

Coin-Toss Models

Two coins (Hidden Markov Model):
- Two states: coin 1 and coin 2
- Each state (coin) is characterized by a probability distribution of heads and tails
- There are probabilities of state transitions (state-transition matrix)

[Diagram: states Coin 1 and Coin 2 with self-transitions a11 and a22 and cross-transitions 1 - a11 and 1 - a22; state 1 emits with P(H) = P1, P(T) = 1 - P1, and state 2 with P(H) = P2, P(T) = 1 - P2.]

Page 10:

Coin-Toss Models

Three coins (Hidden Markov Model):
- Three states: coin 1, coin 2, and coin 3
- Each state (coin) is characterized by a probability distribution of heads and tails
- There are probabilities of state transitions (state-transition matrix)

[Diagram: three fully connected states with transition probabilities a11, a12, a13, a21, a22, a23, a31, a32, a33; state i emits with P(H) = Pi, P(T) = 1 - Pi, for i = 1, 2, 3.]

Page 11:

The Urn-and-Ball Model

- There are N glass urns in the room. Each urn contains a large quantity of colored balls, with M distinct colors.
- A genie in the room chooses an initial urn. From this urn, a ball is chosen at random and its color is recorded as the observation. The ball is then returned to the same urn.
- A new urn is then selected according to the random selection process associated with the current urn.

Page 12:

Elements of an HMM

- The number of states in the model (N): $S = \{1, 2, \ldots, N\}$
- The number of distinct observation symbols per state (M): $V = \{v_1, v_2, \ldots, v_M\}$
- The state-transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P[q_{t+1} = j \mid q_t = i]$
- The observation symbol probability distribution $B = \{b_j(k)\}$, in which $b_j(k) = P[o_t = v_k \mid q_t = j]$
- The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i = P[q_1 = i],\ 1 \le i \le N$
- Complete parameter set of the model: $\lambda = (A, B, \pi)$

Page 13:

HMM Generator of Observations

Given appropriate values of N, M, A, B, and $\pi$, the HMM can be used as a generator to give an observation sequence $O = (o_1 o_2 \ldots o_T)$:

1. Choose an initial state $q_1 = i$.
2. Set t = 1.
3. Choose $o_t = v_k$ according to the symbol probability distribution in state i, $b_i(k)$.
4. Transit to the new state $q_{t+1} = j$ according to the state-transition probability distribution for state i, $a_{ij}$.
5. Set t = t + 1; return to step 3 if t < T; otherwise, terminate the procedure.
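A minimal sketch of this generator in Python, using the three-coin model from the next slide as example parameters (the initial distribution is assumed uniform, since the slides do not specify one):

```python
import random

# Example parameters: the three-coin model from the next slide.
# States are 0-indexed; V = ('H', 'T').
pi = [1/3, 1/3, 1/3]                            # initial distribution (assumed uniform)
A = [[1/3, 1/3, 1/3]] * 3                       # all state transitions = 1/3
B = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]    # b_i(k) = P(symbol k | state i)
V = ['H', 'T']

def generate(T):
    """Generate an observation sequence O = (o_1 ... o_T) from lambda = (A, B, pi)."""
    q = random.choices(range(3), weights=pi)[0]        # step 1: initial state
    O = []
    for _ in range(T):                                 # steps 2-5
        k = random.choices(range(2), weights=B[q])[0]  # emit o_t according to b_q(k)
        O.append(V[k])
        q = random.choices(range(3), weights=A[q])[0]  # transit according to a_qj
    return O

print(generate(10))  # e.g. ['H', 'T', 'H', 'H', ...]
```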

Page 14:

HMM Generator of Observations

Ex. Consider an HMM representation of a coin-tossing problem. Assume a three-state model (three coins) with the following probabilities (all state-transition probabilities = 1/3):

        State 1   State 2   State 3
P(H)    0.5       0.75      0.25
P(T)    0.5       0.25      0.75

Page 15:

HMM Generator of Observations

1. You observe the sequence O = (H H H H T H T T T T). What state sequence is most likely? What is the probability of the observation sequence and this most likely state sequence?

Because all state-transition probabilities are equal, the most likely state sequence is the one for which the probability of each individual observation is maximum.

Thus for each H the most likely state is 2, and for each T the most likely state is 3. The most likely state sequence is q = (2 2 2 2 3 2 3 3 3 3), with probability

$$P(O, q \mid \lambda) = (0.75)^{10} (1/3)^{10}$$

Page 16:

HMM Generator of Observations

2. What is the probability that the observation sequence came entirely from state 1?

O = (H H H H T H T T T T), q = (1 1 1 1 1 1 1 1 1 1)

The probability that the first H came from state 1 is 0.5 * 1/3; the probability that the second H came from state 1 is 0.5 * 1/3; the same holds for every H and every T. Hence

$$P(O, q \mid \lambda) = (0.5)^{10} (1/3)^{10}$$
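A quick numerical check of both joint probabilities, and of their ratio, which shows how much more likely the first state sequence is:

```python
p_best = 0.75 ** 10 * (1/3) ** 10    # q = (2 2 2 2 3 2 3 3 3 3)
p_state1 = 0.5 ** 10 * (1/3) ** 10   # q = (1 1 1 1 1 1 1 1 1 1)

print(p_best)             # ~9.54e-07
print(p_state1)           # ~1.65e-08
print(p_best / p_state1)  # (0.75 / 0.5)**10 ~ 57.7
```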

Page 17:

HMM Generator of Observations

If the state-transition probabilities were:

a11 = 0.9    a21 = 0.45   a31 = 0.45
a12 = 0.05   a22 = 0.1    a32 = 0.45
a13 = 0.05   a23 = 0.45   a33 = 0.1

what would be the most likely state sequence for O = (H H H H T H T T T T)?
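This question can be answered with the Viterbi algorithm. A minimal sketch, assuming a uniform initial distribution (the slide does not specify one) and the emission probabilities from the earlier three-coin table:

```python
# Viterbi search for the most likely state sequence (states 0-indexed here).
pi = [1/3, 1/3, 1/3]            # assumed uniform initial distribution
A = [[0.9, 0.05, 0.05],         # row i holds a_i1, a_i2, a_i3
     [0.45, 0.1, 0.45],
     [0.45, 0.45, 0.1]]
B = {'H': [0.5, 0.75, 0.25], 'T': [0.5, 0.25, 0.75]}  # P(symbol | state)
O = list("HHHHTHTTTT")

delta = [pi[i] * B[O[0]][i] for i in range(3)]  # best score ending in state i
paths = [[i] for i in range(3)]                 # best path ending in state i
for o in O[1:]:
    new_delta, new_paths = [], []
    for j in range(3):
        i_best = max(range(3), key=lambda i: delta[i] * A[i][j])
        new_delta.append(delta[i_best] * A[i_best][j] * B[o][j])
        new_paths.append(paths[i_best] + [j])
    delta, paths = new_delta, new_paths

best = max(range(3), key=lambda i: delta[i])
print([s + 1 for s in paths[best]])  # most likely states, reported 1-indexed
```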

Page 18:

The three basic problems for HMM

Problem 1: How do we compute $P(O \mid \lambda)$?

Problem 2: How do we choose the state sequence $q = (q_1, q_2, \ldots, q_T)$ that is optimal (most likely)?

Problem 3: How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

Speech recognition sense:

[Diagram: training (Problem 3). Speech samples of each word in a W-word vocabulary are used to train one model $\lambda = (A, B, \pi)$ per word: Model W1, ..., Model Wn.]

Page 19:

The three basic problems for HMM

To study the physical meaning of model states (Problem 2).

[Diagram: the state sequence of a word model segments the utterance into Initial, Vowel, and Final parts.]

Page 20:

The three basic problems for HMM

Recognize an unknown word (Problem 1): calculate $P(O \mid \lambda_1), \ldots, P(O \mid \lambda_n)$ for the unknown word's observation sequence, compare the scores, and predict the word whose model gives the highest probability.
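The standard solution to Problem 1 is the forward algorithm (not covered on these slides). A minimal sketch, scoring the coin-toss sequence against the symmetric three-coin model; recognition would compare such scores across the word models:

```python
def forward(O, pi, A, B):
    """Forward algorithm: P(O | lambda), summed over all state sequences."""
    N = len(pi)
    alpha = [pi[i] * B[O[0]][i] for i in range(N)]      # initialization
    for o in O[1:]:                                     # induction over time
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[o][j]
                 for j in range(N)]
    return sum(alpha)                                   # termination

pi = [1/3, 1/3, 1/3]
A = [[1/3] * 3] * 3
B = {'H': [0.5, 0.75, 0.25], 'T': [0.5, 0.25, 0.75]}
print(forward(list("HHHHTHTTTT"), pi, A, B))  # 0.5**10 for this symmetric model
```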

Page 21:

Artificial Neural Network

An artificial neural network (ANN), usually called "neural network" (NN), is a mathematical model or computational model that tries to simulate the structure and/or functional aspects of biological neural networks.

Page 22:

Composition of NN

- Input nodes: the input layer receives the feature vector of each sample.
- Hidden nodes: there can be more than one hidden layer.
- Output nodes: produce the output for the corresponding input sample.
- The connections between input, hidden, and output nodes are specified by weight values.

[Diagram: input nodes, hidden nodes, and output nodes linked by weighted connections.]

Page 23:

Feedforward operation and classification

A simple three-layer NN

[Diagram: inputs x1, x2 (input layer i) feed hidden units y1, y2 (hidden layer j) through weights w_ji; the hidden units and a bias unit feed output unit z_k (output layer k) through weights w_kj.]

Page 24:

Feedforward operation and classification

Net activation: the inner product of the inputs with the weights at the hidden unit,

$$net_j = \sum_{i=1}^{d} x_i w_{ji} + w_{j0}$$

where i indexes units of the input layer and j indexes units of the hidden layer.

Each hidden unit emits an output that is a nonlinear function of its activation, f(net):

$$y_j = f(net_j)$$

A simple example is the sign function:

$$f(net_j) = \operatorname{Sgn}(net) = \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$$

Page 25:

Feedforward operation and classification

Each output unit computes its net activation based on the hidden-unit signals as

$$net_k = \sum_{j=1}^{n_H} y_j w_{kj} + w_{k0}$$

The output unit computes the nonlinear function of its net:

$$z_k = f(net_k)$$
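A minimal sketch of this feedforward computation in Python; the weights and inputs below are hypothetical toy values, and the sign function stands in for f:

```python
import numpy as np

def sgn(net):
    # Sign activation: 1 if net >= 0, -1 otherwise.
    return np.where(net >= 0, 1.0, -1.0)

# Hypothetical toy network: 2 inputs, 2 hidden units, 1 output.
x = np.array([0.5, -1.0])                   # input feature vector
W_ji = np.array([[0.3, -0.2], [0.8, 0.4]])  # hidden weights, rows = hidden units
w_j0 = np.array([0.1, -0.5])                # hidden biases
W_kj = np.array([[0.6, -0.7]])              # output weights
w_k0 = np.array([0.2])                      # output bias

net_j = W_ji @ x + w_j0   # net_j = sum_i x_i w_ji + w_j0
y = sgn(net_j)            # y_j = f(net_j)
net_k = W_kj @ y + w_k0   # net_k = sum_j y_j w_kj + w_k0
z = sgn(net_k)            # z_k = f(net_k)
print(z)                  # [1.] for these toy values
```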

Page 26:

Back propagation

Backpropagation is one of the simplest and most general methods for supervised training of multilayer NNs.

The basic approach in learning starts with an untrained network and follows these steps (a sketch follows the list):
1. Present a training pattern to the input layer.
2. Pass the signals through the net and determine the output.
3. Compare the output with the target values => difference (error).
4. Adjust the weights to reduce the error.
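A minimal sketch of these steps for a small three-layer network, assuming tanh activations (the sign function is not differentiable) and a squared-error criterion; all weights and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x, t = np.array([0.5, -1.0]), np.array([1.0])  # training pattern and target
W_ji, w_j0 = rng.normal(size=(2, 2)), np.zeros(2)
W_kj, w_k0 = rng.normal(size=(1, 2)), np.zeros(1)
eta = 0.1                                      # learning rate

for step in range(100):
    # Steps 1-2: present the pattern and pass signals forward.
    y = np.tanh(W_ji @ x + w_j0)
    z = np.tanh(W_kj @ y + w_k0)
    # Step 3: compare output with target => error.
    err = t - z
    # Step 4: adjust the weights to reduce the error (gradient descent).
    delta_k = err * (1 - z ** 2)               # tanh'(net_k) = 1 - z^2
    delta_j = (W_kj.T @ delta_k) * (1 - y ** 2)
    W_kj += eta * np.outer(delta_k, y); w_k0 += eta * delta_k
    W_ji += eta * np.outer(delta_j, x); w_j0 += eta * delta_j

print(z)  # output approaches the target 1.0
```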

Page 27:

Exercise

Implement the vowel classifier using a neural network (one possible starting point is sketched below):
- Use the same speech samples that you used in the VQ exercise. What are the important features for classifying vowels?
- Separate your samples into two groups: training and testing.
- Label the class of each training sample.
- Train a multilayer perceptron on the training samples and perform testing on the testing data.
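One possible starting point, assuming scikit-learn is available and that the features from the VQ exercise have already been extracted; the file names and array shapes below are hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical data: 'features' is an (n_samples, n_dims) array of per-sample
# features (e.g., the same features as in the VQ exercise), and 'labels'
# holds the vowel class of each sample.
features = np.load("vowel_features.npy")  # hypothetical file names
labels = np.load("vowel_labels.npy")

# Separate the samples into training and testing groups.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)

# Train a multilayer perceptron and evaluate it on the held-out data.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```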