Factor Analysis of Acoustic Features for Streamed Hidden Markov Modeling
Chuan-Wei Ting
Department of Computer Science and Information Engineering,
National Cheng Kung University, Tainan, Taiwan
2
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
3
Outline
• Introduction
  • Stochastic modeling
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
4
Introduction
• The objective of constructing an acoustic model is to capture the characteristics of the speech signal.
• Stochastic modeling
• Hidden Markov model (HMM)
• Multi-Stream HMM
• Factorial HMM
5
Hidden Markov Model
• Topology of HMM
• Constraints
• All features are “tied” together
• Topology
• Transition moment
• Independence assumption
[Figure: HMM state chain $s_{t-1} \to s_t \to s_{t+1}$]
6
Multi-Stream HMM
• Topology of Multi-stream HMM
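[Figure: multi-stream HMM in which sub-word segment $j$ is modeled by $M$ parallel stream models $M_j^{(1)}, \dots, M_j^{(M)}$ that are recombined before segment $j+1$]
$$p(Y \mid M) = \prod_{j=1}^{J} p(Y_j \mid M_j)$$
where each segment likelihood $p(Y_j \mid M_j)$ recombines the stream likelihoods $p(Y_j^m \mid M_j^m)$, $m = 1, \dots, M$.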
7
Simplification of Multi-Stream HMM
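• Streams are assumed to be statistically independent
$$\log p(Y \mid M) = \sum_{j=1}^{J} \sum_{m=1}^{M} \log p(Y_j^m \mid M_j^m)$$
• Weighted log-likelihood approach
$$\log p(Y \mid M) = \sum_{j=1}^{J} \sum_{m=1}^{M} w_j^m \log p(Y_j^m \mid M_j^m)$$
where $w_j^m$ is the weight of stream $m$ in segment $j$.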
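The two combination rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `combine_streams`, the weight table, and the toy likelihood values are assumptions made here, not taken from the slides.

```python
import math


def combine_streams(stream_likelihoods, weights=None):
    """Sum (optionally weighted) log-likelihoods over segments j and streams m.

    stream_likelihoods[j][m] plays the role of p(Y_j^m | M_j^m).
    weights[j][m] is a hypothetical stream weight; None means every weight is 1,
    which is the plain independence assumption.
    """
    total = 0.0
    for j, segment in enumerate(stream_likelihoods):
        for m, likelihood in enumerate(segment):
            w = 1.0 if weights is None else weights[j][m]
            total += w * math.log(likelihood)
    return total


# Toy values: J = 2 segments, M = 2 streams.
likelihoods = [[0.5, 0.25], [0.125, 0.5]]
unweighted = combine_streams(likelihoods)
# Weights that switch the second stream off entirely.
weighted = combine_streams(likelihoods, [[1.0, 0.0], [1.0, 0.0]])
```

Setting a stream weight to zero removes that stream from the score, which is one way to see why the weights matter when streams differ in reliability.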
8
Factorial HMM
• Topology of FHMM
[Figure: FHMM with $M$ parallel state chains $s_t^{(1)}, s_t^{(2)}, \dots, s_t^{(M)}$ that jointly generate the observations $y_{t-1}, y_t, y_{t+1}$]
9
Outline
• Introduction
• Cepstral Factor Analysis
  • Feature analysis
  • Factor analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
10
Cepstral Factor Analysis
• Feature analysis
• Dynamics of different features
[Figure: trajectories of the 1st, 4th, and 13th MFCC; x-axis: Time (sec), y-axis: MFCC]
• Correlations
11
Factor Analysis
• Discover the correlations inherent in observation data.
• Applications
• Data compression
• Signal processing
• Acoustic modeling
12
Mathematical Definition of FA
• FA conducts data analysis of the multivariate observations using the common factors and the specific factors.
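• For a $D$-dimensional feature vector $y = [y_1, \dots, y_D]^T$, the general form of the FA model is given by
$$y = W_f f + \varepsilon$$
• $f$: $M \times 1$ common factor, $f \sim N(0, I)$
• $W_f$: $D \times M$ factor loading matrix
• $\varepsilon$: $D \times 1$ specific factor, $\varepsilon \sim N(0, \psi)$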
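As a small illustration of the generative reading of the FA model, the sketch below draws one observation as loadings times common factors plus specific noise. The function name `sample_fa`, the loading values, and the noise variances are illustrative assumptions, and the specific-factor covariance is taken to be diagonal.

```python
import random


def sample_fa(W, psi_diag, rng):
    """Draw one observation y = W f + eps from the FA model.

    W is a D x M loading matrix given as a list of rows.
    psi_diag holds the D specific-factor variances (diagonal covariance assumed).
    """
    M = len(W[0])
    f = [rng.gauss(0.0, 1.0) for _ in range(M)]           # common factors, N(0, I)
    eps = [rng.gauss(0.0, v ** 0.5) for v in psi_diag]    # specific factors, N(0, psi)
    y = [sum(w * fm for w, fm in zip(row, f)) + e for row, e in zip(W, eps)]
    return y, f


rng = random.Random(0)
W = [[0.9, 0.0], [0.8, 0.1], [0.0, 0.7]]   # hypothetical loadings, D = 3, M = 2
psi_diag = [0.19, 0.35, 0.51]
y, f = sample_fa(W, psi_diag, rng)
```

In this toy loading matrix the first two variables lean on the first factor and the third on the second, which is the kind of grouping the later slides turn into streams.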
13
Principal Component Solution
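• Find an estimator of $W_f$ that will approximate the fundamental expression
$$R_y = E[y y^T] = W_f W_f^T + \psi$$
• Decompose the covariance matrix of the observations
$$R_y = V \Lambda V^T = V_f \Lambda_f V_f^T + V_r \Lambda_r V_r^T$$
• FA parameters can be estimated by
$$y = V c = V_f c_f + V_r c_r = V_f \Lambda_f^{1/2} f + V_r \Lambda_r^{1/2} r = W_f f + W_r r$$
with $W_f = V_f \Lambda_f^{1/2}$ and $W_r = V_r \Lambda_r^{1/2}$.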
14
Principal Factor Analysis Solution
• Start from an initial (diagonal) estimate $\hat{\psi}$ and obtain the loading matrix from
$$R_y - \hat{\psi} = \hat{W}_f \hat{W}_f^T$$
• Obtain an estimate of $W_f$ by performing a principal component analysis on $R_y - \hat{\psi}$.
• This process is continued until the communality estimates converge:
$$\hat{h}_d^2 = \sum_{m=1}^{M} \hat{w}_{dm}^2$$
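The iteration described on this slide can be sketched for the simplest case of a single common factor. The code below is an assumption-laden illustration, not the author's implementation: it uses power iteration in place of a full principal component analysis, starts from a zero specific variance, and the names `leading_eigenpair` and `principal_factor_one` are invented here. Its first pass, with the specific variance at zero, coincides with the principal component solution.

```python
def leading_eigenpair(A, iters=200):
    """Power iteration: largest eigenvalue and unit eigenvector of symmetric A.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    n = len(A)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in Av) ** 0.5
        v = [x / norm for x in Av]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Av[i] for i in range(n)), v


def principal_factor_one(R, tol=1e-10, max_rounds=1000):
    """Iterated principal factor solution with one common factor (M = 1)."""
    D = len(R)
    h2 = [R[d][d] for d in range(D)]   # initial psi estimate is 0, so h_d^2 = R_dd
    w = [0.0] * D
    for _ in range(max_rounds):
        # Reduced matrix R - psi: off-diagonals of R, communalities on the diagonal.
        reduced = [[h2[i] if i == j else R[i][j] for j in range(D)] for i in range(D)]
        lam, v = leading_eigenpair(reduced)
        w = [max(lam, 0.0) ** 0.5 * x for x in v]   # single-column loading matrix
        new_h2 = [x * x for x in w]                  # communality with M = 1
        converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if converged:
            break
    psi = [R[d][d] - h2[d] for d in range(D)]
    return w, psi, h2


# Toy correlation matrix of two variables with correlation 0.8.
R = [[1.0, 0.8], [0.8, 1.0]]
w, psi, h2 = principal_factor_one(R)
```

For this toy matrix the communality update is `c -> (c + 0.8) / 2`, so the estimates settle at a communality of 0.8 and a specific variance of 0.2.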
15
Maximum Likelihood Solution
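• When FA is carried out on the correlation matrix $R$, the maximum likelihood estimates satisfy
$$\psi_d = 1 - \sum_{m=1}^{M} w_{dm}^2, \quad d = 1, \dots, D$$
$$\psi^{-1/2} R\, \psi^{-1/2}\, \psi^{-1/2} W = \psi^{-1/2} W \tilde{U}$$
• Where $R = U^{-1/2} \Sigma\, U^{-1/2}$, $\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})(y_i - \bar{y})^T$, $U^{1/2} = \mathrm{diag}(\sigma_{11}^{1/2}, \dots, \sigma_{KK}^{1/2})$, $W = [w_{dm}]$, and $\tilde{U}$ is a diagonal matrix.
• With an initial estimate $\psi_0$:
$$\psi_0^{-1/2} R\, \psi_0^{-1/2}\, \psi_0^{-1/2} \hat{W} = \psi_0^{-1/2} \hat{W} \tilde{U}$$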
16
Rotation of Loading Matrix
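• Rotate the loading matrix $W$ by an orthogonal matrix $\Gamma$
$$H = W \Gamma$$
• Where $H$ satisfies
$$H H^T = (W\Gamma)(W\Gamma)^T = W \Gamma \Gamma^T W^T = W W^T$$
• Varimax rotation
• Let $q_i = \sum_{j=1}^{D} h_{ij}^2, \quad i = 1, \dots, D$
• $\Gamma$ can be obtained by maximizing
$$\sum_{j=1}^{D} \left[ \sum_{i=1}^{D} \left( \frac{h_{ij}^2}{q_i} \right)^2 - \frac{1}{D} \left( \sum_{i=1}^{D} \frac{h_{ij}^2}{q_i} \right)^2 \right]$$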
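A minimal sketch of the rotation idea, assuming the normalized varimax criterion in its standard form and restricting to two factors so that the orthogonal matrix is a plane rotation. The function names, the grid search over angles, and the toy loading matrix are assumptions made for illustration; practical varimax implementations use an iterative pairwise scheme rather than a grid.

```python
import math


def varimax_criterion(H):
    """Normalized varimax criterion of a loading matrix H (list of rows)."""
    p, k = len(H), len(H[0])
    q = [sum(h * h for h in row) for row in H]      # communality of each variable
    total = 0.0
    for j in range(k):
        u = [H[i][j] ** 2 / q[i] for i in range(p)]  # normalized squared loadings
        total += sum(x * x for x in u) - sum(u) ** 2 / p
    return total


def rotate2(W, theta):
    """Right-multiply a two-factor loading matrix by a plane rotation (orthogonal)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in W]


def best_rotation(W, steps=180):
    """Grid search for the angle in [0, pi) that maximizes the criterion."""
    angles = [k * math.pi / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate2(W, t)))


simple = [[1.0, 0.0], [0.0, 1.0]]        # each variable loads on one factor only
mixed = rotate2(simple, math.pi / 4)      # same communalities, loadings smeared
theta = best_rotation(mixed)
recovered = rotate2(mixed, theta)
```

The rotation leaves each row's communality unchanged while the criterion moves between 0 for the smeared loadings and 1 for the simple structure, which is the sense in which rotation improves discriminability.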
17
Effectiveness of Rotation
• Obtain greater discriminability
(a) Before rotation
              1st Factor   2nd Factor
1st MFCC         0.842        0.011
4th MFCC        -0.312       -0.724
13th MFCC        0.896        0.120

(b) After rotation
              1st Rotated Factor   2nd Rotated Factor
1st MFCC            -0.892               -0.004
4th MFCC             0.266                0.791
13th MFCC           -0.933               -0.135
18
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
  • Survey of different HMMs
  • FASHMM
• Experiments
• Conclusions & Future Works
19
FA Streamed HMM
• Using FA, the processes of observed features and hidden states are represented by common factors and residual factors.
20
Survey of Different HMMs (FAHMM)
• Covariance matrix modeling
• Full vs. diagonal
• Sufficient data problem
• FA representation
$$R_y^{-1} = \psi^{-1} - \psi^{-1} W_f \left( I + W_f^T \psi^{-1} W_f \right)^{-1} W_f^T \psi^{-1}$$
• State/latent representation
• Discrete vs. continuous
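The appeal of the FA representation of the covariance is that its inverse only needs the diagonal specific variances and a small inner inverse. The sketch below assumes a single common factor, so the inner inverse is a scalar (the Sherman-Morrison special case of the matrix inversion lemma); the function names and numbers are illustrative assumptions.

```python
def fa_covariance(w, psi_diag):
    """R = w w^T + diag(psi) for a single common factor (M = 1)."""
    D = len(w)
    return [[w[i] * w[j] + (psi_diag[i] if i == j else 0.0) for j in range(D)]
            for i in range(D)]


def fa_inverse_covariance(w, psi_diag):
    """Inverse of R using only diagonal inverses and one scalar division."""
    D = len(w)
    a = [w[d] / psi_diag[d] for d in range(D)]        # psi^{-1} w
    s = 1.0 + sum(w[d] * a[d] for d in range(D))      # 1 + w^T psi^{-1} w (scalar)
    return [[(1.0 / psi_diag[i] if i == j else 0.0) - a[i] * a[j] / s
             for j in range(D)] for i in range(D)]


w = [0.9, 0.8, 0.7]               # hypothetical loadings on one factor
psi_diag = [0.19, 0.36, 0.51]     # hypothetical specific variances
R = fa_covariance(w, psi_diag)
R_inv = fa_inverse_covariance(w, psi_diag)
# Product R * R_inv, which should be the identity up to rounding.
product = [[sum(R[i][k] * R_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

With $M$ factors the scalar `s` becomes an $M \times M$ matrix, which is still much smaller than the $D \times D$ covariance when $M$ is small.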
21
Survey of Different HMMs (Streamed HMM)
• In the standard HMM, the joint probability of the observation sequence $Y = \{y_1, y_2, \dots, y_T\}$ and the state sequence $S = \{s_1, s_2, \dots, s_T\}$ is represented by
$$p(S, Y) = p(s_1)\, p(y_1 \mid s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1})\, p(y_t \mid s_t)$$
• Using FHMM, the state at time $t$ is extended to $M$ states, i.e. $s_t = \{s_t^{(1)}, \dots, s_t^{(m)}, \dots, s_t^{(M)}\}$.
• Likelihood combination
• Multi-stream HMM: sub-word level
• FHMM: frame level
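The standard HMM factorization of the joint probability can be evaluated directly for a given state path. The sketch below assumes discrete emissions for brevity (the slides use Gaussian state densities), and the function name and all probabilities are made-up illustration values.

```python
import math


def hmm_joint_log_prob(states, observations, initial, transition, emission):
    """log p(S, Y) for a standard HMM with discrete emissions (an assumption here).

    initial[s] is p(s_1 = s), transition[a][b] is p(s_t = b | s_{t-1} = a),
    and emission[s][y] is p(y_t = y | s_t = s).
    """
    s1, y1 = states[0], observations[0]
    log_p = math.log(initial[s1]) + math.log(emission[s1][y1])
    for t in range(1, len(states)):
        prev, curr = states[t - 1], states[t]
        log_p += math.log(transition[prev][curr])
        log_p += math.log(emission[curr][observations[t]])
    return log_p


initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
log_p = hmm_joint_log_prob([0, 0, 1], [0, 0, 1], initial, transition, emission)
```

Here the path probability is 0.6 · 0.9 · 0.7 · 0.9 · 0.3 · 0.8, one factor per term of the product.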
22
Likelihood Function of FHMM
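• State transition probability
$$p(s_t \mid s_{t-1}) = \prod_{m=1}^{M} p(s_t^{(m)} \mid s_{t-1}^{(m)})$$
• Likelihood function
$$p(y_t \mid s_t) = (2\pi)^{-D/2}\, |C|^{-1/2} \exp\left\{ -\frac{1}{2} \Big( y_t - \sum_{m=1}^{M} \mu_t^{(m)} \Big)^T C^{-1} \Big( y_t - \sum_{m=1}^{M} \mu_t^{(m)} \Big) \right\},$$
where $\mu_t^{(m)}$ is the mean contributed by chain $m$ in state $s_t^{(m)}$ and $C$ is the common covariance matrix.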
23
Estimation Approaches for FHMM
• Exact inference
• Expectation maximization (EM) algorithm
• Complexity: $O(TMK^{M+1})$, compared with $O(TK^{2M})$ for the naive algorithm
• Approximations
• Gibbs sampling: $O(TMK)$
• Variational inference: $O(TMK^2)$
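To get a feel for the complexity orders on this slide, the sketch below evaluates them for one setting. It reads $T$ as the sequence length, $M$ as the number of chains, and $K$ as the number of states per chain, and treats the second exact-inference figure as the naive algorithm; those readings, the function name, and the dictionary keys are assumptions, and constant factors are ignored.

```python
def fhmm_costs(T, M, K):
    """Order-of-magnitude operation counts for FHMM inference (constants ignored)."""
    return {
        "exact": T * M * K ** (M + 1),      # exact inference with factorized transitions
        "naive_exact": T * K ** (2 * M),    # treating the FHMM as one K^M-state HMM
        "gibbs": T * M * K,                 # per Gibbs sweep
        "variational": T * M * K ** 2,      # per variational iteration
    }


costs = fhmm_costs(T=100, M=3, K=4)
```

Even for three chains of four states, exact inference is already well above the approximate methods, and the gap grows exponentially with the number of chains.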
24
FASHMM
• According to the FA method, each common factor $f_m$ is associated with a group of highly correlated features.
• Correlated features are grouped together in a stream and share the same FA parameters.
• The observed feature vector can be represented by
$$y = W_f f + W_r r = [\, w_{f_1}\ w_{f_2}\ \cdots\ w_{f_M}\ W_r \,]\, [\, f_1\ f_2\ \cdots\ f_M\ r^T \,]^T$$
25
Topology of FASHMM
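[Figure: FASHMM topology with parallel state chains $s_t^{f_1}, \dots, s_t^{f_M}$ for the common factors and $s_t^{r}$ for the residual factor, jointly generating $y_{t-1}, y_t, y_{t+1}$]
• State transition probability
$$p(s_t \mid s_{t-1}) = p(s_t^{f_1}, s_t^{f_2}, \dots, s_t^{f_M}, s_t^{r} \mid s_{t-1}^{f_1}, s_{t-1}^{f_2}, \dots, s_{t-1}^{f_M}, s_{t-1}^{r}) = p(s_t^{r} \mid s_{t-1}^{r}) \prod_{m=1}^{M} p(s_t^{f_m} \mid s_{t-1}^{f_m})$$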
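The factorized transition of the FASHMM can be sketched as a product over independent chains, one per common factor plus one for the residual factor. The function name, the tuple layout of the joint state, and the transition matrices below are hypothetical choices for illustration.

```python
from itertools import product


def fashmm_transition(prev, curr, factor_chains, residual_chain):
    """Joint transition probability of the factor chains and the residual chain.

    prev and curr are tuples (s^{f_1}, ..., s^{f_M}, s^r); each chain has its
    own transition matrix and evolves independently of the others.
    """
    prob = residual_chain[prev[-1]][curr[-1]]
    for m, A in enumerate(factor_chains):
        prob *= A[prev[m]][curr[m]]
    return prob


A_f1 = [[0.7, 0.3], [0.2, 0.8]]   # chain of common factor f_1
A_f2 = [[0.5, 0.5], [0.1, 0.9]]   # chain of common factor f_2
A_r = [[0.6, 0.4], [0.4, 0.6]]    # chain of the residual factor

p = fashmm_transition((0, 1, 0), (1, 1, 0), [A_f1, A_f2], A_r)
# Summing over every joint next state should give 1 for a valid transition model.
total = sum(fashmm_transition((0, 1, 0), nxt, [A_f1, A_f2], A_r)
            for nxt in product(range(2), repeat=3))
```

Three chains of two states stand in for a single chain of eight joint states while needing only three small transition matrices.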
26
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
  • Simulated data setup
  • HMM vs. FASHMM
  • Recognition results & discussion
• Conclusions & Future Works
27
Experimental Setup
• Simulated data
  • 4 classes, 5 variables
  • Training: 100 sentences, 5 “words” per sentence
  • Testing: 50 utterances, 4 “words” per sentence
• Model structure
  • HMM
    • 7 states each class
    • Only one Gaussian each state
  • FASHMM
    • 3 states each class
    • Only one Gaussian each state
28
Class 1
[Figure: the five simulated variables of Class 1 (Series 1 to Series 5) plotted against the sample index]

Factor loadings (Class 1)
        CF1       CF2       CF3       CF4       CF5
V1     0.9662    0.1598    0.1863    0.0171    0.0775
V2    -0.2655   -0.9526   -0.0807   -0.1246   -0.0046
V3     0.2394   -0.1161    0.9639    0.0108    0.0008
V4     0.9697    0.1644    0.1639   -0.0001   -0.0755
V5     0.0675    0.9565   -0.2565   -0.1212   -0.0045

[Figure: the five common factors of Class 1 (CF1 to CF5) plotted against the sample index]
29
Class 2
[Figure: the five simulated variables of Class 2 (Series 1 to Series 5) plotted against the sample index]

Factor loadings (Class 2)
        CF1       CF2       CF3       CF4       CF5
V1     0.1317    0.9733   -0.1647   -0.0908    0.0008
V2    -0.2007   -0.0951    0.9750    0.0041   -0.0001
V3    -0.9818   -0.1093    0.1515    0.0045   -0.0339
V4    -0.9826   -0.1061    0.1486   -0.0005    0.0337
V5     0.0827    0.9931    0.0077    0.0823   -0.0006

[Figure: the five common factors of Class 2 (CF1 to CF5) plotted against the sample index]
30
Class 3
[Figure: the five simulated variables of Class 3 (Series 1 to Series 5) plotted against the sample index]

Factor loadings (Class 3)
        CF1       CF2       CF3       CF4       CF5
V1     0.0324    0.9939   -0.0704   -0.0788    0.0004
V2    -0.1435   -0.1093    0.9836   -0.0004   -0.0003
V3     0.9913    0.0285   -0.1243   -0.0013    0.0321
V4     0.9955    0.0228   -0.0867    0.0006   -0.0314
V5     0.0186    0.9926   -0.0903    0.0792   -0.0002

[Figure: the five common factors of Class 3 (CF1 to CF5) plotted against the sample index]
31
Class 4
[Figure: the five simulated variables of Class 4 (Series 1 to Series 5) plotted against the sample index]

Factor loadings (Class 4)
        CF1       CF2       CF3       CF4       CF5
V1     0.1634   -0.9746   -0.1532   -0.0040    0.0002
V2     0.1133    0.1475    0.9826    0.0013   -0.0001
V3    -0.9876    0.0482   -0.1109   -0.0950    0.0313
V4    -0.9701    0.2110   -0.0517    0.1075    0.0161
V5     0.9887   -0.1165    0.0829   -0.0003    0.0456

[Figure: the five common factors of Class 4 (CF1 to CF5) plotted against the sample index]
32
HMM vs. FASHMM
[Figure: four panels of the simulated series plotted against the sample index. HMM panel: Series 1 to 5 together. FASHMM panels: Series 1 and 4; Series 2 and 5; Series 3.]
33
Recognition Results
                        HMM     FASHMM
# States per HMM         7      3 (x4)
Recognition accuracy    100%    100%
34
Discussion
[Figure: the five simulated series (Series 1 to Series 5) plotted against the sample index]
35
Outline
• Introduction
• Cepstral Factor Analysis
• FA Streamed Hidden Markov Model
• Experiments
• Conclusions & Future Works
36
Conclusions
• We have presented the FA approach
• Extract the common factors and the residual factors in acoustic features.
• Separate the Markov chains for these factors.
• Represent the sophisticated dynamics in the stochastic process of the speech signal.
• A new topology of FA streamed HMM was proposed.
37
Future Works
• More acoustic features
• Model selection
  • Streams
  • States
  • Mixtures
• Large vocabulary continuous speech recognition (LVCSR) task