
A Bayesian Approach to HMM-Based Speech Synthesis


Page 1: A Bayesian Approach to  HMM-Based Speech Synthesis

A Bayesian Approach to HMM-Based Speech Synthesis

Kei Hashimoto (1), Heiga Zen (1), Yoshihiko Nankaku (1), Takashi Masuko (2), and Keiichi Tokuda (1)

(1) Nagoya Institute of Technology
(2) Tokyo Institute of Technology

Page 2: A Bayesian Approach to  HMM-Based Speech Synthesis

Background

HMM-based speech synthesis system
  Spectrum, excitation, and duration are modeled
  Speech parameter seqs. are generated

Maximum likelihood (ML) criterion
  Train HMMs and generate speech parameters
  Point estimate ⇒ The over-fitting problem

Bayesian approach
  Estimate posterior dist. of model parameters
  Prior information can be used
  ⇒ Alleviate the over-fitting problem

Page 3: A Bayesian Approach to  HMM-Based Speech Synthesis

Outline

Bayesian speech synthesis
  Variational Bayesian method
  Speech parameter generation
Bayesian context clustering
  Prior distribution using cross validation
Experiments
Conclusion & Future work

Page 4: A Bayesian Approach to  HMM-Based Speech Synthesis

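Bayesian speech synthesis (1/2)

Model training and speech synthesis

ML
  Training:  $\Lambda_{ML} = \arg\max_{\Lambda} p(O \mid S, \Lambda)$
  Synthesis: $o_{ML} = \arg\max_{o} p(o \mid s, \Lambda_{ML})$

Bayes
  Training and synthesis: $o_{Bayes} = \arg\max_{o} p(o \mid s, O, S) = \arg\max_{o} p(o, O \mid s, S)$

$\Lambda$: Model parameters
$s$: Label seq. for synthesis
$S$: Label seq. for training
$O$: Training data seq.
$o$: Synthesis data seq.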

Page 5: A Bayesian Approach to  HMM-Based Speech Synthesis

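Bayesian speech synthesis (2/2)

Predictive distribution (marginal likelihood)

$p(o, O \mid s, S) = \sum_{z} \sum_{Z} \int p(o, z \mid s, \Lambda) \, p(O, Z \mid S, \Lambda) \, p(\Lambda) \, d\Lambda$

$z$: HMM state seq. for synthesis data
$Z$: HMM state seq. for training data
$p(o, z \mid s, \Lambda)$: Likelihood of synthesis data
$p(O, Z \mid S, \Lambda)$: Likelihood of training data
$p(\Lambda)$: Prior distribution for model parameters

Variational Bayesian method [Attias; ’99]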

Page 6: A Bayesian Approach to  HMM-Based Speech Synthesis

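Variational Bayesian method (1/2)

Estimate approximate posterior dist. ⇒ Maximize a lower bound

$\log p(o, O \mid s, S) = \log \sum_{z} \sum_{Z} \int Q(z, Z, \Lambda) \frac{p(o, z \mid s, \Lambda) \, p(O, Z \mid S, \Lambda) \, p(\Lambda)}{Q(z, Z, \Lambda)} \, d\Lambda$
$\geq \left\langle \log \frac{p(o, z \mid s, \Lambda) \, p(O, Z \mid S, \Lambda) \, p(\Lambda)}{Q(z, Z, \Lambda)} \right\rangle_{Q(z, Z, \Lambda)} = \mathcal{F}$   (Jensen’s inequality)

$\langle \cdot \rangle_{Q(z, Z, \Lambda)}$: Expectation w.r.t. $Q(z, Z, \Lambda)$
$Q(z, Z, \Lambda)$: Approximate distribution of the true posterior distribution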
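
The bound obtained from Jensen's inequality can be checked numerically on a small case. The sketch below uses a hypothetical toy model with a single discrete hidden variable and made-up joint probabilities (it is not the HMM of the slides), and the names `log_marginal` and `lower_bound` are illustrative. It suggests that the lower bound never exceeds the log marginal likelihood, and that the gap closes when the approximate distribution equals the true posterior.

```python
import math

def log_marginal(joint):
    """log p(x) = log sum_z p(x, z) for one fixed observation x."""
    return math.log(sum(joint))

def lower_bound(joint, q):
    """F(Q) = sum_z Q(z) * log(p(x, z) / Q(z)); terms with Q(z) = 0 add nothing."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0.0)

# Hypothetical joint probabilities p(x, z) for z = 0, 1, 2 (assumed values).
joint = [0.10, 0.05, 0.25]
evidence = log_marginal(joint)

# An arbitrary approximate distribution and the exact posterior p(z | x).
uniform_q = [1.0 / 3.0] * 3
true_posterior = [p / sum(joint) for p in joint]

# The gap equals the KL divergence from Q to the true posterior.
gap_uniform = evidence - lower_bound(joint, uniform_q)
gap_posterior = evidence - lower_bound(joint, true_posterior)
```

Maximizing the lower bound over Q is therefore equivalent to pulling Q toward the true posterior, which is the motivation for the update rules on the next slide.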

Page 7: A Bayesian Approach to  HMM-Based Speech Synthesis

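Variational Bayesian method (2/2)

Random variables are assumed to be statistically independent

$Q(z, Z, \Lambda) = Q(z) \, Q(Z) \, Q(\Lambda)$

Optimal posterior distributions

$Q(\Lambda) = C_{\Lambda} \, p(\Lambda) \exp \{ \langle \log p(o, z \mid s, \Lambda) \rangle_{Q(z)} + \langle \log p(O, Z \mid S, \Lambda) \rangle_{Q(Z)} \}$
$Q(z) = C_{z} \exp \langle \log p(o, z \mid s, \Lambda) \rangle_{Q(\Lambda)}$
$Q(Z) = C_{Z} \exp \langle \log p(O, Z \mid S, \Lambda) \rangle_{Q(\Lambda)}$

$C_{\Lambda}, C_{z}, C_{Z}$: normalization terms

Iterative updates, as in the EM algorithm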
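
To illustrate the alternating updates, here is a minimal sketch for a much simpler model than an HMM: a one-dimensional Gaussian with unknown mean and precision and a factorized posterior over the two. The prior form (Gaussian on the mean, Gamma on the precision), the default hyper-parameter values, and the function name `vb_gaussian` are assumptions chosen for illustration. The HMM case additionally updates the posteriors of the state sequences, which this toy does not cover.

```python
def vb_gaussian(data, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iterations=50):
    """Variational updates for x ~ N(mu, 1/tau) with Q(mu, tau) = Q(mu) Q(tau).

    Assumed priors: mu ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0).
    Returns the parameters of Q(mu) = N(mu_n, 1/lam_n) and Q(tau) = Gamma(a_n, b_n).
    """
    n = len(data)
    sample_mean = sum(data) / n

    # These two quantities do not change between iterations.
    mu_n = (lam0 * mu0 + n * sample_mean) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0

    expected_tau = a0 / b0  # initial guess for E[tau]
    for _ in range(iterations):
        # Update Q(mu) using the current expectation of tau.
        lam_n = (lam0 + n) * expected_tau
        # Update Q(tau) using expectations taken under Q(mu).
        sq_dev = sum((x - mu_n) ** 2 + 1.0 / lam_n for x in data)
        prior_dev = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (sq_dev + prior_dev)
        expected_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

mu_n, lam_n, a_n, b_n = vb_gaussian([1.0, 2.0, 3.0, 4.0])
```

Each factor is updated with the other held fixed, in the same way that the E-step and M-step alternate in EM.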

Page 8: A Bayesian Approach to  HMM-Based Speech Synthesis

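Approximation for speech synthesis

The posterior dist. of model parameters $Q(\Lambda)$ is dependent on synthesis data
  ⇒ Huge computational cost in the synthesis part

Ignore the dependency on synthesis data
  ⇒ Estimation from only training data

$Q(\Lambda) = C_{\Lambda} \, p(\Lambda) \exp \langle \log p(O, Z \mid S, \Lambda) \rangle_{Q(Z)}$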

Page 9: A Bayesian Approach to  HMM-Based Speech Synthesis

Prior distribution

Conjugate prior distribution
  ⇒ The posterior dist. belongs to the same family of dist. as the prior dist.
  Likelihood function: Gaussian
  Conjugate prior distribution: Gauss-Wishart

Determination using statistics of prior data
  Dimension of feature
  # of prior data
  Mean of prior data
  Covariance of prior data
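
A one-dimensional sketch may help show what conjugacy buys. The code below uses a Normal-Gamma prior, the scalar analogue of a prior over a Gaussian mean and precision, and updates it with observed data so that the posterior is again Normal-Gamma. The mapping from prior-data statistics to hyper-parameters in `prior_from_statistics` (treating the prior data as pseudo-observations) is an assumption made for illustration, not necessarily the setting used in the slides, and all names are hypothetical.

```python
from typing import NamedTuple, Sequence

class NormalGamma(NamedTuple):
    mean: float   # location of the Gaussian over the mean
    kappa: float  # pseudo-count scaling the precision of the mean
    alpha: float  # shape of the Gamma over the precision
    beta: float   # rate of the Gamma over the precision

def prior_from_statistics(count: float, mean: float, variance: float) -> NormalGamma:
    """Assumed mapping: prior data act as `count` pseudo-observations."""
    return NormalGamma(mean, count, count / 2.0, count * variance / 2.0)

def posterior(prior: NormalGamma, data: Sequence[float]) -> NormalGamma:
    """Standard conjugate update; the result has the same family as the prior."""
    n = len(data)
    sample_mean = sum(data) / n
    scatter = sum((x - sample_mean) ** 2 for x in data)
    kappa = prior.kappa + n
    mean = (prior.kappa * prior.mean + n * sample_mean) / kappa
    alpha = prior.alpha + n / 2.0
    beta = (prior.beta + 0.5 * scatter
            + prior.kappa * n * (sample_mean - prior.mean) ** 2 / (2.0 * kappa))
    return NormalGamma(mean, kappa, alpha, beta)

# Hypothetical prior statistics: 2 prior samples with mean 0 and variance 1.
prior = prior_from_statistics(count=2.0, mean=0.0, variance=1.0)
post = posterior(prior, [1.0, 3.0])
```

Because the update only changes four numbers, the amount of prior data directly controls how strongly the prior pulls the posterior, which is why the choice of prior matters for model selection later in the talk.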

Page 10: A Bayesian Approach to  HMM-Based Speech Synthesis

Speech parameter generation

Speech parameter
  Consists of static and dynamic features
  ⇒ Only the static feature seq. is generated

Speech parameter generation based on the Bayesian approach
  ⇒ Maximize the lower bound

Page 11: A Bayesian Approach to  HMM-Based Speech Synthesis

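Relation between Bayes and ML

Compare with the ML criterion

Output dist.
  ML ⇒ Uses the point estimate of model parameters
  Bayes ⇒ Uses expectations w.r.t. the posterior dist. of model parameters

Use of expectations of model parameters
Can be solved in the same fashion as ML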
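
The generation step can be sketched for a one-dimensional feature with one dynamic (delta) component. The code below solves the weighted least-squares problem that yields the static sequence from per-frame means and precisions of the static and delta features. The delta window `0.5 * (c[t+1] - c[t-1])`, the boundary clamping, and all function names are assumptions for illustration. The same solver applies whether the means and precisions are ML point estimates or expectations under a posterior, which is the sense in which the two cases share a solution method.

```python
def build_window_matrix(num_frames):
    """Rows 2t and 2t+1 map the static sequence to the static and delta features of frame t."""
    w = [[0.0] * num_frames for _ in range(2 * num_frames)]
    for t in range(num_frames):
        w[2 * t][t] = 1.0                       # static feature: c[t]
        prev_t = max(t - 1, 0)                  # clamp at the boundaries (assumption)
        next_t = min(t + 1, num_frames - 1)
        w[2 * t + 1][next_t] += 0.5             # delta feature: 0.5 * (c[t+1] - c[t-1])
        w[2 * t + 1][prev_t] -= 0.5
    return w

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def generate_static(means, precisions):
    """Solve (W^T P W) c = W^T P m, with P the diagonal matrix of precisions.

    `means` and `precisions` are interleaved per frame: [static, delta, static, delta, ...].
    """
    rows = len(means)
    num_frames = rows // 2
    w = build_window_matrix(num_frames)
    a = [[sum(w[r][i] * precisions[r] * w[r][j] for r in range(rows))
          for j in range(num_frames)] for i in range(num_frames)]
    b = [sum(w[r][i] * precisions[r] * means[r] for r in range(rows))
         for i in range(num_frames)]
    return solve(a, b)

# Static means jump to 10 in the middle frame; delta means of 0 smooth the jump.
trajectory = generate_static([0.0, 0.0, 10.0, 0.0, 0.0, 0.0], [1.0] * 6)
```

The dynamic features act as a smoothness constraint: the generated static sequence does not simply copy the static means, but trades them off against the delta means.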


Page 13: A Bayesian Approach to  HMM-Based Speech Synthesis

Bayesian context clustering

Context clustering based on maximizing the lower bound $\mathcal{F}$

Select question
  e.g., "Is this phoneme a vowel?" (yes / no)
Gain of $\mathcal{F}$ when a node is split by the question
Stopping condition
  ⇒ Split node based on gain
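
The question-selection step can be sketched as a greedy search over candidate questions. In the code below the objective is a hypothetical stand-in (negative squared error minus a fixed per-node penalty), not the variational lower bound used in the slides, and the rule "stop when no question gives a positive gain" is likewise an assumption. The sample values and the second question are made up for illustration.

```python
def score(values, penalty=1.0):
    """Stand-in objective for a node: negative squared error minus a per-node penalty."""
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values) - penalty

def best_split(samples, questions):
    """Return (question, gain) with the largest positive gain, or (None, 0.0) to stop.

    samples:   list of (context, value) pairs assigned to the node
    questions: dict mapping a question name to a yes/no predicate on the context
    """
    parent_score = score([v for _, v in samples])
    best_name, best_gain = None, 0.0
    for name, ask in questions.items():
        yes = [v for c, v in samples if ask(c)]
        no = [v for c, v in samples if not ask(c)]
        if not yes or not no:
            continue  # the question does not divide this node
        gain = score(yes) + score(no) - parent_score
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

# Hypothetical node contents: (phoneme, some scalar feature value).
samples = [("a", 5.0), ("i", 5.2), ("k", 1.0), ("s", 1.2), ("m", 1.1)]
questions = {
    "Is this phoneme a vowel?": lambda c: c in "aiueo",
    "Is this phoneme a nasal?": lambda c: c in "mn",
}
chosen, gain = best_split(samples, questions)
```

Applying `best_split` recursively to the yes and no children until it returns `None` grows the tree; replacing `score` changes the criterion without changing the search.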

Page 14: A Bayesian Approach to  HMM-Based Speech Synthesis

Impact of prior distribution
  Affects model selection as tuning parameters
  ⇒ Requires a technique for determining the prior dist.

Conventional: maximize the marginal likelihood
  Leads to the over-fitting problem, as with ML
  Tuning parameters are still required

Technique for determining the prior distribution using cross validation [Hashimoto; ’08]

Page 15: A Bayesian Approach to  HMM-Based Speech Synthesis

Bayesian approach using CV

Prior distribution based on cross validation
  Training data is randomly divided into K groups

[Figure: training data divided into groups 1, 2, and 3; cross valid prior dist. from groups (2,3), (1,3), and (1,2); posterior dist.; calculate likelihood]
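
The data partition behind the cross valid prior can be sketched as follows: the training items are shuffled into K groups, and for each group the prior data are taken to be all remaining groups, matching the (2,3), (1,3), (1,2) pairing shown for K = 3. The function names and the fixed seed are assumptions for illustration. How the posterior distributions and likelihoods are then computed from these groups is not recoverable from the slide and is left out.

```python
import random

def cross_valid_groups(num_items, k, seed=0):
    """Randomly divide item indices 0..num_items-1 into k groups of near-equal size."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def prior_data_for_each_group(groups):
    """For group i, the prior data are the items of every other group."""
    return [
        sorted(idx for j, group in enumerate(groups) if j != i for idx in group)
        for i in range(len(groups))
    ]

groups = cross_valid_groups(num_items=9, k=3)
prior_data = prior_data_for_each_group(groups)
```

No item ever contributes to the prior that is paired with its own group, so the prior is never tuned on the same data it is later combined with.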


Page 17: A Bayesian Approach to  HMM-Based Speech Synthesis

Experimental conditions (1/2)

Database            ATR Japanese speech database B-set
Speaker             MHT
Training data       450 utterances
Test data           53 utterances
Sampling rate       16 kHz
Window              Blackman window
Frame size / shift  25 ms / 5 ms
Feature vector      24 mel-cepstrum + Δ + ΔΔ and log F0 + Δ + ΔΔ (78 dimensions)
HMM                 5-state left-to-right HMM without skip transition

Page 18: A Bayesian Approach to  HMM-Based Speech Synthesis

Experimental conditions (2/2)

Compared approaches

Approach     Training  Context clustering                # of states
ML-MDL       ML        MDL                               2,491
Bayes-Bayes  Bayes     Bayes using CV                    25,911
Bayes-MDL    Bayes     Bayes using CV, adjust threshold  2,553
ML-Bayes     ML        MDL, adjust threshold             27,106

Mean Opinion Score (MOS) test
  Subjects were 10 Japanese students
  20 sentences were chosen at random

Page 19: A Bayesian Approach to  HMM-Based Speech Synthesis

Subjective listening test

[Figure: mean opinion score of each compared approach; # of states: 2,491, 25,911, 27,106, 2,553]

Page 20: A Bayesian Approach to  HMM-Based Speech Synthesis

Conclusions and future work

A new framework based on the Bayesian approach
  All processes are derived from a single predictive distribution
  Improves the naturalness of synthesized speech

Future work
  Introduce HSMM instead of HMM
  Investigate the relation between speech quality and model structures