Hidden Markov Model (HMM) based Speech Synthesis for Urdu...

Hidden Markov Model (HMM) based Speech Synthesis for Urdu Language

Presenter:

Dr. Tania Habib

Outline:

• Overview

• Unit selection vs HMM based Speech Synthesis (HTS) [1]

• Development

• Requirements for Voice building

• Data Set

• Challenges

• Subjective Evaluation

• Erroneous Words

• Summary

Speech Synthesis Overview:

Text to be Synthesized

Natural Language Processing

Speech Synthesis Engine

Synthesized Speech

Types of Speech Synthesis:

• Rule-based, formant synthesis Hand-crafting each phonetic units by rules

• Corpus based: Concatenative synthesis (Unit Selection) High quality speech can be synthesized using waveform concatenation

algorithms.

To obtain various voices, a large amount of speech data is necessary.

Statistical parametric synthesis (HMM based) Generate speech parameters from statistical models

Voice quality can easily be changed by transforming HMM parameters.

Unit Selection vs. HTS

Unit Selection HTS

Advantages:

High Quality at Waveform level (Specific Domain)

• Small Foot Print• Smooth• Stable Quality

Disadvantages:

• Large footprints• Discontinuous• Unstable quality

Vocoder sound(Domain-independent)

HTS Overview:

SPEECHDATABASE Excitation

Parameterextraction

SpectralParameterExtraction

Excitationgeneration

Synthesis filter

Text analysis

SYNTHESIZEDSPEECH

Parameter generationfrom HMMs

Context-dependent HMMs& state duration models

Labels Excitationparameters

Excitation

Spectralparameters

Speech signal Training part

Synthesis part

Training HMMs

Spectralparameters

Excitationparameters

Labels

[HTS Slides released by HTS Working Group slide no.21]

1. Annotated Training data.

2. Define speech features (MFCC, F0 and duration) for model training.

3. Sorting out unique context-dependent as well as context-independent phonemes (from the training data) for model training.

4. Unified question file for spectral, F0 and duration for context clustering.

Preliminary requirements for the HTS toolkit:

• Source: Paragraphs taken from Urdu Qaida of Grade 2 and 4 respectively

• Duration : 30 minutes

• Total number of utterances: 347

• Recording parameters: Sample rate : 8KHz (up-sampled to 48KHz)

Channel : Mono

Recording format: .WAV

Speaker: Native Urdu female speaker

Data Set Used:

Challenges:

• Generation of the full-context style labels.

• Addition of Prosodic Layers

• Segment

• Stress

• Syllable

• Word

• Unbalanced Training Data

• Defining the Question Set (Context Clustering)

Full-Context Format(1/2):

SIL^A-L+I_I=A@ 1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2

Supra-Segmental Context

Segmental Context

Segmental Supra-Segmental

• Current Phoneme• Previous two Phonemes• Next two Phonemes

• Syllable• Stress• Word• Phrase• POS

Full-Context Format(2/2):

x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …

SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …

� ۔۔۔�

� �� ا

Questions on Segmental/Prosodic Layers:

Phoneme {preceding, succeeding} two phonemes current phoneme

Syllable # of phonemes at {preceding, current, succeeding} syllable {accent, stress} of {preceding, current, succeeding} syllable Position of current syllable in current word # of {preceding, succeeding} {accented, stressed} syllable in current phrase # of syllables {from previous, to next} {accented, stressed} syllable Vowel within current syllable

Word Part of speech of {preceding, current, succeeding} word # of syllables in {preceding, current, succeeding} word Position of current word in current phrase # of {preceding, succeeding} content words in current phrase # of words {from previous, to next} content word

Phrase # of syllables in {preceding, current, succeeding} phrase

Addition of Stress/Syllable Layer:

• Added layers: Stress Syllable

Unbalanced Training data:

0200400600800

100012001400160018002000

A_Y R K

I_I H N M S

D_D U P G

F X Q T

Figure. Phoneme Coverage for the 30-min speech data

• High occurrence for vowels

• Some of the phonemes were completely ignored {J_H, L_H, M_H, N_G_H, R_H, Y, Z_Z} [2]

Context Clustering (Question Set):

• Number of possible combinations are quite enormous with 53 different questions.

• Possible contexts = C n

where C = Total count of basic phonetic units,

n = Total number of Questions

With only Segmental Context (n=5) Possible models are:

665 ≈ 1252 million

• If we consider all the context, it will be practically infinite.

Solution:

• Record data having maximum phoneme coverage at tri-phone or di-phone level.

• Apply context clustering technique to classify and share acoustically similar models

Subjective Evaluation:

• Testing Methodology: Mean Opinion Score (MOS)[3] for:

Naturalness Intelligibility

• Naturalness:

How close it seems to be produced by a human?

• Intelligibility:

How much conveniently the word was recognized?

Subjective Testing (Results):

ListenerType

MOSNaturalness

MOSIntelligibility

Technical 1 3.23 3.65

Linguistic 1 2.82 3.66

Table 1. Mean Opinion Score (MOS) results of four listeners

Erroneous words:

NastaliqueStyle

CISAMPA(Correct)

Listened(Incorrect)

Coverage (%)

طرف T_DARAF T_DALAF 5.92

گا GA_A D_DA_A 1.35

معلوم MAYLU_UM MAT_DLU_UM 0.00

تھے T_D_HA_Y T_SA_Y 0.66

رزی RAZI_I RAD_DI_I 0.88

کیونکہ KIU_U_NKA_Y T_SU_NKA_Y 0.15

حق HAQ HABS 0.46

بعد BAYD_D BAD_D 0.00

خیال XAJA_AL FIJA_AL 0.50

Table 2. Synthesized words with errors

Some Synthesized Examples:

Seen Context:

Un-seen Context:

Different Carrier Word:

Synthesized: Training Set:

Summary:

• Text to Speech Synthesis (TTS): Concatenative Parametric (Hmm based)

• Requirement for Voice building Annotated speech corpus Speech features Question file

• Challenges Full context style labels Addition of prosodic layers Question file for context clustering

• Subjective Evaluation

• Erroneous words

References:

• 1. H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in proc. of Sixth ISCA Workshop on Speech Synthesis, Bonn, Germany, August, 2007.

• 2. “IPA to CISAMPA Conversion Chart," Center for Language Engineering, UET, Lahore, [Online]. Available: http://www.cle.org.pk/resources/CISAMPA.pdf. [Accessed 3 March 2014].

• 3. M. Viswanathan and M. Viswanathan, "Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale," Computer Speech & Language, vol. 19, no. 1, pp. 55-83, 2005.

Hidden Markov Model (HMM) based Speech Synthesis for Urdu...

Documents

Hidden Markov Model (HMM) - Tutorial

Modèle de Markov cachée Hidden Markov Model (HMM)

8: Hidden Markov Models - University of Cambridge · 8: Hidden Markov Models ... Hidden Markov Model; ... Given an observation sequence O and an HMM = (A;B), discover the best hidden

Hidden Markov Models - SPSC · Hidden Markov Models ... The above weather model turns into a hidden Markov model, ... For our weather HMM , nd the most probable hidden weather sequence

Hidden Markov Modelstp.lingfil.uu.se/~marie/undervisning/textanalys15/HMM.… · · 2015-04-22Hidden Markov Model course based on Jurafsky and Martin [2009, ... Hidden Markov Models

Hidden Markov Models - MIT CSAILpeople.csail.mit.edu/psantana/data/files/seminars/HMM...Markov chains 4. Hidden Markov models 5. HMM algorithms –Prediction –Filtering –Smoothing

Chapter 4: Hidden Markov Models - Columbia University · Chapter 4: Hidden Markov Models 4.4 HMM Applications 2 Overview ... A hidden Markov Model for Predicting Transmembrane Helices

Pengenalan Kata Berbahasa Indonesia dengan Hidden Markov ... · Pengenalan Kata Berbahasa Indonesia dengan Hidden Markov Model (HMM) menggunakan AIgoritme Baum-Welch ... • Matlab

Hidden Markov Models - University of Auckland · Hidden Markov Models! A hidden Markov model (HMM) is a statistical model in which the ... problems, as the number of possible hidden

APLIKASI HIDDEN MARKOV MODEL PADA PINTU ...suara atau ucapan adalah Jaringan Saraf Tiruan (JST) dan Hidden Markov Model (HMM). Aplikasi Hidden Markov Model pada pintu geser berbasis

Modèle de Markov cachée Hidden Markov Model (HMM) - Cours

Maschinelles Lernen Hidden Markov Modelle (HMM) (Rabiner Tutorial)

The Hidden Markov Model (HMM) - Computer Scienceelgammal/classes/cs536/lectures/HMM.pdf · 2 Lecture Outline • Theory of Markov Models – discrete Markov processes – hidden Markov

Hidden Markov Model Nov 11, 2008 Sung-Bae Cho. Hidden Markov Model Inference of Hidden Markov Model Path Tracking of HMM Learning of Hidden Markov Model

The complete realization problem for hidden Markov models ...m.vidyasagar/Research/HMM-paper.pdftions is it possible to construct a hidden Markov model (HMM) for this process? This

Modèle de Markov Caché (HMM hidden Markov Model) · Modèle de Markov Caché (HMM hidden Markov Model) On parle de probabilités d’émission car souvent on pense aux HMMs comme

Hidden Markov Models (HMM)•Hidden Markov Models (HMM) Many slides from Michael Collins. Overview I The Tagging Problem I Generative models, and the noisy-channel model, for supervised

CSCE 471/871 Lecture 3: Markov Chains and Hidden Markov Modelscse.unl.edu/~sscott/teach/Classes/cse471S15/slides/3-hmm.pdf · 2015. 2. 2. · HMM Outline Markov chains Hidden Markov

Overview Hidden Markov Models Gaussian Mixture Models · Hidden Markov Models and Gaussian Mixture Models ... Hidden Markov models HMM algorithms ... model ASR Lectures 4&5 Hidden

A hidden Markov model HMM - Saylor Academy · Architecture of a hidden Markov model. The diagram below shows the general architecture of an instantiated HMM. Each oval shape represents