Transcript
Page 1: Letter to Phoneme Alignment

LETTER TO PHONEME ALIGNMENT

Reihaneh Rabbany

Shahin Jabbari

Page 2: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 3: Letter to Phoneme Alignment

TEXT TO SPEECH PROBLEM

Conversion of Text to Speech: TTS

Automated Telecom Services
E-mail by Phone
Banking Systems
Handicapped People


Page 4: Letter to Phoneme Alignment

PRONUNCIATION

Pronunciation of the words
◦ Dictionary Words: Dictionary Look-up
◦ Non-Dictionary Words: Phonetic Analysis
  (Language is alive, new words are added; Proper Nouns)


[Diagram: Word → Phonetic Analysis → Pronunciation]

Page 5: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 6: Letter to Phoneme Alignment

PROBLEM

Letter to Phoneme Alignment (L2P)
◦ Letter: c a k e
◦ Phoneme: k ei k

Page 7: Letter to Phoneme Alignment

CHALLENGES

No Consistency
◦ City / s /
◦ Cake / k /
◦ Kid / k /

No Transparency
◦ K i d (3)  / k i d / (3)
◦ S i x (3)  / s i k s / (4)
◦ Q u e u e (5)  / k j u: / (3)
◦ A x e (3)  / a k s / (3)

Page 8: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 9: Letter to Phoneme Alignment

ONE-TO-ONE EM (Daelemans et al., 1996)

Length of word = length of pronunciation
Produce all possible alignments

Inserting null letter/phoneme

Alignment probability


$P(A) = \prod_i P(p_i \mid l_i)$
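For illustration only (not the authors' code): once a table of letter-to-phoneme probabilities is available, the product above scores any candidate alignment. A minimal Python sketch, assuming an alignment is represented as a list of (letter, phoneme) pairs with "_" standing for an inserted null:

# Score a one-to-one alignment as P(A) = product of P(p_i | l_i).
# Assumed representation (not from the slides): a list of (letter, phoneme)
# pairs, with "_" standing for an inserted null letter/phoneme.
from math import prod

def alignment_probability(alignment, prob):
    """prob[(letter, phoneme)] holds P(phoneme | letter); unseen pairs get a small floor."""
    return prod(prob.get(pair, 1e-6) for pair in alignment)

# Example: "cake" -> / k ei k /, with a null phoneme for the final "e"
example = [("c", "k"), ("a", "ei"), ("k", "k"), ("e", "_")]
# alignment_probability(example, {("c", "k"): 0.6, ("a", "ei"): 0.3,
#                                 ("k", "k"): 0.9, ("e", "_"): 0.4})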

Page 10: Letter to Phoneme Alignment

DECISION TREE (Black et al., 1996)

Train a CART using an aligned dictionary
Why CART?
A single tree for each letter

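For illustration only: the "single tree for each letter" setup can be sketched with scikit-learn's DecisionTreeClassifier standing in for CART, trained on a small context window around each occurrence of the letter in an already-aligned dictionary. The window size, padding symbol, and ordinal encoding below are assumptions, not details from Black et al.

# One decision tree per letter, predicting that letter's phoneme from its context
# window in an aligned dictionary (DecisionTreeClassifier as a stand-in for CART).
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier

WINDOW = 2  # letters of context on each side (assumed)

def context(word, i):
    padded = "#" * WINDOW + word + "#" * WINDOW
    return [ord(c) for c in padded[i:i + 2 * WINDOW + 1]]  # simple ordinal encoding

def train_per_letter_trees(aligned_dictionary):
    """aligned_dictionary: list of (word, phonemes) with one phoneme (or "_") per letter."""
    data = defaultdict(lambda: ([], []))
    for word, phonemes in aligned_dictionary:
        for i, (letter, phoneme) in enumerate(zip(word, phonemes)):
            X, y = data[letter]
            X.append(context(word, i))
            y.append(phoneme)
    return {letter: DecisionTreeClassifier().fit(X, y) for letter, (X, y) in data.items()}

At prediction time, the tree for the current letter is looked up and given the same context encoding.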

Page 11: Letter to Phoneme Alignment

KONDRAK

Alignments are not always one-to-one
◦ A x e  / a k s /
◦ B oo k  / b ú k /

Only Null Phoneme
Similar to one-to-one EM

Produce All Possible Alignments
Compute the Probabilities


Page 12: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 13: Letter to Phoneme Alignment

FORMAL MODEL

Word: sequence of letters

Pronunciation: sequence of phonemes

Alignment: sequence of subalignments

Problem: Finding the most probable alignment


$L = l_1 l_2 \ldots l_n$

$P = p_1 p_2 \ldots p_m$

$A = a_1 a_2 \ldots a_k$, where $a_i = (L_i, P_i)$ and $|L_i|, |P_i| \le 2$

$A_{best} = \arg\max_A P(A \mid L, P)$
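For example, for the word axe with pronunciation / a k s /, one alignment satisfying $|L_i|, |P_i| \le 2$ is $A = a_1 a_2$ with $a_1 = (\text{a}, /a/)$ and $a_2 = (\text{xe}, /ks/)$; another option pairs x with /ks/ and aligns the final e to a null phoneme.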

Page 14: Letter to Phoneme Alignment

MANY-TO-MANY EM

1. Initialize prob(subalignments)
// Expectation Step
2. For each word in training_set
   2.1. Produce all possible alignments
   2.2. Choose the most probable alignment
// Maximization Step
3. For all subalignments
   3.1. Compute new_p(subalignments)

$P(a_i) = \dfrac{M[l_i, p_i]}{\sum_i M[l_i]}$

where $M[\cdot]$ counts subalignment occurrences in the chosen alignments.
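A minimal runnable sketch of the loop above (illustrative, not the authors' implementation). Uniform initialization, a hard E-step that keeps only the single most probable alignment per word, and the two-character chunk limit from the formal model are the assumptions here.

# Many-to-many (hard) EM alignment loop, following steps 1-3 above.
from collections import defaultdict
from itertools import product

MAX_CHUNK = 2  # |L_i|, |P_i| <= 2

def all_alignments(letters, phonemes):
    """Step 2.1: enumerate every segmentation into (letter chunk, phoneme chunk) pairs."""
    if not letters and not phonemes:
        yield []
        return
    for i, j in product(range(1, MAX_CHUNK + 1), repeat=2):
        if i <= len(letters) and j <= len(phonemes):
            head = (letters[:i], phonemes[:j])
            for tail in all_alignments(letters[i:], phonemes[j:]):
                yield [head] + tail

def score(alignment, prob):
    p = 1.0
    for sub in alignment:
        p *= prob[sub]
    return p

def many_to_many_em(training_set, iterations=5):
    """training_set: list of (word, phoneme tuple) pairs, e.g. ("axe", ("a", "k", "s"))."""
    prob = defaultdict(lambda: 1.0)                    # 1. initialize prob(subalignments)
    best = {}
    for _ in range(iterations):
        counts = defaultdict(float)
        for word, pron in training_set:                # 2. expectation step
            candidates = list(all_alignments(word, pron))
            best[(word, pron)] = max(candidates, key=lambda a: score(a, prob))  # 2.2
            for sub in best[(word, pron)]:
                counts[sub] += 1.0
        total = sum(counts.values())                   # 3. maximization step
        prob = defaultdict(float, {sub: c / total for sub, c in counts.items()})
    return best, prob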

Page 15: Letter to Phoneme Alignment

DYNAMIC BAYESIAN NETWORK


Model

Subalignments are considered as hidden variables

Learn DBN by EM

[DBN structure: observed letter node $l_i$ and phoneme node $p_i$, hidden subalignment node $a_i$]

$P(A) = \prod_{i=1}^{k} P(a_i \mid L_i, P_i)$

$P(a_i) = \dfrac{M[a_i]}{M[l_i, p_i]}$

Page 16: Letter to Phoneme Alignment

CONTEXT DEPENDENT DBN

Context independence assumption
◦ Makes the model simpler
◦ It is not always a correct assumption
◦ Example: Chat vs. Hat

Model


[DBN structure: observed nodes $l_i$, $p_i$; hidden nodes $a_{i-1}$ and $a_i$]

$P(A) = \prod_{i=1}^{k} P(a_i \mid a_{i-1}, L_i, P_i)$

$P(a_i) = \dfrac{M[a_i]}{M[a_{i-1}, l_i, p_i]}$
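To make the difference concrete, here is an illustrative tweak to the counting step: counts are keyed by the previous subalignment as well, and each estimate is normalized within its context, so the model can learn, for instance, that h behaves differently after c (chat) than at the start of a word (hat). The key structure and the start symbol are assumptions, not the slide's exact parameterization.

# Context-dependent re-estimation: count each subalignment together with the
# previous one (a_{i-1}) and normalize within that context.
from collections import defaultdict

def reestimate_context_dependent(best_alignments):
    """best_alignments: word -> list of subalignments chosen in the E-step."""
    counts = defaultdict(float)
    context_totals = defaultdict(float)
    for alignment in best_alignments.values():
        prev = "^"                                # assumed start-of-word symbol
        for sub in alignment:
            counts[(prev, sub)] += 1.0
            context_totals[prev] += 1.0
            prev = sub
    return {key: c / context_totals[key[0]] for key, c in counts.items()}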

Page 17: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 18: Letter to Phoneme Alignment

EVALUATION DIFFICULTIES

Unsupervised evaluation: no aligned dictionary to compare against

Solutions
◦ How much it boosts a supervised module: Letter to Phoneme Generator
◦ Comparing the result with a gold alignment: AER


Page 19: Letter to Phoneme Alignment

Letter to Phoneme Generator

Percentage of correctly generated phonemes and words

How does it work?
◦ Finding Chunks
  Binary classification using Instance-Based Learning
◦ Phoneme Prediction
  Phoneme is predicted independently for each letter, or
  Phoneme is predicted for each chunk
◦ Hidden Markov Model
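As an illustration of the chunk-finding step, instance-based learning can be read as a nearest-neighbour classifier over letter contexts that decides whether each letter starts a new chunk. scikit-learn's KNeighborsClassifier, the window size, and the encoding are stand-ins assumed here, not the generator's actual setup.

# Binary chunk-boundary classification with an instance-based learner (k-NN).
from sklearn.neighbors import KNeighborsClassifier

WINDOW = 2  # letters of context on each side (assumed)

def features(word, i):
    padded = "#" * WINDOW + word + "#" * WINDOW
    return [ord(c) for c in padded[i:i + 2 * WINDOW + 1]]

def train_chunker(words, boundary_labels):
    """boundary_labels is parallel to words; boundary_labels[k][i] is 1 if
    letter i of words[k] starts a new chunk, else 0."""
    X = [features(w, i) for w, labels in zip(words, boundary_labels) for i in range(len(w))]
    y = [labels[i] for w, labels in zip(words, boundary_labels) for i in range(len(w))]
    return KNeighborsClassifier(n_neighbors=3).fit(X, y)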

Page 20: Letter to Phoneme Alignment

ALIGNMENT ERROR RATIO

Evaluating by Alignment Error Ratio (AER)

Counting common pairs between
◦ Our aligned output
◦ Gold alignment

Calculating AER


$\mathrm{AER} = 1 - \dfrac{|A \cap G|}{|A|}$
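A small illustration of the computation; representing each alignment as a set of (letter index, phoneme index) links is an assumption, and the exact formula should be read against the reconstruction above.

# Count the pairs shared by our alignment A and the gold alignment G,
# then turn the overlap into an error ratio.
def aer(ours, gold):
    """ours, gold: sets of aligned pairs for one word (or pooled over the test set)."""
    common = len(ours & gold)
    return 1.0 - common / len(ours)

# Hypothetical links for "cake" -> / k ei k /:
# aer({(0, 0), (1, 1), (2, 2), (3, 2)}, {(0, 0), (1, 1), (2, 2)})  ->  0.25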

Page 21: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result

Page 22: Letter to Phoneme Alignment

RESULTS


10-fold cross validation

Model                        Word Accuracy   Phoneme Accuracy
Best previous results        66.82%          92.45%
One-to-One EM                53.87%          85.66%
Many-to-Many EM              76%             94.5%
DBN (Context Independent)    79.12%          95.23%
DBN (Context Dependent)      81.54%          96.70%

