LETTER TO PHONEME ALIGNMENT
Reihaneh Rabbany
Shahin Jabbari
OUTLINE
Motivation
Problem and its Challenges
Relevant Works
Our Work
◦ Formal Model
◦ EM
◦ Dynamic Bayesian Network
Evaluation
◦ Letter to Phoneme Generator
◦ AER
Result
TEXT TO SPEECH PROBLEM
Conversion of Text to Speech: TTS
◦ Automated Telecom Services
◦ E-mail by Phone
◦ Banking Systems
◦ Handicapped People
PRONUNCIATION
Pronunciation of words
◦ Dictionary Words
◦ Non-Dictionary Words
Phonetic Analysis
Dictionary Look-up
◦ Language is alive; new words are added
◦ Proper Nouns
[Figure: Word → Phonetic Analysis → Pronunciation]
PROBLEM
Letter to Phoneme (L2P) Alignment
◦ Letter: c a k e
◦ Phoneme: k ei k
CHALLENGES
No Consistency
◦ City / s /
◦ Cake / k /
◦ Kid / k /
No Transparency
◦ K i d (3) / k i d / (3)
◦ S i x (3) / s i k s / (4)
◦ Q u e u e (5) / k j u: / (3)
◦ A x e (3) / a k s / (3)
ONE-TO-ONE EM (DAELEMANS ET AL., 1996)
Length of word = length of pronunciation
Produce all possible alignments
◦ Inserting null letter/phoneme
Alignment probability
$$P(A) = \prod_i P(p_i \mid l_i)$$
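As a rough illustration of the one-to-one alignment probability, the Python sketch below multiplies per-pair probabilities P(p_i | l_i) along an alignment. The probability table, the "_" null symbol, and the function name are hypothetical and chosen only for this example; the numbers are made up, not trained values.

```python
from math import prod

NULL = "_"  # hypothetical symbol standing for a null letter or null phoneme


def alignment_probability(letters, phonemes, pair_prob):
    """Multiply P(p_i | l_i) over the positions of a one-to-one alignment."""
    if len(letters) != len(phonemes):
        raise ValueError("a one-to-one alignment needs sequences of equal length")
    # Pairs missing from the table are treated as having probability 0.
    return prod(pair_prob.get((l, p), 0.0) for l, p in zip(letters, phonemes))


# Toy table (assumption): "cake" padded with a null phoneme for the final "e".
toy_prob = {("c", "k"): 0.5, ("a", "ei"): 0.25, ("k", "k"): 1.0, ("e", NULL): 0.5}
p = alignment_probability(["c", "a", "k", "e"], ["k", "ei", "k", NULL], toy_prob)
```

With these toy values the product is 0.5 × 0.25 × 1.0 × 0.5 = 0.0625.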
DECISION TREE (BLACK ET AL., 1996)
Train a CART Using Aligned Dictionary
Why CART?
A Single Tree for Each Letter
KONDRAK
Alignments are not always one-to-one
◦ A x e / a k s /
◦ B oo k / b ú k /
Only Null Phoneme
Similar to one-to-one EM
◦ Produce All Possible Alignments
◦ Compute the Probabilities
FORMAL MODEL
Word: sequence of letters
$$L = l_1 l_2 \dots l_n$$
Pronunciation: sequence of phonemes
$$P = p_1 p_2 \dots p_m$$
Alignment: sequence of subalignments
$$A = a_1 a_2 \dots a_k, \qquad a_i = (L_i, P_i), \qquad |L_i|, |P_i| \le 2$$
Problem: Finding the most probable alignment
$$A_{best} = \arg\max_A P(A \mid L, P)$$
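To make the search space of the formal model concrete, here is a minimal Python sketch that enumerates every alignment whose subalignments pair a letter chunk with a phoneme chunk of at most two symbols each. The function name and the restriction to non-empty chunks (no null letters or phonemes) are assumptions made for this illustration.

```python
def enumerate_alignments(letters, phonemes, max_len=2):
    """Yield each alignment as a list of (letter_chunk, phoneme_chunk) pairs.

    Assumption: every chunk holds between 1 and max_len symbols.
    """
    if not letters and not phonemes:
        yield []  # both sequences consumed: one complete (empty) tail
        return
    for i in range(1, min(max_len, len(letters)) + 1):
        for j in range(1, min(max_len, len(phonemes)) + 1):
            head = (tuple(letters[:i]), tuple(phonemes[:j]))
            for rest in enumerate_alignments(letters[i:], phonemes[j:], max_len):
                yield [head] + rest


# "six" has 3 letters and 4 phonemes, so at least one chunk must be many-to-many.
six = list(enumerate_alignments(["s", "i", "x"], ["s", "i", "k", "s"]))
```

Finding the best alignment then amounts to scoring each candidate in `six` and keeping the maximum; for longer words a dynamic-programming search would avoid enumerating every candidate.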
MANY-TO-MANY EM
1. Initialize prob(SubAlignments)
// Expectation Step
2. For each word in training_set
   2.1. Produce all possible alignments
   2.2. Choose the most probable alignment
// Maximization Step
3. For all subalignments
   3.1. Compute new_p(SubAlignments)
$$P(a_i) = \frac{M[l_i, p_i]}{M[l_i]}$$
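The steps of the many-to-many EM loop can be sketched in Python as follows. This is only an illustration under stated assumptions: M[·] is read as a count over the currently chosen alignments, chunks are limited to one or two symbols with no nulls, unseen subalignments receive a small hypothetical `floor` probability, and all function and parameter names are invented for the example.

```python
from collections import Counter
from math import prod


def alignments(letters, phonemes, max_len=2):
    """Yield all splits into (letter_chunk, phoneme_chunk) pairs of 1..max_len symbols."""
    if not letters and not phonemes:
        yield []
        return
    for i in range(1, min(max_len, len(letters)) + 1):
        for j in range(1, min(max_len, len(phonemes)) + 1):
            head = (letters[:i], phonemes[:j])
            for rest in alignments(letters[i:], phonemes[j:], max_len):
                yield [head] + rest


def hard_em(training_set, iterations=5, floor=1e-6):
    """Alternate between choosing each word's best alignment and re-estimating P."""
    prob = {}  # step 1: no estimates yet, so every subalignment scores `floor`
    best = []
    for _ in range(iterations):
        # Expectation step (2.1, 2.2): keep the most probable alignment per word.
        best = []
        for letters, phonemes in training_set:
            candidates = list(alignments(tuple(letters), tuple(phonemes)))
            if not candidates:
                continue  # no alignment exists under the chunk-length limit
            best.append(max(candidates,
                            key=lambda a: prod(prob.get(s, floor) for s in a)))
        # Maximization step (3.1): P(a) = M[l, p] / M[l], with M taken as counts.
        pair_counts = Counter(s for a in best for s in a)
        letter_counts = Counter(s[0] for a in best for s in a)
        prob = {s: c / letter_counts[s[0]] for s, c in pair_counts.items()}
    return prob, best
```

A call such as `hard_em([("kid", ("k", "i", "d")), ("six", ("s", "i", "k", "s"))])` returns the estimated table and one alignment per word. Because only the single best alignment is kept in each pass, this variant can settle on its early choices; a soft EM that weights all alignments would behave differently.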
DYNAMIC BAYESIAN NETWORK
Model
[Figure: DBN with nodes l_i, P_i, and a_i]
◦ Subalignments are considered as hidden variables
◦ Learn DBN by EM
$$P(A) = \prod_{i=1}^{k} P(a_i \mid L_i, P_i)$$
$$P(a_i) = \frac{M[a_i]}{M[p_i, l_i]}$$
CONTEXT DEPENDENT DBN
Context independency assumption
◦ Makes the model simpler
◦ It is not always a correct assumption
◦ Example: Chat and Hat
Model
[Figure: DBN with nodes l_i, P_i, a_i, and a_{i-1}]
$$P(A) = \prod_{i=1}^{k} P(a_i \mid a_{i-1}, L_i, P_i)$$
$$P(a_i) = \frac{M[a_i]}{M[a_{i-1}, p_i, l_i]}$$
EVALUATION DIFFICULTIES
Unsupervised Evaluation
◦ No Aligned Dictionary
Solutions
◦ How much it boosts a supervised module: Letter to Phoneme Generator
◦ Comparing the result with a gold alignment: AER
LETTER TO PHONEME GENERATOR
Percentage of correctly generated phonemes and words
How does it work?
◦ Finding Chunks: Binary Classification Using Instance-Based Learning
◦ Phoneme Prediction
  ◦ Phoneme is predicted independently for each letter
  ◦ Phoneme is predicted for each chunk
◦ Hidden Markov Model
ALIGNMENT ERROR RATIO
AER: Evaluating by Alignment Error Ratio
Counting common pairs between
◦ Our aligned output (A)
◦ Gold alignment (G)
Calculating AER
$$\mathrm{AER} = 1 - \frac{|A \cap G|}{|A|}$$
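A small Python sketch of the AER computation follows. It assumes AER is one minus the fraction of pairs in our alignment A that also appear in the gold alignment G; this reading, the function name, and the choice to represent each pair as a (letter position, phoneme position) tuple are all assumptions made for illustration.

```python
def alignment_error_ratio(predicted, gold):
    """Return 1 - |A ∩ G| / |A| for two collections of aligned pairs.

    Assumption: each pair is hashable, e.g. (letter_position, phoneme_position).
    """
    a, g = set(predicted), set(gold)
    if not a:
        raise ValueError("the predicted alignment has no pairs")
    common = a & g  # pairs shared by our output and the gold alignment
    return 1.0 - len(common) / len(a)


# "six" / s i k s: the gold alignment links x to both k and s,
# while this hypothetical output wrongly links i to k.
gold_pairs = {(0, 0), (1, 1), (2, 2), (2, 3)}
our_pairs = {(0, 0), (1, 1), (1, 2), (2, 3)}
aer = alignment_error_ratio(our_pairs, gold_pairs)
```

Here three of the four predicted pairs are shared with the gold alignment, so the ratio is 1 − 3/4 = 0.25.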
RESULTS
10-fold cross-validation

Model                      Word Accuracy   Phoneme Accuracy
Best previous results      66.82%          92.45%
One_To_One EM              53.87%          85.66%
Many_To_Many EM            76%             94.5%
DBN, Context-Independent   79.12%          95.23%
DBN, Context-Dependent     81.54%          96.70%