NLPRS2001
Grapheme-to-Phoneme for Thai
Pongthai Tarsaku

Content
- Introduction
- Grapheme-to-Phoneme in TTS system
- Problems in Thai
- PGLR Approach
- Experiment & Results & Discussion
- Conclusion

Introduction
- Grapheme-to-Phoneme (G2P) is a module in a TTS system.
- Grapheme-to-Phoneme approaches:
  - Dictionary-based.
  - Rule-based.
  - Statistical.
- The Probabilistic Generalized LR (PGLR) parser is a statistical approach.

G2P in TTS system
[Figure: TTS pipeline. Text Segmentation -> Grapheme-to-Phoneme -> Prosody Generation -> Speech Signal Synthesis -> Speech Waveform]
Example:
- Input text: ผมขอขอบคุณทุกท่านที่มาเยี่ยมชมงาน
- After Text Segmentation: the same text with word boundaries marked by "/"
- After Grapheme-to-Phoneme: /phom4/kh@:4/kh@:p1/khun0/thuk3/tha:n2/thi:2/ma:0/ji:am2/chom0/nga:n0/
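
The phoneme string in this example writes each syllable as a phoneme sequence followed by a tone digit. A minimal sketch of splitting that notation into (phonemes, tone) pairs follows; the function name `split_transcription` is hypothetical and is not part of the system described in the talk.

```python
def split_transcription(s):
    """Split '/phom4/kh@:4/' into [('phom', 4), ('kh@:', 4)]."""
    pairs = []
    for syl in s.strip("/").split("/"):
        # The last character is the tone digit (0-4); the rest is the phoneme string.
        pairs.append((syl[:-1], int(syl[-1])))
    return pairs


example = "/phom4/kh@:4/kh@:p1/khun0/thuk3/tha:n2/thi:2/ma:0/ji:am2/chom0/nga:n0/"
syllables = split_transcription(example)
```

Keeping the tone separate from the phonemes makes it easy to check the two G2P outputs (segmental phonemes and tonemes) independently.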

Problems in Thai (1)
- Ambiguity in grapheme-phoneme mapping: “มณฑา” is pronounced /mon0/tha:0/, but “มณฑป” is pronounced /mon0/dop1/.
- Homographs: “เพลา” (axle) is pronounced /phlaw0/, but “เพลา” (time) is pronounced /phe:0/la:0/.
- Vowel length: “น้ำ” is phonologically /nam3/ but is usually pronounced /na:m3/.

Problems in Thai (2)
- Linking syllable pronunciation: “วิท” in “วิทยา” is pronounced /wit2/tha2/.
- Ambiguity in consonantal functionality: “อัฐิ” is pronounced /?at1/thi1/.
- Word boundary: “ตากลม” can be segmented into “ตา|กลม” (round eye) or “ตาก|ลม” (to expose to the wind), pronounced /ta:0/klom0/ and /ta:k1/lom0/ respectively.
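
The word boundary problem can be illustrated with a small sketch that enumerates every segmentation of a string against a lexicon. The four-word `toy_lexicon` is an assumption made for illustration, and exhaustive enumeration is not the method used in the talk.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words found in lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


toy_lexicon = {"ตา", "ตาก", "กลม", "ลม"}  # assumed toy lexicon
candidates = segmentations("ตากลม", toy_lexicon)
```

Both segmentations are valid words sequences, so a G2P system needs context or probabilities to choose between them.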

PGLR Approach
- PGLR: Probabilistic Generalized LR parsing.
- PGLR has an advantage in context sensitivity.
- PGLR is able to capture two levels of context:
  - Global context, over structures from the CFG rules.
  - Local n-gram context.

Context-Free Grammar Rules
- A set of CFG rules is prepared for Thai syllable construction.
- The CFG rules are grouped by Thai vowel unit (21 groups and 3 special groups).
- The CFG rules are able to cover both monosyllabic and polysyllabic words.
Example rule left-hand sides: <Grp_1>, <Grp_2>
Example rule right-hand sides: <init> า; <init> า <final>; เ <init>; เ <init> <final>
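
A minimal sketch of what the four rule shapes shown here accept, written as regular expressions instead of an LR grammar. The character class `[ก-ฮ]` standing in for both `<init>` and `<final>` is a simplifying assumption, the names `AA_UNIT` and `E_UNIT` are hypothetical, and the full rule set (21 groups plus 3 special groups) is not reproduced.

```python
import re

CONS = "[ก-ฮ]"  # assumption: any Thai consonant letter may fill <init> or <final>

# <init> า   and   <init> า <final>
AA_UNIT = re.compile(f"^{CONS}า{CONS}?$")
# เ <init>   and   เ <init> <final>
E_UNIT = re.compile(f"^เ{CONS}{CONS}?$")


def matching_units(syllable):
    """Return the names of the rule shapes that accept the syllable."""
    names = []
    if AA_UNIT.match(syllable):
        names.append("aa_unit")
    if E_UNIT.match(syllable):
        names.append("e_unit")
    return names
```

For example, `matching_units("การ")` matches the `<init> า <final>` shape, while a string with no vowel sign such as "กม" matches neither.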

Thai Grapheme-to-Phoneme system
[Figure: system pipeline. Input word -> PGLR parser -> most probable parse tree -> G-P Mapping -> Toneme Generation. Resources: CFG rules, PGLR table, G-P table.]
Example: for “สมชาย”, the PGLR parser produces candidate parse trees (W over Syl nodes) with p = 0.3 and p = 0.7. The tree with p = 0.7 is selected, G-P mapping gives /som/chaj/, and toneme generation gives /som4/chaj0/.
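
The selection of the most probable parse tree can be sketched as follows. The sketch assumes that a tree's probability is the product of the probabilities of the LR actions used to build it, which is the usual PGLR formulation but is not spelled out on the slide; the action probabilities are invented so that the two toy trees come out at 0.3 and 0.7.

```python
import math


def tree_probability(action_probs):
    # Sum logs instead of multiplying directly to avoid underflow on long parses.
    return math.exp(sum(math.log(p) for p in action_probs))


def most_probable(candidates):
    """candidates maps a tree label to the list of its action probabilities."""
    return max(candidates, key=lambda name: tree_probability(candidates[name]))


toy_candidates = {
    "tree_a": [0.5, 0.6],  # product 0.3 (invented values)
    "tree_b": [0.7, 1.0],  # product 0.7 (invented values)
}
best = most_probable(toy_candidates)
```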
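
Grapheme-Phoneme Mapping
Example. [Figure: parse trees (W over Syl nodes of types A to E) with each grapheme aligned to its phoneme.]
- การ: ก า ร -> k a: n
- เชษฐา: เ ช ษ ฐ า -> ch e: t th a:
- สมพร: ส ม พ ร -> s o m ph @: n (the vowels o and @: have no grapheme of their own)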
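
A minimal sketch of why the mapping needs the parse tree: the same grapheme can map to different phonemes depending on its role in the syllable. The table layout keyed by (grapheme, role) and the name `GP_TABLE` are assumptions for illustration and do not reproduce the G-P table used in the talk.

```python
# Hypothetical G-P table keyed by (grapheme, role in the syllable).
GP_TABLE = {
    ("ก", "init"): "k",
    ("า", "vowel"): "a:",
    ("ร", "init"): "r",
    ("ร", "final"): "n",  # ร is pronounced /n/ in final position
}


def map_syllable(tagged_graphemes):
    """Map [(grapheme, role), ...] taken from the parse tree leaves to phonemes."""
    return [GP_TABLE[(g, role)] for g, role in tagged_graphemes]


kan = map_syllable([("ก", "init"), ("า", "vowel"), ("ร", "final")])
```

Here `kan` is the phoneme sequence for “การ”, matching the k a: n alignment in the example.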

Experiment
Database
- LEXiTRON, the Thai electronic dictionary, is used for training and testing.
- ~23,000 Thai words with pronunciation.
Training
- Four-fifths of the database is used for training.
Testing
- One-fifth of the database is used for testing.
- Compared against the rule-based [Wiboon, 1999] and decision-tree-based [Chotimongkol, 2000] systems.
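
A minimal sketch of a four-fifths / one-fifth split. The slides do not say how entries were assigned to each part, so the seeded random shuffle is an assumption, as is the function name `split_four_fifths`.

```python
import random


def split_four_fifths(entries, seed=0):
    """Shuffle entries and split them into (train, test) at a 4:1 ratio."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


# 23000 stands in for the approximate dictionary size quoted above.
train_set, test_set = split_four_fifths(range(23000))
```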

Result
Conversion (word) accuracy (%)
Model           Exact match   Ignoring vowel length
PGLR            72.87         90.44
Rule-based      67.14         83.81
Decision tree   68.76         86.94
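
The two accuracy columns can be sketched as one metric with a normalization switch. This assumes vowel length is written with ":" as in the transcriptions elsewhere in the talk; the four toy word pairs are illustrative and are not taken from the experiment.

```python
def word_accuracy(predicted, reference, ignore_vowel_length=False):
    """Percentage of words whose predicted transcription equals the reference."""
    def norm(t):
        # Assumption: vowel length is marked with ':' as in /ka:n0/.
        return t.replace(":", "") if ignore_vowel_length else t

    hits = sum(norm(p) == norm(r) for p, r in zip(predicted, reference))
    return 100.0 * hits / len(reference)


pred = ["/nam3/", "/ka:n0/", "/som4/chaj0/", "/ta:0/klom0/"]
ref = ["/na:m3/", "/ka:n0/", "/som4/chaj0/", "/ta:k1/lom0/"]
exact = word_accuracy(pred, ref)
relaxed = word_accuracy(pred, ref, ignore_vowel_length=True)
```

In this toy set the first word differs only in vowel length, so it counts as an error under exact match and as correct under the relaxed measure.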

Discussion
- The vowel length problem is dominant (accuracy drops from 90.44% to 72.87% once vowel length must match).
- Half of all errors (~5%) come from the linking syllable problem.
- To improve accuracy, more training data is required.
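
The figures in the discussion can be checked with a little arithmetic on the PGLR accuracies. One reading that is consistent with the numbers is that "all errors" refers to the errors remaining after vowel length is ignored; that interpretation is an assumption, not something the slide states.

```python
exact = 72.87            # PGLR exact-match accuracy (%)
ignoring_length = 90.44  # PGLR accuracy ignoring vowel length (%)

# Words that are wrong only because of vowel length.
length_only_errors = round(ignoring_length - exact, 2)
# Errors that remain even when vowel length is ignored.
remaining_errors = round(100 - ignoring_length, 2)
# Half of the remaining errors, to compare with the quoted ~5%.
half_remaining = round(remaining_errors / 2, 2)
```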

Conclusion
- The PGLR approach has an advantage in context sensitivity (both global and local context).
- The efficiency of the PGLR parser depends on carefully written CFG rules.
- This approach can be applied in a syllable segmentation framework or a soundex conversion framework.
Thank you

Tone in Thai
- There are 5 tone levels (tonemes) in Thai:
  - mid-level: 0
  - low-level: 1
  - falling-level: 2
  - high-level: 3
  - rising-level: 4
- The toneme depends on consonant class, syllable type, and tone marker.

Tonemic Generation
Tone Markers and Tonemes
Consonant class   Syllable type          unmarked   –่   –้   –๊   –๋
High              Live Syllable          4          1    2    3    4
High              Dead Syllable          1          1    2    3    4
Middle            Live Syllable          0          1    2    3    4
Middle            Dead Syllable          1          1    2    3    4
Low               Live Syllable          0          2    3    3    4
Low               Dead Short Syllable    3          2    2    3    4
Low               Dead Long Syllable     2          1    3    3    4
Consonant classes
- High Class: ขฃฉฐถผฝศษสห
- Middle Class: กจฎฏดตบปอ
- Low Class: คฅฆงชซฌญฑฒณทธนพฟภมยรลวฬฮ
Syllable types
Live Syllable
- the final consonant is n, ng, m, or j.
- long vowel with no final consonant (z).
Dead Short Syllable
- short vowel with final consonant k, t, or p.
- short vowel with no final consonant (z).
Dead Long Syllable
- long vowel with final consonant k, t, or p.
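
The toneme table can be read as a lookup keyed by consonant class and syllable type, indexed by tone marker. The sketch below transcribes the table values and class lists as given; the function names and the phoneme-based interface (`long_vowel`, `final`) are assumptions for illustration.

```python
HIGH = set("ขฃฉฐถผฝศษสห")
MIDDLE = set("กจฎฏดตบปอ")
LOW = set("คฅฆงชซฌญฑฒณทธนพฟภมยรลวฬฮ")

# Column order: unmarked, mai ek, mai tho, mai tri, mai chattawa.
MARKERS = ["", "\u0e48", "\u0e49", "\u0e4a", "\u0e4b"]

DEAD = [1, 1, 2, 3, 4]  # high and middle class use one row for all dead syllables
TONE_TABLE = {
    ("high", "live"): [4, 1, 2, 3, 4],
    ("high", "dead_short"): DEAD,
    ("high", "dead_long"): DEAD,
    ("middle", "live"): [0, 1, 2, 3, 4],
    ("middle", "dead_short"): DEAD,
    ("middle", "dead_long"): DEAD,
    ("low", "live"): [0, 2, 3, 3, 4],
    ("low", "dead_short"): [3, 2, 2, 3, 4],
    ("low", "dead_long"): [2, 1, 3, 3, 4],
}


def consonant_class(initial):
    if initial in HIGH:
        return "high"
    if initial in MIDDLE:
        return "middle"
    if initial in LOW:
        return "low"
    raise ValueError(f"not a listed consonant: {initial!r}")


def syllable_type(long_vowel, final):
    # final is the final consonant phoneme, or "" when there is none.
    if final in ("n", "ng", "m", "j"):
        return "live"
    if final == "":
        return "live" if long_vowel else "dead_short"
    return "dead_long" if long_vowel else "dead_short"  # final k, t, or p


def toneme(initial, long_vowel, final, marker=""):
    row = TONE_TABLE[(consonant_class(initial), syllable_type(long_vowel, final))]
    return row[MARKERS.index(marker)]
```

With this lookup, “สม” (high class, live) gets toneme 4 and “ชาย” (low class, live) gets toneme 0, which agrees with the /som4/chaj0/ transcription of “สมชาย”.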

Parse Tree Selection
[Figure: training-data selection loop. The GLR parser (resources: CFG rules, GLR table) produces candidate parse trees for a word such as “สมชาย”. Parse tree[i] goes through G-P Mapping (G-P table) and Toneme Generation, and the result enters a Phoneme Comparison step. On a mismatch, i is increased and the next parse tree is tried. On a match, the selected parse tree is used for training.]
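
The selection loop can be sketched as a search for the first candidate tree whose phoneme output matches a reference transcription. The representation of trees as tuples of syllable strings, the `toy_lookup` table, and the function names are all hypothetical stand-ins for the parser, G-P mapping, and toneme generation stages.

```python
def select_training_tree(candidate_trees, to_phonemes, reference):
    """Return the first candidate whose phoneme output equals reference, else None."""
    for tree in candidate_trees:            # i = 0, 1, 2, ...
        if to_phonemes(tree) == reference:  # phoneme comparison
            return tree                     # match: keep this tree for training
    return None                             # every candidate mismatched


# Toy stand-ins: a "tree" is a tuple of syllables, mapped through a small lookup.
toy_lookup = {"สม": "som4", "ชาย": "chaj0"}


def toy_to_phonemes(tree):
    return "/" + "/".join(toy_lookup.get(syl, "?") for syl in tree) + "/"


trees = [("สมช", "าย"), ("สม", "ชาย")]
selected = select_training_tree(trees, toy_to_phonemes, "/som4/chaj0/")
```

Returning `None` when nothing matches lets the caller skip words for which no candidate tree reproduces the reference.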