CS 4705
Lecture 4
Sound Systems and Text-to-Speech
Sound Systems of Language
• Phonetics
– The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
• Phonology
– Rules that govern how phones are realized differently in different contexts
• Technologies:
– Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
– Text-to-Speech (TTS) systems take text as input and produce speech
Letters and Sounds
• same spelling = different sounds
– o: comb, tomb, bomb
– oo: blood, food, good
– c: court, center, cheese
– s: reason, surreal, shy
• same sound = different spellings
– [i]: sea, see, scene, receive, thief
– [s]: cereal, same, miss
– [u]: true, few, choose, lieu, do
– [ay]: prime, buy, rhyme, lie
• combination of letters = single sound
– ch: child, beach
– th: that, bathe
– oo: good, foot
– gh: laugh
• single letter = combination of sounds
– x: exit, Texas
– u: use, music
• ‘silent’ letters
– k: knife, know
– p: psycho, pterodactyl
– e: moose, bone
– gh: through
Articulators
[Figure: vocal tract diagram with labels: lips, teeth, alveolar ridge, palate, velum, uvula, pharyngeal, vocal folds (glottis), larynx, trachea]
Articulators in action
“Why did Ken set the soggy net on top of his deck?”
(Sample from the Queen’s University / ATR Labs X-ray Film Database)
Vocal fold vibration
[UCLA Phonetics Lab demo]
Places of articulation
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
[Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]
Articulatory parameters for English consonants (in ARPAbet)
PLACE OF ARTICULATION (columns) by MANNER OF ARTICULATION (rows)

          bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
stop      p b                                   t d                k g    q
fric.               f v           th dh         s z       sh zh           h
affric.                                                   ch jh
nasal     m                                     n                  ng
approx    w                                     l/r       y
flap                                            dx

VOICING: within each pair, the voiceless consonant is listed first and the voiced one second (e.g. p b)
American English vowel space
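[Figure: vowel chart with a FRONT to BACK axis and a HIGH to LOW axis. Vowels shown: iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux. Diphthongs shown: ey, ow, aw, oy, ay]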
Acoustic landmarks
“Patricia and Patsy and Sally”
[Figure: the utterance labeled with its phones, including [p], [t], [l], [sh], [s], [n], [ix], [ih], [ax], [ae], [iy]]
Syllables
• Syllabification important for
– pronunciation: deny/denim
– speaking rate calculation: syllables per second
– word recognition in ASR
• (onset) + nucleus + (coda):
– c a t
– a
– a t
– t o
• Lexical stress: primary, secondary, tertiary
– telephone
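The (onset) + nucleus + (coda) structure can be sketched in a few lines of Python. This is a minimal illustration, not a syllabifier: it assumes the input is already a single syllable written in ARPAbet, that the vowel symbols from this lecture's vowel chart are the only possible nuclei, and the function names `split_syllable` and `speaking_rate` are made up for this example.

```python
# ARPAbet vowel symbols taken from the vowel chart in this lecture.
# Assumption: each syllable passed in contains exactly one of them.
VOWELS = {
    "iy", "ih", "eh", "ae", "aa", "ao", "uw", "uh", "ah", "ax", "ix", "ux",
    "ey", "ow", "aw", "oy", "ay",
}


def split_syllable(phones):
    """Split one syllable (a list of phones) into (onset, nucleus, coda)."""
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    i = vowel_positions[0]
    # Everything before the vowel is the onset, everything after is the coda;
    # either one may be empty, which is why they are optional in the schema.
    return phones[:i], phones[i], phones[i + 1:]


def speaking_rate(syllable_count, duration_seconds):
    """Speaking rate in syllables per second."""
    return syllable_count / duration_seconds


print(split_syllable(["k", "ae", "t"]))  # cat: onset, nucleus, coda
print(split_syllable(["ax"]))            # a:   nucleus only
print(split_syllable(["ae", "t"]))       # at:  nucleus + coda
print(split_syllable(["t", "uw"]))       # to:  onset + nucleus
```

Splitting a whole word into syllables is the harder problem, since it requires deciding which consonants between two vowels belong to the coda of one syllable and which to the onset of the next.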
Phonological Rules
• Not all instances of a given phone [x] sound/look alike
• Phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.
– simple rules: /{t,d}/ --> [dx] / V’ _ V
– FSAs, FSTs
– declarative constraints: t:dx <=> V’ _ V
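A context-dependent rule of this kind can be sketched as a small rewrite function. The example assumes the simple rule above is the familiar flapping rule, in which /t/ or /d/ is realized as the flap [dx] between a stressed vowel (V’) and a following vowel. Marking stress with a trailing digit (`1` for primary stress, `0` for unstressed) is an assumption borrowed from common pronouncing-dictionary practice, and the function names are hypothetical.

```python
# ARPAbet vowel symbols from this lecture's vowel chart.
VOWELS = {
    "iy", "ih", "eh", "ae", "aa", "ao", "uw", "uh", "ah", "ax", "ix", "ux",
    "ey", "ow", "aw", "oy", "ay",
}


def is_vowel(phone):
    # Assumption: stress is written as a trailing digit, e.g. "ih1" or "iy0".
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    return is_vowel(phone) and phone.endswith("1")


def apply_flapping(phones):
    """Rewrite t or d as dx when it sits between a stressed vowel and a vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (
            phones[i] in ("t", "d")
            and is_stressed_vowel(phones[i - 1])
            and is_vowel(phones[i + 1])
        ):
            out[i] = "dx"
    return out


print(apply_flapping(["s", "ih1", "t", "iy0"]))   # "city": context matches
print(apply_flapping(["b", "aa1", "d", "iy0"]))   # "body": context matches
print(apply_flapping(["ax0", "t", "ae1", "k"]))   # "attack": no stressed vowel before t
```

The same mapping could be compiled into a finite-state transducer; a plain loop is used here only to keep the left and right contexts of the rule visible.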
Allophones of /t/
• What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:
Figure 4.8: Jurafsky & Martin (2000), page 104.
Application: Word Pronunciation for TTS
• Pronouncing dictionaries (the: [‘dhax], [‘dhiy])
• Problems:
– Homographs (bass/bass, wind/wind, desert/desert)
– Abbreviation (dr., st.)
– Numbers (2125551212)
– Acronyms (NAACL, IDIAP)
– Morphological variation (unrelentingly)
– Proper names and unknown words
• rules + dictionaries/dictionaries + rules
• Hybrid model:
– FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
– FSAs model morphology (e.g. reg-noun-stem + s)
– FSTs for pronunciation rules (e.g. s --> z)
– special rules to model name and acronym pronunciation
– default letter2sound rules for other words
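The layering in the hybrid model (lexicon first, then morphology plus a pronunciation rule, then default letter-to-sound rules) can be sketched as a chain of fallbacks. This is only an illustration under stated assumptions: plain Python dictionaries and functions stand in for the FSTs and FSAs, the tiny `LEXICON` and the one-letter-one-sound `LETTER_TO_SOUND` table are invented for the example, and name and acronym handling is left out.

```python
# Hypothetical toy lexicon of stems with ARPAbet pronunciations.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
    "the": ["dh", "ax"],
}

# Hypothetical default letter-to-sound table: one fixed guess per letter.
LETTER_TO_SOUND = {
    "a": ["ae"], "b": ["b"], "c": ["k"], "d": ["d"], "e": ["eh"], "f": ["f"],
    "g": ["g"], "h": ["h"], "i": ["ih"], "j": ["jh"], "k": ["k"], "l": ["l"],
    "m": ["m"], "n": ["n"], "o": ["aa"], "p": ["p"], "q": ["k"], "r": ["r"],
    "s": ["s"], "t": ["t"], "u": ["ah"], "v": ["v"], "w": ["w"],
    "x": ["k", "s"], "y": ["y"], "z": ["z"],
}

# After these voiceless sounds the plural suffix stays s; otherwise s --> z.
# Stems ending in sibilants (bus, dish) need an extra vowel and are not handled.
VOICELESS_NONSIBILANT = {"p", "t", "k", "f", "th"}


def pronounce(word):
    w = word.lower()
    # 1. Whole word found in the lexicon.
    if w in LEXICON:
        return list(LEXICON[w])
    # 2. Morphology: known stem + s, with the s --> z pronunciation rule.
    if w.endswith("s") and w[:-1] in LEXICON:
        stem = LEXICON[w[:-1]]
        suffix = "s" if stem[-1] in VOICELESS_NONSIBILANT else "z"
        return stem + [suffix]
    # 3. Default letter-to-sound rules; characters not in the table are skipped.
    phones = []
    for letter in w:
        phones.extend(LETTER_TO_SOUND.get(letter, []))
    return phones


print(pronounce("cats"))
print(pronounce("dogs"))
print(pronounce("blip"))
```

The order of the fallbacks matters: the letter-to-sound table is deliberately last because it is the least reliable source, as the spelling examples earlier in the lecture show.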
Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
– spacious living/dining rm w/frplc --> spacious living/dining room with fireplace
– pls?
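Abbreviation expansion can be sketched as a table lookup applied before pronunciation. The table below is an assumption built only from the real-estate example on this slide; a usable system would need a much larger, domain-specific table and some way to resolve ambiguous cases such as "pls", which this sketch simply leaves unchanged.

```python
import re

# Hypothetical expansion table covering only the example on this slide.
ABBREVIATIONS = {
    "rm": "room",
    "frplc": "fireplace",
}


def expand_abbreviations(text):
    """Expand known abbreviations; unknown tokens pass through unchanged."""
    # "w/" is glued to the next word (w/frplc), so split it off first.
    text = re.sub(r"\bw/", "with ", text)
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


print(expand_abbreviations("spacious living/dining rm w/frplc"))
print(expand_abbreviations("pls"))
```

Tokens that are not in the table fall through to whatever pronunciation strategy comes next, such as a dictionary or letter-to-sound rules.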
Summary
• Phones realize phonemes in different contexts
– Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
• Versatile FSTs can model phonological as well as morphological and spelling systems
• Many creative approaches toward pronunciation modeling for TTS
• Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)