CS 4705, Lecture 4: Sound Systems and Text-to-Speech

Sound Systems of Language

• Phonetics: the sounds (phones) of the world's languages, the phonemes they map to, and how they are produced
• Phonology: rules that govern how phones are realized differently in different contexts
• Technologies:
– Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
– Text-to-Speech (TTS) systems take text as input and produce speech

Letters and Sounds

• Same spelling = different sounds
– o: comb, tomb, bomb
– oo: blood, food, good
– c: court, center, cheese
– s: reason, surreal, shy
• Same sound = different spellings
– [i]: sea, see, scene, receive, thief
– [s]: cereal, same, miss
– [u]: true, few, choose, lieu, do
– [ay]: prime, buy, rhyme, lie
• Combination of letters = single sound
– ch: child, beach
– th: that, bathe
– oo: good, foot
– gh: laugh
• Single letter = combination of sounds
– x: exit, Texas
– u: use, music
• 'Silent' letters
– k: knife, know
– p: psycho, pterodactyl
– e: moose, bone
– gh: through
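To see why pronunciation cannot be read off the spelling one letter at a time, here is a minimal Python sketch. It assumes a tiny hand-made lexicon in which each word is split into spelling units aligned to ARPAbet phones (CMUdict-style stress digits); the `ALIGNED` table and `sounds_for` function are illustrative, not a standard resource. The same unit "oo" turns out to map to three different vowels.

```python
# Hypothetical letter-aligned transcriptions (ARPAbet, CMUdict-style stress
# digits). Each word is a list of (spelling unit, phone) pairs.
ALIGNED = {
    "blood": [("b", "B"), ("l", "L"), ("oo", "AH1"), ("d", "D")],
    "food": [("f", "F"), ("oo", "UW1"), ("d", "D")],
    "good": [("g", "G"), ("oo", "UH1"), ("d", "D")],
    "foot": [("f", "F"), ("oo", "UH1"), ("t", "T")],
}


def sounds_for(spelling_unit):
    """Collect every phone that a spelling unit is aligned to in the lexicon."""
    return {
        phone
        for pairs in ALIGNED.values()
        for unit, phone in pairs
        if unit == spelling_unit
    }


print(sorted(sounds_for("oo")))  # three vowels for one spelling
print(sorted(sounds_for("d")))   # a consistent letter
```

A real system would need the alignments themselves to be learned or hand-built for the whole lexicon; this only illustrates that the spelling-to-sound mapping is many-to-many.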

Articulators

[Figure: vocal tract diagram labeling the articulators: lips, teeth, alveolar ridge, palate, velum, uvula, pharyngeal, larynx, vocal folds (glottis), trachea]

Articulators in action

“Why did Ken set the soggy net on top of his deck?”

(Sample from the Queen’s University / ATR Labs X-ray Film Database)

Vocal fold vibration

[UCLA Phonetics Lab demo]

Places of articulation

http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

[Figure: vocal tract diagram labeled with places of articulation: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]

Articulatory parameters for English consonants (in ARPAbet)

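Place of articulation: bilabial, labio-dental, inter-dental, alveolar, palatal, velar, glottal. Consonants are grouped by manner of articulation:

• Stop: p b (bilabial), t d (alveolar), k g (velar), q (glottal)
• Fricative: f v (labio-dental), th dh (inter-dental), s z (alveolar), sh zh (palatal), h (glottal)
• Affricate: ch jh (palatal)
• Nasal: m (bilabial), n (alveolar), ng (velar)
• Approximant: w (bilabial), l r (alveolar), y (palatal)
• Flap: dx (alveolar)

Voicing: in each pair the first consonant is voiceless and the second voiced; the nasals, approximants, and flap are voiced, and q and h are voiceless.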
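The table can be encoded directly as data, which makes its three parameters usable in code. The Python sketch below stores each consonant's place, manner, and voicing following the classification above; the names `CONSONANTS`, `voicing_partner`, and `differing_features` are illustrative, not from any library. It finds voicing pairs such as t/d and lists which parameters separate two phones.

```python
# Place, manner, and voicing for the ARPAbet consonants in the table above.
CONSONANTS = {
    # phone: (place, manner, voiced)
    "p": ("bilabial", "stop", False), "b": ("bilabial", "stop", True),
    "t": ("alveolar", "stop", False), "d": ("alveolar", "stop", True),
    "k": ("velar", "stop", False), "g": ("velar", "stop", True),
    "q": ("glottal", "stop", False),
    "f": ("labio-dental", "fricative", False), "v": ("labio-dental", "fricative", True),
    "th": ("inter-dental", "fricative", False), "dh": ("inter-dental", "fricative", True),
    "s": ("alveolar", "fricative", False), "z": ("alveolar", "fricative", True),
    "sh": ("palatal", "fricative", False), "zh": ("palatal", "fricative", True),
    "h": ("glottal", "fricative", False),
    "ch": ("palatal", "affricate", False), "jh": ("palatal", "affricate", True),
    "m": ("bilabial", "nasal", True), "n": ("alveolar", "nasal", True),
    "ng": ("velar", "nasal", True),
    "w": ("bilabial", "approximant", True), "l": ("alveolar", "approximant", True),
    "r": ("alveolar", "approximant", True), "y": ("palatal", "approximant", True),
    "dx": ("alveolar", "flap", True),
}


def voicing_partner(phone):
    """Return the consonant with the same place and manner but opposite voicing, or None."""
    place, manner, voiced = CONSONANTS[phone]
    for other, (p, m, v) in CONSONANTS.items():
        if p == place and m == manner and v != voiced:
            return other
    return None


def differing_features(a, b):
    """List which of place, manner, and voicing differ between two consonants."""
    names = ("place", "manner", "voicing")
    return [n for n, x, y in zip(names, CONSONANTS[a], CONSONANTS[b]) if x != y]


print(voicing_partner("t"), voicing_partner("m"))
print(differing_features("t", "s"))
```

Phones that differ in exactly one parameter (t/d in voicing, t/k in place, t/s in manner) are the kind of contrasts an ASR system has to detect acoustically.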

American English vowel space

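[Figure: American English vowel space, with axes FRONT–BACK and HIGH–LOW, plotting the ARPAbet vowels iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux and the diphthongs ey, ow, aw, oy, ay]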

Acoustic landmarks

“Patricia and Patsy and Sally”

[Figure: spectrogram of "Patricia and Patsy and Sally" annotated with phone labels: [p], [t], [sh], [s], [n], [l], [ix], [ih], [ax], [ae], [iy]]

Syllables

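• Syllabification is important for:
– pronunciation: deny/denim (de.ny vs. den.im: where the boundary falls changes the first vowel)
– speaking rate calculation: syllables per second
– word recognition in ASR
• Syllable structure: (onset) + nucleus + (coda); only the nucleus is obligatory:
– c a t (onset + nucleus + coda)
– a (nucleus only)
– a t (nucleus + coda)
– t o (onset + nucleus)
• Lexical stress: primary, secondary, tertiary, e.g. telephone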
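A minimal Python sketch of the (onset) + nucleus + (coda) template and of the syllables-per-second measure. It assumes CMUdict-style ARPAbet, where every vowel carries a stress digit (1 primary, 2 secondary, 0 unstressed; CMUdict has no separate tertiary level). Counting one syllable per vowel is a simplifying assumption that ignores syllabic consonants, and the function names are illustrative.

```python
def is_vowel(phone):
    # Assumes CMUdict-style ARPAbet: only vowels end in a stress digit.
    return phone[-1].isdigit()


def split_syllable(phones):
    """Split one syllable into (onset, nucleus, coda); only the nucleus is required."""
    vowels = [i for i, p in enumerate(phones) if is_vowel(p)]
    if len(vowels) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    i = vowels[0]
    return phones[:i], phones[i], phones[i + 1:]


def speaking_rate(phones, seconds):
    """Syllables per second, counting one syllable per vowel nucleus."""
    return sum(is_vowel(p) for p in phones) / seconds


print(split_syllable(["K", "AE1", "T"]))  # cat: onset, nucleus, coda
print(split_syllable(["T", "UW1"]))       # to: no coda
# "telephone": primary stress on the first syllable, secondary on the third.
print(speaking_rate(["T", "EH1", "L", "AH0", "F", "OW2", "N"], 0.5))
```

Deciding where the boundaries fall inside a multi-syllable word (the deny/denim problem) needs more than this single-syllable split.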

Phonological Rules

• Not all instances of a given phone [x] sound/look alike
• A phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.:
– simple rules: /{t,d}/ → [dx] / V' _ V (flapping: /t/ or /d/ between a stressed and an unstressed vowel, as in butter)
– FSAs, FSTs
– declarative constraints: t: V' _ V
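As a concrete instance of a simple rewrite rule, the Python sketch below applies flapping to ARPAbet transcriptions. It assumes CMUdict-style stress digits and treats any vowel marked 1 or 2 as the stressed V'; the function names are illustrative. It applies the rule directly to a phone list rather than compiling it into an FST.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def split_stress(phone):
    """Return (base, stress) for a vowel such as 'AH1', or (phone, None) otherwise."""
    if phone[-1].isdigit() and phone[:-1] in VOWELS:
        return phone[:-1], int(phone[-1])
    return phone, None


def apply_flapping(phones):
    """/{t,d}/ -> [dx] / V' _ V: flap t or d between a stressed and an unstressed vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] not in ("T", "D"):
            continue
        _, left = split_stress(phones[i - 1])
        _, right = split_stress(phones[i + 1])
        if left in (1, 2) and right == 0:
            out[i] = "DX"
    return out


print(apply_flapping(["B", "AH1", "T", "ER0"]))  # butter: t flaps
print(apply_flapping(["AH0", "T", "AE1", "K"]))  # attack: next vowel stressed, no flap
```

The context checks read the input sequence, so every position is evaluated against the original phones rather than against already-rewritten ones.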

Allophones of /t/

• What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:

Figure 4.8: Jurafsky & Martin (2000), page 104.

Application: Word Pronunciation for TTS

• Pronouncing dictionaries (the: ['dhax], ['dhiy])
• Problems:
– Homographs (bass/bass, wind/wind, desert/desert)
– Abbreviations (dr., st.)
– Numbers (2125551212)
– Acronyms (NAACL, IDIAP)
– Morphological variation (unrelentingly)
– Proper names and unknown words
• Rules + dictionaries / dictionaries + rules
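Two of these problems can be sketched in a few lines of Python. The dictionary below is keyed by (word, part of speech), assuming a tagger has already supplied the tag, so that homographs like wind/wind and desert/desert get different ARPAbet entries; a pair like bass/bass (both nouns) would need word-sense information instead. The digit expansion assumes a 10-digit string is a North American phone number read in 3-3-4 groups. All names and entries are hypothetical.

```python
# Hypothetical pronouncing dictionary keyed by (word, part of speech).
LEXICON = {
    ("wind", "NOUN"): "W IH1 N D",
    ("wind", "VERB"): "W AY1 N D",
    ("desert", "NOUN"): "D EH1 Z ER0 T",
    ("desert", "VERB"): "D IH0 Z ER1 T",
}

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def lookup(word, pos):
    """Return the pronunciation for a word with a given POS tag, or None if unknown."""
    return LEXICON.get((word.lower(), pos))


def expand_phone_number(digits):
    """Read a 10-digit string digit by digit in 3-3-4 groups (an assumed format)."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected 10 digits")
    groups = [digits[:3], digits[3:6], digits[6:]]
    return ", ".join(" ".join(DIGITS[int(d)] for d in g) for g in groups)


print(lookup("wind", "VERB"))
print(expand_phone_number("2125551212"))
```

A `None` result is where the rules in "rules + dictionaries" would take over.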

• Hybrid model:
– FSTs model individual word pronunciation in the lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
– FSAs model morphology (e.g. reg-noun-stem + s)
– FSTs for pronunciation rules (e.g. s → z after a voiced sound, as in dogs)
– special rules to model name and acronym pronunciation
– default letter-to-sound rules for other words
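The division of labor in the hybrid model can be sketched in Python with plain functions standing in for the FSTs and FSAs (an assumption for brevity: no FST library is used). Lexicon entries are stored as letter:phone pairs in the slide's c:k a:ae t:t style, the morphology appends the plural +s, and a rule picks the suffix's surface form: [z] after voiced sounds, [s] after voiceless ones, and [ix z] after sibilants. The last two cases go beyond the slide's s → z example and follow standard English plural allomorphy. The `STEMS` table and function name are hypothetical.

```python
# Hypothetical lexicon entries as (letter, phone) pairs, like c:k a:ae t:t.
STEMS = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],
    "dog": [("d", "d"), ("o", "aa"), ("g", "g")],
    "bus": [("b", "b"), ("u", "ah"), ("s", "s")],
}

VOICELESS = {"p", "t", "k", "f", "th"}
SIBILANTS = {"s", "z", "sh", "zh", "ch", "jh"}


def plural_pronunciation(stem):
    """reg-noun-stem + s, with a rule choosing the suffix's surface form."""
    # Lexicon step: read the phone side of each letter:phone pair.
    phones = [p for _, p in STEMS[stem]]
    last = phones[-1]
    # Rule step: realize the plural suffix according to the final phone.
    if last in SIBILANTS:
        suffix = ["ix", "z"]
    elif last in VOICELESS:
        suffix = ["s"]
    else:
        suffix = ["z"]  # s -> z after voiced sounds
    return phones + suffix


for noun in ("cat", "dog", "bus"):
    print(noun, plural_pronunciation(noun))
```

Keeping the stem entries, the morphology, and the rule separate means a new regular noun only needs a lexicon entry; the plural rule applies to it unchanged.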

Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words

• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
– spacious living/dining rm w/frplc → spacious living/dining room with fireplace
– pls?
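Rhyming analogy can be sketched as a longest-shared-suffix search over a letter-aligned lexicon. In this Python sketch, the unknown word borrows the phones aligned to the suffix it shares with a known word (e.g. room for varoom, dodo for todo), and any leftover prefix falls back to a toy one-letter-one-phone table. The lexicon entries, the `DEFAULT_LTS` table, and the function names are all hypothetical; "" marks a silent letter.

```python
# Hypothetical letter-aligned lexicon: one (letter, phone) pair per letter.
ALIGNED_LEXICON = {
    "room": [("r", "R"), ("o", "UW1"), ("o", ""), ("m", "M")],
    "dodo": [("d", "D"), ("o", "OW1"), ("d", "D"), ("o", "OW0")],
    "cat": [("c", "K"), ("a", "AE1"), ("t", "T")],
}

# Toy default letter-to-sound table for letters not covered by the analogy.
DEFAULT_LTS = {"v": "V", "a": "AH0", "t": "T", "d": "D", "o": "OW0"}


def shared_suffix_len(a, b):
    """Number of trailing letters that two spellings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def pronounce_by_analogy(word):
    """Borrow phones from the known word with the longest shared suffix."""
    best_entry, best_len = None, 0
    for spelling, aligned in ALIGNED_LEXICON.items():
        n = shared_suffix_len(word, spelling)
        if n > best_len:
            best_entry, best_len = aligned, n
    # Unmatched prefix: fall back to the toy letter-to-sound table.
    prefix = word[: len(word) - best_len]
    phones = [DEFAULT_LTS.get(ch, "?") for ch in prefix]
    if best_entry is not None:
        phones += [p for _, p in best_entry[len(best_entry) - best_len:]]
    return [p for p in phones if p]  # drop silent letters


print(pronounce_by_analogy("varoom"))
print(pronounce_by_analogy("todo"))
```

Storing the lexicon letter by letter is what lets the matched suffix's phones be copied directly; with unaligned entries the borrowed portion would have to be located some other way.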

Summary

• Phones realize phonemes in different contexts
– Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as by people

• Versatile FSTs can model phonological as well as morphological and spelling systems

• Many creative approaches toward pronunciation modeling for TTS

• Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)