Prosody Modeling

8/3/2019 Prosody Modeling

1/31

4/30/2012 1

Representing Intonational Variation

Julia Hirschberg

CS 4706


2/31

4/30/2012 2

Today

How can we represent meaningful speechvariation so we can assign in TTS?

Expanded vs. compressed pitch range?

Louder vs. softer speech? Faster vs. slower speech?

Differences in intonational prominence?

Differences in intonational phrasing? Differences in pitch contours?


3/31

4/30/2012 3

Schemes for Representing IntonationalVariation

An early proposal: Joshua Steele

Language Learning Approaches

/ IS it INteresting /

/ dyou feel ANGry? /

/ WHATS the PROBlem? / (McCarthy,

1991:106)

What aspects of speech to capture? Continuous or categorical?

If categorical, what are the possible classes?
http://www2.hawaii.edu/~hunterh/Docs/JoshuaSteel.pdfhttp://www.cels.bham.ac.uk/resources/essays/KumakiDiss.pdfhttp://www.cels.bham.ac.uk/resources/essays/KumakiDiss.pdfhttp://www2.hawaii.edu/~hunterh/Docs/JoshuaSteel.pdf


4/31

4/30/2012 4

Many Models

Auditory: Language teachers: what representations can

learners understand

Acoustic:

Examine the speech signal for critical vs. accidentalvariation

Experimental approaches

Identify potential meaningful variation

Design production or perception studies to test

E.g. what does a contour mean?


5/31

4/30/2012 5

F0 Models

Basic division into linear and superpositionalmodels

Linear models: sequence of independent

choices from an intonation lexicon Superpositional models: hierarchy of

phonological components (utterance toprosodic phrase)


6/31

4/30/2012 6

Intonation Models

LinearorTone sequencemodels British school (Kingdon 58, OConnor &

Arnold 73, Cruttenden 97): based on

auditory analysis

American School (Pierrehumbert 80,

ToBI): mainly acoustic analysis

Dutch school (t Hart, Collier and Cohen

1990): perceptual data

Superpositionalmodels (Fujisaki 1983,Mbius et al. 1993): acoustic/physiological


7/31

4/30/2012 7

Superpositional models

Pitch pattern of intonation modeled with twocomponents: phrase component and accentcomponent.

Phrase has basic shape, and pitchmovements for individual accents aresuperimposed over basic shape:

plus

=

Apples, oranges and tomatoes


8/31

4/30/2012 8

Lily and Rosa thought this was divine.Prince William was gorgeous

and he was looking for a bride.

They dreamed of wedding bells.

Declination: downtrend in f0 over the courseof an utterance

Best seen as statistical abstraction: if onetakes f0 measurements from enough

utterances, over time, a downtrend in f0 willemerge

Good for modeling utterance-level trends


9/31

4/30/2012 9

Superpositional models

Advantages

Good at modeling declination in intonation languages

Successful in speech synthesis for languages like

Japanese (little variation in accent type, e.g.) Capture prosodic structure in languages which have

both tone and intonation (e.g. Mandarin)

Disadvantages

Too rigid: All contours must be modeled with anaccent and a phrase component

Many SAE contours cannot be captured easily


10/31

4/30/2012 10

No account of different accent types, or

variations in phrase endings No notation system which allows users to

share observations from large speech corporaor to compare contours

Used primarily for synthesis


11/31

4/30/2012 11

Tone sequence models

Claim: Intonation is generated fromsequences of (possibly) categorically differentand phonologically distinctive tones


12/31

4/30/2012 12

Types of Tone-sequence Models

t arg

et

H

L

t a

rge t

Type 1: based on pitch movements

Type 2: based on pitch levels

The American School

The British School

The Dutch School


13/31

4/30/2012 13

The British School

Tone sequence model and pitch movement analysis(e.g.fallingvs. risingintonation)

Basic unit of intonational description: intonation phrase(tone unit)

Delimited by pauses, phrase-final lengthening, pitchmovement

Syllables within a tone unit can be stressed or accented

telephone

Accented syllables are stressed and pitch prominent


14/31

4/30/2012 14

An example

...a

POINTwhere you have to

CLEANit

and I think itsHOrriblerrible

Theres a point where you have to clean it and I think its horrible...


15/31

4/30/2012 15

Intonation Phrases

Internal structure

Determined by location of accents in an IP

Each accent defines the beginning of a

prosodic constituent


16/31

4/30/2012 16

Intonation phrase structure

JOHNs never BEEN to Jamaica

Prenuclear accent unit Nuclear accent unit

But

Prehead

Stressed syllable

Head Nucleus


17/31

4/30/2012 17

Six nuclear choices in English

Jam a ic

falling

aic

rising

Ja maa

a c

rising-falling

iJa m a

falling-rising

Jam a i

ca

Rising-falling-rising

a ci

Ja m aa

level

Jam a ica


18/31

4/30/2012 18

The American School

American school-type models make a distinctionbetween accents (what makes a particular wordprominent) and boundary tones (how a phraseends)

Autosegmental metrical or two-tone models

Only two tones, which may be combined

H = high target L = low target


19/31

4/30/2012 19

Pierrehumbert 1980

Contours = pitch accents, phrase accents,boundary tones

Pitch

Accents*

Phrase

Accents*

Boundary

Tone

H* L*

L*+H L+H*

H*+L H+L*

L- H-L% H%


20/31

4/30/2012 20

Price, Ostendorf et al

Break indices: degree ofjuncture betweenwords

0 8 (none to a lot)

What Id like is a nice roast beef sandwich.


21/31

4/30/2012 21

To(nes and)B(reak)I(ndices)

Developed by prosody researchers in fourmeetings over 1991-94

Putting Pierrehumbert 80 and Price,Ostendorf, et al together

Goals:

devise common labeling scheme forStandard American English that is robust

and reliable promote collection of large, prosodically

labeled, shareable corpora


22/31

4/30/2012 22

ToBI standards also proposed for Japanese, German,Italian, Spanish, British and Australian English,....

Minimal ToBI transcription:

Recording of speech

F0 contour

ToBI tiers: orthographic tier: words

break-index tier: degrees of junction (Price et al 89)

tonal tier: pitch accents, phrase accents, boundary tones

(Pierrehumbert 80) miscellaneous tier: disfluencies, non-speech sounds, etc.


23/31

4/30/2012 23

Sample ToBI Labeling


24/31

4/30/2012 24

Online training material,available at:

http://anita.simmons.edu/~tobi/index.html Evaluation

Good inter-labeler reliability for expertand naive labelers: 88% agreement on

presence/absence of tonal category, 81%agreement on category label, 91%agreement on break indices to within 1level (Silverman et al. 92,Pitrelli et al 94)


25/31

4/30/2012 25

Pitch Accent/Prominence in ToBI Which items are made intonationally prominent

and how: tonal targets/levels not movement

Accent type:

H* simple high (declarative) L* simple low (ynq)

L*+H scooped, late rise (uncertainty/incredulity)

L+H* early rise to stress (contrastive focus)

H+!H* fall onto stress (implied familiarity)


26/31

4/30/2012 26

Downstepped accents:

!H*,

L+!H*,

L*+!H

Degree of prominence:

within a phrase: HiF0 (~nuclear accent)

across phrases ??


27/31

4/30/2012 27

Prosodic Phrasing in ToBI Levels of phrasing:

intermediate phrase: one or more pitchaccents plus a phrase accent, H- or L-

intonational phrase: 1 or more intermediate

phrases + boundary tone, H% or L% ToBI break-index tier

0 no word boundary

1 word boundary


28/31

4/30/2012 28

2 strong juncture with no tonalmarkings

3 intermediate phrase boundary

4 intonational phrase boundary


29/31

4/30/2012 29

L*+H

L*

H*

H-H%H-L%L-H%L-L%


30/31

4/30/2012 30

H* !H*

H+!H*

L+H*

H-H%H-L%L-H%L-L%


31/31

4/30/2012 31

Next Class

Predicting prosodic assigments from text

Documents

Prosody Modeling