Prosody Modeling

Embed Size (px)

Citation preview

  • 8/3/2019 Prosody Modeling

    1/31

    4/30/2012 1

    Representing Intonational Variation

    Julia Hirschberg

    CS 4706

  • 8/3/2019 Prosody Modeling

    2/31

    4/30/2012 2

    Today

    How can we represent meaningful speechvariation so we can assign in TTS?

    Expanded vs. compressed pitch range?

    Louder vs. softer speech? Faster vs. slower speech?

    Differences in intonational prominence?

    Differences in intonational phrasing? Differences in pitch contours?

  • 8/3/2019 Prosody Modeling

    3/31

    4/30/2012 3

    Schemes for Representing IntonationalVariation

    An early proposal: Joshua Steele

    Language Learning Approaches

    / IS it INteresting /

    / dyou feel ANGry? /

    / WHATS the PROBlem? / (McCarthy,

    1991:106)

    What aspects of speech to capture? Continuous or categorical?

    If categorical, what are the possible classes?

    http://www2.hawaii.edu/~hunterh/Docs/JoshuaSteel.pdfhttp://www.cels.bham.ac.uk/resources/essays/KumakiDiss.pdfhttp://www.cels.bham.ac.uk/resources/essays/KumakiDiss.pdfhttp://www2.hawaii.edu/~hunterh/Docs/JoshuaSteel.pdf
  • 8/3/2019 Prosody Modeling

    4/31

    4/30/2012 4

    Many Models

    Auditory: Language teachers: what representations can

    learners understand

    Acoustic:

    Examine the speech signal for critical vs. accidentalvariation

    Experimental approaches

    Identify potential meaningful variation

    Design production or perception studies to test

    E.g. what does a contour mean?

  • 8/3/2019 Prosody Modeling

    5/31

    4/30/2012 5

    F0 Models

    Basic division into linear and superpositionalmodels

    Linear models: sequence of independent

    choices from an intonation lexicon Superpositional models: hierarchy of

    phonological components (utterance toprosodic phrase)

  • 8/3/2019 Prosody Modeling

    6/31

    4/30/2012 6

    Intonation Models

    LinearorTone sequencemodels British school (Kingdon 58, OConnor &

    Arnold 73, Cruttenden 97): based on

    auditory analysis

    American School (Pierrehumbert 80,

    ToBI): mainly acoustic analysis

    Dutch school (t Hart, Collier and Cohen

    1990): perceptual data

    Superpositionalmodels (Fujisaki 1983,Mbius et al. 1993): acoustic/physiological

  • 8/3/2019 Prosody Modeling

    7/31

    4/30/2012 7

    Superpositional models

    Pitch pattern of intonation modeled with twocomponents: phrase component and accentcomponent.

    Phrase has basic shape, and pitchmovements for individual accents aresuperimposed over basic shape:

    plus

    =

    Apples, oranges and tomatoes

  • 8/3/2019 Prosody Modeling

    8/31

    4/30/2012 8

    Lily and Rosa thought this was divine.Prince William was gorgeous

    and he was looking for a bride.

    They dreamed of wedding bells.

    Declination: downtrend in f0 over the courseof an utterance

    Best seen as statistical abstraction: if onetakes f0 measurements from enough

    utterances, over time, a downtrend in f0 willemerge

    Good for modeling utterance-level trends

  • 8/3/2019 Prosody Modeling

    9/31

    4/30/2012 9

    Superpositional models

    Advantages

    Good at modeling declination in intonation languages

    Successful in speech synthesis for languages like

    Japanese (little variation in accent type, e.g.) Capture prosodic structure in languages which have

    both tone and intonation (e.g. Mandarin)

    Disadvantages

    Too rigid: All contours must be modeled with anaccent and a phrase component

    Many SAE contours cannot be captured easily

  • 8/3/2019 Prosody Modeling

    10/31

    4/30/2012 10

    No account of different accent types, or

    variations in phrase endings No notation system which allows users to

    share observations from large speech corporaor to compare contours

    Used primarily for synthesis

  • 8/3/2019 Prosody Modeling

    11/31

    4/30/2012 11

    Tone sequence models

    Claim: Intonation is generated fromsequences of (possibly) categorically differentand phonologically distinctive tones

  • 8/3/2019 Prosody Modeling

    12/31

    4/30/2012 12

    Types of Tone-sequence Models

    t arg

    et

    H

    L

    t a

    rge t

    Type 1: based on pitch movements

    Type 2: based on pitch levels

    The American School

    The British School

    The Dutch School

  • 8/3/2019 Prosody Modeling

    13/31

    4/30/2012 13

    The British School

    Tone sequence model and pitch movement analysis(e.g.fallingvs. risingintonation)

    Basic unit of intonational description: intonation phrase(tone unit)

    Delimited by pauses, phrase-final lengthening, pitchmovement

    Syllables within a tone unit can be stressed or accented

    telephone

    Accented syllables are stressed and pitch prominent

  • 8/3/2019 Prosody Modeling

    14/31

    4/30/2012 14

    An example

    ...a

    POINTwhere you have to

    CLEANit

    and I think itsHOrriblerrible

    Theres a point where you have to clean it and I think its horrible...

  • 8/3/2019 Prosody Modeling

    15/31

    4/30/2012 15

    Intonation Phrases

    Internal structure

    Determined by location of accents in an IP

    Each accent defines the beginning of a

    prosodic constituent

  • 8/3/2019 Prosody Modeling

    16/31

    4/30/2012 16

    Intonation phrase structure

    JOHNs never BEEN to Jamaica

    Prenuclear accent unit Nuclear accent unit

    But

    Prehead

    Stressed syllable

    Head Nucleus

  • 8/3/2019 Prosody Modeling

    17/31

    4/30/2012 17

    Six nuclear choices in English

    Jam a ic

    falling

    aic

    rising

    Ja maa

    a c

    rising-falling

    iJa m a

    falling-rising

    Jam a i

    ca

    Rising-falling-rising

    a ci

    Ja m aa

    level

    Jam a ica

  • 8/3/2019 Prosody Modeling

    18/31

    4/30/2012 18

    The American School

    American school-type models make a distinctionbetween accents (what makes a particular wordprominent) and boundary tones (how a phraseends)

    Autosegmental metrical or two-tone models

    Only two tones, which may be combined

    H = high target L = low target

  • 8/3/2019 Prosody Modeling

    19/31

    4/30/2012 19

    Pierrehumbert 1980

    Contours = pitch accents, phrase accents,boundary tones

    Pitch

    Accents*

    Phrase

    Accents*

    Boundary

    Tone

    H* L*

    L*+H L+H*

    H*+L H+L*

    L- H-L% H%

  • 8/3/2019 Prosody Modeling

    20/31

    4/30/2012 20

    Price, Ostendorf et al

    Break indices: degree ofjuncture betweenwords

    0 8 (none to a lot)

    What Id like is a nice roast beef sandwich.

  • 8/3/2019 Prosody Modeling

    21/31

    4/30/2012 21

    To(nes and)B(reak)I(ndices)

    Developed by prosody researchers in fourmeetings over 1991-94

    Putting Pierrehumbert 80 and Price,Ostendorf, et al together

    Goals:

    devise common labeling scheme forStandard American English that is robust

    and reliable promote collection of large, prosodically

    labeled, shareable corpora

  • 8/3/2019 Prosody Modeling

    22/31

    4/30/2012 22

    ToBI standards also proposed for Japanese, German,Italian, Spanish, British and Australian English,....

    Minimal ToBI transcription:

    Recording of speech

    F0 contour

    ToBI tiers: orthographic tier: words

    break-index tier: degrees of junction (Price et al 89)

    tonal tier: pitch accents, phrase accents, boundary tones

    (Pierrehumbert 80) miscellaneous tier: disfluencies, non-speech sounds, etc.

  • 8/3/2019 Prosody Modeling

    23/31

    4/30/2012 23

    Sample ToBI Labeling

  • 8/3/2019 Prosody Modeling

    24/31

    4/30/2012 24

    Online training material,available at:

    http://anita.simmons.edu/~tobi/index.html Evaluation

    Good inter-labeler reliability for expertand naive labelers: 88% agreement on

    presence/absence of tonal category, 81%agreement on category label, 91%agreement on break indices to within 1level (Silverman et al. 92,Pitrelli et al 94)

  • 8/3/2019 Prosody Modeling

    25/31

    4/30/2012 25

    Pitch Accent/Prominence in ToBI Which items are made intonationally prominent

    and how: tonal targets/levels not movement

    Accent type:

    H* simple high (declarative) L* simple low (ynq)

    L*+H scooped, late rise (uncertainty/incredulity)

    L+H* early rise to stress (contrastive focus)

    H+!H* fall onto stress (implied familiarity)

  • 8/3/2019 Prosody Modeling

    26/31

    4/30/2012 26

    Downstepped accents:

    !H*,

    L+!H*,

    L*+!H

    Degree of prominence:

    within a phrase: HiF0 (~nuclear accent)

    across phrases ??

  • 8/3/2019 Prosody Modeling

    27/31

    4/30/2012 27

    Prosodic Phrasing in ToBI Levels of phrasing:

    intermediate phrase: one or more pitchaccents plus a phrase accent, H- or L-

    intonational phrase: 1 or more intermediate

    phrases + boundary tone, H% or L% ToBI break-index tier

    0 no word boundary

    1 word boundary

  • 8/3/2019 Prosody Modeling

    28/31

    4/30/2012 28

    2 strong juncture with no tonalmarkings

    3 intermediate phrase boundary

    4 intonational phrase boundary

  • 8/3/2019 Prosody Modeling

    29/31

    4/30/2012 29

    L*+H

    L*

    H*

    H-H%H-L%L-H%L-L%

  • 8/3/2019 Prosody Modeling

    30/31

    4/30/2012 30

    H* !H*

    H+!H*

    L+H*

    H-H%H-L%L-H%L-L%

  • 8/3/2019 Prosody Modeling

    31/31

    4/30/2012 31

    Next Class

    Predicting prosodic assigments from text