24
Motor Control Strategies for Chinese Intonation Greg Kochanski (University of Oxford, UK) Chilin Shih (University of Illinois, Urbana- Champaign) Tan Lee (Chinese University of Hong-Kong) Hongyan Jing (IBM)

Motor Control Strategies for Chinese Intonation

  • Upload
    danil

  • View
    49

  • Download
    0

Embed Size (px)

DESCRIPTION

Motor Control Strategies for Chinese Intonation. Greg Kochanski (University of Oxford, UK) Chilin Shih (University of Illinois, Urbana-Champaign) Tan Lee (Chinese University of Hong-Kong) Hongyan Jing (IBM). http://kochanski.org/gpk. The Goal: Explain intonation in a way that is: - PowerPoint PPT Presentation

Citation preview

Page 1: Motor Control Strategies for Chinese Intonation

Motor Control Strategies for Chinese Intonation

Greg Kochanski (University of Oxford, UK)

Chilin Shih (University of Illinois, Urbana-Champaign)

Tan Lee (Chinese University of Hong-Kong)

Hongyan Jing (IBM)

Page 2: Motor Control Strategies for Chinese Intonation

http://kochanski.org/gpk

Page 3: Motor Control Strategies for Chinese Intonation

• The Goal:– Explain intonation in a way that is:

• Consistent with linguistic assumptions.• Consistent with known Physiology and Neuroscience.

• The Method:– Motion planning over a phrase.– Minimize sum of

• Error between actual pitch and linguistic target• An “effort” cost term that penalizes rapid, jerky motions.

• The Result:– Intonation in tone languages can be represented by:

• A lexically-specified tone template (i.e, you use a dictionary to look up which tone a syllable has).

• A continuous cost-of-error parameter, one per word.

– Evidence that the cost-of-misinterpretations we measure are real:• Cross-language similarities• Metrical patterns• Other

Page 4: Motor Control Strategies for Chinese Intonation

TheChallenge

Page 5: Motor Control Strategies for Chinese Intonation

Tone languages provide the ideal test case for motor control strategies

1. tone is important, and2. you can be sure what the speaker is trying to accomplish.

The meaning of each syllable is determined by the pitch contour over the syllable.

1. Ma (high tone) = “Mother”2. Ma (rising tone) = “Hemp”3. Ma (low falling tone) = “Horse”4. Ma (high falling tone) = “to scold”You can look up the tone in the dictionary.

Pitch contour is determined primarily by muscle tension in the vocal folds.

Page 6: Motor Control Strategies for Chinese Intonation

Another Challenge

Time (10 ms intervals)

F0 (

Hz)

12

34

Tone shapes

Typical tone shapes in green

Page 7: Motor Control Strategies for Chinese Intonation

People talk nearly as fast as possible, therefore dynamics must be important.

Pitch (f0) for a maximum-rate warble.

Conversational Mandarin on the sametime scale

Pitch (f0) for a maximum-rate warble.

Page 8: Motor Control Strategies for Chinese Intonation

The Data

• Male speaker of Madarin (Chinese)• Female speaker of Cantonese (Chinese)• Text from newspaper news stories.• 737 syllables for Mandarin

– 41.4 syllables per second– 1.20.7 seconds per phrase (between pauses).

• Segmented into words by three independent native speakers (Mandarin)

• Tracks of fundamental frequency vs. time (pitch) extracted by get_f0 from ESPS/Waves package.

Page 9: Motor Control Strategies for Chinese Intonation

Basic assumptions used in modeling

• People plan their utterances several syllables in advance.

• People produce optimal, highly practiced speech.– Most of what we say is made from bits and pieces we’ve said before.– There are only 4 (Mandarin) or 6 (Cantonese) tones to combine.– A speaker has the chance to practice and optimize all the common 3- and 4-

tone sequences.

• A simple model for f0 (pitch): f0 is linearly related to muscle tensions.

• A simple model of the muscle control strategy.– No reason to believe pitch is controlled differently from other muscle motions.

Page 10: Motor Control Strategies for Chinese Intonation

Optimize what?

• People want to minimize the chance that they will be significantly misunderstood. Some words will be more important than others:– Risk = P(misinterpreted) * cost-of-misinterpretation– Perhaps weight matches importance.

• People want to minimize effort and/or talk faster– Chairs, Cars

• How to combine the two?– A weighted sum.– Cost-of-misinterpretation plays the role of the weight.

Page 11: Motor Control Strategies for Chinese Intonation

What is the unit of motion planning?Probably a phrase or a sentence.

People start at a higher pitch when they begin longer sentences.Also planning of inhaled air volume.Therefore, there is some plan ~300 ms before start of speech.

(Data courtesy Chilin Shih)

Page 12: Motor Control Strategies for Chinese Intonation

Modeling math

RGtptp

)(

minarg)(

22222 pppdtG Effort.

targets

2

iii rsR R is the total risk for the utterance: ri is the error of

the ith target, and si is the cost if this particular word is misinterpreted.

y(t) is the pitch of a point in the ith target.The time-dependence is suppressed for clarity.

“We’re optimizing something”

p is the realized pitch

Where ri is the error of the ith target

itarget

2)(ti dtypr

(this is an approximation; see elsewhere for correct, more detailed equation)

p is implicitly a function of time

Page 13: Motor Control Strategies for Chinese Intonation

Modeling math – more detail.

targets

2

iii rsR

Total risk for the utterance.

itarget

22)()(t

iiiii dtypyyppr

y is the pitch of a point in the ith target.

A bar denotes an average over a target.

Where ri is the error of the ith target

Alpha () controls how much the shape of the pitch contour matters.

The cost of a misinterpretation of

the ith syllable.

Beta () controls how much the average pitch of the syllable matters.

Page 14: Motor Control Strategies for Chinese Intonation

“Effort”

22222 pppdtG

How does G depend on the form of the pitch curve?

Large effort implies a curve with larger slopes and sharper corners: wigglier.

Page 15: Motor Control Strategies for Chinese Intonation

Model behavior

• For s>>1, Error (R) dominates, and pitch matches target.

• For s<<1, Effort (G) dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.

• For s~1, everything compromises.

Page 16: Motor Control Strategies for Chinese Intonation

The rest of the model:

• A model is a sequence of targets.– The type of the target (tone1, tone2, …)

is looked up in a dictionary.

• Each target has a cost-of-misinterpretation.– The cost is adjustable for each word– Syllables within a word are derived from word cost via

the metrical pattern for words of a certain length.

• One target per tone.• Targets are stretched to fit syllable duration.• Only one phonological rule: 3323

Page 17: Motor Control Strategies for Chinese Intonation

What’s the procedure?

Compute the pitch curve as a function of phonological inputs

and the cost of a misinterpretation.

Sequence of tones (phonology)

Costs of mis-interpretations

Predicted F0

Data

Nonlinear least-squares fitting algorithm

Page 18: Motor Control Strategies for Chinese Intonation

Model fits for Mandarin Chinese

Tone class (input)Cost-of-misinterpretation

(result)

Inside a word, the cost of a misinterpretation is distributed

by the metrical pattern

Page 19: Motor Control Strategies for Chinese Intonation

Model fits to Mandarin Chinese

0.61 free parameters per syllable, 13 Hz RMS error.

Page 20: Motor Control Strategies for Chinese Intonation

Results are stable under small changes in the model.

The two models have words defined by different labelers

This model allows extra freedom: different tones are allowed to define their targets differently

This model allows less freedom: all tones have the same type of target.

Costs for misinterpreting different syllables.

Page 21: Motor Control Strategies for Chinese Intonation

Model parameters

Mandarin

Cantonese

Phrasing is marked in speech.

Cantonese data courtesy of Prof. Tan Lee

Page 22: Motor Control Strategies for Chinese Intonation

Metrical patterns inside words (Mandarin)

“Normal” segmentation of characters into words.

Random segmentation of characters into words – Note that the metrical pattern disappears, showing that we are measuring something real that is tied to words.

The metrical pattern controls how the cost-of-misinterpretation is split up inside a word. Syllables are marked with . The vertical position is proportional to log(s) for each syllable, so higher syllables have larger s, and will be executed more carefully. For 4-syllable words, the error bars are shown by the pairs of arrows.

Page 23: Motor Control Strategies for Chinese Intonation

Another nice property

•The cost-of-misinterpretation parameter for a syllable is correlated with the mutual information with the preceeding syllable:

•r = -0.175

•>95% confidence

•Pitch patterns are implemented

•sloppily for syllables that are unsurprising, and

•precisely for surprising ones.

(Mutual informations from a database of 15000 newspaper sentences. Syllable identity was defined by phoneme content and tone.)

Page 24: Motor Control Strategies for Chinese Intonation

Conclusion

• Models with motor planning capture important aspects of speech.

• They allow a very compact representation of complex behaviors.• Intonation is represented as:

– a small set of discrete symbols, in sequence,

– modulated by a cost-of-misinterpretation, with

• The cost-of-misinterpretation parameter seems real:– Similar across languages

– Matches language structure

• This model can be applied broadly:

• Two dialects of Chinese

• Some aspects of English

• Separating different singing and speaking styles from the content

• See http://kochanski.org/papers .