Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005

Roadmap

• Motivating Context

• Data Collections & Processing

• Modeling Context for Tone and Pitch Accent

• Context in Recognition

• Conclusion

Challenges

• Tone and Pitch Accent Recognition
  – Key component of language understanding
    • Lexical tone carries word meaning
    • Pitch accent carries semantic, pragmatic, discourse meaning
  – Non-canonical form (Shen 90, Shih 00, Xu 01)
    • Tonal coarticulation modifies surface realization
      – In extreme cases, fall becomes rise
  – Tone is relative
    • To speaker range
      – High for male may be low for female
    • To phrase range, other tones
      – E.g. downstep

Strategy

• Common model across languages, SVM classifier
  – Acoustic-prosodic model: no word label, POS, lexical stress info
    • No explicit tone label sequence model
  – English, Mandarin Chinese (also Cantonese)
• Exploit contextual information
  – Features from adjacent syllables
    • Height, shape: direct, relative
  – Compensate for phrase contour
• Analyze impact of
  – Context position, context encoding, context type
  – > 20% relative reduction in error over no context
    • Preceding context gives greater enhancement than following

Data Collection & Processing

• English: (Ostendorf et al, 95)
  – Boston University Radio News Corpus, f2b
  – Manually ToBI annotated, aligned, syllabified
  – Pitch accent aligned to syllables
    • Unaccented, High, Downstepped High, Low
      – (Sun 02, Ross & Ostendorf 95)
• Mandarin:
  – TDT2 Voice of America Mandarin Broadcast News
  – Automatically force aligned to anchor scripts (CUSonic)
  – High, Mid-rising, Low, High falling, Neutral

Local Feature Extraction

• Uniform representation for tone, pitch accent
  – Motivated by Pitch Target Approximation Model
    • Tone/pitch accent target exponentially approached
      – Linear target: height, slope (Xu et al, 99)
• Scalar features:
  – Pitch, Intensity max, mean (Praat, speaker normalized)
  – Pitch at 5 points across voiced region
  – Duration
  – Initial, final in phrase
• Slope:
  – Linear fit to last half of pitch contour
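The scalar and slope features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction code behind the slides: the function names, the z-score form of speaker normalization, the 10 ms frame step, and the choice of evenly spaced sample points are all assumptions, and the contour is taken to be a non-empty list of pitch values over the voiced region.

```python
from statistics import mean

def z_normalize(values, speaker_mean, speaker_std):
    """Express pitch values relative to one speaker's range (assumed z-score)."""
    return [(v - speaker_mean) / speaker_std for v in values]

def sample_points(contour, n=5):
    """Pick n evenly spaced samples across the voiced region."""
    if len(contour) < 2:
        return [contour[0]] * n
    last = len(contour) - 1
    return [contour[round(i * last / (n - 1))] for i in range(n)]

def slope_last_half(contour, frame_step=0.01):
    """Least-squares slope (per second) of the second half of the contour."""
    half = contour[len(contour) // 2:]
    times = [i * frame_step for i in range(len(half))]
    t_bar, y_bar = mean(times), mean(half)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0  # a single frame has no slope
    return sum((t - t_bar) * (y - y_bar) for t, y in zip(times, half)) / denom

def local_features(contour, speaker_mean, speaker_std, frame_step=0.01):
    """Collect height, shape, and duration features for one syllable."""
    norm = z_normalize(contour, speaker_mean, speaker_std)
    return {
        "pitch_max": max(norm),
        "pitch_mean": mean(norm),
        "pitch_points": sample_points(norm),
        "slope": slope_last_half(norm, frame_step),
        "duration": len(contour) * frame_step,
    }
```

Intensity features and the phrase-initial/final flags would be added to the same dictionary in the same way.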

Context Features

• Local context:
  – Extended features
    • Pitch max, mean, adjacent points of preceding, following syllables
  – Difference features
    • Difference between
      – Pitch max, mean, mid, slope
      – Intensity max, mean
    • Of preceding, following and current syllable
• Phrasal context:
  – Compute collection average phrase slope
  – Compute scalar pitch values, adjusted for slope
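The two local-context encodings can be illustrated as follows. This is a sketch under assumptions: syllables are represented as dictionaries of already-extracted features, the key names are hypothetical, and the sign convention for differences (current minus neighbour) is not stated on the slides. Passing `None` for a neighbour yields the preceding-only or following-only conditions.

```python
EXTEND_KEYS = ("pitch_max", "pitch_mean")
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")

def extended_features(prev, cur, nxt, keys=EXTEND_KEYS):
    """Copy the neighbours' own feature values next to the current ones."""
    feats = dict(cur)
    for name, syl in (("prev", prev), ("next", nxt)):
        if syl is not None:
            for k in keys:
                feats[f"{name}_{k}"] = syl[k]
    return feats

def difference_features(prev, cur, nxt, keys=DIFF_KEYS):
    """Add current-minus-neighbour differences for each selected feature."""
    feats = dict(cur)
    for name, syl in (("prev", prev), ("next", nxt)):
        if syl is not None:
            for k in keys:
                feats[f"diff_{name}_{k}"] = cur[k] - syl[k]
    return feats
```

The extended encoding hands the classifier absolute neighbour values, while the difference encoding expresses the current syllable relative to its neighbours.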

Classification Experiments

• Classifier: Support Vector Machine
  – Linear kernel
  – Multiclass formulation
    • SVMlight (Joachims), LibSVM (Chang & Lin 01)
  – 4:1 training / test splits
• Experiments: Effects of
  – Context position: preceding, following, none, both
  – Context encoding: Extended/Difference
  – Context type: local, phrasal
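A 4:1 training/test split can be produced as below. The slides do not say how the splits were drawn, so the seeded random shuffle and the function name are assumptions made only for illustration.

```python
import random

def split_4_to_1(items, seed=0):
    """Shuffle the items and return (80% training, 20% test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]
```

Each experimental condition (context position, encoding, type) would then be trained and scored on the same split so that accuracies are comparable.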

Results: Local Context

Context          Mandarin Tone   English Pitch Accent
Full             74.5%           81.3%
Extend PrePost   74.0%           80.7%
Extend Pre       74.0%           79.9%
Extend Post      70.5%           76.7%
Diffs PrePost    75.5%           80.7%
Diffs Pre        76.5%           79.5%
Diffs Post       69.0%           77.3%
Both Pre         76.5%           79.7%
Both Post        71.5%           77.6%
No context       68.5%           75.9%

Pre = preceding (left) context; Post = following (right) context.

Discussion: Local Context

• Any context information improves over none
  – Preceding context information consistently improves over none or following context information
    • English: Generally more context features are better
    • Mandarin: Following context can degrade
  – Little difference in encoding (Extend vs Diffs)
• Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory

Results & Discussion: Phrasal Context

Phrase Context   Mandarin Tone   English Pitch Accent
Phrase           75.5%           81.3%
No Phrase        72.0%           79.9%

• Phrase contour compensation enhances recognition
• Simple strategy
• Non-linear slope compensation may improve results further
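One way to read the phrase contour compensation is: estimate a single average slope over all phrases in the collection, then remove the drift that slope predicts at each syllable's position in its phrase. The sketch below follows that reading. The least-squares fit per phrase, the use of time since phrase start, and all function names are assumptions rather than details given on the slides.

```python
from statistics import mean

def phrase_slope(times, pitches):
    """Least-squares slope of pitch against time within one phrase."""
    t_bar, p_bar = mean(times), mean(pitches)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0  # too few points to estimate a slope
    return sum((t - t_bar) * (p - p_bar)
               for t, p in zip(times, pitches)) / denom

def average_phrase_slope(phrases):
    """Average the per-phrase slopes over the whole collection."""
    return mean(phrase_slope(t, p) for t, p in phrases)

def compensate(pitch, time_in_phrase, avg_slope):
    """Remove the drift expected to have accumulated since phrase start."""
    return pitch - avg_slope * time_in_phrase
```

With a negative average slope (declination), values late in a phrase are raised back toward the phrase-initial level before classification.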

Conclusion

• Employ common acoustic representation
  – Tone (Mandarin), pitch accent (English)
    • Cantonese, recent experiments
• SVM classifiers, linear kernel: 76% (Mandarin tone), 81% (English pitch accent)
• Local context effects:
  – More than 20% relative reduction in error
  – Preceding context makes the greatest contribution
    • Carryover vs anticipatory
• Phrasal context effects:
  – Compensation for phrasal contour improves recognition
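The relative error reduction can be checked against the accuracies on the local context results slide (no context: 68.5% Mandarin, 75.9% English; best with context: 76.5% and 81.3%). The helper name below is hypothetical; the arithmetic is the standard definition of relative error reduction.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Percentage of the baseline error removed by the improved system."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err

mandarin = relative_error_reduction(68.5, 76.5)  # about 25.4
english = relative_error_reduction(75.9, 81.3)   # about 22.4
```

Both languages clear the 20% mark, even though the absolute accuracy gains are only 8.0 and 5.4 points.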

Current & Future Work

• Application of model to different languages
  – Cantonese, Dschang (Bantu family)
    • Cantonese: ~65% acoustic only, 85% w/segmental
• Integration of additional contextual influence
  – Topic, turn, discourse structure
  – HMSVM, GHMM models
• http://people.cs.uchicago.edu/~levow/projects/tai
  – Supported by NSF Grant #: 0414919


Confusion Matrix (English)

Rows: recognized tone; columns: manually labeled tone.

Recognized    Unaccented   High    Low     D.S. High
Unaccented    95%          25%     100%    53.5%
High          4.6%         73%     0%      38.5%
Low           0%           0%      0%      0%
D.S. High     0.3%         2%      0%      8%

Confusion Matrix (Mandarin)

Rows: recognized tone; columns: manually labeled tone.

Recognized     High    Mid-Rising   Low    High-Falling   Neutral
High           84%     9%           5%     13%            0%
Mid-Rising     6.7%    78.6%        10%    7%             27.3%
Low            0%      3.6%         70%    7%             27.3%
High-Falling   7.4%    3.6%         10%    70%            0%
Neutral        0%      5.3%         5%     1.5%           45%
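In the matrices above, each manually labeled column sums to roughly 100%, so every cell reads as the share of that labeled class recognized as the row's class. A table of this kind can be built from (recognized, labeled) pairs as sketched here; the function name and the input format are assumptions.

```python
from collections import Counter

def column_percentages(pairs, labels):
    """Return table[recognized][labeled] as a percentage of each labeled class."""
    pairs = list(pairs)
    counts = Counter(pairs)
    totals = Counter(labeled for _, labeled in pairs)
    table = {}
    for rec in labels:
        table[rec] = {}
        for lab in labels:
            total = totals[lab]
            # Classes that never occur in the labels get 0% everywhere.
            table[rec][lab] = 100.0 * counts[(rec, lab)] / total if total else 0.0
    return table
```

The diagonal of the resulting table gives per-class accuracy, which makes the weak classes (Low in English, Neutral in Mandarin) easy to spot.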

Related Work

• Tonal coarticulation:
  – Xu & Sun 02; Xu 97; Shih & Kochanski 00
• English pitch accent
  – X. Sun 02; Hasegawa-Johnson et al 04; Ross & Ostendorf 95
• Lexical tone recognition
  – SVM recognition of Thai tone: Thubthong 01
  – Context-dependent tone models
    • Wang & Seneff 00, Zhou et al 04

Pitch Target Approximation Model

• Pitch target:
  – Linear model: $T(t) = at + b$
  – Exponentially approximated: $y(t) = \beta \exp(-\lambda t) + at + b$
  – In practice, assume target well-approximated by mid-point (Sun, 02)
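The behaviour of the model is easy to see numerically. The sketch below assumes the commonly cited form of the pitch target approximation model, a linear target T(t) = at + b approached by a surface contour y(t) = β·exp(−λt) + at + b with λ > 0. The parameter values in the usage note are made up for illustration only.

```python
import math

def target(t, a, b):
    """Underlying linear pitch target: slope a, height b."""
    return a * t + b

def surface_pitch(t, a, b, beta, lam):
    """Surface contour: an initial offset beta that decays toward the target."""
    return beta * math.exp(-lam * t) + target(t, a, b)
```

For example, with `a=-10`, `b=200`, `beta=30`, `lam=20`, the contour starts 30 units above the target at `t=0` and is within a hundredth of a unit of it by `t=0.5`, which shows how the early part of a syllable can be dominated by carryover from the previous tone while the later part reflects the target.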
