On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Zhao-yu Su

Phonetics Lab, Institute of Linguistics, Academia Sinica

Applying the Fujisaki model to Mandarin

– 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/) PI: Prof. Chiu-yu Tseng

• Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)

– 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/) PI: Prof. Keikichi Hirose

• Mandarin: manual extraction of Fujisaki parameters

• Japanese: automatic extraction of Fujisaki parameters

– 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/) PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William

• Mandarin: manual extraction of Fujisaki parameters

Outline

• Introduction: the Fujisaki model

• Auto-extraction comparison

– Methods used at two labs to generate the Fujisaki parameters

1. Phonetics Lab, Academia Sinica, Taiwan, on Mandarin (Tseng 2004, 2005, 2006)

2. Hirose Lab, Tokyo University, Japan, on Japanese (Hirose and Narusawa 2002, 2003)

• Manual extraction: method used at CUHK to extract Fujisaki parameters

– DSP and Speech Technology Lab, on Mandarin (Wentao Gu 2004, 2005)

The Fujisaki Model (Fujisaki & Hirose 1984)

log(F0) = base frequency + phrase components + accent components

[Figure: phrase components + accent components = superposed model]
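The superposition above can be sketched in a few lines of Python. This is a minimal illustration using the standard published form of the model, in which the phrase component is the response of a critically damped second-order filter to an impulse command and the accent component is the response to a stepwise command. The function names and the default values alpha = 3.0, beta = 20.0 and gamma = 0.9 are assumptions chosen for illustration (they are commonly quoted values, not values taken from these slides).

```python
import math

def phrase_response(t, alpha=3.0):
    """Gp(t): impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def ln_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset time in s, magnitude Ap)
    accent_cmds: list of (onset time in s, offset time in s, amplitude Aa)
    """
    value = math.log(fb)  # base frequency term
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value

# Illustrative commands only (hypothetical values, not extracted from data).
contour = [ln_f0(i / 100.0, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
           for i in range(100)]
```

For Mandarin, the accent commands are usually reinterpreted as tone commands and may take negative amplitudes; the sketch accepts negative `aa` values without change.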

Auto-extraction based on Mixdorff’s method (2000, 2003)

Original F0 contour

High-pass filter (stop frequency at 0.5 Hz)

High-frequency contour (HFC)

Low-frequency contour (LFC)
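The split into a high-frequency and a low-frequency contour can be illustrated with the following sketch. It is not the filter design used by Mixdorff, which the slides do not specify: as a stand-in, a simple one-pole low-pass filter is run forward and backward to obtain the LFC, and the HFC is taken as the remainder. The function name, the 100 Hz frame rate and the assumption that the log-F0 contour has already been interpolated across unvoiced frames are all hypothetical.

```python
import math

def split_contour(log_f0, frame_rate_hz=100.0, cutoff_hz=0.5):
    """Split a non-empty, interpolated log-F0 contour into (LFC, HFC).

    A one-pole low-pass filter is applied forward and then backward so that
    the phase lag of the two passes cancels. HFC is the original minus LFC.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)

    def smooth(xs):
        out, state = [], xs[0]
        for x in xs:
            state += a * (x - state)  # exponential smoothing step
            out.append(state)
        return out

    forward = smooth(log_f0)
    lfc = smooth(forward[::-1])[::-1]
    hfc = [x - l for x, l in zip(log_f0, lfc)]
    return lfc, hfc

# A slow baseline of 5.0 with a fast 5 Hz ripple (synthetic example).
signal = [5.0 + 0.1 * math.sin(2.0 * math.pi * 5.0 * i / 100.0)
          for i in range(300)]
lfc, hfc = split_contour(signal)
```

In this synthetic example the ripple ends up almost entirely in `hfc`, while `lfc` stays close to the baseline. A sharper filter would separate the two bands more cleanly.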

Decision of phrase commands

Low-frequency contour (LFC) from Mixdorff’s method

Position of local minimum optimization

Perceptual phrase boundary
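As a rough illustration of how local minima of the LFC might be turned into candidate phrase command positions, the sketch below finds the minima and then keeps the one closest to each perceptually labelled boundary. The matching rule, the function names and the `max_gap` tolerance are assumptions made for this example. The slides do not state how the minima and the perceptual boundaries are combined, and the later optimization step is not shown.

```python
def local_minima(lfc):
    """Indices of interior points lower than the left neighbour and
    no higher than the right neighbour (a plateau is reported once)."""
    return [i for i in range(1, len(lfc) - 1)
            if lfc[i] < lfc[i - 1] and lfc[i] <= lfc[i + 1]]

def minima_near_boundaries(minima, boundaries, max_gap):
    """For each labelled boundary (frame index), keep the closest local
    minimum within max_gap frames. Boundaries with no nearby minimum
    are skipped."""
    picked = []
    for b in boundaries:
        close = [m for m in minima if abs(m - b) <= max_gap]
        if close:
            picked.append(min(close, key=lambda m: abs(m - b)))
    return picked

# Toy LFC values and hypothetical boundary labels at frames 3 and 20.
toy_lfc = [3, 2, 1, 2, 3, 2, 0, 1]
candidates = local_minima(toy_lfc)
onsets = minima_near_boundaries(candidates, [3, 20], max_gap=2)
```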

Method based on perceptual labels: Phonetics Lab, Academia Sinica, Taiwan

Evaluation: $MSE = \frac{1}{T}\sum_{t}\left(\ln F_0(t) - \ln \hat{F}_0(t)\right)^2$
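The evaluation measure, a mean squared error between the original and the model-generated F0 contours in the ln domain over T frames, can be sketched as follows. The function name is hypothetical, and skipping frames with non-positive F0 (treated here as unvoiced) is an assumption of this sketch rather than something stated on the slide.

```python
import math

def log_f0_mse(f0_observed, f0_model):
    """Mean squared error between two F0 contours (in Hz) in the ln domain.

    Frames where either value is not positive are skipped, on the
    assumption that they mark unvoiced frames.
    """
    pairs = [(o, m) for o, m in zip(f0_observed, f0_model) if o > 0 and m > 0]
    if not pairs:
        raise ValueError("no voiced frames to compare")
    return sum((math.log(o) - math.log(m)) ** 2 for o, m in pairs) / len(pairs)
```

A lower value means that the contour regenerated from the extracted Fujisaki parameters is closer to the original contour.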

Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)

Hirose Lab: auto-extraction (Narusawa 2002, 2003)

Residual contour: target of phrase components

Original F0 contour

Derivative: target of phrase components

Decision of phrase commands

The optimum I can be selected when c(I) is maximum.

Dynamic Programming (DP)

Residual contour

Hirose Lab: compensation from text analysis to aid auto-extraction

Using parsed text to adjust the extracted Fujisaki parameters

Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003)

• Original method

– An accent component should be located on a phrase component.

• New method

– Pause is considered.

– Correction after using information from parsed text.

Auto-extraction of phrase components—Comparison of 2 labs

• Phrase components

– Phonetics Lab, IL, AS (modified Mixdorff 2003):

Pre-extraction of phrase components: relatively close.

– Hirose Lab:

Pre-extraction: not as close, but the final output can be compensated by text analysis.

1. Auto-extract from the F0 contour of the acoustic signal

2. Compensate the phrase component with parsed text; unit used: bunsetsu (lexical definition)

Manual adjustment--Gu, CUHK

• Note:

1. Insertion of phrase components is subjective.

2. Boundary identification is NOT explicitly specified; it relies on perception (duration? or F0 reset?)

Possible Future Considerations (1/2)

• 1. Is the distinguishing acoustic feature a single cue: pause, duration, or F0?

• 2. Or is it a combination of acoustic features: pause, duration, and/or F0?

– E.g., test whether duration can compensate for F0 reset

Possible Future Considerations (2/2): Improving auto-extraction of tone components

• 3. The concept of tone nucleus

– By retaining only the nucleus of the syllable while ignoring vertical F0 variation (from Hirose’s tone nucleus and Gu’s manual adjustment)

– By ignoring horizontal F0 variation (from Gu’s manual adjustment)

One major ambiguity among the 3 labs: phrase component unit selection

1. Phonetics Lab, Academia Sinica, Taiwan – Mandarin prosodic phrase (intonation and phrase)

2. Hirose Lab, Tokyo University, Japan – Japanese lexical word (bunsetsu)

3. DSP and Speech Technology Lab, CUHK, Hong Kong – Manually selected:

PPh—adjusted from visual display

PW—adjusted from perceptual decision

Why can prosodic unit selection be a problem unique to Mandarin?

Japanese: Bunsetsu--compound word consisting of two or more content words

Mandarin:

1. Phonetics Lab, IL, AS: a prosodic phrase is sometimes too long to be covered by a single application of the phrase component function.

2. CUHK: manual adjustment can be accurate but is not systematic enough, e.g., a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.

Concluding Remarks

• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.

• 2. What possible improvements can auto-extraction borrow from manual adjustment?

– Focusing on the nucleus (syllable)

– Understanding more about acoustic properties (F0, duration…)

• 3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.

– Linguistic information: parsing (text analysis and syntax), semantics and pragmatics

– Cognitive information: speech planning and processing
