On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody
Zhao-yu Su
Phonetics Lab, Institute of Linguistics, Academia Sinica
Applying the Fujisaki model to Mandarin
– 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/)
PI: Prof. Chiu-yu Tseng
• Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)
– 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/)
PI: Prof. Keikichi Hirose
• Mandarin: manual extraction of Fujisaki parameters
• Japanese: automatic extraction of Fujisaki parameters
– 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/)
PI: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William
• Mandarin: manual extraction of Fujisaki parameters
Outline
• Introduction: the Fujisaki model
• Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters
1. Phonetics Lab, Academia Sinica, Taiwan, on Mandarin (Tseng 2004, 2005, 2006)
2. Hirose Lab, Tokyo University, Japan, on Japanese (Hirose and Narusawa 2002, 2003)
• Manual extraction: method used at CUHK to extract Fujisaki parameters
– DSP and Speech Technology Lab, on Mandarin (Wentao Gu 2004, 2005)
The Fujisaki Model (Fujisaki & Hirose 1984)
log(F0) = log(base frequency) + phrase components + accent components
[Figure: phrase components + accent components = superposed model]
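The superposition on this slide can be written out as a short sketch. It assumes the usual formulation of the model: a phrase component is the impulse response Gp(t) = α²·t·exp(−αt), and an accent component is the difference of two step responses Ga(t) = min[1 − (1 + βt)·exp(−βt), γ]. The constants α = 3.0/s, β = 20.0/s and γ = 0.9 are commonly quoted defaults and are assumptions here, not values taken from the slides. All function names are illustrative.

```python
import math

ALPHA = 3.0   # phrase control constant in 1/s (assumed typical value)
BETA = 20.0   # accent control constant in 1/s (assumed typical value)
GAMMA = 0.9   # ceiling of the accent response (assumed typical value)


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism, Gp(t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, Ga(t), clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset_time_s, magnitude)
    accent_cmds: list of (onset_time_s, offset_time_s, amplitude)
    """
    phrase = sum(ap * phrase_response(t - t0) for t0, ap in phrase_cmds)
    accent = sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                 for t1, t2, aa in accent_cmds)
    return math.log(fb) + phrase + accent


def synthesize(times, fb, phrase_cmds, accent_cmds):
    """Return the modelled F0 in Hz at each time in `times`."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds)) for t in times]
```

For example, `synthesize([0.01 * k for k in range(200)], 100.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])` gives a two-second contour with one phrase command and one accent command on a 100 Hz base frequency.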
Auto-extraction based on Mixdorff’s method (2000, 2003)
[Figure: the original F0 contour is passed through a highpass filter (stop frequency at 0.5 Hz), separating it into a high-frequency contour (HFC) and a low-frequency contour (LFC)]
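The HFC/LFC separation can be illustrated with a minimal sketch. It assumes a log-F0 contour that has already been interpolated across unvoiced gaps and sampled at a fixed frame rate. The filter here is a single-pole low-pass run forwards and backwards, used only as a simple stand-in; it is not the filter design used in Mixdorff's method. The HFC is taken as whatever the low-pass removes. Function names and the 100 Hz frame rate are assumptions.

```python
import math


def one_pole_lowpass(x, cutoff_hz, frame_rate_hz):
    """Single-pole IIR low-pass (a stand-in, not the filter on the slide)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    y, state = [], x[0]
    for v in x:
        state = a * state + (1.0 - a) * v
        y.append(state)
    return y


def split_contour(log_f0, cutoff_hz=0.5, frame_rate_hz=100.0):
    """Split an interpolated log-F0 contour into (LFC, HFC)."""
    forward = one_pole_lowpass(log_f0, cutoff_hz, frame_rate_hz)
    # Run the same filter backwards to cancel the phase delay.
    lfc = one_pole_lowpass(forward[::-1], cutoff_hz, frame_rate_hz)[::-1]
    # The high-frequency contour is the part the low-pass removed.
    hfc = [v - low for v, low in zip(log_f0, lfc)]
    return lfc, hfc
```

By construction the two contours sum back to the input, so accent-related detail (HFC) and phrase-related trend (LFC) can be analysed separately without losing information.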
Decision of phrase commands
• Low-frequency contour (LFC) from Mixdorff’s method: position of local minimum; optimization
• Method based on perceptual labels (Phonetics Lab, Academia Sinica, Taiwan): perceptual phrase boundary
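A minimal sketch of the local-minimum step, assuming the LFC is available as a list of frame values. It only lists candidate positions; the optimization that follows, and the alternative of taking positions from perceptual boundary labels, are not shown. The function name is illustrative.

```python
def local_minima(contour):
    """Indices where the contour is strictly below both neighbours.

    Flat valleys (runs of equal values) are not reported by this simple test.
    """
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] > contour[i] < contour[i + 1]]
```

With a 10 ms frame shift, index `i` corresponds to time `0.01 * i` seconds.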
Evaluation: $\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\bigl(\ln F_0(t) - \ln \hat{F}_0(t)\bigr)^2$
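The evaluation criterion on this slide is a mean squared error on ln F0. A minimal sketch, assuming the observed and modelled contours are given frame by frame in Hz and that unvoiced frames are marked with 0 and skipped (both are assumptions, as is the function name):

```python
import math


def log_f0_mse(observed_hz, modelled_hz):
    """Mean squared error between two F0 contours in the ln domain."""
    # Keep only frames where both contours have a positive F0 value.
    pairs = [(o, m) for o, m in zip(observed_hz, modelled_hz) if o > 0 and m > 0]
    if not pairs:
        raise ValueError("no voiced frames to compare")
    return sum((math.log(o) - math.log(m)) ** 2 for o, m in pairs) / len(pairs)
```

Working in the ln domain makes the error depend on frequency ratios rather than absolute differences in Hz, which matches how the model itself is defined.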
Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)
Hirose Lab: auto-extraction (Narusawa 2002, 2003)
[Figure: original F0 contour; derivative (target of phrase components); residual contour (target of phrase components)]
Decision of phrase commands
• Dynamic programming (DP) on the residual contour
• The optimum I is the one for which c(I) is maximum.
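The slides do not define c(I), so the following is only an illustration of "pick the I that maximizes c(I)". It assumes, hypothetically, that I is a candidate onset frame and that c(I) is the normalized correlation between the residual contour and a phrase response starting at that frame. It is a plain arg-max over candidates, not the dynamic-programming search named on the slide. All names, the 10 ms frame shift and α = 3.0/s are assumptions.

```python
import math


def phrase_template(n_frames, frame_s=0.01, alpha=3.0):
    """Sampled phrase response alpha^2 * t * exp(-alpha * t)."""
    return [alpha * alpha * (k * frame_s) * math.exp(-alpha * k * frame_s)
            for k in range(n_frames)]


def score(residual, onset, template):
    """Hypothetical c(I): normalized correlation at a candidate onset frame."""
    seg = residual[onset:onset + len(template)]
    tpl = template[:len(seg)]
    num = sum(r * g for r, g in zip(seg, tpl))
    den = math.sqrt(sum(r * r for r in seg) * sum(g * g for g in tpl))
    return num / den if den > 0 else 0.0


def best_onset(residual, candidates, template):
    """Return the candidate onset frame with the highest score."""
    return max(candidates, key=lambda i: score(residual, i, template))
```

A real extractor would also have to estimate the command magnitude and handle overlapping phrase components, which this sketch leaves out.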
Hirose Lab: compensation from text analysis to aid auto-extraction
• Using parsed text to adjust the extracted Fujisaki parameters
Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003)
• Original method
– An accent component should be located on a phrase component.
• New method
– Pause is considered.
– Correction after using information from parsed text.
Auto-extraction of phrase components: comparison of 2 labs
• Phrase components
– Phonetics Lab, IL, AS (modified Mixdorff 2003):
Pre-extraction of phrase components: relatively close.
– Hirose Lab:
Pre-extraction: not as close, but the final output can be compensated by text analysis.
1. Auto-extract from the acoustic signal (F0 contour)
2. Compensate the phrase component with parsed text. Unit used: bunsetsu (lexical definition)
Manual adjustment--Gu, CUHK
• Note:
1. Insertion of phrase components is subjective.
2. Boundary identification is NOT explicitly specified; it relies on perception (duration? or F0 reset?)
Possible Future Considerations (1/2)
• 1. Is the distinguishing acoustic feature only one of pause, duration, or F0?
• 2. Or is it a combination of acoustic features: pause, duration, and/or F0?
– E.g. test whether duration can compensate for F0 reset
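To test questions like these, the three cues first have to be measured at each candidate boundary. A minimal sketch, assuming syllable-level records with start time, end time and a representative F0 value; the `Syllable` type, the field names and the choice of semitones for F0 reset are all assumptions for illustration, not measures defined in the slides:

```python
import math
from dataclasses import dataclass


@dataclass
class Syllable:
    start: float   # seconds
    end: float     # seconds
    f0_hz: float   # representative F0 of the syllable


def boundary_cues(syllables):
    """Pause, pre-boundary duration and F0 reset at each syllable boundary."""
    cues = []
    for left, right in zip(syllables, syllables[1:]):
        cues.append({
            "pause_s": max(0.0, right.start - left.end),
            "pre_boundary_duration_s": left.end - left.start,
            "f0_reset_semitones": 12.0 * math.log2(right.f0_hz / left.f0_hz),
        })
    return cues
```

With such a table, one could compare boundaries that listeners perceive as phrase breaks against those they do not, and check which cue or combination of cues separates the two groups.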
Possible Future Considerations (2/2): improving auto-extraction of tone components
• 3. The concept of tone nucleus
– By retaining only the nucleus of the syllable while ignoring vertical F0 variation (from Hirose’s tone nucleus and Gu’s manual adjustment)
– By ignoring horizontal F0 variation (from Gu’s manual adjustment)
One major ambiguity among 3 labs: phrase component unit selection
1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)
2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)
3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected
– PPh: adjusted from visual display
– PW: adjusted from perceptual decision
Why can prosodic unit selection be a problem unique to Mandarin?
Japanese: bunsetsu, a compound word consisting of two or more content words
Mandarin:
1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to be covered by a single application of the phrase component function.
2. CUHK: manual adjustment can be accurate but is not systematic enough, e.g. a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.
Concluding Remarks
• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.
• 2. What possible improvements can auto-extraction borrow from manual adjustment?
– Focusing on the nucleus (syllable)
– Understanding more of the acoustic properties (F0, duration…)
• 3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.
– Linguistic information: parsing (text analysis and syntax), semantics and pragmatics
– Cognitive information: speech planning and processing