
Syllable triangles, syllable centers, articulatory syllable durations, shadow angles, oh my!

Donna Erickson

Kanazawa Medical University, Japan

Haskins Laboratories, USA

[email protected]

Thanks to Osamu Fujimura and J.C. Williams, & my colleagues Jangwon Kim, Sungbok Lee, Shigeto Kawahara, Caroline Menezes, Atsuo Suemitsu, Jeff Moore, Yoshiho Shibuya, & many others

C/D model: what does it model?

• The C/D model models how phonological structures are mapped onto articulatory gestures (Fujimura, 2000; see also www.cdmodel.wordpress.com).

• PROSODY is the skeletal base.

• Strings of spoken syllables are represented as “syllable pulse trains”
– Each syllable is represented as one pulse.
– The size of each syllable pulse is determined by its “syllable magnitude”.

• “syllable-boundary pulse train--computed as a time function representing the skeletal rhythmic structure of the utterance.”

From Fujimura & Erickson, 2004

• Syllable magnitude correlates with sentence (phrasal) stress.

• “won” receives primary sentential stress; “that” and “ful” receive the secondary sentential stress.

Syllable magnitude

• Syllable magnitude is, to a first approximation, how much the jaw opens (jaw displacement from the occlusal plane) for each syllable.

• For a string of syllables, we see different amounts of jaw opening, which reflect (I argue) the metrical organization of an utterance (see e.g., Erickson et al., 2012).
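The idea of reading syllable magnitude off the jaw trace can be sketched as follows. This is a minimal illustration, assuming the jaw trace is already expressed as displacement from the occlusal plane (larger = more open) and that syllable spans are known; the function name, the sign convention, and all numbers are hypothetical, not taken from the cited studies.

```python
def syllable_magnitudes(jaw_mm, syllable_spans):
    """Return the peak jaw opening (mm from the occlusal plane) per syllable.

    jaw_mm: jaw displacement samples, larger = more open (assumed convention).
    syllable_spans: (start, end) sample indices per syllable, end exclusive.
    """
    return [max(jaw_mm[start:end]) for start, end in syllable_spans]


# Hypothetical trace covering three syllables (values invented for illustration).
jaw = [0.0, 2.0, 5.0, 3.0, 1.0, 4.0, 9.0, 6.0, 2.0, 3.0, 4.0, 1.0]
spans = [(0, 4), (4, 8), (8, 12)]
pulses = syllable_magnitudes(jaw, spans)
print(pulses)  # [5.0, 9.0, 4.0]
```

Each value plays the role of one pulse height in the syllable pulse train; the middle syllable here would be the most prominent.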

From Erickson et al. 2014

Jaw displacement for each syllable, measured from the occlusal plane (marked with arrows)

English

From Erickson et al. 2014

From Williams et al. 2013

Vowel Normalization

• Once we "wash away" the vowel quality effects, utterances with the same metrical structure, regardless of vowel content, show similar patterns of syllable pulse trains (Erickson and Menezes 2013).

Review so far

• The C/D Model posits the pulse train as the fundamental organization of utterances: in speech planning we start with the rhythm represented by the pulse train.

• Its rhythmic structure is partly represented by different heights of syllable pulses

• In actual utterances, we do observe different amounts of jaw displacement, which reflect those syllable pulses.

• Moreover, patterns of jaw displacement observed in other languages also reflect the metrical structure of that language, e.g., Japanese (Kawahara et al. 2014) and Chinese (Erickson et al. 2015).

• Both Japanese and Chinese appear to have phrase-initial and phrase-final stress (which differs from the stress patterns of English).

• The jaw displacement patterns of the first language may be carried over into those of the second language.

French?

Predictions:

• French speakers have difficulty distinguishing local stress in English.

• French has final stress.

• French probably has large final jaw opening.

• French speakers may be similar to Japanese speakers.

The jaw displacement patterns of the first language may be carried over into those of the second language. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.1196&rank=6

Phrase boundaries

• The C/D model also has the power to algorithmically derive phrase boundaries in a spoken utterance from jaw movement patterns.

• No other model can do this.

• Based on the combination of the height of the syllable pulse (amount of jaw displacement) and the average maximum speed of the onset and offset crucial articulators of the syllable, the model calculates
– (a) where the phrase boundary occurs and
– (b) how big this boundary is (e.g., Fujimura 1986; Bonaventura & Fujimura 2007; Menezes 2004; Kim et al. 2014).
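A rough sketch of how boundaries could fall out of pulse heights. It assumes (this is not stated on the slides) that each syllable is drawn as an isosceles triangle whose apex height is the jaw displacement, that all triangles share one side slope (the tangent of the shadow angle in the plot's units), and that a gap between neighbouring triangle bases is read as a boundary whose width gives its size. Here the slope is simply passed in as a parameter rather than derived from articulator speed; every name and number is hypothetical.

```python
def syllable_triangles(centers_s, heights_mm, shadow_slope_mm_per_s):
    """Return (left_edge_s, right_edge_s) of each syllable triangle's base.

    The half-base is height / slope, so taller pulses get longer bases
    (an assumption of this sketch).
    """
    half_bases = [h / shadow_slope_mm_per_s for h in heights_mm]
    return [(c - b, c + b) for c, b in zip(centers_s, half_bases)]


def boundary_gaps(triangles):
    """Gap in seconds between adjacent triangle bases (0.0 if they touch or overlap)."""
    return [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(triangles, triangles[1:])]


# Hypothetical utterance: three syllables, the middle one most prominent.
centers = [0.1, 0.4, 0.9]      # syllable centers, seconds
heights = [10.0, 20.0, 10.0]   # jaw displacement, mm
triangles = syllable_triangles(centers, heights, 100.0)
gaps = boundary_gaps(triangles)
print([round(g, 3) for g in gaps])
```

In this toy case the first two triangles abut (no boundary) and a gap of about 0.2 s opens before the third syllable, which would be read as a phrase boundary.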

Syllable triangles, syllable centers, articulatory syllable durations, shadow angles, oh my!

• If you concur with the premise that the jaw opens for a syllable, then the rest is just a matter of “computation.”

Pam said BAT that fat cat at that mat

From Erickson et al. 2014

Jaw displacement for each syllable, measured from the occlusal plane

There once was a girl from De ca tur

Syllable triangles, syllable centers, articulatory syllable durations, phrase boundaries, shadow angles

Pam said BAT that fat cat at that mat

Crucial articulators & “icebergs”

• A syllable consists of a nucleus (vowel) and onset and coda elements.

• For the sentence Pam said bat that fat cat at that mat, the crucial articulators are lower lip (for p, m, b, f), tongue tip (for s, d, t, th) and tongue dorsum (for k).

• Fujimura (1986) observed that when one overlays the demisyllabic velocity time functions of the crucial articulator, there is a point of smallest mean invariance.

• He referred to this as the “iceberg” region, which is the average maximum velocity of all the repetitions of a single utterance type.

• The iceberg point (Bonaventura 2003; Menezes 2003; Bonaventura & Fujimura 2007) is algorithmically determined at the minimum variance point of a number of trajectories of the same demisyllable.

• One approach is to find the point of the minimum root-mean-squared error in the horizontal direction after optimal time shifting of the trajectories to the reference trajectory (Fujimura 1986; 1994; Bonaventura & Fujimura 2007).

• Another approach is to choose the point of the minimum “iceberg metric” among multiple vertical movement bands of the crucial articulator (Menezes, 2003).

• The iceberg metric is proportional to the variance of articulatory speed and inversely proportional to the mean of articulatory speed in the band.

• However, these methods require a large number of trajectory samples to be reliable.
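The band-selection idea can be sketched as below. This is only an illustration under stated assumptions: the metric is taken as variance divided by mean (any constant of proportionality is dropped), each band holds one speed value per repetition, and the band labels and numbers are invented; it is not the implementation used in Menezes (2003).

```python
from statistics import mean, pvariance


def iceberg_metric(speeds):
    """Variance of speed divided by mean speed for one vertical band."""
    return pvariance(speeds) / mean(speeds)


def pick_iceberg_band(speeds_by_band):
    """Return the label of the band with the smallest iceberg metric."""
    return min(speeds_by_band, key=lambda band: iceberg_metric(speeds_by_band[band]))


# Hypothetical speeds (mm/s) of the crucial articulator across four repetitions,
# sampled as the trajectory crosses three vertical bands.
bands = {
    "band_a": [10.0, 14.0, 6.0, 10.0],
    "band_b": [20.0, 21.0, 19.0, 20.0],
    "band_c": [12.0, 18.0, 9.0, 9.0],
}
best = pick_iceberg_band(bands)
print(best)  # band_b
```

The chosen band is the one where movement is both fast and consistent across repetitions, which is why the metric rewards high mean speed and penalizes variance.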

• An alternative approach for determining the smallest mean invariance is to use the maximum speed point of the crucial articulators for the onset or coda of each demisyllable (e.g., Erickson 2010; Erickson et al. 2014; Erickson et al. submitted; Kim et al. 2014).

From Kim et al. 2014

• In this way, the center of the syllable is calculated as the midpoint between the maximum speed points of the onset and coda crucial articulators; quotation marks indicate this is an alternative approach for determining the “iceberg” point.

From Kim et al. 2014
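The midpoint calculation can be sketched as follows. This is a minimal illustration, assuming position traces sampled at known times for the onset and coda crucial articulators and a simple finite-difference speed estimate; function names and data are hypothetical, and the cited studies may estimate velocity differently.

```python
def max_speed_time(times_ms, positions_mm):
    """Return the time (ms) at which finite-difference speed is largest.

    The time is reported at the midpoint of the fastest sample interval.
    """
    best_time, best_speed = None, -1.0
    for i in range(1, len(times_ms)):
        dt = times_ms[i] - times_ms[i - 1]
        speed = abs(positions_mm[i] - positions_mm[i - 1]) / dt
        if speed > best_speed:
            best_speed = speed
            best_time = (times_ms[i] + times_ms[i - 1]) / 2
    return best_time


def syllable_center(onset_time_ms, coda_time_ms):
    """Midpoint between the onset and coda maximum-speed times."""
    return (onset_time_ms + coda_time_ms) / 2


# Hypothetical traces: the onset articulator releases, the coda articulator closes.
onset_t = max_speed_time([0, 10, 20, 30, 40], [0.0, 1.0, 4.0, 5.0, 5.0])
coda_t = max_speed_time([200, 210, 220, 230, 240], [5.0, 5.0, 2.0, 0.0, 0.0])
center = syllable_center(onset_t, coda_t)
print(onset_t, coda_t, center)  # 15.0 215.0 115.0
```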

Syllable triangle construction

Pam said BAT that fat cat at that mat

So???

• Test the model

• Invariance of articulatory excursion and speed of crucial articulators?

• Do “shadow angles” change as a function of emotion or contrastive emphasis?

• How do “consonants” work?

• How do IRF’s change as a function of emotion and contrastive emphasis?

• Articulatory phrase boundaries & perceived boundaries?

Invariance of articulatory excursion and speed of crucial articulators?

[Figure: panels for bat, that, fat, cat, with CV and VC demisyllables; red is emphasized. Panel correlations, in extraction order: R=0.48, R=0.04, R=0.89, R=0.59, R=0.80, R=0.87, R=0.84, R=0.95]

Do “shadow angles” change as a function of emotion or contrastive emphasis?

Error bar plots for shadow angles

Emotion (from Kim et al. 2014)

Contrastive Emphasis (from Kim et al., in progress)

How do “consonants” work? How do IRF’s change as a function of emotion and contrastive emphasis?

• Emotion affects amplification of IRFs & timing (Kim et al. 2014)

• Contrastive emphasis—still investigating.

From Kim et al. 2014

Articulatory phrase boundaries & perceived boundaries?

• Perception tests using Rapid Prosodic Transcription (e.g., Cole et al., 2008).

• Tasks (www.gengojeff.com):
– 1. Where do you hear a boundary?
– 2. Which words are prominent?

Boundary perception

Prominence perception

Articulatory phrase boundaries & perceived boundaries

                               Articulatory Prominence    Articulatory Boundaries
A03   Perceptual Prominence    r=0.60 (p<.001)            r=0.36 (p<.001)
A03   Perceptual Boundaries    r=0.43 (p<.001)            r=0.28 (p<.001)
A05   Perceptual Prominence    r=0.68 (p<0.001)           r=0.41 (p<0.001)
A05   Perceptual Boundaries    r=0.18 (p<0.05)            r=-0.18 (n.s.)
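For readers who want to see how an r value of this kind is obtained, here is a minimal sketch of a Pearson correlation between an articulatory measure and a perceptual score per word. The data are invented for illustration (they are not the A03 or A05 measurements), and the variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-word values: jaw displacement (mm) and the proportion of
# listeners who marked the word as prominent.
jaw_displacement = [5.0, 9.0, 4.0, 7.0, 6.0]
prominence_score = [0.2, 0.8, 0.1, 0.5, 0.6]
r = pearson_r(jaw_displacement, prominence_score)
print(round(r, 2))
```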

Summary

• 1. C/D model accounts for utterance prominence

• 2. C/D model accounts for phrase boundaries

• 3. more work is waiting to be done
– a. about shadow angles
– b. IRF’s
– c. etc.

• 4. see www.cdmodel.wordpress.com for more discussions about C/D model

Acknowledgements

• Thanks to Osamu Fujimura and J.C. Williams, & my colleagues Jangwon Kim, Sungbok Lee, Shigeto Kawahara, Caroline Menezes, Atsuo Suemitsu, Jeff Moore, Yoshiho Shibuya, & many others

• This work was supported by NSF IIS-1116076, NIH DC007124, and Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research (C) #22520412 and (C) #2537044.

References

• Bonaventura, P. 2003. Invariant patterns in articulatory movements. Ph.D. dissertation, The Ohio State University.

• Bonaventura, P., Fujimura, O. 2007. Articulatory movements and prosodic boundaries. In: Beddor, P., Ohala, J., Solé, M. (eds.), Experimental Approaches to Phonology, Oxford: Oxford University Press, 209-227.

• Cole, J., Goldstein, L., Katsika, A., Mo, Y., Nava, E., Tiede, M. 2008. Perceived prosody: Phonetic bases of prominence and boundaries. J. Acoust. Soc. Am. 124, 2496.

• Erickson, D. 1998. Effects of contrastive emphasis on jaw opening. Phonetica 55, 147-169.

• Erickson, D. 2002. Articulation of extreme formant patterns for emphasized vowels. Phonetica 59, 134-149.

• Erickson, D. 2010. More about jaw, rhythm and metrical structure. Acoustical Society of Japan Fall Meeting, p. 103.

• Erickson, D., Suemitsu, A., Shibuya, Y., Tiede, M. 2012. Metrical structure and production of English rhythm. Phonetica 69, 180-190.

• Erickson, D., Kawahara, S., Moore, J., Menezes, C., Suemitsu, A., Kim, J., Shibuya, Y. 2014. Calculating articulatory syllable duration and phrase boundaries. ISSP2014 (Cologne, Germany, May 2014), 102-105.

• Erickson, D., Kim, J., Kawahara, S., Wilson, I., Menezes, C., Suemitsu, A. submitted. Bridging articulation and perception: The C/D model and contrastive emphasis. ICPhS 2015.

• Erickson, D., Iwata, R., Moore, J., Suemitsu, A., Shibuya, Y. 2015. The jaw keeps the beat: Speech rhythm in English, Japanese and Mandarin. Lexicon Festa-3, Feb. 1, 2015. NINJAL, Tokyo, Japan.

• Fujimura, O. 1986. Relative invariance of articulatory movements: An iceberg model. In: Perkell, J., Klatt, D. H. (eds.), Invariance and Variability in Speech Processes, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., 226-242.

• Fujimura, O. 1994. C/D model: A computational model of phonetic implementation. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 17, 1-20.

• Fujimura, O. 2000. The C/D model and prosodic control of articulatory behavior. Phonetica 57, 128-138.

• Gabor P., Shinobu, M., Kazuhito Y. 2014. Boundary and Prominence Perception by Japanese Learners of English: A preliminary study. Journal of Phonetic Society of Japan 17, 59-66.

• Harrington, J., Fletcher, J., Beckman, M.E. 2000. Manner and place conflicts in the articulation of accent in Australian English. In: Broe, M., Pierrehumbert, J. (eds.), Papers in Lab. Phonology V: Language Acquisition and the Lexicon. Cambridge: Cambridge University Press, 40-51.

• http://gengojeff.netau.net/pam/

• Jong, K. de. 1995. The supraglottal articulation of prominence in English: linguistic stress as localized hyperarticulation. J. Acoust. Soc. Am. 97, 491-504.

• Kawahara, S., Erickson, D., Moore, J., Suemitsu, A., Shibuya, Y. 2014. Jaw displacement and metrical structure in Japanese: The effect of pitch accent, foot structure, and phrasal stress. Journal of Phonetic Society of Japan, 77-87.

• Kim, J., Erickson, D., Lee, S., Narayanan, S. 2014. A study of invariant properties and variation patterns in the converter/distributor model for emotional speech. Interspeech 2014, 413-417.

• Macchi, M. 1985. Segmental and suprasegmental features and lip and jaw articulations. Doct. diss., New York University (unpublished).

• Menezes, C. 2003. Rhythmic pattern of American English: An articulatory and acoustic study. Ph.D. dissertation, The Ohio State University.

• Menezes, C. 2004. Changes in phrasing in semi-spontaneous emotional speech: Articulatory evidences. J. Phonetic Soc. Japan 8, 45-59.

• Menezes, C., Erickson, D., McGory, J., Pardo, B., Fujimura, O. 2002. An articulatory and perceptual study of phrasing. Temporal Integration in the Perception of Speech. ISCA Workshop (Aix-en-Provence, April 8-10), 43.

• Menezes, C., Pardo, B., Erickson, D., Fujimura, O. 2003. Changes in syllable magnitude and timing due to repeated corrections. Speech Communication 40, 71-8.

• Summers, W. V. Effects of stress and final consonant voicing on vowel production: articulatory and acoustic analyses. J. Acoust. Soc. Am. 82, 847-863.

• Westbury, J., Fujimura, O. 1989. An articulatory characterization of contrastive emphasis. J. Acoust. Soc. Am. 85, S98.