Time Frames of
Spoken Language
Steven Greenberg
International Computer Science Institute
1947 Center Street, Berkeley, CA 94704
http://www.icsi.berkeley.edu/[email protected]
In Collaboration with Hannah Carvey, Leah Hitchcock and Shawn Chang
Acknowledgements and Thanks
Statistical Analysis and Automatic Classification: Hannah Carvey, Shawn Chang, Leah Hitchcock
Research Funding: U.S. National Science Foundation, U.S. Department of Defense
For Further Information
Consult the web site:
www.icsi.berkeley.edu/~steveng
OVERTURE
The Central Challenge for Models of Speech Recognition
Language - The Traditional Perspective
The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization
Cat = [k] + [ae] + [t]
Cat = /k/ + /ae/ + /t/
The Serial Frame Perspective on Speech
Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal (provided courtesy of the auditory system) computed for each interval (frame) of speech
(this is literally how automatic speech recognition systems decode the speech signal)
Challenge Number One
Pronunciation Variability
Pronunciation Variability of Real Speech
Pronunciation patterns encountered in everyday life are extremely diverse
There are literally dozens of ways in which common words are pronounced
(as the following two slides illustrate for the word “and”, based on manual phonetic annotation of a corpus comprising telephone dialogues)
How Many Pronunciations of “and”?
N    Pronunciation
82   ae n
63   eh n
45   ix n
35   ax n
34   en
30   n
20   ae n dcl d
17   ih n
17   q ae n
11   ae n d
7    q eh n
7    ae nx
6    ae ae n
6    ah n
5    eh nx
4    uh n
4    ix nx
4    q ae n dcl d
3    eh n d
3    q ae nx
3    eh
2 each: ae n dcl; ae; ax m; ax n d; ae eh n dcl d; eh n dcl d; ax nx; q ae ae n; q ix n; ix n dcl d; ih; eh eh n; q eh nx; ix d n
1 each: eh m; ax n dcl d; aw n; ae q; eh dcl; ah nx; ae n t; eh d; ah n dcl d; ey ih n dcl; ae ix n; ae nx ax; ax ng; ay n; ih ah n d; ae hh; ih ng; ix; ae n d dcl; ix dcl d; ae eh n; hh n; ix n t; ae ax n dcl d; iy eh n; m; ae ae n d; nx; q ae ae n; q ae ae n dcl d; q ae eh n dcl d; q ae ih n; aa n; q ae n d; ? nx; q ae n q; eh n m; q eh en dcl; eh ng; q eh n q; em; q eh ow m; q ih n; q ix en; er
[Slide label: Canonical pronunciation]
Pronunciation Variability of Real Speech
There are literally dozens of ways in which common words are pronounced
The following slide illustrates this for the 20 most frequent words from the same corpus (Switchboard)
How Many Different Pronunciations?
(N = number of tokens; #Pron = number of distinct pronunciations; MCP% = share of tokens with the most common pronunciation)
Rank  Word     N    #Pron  MCP%  Most Common Pronunciation
1     I        649  53     53    ay
2     and      521  87     16    ae n
3     the      475  76     27    dh ax
4     you      406  68     20    y ix
5     that     328  117    11    dh ae
6     a        319  28     64    ax
7     to       288  66     14    tcl t uw
8     know     249  34     56    n ow
9     of       242  44     21    ax v
10    it       240  49     22    ih
11    yeah     203  48     43    y ae
12    in       178  22     45    ih n
13    they     152  28     60    dh ey
14    do       131  30     54    dcl d uw
15    so       130  14     74    s ow
16    but      123  45     12    bcl b ah tcl t
17    is       120  24     50    ih z
18    like     119  19     46    l ay kcl k
19    have     116  22     54    hh ae v
20    was      111  24     23    w ah z
The 20 most frequent words account for 35% of the tokens
Ranks 21 to 100 from the same tabulation:
Rank  Word     N    #Pron  MCP%  Most Common Pronunciation
21    we       108  13     83    w iy
22    it's     101  14     20    ih tcl s
23    just     101  34     17    jh ix s
24    on       98   18     49    aa n
25    or       94   23     36    er
26    not      92   24     24    m aa q
27    think    92   23     32    th ih ng kcl k
28    for      87   19     46    f er
29    well     84   49     23    w eh l
30    what     82   40     14    w ah dx
31    about    77   46     12    ax bcl b aw
32    all      74   27     24    ao l
33    that's   74   19     16    dh eh s
34    oh       74   17     61    ow
35    really   71   25     45    r ih l iy
36    one      69   8      78    w ah n
37    are      68   19     42    er
38    I'm      67   9      26    q aa m
39    right    61   21     28    r ay
40    uh       60   16     41    ah
41    them     60   18     23    ax m
42    at       59   3 6 8 [digit grouping unclear]  ae dx
43    there    58   28     22    dh eh r
44    my       58   9      66    m ay
45    mean     56   10     58    m iy n
46    don't    56   21     14    dx ow
47    no       55   8      77    n ow
48    with     55   20     35    w ih th
49    if       55   18     41    ih f
50    when     54   18     31    w eh n
51    can      54   28     15    kcl k ae n
52    then     51   19     38    dh eh n
53    be       50   11     76    bcl b iy
54    as       49   16     18    ae z
55    out      47   19     22    ae dx
56    kind     47   17     21    kcl k ax nx
57    because  46   31     15    kcl k ax z
58    people   45   21     44    pcl p iy pcl l el
59    go       45   5      83    gcl g ow
60    got      45   32     15    gcl g aa
61    this     44   11     47    dh ih s
62    some     43   4      48    s ah m
63    would    41   16     29    w ih dcl
64    things   41   15     52    th ih ng z
65    now      39   11     69    n aw
66    lot      39   9      47    l aa dx
67    had      39   19     24    hh ae dcl
68    how      39   11     53    hh aw
69    good     38   13     27    gcl g uh dcl
70    get      38   20     13    gcl g eh dx
71    see      37   6      80    s iy
72    from     36   10     28    f r ah m
73    he       36   7      39    iy
74    me       35   5      87    m iy
75    don't    35   21     14    dx ow
76    their    33   19     25    dh eh r
77    more     32   11     56    m ao r
78    it's     31   14     20    ih tcl s
79    that's   31   20     16    dh eh s
80    too      31   6      60    tcl t uw
81    okay     31   17     45    ow kcl k ey
82    very     30   11     36    v eh r iy
83    up       30   11     34    ah pcl p
84    been     30   11     51    bcl b ih n
85    guess    29   8      42    gcl g eh s
86    time     29   8      62    tcl t ay m
87    going    29   21     13    gcl g ow ih ng
88    into     28   20     14    ih n tcl t uw
89    those    27   12     42    dh ow z
90    here     27   11     25    hh iy er
91    did      27   13     23    dcl d ih dx
92    work     25   8      66    w er kcl k
93    other    25   14     26    ah dh er
94    an       25   12     28    ax n
95    I've     25   7      46    ay v
96    thing    24   9      52    th ih ng
97    even     24   7      40    iy v ix n
98    our      23   9      33    aa r
99    any      23   11     23    ix n iy
100   we're    23   8      25    w ey r
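As a rough illustration of how the columns in such a tabulation (N, number of distinct pronunciations, most common pronunciation and its share) could be derived from token-level annotations, here is a minimal Python sketch. The function names `pronunciation_stats` and `coverage` and the toy token list are assumptions made for illustration; they are not taken from the Switchboard transcription tools.

```python
from collections import Counter, defaultdict

def pronunciation_stats(tokens):
    """tokens: iterable of (word, pronunciation) pairs, one per spoken token.
    Returns rows (rank, word, n, n_pron, most_common_pron, mcp_percent),
    ordered by descending token count."""
    by_word = defaultdict(Counter)
    for word, pron in tokens:
        by_word[word][pron] += 1
    rows = []
    for word, prons in by_word.items():
        n = sum(prons.values())
        mcp, mcp_count = prons.most_common(1)[0]
        rows.append((word, n, len(prons), mcp, round(100 * mcp_count / n)))
    rows.sort(key=lambda r: (-r[1], r[0]))  # most frequent word first
    return [(rank, *row) for rank, row in enumerate(rows, start=1)]

def coverage(rows, top_k, total_tokens):
    """Percentage of all tokens accounted for by the top_k most frequent words."""
    return 100 * sum(row[2] for row in rows[:top_k]) / total_tokens

# Hypothetical toy data (not real corpus counts)
toy = [("and", "ae n"), ("and", "eh n"), ("and", "ae n"), ("and", "ae n d"),
       ("I", "ay"), ("I", "ay"), ("the", "dh ax")]
rows = pronunciation_stats(toy)
for row in rows:
    print(row)
print(coverage(rows, 2, len(toy)))
```

The same `coverage` calculation, applied to the full corpus with `top_k=20`, is the kind of figure summarized by the statement that the 20 most frequent words account for 35% of the tokens.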
QUESTION
How do listeners decode the speech signal given the large amount of
pronunciation variation?
Challenge Number Two
Acoustic Variability
Effects of Reverberation on the Speech Signal
Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions
Yet, the intelligibility of speech is remarkably stable (unless the amount of reverberation or background noise is truly extreme)
How can this be so?
QUESTION
Is there some acoustic property that provides a basis for perceptual stability
of the speech signal?
An Invariant Property of the Speech Signal?
Low-frequency energy fluctuations of the pressure waveform are largely preserved under many acoustic-interference conditions
In reverberant environments the modulation spectrum’s peak is attenuated and shifted down to ca. 2 Hz (but is largely preserved)
(“What is the modulation spectrum?” you ask) – Let’s find out!
[based on an illustration by Hynek Hermansky]
Modulation Spectrum
Modulation Spectrum Computation
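As a rough illustration of what a modulation spectrum measures, the sketch below extracts a slow amplitude envelope from a signal and takes the spectrum of that envelope. This is a generic envelope-based procedure assumed for illustration; it is not necessarily the computation used for the figures in this presentation, and the function names, the 20-ms smoothing window and the synthetic test signal are all hypothetical choices.

```python
import cmath
import math

def amplitude_envelope(signal, sample_rate, window_s=0.02):
    """Full-wave rectify the signal, then average over non-overlapping windows.
    Returns the envelope and the envelope's own sampling rate in Hz."""
    hop = max(1, round(sample_rate * window_s))
    env = [sum(abs(x) for x in signal[i:i + hop]) / hop
           for i in range(0, len(signal) - hop + 1, hop)]
    return env, sample_rate / hop

def modulation_spectrum(env, env_rate):
    """Magnitude of the discrete Fourier transform of the mean-removed envelope.
    Returns a list of (modulation_frequency_hz, magnitude) pairs."""
    n = len(env)
    mean = sum(env) / n
    centered = [e - mean for e in env]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        spectrum.append((k * env_rate / n, abs(coeff) / n))
    return spectrum

# Synthetic stand-in for speech: a 500-Hz carrier whose amplitude
# fluctuates at 5 Hz (roughly a syllabic rate)
sample_rate = 4000
n_samples = 2 * sample_rate
signal = [(1 + 0.8 * math.sin(2 * math.pi * 5 * t / sample_rate))
          * math.sin(2 * math.pi * 500 * t / sample_rate)
          for t in range(n_samples)]

env, env_rate = amplitude_envelope(signal, sample_rate)
spectrum = modulation_spectrum(env, env_rate)
peak_hz = max(spectrum, key=lambda pair: pair[1])[0]
print(peak_hz)
```

For this synthetic signal the largest component of the envelope spectrum falls at the 5-Hz modulation rate rather than at the 500-Hz carrier, which is the sense in which the modulation spectrum describes slow energy fluctuations rather than fine spectral detail.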
Intelligibility and the Modulation Spectrum
Significant attenuation (or distortion) of the modulation spectrum results in an appreciable decline in the ability to understand spoken language
Why should this be so?
Greenberg and Arai (1998)
Anatomy of the Modulation Spectrum
Why is the modulation spectrum’s integrity so crucial for intelligibility?
What does it reflect linguistically?
Why is the bandwidth of the modulation spectrum associated with (intelligible) speech so broad?
Does the modulation spectrum reflect a unitary property of the speech signal?
Or something more complex?
Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth)
The Modulation Spectrum Reflects Syllables
The peak in the modulation spectrum (for speech) is ca. 5 Hz (a period of 200 ms)
The distribution associated with SYLLABLE DURATION is similar to the pattern of the MODULATION SPECTRUM, suggesting that the latter reflects SYLLABLES
[Figure: syllable duration (in terms of equivalent modulation frequency) plotted with the modulation spectrum]
Modulation spectrum of a short excerpt from the Switchboard Corpus
Syllable duration distribution associated with a 30-minute subset of Switchboard
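The correspondence between 5 Hz and 200 ms is simply the reciprocal relation between a period and a frequency. A minimal sketch of how syllable durations could be re-expressed as equivalent modulation frequencies and binned for comparison with a modulation spectrum follows; the function names, bin edges and durations are hypothetical and chosen only for illustration.

```python
def equivalent_modulation_hz(duration_ms):
    """Modulation frequency whose period equals the given duration."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return 1000.0 / duration_ms

def modulation_histogram(durations_ms, bin_edges_hz):
    """Count durations falling in each [low, high) modulation-frequency bin."""
    counts = [0] * (len(bin_edges_hz) - 1)
    for duration in durations_ms:
        freq = equivalent_modulation_hz(duration)
        for i in range(len(counts)):
            if bin_edges_hz[i] <= freq < bin_edges_hz[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical syllable durations in milliseconds
durations = [100, 200, 210, 250, 400, 500]
counts = modulation_histogram(durations, [1, 3, 6, 12])
print(equivalent_modulation_hz(200))  # 200 ms corresponds to 5 Hz
print(counts)
```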
The Trouble with Syllables …
The question thus arises …
If the modulation spectrum truly reflects syllables in the speech signal
Why is the distribution of syllable duration so broad?
And does this variability in syllable duration reflect something significant?
[Figure: syllable duration (as modulation frequency) plotted with the modulation spectrum]
Modulation spectrum of 15 minutes of spontaneous Japanese speech (OGI-TS corpus) compared with the syllable duration distribution for the same material (Arai and Greenberg, 1997)
PART ONE
What Underlies
Variation in Word Duration?
Word Duration
Most words (81%) in the Switchboard corpus are monosyllabic, and most of the remainder are disyllabic (together comprising 95% of the words)
The distribution of word duration therefore largely parallels that of syllables (plotted in units of duration [ms] on a logarithmic scale)
All Words
What Underlies Word Duration Variability?
Is this distribution of lexical duration of a uniform nature (and source)?
Or does it reflect a more complex set of phenomena?
It has been observed for WRITTEN text that the more frequent words tend to be shorter and the less common words longer (a tendency described by Zipf, often called Zipf’s law of abbreviation)
Does such a relationship hold for spoken language?
Let’s find out!
Is Word Duration Related to Word Frequency?
Word duration (derived from the phonetically annotated portion of the Switchboard corpus) can be plotted relative to frequency of occurrence
Such an exercise shows that there is only a WEAK relationship (r = -0.42) between lexical (unigram) frequency and word duration
There is a lot of variability in word duration for any given frequency range, suggesting that lexical frequency alone is unlikely to account for variation in word duration
[Figure: Duration (ms), axis from 0 to 500, plotted against Number of Occurrences, axis ticks at 1, 10, 100 and 1000; r = -0.42; words with fewer than 5 instances omitted from graph]
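A correlation of this kind can be sketched as follows. The slides do not say whether r was computed against raw or log-transformed frequency; the sketch assumes log10 of the occurrence count (matching the tick spacing of the plot's horizontal axis) and mean duration per word, and it applies the same cutoff of five instances. The function names and the toy tokens are hypothetical.

```python
import math
from collections import defaultdict

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

def frequency_duration_points(tokens, min_count=5):
    """tokens: iterable of (word, duration_ms) pairs, one per spoken token.
    Returns (log10 count, mean duration) per word, skipping rare words."""
    durations = defaultdict(list)
    for word, duration_ms in tokens:
        durations[word].append(duration_ms)
    return [(math.log10(len(d)), sum(d) / len(d))
            for d in durations.values() if len(d) >= min_count]

# Hypothetical toy tokens (not real corpus measurements)
tokens = ([("the", 100)] * 20 + [("know", 200)] * 10
          + [("people", 350)] * 5 + [("rare", 500)] * 2)
points = frequency_duration_points(tokens)
r = pearson_r(points)
print(len(points), round(r, 3))
```

The toy data are deliberately tidy, so they yield a far stronger negative correlation than the r = -0.42 reported for the corpus; the point of the sketch is only the procedure.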
If Not (entirely) Word Frequency, Then What?
One parameter that might be more directly related to word duration (and other durational properties of speech) is STRESS ACCENT
Stress Accent is related to the emphasis (or prominence) associated with individual syllables within a word
Although dictionaries list the stress patterns associated with words, this information is but a rough guide to the actual patterns observed (as is the phonetic pronunciation provided in the dictionary)
In order to obtain empirical data pertaining to stress accent, it is necessary to manually annotate a corpus (syllable by syllable)
This manual annotation has been performed for a 45-minute subset of the Switchboard corpus, which has also been labeled with respect to phonetic segments, syllables and words
It is thus possible to ascertain the relationship between stress accent and duration at the level of the word, syllable and phonetic segment
The remainder of this presentation focuses on the statistical relationship between stress accent and duration at these different linguistic tiers
Before examining these data, let’s briefly consider the nature of the annotated material
(this is important for evaluating the reliability of the results obtained)
INTERMEZZO
Being Phonetically (and Prosodically)
Annotated
Phonetic Transcription of Spontaneous English
Telephone dialogues of 5-10 minutes’ duration, from the SWITCHBOARD corpus, have been phonetically annotated (labeled and segmented)
Most of this Material has been Manually Annotated
  4 hours labeled at the phone level and segmented at the syllabic level
  1 hour labeled and segmented at the phonetic-segment level
  The remaining material has been segmented at the phonetic-segment level using automatic methods
  45 minutes of stress-accent-labeled material
  An additional four hours of material automatically labeled with respect to accent (this latter material was not used in the current analysis, but will be available soon)
There is a Lot of Diversity in the Material Transcribed
  Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality
Transcription System
  A variant of Arpabet (which was also used for transcription of the TIMIT corpus)
Phonetic Transcription of Spontaneous English
The Data are Available at:
http://www.icsi.berkeley.edu/real/stp
Phonetic Transcription
How was the Labeling and Segmentation Performed?
VERY carefully … by UC-Berkeley linguistics students
Using a display of the signal waveform, spectrogram, word transcription and “forced alignments” (automatic estimates of phones and boundaries), plus audio (listening at multiple time scales: phone, word, utterance), on Sun workstations
Additionally, automatic segmentation and labeling of articulatory manner were used as a guide for phonetic labeling and segmentation in recent work
Annotation of Stress Accent
Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent
Three levels of accent were distinguished:
Heavy, Light, None
(In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others)
An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary)
In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) (and one other labeled as very lightly accented (0.25))
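The labeling scheme can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the corpus: the function names are hypothetical, folding intermediate scores such as 0.25 into "light" follows the "0.5 to all others" convention by assumption, and the word-level rule for "light" is likewise an assumption (the deck only defines a heavily accented word as one containing at least one heavily accented syllable).

```python
from typing import Optional, Sequence


def accent_level(label: Optional[float]) -> str:
    """Map a labeler's score to one of the three accent levels."""
    if label is None:   # "null": completely unaccented
        return "none"
    if label == 1:      # "1": fully accented
        return "heavy"
    return "light"      # everything else (0.5, and 0.25 by assumption)


def word_accent(syllable_labels: Sequence[Optional[float]]) -> str:
    """Accent level of a word, derived from its syllables (hypothetical helper)."""
    levels = [accent_level(x) for x in syllable_labels]
    if "heavy" in levels:   # at least one heavily accented syllable
        return "heavy"
    if "light" in levels:   # assumption: otherwise light if any syllable is light
        return "light"
    return "none"


print(word_accent([None, 1, 0.5]))   # heavy
print(word_accent([None, 0.25]))     # light
```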
PART TWO
The Relation between Stress Accent and Word Duration
Back to Stress Accent and Word Duration…
Stress accent is supposed to bear some systematic relation to three principal acoustic parameters of the speech signal:
Fundamental Frequency, Amplitude, Duration
In previous studies my colleagues and I have shown that f0-related cues play a relatively small role in stress accent assignment (at least for spontaneous American English material)
Amplitude and duration appear to play a far more important role than f0
Therefore, it is not unreasonable to assume that the stress accent patterns associated with words bear some tangible relation to lexical duration
So …, let’s find out!
Word Duration and Stress Accent Level
Let’s first examine the durational properties of heavily accented words (these are words containing at least one heavily accented syllable)
The mean duration of this subset (36%) is 378 ms (s.d. = 168 ms)
Most of the heavily accented words are longer than 200 ms
[Figure: distribution of word duration. Legend: Heavily Accented]
Word Duration and Stress Accent Level
Let’s now compare the duration of the heavily accented words with those of their lightly accented counterparts (25% of the total)
The mean duration of this subset is 255 ms (s.d. = 116 ms)
In many respects the durational properties of these two subsets are similar
[Figure: distribution of word duration. Legend: Heavily Accented, Lightly Accented]
Word Duration and Stress Accent Level
Let’s now compare the duration of unaccented words with that of their accented counterparts
The mean duration of the unaccented subset (39%) is 149 ms (s.d. = 78 ms)
The unaccented words are generally shorter than 200 ms and have a very different distributional form than their accented counterparts
[Figure: distribution of word duration. Legend: Heavily Accented, Lightly Accented, Unaccented]
Word Duration and Stress Accent Level
Let’s now compare the durational properties of ALL WORDS in the corpus with those pertaining to words of varying accent levels
When we do so, we notice that the left-hand branch of the lexical distribution largely reflects unaccented words, while the right-hand branch reflects mostly accented words (with the peak reflecting both)
[Figure: distribution of word duration. Legend: Heavily Accented, Lightly Accented, Unaccented, All Words]
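As a rough check on how the three subsets combine, their reported proportions, means and standard deviations can be pooled as a mixture. The sketch below is an illustration only: the pooled figures are derived here rather than reported in the deck, the function name is hypothetical, and the calculation assumes the three subsets partition the corpus (36% + 25% + 39% = 100%).

```python
import math

# (proportion of words, mean duration in ms, s.d. in ms), as reported on the slides
SUBSETS = {
    "heavy": (0.36, 378.0, 168.0),
    "light": (0.25, 255.0, 116.0),
    "none": (0.39, 149.0, 78.0),
}


def pooled_mean_sd(groups):
    """Mean and s.d. of a mixture, from per-group weight, mean and s.d."""
    mean = sum(w * m for w, m, _ in groups.values())
    # E[x^2] of the mixture is the weighted sum of each group's (sd^2 + mean^2)
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in groups.values())
    return mean, math.sqrt(second_moment - mean ** 2)


mean_all, sd_all = pooled_mean_sd(SUBSETS)
print(f"pooled mean = {mean_all:.1f} ms")   # pooled mean = 257.9 ms
print(f"pooled s.d. = {sd_all:.1f} ms")
```

The pooled spread comes partly from the spread within each subset and partly from the distance between the subset means, which is consistent with the claim that the breadth of the overall distribution reflects the mix of accent levels.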
Word Duration and Stress Accent Level
Therefore, it appears that the broad distribution of word duration (and, in turn, syllable duration) largely reflects the co-existence of accented and unaccented words within spontaneous speech
[Figure: distribution of word duration. Legend: Heavily Accented, Lightly Accented, Unaccented, All Words]
What are the implications of this insight?
Breadth of the Modulation Spectrum
The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level
Does this insight have implications for the lower tiers of spoken language? (e.g., the phonetic and phonological levels)
Let’s find out!
[Figure: Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth). Regions: Unaccented, Heavily Accented, All Accents (Convergence)]
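One way to get a feel for the link between duration and modulation frequency is to treat a unit of duration T as one cycle of the energy envelope, so that it corresponds to a modulation frequency of roughly 1/T. The sketch below applies that reciprocal to the mean word durations reported earlier. Both the reciprocal heuristic and the use of word (rather than syllable) durations are assumptions made for illustration; this is not the analysis behind the figure.

```python
def modulation_frequency_hz(duration_ms: float) -> float:
    """Approximate modulation frequency for a unit lasting duration_ms (heuristic: 1/T)."""
    return 1000.0 / duration_ms


# Mean word durations (ms) from the stress-accent slides
MEAN_DURATION_MS = {"heavy": 378.0, "light": 255.0, "none": 149.0}

for level, ms in MEAN_DURATION_MS.items():
    print(f"{level:>5}: {ms:.0f} ms -> {modulation_frequency_hz(ms):.2f} Hz")
# heavy: 378 ms -> 2.65 Hz
# light: 255 ms -> 3.92 Hz
#  none: 149 ms -> 6.71 Hz
```

Under this heuristic, longer (accented) units fall toward the low end of the modulation spectrum and shorter (unaccented) units toward the high end.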
INTERMEZZO
Anatomy of the Syllable
The Importance of the Syllable
The analyses to follow are all linked, in some fashion, to syllable structure
In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)
As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns
What is an onset? What is a nucleus? What is a coda?
The following slides provide a brief (and gentle) introduction to syllable structure
Syllable and Phonetic Segment Illustrated
Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA
Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition)
Most (but not all) syllables also contain an ONSET (usually a CONSONANT)
Many syllables contain a CODA (also typically a CONSONANT)
The most common syllable form in English is Onset + Nucleus + Coda (“Nine”)
Followed in popularity by Onset + Nucleus (“Two”)
“J” = JUNCTURE
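The onset / nucleus / coda partition described above can be sketched as follows. The phone symbols are ARPABET-style and the vowel inventory is a simplified assumption, as are the function names; the sketch handles a single syllable with exactly one vocalic nucleus and is not the segmentation procedure used for the corpus.

```python
# Simplified ARPABET-style vowel inventory (an assumption for illustration)
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ey", "ay", "oy", "aw", "ow", "ax"}


def split_syllable(phones):
    """Return (onset, nucleus, coda) for one syllable given as a list of phones."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(nuclei) != 1:
        raise ValueError("expected exactly one vocalic nucleus")
    i = nuclei[0]
    return phones[:i], phones[i], phones[i + 1:]


def cv_form(phones):
    """Canonical form of the syllable, e.g. 'CVC' (V = vowel, C = consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


print(split_syllable(["n", "ay", "n"]), cv_form(["n", "ay", "n"]))  # "nine"
print(split_syllable(["t", "uw"]), cv_form(["t", "uw"]))            # "two"
```

For “nine” this yields onset `['n']`, nucleus `'ay'`, coda `['n']` (form CVC); for “two” the coda is empty (form CV).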
PART THREE
Stress Accent and Syllable Position
The Importance of Syllable Structure
Before going into the details of durational variation at the segmental level we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent
These data illustrate the sort of variation that is conditioned by position within the syllable
Pronunciation Variation – Syllable and Accent
Pronunciation variation is systematic at the level of the syllable
It’s also systematic when stress accent is taken into account
BOTH syllable structure and accent level are required for a full accounting
[Figure: panels for All Segments, Deletions, Insertions, Substitutions. Regions: ONSET Territory, NUCLEUS Territory, CODA Territory]
A Coarse Perspective on Pronunciation Variation (at the level of the syllable and stress accent)
Analysis of Durational Properties of Speech
The following analyses are conditioned on stress accent level and (for the most part) syllable position
We will begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration
However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level
Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none
Syllable Duration - Across Syllable Forms
There is a broad range of syllable structures observed in spoken English
Together, the V, VC, CV and CVC forms account for 85% of syllables
The CVCC and CCVC forms account for another 10%
Together, the CV and CVC forms cover ca. 60% of the syllables
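The coverage figures above imply two further numbers that are not stated on the slide. The short sketch below derives them; the variable names are hypothetical and the results inherit the approximate ("ca.") nature of the 60% figure.

```python
# Percentages of syllables, as stated on the slide
basic_four = 85    # V + VC + CV + CVC
complex_two = 10   # CVCC + CCVC
cv_and_cvc = 60    # CV + CVC (approximate)

# Derived, not reported in the deck
v_and_vc = basic_four - cv_and_cvc          # share left for V and VC
all_other = 100 - basic_four - complex_two  # share left for every other form

print(f"V + VC: ca. {v_and_vc}%")            # V + VC: ca. 25%
print(f"all other forms: ca. {all_other}%")  # all other forms: ca. 5%
```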
Syllable Duration - Across Syllable Forms
It is not surprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below)
Note the systematic lengthening of the syllable for each form as the accent level increases from none to light to heavy
This pattern is representative of accent’s impact on duration (as we’ll see)
[Figure: syllable duration by canonical syllable form and accent level. V = Vowel, C = Consonant]
Syllable Duration - Accent Level/Syllable Form
This graph shows the same data as the previous slides, but from the perspective of only two accent levels (heavy and none)
The heavily accented syllables are generally 60-100% longer than their unaccented counterparts
The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV)
This pattern implies that accent has the greatest impact on vocalic duration
[Figure: syllable duration by canonical syllable form, heavy vs. no accent. V = Vowel, C = Consonant]
Nucleus Duration - Accent Level/Syllable Form
The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below
Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts
This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)
[Figure: nucleus duration by canonical syllable form and accent level]
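The two ways of stating the effect ("60-100% longer" for syllables, "at least twice as long" for nuclei) are easy to put on a common scale. The helpers below are hypothetical and only convert between a percentage increase and a duration ratio; the last line applies the same arithmetic to the mean word durations reported earlier in the deck (378 ms heavy vs. 149 ms unaccented), purely as an illustration.

```python
def ratio_from_percent_longer(percent: float) -> float:
    """'X% longer' expressed as a duration ratio (accented / unaccented)."""
    return 1.0 + percent / 100.0


def percent_longer(accented_ms: float, unaccented_ms: float) -> float:
    """How much longer the accented unit is, as a percentage of the unaccented one."""
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


# Syllables: 60-100% longer corresponds to ratios between 1.6 and 2.0
print(ratio_from_percent_longer(60), ratio_from_percent_longer(100))

# Nuclei: "at least twice as long" is a ratio >= 2.0, i.e. >= 100% longer

# Word-level means from the earlier slides, for comparison
print(f"{percent_longer(378.0, 149.0):.0f}% longer")   # 154% longer
```

On this scale the nucleus effect (ratio of 2.0 or more) sits at or above the top of the range quoted for whole syllables, which fits the claim that the nucleus absorbs a major share of accent's durational impact.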
PART FOUR
Stress Accent and the Vocalic Nucleus
Stress Accent’s Impact on the Vocalic Nucleus
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the structure of the syllable has relatively little impact on vocalic duration
As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form
We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact
But first … a brief primer on vocalic acoustics (which should facilitate digesting the material that follows)
INTERMEZZO
A Brief Primer on Vowel Acoustics
A Brief Primer on Vocalic Acoustics
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue
• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance
• The height parameter is closely linked to the frequency of F1
In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:
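The two acoustic correlates named above can be turned into coordinates for a vowel-space plot: F2 - F1 for the front-back dimension and F1 for height. The formant values in this sketch are approximate, commonly cited textbook averages for adult male speakers of American English; they are an assumption brought in for illustration and do not come from this deck, and the function name is hypothetical.

```python
# Approximate (F1, F2) in Hz for four vowels, ARPABET-style labels.
# Textbook-style averages used here only as illustrative input (assumption).
FORMANTS_HZ = {
    "iy": (270, 2290),   # as in "beet"
    "ae": (660, 1720),   # as in "bat"
    "aa": (730, 1090),   # as in "father"
    "uw": (300, 870),    # as in "boot"
}


def vowel_space_coordinates(f1: float, f2: float):
    """Return (front-back correlate, height correlate) = (F2 - F1, F1)."""
    return (f2 - f1, f1)


coords = {v: vowel_space_coordinates(f1, f2) for v, (f1, f2) in FORMANTS_HZ.items()}
for vowel, (front_back, height) in coords.items():
    print(f"{vowel}: F2 - F1 = {front_back} Hz, F1 = {height} Hz")

# Larger F2 - F1 means a more front vowel; larger F1 means a lower vowel
most_front = max(coords, key=lambda v: coords[v][0])
lowest = max(coords, key=lambda v: coords[v][1])
print(most_front, lowest)   # iy aa
```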
The Spatial Patterning of Duration in Vocalic Nuclei
Spatial Patterning of Duration
Let’s return to the vowel triangle and see if it can shed light on certain patterns in the vocalic data
The duration will be plotted on a 2-D grid, where the x-axis always represents hypothetical front-back tongue position (and hence remains constant throughout the plots to follow)
The y-axis serves as the dependent measure, expressed as either duration or the proportion of fully stressed (or unstressed) nuclei
Vocalic Duration and Vowel Height
[Figure panels: All nuclei, Diphthongs, Monophthongs]
The spatial patterning of vocalic segments is systematic with respect to duration
Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels
Thus, duration appears to be highly correlated with vowel height
But … the situation is a little more complicated than first appearances would suggest
Durational Differences - Stressed/Unstressed
There is a large dynamic range in duration between accented and unaccented vocalic nuclei
Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs
Canonical Syllable Forms
Lax monophthongs
Vocalic Identity Among Unstressed Nuclei
The high, lax monophthongs are almost always unstressed
The low vowels, be they monophthongs or diphthongs, are rarely unstressed
The high diphthongs and high/mid, tense monophthongs occupy an intermediate position
Vocalic Identity Among Fully Stressed Nuclei
The high vowels are rarely fully stressed
The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed
An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here)
Is It Stress? Vocalic Identity? Or What?
Duration Appears to Play an Important (but certainly not exclusive) Role in Stress Accent for Spontaneous American English Discourse
• For any given vocalic class, stressed segments are longer (on average)
• The durational disparity is most pronounced among the low vowels and the diphthongs
Low Vowels Tend to be Much Longer in Duration than High Vowels
• This is the case even for diphthongs
Low Vowels are Rarely without Some Measure of Stress Accent
• This is true for monophthongs as well as diphthongs
High Vowels are Fully Stressed Extremely Rarely
• This is particularly so for monophthongs, but also applies to diphthongs
Thus, Stress Accent Appears to Be Intricately Involved with Vocalic Identity (as illustrated on the next several slides)
The Vowel Space Under (Full) Stress (Accent)
There is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels
Canonical Vowels Only
The Vowel Space Without (Stress) Accent
In unaccented syllables, vowels are confined largely to the high-front and high-central sectors of the articulatory space
The low and mid vowels “get creamed” (they largely disappear from unaccented syllables)
Canonical Vowels Only
The Vowel Spaces Compared
[Figure panels: Heavily Accented, Unaccented]
Canonical Vowels Only
Stress accent exerts a profound effect on the character of the vowel space
High vowels are largely associated with unaccented syllables
Low vowels are mostly associated with accented forms
This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation
PART FIVE
Stress Accent’s Impact on Syllable Onsets
Stress Accent and Syllable Onsets
The onset is often cited as the key syllabic constituent with respect to “lexical access”
It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level
Because of the onset’s key role in lexical access, one might assume that its duration would be relatively stable across accent levels
The following slides suggest that this assumption is incorrect, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest
Onset Duration - Accent Level/Syllable Form
Canonical Syllable Forms
The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as in vocalic constituents)
Onset duration is similar across syllable forms (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)
The duration of unaccented onsets is similar across syllable forms
Onset Duration - Accent Level/Syllable Form
Canonical Syllable Forms
Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts
Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)
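To make the “50-60% longer” figure concrete, here is a minimal arithmetic sketch. The two durations are hypothetical values chosen for illustration, not measurements from the talk, and `percent_longer` is an assumed helper name.

```python
def percent_longer(accented_ms, unaccented_ms):
    """Percentage by which the accented duration exceeds the unaccented one."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


# Hypothetical mean onset durations in milliseconds (not values from the talk).
example = percent_longer(accented_ms=93.0, unaccented_ms=60.0)
print(f"{example:.0f}% longer")
```

The percentage is computed relative to the unaccented duration, which is the natural reading of “longer than their unaccented counterparts.”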
Onset Duration and Place of Articulation
It is of interest to examine accent’s impact on the duration of onset (and coda) constituents in somewhat greater detail
A convenient means of doing so is to partition the data with respect to place of maximum articulatory constriction in order to highlight certain patterns
What is place of articulation? Let’s find out!
Place of Articulation – A Brief Primer
The tongue contacts (or nearly contacts) the roof of the mouth in producing many of the consonantal sounds in English
Anterior
• Labial [p] [b] [m]
• Labio-dental [f] [v]
• Inter-dental [th] [dh]
Central
• Alveolar [t] [d] [n] [s] [z]
Posterior
• Palatal [sh] [zh]
• Velar [k] [g] [ng]
Chameleon
• Rhoticized [r]
• Lateral [l]
• Approximant [hh]
From Daniloff (1973)
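The place classes in this primer can be written down as a simple lookup, which is how the data could be partitioned by place. This is a sketch only: the segment lists are copied from the primer above, while the names `PLACE_CLASSES`, `SEGMENT_TO_PLACE` and `place_class` are hypothetical.

```python
# Segment inventory per place class, copied from the primer above.
PLACE_CLASSES = {
    "anterior": ["p", "b", "m", "f", "v", "th", "dh"],
    "central": ["t", "d", "n", "s", "z"],
    "posterior": ["sh", "zh", "k", "g", "ng"],
    "chameleon": ["r", "l", "hh"],
}

# Invert the table so that each segment maps to its place class.
SEGMENT_TO_PLACE = {
    segment: place
    for place, segments in PLACE_CLASSES.items()
    for segment in segments
}


def place_class(segment):
    """Return the place class of a segment, or None if the primer does not list it."""
    return SEGMENT_TO_PLACE.get(segment)


print(place_class("dh"), place_class("ng"), place_class("hh"))
```

Segments that appear only in the later tables are deliberately left out, so the lookup returns None for them rather than guessing a class.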
Onset Duration and Place of Articulation
We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place
First, we will examine the anterior consonants, followed by the central and posterior onsets
Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”)
Although the heavily accented onsets are generally 50-60% longer than their unaccented counterparts, there is a large disparity among segments in the durational differences due to accent level
We will now examine the specific durational patterns as a function of articulatory place; the patterns are revealing
Syllable Onset Duration - ANTERIOR Place
Canonical Syllable Forms
The voiceless consonants ([p] and [f]) are longer than the other segments
The largest durational disparity (as a function of accent level) is exhibited in the glide [y]
The smallest durational disparity is manifest in the voiced fricative [dh]
The other segments exhibit intermediate patterns
Segmental Identity and Stress Accent
It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not)
Usually, non-canonical realizations are manifest as segmental deletions
The pattern of segmental realization bears some correspondence to durational variation as a function of accent level, but also exhibits some interesting differences (which are potentially significant for models of phonetic organization)
Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented
Road Map - How to Interpret the Data
Accent       Heavy        Light        None         Total
Segment    Can  Trans   Can  Trans   Can  Trans   Can  Trans
p          203    205   153    153    94     94   450    452
b          126    127   227    225   214    190   567    542
m          137    137   211    211   116    110   464    458
f          136    136   104    104   113    103   353    343
v           35     33    58     58   108     93   201    184
th          62     61   102    100    28     26   192    187
dh          95     80   311    257   625    451  1031    788
y           63     72   135    136   193    145   391    353
Compare the paired Can and Trans columns (shown in YELLOW and ORANGE on the slide)
Most numbers in the paired columns will be similar, indicating that the phonetic realization of the segment is the canonical form
A large disparity between columns is marked with a blue box
READY? OK, Let’s go!
Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
Syllable Onset Statistics – ANTERIOR Place
Accent       Heavy        Light        None         Total
Segment    Can  Trans   Can  Trans   Can  Trans   Can  Trans
p          203    205   153    153    94     94   450    452
b          126    127   227    225   214    190   567    542
m          137    137   211    211   116    110   464    458
f          136    136   104    104   113    103   353    343
v           35     33    58     58   108     93   201    184
th          62     61   102    100    28     26   192    187
dh          95     80   311    257   625    451  1031    788
y           63     72   135    136   193    145   391    353
Stress accent exerts relatively little effect on anterior onset segments, EXCEPT for [dh] and [y]
[dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent)
Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
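The Can/Trans comparison can also be done numerically. The sketch below uses the anterior onset counts from the table above and flags cells where the transcribed count falls well below the canonical count. The 0.85 cutoff and the names `realization_ratio` and `flag_disparities` are assumptions made for illustration; the slides mark large disparities visually and state no numeric criterion.

```python
# (Can, Trans) counts per accent level, copied from the anterior onset table.
ANTERIOR_ONSETS = {
    "p": {"heavy": (203, 205), "light": (153, 153), "none": (94, 94)},
    "b": {"heavy": (126, 127), "light": (227, 225), "none": (214, 190)},
    "m": {"heavy": (137, 137), "light": (211, 211), "none": (116, 110)},
    "f": {"heavy": (136, 136), "light": (104, 104), "none": (113, 103)},
    "v": {"heavy": (35, 33), "light": (58, 58), "none": (108, 93)},
    "th": {"heavy": (62, 61), "light": (102, 100), "none": (28, 26)},
    "dh": {"heavy": (95, 80), "light": (311, 257), "none": (625, 451)},
    "y": {"heavy": (63, 72), "light": (135, 136), "none": (193, 145)},
}


def realization_ratio(can, trans):
    """Transcribed count divided by canonical count (None if there are no canonical tokens)."""
    return trans / can if can else None


def flag_disparities(table, threshold=0.85):
    """List (segment, accent, ratio) cells whose ratio falls below the assumed threshold."""
    flagged = []
    for segment, levels in table.items():
        for accent, (can, trans) in levels.items():
            ratio = realization_ratio(can, trans)
            if ratio is not None and ratio < threshold:
                flagged.append((segment, accent, round(ratio, 2)))
    return flagged


flagged = flag_disparities(ANTERIOR_ONSETS)
for segment, accent, ratio in flagged:
    print(segment, accent, ratio)
```

With these counts and the assumed cutoff, only [dh] and [y] are flagged, which is consistent with the exceptions noted on the slide.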
Syllable Onset Duration - CENTRAL Place
Canonical Syllable Forms
The voiceless consonants ([t] and [s]) are longer than the other segments
The alveolar flap [dx] and nasal flap [nx] are the shortest segments and don’t exhibit a durational disparity as a function of accent level
Syllable Onset Statistics – CENTRAL Place
Accent       Heavy        Light        None         Total
Segment    Can  Trans   Can  Trans   Can  Trans   Can  Trans
t          241    245   276    230   513    276  1030    751
d          141    143   149    134   173    128   463    405
dx           0      3     0     62     0    179     0    244
n          133    135   237    196   194    130   564    461
nx           0      2     0     40     0     73     0    115
s          289    290   284    287   187    186   760    763
z           14     13    16     16    43     45    73     74
Central segments tend to “disappear” in the absence of stress accent
There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions
In heavily accented syllables, central segments maintain their canonical identity
Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
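The deletion and insertion tendencies described above show up as the signed difference between the transcribed and canonical counts. The following sketch uses the counts from the central onset table; `net_change` is a hypothetical helper, and the difference is only a net figure, so it cannot show which canonical segment a given flap replaced.

```python
# (Can, Trans) counts per accent level, copied from the central onset table.
CENTRAL_ONSETS = {
    "t": {"heavy": (241, 245), "light": (276, 230), "none": (513, 276)},
    "d": {"heavy": (141, 143), "light": (149, 134), "none": (173, 128)},
    "dx": {"heavy": (0, 3), "light": (0, 62), "none": (0, 179)},
    "n": {"heavy": (133, 135), "light": (237, 196), "none": (194, 130)},
    "nx": {"heavy": (0, 2), "light": (0, 40), "none": (0, 73)},
    "s": {"heavy": (289, 290), "light": (284, 287), "none": (187, 186)},
    "z": {"heavy": (14, 13), "light": (16, 16), "none": (43, 45)},
}


def net_change(table, accent):
    """Transcribed minus canonical count per segment: negative = net loss, positive = net gain."""
    return {segment: levels[accent][1] - levels[accent][0]
            for segment, levels in table.items()}


for accent in ("heavy", "light", "none"):
    print(accent, net_change(CENTRAL_ONSETS, accent))
```

In the unaccented column the net losses are concentrated in [t], [d] and [n], while the flaps show net gains; in the heavily accented column every difference is within a few tokens of zero.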
Syllable Onset Duration - POSTERIOR Place
CANONICAL Syllable Forms
The voiceless consonants ([k], [sh], [ch]) are longer than the other segments
Most of the segments exhibit a durational disparity between accented and unaccented forms
The duration of the voiced segments in unaccented syllables is ca. 50-60 ms
The glide [w] exhibits a significant disparity between accented and unaccented forms
Syllable Onset Statistics – Posterior Place
Accent       Heavy        Light        None         Total
Segment    Can  Trans   Can  Trans   Can  Trans   Can  Trans
k          185    186   189    187   170    168   544    541
g          115    116   138    137    54     51   307    304
ng           0      0     2      3     1      1     3      4
sh          26     26    40     40    73     80   139    146
zh           0      1     2      9    11     17    13     27
ch          32     34    19     27    22     23    73     84
jh          31     30    52     43    58     48   141    121
w          201    209   310    330   276    287   787    826
q            0     33     0     64     0     38     0    135
Posterior segments are remarkably stable in onset position
The only significant “deviation” from canonical representation is the intrusion of the glottal stop [q], which lacks phonemic status in English
Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
Syllable Onset Duration - Place Chameleons
CANONICAL Syllable Forms
Place chameleon segments exhibit a consistent durational disparity between accented and unaccented forms
In unaccented syllables the duration of these segments is ca. 50-60 ms
Syllable Onset Statistics – Place Chameleons
“Chameleons” assimilate their place of articulation to the following vowel
They are relatively stable at syllable onset, except in unaccented forms
The reduced form of [l] is [lg], a glide-like element; it tends to assume the functional status of [l] in unaccented syllables
Accent level    Heavy         Light         None          Total
Segment         Can   Trans   Can   Trans   Can   Trans   Can   Trans
r               272   269     233   215     233   162     738   646
l               184   180     226   212     220   162     630   554
hh              158   156     169   157     67    37      394   350
er              0     0       0     2       0     0       0     2
lg              0     2       0     8       0     21      0     31
el              0     1       0     0       0     0       0     1
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Pronunciation Patterns – Syllable Onsets
The ANTERIOR and POSTERIOR onsets are generally canonically realized (the exceptions typically function as “junctures,” rather than as segments)
The CENTRAL and PLACE CHAMELEON onsets are often non-canonical (and also often function as “junctures”)
[Chart; column headings: Place of Articulation, Approximants. Legend: C = Canonical realization; N = Non-canonical realization; N0 = Non-canonical in unaccented syllables]
PART SIX
Stress Accent’s Impact on Syllable Codas
Stress Accent and Syllable Codas
Stress accent’s impact on syllable codas differs from that of onsets
The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are NOT taken into account)
There is a far greater probability of segmental deletion in coda constituents
Accent level exerts a powerful influence on segmental deletion and on segmental duration
To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin)
(for this reason the durational properties of ALL syllables, including those in which coda segments are deleted, are also shown)
Syllable Coda Duration - ANTERIOR Place
CANONICAL Syllable Forms
The durational disparity between accented and unaccented forms is smaller for codas than for onsets
Certain segments exhibit little if any difference in duration as a function of accent (e.g., [b], [m], [v])
Such segments manifest certain properties of flaps
Syllable Coda Duration - ANTERIOR Place
ALL Syllable Forms
Because of the significant number of deletions in coda constituents, particularly in unaccented syllables, the durational disparity between accented and unaccented syllables is preserved when duration is computed across ALL syllable forms (including those with deletions)
Those segments exhibiting flap-like properties (e.g., [b], [m], [v]) tend to delete the most in unaccented codas
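One way to read the ALL-forms averages: if a deleted coda is counted as contributing 0 ms, the mean over all syllable forms equals the realized-only mean scaled by the proportion of tokens that actually surface. The sketch below assumes exactly that convention (the slides do not spell it out), and the numbers in the usage lines are made-up placeholders, not measurements from the corpus.

```python
def mean_duration_all_forms(mean_realized_ms, n_canonical, n_realized):
    """Mean duration over ALL forms, counting deleted segments as 0 ms.

    Assumption (ours): each of the n_canonical expected tokens either
    surfaces with the realized-only mean duration or is deleted (0 ms).
    """
    if n_canonical <= 0:
        raise ValueError("n_canonical must be positive")
    if not 0 <= n_realized <= n_canonical:
        raise ValueError("n_realized must lie between 0 and n_canonical")
    return mean_realized_ms * n_realized / n_canonical


# Placeholder numbers only: the same 60 ms realized-only mean in both
# conditions, but different survival rates.
accented = mean_duration_all_forms(60.0, n_canonical=100, n_realized=90)
unaccented = mean_duration_all_forms(60.0, n_canonical=100, n_realized=50)
print(accented, unaccented)
```

With identical realized durations, an accent disparity still appears in the ALL-forms mean purely through deletion, which is the pattern this slide describes.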
Syllable Coda Statistics – Anterior Place
Anterior coda segments are relatively stable under stress (accent)
The segments [m] and [v] are exceptions: they often function as “flaps” in this context, and they tend to delete in unaccented syllables
Accent level    Heavy         Light         None          Total
Segment         Can   Trans   Can   Trans   Can   Trans   Can   Trans
p               33    32      39    32      17    13      89    77
b               9     6       4     4       1     1       14    11
m               108   96      148   148     112   83      368   327
f               37    36      40    40      36    48      113   124
v               63    55      102   87      172   94      337   236
th              11    10      24    16      34    20      69    46
dh              0     0       0     4       0     5       0     9
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Syllable Coda Duration - CENTRAL Place
CANONICAL Syllable Forms
The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables (see durational data for ALL syllables)
Many of the coda segments show no difference in duration as a function of accent (when computed for the canonical syllable forms)
Most of the unaccented codas are short in duration
Syllable Coda Duration - CENTRAL Place
ALL Syllable Forms
Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions
In some sense the default duration for central codas is very short (more on this point later on in the presentation)
Syllable Coda Statistics – Central Place
Central coda segments are extremely unstable under stress (accent), except for the fricatives [s] and [z]
The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables
The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms)
Accent level    Heavy         Light         None          Total
Segment         Can   Trans   Can   Trans   Can   Trans   Can   Trans
t               322   126     575   191     562   172     1459  489
d               200   119     295   127     370   96      865   342
n               311   237     498   381     773   542     1582  1160
s               142   135     202   214     151   155     495   504
z               179   149     258   208     271   221     708   578
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
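To make the deletion asymmetry concrete, the central-coda counts can be pooled by manner class and accent level. The sketch treats the shortfall of transcribed relative to canonical tokens as a proxy for deletion; that is an assumption on our part, since a shortfall can also arise from substitution, and a surplus (as for [s] in lightly accented syllables) from insertion or relabeling.

```python
# (Can, Trans) counts from the central-place coda table, per accent level.
CODA_CENTRAL = {
    "t": {"heavy": (322, 126), "light": (575, 191), "none": (562, 172)},
    "d": {"heavy": (200, 119), "light": (295, 127), "none": (370, 96)},
    "n": {"heavy": (311, 237), "light": (498, 381), "none": (773, 542)},
    "s": {"heavy": (142, 135), "light": (202, 214), "none": (151, 155)},
    "z": {"heavy": (179, 149), "light": (258, 208), "none": (271, 221)},
}


def pooled_shortfall(segments, accent, table=CODA_CENTRAL):
    """1 - (sum of Trans) / (sum of Can) over the given segments."""
    can = sum(table[s][accent][0] for s in segments)
    trans = sum(table[s][accent][1] for s in segments)
    return 1.0 - trans / can


for accent in ("heavy", "light", "none"):
    stops_nasal = pooled_shortfall(["t", "d", "n"], accent)
    fricatives = pooled_shortfall(["s", "z"], accent)
    print(f"{accent:5s}  t/d/n: {stops_nasal:.2f}   s/z: {fricatives:.2f}")
```

On these counts the pooled shortfall for [t], [d] and [n] rises from roughly 0.42 in heavily accented syllables to roughly 0.52 in unaccented ones, while [s] and [z] stay near 0.1 throughout.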
Syllable Coda Duration - POSTERIOR Place
CANONICAL Syllable Forms
Many coda consonants are short in duration
Most segments exhibit relatively little sensitivity to accent level
Syllable Coda Duration - POSTERIOR Place
ALL Syllable Forms
There are relatively few deletions in coda segments, hence the durational patterns are similar for ALL syllable forms relative to the canonical syllable forms
Syllable Coda Statistics – POSTERIOR Place
Posterior coda segments are relatively stable under stress (accent)
The primary exception is [ng], which tends to delete in unaccented syllables
The “infamous” glottal stop [q] tends to insert in this context
Accent level    Heavy         Light         None          Total
Segment         Can   Trans   Can   Trans   Can   Trans   Can   Trans
k               170   150     196   162     51    39      417   351
g               10    10      8     10      4     5       22    25
q               0     42      0     71      0     54      0     167
ng              63    60      139   126     203   129     405   315
sh              9     9       2     2       4     6       15    17
zh              1     0       0     4       0     2       1     6
ch              26    25      27    25      12    12      65    62
jh              10    10      11    10      15    12      36    32
w               0     4       0     2       0     6       0     12
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Syllable Coda Duration - Place Chameleons
CANONICAL Syllable Forms
There is a large durational disparity between the accented and unaccented chameleon segments
In unaccented syllables the duration of these segments is ca. 60 ms
Syllable Coda Duration - Place Chameleons
ALL Syllable Forms
There are a lot of deletions of coda chameleons in unaccented syllables
Hence the mean duration of these segments in unaccented forms is short
Syllable Coda Statistics – Place Chameleons
Chameleon segments are unstable under stress (accent)
This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms
The segment [r] tends to delete in unaccented syllables, but not otherwise
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Pronunciation Patterns – Syllable Codas
The ANTERIOR and POSTERIOR codas are generally canonically realized (the exceptions typically function as “junctures,” rather than segments)
The CENTRAL and PLACE CHAMELEON segments are often non-canonical (and also often function as “junctures”)
[Chart; column headings: Place of Articulation, Approximants. Legend: C = Canonical realization; N = Non-canonical realization; N0 = Non-canonical in unaccented syllables]
PART SEVEN
Onset and Coda Patterns Compared
Comparison of Syllable Onsets and Codas
Onsets tend to be more stable than codas
The centrally articulated segments are highly unstable in both contexts
As are the place chameleons
The unstable anterior and posterior phones are mostly “junctures”
[Chart; column headings: Place of Articulation, Approximants. Legend: C = Canonical realization; N = Non-canonical realization; N0 = Non-canonical in unaccented syllables]
PART EIGHT
A Preliminary Juncture-Accent Model
Road Map to the Juncture-Accent Model
A means of visualizing important properties of the acoustic signal
The juncture-accent representation is based on log critical-band energy across time and frequency
Although it is not intended as an auditory representation, it does represent spectro-temporal properties of the signal in a manner consistent with auditory principles
Let’s take a look at some illustrations: Spectro-Temporal Profiles, or “STePs”
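For readers who want a feel for “log critical-band energy across time and frequency,” here is a minimal sketch using NumPy. It is not the procedure used to produce the STePs: the Bark-scale formula (Traunmüller's approximation), the 25 ms frames with a 10 ms hop, the 18 bands and the function names are all our assumptions, chosen only to make the idea concrete.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark (critical-band) scale."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def log_critical_band_energy(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
                             n_bands=18, floor=1e-10):
    """Return an array of shape (n_frames, n_bands) of log10 band energies."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Assign each FFT bin to one of n_bands bands of equal Bark width.
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)
    band_of_bin = np.clip(np.searchsorted(edges, bark, side="right") - 1,
                          0, n_bands - 1)

    n_frames = 1 + (len(signal) - frame_len) // hop
    out = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the power of all bins that fall into each band, then take log.
        energy = np.bincount(band_of_bin, weights=power, minlength=n_bands)
        out[i] = np.log10(energy + floor)
    return out


# Usage on a synthetic 1 kHz tone (one second at 8 kHz sampling).
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
step = log_critical_band_energy(tone, sr)
print(step.shape)
```

Each row of the result is one time frame and each column one critical band, so plotting the array as a surface over time and frequency gives a picture of the same general kind as the profiles in the following slides.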
Anatomy of a Spectro-Temporal Profile
[Figure: STeP for “Seven”, [s] [eh] [vx] [en], full-spectrum perspective, OGI Numbers95. Annotations: juncture, accented syllable, unaccented syllable, mean duration]
[Figure: STeP for “Seven”, [s] [eh] [vx] [en], high-frequency perspective, OGI Numbers95. Annotations: juncture, accented syllable, unaccented syllable, mean duration]
[Figure: STeP for “Zero”, [z] [ih] [r] [ax] (also listed as [z] [ih] [r] [ah]), full-spectrum perspective, OGI Numbers95. Annotations: juncture, accented syllable, unaccented syllable, mean duration]
[Figure: STeP for “Zero”, [z] [ih] [r] [ax] (also listed as [z] [ih] [r] [ah]), high-frequency perspective, OGI Numbers95. Annotations: juncture, accented syllable, unaccented syllable, mean duration]
[Figure: STeP for “Three”, [th] [r] [iy], full-spectrum perspective, OGI Numbers95. Annotations: accented syllable, mean duration]
[Figure: STeP for “Three”, [th] [r] [iy], high-frequency perspective, OGI Numbers95. Annotations: accented syllable, mean duration]
Summary and Conclusions (at last!)
Summary and Conclusions
Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn:
Stress accent is the primary linguistic property associated with duration at the segmental, syllabic and lexical levels
Stress accent’s impact on duration is most pronounced in the vocalic nucleus
But it also affects the duration of the syllable onset
The duration of the syllable coda is less affected by stress accent, however ...
Coda constituents are more prone to deletion as a function of stress accent
Thus, stress accent has an (indirect) impact on duration even for codas (via segmental deletion)
These data are inconsistent with a segmental model of spoken language
But they are consistent with a JUNCTURE-ACCENT model based on syllable forms of variable accent level
That’s All, Folks
Many Thanks for Your Time and Attention
What’s Going On? (in pronunciation)
With respect to onset and coda segments (i.e. consonants) there are two basic forms – (1) those that are relatively stable across accent level, and (2) those that are not
Most of the non-continuants (i.e. stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior
The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables
The place chameleons (i.e., the approximants) are not very stable in either onset or coda position
The vowels are divisible into two main groups – accented and unaccented
The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space
The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space
Certain segments are actually junctures – e.g., the flaps and the glottal stop
Many so-called segments are actually junctures (as they are flaps); the most noteworthy examples are [dh] and [v]
None of these properties is consistent with a segmental model of language
Syllable Duration and Number of Segments
Canonical Syllable Forms
For syllables greater than a single segment there is relatively little difference in duration as the number of segments (within a syllable) increases
This suggests that syllable duration is largely controlled by processes independent of segmental production