Intonation Components in short English Statements

Yi Xu

Haskins Laboratories

New Haven, Connecticut

Ching X. Xu

Department of Communication Sciences and Disorders

Northwestern University, Evanston

Running Title: Intonation Components in English

Address: Yi Xu, Haskins Laboratories, 270 Crown Street, New Haven, CT 06511, USA

Telephone: (203)865-6163

E-mail: [email protected]

ABSTRACT

In this study we attempt to identify the basic components of statement intonation as related to

focus, accent and lexical stress in General American English. Instead of viewing f0 contours as direct

acoustic correlates of intonation components, we regard them as the outcome of implementing

different functional components of intonation under various articulatory constraints. Eight

American English speakers were recorded while reading aloud short declarative sentences with or

without narrow focus at different locations. Results of analyses suggest that f0 contours in short

declarative sentences in English are determined by three separate specifications: local pitch target,

articulatory effort, and pitch range. Every syllable seems to be associated with a pitch target which

determines the ideal local pitch contour. Non-focused, non-final accents seem to carry a static

[high], and word-final accent under focus and sentence-final accent seem to carry a dynamic [fall].

Unaccented syllables, whether or not lexically stressed, probably carry a static [mid] rather than

being completely targetless. Articulatory effort determines how forcefully a local target is

implemented. The pitch targets of accented syllables seem to be implemented with strong effort,

those of unaccented syllables with weak effort, and those of lexically unstressed syllables with even

weaker effort. Pitch range determines the height and span of f0 at which local pitch targets are

implemented. Focus appears to operate by expanding the pitch range of the on-focus stressed

syllables, suppressing the pitch range of all post-focus syllables, and leaving the pitch range of pre-

focus words intact. To account for the present data as well as other recent findings, a new model of

intonation is considered.

1. INTRODUCTION

A major objective in studying intonation is to determine its basic components. This is by no means a

trivial task. The meandering f0 curve of a speech utterance could be viewed as being constructed in

various ways. Some studies treat pitch contours such as rising, falling, and more complex shapes as

the basic components (Pike, 1945, 1948; Abramson, 1978; Bolinger, 1951, 1986; Crystal, 1969;

't Hart, Collier, & Cohen, 1990; Taylor 1994). Some studies analyze tone and intonation into pitch

registers such as H (high) and L (low) (Woo, 1969; Leben, 1973; Gandour, 1974; Anderson, 1978;

Duanmu, 1994), and associate the H and L directly to the peaks and valleys in the f0 tracings

(Pierrehumbert, 1980; Pierrehumbert & Beckman, 1988; Arvaniti, Ladd, & Mennen, 1998; Ladd et

al., 1999; Ladd, Mennen & Schepman, 2000). In particular, there has been a long-standing debate

over whether contours or registers are the most basic components of tone and intonation (Anderson,

1978; Pierrehumbert & Beckman, 1988; Duanmu, 1994).

One of the reasons why these issues are not easily settled is that observed f0 contours do not always

correspond directly to real functional units of intonation. In the study of segmental units in speech

such as consonants and vowels, there has been a consensus that the acoustic forms of these units in

connected speech are usually variants of their canonical forms. It follows that there probably is also a

discrepancy between the canonical and surface forms of intonational units. The difficulty with

intonation research is that the canonical forms are often difficult to isolate. Nevertheless, some

studies have considered ways to reconcile the discrepancy between surface and underlying forms of

intonation components. The superposition theories of intonation (Fujisaki 1988; Gårding 1979;

Grønnum 1995) regard surface f0 contours as the outcome of local pitch contours superimposed on a

global intonation curve. The approach taken by researchers at the Institute of Perception Research

in the Netherlands regards observed f0 contours as consisting of perceptually (and communicatively)

relevant straight lines which are complicated by micro-variations due to phonetic overspecification.

They believe that perceptually relevant contours can be discovered by replacing observed f0 contours

with stylized straight lines that are perceptually indistinguishable from the raw f0 contours. This

account does not specify, however, how exactly these straight lines, assuming they are the underlying

forms of intonation components, become complicated surface f0 contours through "phonetic

overspecification."

The autosegmental and metrical (AM) approach, as represented by Pierrehumbert (1980, 2000) and

Pierrehumbert and Beckman (1988), assumes that, underlyingly, English intonation consists of only

two level tones — H and L, and that surface f0 contours are linked to them through a set of elaborate

phonetic implementation rules. The essence of these rules is interpolation, pitch range

modification, and pitch height readjustment, which we will discuss in more detail in 1.8.

Fujisaki (Fujisaki, 1983, 1988, 1992) adopted an approach that attempts to link surface f0 directly to

muscle commands. He proposes that surface f0 contours result from the responses of a second order

linear system to two types of underlying commands, accent commands and phrase commands. The

accent commands have idealized stepwise waveforms and the phrase commands have idealized

impulse waveforms (Fujisaki, 1988: 348). The responses to these commands generate critically

damped oscillation of f0 which rises exponentially in the direction of the commands and then falls

back exponentially to the baseline after the termination of the commands. Both types of commands

therefore generate critically damped curves that rise and fall at various rates. The output f0 curve

generated by this model is the arithmetic sum of the logarithmic representations of the curves

generated by the two types of commands. This model thus specifies an explicit connection between

complicated f0 contours and underlying commands that are rather simple in form.
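
The command-response scheme just described can be illustrated with a minimal Python sketch. It assumes the critically damped second-order impulse and step responses commonly given for the Fujisaki model. All parameter values (ALPHA, BETA, GAMMA, the baseline, and the command timings and amplitudes) are illustrative assumptions and are not taken from the present study.

```python
import math

# Illustrative constants (assumptions, not values from this paper).
ALPHA = 3.0      # natural angular frequency of the phrase component (1/s)
BETA = 20.0      # natural angular frequency of the accent component (1/s)
GAMMA = 0.9      # ceiling on the accent component
BASE_HZ = 100.0  # baseline f0 in Hz

def phrase_response(t):
    """Impulse response of a critically damped second-order system."""
    return ALPHA ** 2 * t * math.exp(-ALPHA * t) if t >= 0 else 0.0

def accent_response(t):
    """Step response of a critically damped second-order system, with a ceiling."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)

def f0_hz(t, phrase_cmds, accent_cmds):
    """Sum the command responses in the log-f0 domain and convert back to Hz.

    phrase_cmds: list of (onset_s, amplitude) impulses.
    accent_cmds: list of (onset_s, offset_s, amplitude) steps.
    """
    log_f0 = math.log(BASE_HZ)
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset)
    for onset, offset, amp in accent_cmds:
        # A step that switches on at onset and off at offset.
        log_f0 += amp * (accent_response(t - onset) - accent_response(t - offset))
    return math.exp(log_f0)

# Hypothetical utterance: one phrase command and one accent command.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.6, 0.4)]
contour = [f0_hz(i * 0.01, phrase_cmds, accent_cmds) for i in range(300)]
```

In this sketch f0 rises while the accent command is on and drifts back to the baseline once both responses have died out, which is the behaviour attributed to the model in the text.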

The Fujisaki model makes two important assumptions. The first is that there exists a constant

“restoring” force that is always in the opposite direction of both the accent and phrase commands

(Fujisaki, 1992). Due to this restoring force, f0 always goes back toward the baseline after the

termination of an accent or phrase command. In the model, the restoring force comes from the

elasticity of the vocal folds which act like a pair of passive springs being stretched by the

cricothyroid muscle (CT) during phonation. There is a challenge to this assumption, however.

According to Hollien (1960) and Hollien and Moore (1960), the vocal folds are actually longest

at rest rather than during phonation, and at the onset of voicing the vocal folds always shorten.

During phonation, vocal fold length does increase with fundamental frequency, but it never exceeds

its length during rest. It is therefore unlikely that the vocal folds would snap back as soon as the CT

stops contracting, causing f0 to drop automatically.1 Another key assumption of the Fujisaki model is

that surface f0 is directly linked to muscle commands without an intermediate level of organization.

In this way, f0 generation is not linked to or constrained by supralaryngeal structures such as the

syllable. As will be discussed next, such links and constraints are critical to our understanding of the f0 of

Mandarin tones, and possibly to f0 contours in English as well.2

Despite questions regarding its basic assumptions, the Fujisaki model demonstrates the possibility that

complex surface f0 contours may be generated by an interaction between simple but linguistically

driven underlying events and an articulatory system that implements these events. Assuming that an

interaction of this nature does occur and it does play a critical role in pitch contour generation,

understanding the properties of both the linguistic events and the articulatory system then becomes

the key to the understanding of how pitch contours work in speech. In recent years, a number of

findings seem to have made such understanding easier than before. These include the findings about

contextual tonal variations, f0-syllable alignment, maximum speed of pitch change, the realization of

focus, downstep and declination, and the realization of purportedly toneless elements. In the

following, we will briefly review these findings and discuss how they may guide our understanding of

the data collected in the present study.

1.1. Contextual f0 variations in tone languages

In languages like Mandarin, Thai and Vietnamese, a lexical tone is carried by a syllable.3 A tone-

syllable combination can be said in isolation, e.g., as a monosyllabic utterance, or just as a

monosyllabic word or morpheme in citation form. Due to this property, the f0 contours of isolated

tones have been well established over the years (e.g., Bai, 1934; Pike, 1948; Chao, 1956, 1968;

Abramson, 1962, 1976; Lin, 1965, 1988; Howie, 1976; Shih, 1988). In recent years, much attention

has been given to variation of f0 contours of lexical tones when they are said in connected speech

(Han & Kim, 1974; Gandour, Potisuk & Dechongkit 1994; Lin & Yan, 1991; Wu, 1982, 1984,

1988, 1990; Xu, 1993, 1994, 1997, 1999). When a tone is produced next to other tones, its f0

contour deviates from the citation form, sometimes extensively. Figure 1 (a-c) shows examples of

carryover and anticipatory effects on H, R and F in Mandarin. Each graph in Figure 1 displays mean

f0 contours of four five-syllable sentences in which only the tone of the second syllable varies across

H, R, L and F. As can be seen, the f0 contour of the initial portion of the third syllable varies

extensively with the tone of the second syllable. In fact, they each seem to be transitions from the

end of the previous tone to the most appropriate f0 contour for the tone of the third syllable: high-

level for H, rising for R, and falling for F. As a result, the most proper contour of a tone is best

approximated in the later portion of the third syllable, while the influence of the preceding tone is

the most salient in the early portion of the syllable. Similar effects have been reported for other tone

languages (Gandour et al. 1994 for Thai, and Li & Lee, 2002 for Cantonese).

Insert Figure 1 about here

In Figure 1 the f0 contours of the first syllable also seem to vary to some extent with the tone of the

second syllable. But the variations are much smaller in amplitude than the carryover variation just

mentioned. Furthermore, these anticipatory variations are mostly dissimilatory in the sense that f0 is

raised by any tone of the second syllable that contains a low value: R, L or F. This kind of

anticipatory effect has been found in a number of languages (Gandour et al., 1992 and Gandour et al.

1994 for Thai, Hyman 1993 for Enginni, Mankon, and Kirimi, Laniran 1992 for Yoruba, Laniran &

Gerfen 1997 for Igbo; and Xu 1993 for Mandarin). The underlying mechanism of this effect is still

unclear, although there have been some hypotheses (Gandour et al., 1992; Gandour et al., 1994; Xu

1993, 1997). Thus the realization of a tone seems to vary asymmetrically with the surrounding

tones. The variation with the preceding tone appears assimilatory, whereas the variation with the

following tone appears dissimilatory.

1.2. Maximum speed of pitch change: the key source of what makes f0 contours nonequivalent to underlying targets

As can be seen in Figure 1, the assimilatory effect of a tone upon the following tone is largely in the

form of a seemingly long transition between the ending f0 of the preceding tone and underlying onset

pitch of the following tone. This may suggest that there is an articulatory constraint on how fast

speakers can change pitch. But it is also possible that speakers deliberately make these transitions

long. It would thus be helpful to know how much of the transitions is indeed directly due to the

constraint of maximum speed of pitch change. For this purpose, Xu and Sun (2002) assessed how fast

speakers can make pitch changes voluntarily. In the study, native speakers of Mandarin and English

produced alternate high and low pitches as rapidly as possible by imitating a number of fast synthetic

pitch alternation patterns. It is found that, for both English and Mandarin subjects, the maximum

speed of pitch change is positively related to the magnitude of pitch change, i.e., the larger the

magnitude, the faster the maximum speed of pitch change. It is also found that the minimum time it

takes to complete a pitch change is also positively related to the magnitude of the pitch change. The

linear equations for the speed and time of pitch change (for all subjects) as functions of pitch

change magnitude are shown in (1) to (4).

s = 10.8 + 5.6 d (1)

s = 8.9 + 6.2 d (2)

t = 89.6 + 8.7 d (3)

t = 100.4 + 5.8 d (4)

where s is the average maximum speed of pitch change in semitones per second (st/s), t is the amount

of time it takes (in ms) to complete the pitch change, and d is the size of the pitch change in semitones.
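
As a convenience for readers who want to evaluate these relations, equations (1) to (4) can be written as small Python functions. This is only a sketch: the function names are hypothetical and follow the equation numbers, since this passage does not say which condition each equation describes. The magnitudes in the loop are arbitrary examples.

```python
def max_speed_eq1(d):
    """Equation (1): average maximum speed (st/s) for a change of d semitones."""
    return 10.8 + 5.6 * d

def max_speed_eq2(d):
    """Equation (2): average maximum speed (st/s) for a change of d semitones."""
    return 8.9 + 6.2 * d

def min_time_eq3(d):
    """Equation (3): time (ms) needed to complete a change of d semitones."""
    return 89.6 + 8.7 * d

def min_time_eq4(d):
    """Equation (4): time (ms) needed to complete a change of d semitones."""
    return 100.4 + 5.8 * d

# Tabulate the four relations for a few arbitrary pitch-change magnitudes.
for d in (4, 6, 12):
    print(
        f"d = {d:2d} st: "
        f"s1 = {max_speed_eq1(d):5.1f} st/s, s2 = {max_speed_eq2(d):5.1f} st/s, "
        f"t3 = {min_time_eq3(d):5.1f} ms, t4 = {min_time_eq4(d):5.1f} ms"
    )
```

Both the speed and the time grow with d, which is the positive relation to magnitude reported in the text.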

Xu & Sun (2002) further compared the mean maximum speed of pitch change computed with

equations (1)-(4) to the maximum speed of pitch change reported for several languages, including

Mandarin, English, and Dutch. The two kinds of speed were found to be largely comparable for all

these languages, provided that the speed of pitch change was really the fastest possible in each case.

This finding indicates that on many occasions, the fastest speed of pitch change is indeed approached

in speech. This in turn suggests that our understanding of f0 contours in speech should always take

this articulatory constraint into consideration. For example, according to (3), it would take at least

142 ms to complete a 6-semitone pitch rise. Applying this to speech, it means that in a syllable with

a duration of 180 ms, more than half of the f0 contour in the syllable would have to be used for

completing the pitch rise from L to H even if speakers used their maximum speed of pitch change.

This would suggest that the long f0 transitions due to carryover influence are probably largely due to a

physical limitation that Mandarin speakers cannot overcome. Nor should English speakers be able to

avoid similar transitions when they change pitch from one level to another, since their maximum

speed of voluntary pitch change is essentially the same as that of Mandarin speakers (Xu & Sun,

2002).
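
The arithmetic behind the 6-semitone example can be written out explicitly, using equation (3) and the 180-ms syllable duration given in the text:

```latex
\begin{align*}
t &= 89.6 + 8.7\,d = 89.6 + 8.7 \times 6 = 141.8\ \text{ms} \approx 142\ \text{ms},\\
\frac{t}{180\ \text{ms}} &= \frac{141.8}{180} \approx 0.79 .
\end{align*}
```

On these figures, roughly four fifths of the syllable would be taken up by the rise itself, even at maximum speed.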

1.3. Pitch targets

Putting together the findings about contextual tonal variations and maximum speed of pitch change,

it becomes evident that observed f0 events in speech cannot be the underlying functional units per se.

Rather, they are more likely products of speakers’ efforts to implement some kind of underlying

pitch targets under various articulatory constraints. This view is summarized in Xu and Wang (2001)

as the pitch target implementation model of tone realization. According to this model, observed f0

contours are generated by interactions between underlying pitch targets and articulatory constraints.

The underlying targets can be either static or dynamic, as illustrated in Figure 2. The vertical lines in

Figure 2 indicate the onset and offset of two adjacent syllables. The dashed lines represent two

adjacent pitch targets: a dynamic [rise] and a static [low]. For Mandarin, these targets are assumed to

be associated with the R and L tones which are carried by the two syllables in the figure, respectively.

The solid curve represents the surface f0 contour, which is assumed to be the result of implementing

the pitch targets under various articulatory constraints, including the maximum speed of pitch

change. Due to the combined pressure to realize them both as rapidly as possible and as accurately as

possible, these targets are approached asymptotically, as is indicated by the shape of the solid curve

corresponding to either syllable 1 or syllable 2 in Figure 2.

Insert Figure 2 about here

Two other likely articulatory constraints are also incorporated in the model as illustrated in Figure 2.

First, although, at the abstract phonetic level, each target is assigned to a syllable without stringent

alignment requirement, its implementation nevertheless strictly coincides with the entire syllable,

i.e., starting at the syllable onset and ending at the syllable offset. This is due to a likely constraint

on synchronization of laryngeal and supralaryngeal movements (See 1.6. for more detailed

discussion). Second, due to inertia and friction, there is an acceleration period before the target-

approaching f0 movement reaches full speed. This is seen in the convex-up shape at the very

beginning of the solid curve in both syllables in Figure 2. The convex shape is more prominent in

syllable 2 because the inertia to be overcome has a positive velocity as a result of implementing the

target [rise] in the first syllable (Xu & Wang, 2001).

Note that in this model, unlike the Fujisaki model discussed earlier, there are no automatic forces

that return f0 to a neutral value. This is because empirical studies such as Gandour et al. (1992, 1994),

Xu (1993, 1997, 1998, 1999) and Li and Lee (2002) have found that the most appropriate f0 contour

of a tone is always best approximated in the final portion of a syllable, and that the subsequent f0

contour in the following syllable is always going toward the next tonal target rather than toward a

common neutral value, as can be seen in Figure 1.
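
The qualitative behaviour described in this section can be illustrated with a deliberately simplified Python sketch. It assumes a first-order exponential approach to each target, which is an illustrative simplification and not the formulation of Xu and Wang (2001). In particular, it does not reproduce the initial acceleration period. The approach rate, target values, and durations are all hypothetical.

```python
import math

RATE = 25.0  # hypothetical approach rate (1/s); larger values mimic stronger effort

def syllable_contour(start_f0, target_height, target_slope, duration,
                     rate=RATE, step=0.01):
    """Return f0 samples (in semitones) for one syllable.

    The target is the line target_height + target_slope * t (slope 0 = static).
    The gap between f0 and the target line decays exponentially from the
    syllable onset, so the target is best approximated near the syllable offset.
    """
    n = int(round(duration / step))
    gap0 = start_f0 - target_height
    samples = []
    for i in range(n + 1):
        t = i * step
        target = target_height + target_slope * t
        samples.append(target + gap0 * math.exp(-rate * t))
    return samples

def utterance_contour(initial_f0, syllables):
    """syllables: list of (target_height, target_slope, duration_s) tuples."""
    f0 = initial_f0
    contour = []
    for height, slope, duration in syllables:
        samples = syllable_contour(f0, height, slope, duration)
        contour.append(samples)
        # The next syllable starts where this one ends. There is no
        # return to a neutral value between syllables.
        f0 = samples[-1]
    return contour

# A dynamic [rise] followed by a static [low], as in the two-syllable example.
contour = utterance_contour(2.0, [(0.0, 30.0, 0.2), (-4.0, 0.0, 0.2)])
```

In the second syllable of this example, f0 starts from the high offset of the [rise] and only approaches [low] toward the end of the syllable. This mirrors the long carryover transitions discussed above.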

1.4. Focus

Focus, i.e., discourse/pragmatics motivated emphasis, is also known as focal prominence, contrastive

stress, emphatic stress, sentence-level stress, etc. The acoustic realization of focus has been

investigated by many studies (Bruce 1977; Bruce & Touati 1992; Caspers & van Heuven 1993;

Cooper, Eady & Mueller 1985; D'Imperio 2001; Eady & Cooper 1986; Eady et al. 1986; Gårding

1987; Jin 1996; Liberman & Pierrehumbert 1984; Pierrehumbert 1980; Prieto, van Santen &

Hirschberg 1995; Shih 1988). The general consensus has been that focus is conveyed mainly through

variations in f0. This may potentially be a problem for tone languages like Mandarin, because tones

are also conveyed mostly through f0. However, as found by Jin (1996) and Xu (1999), focus and

tones are realized concurrently in Mandarin by varying different aspects of f0 contours. In general,

tone identities are implemented as local pitch targets, while focus is implemented as regional pitch

range variations. As can be seen in Figure 3, the pitch range directly under focus is expanded; and the

pitch range after focus is suppressed (lowered and compressed). Furthermore, as can be also seen in

Figure 3, the pitch range before focus does not seem to deviate from the neutral-focus condition.
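
The three-zone pattern can be sketched in Python as an adjustment applied to per-syllable pitch values. This is a toy illustration only: the function name, the single-syllable focus index, and the scaling and shift values are hypothetical and are not estimates from any of the studies cited.

```python
def apply_focus(targets, focus_index, expand=1.5, post_shift=-3.0,
                post_compress=0.5, reference=0.0):
    """Adjust per-syllable pitch values (semitones) for a narrow focus.

    targets: pitch values relative to a reference level.
    focus_index: index of the focused syllable, or None for neutral focus.
    expand, post_shift, post_compress: hypothetical range parameters.
    """
    if focus_index is None:
        return list(targets)
    adjusted = []
    for i, value in enumerate(targets):
        offset = value - reference
        if i < focus_index:
            # Pre-focus: pitch range left intact.
            adjusted.append(value)
        elif i == focus_index:
            # On-focus: pitch range expanded around the reference.
            adjusted.append(reference + expand * offset)
        else:
            # Post-focus: pitch range lowered and compressed.
            adjusted.append(reference + post_shift + post_compress * offset)
    return adjusted

neutral = [4, -2, 4, -2, 4]
focused = apply_focus(neutral, focus_index=2)
```

With these made-up settings, the two pre-focus values are unchanged, the focused value moves from 4 to 6 semitones, and the post-focus values span 3 semitones instead of 6 at a lower level.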

Insert Figure 3 about here

Though there has not been a consensus on whether languages like English, too, implement focus with

three distinct pitch ranges, existing data suggest that this may be the case. For example,

Pierrehumbert (1980) examined the relative f0 heights of an early pitch accent and a later one in an

utterance as a function of focus location. Her data suggest that when there is an early focus in the

utterance, the f0 range in the later portion of the utterance is reduced, whereas the earlier f0 contour

is only slightly lowered when the focus is on a later pitch accent. In a series of studies by Cooper and

Eady and their colleagues (Cooper et al., 1985; Eady & Cooper, 1986; Eady et al., 1986), it was

found that the effect of a narrow focus4 in a declarative English sentence is to raise the f0 of the

focused word and to lower the f0 of the later words in the sentence. In contrast to the lowered f0 of

the post-focus words, however, f0 of the pre-focus words was found to remain much the same as in a

focus-neutral sentence. Gårding (1987) reported a similar asymmetry of f0 variation around focus.

1.5. Downstep and declination

Downstep refers to the phenomenon that in a tone string of H L H, the second H is lower in f0 than

the first H. It was first reported for African tone languages (Stewart, 1965, 1983; Meeussen, 1970;

Hyman, 1973). It was also reported for non-tone languages (e.g. Pierrehumbert, 1980 for English;

Poser, 1984 and Pierrehumbert & Beckman, 1988 for Japanese; and Prieto, Shih, & Nibert, 1996 for

Spanish). Declination refers to the phenomenon that the overall f0 level as well as the f0 peaks and

valleys becomes gradually lower over the course of an utterance. It is found for both tone languages

and non-tone languages. For non-tone languages, the phenomenon was first reported by Cohen and

't Hart (1967). For tone languages, it is reported as downdrift, as opposed to downstep, which is

more local (Hombert, 1974; Laniran & Gerfen, 1997). There have been various accounts of

declination. Some accounts attribute the effect to physiological factors, such as reduction of

subglottal pressure (Lieberman & Tseng, 1980). Other accounts attribute the effect to meaningful

linguistic structures. Liberman and Pierrehumbert (1984), for example, point out that many of the

physiological accounts were posited without analysis of the tonal components of the utterances. Xu

(1999) shows how observed downstep and declination can be decomposed into different contributing

factors when both tone and focus are systematically controlled. Through detailed analyses,

contributions of independent mechanisms can be identified. As shown in Figure 4, downstep seems to

stem from two mechanisms: anticipatory raising and carryover lowering. Both effects are exerted by

an L tone, which raises the f0 of the preceding H and lowers the f0 of the following H. The two effects

combined generate a negative tilt of the f0 surrounding the L. This effect can occur repeatedly when

there are more L tones intervening between the H tones. Such repeated application of anticipatory raising

and carryover lowering would thus generate a gradual f0 descent over the course of the entire

utterance. But downstep is only one of the sources of declination. Focus, with its characteristic on-

focus pitch range expansion and post-focus pitch range suppression, generates an additional

downtrend. This can be seen in Figure 4b, where the rather steep downtrend seems to be due to both focus

and downstep. There is also another known factor that can potentially generate an even greater

downtrend than downstep and focus. As shown by Lehiste (1975) and Umeda (1982), the

introduction of a new topic at the beginning of a paragraph may introduce an initial f0 peak almost

one octave higher than later f0 peaks in an utterance.

Insert Figure 4 about here

1.6. f0-syllable alignment

A number of recent studies have reported that certain f0 events such as peaks and valleys have a

relatively stable alignment with the onset or offset of the syllable. These findings come from two

major sources, research on tone languages (Kim 1999; Xu 1998, 1999, 2001a) and research on non-

tone languages (Arvaniti, Ladd and Mennen 1998; Caspers & van Heuven 1993; Ladd, Mennen and

Schepman 2000; Prieto et al., 1995). For non-tone languages, it has been found that f0 peaks and

valleys are aligned with both the onset and offset of a syllable carrying a pitch accent. Caspers and

van Heuven (1993) find that the onset of an “accent-lending” f0 rise is always aligned with the

syllable onset. Arvaniti et al. (1998) report that in Greek an f0 maximum “is very precisely aligned

just after the beginning of the first postaccentual vowel” (p. 23). Ladd et al. (1999) find that in

English pre-nuclear accent, the f0 peak occurs around 40 ms after the offset of the stressed syllable at

normal speech rate. Ladd et al. (2000) observe that in Dutch, the rising prenuclear pitch accent has

two different alignment patterns for the phonologically long and short vowels. When the vowel in

the accented syllable is phonologically long, the f0 peak usually occurs at the end of the vowel. When

a vowel in the accented syllable is phonologically short, however, the f0 peak usually occurs in the

following consonant. More interestingly, when the accented syllable contains the vowel /i/, which is

phonologically long but phonetically similar in duration to the short vowel /I/, the f0 peak also

occurs in the following consonant, though the location of the peak is still significantly earlier than

that with /I/. This seems to be evidence that f0 contour alignment is determined both by phonological

vowel length and by articulatory constraint on how fast pitch can be changed.

For tone languages, earlier reports of the experimental results have put much emphasis on the finding

that certain f0 peaks and valleys are consistently aligned with the syllable offset (Kim 1999; Xu

1998, 1999, 2001a). For example, Kim (1999) reports that in Chichewa f0 peaks occur consistently

right after the offset of the H-bearing syllable if the syllable is pre-penult. Xu (1998, 1999, 2001a)

reports that in Mandarin the f0 peak associated with R and f0 valley associated with F remain close to

syllable offset, and that the f0 peak associated with H and f0 valley associated with L generally occur

before syllable offset but also remain close to syllable offset. In contrast, certain earlier turning

points, e.g., f0 valley in R and f0 peak in F, occur near the center of the syllable, as can be seen in

(b)-(d) in Figure 1. This emphasis on the f0 alignment with the syllable offset may have

overshadowed another important aspect of the same set of findings. That is, there is also strong

evidence that the onset of the movement toward each pitch target coincides with the onset of the

syllable. In Figure 1, for example, regardless of the tone of the second syllable in each graph, the

movements toward the high-level, rising and falling contours appropriate for H, R and F,

respectively, always start from the onset of the third syllable. Furthermore, in Figure 1c a valley

consistently occurs around the boundary between syllables 3 and 4, and in Figure 1d a peak

consistently occurs soon after the boundary between syllables 3 and 4. Given the likely underlying

targets of the adjacent tones in both cases, these turning points seem to be where the implementation

of the tone of syllable 3 ends and that of syllable 4 begins. Similar evidence has also been seen in

Yoruba (cf. discussion in Xu, 2002).

The consistent alignment of f0 events with segmental elements of the syllable has been interpreted in

different ways. On the one hand, it has been interpreted as evidence that these f0 events are

deliberately targeted at specific locations in the syllable (D’Imperio, 2002; Ladd et al., 1999; Ladd et

al., 2000; Ladd & Schepman, 2003). On the other hand, Xu and Wang (2001) have argued that these

patterns should be interpreted as evidence that the underlying pitch targets have to be synchronously

implemented with the syllable, presumably due to the biomechanic constraint that concurrent motor

movements have to be fully synchronized, especially when they continually reoccur at high speed

(Kelso, 1984; Kelso et al. 1981; Kelso, Southard & Goodman, 1979; Schmidt, Carello & Turvey,

1990). Note that the first interpretation is heavily dependent on the assumption that speakers have

the freedom to align the f0 turning points anywhere they want. Based on accumulating evidence, as

argued in Xu (2002), speakers do not have such freedom. Therefore, it is unlikely that f0 turning

points are the properties of the intonational components themselves. Rather, they are only evidence

for the properties of the underlying components.

1.7. Neutral tone

In Mandarin, besides the four full lexical tones, there is also a fifth tone often known as the neutral

tone. This tone is similar to the unstressed syllable in English in terms of pitch specification because

it is generally believed to be toneless (Chao, 1968; Yip, 2002). Its f0 is believed to be totally

dependent on the tonal context, and due specifically to either spreading from the preceding tone or

interpolation between the preceding and the following tones (Chao, 1968; Shih, 1988; Yip, 1990). A

recent study, however, found that neither spreading nor interpolation is likely to be the mechanism

responsible for the f0 contours of the neutral tone (Chen & Xu, 2002). Figure 5 shows f0 contours of the

neutral tone as compared to full tones in similar tonal contexts. In Figure 5a, the F tone in syllable 2

immediately follows four different tones in syllable 1. In Figure 5b, three neutral tones occur before

the F tone. As can be seen in Figure 5b, the f0 of the first neutral tone indeed varies substantially

with the preceding tone, but so does the F tone in Figure 5a. What is different is that, whereas the f0

contours of the F tone in Figure 5a fully converge in the final portion of the syllable, the contours

remain well separated by the end of the first neutral-tone syllable, and they do not fully converge

even by the end of the third neutral-tone syllable.5 Chen and Xu (2002) conclude that these patterns

demonstrate that the neutral tone is not totally targetless. Rather, it seems to be associated with a

static target [mid] (or [mid-low]), judging from the fact that its f0 contours converge toward a value

lower than the high level of the F tone but higher than the low level of the L tone. What makes the

neutral tone different from all other lexical tones is that its target seems to be implemented with a

rather weak articulatory effort, as is evident from the much slower convergence than in a full tone.

Despite the weak effort, nonetheless, the f0 of the neutral tone does not seem to be affected by the

following full tone. In fact, as can be also seen in Figure 5b, it is the offset f0 of the neutral tone that

seems to determine the onset f0 of the following full tone.

In Figure 5b we can also see that the f0 peak occurs after the end of the H-tone syllable. This is in

contrast with earlier findings that the f0 peak associated with the H tone rarely occurs after the

syllable offset when followed by a full-tone syllable (Xu, 1999, 2001a). This "peak delay" is

understood as resulting from the neutral tone's weak ability to reverse the final f0 movement in the

preceding syllable due to its weak articulatory effort.

Given the seeming similarity between the Mandarin neutral tone and unstressed syllables in English, it

is conceivable that what has been found about the Mandarin neutral tone is applicable also to English.

Insert Figure 5 about here

1.8. The case of English

Currently, the most widely accepted phonological framework of American English intonation is the

Pierrehumbert model (Pierrehumbert, 1980; Pierrehumbert & Beckman, 1988), which is also known

as the autosegmental and metrical (AM) model (Ladd, 1996). The Pierrehumbert model assumes that,

underlyingly, English intonation consists of only two level tones, H and L. Strings of H and L

tones are organized into pitch accents, which are strung together linearly to form intermediate

phrases, which are then organized into intonational phrases. An intermediate phrase is marked by a

phrase accent at its edges (H- or L-), and an intonational phrase is marked by boundary tones at its

right edge (H% or L%). Pitch accents, phrase accents, and boundary tones are all linearly ordered and

can be combined into various mono-tonal and bi-tonal combinations: H*, L*, H*+L, H+L*, L*+H,

L+H*.

In each pitch accent the "starred" tone is assumed to be aligned with the stressed syllable while the

non-starred tone(s) with the unstressed syllable(s). The model further assumes that non-accented

words do not carry tones, and their f0 comes from interpolation between adjacent accents. In fact, all

the surface f0 contours are assumed to result from phonetic interpolation of tones which are the f0

turning points such as peaks and valleys. The interpolation is either straight-lined or curved. The

curved interpolation is so-called "sagging" interpolation, which makes the f0 of the unaccented

syllable(s) between two pitch accents "sag" like a rope hung between two trees (Pierrehumbert, 1980,

1981). The Pierrehumbert model views intonation as strictly linear in two senses. First, all global

shapes of f0 are the results of sequentially ordered local f0 registers. Second, with the exception of

non-categorical factors such as overall effort and emotional state, there is no temporal overlap of

tonal components. Thus at any given time interval, there can be one and only one tone. The only

exception is that both the intermediate phrase boundary tone and the intonational phrase boundary

tone, as developed later in the theory, may influence the realization of all the tones within the same

phrase (Pierrehumbert & Beckman, 1988). Finally, the Pierrehumbert model does not reserve any

special status for nuclear accent other than referring to it as the last pitch accent in an intonational

phrase.

The Pierrehumbert model of English intonation is similar to our current understanding of Mandarin

tone and intonation (Xu, 2001b; Xu & Wang, 2001) in that both recognize that underlying tonal and

intonational units are not equivalent to surface f0 contours. The two differ from each other,

however, in terms of how the underlying units are linked to surface f0 contours. The Pierrehumbert

model assumes that tones correspond directly to the extreme f0 points, i.e., peaks and valleys, and

that the rest of the f0 contours come from interpolation between the extreme points. Our

understanding of Mandarin tone and intonation, on the other hand, is that the underlying tonal and

intonational units are linked to f0 contours through articulatory approximation of simple, linear

pitch targets at linguistically specified pitch ranges with linguistically specified amount of effort, as

has been discussed in 1.3 and 1.7. Recent findings about the similarity between English and Mandarin

speakers in terms of maximum speed of pitch change (Xu & Sun, 2002) suggest that this

understanding is potentially applicable to English as well. First, since English speakers are also bound

by the same articulatory constraints that Mandarin speakers are subjected to, surface f0 contours of

English, including the turning points, should be treated also only as evidence for the underlying pitch

targets rather than as the targets themselves. Second, our recent findings about the acoustic

manifestation of tone and focus in Mandarin demonstrate that it is possible for multiple categorical

components of tone and intonation to co-occur at the same location in a sentence: a lexical tone is

not eradicated whether it is on-focus, pre-focus or post-focus, while focus itself is also effectively

conveyed (Xu, 1999). Since there are apparently multiple layers of information that need to be

conveyed through intonation in English, it is possible that in English, too, different intonational

components can occur concurrently, i.e., overlapping with one another in time. There are at least

three kinds of prominence that are conveyed mainly or partially through f0 in English, namely,

lexical stress, focus and pitch accent. Lexical stress is the relative prominence of individual syllables

in a word, which is lexically specified. Focus, as discussed in 1.4., is discourse/pragmatics motivated

emphasis, whose occurrence is required by the information flow of the dialogue or monologue. Pitch

accents, which occur on certain words in an utterance and make them more prominent than other

words, have often been equated with focus (and focus has often been referred to as the nuclear accent;

cf. Ladd, 1996, for detailed discussion). Ladd (1996) shows, however, at least impressionistically, that

pitch accents do not always coincide with focus. Hirschberg (1993) demonstrates that the most

important predictor of pitch accents (although both nuclear and pre-nuclear accents are included) is

part of speech, which can predict three quarters of the human-labeled pitch accents. Part of speech is

apparently distinct from discourse/pragmatics-motivated focus, as are most of the other factors

that Hirschberg (1993) found to further improve the prediction of pitch accents. Thus there is a need

to separate pitch accents from focus, and a need to find out whether and how the two are

differentially manifested through f0 contours.

1.9. Goal of the study

The foregoing discussion leads us to two critical questions about English intonation. First, what are

the underlying forms of pitch targets associated with local intonational components? Second, can

different types of intonational components co-occur in time without eradicating each other? And if

they can, how do they each manifest themselves effectively in terms of f0? The present study is

designed to address these questions from a rather rudimentary level. We will examine short

declarative sentences said with narrow focus at various locations in order to address the following

specific questions.

a) What are the pitch targets associated with local prominences in a declarative sentence: static

[high], or dynamic [rise] or [fall]?

b) Is focus realized with pitch specification only for the accented/stressed syllable or with pitch

specifications both for the accented/stressed syllable and for post-focus syllables (including the unstressed

syllable after the stressed syllable of the focused word)?

c) Do post-focus words have no pitch targets of their own and are thus implemented with only a

flat low f0 contour, or do they still have their own pitch targets, which are implemented with reduced

pitch range?

d) Do stressed syllables lose their original accents when under focus, or do they retain their

accents but with changed pitch range?

e) Do syllables between pitch accents carry any pitch targets, or is f0 only interpolated through

these syllables?

2. Method

2.1. Stimuli

The stimuli are short declarative sentences. To make extensive f0 alignment analysis possible, we

need to use words that have sonorant (preferably nasal) onsets and no coda consonants if

possible. The target sentences used are in the form of “Lee may know my niece.” The italicized

words vary in word length, stress pattern, phonological length of stressed syllable and focus status.

Word length varies from monosyllabic to trisyllabic. Stress pattern varies between word-final and

non-final. Phonological length of stressed syllable is either long or short. Focus status varies from

on-focus to pre-focus and/or post-focus, as focus location varies from sentence-initial to sentence-

medial to sentence-final. The following are the compositions of the stimulus sentences. Two words,

‘may’ and ‘my’, remain unchanged in all sentences, and they are usually unaccented (unless in special

contexts, which are not included in the present design). There are three sentence groups composed

for examining f0 contours at three locations in the sentence: beginning, middle, and end. In each

sentence group, the alternative words in the same location rotate to form different sentences.

Sentences in each group were produced in two focus conditions: no narrow focus, and focus on the

underscored word.

1. Lee / Nina / Lamar / Emily / Ramona may know my niece: 5 (words) × 2 (foci) × 7 (repetitions) = 70

2. Lee may lure / mimic / minimize my niece: 3 (words) × 2 (foci) × 7 (repetitions) = 42

3. Lee may know my niece / nanny / mummy: 3 (words) × 2 (foci) × 7 (repetitions) – 7 = 35 (see footnote 6)
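
The token counts in the list above can be checked with a short sketch. The dictionary layout and function names are illustrative only and are not taken from the study's materials; the subtraction of 7 tokens in group 3 is taken as given from the design rather than derived.

```python
# Illustrative sketch of the stimulus design; names are hypothetical.
GROUPS = {
    1: {"frame": "{} may know my niece",
        "words": ["Lee", "Nina", "Lamar", "Emily", "Ramona"]},
    2: {"frame": "Lee may {} my niece",
        "words": ["lure", "mimic", "minimize"]},
    3: {"frame": "Lee may know my {}",
        "words": ["niece", "nanny", "mummy"]},
}
FOCI = ("no narrow focus", "focus on target word")
REPETITIONS = 7
# Group 3 has 7 fewer tokens; the figure is copied from the design list, not derived here.
ADJUSTMENT = {1: 0, 2: 0, 3: -7}


def sentences(group):
    """Return the sentences formed by rotating the alternative words."""
    g = GROUPS[group]
    return [g["frame"].format(word) for word in g["words"]]


def token_count(group):
    """Words x foci x repetitions, plus the design's adjustment."""
    return len(GROUPS[group]["words"]) * len(FOCI) * REPETITIONS + ADJUSTMENT[group]


counts = {group: token_count(group) for group in GROUPS}
```

Under these assumptions, `counts` reproduces the per-group totals of 70, 42 and 35 tokens for one speaking rate.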

Focus is controlled by having subjects say the target sentences as answers to prompt questions that

ask about specific pieces of information available in the target sentences. This method has been used

successfully in previous studies (Cooper et al., 1985; Xu, 1999). The prompt questions are shown

below together with illustration of focus locations in exemplar target sentences.

Prompt: Target:

Who may know your niece? Lee may know my niece.

What may Lee do to your niece? Lee may lure my niece.

Who may Lee know? Lee may know my niece.

What did you say? Lee may know my niece.

The overall duration of these sentences was also manipulated by having subjects say the same

sentence at two different speaking rates: normal and fast. (A pilot test found that some speakers had

difficulty maintaining focus consistently at slow speaking rate. So, only two speaking rates were

used.) This is to elicit a wide range of duration variation in order to make f0 alignment analysis more

reliable.

2.2. Subjects

Eight native speakers of American English, aged 20-35, participated as subjects. Four of them were

females, and the others males. They were recruited from the Northwestern University campus and

were paid for their participation. None of them reported having any speech disorders. They all spoke

General American English without noticeable accents.

2.3. Recording Procedure

Recording was conducted in a sound-treated booth at the Speech Acoustics Laboratory in the

Department of Communication Sciences and Disorders at Northwestern University. The subject was

seated comfortably in front of a computer monitor in the booth. The microphone was placed by the

side of the monitor, approximately 1 foot away from the subject's lips. In each trial, the subject

pressed the “Next” button displayed on the screen and the target sentence was displayed on the

screen. At the same time, a prompt question was played through a loudspeaker. The subject then read

aloud the displayed sentence as a response to the prompt question. The prompt questions were

recorded at two speaking rates, normal and fast. Subjects were instructed to say the target sentence at

a speaking rate similar to that of the prompt question. They were also instructed not to pause in the

middle of a sentence. In case a mistake was made as judged by the experimenter, the subject was asked

to repeat the sentence. The sentences were presented in random order, and a different order was used

for each subject. Before the start of the real trials, the subject went through a number of practice

trials until he/she was familiar with the procedure.

2.4. f0 extraction

The acoustic analysis procedure was similar to those used in Xu (1997, 1998, 1999, 2001a). First the

digitized signals were converted to a format readable by programs in the ESPS/waves+ signal

processing software package (Entropic Inc.). Then individual target sentences were extracted and

saved as separate ESPS signal files. The program epochs in the ESPS package was then run to mark

every vocal cycle in the target words. After that, the marked signals were labeled manually in the

ESPS xwaves program for the onset and offset of each segment (both consonants and vowels) of the

target words using the xlabel program. Manual editing was performed to correct spurious vocal pulse

labeling by the epochs program (such as double-marking or vocal-cycle skipping).

The vocal pulse markings and segment labels for each utterance were saved by the xlabel program in

a text file. Those text files were then processed by a set of custom-written computer programs.

These programs first converted the duration of vocal cycles into f0 values, and then smoothed the

resulting f0 curve using a trimming algorithm that eliminated abrupt bumps and sharp edges (cf. Xu,

1999 for details).
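
As a rough illustration of the two processing steps just described, the sketch below converts vocal-cycle durations to f0 and removes isolated spikes. The spike rule and its threshold are assumptions made for this example; the actual trimming algorithm is the one described in Xu (1999), which may differ in detail.

```python
def periods_to_f0(cycle_durations_s):
    """Convert vocal-cycle durations (in seconds) to f0 values (in Hz)."""
    return [1.0 / d for d in cycle_durations_s]


def trim(f0, max_bump_hz=10.0):
    """Replace isolated spikes with the mean of their two neighbors.

    ASSUMPTION: a point counts as a spike when it lies above both neighbors,
    or below both neighbors, by more than max_bump_hz. Both the rule and the
    threshold are illustrative, not the study's parameters.
    """
    out = list(f0)
    for i in range(1, len(out) - 1):
        prev, cur, nxt = out[i - 1], out[i], out[i + 1]
        spike_up = cur - prev > max_bump_hz and cur - nxt > max_bump_hz
        spike_down = prev - cur > max_bump_hz and nxt - cur > max_bump_hz
        if spike_up or spike_down:
            out[i] = (prev + nxt) / 2.0
    return out
```

For instance, `trim([100, 101, 130, 102, 103])` replaces the 130 Hz point with the mean of its neighbors and leaves the other points unchanged.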

3. Analysis and Results

Recognizing that acoustic patterns do not resemble underlying phonetic targets directly, as discussed

in the Introduction, our goal is not to find direct "acoustic correlates" of either pitch targets or focus.

Rather, the goal is to find acoustic evidence for the underlying pitch targets and pitch range

specifications that are associated with pitch accents and focus. The search for the evidence is

guided by the following rationale, which is based mostly on what we have learned about articulatory

constraints on f0 production as discussed in the Introduction:

(1) It takes time to change pitch articulatorily. Thus a significant portion of observed f0 contours

must be transitions toward the intended underlying targets rather than being the targets themselves.

(2) Due to rigid coordination of laryngeal and supralaryngeal movements, there is little room for

speakers to make micro-adjustments of f0 alignment. And, based on recent findings (Ladd et al.,

1999; Ladd & Schepman, 2003), we take it as our working assumption that in English the syllable is

also the unit of pitch target alignment, as in Mandarin, unless proven otherwise.

(3) Based on (1) and (2), the f0 contour in the early portion of a syllable is understood as mainly a

transition toward the pitch target associated with the syllable, whereas the later portion of the f0 in

the syllable will be viewed as more directly reflecting the underlying target, especially if the syllable is

sufficiently long.

(4) It also takes effort to change pitch. Thus less effort should lead to slower pitch changes. The

reverse should also be true. That is, slower pitch movements during transitions should be an indication of

weaker effort rather than total absence of underlying targets.
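
Points (1), (3) and (4) can be illustrated with a toy simulation. The first-order exponential approach used here is purely an assumption for illustration, not a model proposed in this study, and the rate parameter is a hypothetical stand-in for articulatory effort.

```python
import math


def approach_target(onset_f0, target, rate_per_s, duration_s, n=11):
    """Toy f0 trajectory that moves from onset_f0 toward target.

    ASSUMPTION: exponential approach; rate_per_s stands in for effort.
    Returns n equally spaced samples over the syllable (units arbitrary,
    e.g. semitones).
    """
    return [
        target + (onset_f0 - target) * math.exp(-rate_per_s * duration_s * k / (n - 1))
        for k in range(n)
    ]


# Same onset, same target, same 200 ms syllable; only the "effort" differs.
strong = approach_target(5.0, 10.0, rate_per_s=30.0, duration_s=0.2)
weak = approach_target(5.0, 10.0, rate_per_s=8.0, duration_s=0.2)
```

In both trajectories the early samples are dominated by the transition from the onset value, and the later samples lie closer to the target. With the lower rate the trajectory is still visibly short of the target at syllable offset, even though a target is present.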

Guided by this rationale, our data analysis attempts to find answers to the questions listed in

Section 1.9. Table 1 lists these questions again together with the specific f0

events we were looking for in order to answer the questions.

Insert Table 1 about here

The following analysis consists of two phases. In phase I we perform visual inspection of the f0

contours. In phase II, we perform various quantitative analyses.

3.1. Phase I — Visual Inspection of f0 Contours

The first step in visual inspection was to check for outliers. The purpose was to exclude sentences

that were said with apparently wrong focus. f0 contours of the 7 repetitions of each sentence with the

same focus and speaking rate were displayed as illustrated in Figure 6, which displays the f0 contours

of the sentence “Nina may know my niece” with no narrow focus, produced by all subjects at

“normal” rate. These curves are displayed using normalized time, i.e., with the same number of

points taken from each syllable at equal proportional interval, e.g., 0, 1/20, 2/20, 3/20, …, 20/20. As

can be seen, displayed in this way, the f0 curves by each subject, except subject 2 (whose case will be

discussed later), are highly consistent across the seven repetitions. When an inconsistency was

noticed, the following criteria were used to determine if an outlier was involved and if it should be

excluded.

A repetition is excluded if and only if

a) it is obviously different from the rest of the repetitions, and

b) it has the wrong focus as judged auditorily by the authors

A repetition is not excluded if

a) it differs from other repetitions only in pitch range but not in perceived focus

Insert Figure 6 about here

Altogether, four repetitions from subject 2 (1.4% of the total, all from different conditions) and

one from subject 4 (0.3% of the total) were excluded.
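
The time normalization used for the displays in this phase (the same number of points per syllable, taken at equal proportional intervals) can be sketched as follows. Linear interpolation between measured samples is an assumption of this sketch, and the function names are hypothetical.

```python
def normalize_syllable(times, f0, n_intervals=20):
    """Resample one syllable at proportions 0, 1/n, 2/n, ..., n/n of its duration.

    times: strictly increasing sample times (first = syllable onset, last = offset)
    f0:    f0 values at those times (at least two samples)
    ASSUMPTION: values between samples are linearly interpolated.
    """
    start, end = times[0], times[-1]
    out = []
    j = 0
    for k in range(n_intervals + 1):
        t = start + (end - start) * k / n_intervals
        # Advance to the sample pair that brackets t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return out


def normalize_utterance(syllables, n_intervals=20):
    """syllables: list of (times, f0) pairs, one pair per syllable."""
    return [normalize_syllable(t, f, n_intervals) for t, f in syllables]
```

Because every syllable yields the same number of points regardless of its duration, repetitions of a sentence can be overlaid point by point.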

After excluding the outliers, for each subject, the repetitions of each sentence at each speaking rate

were averaged to obtain a mean f0 curve. Then the mean duration of each syllable across the

repetitions was computed. This mean duration was used in displaying the f0 contours of each syllable

in the sentences in the same focus condition. In this way we could compare the tonal contours of

different sentences without losing sight of the actual duration of each syllable. Figure 7 displays mean

f0 curves of all sentences produced at normal rate by all subjects except subject 2. The f0 curves of subject

2 were not included in the mean f0 curves because of their apparent inconsistencies with those of

other subjects. The open squares, circles and diamonds on the f0 curves indicate syllable boundaries.

For syllables with initial sonorants, the boundaries are set at the point where the spectral pattern

makes an abrupt shift into a typical nasal or lateral pattern (cf. Xu, 1999, for more detailed

description of the labeling procedure). For syllables with stops and fricatives, the boundaries are set at

the onset of closure or frication.
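
The averaging and display procedure described above (mean f0 curve over repetitions, plotted against mean syllable durations) might be sketched as below. The function names are hypothetical, and the sketch assumes that each repetition has already been time-normalized to the same number of points.

```python
from statistics import mean


def mean_curve(normalized_reps):
    """Point-by-point mean over repetitions of equal-length normalized curves."""
    return [mean(points) for points in zip(*normalized_reps)]


def display_times(mean_syllable_durs_ms, n_intervals=20):
    """Time stamps for plotting: each syllable's points span its mean duration."""
    times, onset = [], 0.0
    for dur in mean_syllable_durs_ms:
        times.append([onset + dur * k / n_intervals for k in range(n_intervals + 1)])
        onset += dur
    return times
```

Plotting the mean curve of each syllable against these time stamps keeps the averaged contours comparable across sentences while preserving the actual mean duration of each syllable.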

Insert Figure 7 about here

Through visual inspection, we made a number of direct observations on the f0 curves, most of which

are visible in Figure 7, but some, especially the individual differences, are not. First, we noticed fairly

consistent patterns in the height of f0 peaks of focused words as compared to the neutral-focus

sentence:

1. The f0 peak of a word is consistently higher under a narrow focus than in the neutral-

focus sentence.

2. The f0 peaks of all words after a narrow focus are lower than in the neutral-focus

sentence.

3. The f0 peaks of words before a narrow focus are lower than in the neutral-focus sentence

for some subjects (mostly females) but not for others (mostly males).

Second, we observed the following patterns and trends in terms of the location of f0 peaks in and

around focused words:

4. In all accented syllables, f0 starts to rise near the beginning of the syllable.

5. If the lexical stress is word final (Lee, Lamar or lure), the f0 peak usually occurs within

but near the end of the stressed syllable.

6. If the lexical stress is not word final (Nina, Emily, Ramona, mimic, minimize, nanny or

mummy), the peak mostly occurs in the unstressed syllable following the stressed syllable.

7. In a final monosyllabic word (niece), the peak occurs around the middle of the stressed

syllable.

Third, we observed some further details, some of which overlap with observations 4-7.

8. f0 peak occurs earlier when the vowel of the stressed syllable is phonologically (and

phonetically) long (Lee, Lamar, Nina, Ramona, lure, nanny) than when the vowel is

short (Emily, mimic, minimize, mummy).

9. f0 drop after a stressed syllable is faster in a focused word than in a non-focused word —

this may suggest post-focus suppression as an active force.

10. The scope of post-focus suppression seems to include not only all post-focus words but

also post-accent unstressed syllable(s) in the focused word.

Finally, we noticed a number of individual differences.

11. For subject 2, the first and second f0 peaks in all sentences without narrow focus occur much

later than those of the other subjects.

12. For subjects 1, 2, 4, 6, the f0 peak occurs right before the offset of the accented syllable

in “Lee”, “Lamar”, and “lure” when they are under focus. For subjects 3, 5, 7, 8, however,

the f0 peak occurs well before the offset of the accented syllable in these words.

13. While there are apparent on-focus pitch range expansion and post-focus pitch range

suppression, for subjects 7 and 8 at least, there is also visible f0 movement

corresponding to the accented words in the post-focus region. Faint traces of post-focus

accents can be also seen in the f0 curves of subjects 1, 4, 5 (in “know”, f0 rises after

syllable onset).

3.2. Phase II — Quantitative Analyses

In this section, we first report results of statistical analyses performed to verify the observations

described in the previous section. We then report results of further quantitative analyses aimed at

finding out the underlying mechanisms of the observed patterns. The following measurements were

taken from individual f0 curves produced by all eight subjects using a set of custom-written C

programs.

• Minf0 (st) — lowest f0 in the stressed syllable of the accented words (or in all words for some

analyses), measured in semitones with the lowest f0 of each subject as the reference.

• Maxf0 (st) — highest f0 in the stressed syllable of the accented words (or in all words for

some analyses), measured in semitones with the lowest f0 of each subject as the reference.

• Rise size (st) — difference in semitone between maximum f0 and minimum f0 in the stressed

syllable of an accented word

• Accent-dur (ms) — duration of the stressed syllable in an accented word

• Rise time (ms) — time interval between f0 minimum and f0 maximum in the stressed syllable

of an accented word

• Rise speed (st/s) = 1000 * Rise size / Rise time

• Maxf0-to-C2 — time interval between f0 maximum and onset of the first post-accent syllable

• C1-to-maxf0 — time interval between onset of the accented syllable and f0 maximum

• Minf0-to-C1 — time interval between f0 minimum and onset of the accented syllable

• Peak location = 100 × C1-to-maxf0 / accent-dur

• Valley location = 100 × C1-to-minf0 / accent-dur
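
Several of the measurements listed above can be sketched for a single stressed syllable as follows. This is not the custom-written C code used in the study. The semitone conversion 12 × log2(f0 / reference) is the standard definition, while the dictionary keys and the handling of a non-positive rise time are assumptions of this sketch.

```python
import math


def to_semitones(f0_hz, ref_hz):
    """Semitones relative to a reference (here, the subject's lowest f0)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def accent_measures(times_ms, f0_hz, ref_hz):
    """Measurements for one stressed syllable.

    times_ms: sample times; the first is the syllable onset (C1), the last its offset
    f0_hz:    f0 values at those times
    """
    st = [to_semitones(f, ref_hz) for f in f0_hz]
    i_max = max(range(len(st)), key=st.__getitem__)
    i_min = min(range(len(st)), key=st.__getitem__)
    accent_dur = times_ms[-1] - times_ms[0]
    rise_size = st[i_max] - st[i_min]
    rise_time = times_ms[i_max] - times_ms[i_min]
    return {
        "maxf0_st": st[i_max],
        "minf0_st": st[i_min],
        "rise_size_st": rise_size,
        "rise_time_ms": rise_time,
        # ASSUMPTION: speed is left undefined when the minimum does not precede the maximum.
        "rise_speed_st_per_s": 1000.0 * rise_size / rise_time if rise_time > 0 else None,
        "accent_dur_ms": accent_dur,
        "peak_location_pct": 100.0 * (times_ms[i_max] - times_ms[0]) / accent_dur,
        "valley_location_pct": 100.0 * (times_ms[i_min] - times_ms[0]) / accent_dur,
    }
```

For example, a syllable sampled at 0, 50, 100, 150 and 200 ms with f0 values of 110, 100, 150, 200 and 180 Hz and a 100 Hz reference has a 12 st rise over 100 ms, which gives a rise speed of 120 st/s and a peak location of 75%.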

3.2.1. Focus effect

We first address the issue of how focus is realized in terms of f0 of the accented syllable under focus.

Table 2 displays maxf0, minf0, rise size, rise speed, and accent-dur broken down according to focus

(on/none), speaking rate (normal/fast), accent location (word-final/word-nonfinal), and position

(word 1/word 3/word 5). Also displayed in the table are probability values resulting from four-factor

repeated-measures ANOVAs performed on the five measurements. (The effect of gender was found

to be non-significant for any of the dependent variables in a set of five-factor mixed-measures

ANOVAs. We therefore excluded it from the ANOVAs reported in Table 2.) As can be seen in Table

2, the effect of focus is highly significant for all dependent variables except minf0. Under focus,

maximum f0 becomes higher, the size of f0 rise becomes larger, the speed of f0 rise becomes faster,

and the duration of the accented syllable becomes longer. It is worth pointing out that although the

speed of f0 rise under focus increased drastically, it is still well below the maximum speed of pitch rise

reported by Xu and Sun (2002) for the corresponding rise size (23.4 st/s at 4.4 st vs. 10.8 + 5.6 × 4.4

= 35.4 st/s per Table VI in Xu & Sun (2002)). But this speed is similar to what was reported by Ladd

et al. (1999) and Ladd et al. (2000).
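
The comparison with the maximum speed of pitch rise can be reproduced with a few lines. The intercept and slope are the values quoted in the text from Table VI of Xu and Sun (2002); the function name and the ratio are added here only for illustration.

```python
def max_rise_speed(rise_size_st, intercept=10.8, slope=5.6):
    """Maximum rise speed (st/s) for a given rise size (st), per the linear relation quoted in the text."""
    return intercept + slope * rise_size_st


observed = 23.4              # st/s, rise speed under focus at a 4.4 st rise size (from the text)
limit = max_rise_speed(4.4)  # 10.8 + 5.6 * 4.4
ratio = observed / limit     # fraction of the maximum speed actually used
```

Under these figures the observed rise speed is roughly two thirds of the reported maximum for a rise of that size.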

Accent-dur is significantly longer at normal rate than at fast rate, as would be expected. The effect of

rate is also significant for rise size. But the difference between the two rates is very small.

Table 2 shows that the effect of accent location is significant on minf0, rise speed and accent-dur.

When the accent is word final, the duration of the accented syllable is increased by 66.6 ms, but the

speed of f0 rise is also increased. The increase in rise speed may seem to be related to the increase in

rise size, because, according to Xu and Sun (2002), rise speed is directly related to rise size. However,

the range of rise size increase in Table 2 is only 0.2 st, which, according to Table VI of Xu and Sun

(2002), can generate a speed difference of only 0.72 st/s, much smaller than the 2.7 st/s shown in

Table 2. This rise speed increase thus appears deliberate. That is, the underlying pitch target is more

like a [fall] in a word-final accent than in a word-nonfinal accent. However, there is a significant

interaction between accent location and position. The largest difference between word-final and

word-nonfinal accent is in word 5 (5.8 st/s), whereas in word 1 and word 3 the differences are 1.5 and

0.9 st/s, respectively. There is also a significant three-way interaction between focus, accent location

and position. The largest difference between word-final and word-nonfinal accent is in word 5 under

focus: 9.2 st/s, whereas in words 1 and 3, whether under focus or not, the largest difference is

2.3 st/s. Thus it seems that the sentence final position under focus is somewhat special. This will

become clearer in further analysis later.

The effect of position is significant for all dependent variables except rise size. As the position of

the accented syllable becomes later in a sentence, maximum and minimum f0 become lower, rise

speed becomes slower, and accent duration becomes longer.

Insert Table 2 about here

Overall, Table 2 shows that when the effects of rate, accent location and position are controlled,

under focus, the accented syllable becomes longer, the maximum f0 associated with the accented

syllable becomes higher, the size of the pitch rise becomes larger, and the speed of the pitch rise

becomes faster.

To examine the effect of focus on the f0 of post-focus words, a set of ANOVAs was performed on

the f0 of non-accented syllables and the results are displayed in Table 3.

Insert Table 3 about here

Table 3 shows the mean values of maximum f0 in each word after word 1 and word 3 when they are

under focus and when there is no narrow focus in the sentence. The post-focus words are also divided

depending on whether the accented syllable in the focused word is followed by an unstressed syllable

(adjacency: close for Lee, Lamar, lure; far for Nina, Ramona, Emily, mimic, minimize). As can be seen,

overall, the maximum f0 of words following the focused word is significantly lower than the same

words in the no focus condition whether focus is on word 1 or word 3. Also, the maximum f0 of post-

focus words is higher when the first post-focus syllable is immediately following the accented syllable

in the focused word than when separated by one or more unstressed syllables in the focused word.

However, there are significant interactions between focus and adjacency, and between adjacency and

position (p = 0.0133 and p < 0.0001). As can be seen in Figure 8, it is only when adjacency is close

that maximum f0 of the word immediately following the focused word is higher in the focused

condition than in the no-focus condition. From what we know about the mechanisms of f0

production and from what can be seen in Figure 7, this difference is mainly attributable to the fact

that it takes time for the f0 raised by focus to drop to the desired post-focus level.

Insert Figure 8 about here

The fact that maximum f0 of the syllable immediately following the accented syllable in the focus

condition is higher than in the non-focus condition, as seen in Figure 8 and Table 3, may suggest that

the scope of the focus includes the following unaccented syllable. The curves in Figure 7 indicate,

however, that there is a sharp drop in f0 even in the unstressed syllable following the accented

syllable. Figure 9 shows the mean f0 in semitones at different locations in the post-accent syllable

broken down by focus and post-accent stress. Only word 1 and word 3 sentences are included, because

in word 5 sentences, “niece” does not have any post-accent syllable. As can be seen in Figure 9, the

downward slope is shallower when the post-accent syllable is weak than when it is strong. At the same

time, post-accent f0 drop is faster when the accented syllable is focused than when it is not focused,

whether or not the post-accent syllable is an unstressed syllable within the focused word or a stressed

syllable in the following word. A four-factor (focus, accent location, position, location in syllable)

repeated-measures ANOVA finds a highly significant interaction between focus and position (p =

0.0068), confirming that f0 drops sharply within the post-accent syllable. (The effect of focus is

non-significant, but those of accent location and location highly significant (p < 0.0001 for both).

There is also a significant interaction between focus and accent location (p < 0.0001).) Hence, the

high maximum f0 of the first post-focus syllable is immediately followed by a sharp fall toward a

much lower f0. And this fall seems to be due to speakers' attempt to lower f0 immediately after the

focused, accented syllable, whether or not the following syllable is part of the focused word.

Insert Figure 9 about here

Table 3 also shows that maximum f0 of a word is not significantly different whether or not it

precedes a focus. This is despite the fact that for some subjects pre-focus syllables seem to have lower

f0 maxima than when there is no focus in the same sentence, as can be seen in Figure 7.

Insert Figure 10 about here

Another question that needs to be answered is whether post-focus words are totally accentless after a

narrow focus. Figure 10 shows the percentage of discernable post-focus f0 peaks and the rise size of these

peaks in semitones in sentences with initial focus (in sentence group 1 shown in 2.1.1.). A peak is

defined as discernable if there is an f0 point between the onset and offset of the words “know” and

“niece” that is higher than both the starting and ending f0 of the word. The graph on the left

indicates that there is a greater number of discernable peaks when there is no focus than when either

word 1 or word 3 is focused. A three-factor repeated measures ANOVA with focus, rate and position

as independent variables finds the effect of focus to be highly significant (p < 0.0001), but the effect

of position non-significant. Nevertheless, the lowest percentage of peak occurrence in post-focus

condition is still nearly 60%. The graph on the right in Figure 10 shows that there is again a

difference in rise size between the focus and no focus condition. However, a three-factor repeated

measures ANOVA finds the effect of focus to be non-significant, but the effect of position significant

(p = 0.0088). There is also no significant interaction between focus and position. Note that although

the mean rise size is quite small overall, the rise occurs in a declining f0 contour. So the size of the

accent is actually larger than the rise size seems to suggest directly.
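
The peak criterion and the rise-size measure described above can be illustrated with a minimal Python sketch. The function names and the sample f0 values are hypothetical and are not taken from the analysis scripts used in this study. The semitone conversion is the standard 12·log2 ratio, and rise size is taken here as the f0 maximum relative to the minimum f0 of the word, which is an assumption based on how rise-size is described for Figure 15.

```python
import math

def hz_to_semitones(f_hz, ref_hz):
    """Standard conversion of a frequency to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)

def has_discernible_peak(f0_track):
    """True if some interior f0 point is higher than both the starting
    and the ending f0 of the word."""
    if len(f0_track) < 3:
        return False
    start, end = f0_track[0], f0_track[-1]
    return any(v > start and v > end for v in f0_track[1:-1])

def rise_size_st(f0_track):
    """Assumed rise size: f0 maximum in semitones relative to the
    minimum f0 of the word."""
    return hz_to_semitones(max(f0_track), min(f0_track))

# Hypothetical f0 samples (Hz) across one post-focus word
declining = [180.0, 176.0, 171.0, 165.0]   # no interior point above both ends
small_peak = [180.0, 184.0, 178.0, 170.0]  # one interior point above both ends
```

Under this sketch, `has_discernible_peak(small_peak)` is true and `has_discernible_peak(declining)` is false. The rise-size value is only meaningful for words in which a peak is discernible.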

Finally, previous studies have found that sentence-final pitch accents do not differ in f0 due to focus.

To test whether this is the case in the present data, a four-factor repeated measures ANOVA was

performed on maximum f0 of word 5. The effect of focus turns out to be significant, with maximum

f0 being higher under final focus (9.3 st) than when there is no narrow focus (6.9 st), F(1,6) = 14.793,

p = 0.0085.

3.2.2. f0 events associated with accented syllables

As discussed earlier, although f0 peaks and valleys are not necessarily the critical acoustic correlates

of a linguistic tonal unit such as lexical tone, pitch accent or focus, analysis of their alignment

relative to segmental/syllabic unit may help us identify the underlying pitch targets associated with

the accented syllables. Table 4 displays different kinds of pitch targets and their f0 alignment with the

associated syllables according to previous studies (Xu, 1998, 1999, 2001a).

Insert Table 4 about here

3.2.2.1. Alignment of f0 peaks

Five factors that may potentially contribute to f0 peak alignment are controlled: speaking rate (rate),

focus, position in sentence (position), location of accented syllable within word (accent location),

and length of accented syllable (accent length — phonological length of accented syllable: long —

"Lee", "Nina", "Lamar", "Ramona", "lure", "niece", "nanny"; short — "Emily", "mimic",

"minimize", "mummy"). The last two factors, however, are not independent of each other in the

data set. Their effects therefore have to be examined separately. Also, because stress cannot be on a

word-final open syllable with short vowel, we excluded words with short accented syllables ("Emily",

"mummy") when examining the effect of location of accented syllable within word, and excluded

words with final stress when examining the effect of length of accented syllable. And, because accent

location and accent length fully coincide at the word 3 position, this position is not included in the

alignment analysis reported next. The alignment patterns in those words, nevertheless, did conform

to the same pattern as in the other two positions. Two separate sets of four-factor repeated-measures

ANOVAs were performed and the probability values together with the means are displayed in the

upper and lower halves of Table 5, respectively. Also shown in Table 5 are mean values of maxf0-to-

C2 and peak location broken down by focus, rate, position, accent location and accent length.

From Table 5 we can see that the effects of focus and rate are significant when maxf0-to-C2 and

peak location are broken down by accent length, but not when they are broken down by accent

location. But it is interesting that it is in the no-focus condition that the delay is greater, whether or not the

difference is statistically significant. This suggests that the underlying target under focus is more like

a [fall] than a [high].

In sharp contrast to those of focus and rate, the effects of accent location, accent length and

position are highly significant on both dependent variables. Figure 11 shows the mean values of the

two dependent variables broken down further by position, accent location and accent length. In the

figure, we can see that the tendency for f0 peaks to occur later is related to three conditions: when

accent location is not word-final, when accent length is short, and when the accented word is not

sentence final. This is true in both the upper and lower panels in the figure.

Interestingly, looking at Table 2 again, we notice that duration of the accented syllable increases

with position in an orderly manner: word1 < word3 < word5. This agrees with the trend in the right

panel of Figure 11 quite well. It is possible that all these situations are related to shorter duration of

the accented syllable, and that it is the shortened duration of the accented syllable that pushes the f0

peak location rightward. To verify this possibility, we recomputed mean duration of the accented

syllable in word 1 and word 5 according to focus, rate, accent location and accent length. They are

displayed in Figure 12. The graph on the left excludes data from words with short accented syllables;

and the graph on the right excludes data from words with word-final accented syllables. In both

graphs, a general trend can be seen: the longer the duration of the accented syllable, the earlier the

location of the f0 peak. In general, it is when the duration of the accented syllable is shorter than 200

ms that the f0 peak occurs in the following syllable, with the exception of the sentence final

position. It seems that f0 peaks tend to occur earlier in the sentence final position, especially when

the accented syllable is sentence-final. Further examinations need to be done.

Insert Table 5 about here

Insert Figure 11 about here

Insert Figure 12 about here

3.2.2.2. More detailed analyses / f0 peak alignment in relation to duration of accented syllables

The analyses so far have revealed certain gross patterns related to focus and accent realization. The

sources of these patterns and their variations, however, are still not clear. As discussed in regard to

Figure 12 and Table 5, we are still not certain what determines the exact location of the f0 peak

associated with an accented syllable, i.e., whether it occurs before or after the offset of the accented

syllable and how far away the peak is from the syllable offset. We have seen, however, that peak location

is related to whether the accented syllable is word-final (accent location) and whether the vowel in

the accented syllable is phonologically short or long (vowel length). To determine which of the two

factors is dominant, a set of regression analyses were performed using accent duration as predictor

and maxf0-to-C2 as dependent variable. Again, to separate the effect of accent location, we excluded

short-vowel syllables, since there were no short-vowel word-final accented syllables in the data. And,

to examine the effect of vowel length, we excluded word-final accents. Figure 13 displays the

regression results. Because accent location and vowel length fully coincide in word 3, this position is

not included in the figure. The upper panel of Figure 13 shows the values of r2, and the lower panel

shows the slope of the regression line. In the left-hand graphs, the results are broken down by focus

and accent location, and in the right-hand graph the results are broken down by focus and length of

accented vowel. As can be seen, when the accent is word final (Lee, Lamar, niece) and on-focus, the

r2 values are quite large: 0.356 and 0.381 for word 1 and word 5, respectively. The corresponding

slopes of regression lines are large and positive (0.351, 0.364), indicating that the f0 peak occurs

increasingly earlier in the accented syllable as the syllable becomes longer. As was seen in Figure 11,

the relative location of the f0 peak is 80% and 56% of the syllable duration in word-final accents

when word 1 and word 5 are under focus. The difference in relative peak location between word 1 and

word 5 does not seem to have much to do with the slightly longer duration of the accented syllable in

word 5 than in word 1. This is because word 5 has much earlier peaks than word 1 even when the

durations of the accented syllables are comparable. For example, according to the regression

equations, when accent duration is 200 ms, maxf0-to-C2 is 19.2 and 37.0 ms for no-focus and focus

conditions in word 1, respectively, but 99.9 and 84.0 ms for no-focus and focus conditions in word 5,

respectively. This indicates that f0 peaks have a stronger tendency to occur early in sentence final

position than in sentence-initial position. When the sentences have no narrow focus, the r2 values

are all very small except if the accent is sentence final (niece): 0.455. In the latter case, the slope of

the regression line is also positive, indicating that the peak location moved earlier as the syllable

duration became longer. The right-hand graphs of Figure 13 show the regression results broken down

by focus and vowel length. Note that none of the r2 values is greater than 0.2. This indicates that the

location of f0 peaks is not closely related to the duration of the accented syllable with either vowel

length.
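
The kind of regression reported here (accent duration as predictor, maxf0-to-C2 as dependent variable, summarized by r2 and slope) can be sketched in Python as follows. This is an illustration only: the data points are invented, `fit_line` is a hypothetical helper, and the sign convention (maxf0-to-C2 is positive when the peak precedes the syllable offset) is inferred from the text rather than stated in it.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.
    Returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r2 is the squared Pearson correlation; defined as 0 if y is constant
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2

# Invented tokens: accent duration (ms) and maxf0-to-C2 (ms)
accent_dur = [160.0, 180.0, 200.0, 220.0, 240.0]
maxf0_to_c2 = [20.0, 30.0, 33.0, 45.0, 48.0]

slope, intercept, r2 = fit_line(accent_dur, maxf0_to_c2)

# Predicted maxf0-to-C2 at a reference accent duration of 200 ms,
# analogous to the comparison between word 1 and word 5 in the text
predicted_at_200 = intercept + slope * 200.0
```

A positive slope in this sketch corresponds to the pattern described for word-final accents under focus: the longer the accented syllable, the earlier the peak relative to the syllable offset.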

Insert Figure 13 about here

Insert Figure 14 about here

To verify if what we have seen in sentence initial and sentence final positions also occurs in the

sentence medial position, Figure 14 displays regression results for word 3. The only sizeable r2 value

for word 3 is for the word-final/long-vowel syllable: r2 = 0.477. This indicates that it is only when the

accented vowel is long and/or word-final and when it is under focus that the f0 peak is affected by

duration of the accented syllable. When the accented vowel is short and non-word-final, the mean

values of maxf0-to-C2 are negative whether on focus or not: –29 ms and –18 ms, indicating that the

peaks mostly occur after the offset of the syllable. This is in contrast to word 5 as shown in

Figure 11, where maxf0-to-C2 is mostly positive both when the accent is short and when it is non-

word-final.

To summarize, (a) when neither under focus nor sentence final, the f0 peak associated with an

accented syllable occurs close to and before the syllable offset if it is not followed by an unstressed

syllable, but close to and after the syllable offset if it is followed by an unstressed syllable; in neither

case does the peak location vary systematically with syllable duration. (b) If the accented syllable is

sentence final or if it is both word final and under focus, the f0 peak occurs well before the offset of

the accented syllable and its location becomes increasingly earlier relative to the syllable offset when

the duration of accented syllable increases. Implications of these patterns will be discussed in the

General Discussion.

3.2.2.3. Alignment of f0 valleys

As shown in Table 4, determination of the basic form of the pitch target associated with an accent

requires not only information about alignment of f0 peaks, but also that of f0 valleys. In particular,

knowing the f0 valley alignment can help us distinguish [rise] from [high] and [fall], and [rise] from

[low]. The analysis of f0 peaks has already indicated that the pitch target in the accent in word 1, 3,

or 5 is unlikely to be [low], because it is never the case that f0 maximum occurs at the beginning of

the accented syllable. The preceding analyses also suggest that the target is either [high] or [fall]

depending on a number of factors. However, there is another possibility that has not been fully tested,

i.e., an f0 rise during a syllable could also be due to a [rise]. There has already been some evidence that

this is not a highly likely target, because f0 peaks mostly occur within the accented syllable unless the

syllable duration is very short. This is in contrast to the R tone in Mandarin, which presumably

carries a [rise], where the peak always occurs after the end of the syllable, regardless of syllable

duration. Further verification of this understanding may be obtained by analysis of f0 valley

alignment. Table 6 compares the effects of several factors on two measurements of f0 valley

alignment, namely, C1-to-minf0 and valley location (100 x C1-to-minf0 / accent-dur). In the table

we can see that the largest mean value of C1-to-minf0 is 19.2 ms in word-final accents. But even this

value corresponds to only 7.0% of the duration of the accented syllable. There are only two

marginally significant differences: between normal and fast speaking rates and between word-final

accent and word-nonfinal accent. However, the differences in the means are so small (7% vs. 9%) that

we can still regard them as not very different. In general, therefore, the pitch target is unlikely to

be [rise] in the accented syllable in the sentences examined in the present study.
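
As a small worked illustration of the valley location measure defined above (100 x C1-to-minf0 / accent-dur), the following Python sketch computes it for a single token. The function name and the example values are hypothetical; they are not measurements from Table 6.

```python
def valley_location_percent(c1_to_minf0_ms, accent_dur_ms):
    """Valley location = 100 * C1-to-minf0 / accent-dur: how far into the
    accented syllable the f0 minimum occurs, as a percentage."""
    if accent_dur_ms <= 0:
        raise ValueError("accent duration must be positive")
    return 100.0 * c1_to_minf0_ms / accent_dur_ms

# Hypothetical token: f0 minimum 20 ms after C1 onset in a 250 ms syllable
example_location = valley_location_percent(20.0, 250.0)
```

A small percentage of this kind means that the f0 minimum lies very near the syllable onset, which is the pattern the text takes as evidence against a [rise] target.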

Insert Table 6 about here

3.2.3. f0 events during unaccented syllables

The f0 contours of an utterance consist of not only prominent peaks, but also curves in between the

peaks. As discussed in the Introduction, there are different theories about how contours between

peaks are formed. They can be divided into three major groups: spreading, interpolation and target

implementation. Spreading is mostly from left to right. Interpolation involves tones both before and

after the f0 contours at issue. All three hypotheses can be verified by examining the influence of the

f0 events upon each other. An interpolated curve should equally reflect the influence of the preceding

and following pitch targets, whether the interpolation is linear or "sagging" (Pierrehumbert, 1980,

1981). Spreading, on the other hand, implies that there should be 100% influence from the tone on

the left. Target implementation implies influence mainly from the left, which would diminish over

time as the target is being approached. To examine the relative influence of the preceding and

following accents on a non-accented syllable, we performed several sets of regression analyses on the

f0 height at different locations in the unaccented syllable immediately after the accented syllables.

Figure 15 displays the results of the regression analyses.

Insert Figure 15 about here

In Figure 15, the regressor is rise-size (in semitones relative to the minimum f0 of the word), and the

dependent variables are post-pitch at 50, 100, 150 and 200 ms after the accented syllable. Post-pitch

is computed by subtracting minimum f0 of the word from the f0 values at the four locations in the

post-accent syllable. The graphs on the left show r2, which indicates how much of the variation in

post-pitch can be accounted for by the height of the preceding accent as represented by rise size. The

graphs on the right display the slopes of the regression lines, indicating whether post-pitch varies in

the same or opposite (i.e., when the slope is negative) directions as rise size. As can be seen, overall,

post-pitch at 50 ms after the accented syllable can be well predicted by rise-size in word 1 and

word 3 positions. The prediction is not as good in word 5, although it can still account for 25.6% and

34.2% of the variation for the no-focus and post-focus conditions, respectively. The predictability

reduces over time. But the rate of reduction is faster when the post-accent syllable is stressed ("may"

after "Lee" and "Lamar") than when it is unstressed (in "Nina", "Ramona" and "Emily"). The slope

of the regression line also changes over time, but it remains positive in both word 1 and word 3

positions while becoming negative at 100 ms post-accent in word 5. The negative slope of the

regression line at sentence final position seems to indicate an extra effort to implement something

that is quite independent of the preceding accent.
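
The post-pitch measure and its relation to rise-size at different delays can be sketched in Python as below. This is a minimal illustration under stated assumptions: the tokens are invented, the helper names are hypothetical, and only two of the four delays are shown. It is meant to show how a declining r2 across delays would indicate that the influence of the preceding accent diminishes over time.

```python
def post_pitch(f0_st, word_min_f0_st):
    """Post-pitch: f0 at a post-accent sample point minus the minimum f0
    of the word (both in semitones)."""
    return f0_st - word_min_f0_st

def slope_and_r2(x, y):
    """Least-squares slope of y on x and the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy ** 2) / (sxx * syy)

# Invented tokens (semitones): rise size, word minimum f0, and f0 sampled
# at 50 ms and 200 ms after the accented syllable
tokens = [
    {"rise": 3.0, "min_f0": 5.0, "f0_after": {50: 7.6, 200: 5.9}},
    {"rise": 4.5, "min_f0": 5.2, "f0_after": {50: 9.0, 200: 5.8}},
    {"rise": 6.0, "min_f0": 4.8, "f0_after": {50: 10.1, 200: 5.9}},
    {"rise": 7.5, "min_f0": 5.1, "f0_after": {50: 11.9, 200: 5.6}},
]

rise = [t["rise"] for t in tokens]
results = {}
for delay in (50, 200):
    pp = [post_pitch(t["f0_after"][delay], t["min_f0"]) for t in tokens]
    results[delay] = slope_and_r2(rise, pp)  # (slope, r2) at this delay
```

In these invented data, rise-size predicts post-pitch much better at 50 ms than at 200 ms, which is the qualitative pattern expected if the preceding accent's influence fades as a new target is approached.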

Insert Figure 16 about here

In Figure 16, the regressor is again rise-size, but the dependent variables are pre-pitch at 50 ms and

100 ms prior to the onset of the accented syllable. Similar to post-pitch, pre-pitch is computed by

subtracting minimum f0 of the word from the f0 values at three locations in the pre-accent syllable.

The graphs on the left again show r2, while the graphs on the right show the slope of the regression

line. In these graphs, pre-pitch is overall poorly predicted by rise-size. Only in word 2 are there r2

values over 0.2, and those are at locations farthest away from the accented syllable. Since they

occurred only in two conditions in word 2, it is difficult to determine if these higher r2 values reflect a

real anticipatory influence or are merely accidental. Thus there appears to be little consistent

influence of the accented syllables on the f0 of the pre-accent syllables.

The results of the regression analyses in this section do not seem to favor either the spreading or

interpolation account for the f0 of the unaccented syllables. Rather, they seem to agree with the

prediction of the target implementation account, i.e., substantial influence from the preceding accent

which reduces over time and little influence from the following accent. Also consistent with the

target implementation account is the finding that a preceding accent exerts more influence on the f0

of an unstressed syllable than on that of a stressed syllable. An additional finding is that the influence of the

preceding accent is very quickly overcome if the unaccented syllable is sentence final, and the slope

of the regression line actually becomes negative 100 ms after the preceding accent. This seems to

suggest an extra effort to implement something with an independent identity.

3.2.4. The case of subject 2

Most of the analyses so far have excluded data from subject 2 because of her extensive inter-trial

inconsistency in terms of the basic f0 patterns. As can be seen in Figure 6, where all the other subjects

would have a high f0 value, subject 2 sometimes has a low f0 value, and vice versa. Informal listening

to her sentences suggested that she may have used different tonal patterns for accented syllables as

well as unaccented ones. Upon closer inspection, we noticed that such alternate f0 patterns in terms

of location of peaks and valleys occurred in other sentences as well. When using peak location

patterns as reference, we can see what is happening when there is no narrow focus in the sentence:

the first f0 peak often occurs much later, mostly in the middle or later in the word "may" (59 out of

98 trials). In contrast, the first f0 peak always occurs well before or around the onset of "may" for

other speakers. More consistently, the f0 contour in the accented syllable in word 3 usually assumes a

sharp fall toward a valley near the syllable offset (69 out of 98 trials), indicating that this speaker

actually tried to implement a [low] pitch target for this syllable.

For whatever reason, subject 2 seems to have assigned a [low] pitch target to the second accented word

in these sentences, and a high pitch target to the last accented word. This is apparently different from

the other 7 subjects examined in the present study. Such free alternation of high and low tonal targets

has been suggested before by Goldsmith (1999). Since it occurred in only one subject in the present

study, no definitive conclusions can be drawn about it. Also interestingly, the alternation of peak

locations in this subject occurs only in sentences without a narrow focus on word 1 or word 3.

Whenever there is a narrow focus on word 1 or word 3, the location of the f0 peak becomes quite

consistent, and it is not different from the general peak location patterns of other speakers as

shown in Figure 7: the first f0 peak never occurs later than the middle of the word "may", and there is

never a sharp f0 fall in the accented syllable in word 3 toward a valley near the syllable offset. This

indicates that there is no [high] or [fall] pitch target for this syllable when focus is on word 1 or 3.

4. General Discussion

The present study is motivated by a number of recent advances in the understanding of tone and

intonation. First, there has been converging evidence that surface f0 contours do not directly

resemble underlying pitch components that function in speech intonation. Articulatorily, not only is

the larynx unable to change f0 instantaneously, but also it is unable to change f0 fast enough to render

the transitional movements negligible (Xu & Sun, 2002). In fact, at a normal speaking rate, the

observed f0 contours are likely to be mostly transitions toward various ideal targets rather than being

the targets themselves (Xu, 1997, 1999). Biomechanically, there is a strong constraint to fully

synchronize related movements (Kelso et al., 1979; Kelso, 1984; Kelso et al., 1981; Schmidt et al.,

1990), and there has been evidence that in the case of tone, a tonal target is synchronously

implemented with a syllable (Xu, 1998, 2001a, 2002; Xu & Wang, 2001). Such synchronization

implies that in each syllable, the earlier portion of the f0 contour is mostly transitional, whereas the

later portion is closer to the ideal target. The first task of the present study is therefore to search for

evidence of the underlying forms of the local pitch targets, assuming that observed f0 movements are

mostly transitions toward these targets. The present study is also motivated by the recent finding

that different layers of information can be conveyed simultaneously by f0 in a tone language (Xu,

1999). The second task of the present study is therefore to examine whether and how such

simultaneous conveyance of intonational information by f0 is also done in a non-tone language like

English. Finally, recent research on Mandarin has suggested that syllables previously thought to be

"toneless" are likely produced with local pitch targets just as syllables with full tones, and that their

highly variable f0 contours are probably due to the weak effort used in implementing the targets

(Chen & Xu, 2002). The third task of the present study is therefore to investigate if this mechanism

also underlies the generation of f0 contours of the "accentless" syllables in English.

4.1. Local pitch targets

As mentioned above, our search for the underlying targets of pitch accents in English is guided by two

newly-gained understandings: (a) that an f0 transition toward any target takes time and it often spans

much of the duration of a syllable, and (b) that the implementation of a pitch target is synchronized

with the syllable associated with the target. The following summarizes the local pitch targets of

syllables together with specific evidence reported in section 3.

1. No [rise] found.

When a syllable carries a [rise], (a) f0 falls first and then rises in the later portion of the

syllable, and the rise accelerates towards the end of the syllable; (b) an f0 minimum occurs

well before syllable offset and its location varies systematically with syllable duration, and (c)

an f0 maximum consistently occurs immediately after the accented syllable, regardless of the

vowel length. Our analysis, however, did not find these characteristics in any of the syllables

we examined. Instead, the results of the analyses in 3.2.2.3. show that f0 minima consistently

occur very early in the syllable (7-9% into the syllable: 3.2.2.3). Thus there is no evidence

for the existence of [rise] in the pitch accents of the short declarative sentences we

examined.

2. [high] in non-focused accented syllables that are not sentence-final, and in focused accented

syllables that are not word-final.

Evidence: (a) f0 contours in these syllables mostly rise throughout their duration but slow

down toward their offset, especially when the duration is relatively long (i.e., when the vowel

is phonologically long) (Figure 7). (b) f0 peaks occur around the end of these syllables, but

their exact locations vary: before the syllable offset when followed by a stressed syllable, but

after the syllable offset when followed by an unstressed syllable (Figure 11). However, the

location of the f0 peaks does not vary systematically with the duration of the syllable

(Figures 13 & 14). (c) f0 valleys consistently occur immediately after the syllable onset (Table

6).

3. [fall] in focused word-final accent and in non-focused sentence-final accent.

Evidence: (a) f0 contours in these syllables rise first and then fall in the later portion of the

syllable (Figure 7); the speed of the rise is faster than that in non-final accents (Table 2); and

the speed of the fall accelerates toward the end of the syllable (Figure 7). (b) f0 peaks occur

well before the syllable offset, and their locations vary quite systematically with syllable

duration (Figure 11). (c) f0 valleys consistently occur immediately after the syllable onset

(Table 6).

4. [mid] in all unaccented syllables. The evidence for this will be discussed later in 4.3.
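
The assignment of local pitch targets summarized in items 2 to 4 above can be restated compactly as a decision rule. The Python sketch below is only a paraphrase of that summary for the short declarative sentences examined here. The function name and its boolean parameters are hypothetical, and the rule is not offered as a general model of English intonation.

```python
def local_pitch_target(accented, focused=False, word_final=False,
                       sentence_final=False):
    """Return the local pitch target label for one syllable, following
    items 2-4 of the summary above.

    accented       -- the syllable carries a pitch accent
    focused        -- the word containing the syllable is under narrow focus
    word_final     -- the accented syllable is the last syllable of its word
    sentence_final -- the accented syllable is the last syllable of the sentence
    """
    if not accented:
        return "mid"                                # item 4
    if focused:
        return "fall" if word_final else "high"     # items 3 and 2
    return "fall" if sentence_final else "high"     # items 3 and 2
```

For example, under this rule a focused "Lee" (word-final accent) is assigned [fall], a focused "Nina" (non-final accent) is assigned [high], and a non-focused sentence-final "niece" is assigned [fall].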

It is well known that the distinction between H* and LH* in the Pierrehumbert model of

intonation is difficult to make (Ladd & Schepman, 2003). The solution proposed by Ladd and

Schepman (2003) was to merge the two accents into LH* because the f0 minima consistently

occurred at the beginning of the accented syllable in both of the alleged accents. As we have discussed

above, based on the new understanding, the fact that an f0 minimum consistently occurs at the beginning of

an accented syllable is actually evidence, together with other patterns listed in Table 1, that the

underlying pitch target is either [high] (when neither under focus nor sentence final) or [fall] (when

word-final and under focus or sentence final).

Silverman and Pierrehumbert (1990) also investigated f0 peak alignment in American English. Unlike

in the present study, they focused mostly on one alignment measurement, namely, the distance

between onset of accented vowel and the location of f0 peak, which they referred to as peak delay.

They compared peak delay in words with 0, 1, or 3 unaccented syllables after the accented syllable.

Their grouping is not exactly the same as the word-final vs. word-nonfinal grouping as in the present

study. However, when viewing their data using the word-final/word-nonfinal grouping, we can see that

the peak location is almost always around or after the end of the accented syllable in word-nonfinal

accents even as rhyme duration increases. In contrast, in word-final accents, peak location remains

around the middle of the rhyme as rhyme duration increases. Thus their data provide further support

for our interpretation of the underlying pitch targets of the accented syllables in English.

4.2. Manifestation of focus

Our analysis found that, under focus, the accented syllable becomes longer, the maximum f0

associated with the accented syllable becomes higher, the size of the pitch rise becomes larger, and

the speed of the pitch rise becomes faster. In addition to that, however, the analyses have also shown

that, similar to Mandarin (Xu, 1999), the realization of focus in English is not only in terms of f0

changes in the syllable directly under focus, but also in terms of f0 changes in syllables after focus.

Overall, the maximum f0 of words following the focused word is significantly lower than the same

words in the no focus condition whether focus is on word 1 or word 3. Furthermore, as shown in

Figure 7, the high maximum f0 under focus is immediately followed by a sharp fall toward a much

lower f0. And this fall seems to be due to speakers' attempt to lower f0 immediately after the accented

syllable after focus, whether or not the following accented syllable is part of the focused word. Such

post-focus f0 drop was reported by Cooper et al. (1985), Eady & Cooper (1986) and Eady et al.

(1986) but has not generally been accepted as part of the manifestation of focus. There has been

evidence in studies of focus perception that low f0 after focus is critical for the correct identification

of focus by listeners (Rump & Collier, 1996; Hasegawa & Hata, 1992; Xu, Xu & Sun, 2003; Xu & Xu,

2003). So, the evidence seems compelling that post-focus pitch range compression is part of the

manifestation of focus itself rather than of something else.

Our quantitative analysis further shows that post-accent f0 drop is faster when the accented syllable is

focused than when it is not focused, whether or not the post-focus syllable is stressed. This finding

has two implications. First, post-focus f0 drop is likely to be done with an active articulatory force.

Second, the scope of post-focus f0 drop includes not only all the post-focus words but also the post-

accent syllable(s) in the focused word.

Research on focus in Mandarin has found that the f0 change related to focus is not simply a matter of raising the

f0 of the focused words and lowering the f0 of the post-focus words. Rather, focus modifies the pitch

range across the sentence: expanding it for the focused word (raising its maximum f0 and lowering

its minimum f0), suppressing (i.e., lowering and narrowing) it for post-focus words, and leaving it intact

for pre-focus words. The design of the present study, which limits the current investigation to

declarative sentences only, does not allow us to verify this bi-directional pitch range modification for

the accented syllable, because, as has been found in Mandarin, pitch targets such as [high] and [fall]

expand only upwards. Only the L tone and the R tone, which presumably have [low] and [rise] pitch

targets, were found to expand downward in Mandarin. The data reported by Eady & Cooper (1986),

however, suggest that the pitch targets associated with stressed syllables in a Yes-No question are

probably either [low] or [rise] and the minimum f0 of those syllables did become lower under focus.

Our analysis also found that the underlying pitch target of an accented syllable is not always the same

under focus and not under focus. Under focus, a word-final accented syllable tends to have a [fall]

target, whereas a non-word-final accented syllable tends to have a [high] target. This is

different from Mandarin, in which the pitch targets are assigned lexically and are not changed by

focus.

In Mandarin, post-focus pitch range suppression, though quite extensive, does not totally eliminate f0

contours related to lexical tones, as can be seen in Figure 3b. It has been an open question, however,

whether there are stress-related f0 movements after focus in English. In the Pierrehumbert model of

intonation, no pitch accents occur after focus, which, also known as the nuclear accent, is by

definition the last pitch accent in an intonation phrase. But a recent study by Di Cristo and Jankowski

(1999) has found that, at least in French, post-focus accents still retain their identity in the form of

certain f0 contour patterns. In the present study, analyses of both peak occurrence and size of f0 rise

in post-focus words demonstrate that the percentage of peak occurrence is still nearly 60% or higher

and the size of the f0 rise does not differ significantly between post-focus words and the same words in sentences with no

narrow focus (cf. Figure 10). It is particularly worth noting that the f0 rises occur against a declining

f0 contour which is possibly related to post-focus suppression. So the size of the intended f0 rises is

likely larger than the observed size. Thus the present data indicate that in English, too, pitch accents

in post-focus words still remain, albeit suppressed severely by the focus.

Data in Table 3 also show that maximum f0 of an accented word is not significantly different whether

or not it precedes a focus. This indicates that pre-focus accents generally remain intact. This also

agrees with the findings of Cooper et al. (1985) for English and Xu (1999) for Mandarin.

Finally, unlike what has been found in Mandarin by Xu (1999) and in English by Cooper et al. (1985), the present

data show that a sentence-final focus significantly raises f0 of the last word in the sentences examined

in the present study as compared to the same sentences without a narrow focus. This seems to agree

with the finding of Rump and Collier (1996) that it is possible to find patterns of f0 configuration

which can be perceived as having either a single sentence-final focus or the so-called broad focus.

However, their data also show that the distinction between focus vs. non-focus is smaller at the

sentence-final position than in earlier positions. Even in our data, a sentence-final accent not under

focus has a [fall] target just as does a non-sentence-final but word-final accent under focus. This

could indicate that a declarative sentence with no narrow focus carries a default final focus, as

assumed by some phonological analyses of intonation (Ladd, 1996). However, there is also a

possibility that it is the low boundary tone attached to the end of a declarative sentence

(Pierrehumbert, 1980) that is partially responsible for the final fall in f0. So, the question regarding

the exact nature of sentence-final pitch accent remains open.

4.3. f0 of unaccented syllables

The analyses described in 3.2.3. indicate that f0 values of unaccented syllables are influenced much

more by the preceding accent than by the following accent. This influence, however, reduces over

time and the reduction is not accompanied by increase in influence from the following accent. This

pattern suggests that f0 of an unaccented syllable does not come from "tone spreading", i.e., the

spreading of tonal feature from one tone-bearing unit to the next (Goldsmith, 1990; Hyman &

Schuh, 1974; Pierrehumbert, 1980), which would predict sustained full influence of the preceding

accent. Neither can the pattern be explained as resulting from interpolation between flanking pitch

accents, whether the interpolation is straight-lined or curved (Kochanski & Shih, 2003;

Pierrehumbert, 1980), because interpolation would generate equal amounts of influence from both

the preceding and following accents. More importantly, the pattern in fact calls into question the

general consensus that unaccented syllables do not have any underlying pitch targets of their own. If

the influence of the preceding accent decreases over time with little increase in the influence of the

following accent, there must be a third source for the f0 movements during the unaccented syllables.

In a study of Mandarin neutral tone, which is also generally believed to be toneless, Chen and Xu

(2002) found that the f0 of the neutral tone is best understood if we assume that (a) this tone has its

own pitch target, which is probably a static [mid], and (b) this target is implemented with

categorically less articulatory effort than those of the full tones. The English sentences used in the

present study did not provide exactly the same manipulations as those in Chen and Xu (2002).

Nevertheless, we can see from Figure 7 that pre-accent f0 minimum is usually higher than the lowest

f0 of the speaker, which can be found at the end of the sentence or in the post-focus region. Thus it

is possible that unaccented syllables are implemented with a static [mid] target. Furthermore, the

regression analyses in 3.2.3 show that an unstressed syllable is more susceptible than a stressed but

unaccented syllable to the influence of the preceding pitch accent. Thus even when not accented, the

relative strength of a syllable can be further differentially manifested through articulatory effort.

In the Pierrehumbert model of English intonation, stressed syllables in words carrying pitch accents

are always specified with tones such as H and L. Unstressed syllables of accented words sometimes are

each specified with a tone but sometimes are not. Words that do not carry pitch accents are not

specified with tones at all. In that framework, f0 of those syllables without a tone comes from linear

or nonlinear interpolation between neighboring accentual tones or boundary tones. Such phonetic

interpolation, however, is inconsistent with the principle proposed by Pierrehumbert (1980) and

emphasized in her later work (Pierrehumbert, 2000) that there is no long-distance phonetic look

ahead. Mathematically, interpolation, especially a linear one, is rather simple to implement.

Articulatorily, however, interpolation would be a rather laborious mechanism if not an impossible

one, because it requires the articulatory system to store both the preceding and following tones as

references, and to continuously calculate the current state based on the exact elapsed time at every

moment during articulation. Assigning a pitch target to the unstressed syllable and approaching it

with a weak effort makes the articulation task much simpler, because the articulatory system does

not need to anticipate the f0 value of the upcoming accent, and it does not need to refer to the

precise proportional time between accents for determining the moment-to-moment f0 value. The

only on-line assessment of current state needed is that of the distance from the targeted state.
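The contrast drawn above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the first-order exponential form of the approach, the function names, and all numerical values (f0 in Hz, effort as a rate constant in 1/s) are assumptions made for the example. Note which inputs each function requires: interpolation needs both flanking values and the proportional elapsed time, whereas the target approach needs only the current f0 and the target.

```python
import math


def interpolate(f0_prev, f0_next, t, t_prev, t_next):
    """Linear interpolation between two flanking accents.

    Needs the preceding value, the following value, and the proportional
    time elapsed between them.
    """
    fraction = (t - t_prev) / (t_next - t_prev)
    return f0_prev + fraction * (f0_next - f0_prev)


def approach_target(f0_now, target, effort, dt):
    """One time step of a first-order asymptotic approach (an assumed form).

    Needs only the current distance from the target; no look-ahead.
    """
    return f0_now + (target - f0_now) * (1.0 - math.exp(-effort * dt))


def run_syllable(f0_start, target, effort, duration, dt=0.005):
    """Approach `target` from `f0_start` for `duration` seconds; return final f0."""
    f0 = f0_start
    for _ in range(round(duration / dt)):
        f0 = approach_target(f0, target, effort, dt)
    return f0


# Hypothetical values: a 200 Hz accent followed by a 150 ms unaccented
# syllable with a static [mid] target of 150 Hz.
weak_end = run_syllable(200.0, 150.0, effort=10.0, duration=0.15)
strong_end = run_syllable(200.0, 150.0, effort=40.0, duration=0.15)
midpoint = interpolate(200.0, 180.0, t=0.075, t_prev=0.0, t_next=0.15)
print(f"weak effort: {weak_end:.1f} Hz, strong effort: {strong_end:.1f} Hz")
print(f"interpolated midpoint: {midpoint:.1f} Hz")
```

In this sketch the weak-effort syllable ends farther from its target, that is, it retains more influence from the preceding accent, and its value does not depend on the following accent at all.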

Identifying the sources of f0 for the unaccented syllables is not only important for understanding the

f0 contours of these syllables, but also critical for understanding the f0 contour of accented syllables.

Assuming that the implementation of a pitch accent is synchronized with the syllable, f0 at the

beginning of the syllable has to start from the level achieved in the preceding unaccented syllable. If

the accent has the pitch target of [high] or [fall], as discussed earlier, f0 has to rise during the initial

portion of the syllable, thus generating the appearance that both the beginning "low" and the initial

"rise" are part of the pitch accent. In a series of recent studies, Ladd and his colleagues found

consistent alignment of f0 valleys with the onset of accented syllables (Arvaniti et al., 1998; Ladd et

al., 1999; Ladd et al., 2000). And they interpret such alignment as an indication that there is an L

associated with the accent which is aligned to the syllable onset. With the understanding that

unaccented syllables also have their own pitch targets, which are also implemented asymptotically,

the pitch targets of the accented syllables can be simplified by excluding the portion due to the

influence of the preceding unaccented syllables. Thus the f0 valley found to consistently align with

the onset of an accented syllable should not be interpreted as an L, but rather as evidence that the

implementation of the pitch target associated with the pitch accent starts from the onset of the

syllable.

4.4. Implications for intonation theories

The findings of the present study have implications not only for our understanding of English

intonation, but also for theoretical understanding of intonation in general. As discussed in the

Introduction, most existing theories can be identified as either linear or superpositional (Fujisaki,

1988; Gårding, 1979; Grønnum, 1995; Ladd, 1996; Pierrehumbert, 1980; Pierrehumbert & Beckman,

1988; 't Hart et al., 1990). The findings of the present study suggest that both of these general

approaches have merits and yet neither seems adequate for the new data. The findings of the present

study indicate that there are both linearity and superposition in intonation. The linearity is seen in

the finding that every syllable is likely to have a pitch target and the generation of local f0 contours

does not involve either long-distance or bidirectional interpolation. The superposition is seen in the

finding that functions like focus may involve adjustment of pitch ranges for a number of consecutive

syllables. These findings seem to call for a new theoretical model of intonation. Such a model should

be based on the recognition of individual components of intonation (Xu, 2001b). The model should,

first of all, make a distinction between communicative functions that convey meanings through

intonation and articulatory mechanisms that implement these functions. This distinction is necessary

because intonation is produced by an articulatory system whose physical properties introduce extra f0

variations not intended by the speakers. These variations therefore should not be viewed as part of

the communicative meanings. This is especially true if f0 variations due to the articulatory system

are non-trivial, as is the case with inertia and synchronization. Due to inertia, no intended pitch

changes can be made instantaneously. In fact, speakers often have to make f0 movements as fast as

they can when changing pitch in their speech (Xu & Sun, 2002). Due to the need to synchronize

related motor movement (Kelso, 1984; Kelso et al., 1979), transitions toward underlying tonal

targets have to start from the beginning of the syllable and stop by the end of the syllable (Xu, 1997,

1999; present data), despite the fact that this would often mean insufficient time to complete the

transition. As a result, the closest resemblance to the ideal f0 contour of a lexical tone is often found

only near the end of the syllable. Furthermore, the same synchronization of tone and syllable seems

to take place regardless of the segmental composition of a syllable (Xu, 1998; Xu & Xu, in press).

The articulatory pitch-production system, however, is only an instrument that generates surface f0

contours, and it has to be controlled by the input that carries communicative information. There are

many different communicative functions that need to be conveyed through melody in speech,

including (but certainly not limited to) the Lexical, Syntactic/Semantic, Sentential, Focal and Topical

functions. These functions often need to be transmitted simultaneously. The present data as well as

those from previous studies (Chen & Xu, 2002; Xu, 1999) demonstrate that simultaneous

transmission of multiple communicative functions can be done by each function manifesting itself

through a unique way of controlling one or more of three parameters: pitch target, pitch range and

articulatory effort. Data from Xu (1999) demonstrate that lexical tone and focus can be realized

concurrently in Mandarin, and the present data demonstrate that pitch accent and focus can be also

realized concurrently in English. In both cases, pitch accent and tone are manifested as local pitch

targets, while focus is expressed mostly through manipulation of pitch ranges. The present data also

demonstrate that lexical stress in English can be realized in parallel with both pitch accent and focus.

In both cases the weak stress seems to be achieved by deliberately reduced articulatory effort. Data

from other studies also suggest that there are separate controls for sentence type, i.e., statement vs.

question (Eady & Cooper, 1986), and for new topic, i.e., the introduction of a new subject into the

conversation or monologue (Lehiste, 1975; Umeda, 1982).

Based on these understandings, a new model of intonation is sketched in Figure 17. The model

assumes a major division between communicative functions that convey meanings through

intonation and articulatory mechanisms that implement these functions. Intonation related

communicative functions manifest themselves in parallel by separately specifying (a) local pitch

target, (b) pitch range and (c) articulatory effort. Taking these specifications as input, the

articulatory module applies physical forces to successively approach local targets at the specified

pitch ranges with the specified amounts of effort. The timing control in the articulatory module

synchronizes the local targets with the associated syllables. The resulting f0 contours thus continually

approach successive local targets within different pitch ranges at varying speeds (column 4). Due to

space limitation, a full elaboration of the model will be presented in a separate paper.
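The division of labor described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the model that the separate paper will present: the `Syllable` record, the mapping of [high], [mid] and [fall] onto a pitch range, the first-order exponential approach, and every numerical value are hypothetical. Each syllable is specified by a local target, a pitch range and an effort value; the articulatory step then approaches each target in turn, starting from wherever the previous syllable ended.

```python
import math
from dataclasses import dataclass


@dataclass
class Syllable:
    target: str          # local pitch target: "high", "mid" or "fall"
    pitch_range: tuple   # (floor_hz, ceiling_hz), set by functions such as focus
    effort: float        # articulatory effort as a rate constant (1/s)
    duration: float      # syllable duration (s)


def target_value(target, pitch_range, progress):
    """Map a target name onto the pitch range; `progress` runs from 0 to 1."""
    floor, ceiling = pitch_range
    if target == "high":
        return ceiling
    if target == "mid":
        return (floor + ceiling) / 2.0
    if target == "fall":
        # Dynamic target: moves from ceiling to floor across the syllable.
        return ceiling + (floor - ceiling) * progress
    raise ValueError(f"unknown target: {target}")


def synthesize(syllables, f0_start, dt=0.005):
    """Approach each syllable's target in sequence, synchronized with the syllable."""
    contour = []
    f0 = f0_start
    for syl in syllables:
        steps = round(syl.duration / dt)
        for k in range(steps):
            goal = target_value(syl.target, syl.pitch_range, (k + 1) / steps)
            f0 += (goal - f0) * (1.0 - math.exp(-syl.effort * dt))
            contour.append(f0)
    return contour


# Accented, unaccented, accented; no narrow focus (hypothetical values).
neutral = synthesize([
    Syllable("high", (100.0, 180.0), 30.0, 0.20),
    Syllable("mid", (100.0, 180.0), 20.0, 0.15),
    Syllable("high", (100.0, 180.0), 30.0, 0.20),
], f0_start=140.0)

# Focus on the first accent: its range is raised, post-focus ranges are
# lowered and narrowed, and the local targets themselves are unchanged.
focused = synthesize([
    Syllable("high", (100.0, 220.0), 30.0, 0.20),
    Syllable("mid", (90.0, 130.0), 20.0, 0.15),
    Syllable("high", (90.0, 130.0), 30.0, 0.20),
], f0_start=140.0)

print(f"on-focus peak: {max(focused[:40]):.1f} Hz vs. {max(neutral[:40]):.1f} Hz")
print(f"post-focus peak: {max(focused[70:]):.1f} Hz vs. {max(neutral[70:]):.1f} Hz")
```

With these made-up settings the post-focus accent still rises toward its [high] target, but within a lowered and narrowed range, which is the qualitative pattern the text describes.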

Insert Figure 17 about here

4.5. Caveats

Due to limited scope of the present study, many issues are left unaddressed, including both the

communicative functions and the articulatory mechanisms. Regarding the communicative functions,

the roles of many pragmatic, attitudinal and emotional functions are not addressed. Regarding the

articulatory module, the present study did not definitively prove the target-syllable synchronization

in English, although consistent alignment of f0 minimum around syllable onset was found,

corroborating similar findings in previous research (Ladd et al., 1999). A recent study has found

consistent gradient language-specific alignment patterns (Atterer & Ladd, forthcoming). However,

alignment differences across languages do not necessarily mean asynchrony between syllable and

pitch target. They may be reflections of the cross-language gradient differences in the underlying

pitch targets themselves rather than gradient differences in the degrees of synchrony across the

languages. A more definitive way of verifying the synchronization hypothesis would be to

manipulate the tonal context, especially the preceding tonal context of an accent. This can be easily

done in a tone language like Mandarin (as was done in Xu, 1999), but is rather difficult in languages like

English. The closest experimental manipulation we have seen is that of Ladd et al. (2003), in which

the number of unstressed syllables intervening between two accents was manipulated. What they found was that

the duration of the f0 rise in the second accent remains constant despite variation in the height of

the f0 minimum between the two accents. However, they did not examine whether the alignment of

the f0 minimum remains constant relative to the onset of the second accented syllable. So, whether

pitch targets are fully synchronized with the syllable in English still needs to be further investigated.

The nature of boundary tones is also not specifically examined in the present study, although some

evidence of it is seen in the analysis of unaccented syllables at sentence final versus earlier positions

as discussed in 3.2.3. The understanding of boundary tone is also important for understanding other

local pitch targets. For example, the final syllable in a sentence would contain a boundary tone. If so,

shouldn't the overall falling pattern observed in the sentence-final syllable be interpreted as

consisting of a [high] followed by a [low] boundary tone? This question needs to be answered by

future studies specifically designed to look into the nature of boundary tones.

5. Conclusions

It has long been debated whether pitch registers or pitch contours should be considered as the

primary components of intonation in English as well as in many other languages. Recent findings

about articulatory constraints on the production of pitch movements suggest that neither

understanding is likely to be adequate because they both implicitly assume that observed f0 contours

directly resemble the underlying functional components. In the present study, we take it as given that

any intended pitch change needs substantial amount of time to complete and hence the real

components of intonation, especially the local ones, can be only partially reflected in the f0

contours. We therefore treated f0 events such as turning points, slopes, and their alignment with

segmental units only as evidence for the possible underlying intonation components. Recent studies

on Mandarin have found that both static and dynamic targets may underlie f0 contours corresponding

to lexical tones in the language, and that these targets are likely to be implemented quite

synchronously with their associated syllables. One of the tasks of the present study was therefore to

use cues similar to those used in Mandarin to reveal the nature of the underlying local pitch targets in

English.

It has also long been debated whether intonation components are linearly sequenced or

superposed on top of each other. In the present study we recognize that for any communicative

function to be effectively conveyed in intonation, it must have its own unique manner of

manifestation. And for different communicative functions to be conveyed concurrently, they must

not wipe out each other's cues so that each would reach the listener only at the expense of the

others. Thus there may be a multitude of different means to manifest different intonation meanings,

both linear and superpositional. The second task of the present study was therefore to examine how

each of the possible components of intonation is manifested and how they can coexist with other

components. In particular, we examined whether and how lexical stress, pitch accent and focus can be

conveyed at the same time.

Our analysis of f0 contours of short declarative sentences in American English provided data relevant

to both tasks. In terms of local pitch targets, first, non-focused and non-word-final accents seem to

be associated with a static [high], and word-final accent under focus and sentence-final accents,

whether or not under narrow focus, seem to be associated with a dynamic [fall]. Second, unaccented

syllables, whether stressed or unstressed, seem to be associated with a static [mid] rather than being

completely targetless, and their f0 contours come from implementation of this static target rather

than from interpolation between surrounding accents. Third, unstressed syllables seem to be

associated with a weak articulatory effort that distinguishes them further from unaccented but

stressed syllables. In terms of concurrent realization of different functions, focus is found to raise the

pitch range of the on-focus, stressed syllables, suppress the pitch range of all post-focus syllables, and

leave the pitch range of pre-focus words largely intact. In other words, neither are lexically related

pitch targets directly under focus fully replaced by focus itself, nor are post-focus pitch accents

completely eliminated.

Finally, we considered implications of the present data on the theoretical understanding of intonation

in general by contemplating a new model of intonation. The model assumes a major division between

a functional module and an articulatory module. The functional module consists of communicative

functions that are parallel to each other, and the articulatory module is composed of various physical

forces. In this model, separate communicative functions determine in parallel the local pitch targets,

the pitch ranges for the pitch targets and the amount of effort given to each pitch target. The

articulatory module then implements the pitch targets at the specified pitch ranges with the specified

amounts of effort. The resulting f0 contours therefore exhibit not only manifestation of each and

every individual communicative function, but also the effects of various physical properties of the

articulatory system.

ACKNOWLEDGEMENT

This work is supported in part by NIH Grant DC03902.

6. References

Abramson, A. S. (1962) The Vowels and Tones of Standard Thai: Acoustical Measurements and Experiments. Bloomington: Indiana University Research Center in Anthropology, Folklore, and Linguistics, Pub. 20.

Abramson, A. S. (1976) Thai tones as a reference system. In Thai linguistics in honor of Fang-Kuei Li (T. W. Gething, J. G. Harris, & P. Kullavanijaya, editors), pp. 1-12. Bangkok: Chulalongkorn University Press.

Abramson, A. S. (1978) The phonetic plausibility of the segmentation of tones in Thai phonology. In Proceedings of The twelfth International Congress of Linguistics, Vienna, pp. 760-763.

Anderson, S. R. (1978) Tone features. In Tone: A linguistic survey (V. A. Fromkin, editor), pp. 133-175. New York: Academic Press.

Arvaniti, A., Ladd, D. R., & Mennen, I. (1998) Stability of tonal alignment: the case of Greek prenuclear accents. Journal of Phonetics, 26, 3-25.

Atterer, M. & Ladd, D. R. (forthcoming) On the phonetics and phonology of “segmental anchoring” of f0: evidence from German. Submitted to Journal of Phonetics.

Bai, D. (1934) Guanzhong shengdiao shiyan lu [Experiments with tones of Guanzhong dialects]. In Shiyusuo Jikan [A Collection by Shiyusuo], pp. 355-361.

Bolinger, D. L. (1951) Intonation: levels versus configuration. Word, 7, 199-210.

Bolinger, D. (1986) Intonation and its parts: melody in spoken English. Palo Alto: Stanford University Press.

Bruce, G. (1977) Swedish word accents in sentence perspective. In TRAVAUX DE L'INSTITUT DE LINGUISTIQUE DE LUND XII (B. Malmberg & K. Hadding, editors). Lund: Gleerup.

Bruce, G. & Touati, P. (1992) On the analysis of prosody in spontaneous speech with exemplification from Swedish and French. Speech Communication, 11, 453-458.

Caspers, J. & van Heuven, V. J. (1993) Effects of time pressure on the phonetic realization of the Dutch accent-lending pitch rise and fall. Phonetica, 50, 161-171.

Chao, Y. R. (1956) Tone, intonation, singsong, chanting, recitative, tonal composition, and atonal composition in Chinese. In For Roman Jakobson (M. Halle, editor), pp. 52-59. The Hague: Mouton.

Chao, Y. R. (1968) A Grammar of Spoken Chinese. Berkeley, CA: University of California Press.

Chen, Y. & Xu, Y. (2002) Pitch Target of Mandarin Neutral Tone. Presented at LabPhon 8, New Haven, CT.

Cohen, A. & 't Hart, J. (1967) On the anatomy of intonation. Lingua, 19, 177-192.

Cooper, W. E., Eady, S. J., & Mueller, P. R. (1985) Acoustical aspects of contrastive stress in question-answer contexts. Journal of the Acoustical Society of America, 77, 2142-2156.

Crystal, D. (1969) Prosodic Systems and Intonation in English. London: Cambridge University Press.

Di Cristo, A. & Jankowski, J. (1999) Prosodic organisation and phrasing after focus in French. In Proceedings of The 14th International Congress of Phonetic Sciences, San Francisco, 2, pp. 1565-1568.

D'Imperio, M. (2001) Focus and tonal structure in Neapolitan Italian. Speech Communication, 33, 339-356.

D'Imperio, M. (2002) Language-Specific and Universal Constraints on Tonal Alignment: The Nature of Targets and “Anchors”. In Proceedings of The 1st International Conference on Speech Prosody, Aix-en-Provence, France, pp. 101-106.

Duanmu, S. (1994) Against contour tone units. Linguistic Inquiry, 25, 555-608.

Eady, S. J. & Cooper, W. E. (1986) Speech intonation and focus location in matched statements and questions. Journal of the Acoustical Society of America, 80, 402-416.

Eady, S. J., Cooper, W. E., Klouda, G. V., Mueller, P. R., & Lotts, D. W. (1986) Acoustic characteristics of sentential focus: Narrow vs. broad and single vs. dual focus environments. Language and Speech, 29, 233-251.

Fujisaki, H. (1983) Dynamic characteristics of voice fundamental frequency in speech and singing. In The Production of Speech (P. F. MacNeilage, editor), pp. 39-55. New York: Springer-Verlag.

Fujisaki, H. (1988) A note on the physiological and physical basis for the phrase and accent components in the voice fundamental frequency contour. In Vocal Physiology: Voice Production (O. Fujimura, editor), pp. 347-355. New York: Raven Press, Ltd.

Fujisaki, H. (1992) Modeling the process of fundamental frequency contour generation. In Speech Perception, Production and Linguistic Structure (Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka, editors), pp. 313-326. Amsterdam: IOS Press.

Gandour, J. (1974) On the representation of tone in Siamese. UCLA Working Papers in Phonetics, 27, 118-146.

Gandour, J., Potisuk, S., & Dechongkit, S. (1994) Tonal coarticulation in Thai. Journal of Phonetics, 22, 477-492.

Gandour, J., Potisuk, S., Dechongkit, S., & Ponglorpisit, S. (1992) Anticipatory tonal coarticulation in Thai noun compounds. Linguistics of the Tibeto-Burman Area, 15, 111-124.

Gårding, E. (1979) Sentence intonation in Swedish. Phonetica, 36, 207-215.

Gårding, E. (1987) Speech act and tonal pattern in Standard Chinese. Phonetica, 44, 13-29.

Goldsmith, J. A. (1990) Autosegmental and Metrical Phonology. Oxford: Blackwell Publishers.

Goldsmith, J. A. (1999) Dealing with prosody in a text-to-speech system. International Journal of Speech Technology, 3, 51-63.

Grønnum, N. (1995) Superposition and subordination in intonation: a non-linear approach. In Proceedings of The 13th International Congress of Phonetic Sciences, Stockholm, 2, pp. 124-131.

Han, M. S. & Kim, K.-O. (1974) Phonetic variation of Vietnamese tones in disyllabic utterances. Journal of Phonetics, 2, 223-232.

Hasegawa, Y. & Hata, K. (1992) Fundamental frequency as an acoustic cue to accent perception. Language and Speech, 35, 87-98.

Hirschberg, J. (1993) Pitch accent in context: Predicting prominence from text. Artificial Intelligence, 63, 305-340.

Hollien, H. (1960) Vocal pitch variation related to changes in vocal fold length. Journal of Speech and Hearing Research, 3, 150-156.

Hollien, H. & Moore, G. P. (1960) Measurements of the vocal folds during changes in pitch. Journal of Speech and Hearing Research, 3, 157-165.

Hombert, J.-M. (1974) Universals of downdrift: their phonetic basis and significance for a theory of tone. Studies in African Linguistics, Supplement 5, 169-183.

Howie, J. M. (1976) Acoustical Studies of Mandarin Vowels and Tones. London: Cambridge University Press.

Hyman, L. M. (1973) The role of consonant types in natural tonal assimilations. In Consonant Types and Tone (L. M. Hyman, editor), pp. 151-179. Los Angeles, CA: Department of Linguistics, University of Southern California.

Hyman, L. M. (1993) Register tones and tonal geometry. In The Phonology of Tone (H. v. d. Hulst & K. Snider, editors), pp. 75-108. New York: Mouton de Gruyter.

Hyman, L. & Schuh, R. (1974) Universals of tone rules. Linguistic Inquiry, 5, 81-115.

Jin, S. (1996) An Acoustic Study of Sentence Stress in Mandarin Chinese. Ph.D. dissertation, The Ohio State University.

Kelso, J. A. S. (1984) Phase transitions and critical behavior in human bimanual coordination. American Journal of Physiology: Regulatory, Integrative and Comparative, 246, R1000-R1004.

Kelso, J. A. S., Holt, K. G., Rubin, P., & Kugler, P. N. (1981) Patterns of human interlimb coordination emerge from the properties of non-linear, limit cycle oscillatory processes: Theory and data. Journal of Motor Behavior, 13, 226-261.

Kelso, J. A. S., Southard, D. L., & Goodman, D. (1979) On the nature of human interlimb coordination. Science, 203, 1029-1031.

Kim, S.-A. (1999) Positional effect on tonal alternation in Chichewa: Phonological rule vs. phonetic timing. In Proceedings of Annual Meeting of Chicago Linguistic Society, Chicago, 34, pp. 245-257.

Kochanski, G. & Shih, C. (2003) Prosody modeling with soft templates. Speech Communication, 39, 311-352.

Ladd, D. R. (1996) Intonational Phonology. Cambridge: Cambridge University Press.

Ladd, D. R., Faulkner, D., Faulkner, H., & Schepman, A. (1999) Constant "segmental anchoring" of f0 movements under changes in speech rate. Journal of the Acoustical Society of America, 106, 1543-1554.

Ladd, D. R., Mennen, I., & Schepman, A. (2000) Phonological conditioning of peak alignment in rising pitch accents in Dutch. Journal of the Acoustical Society of America, 107, 2685-2696.

Ladd, D. R. & Schepman, A. (2003) "Sagging transitions" between high pitch accents in English: experimental evidence. Journal of Phonetics, 31, 81-112.

Laniran, Y. (1992) Intonation in Tone Languages: The Phonetic Implementation of Tones in Yorùbá. Unpublished Ph.D. dissertation, Cornell University.

Laniran, Y. & Gerfen, C. (1997) High raising, downstep and downdrift in Igbo. In Proceedings of The 71st Annual Meeting of the Linguistic Society of America, Chicago, p. 59.

Leben, W. R. (1973) Suprasegmental Phonology. Unpublished Ph.D. dissertation, Massachusetts Institute of Technology.

Lehiste, I. (1975) The phonetic structure of paragraphs. In Structure and Process in Speech Perception (A. Cohen & S. E. G. Nooteboom, editors), pp. 195-206. New York: Springer-Verlag.

Li, Y. J. & Lee, T. (2002) Acoustical f0 analysis of continuous Cantonese speech. In Proceedings of International Symposium on Chinese Spoken Language Processing 2002, Taipei, Taiwan, pp. 127-130.

Liberman, M. & Pierrehumbert, J. (1984) Intonational invariance under changes in pitch range and length. In Language Sound Structure (M. Aronoff & R. Oehrle, editors), pp. 157-233. Cambridge, Massachusetts: M.I.T. Press.

Lieberman, P. & Tseng, C. Y. (1980) On the fall of the declination theory: breath-group versus "declination" as the base form for intonation. Journal of the Acoustical Society of America, 67, S63.

Lin, M.-C. (1965) Yingao xianshiqi yu Putonghua shengdiao yingao texing [The pitch indicator and the pitch characteristics of tones in Standard Chinese]. Acta Acustica Sinica, 2, 8-15.

Lin, M.-C. (1988) Putonghua shengdiao de shengxue texing he zhijue zhengzhao [The acoustic characteristics and perceptual cues of tones in Standard Chinese]. Zhongguo Yuwen [Chinese Linguistics], 204, 182-193.

Lin, M.-C. & Yan, J. (1991) Tonal coarticulation patterns in quadrisyllabic words and phrases of Mandarin. In Proceedings of The 12th International Congress of Phonetic Sciences, 3, pp. 242-245.

Liu, F. & Xu, Y. (in press) Underlying targets of initial glides: Evidence from focus-related f0 alignments in English. In Proceedings of The 15th International Congress of Phonetic Sciences, Barcelona.

Meeussen, A. E. (1970) Tone typologies for West African Languages. African Language Studies, 11, 266-71.

Nishizawa, N., Sawashima, M., & Yonemoto, K. (1988) Vocal fold length in vocal pitch change. In Vocal Physiology: Voice Production (O. Fujimura, editor), pp. 75-83. New York: Raven Press.

Pierrehumbert, J. (1980) The Phonology and Phonetics of English Intonation. Ph.D. dissertation, Massachusetts Institute of Technology.

Pierrehumbert, J. (1981) Synthesizing intonation. Journal of the Acoustical Society of America, 70, 985-995.

Pierrehumbert, J. (2000) Tonal elements and their alignment. In Prosody: Theory and Experiment (M. Horne, editor), pp. 11-36. London: Kluwer Academic Publishers.

Pierrehumbert, J. & Beckman, M. (1988) Japanese Tone Structure. Cambridge, MA: The MIT Press.

Pike, K. L. (1945) The Intonation of American English. Ann Arbor: University of Michigan Press.

Pike, K. L. (1948) Tone Languages. Ann Arbor: University of Michigan Press.

Poser, W. (1984) The phonetics and phonology of tone and intonation in Japanese. Ph.D. dissertation, MIT, Cambridge, MA.

Potisuk, S., Harper, M. P., & Gandour, J. (1999) The classification of Thai tone sequences in syllable-segmented speech using the analysis-by-synthesis method. IEEE Transactions on Speech and Audio Processing, 7, 95-102.

Prieto, P., Santen, J. v., & Hirschberg, J. (1995) Tonal alignment patterns in Spanish. Journal of Phonetics, 23, 429-451.

Prieto, P., Shih, C., & Nibert, H. (1996) Pitch downtrend in Spanish. Journal of Phonetics, 24, 445-473.

Rump, H. H. & Collier, R. (1996) Focus conditions and the prominence of pitch-accented syllables. Language and Speech, 39, 1-17.

Schmidt, R. C., Carello, C., & Turvey, M. T. (1990) Phase transitions and critical fluctuations in the visual coordination of rhythmic movements between people. Journal of Experimental Psychology: Human Perception and Performance, 16, 227-247.

Shih, C.-L. (1988) Tone and intonation in Mandarin. Working Papers, Cornell Phonetics Laboratory, No. 3, 83-109.

Silverman, K. E. A. & Pierrehumbert, J. B. (1990) The timing of prenuclear high accents in English. In Papers in Laboratory Phonology 1: Between the Grammar and Physics of Speech (J. Kingston & M. E. Beckman, editors), pp. 72-106. Cambridge: Cambridge University Press.

Stevens, K. N. (2002) Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America, 111, 1872-1891.

Stewart, J. M. (1965) The typology of the Twi tone system. Legon, Ghana: Institute of African Studies, University of Ghana.

Stewart, J. M. (1983) Key lowering (downstep/downglide) in Dschang. Journal of African Languages and Linguistics, 3, 113-138.

't Hart, J., Collier, R., & Cohen, A. (1990) A Perceptual Study of Intonation: An experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.

Taylor, P. A. (1994) A Phonetic Model of Intonation in English. Bloomington, Indiana: Indiana University Linguistics Club Publications.

Umeda, N. (1982) "f0 declination" is situation dependent. Journal of Phonetics, 10, 279-290.

Wang, C., Yue, W., Hirose, K., & Fujisaki, H. (1994) A scheme for Chinese speech synthesis by rule based on pitch-synchronous multi-pulse excitation LP method. In Proceedings of International Conference on Spoken Language Processing, Yokohama, pp. 1679-1682.

Woo, N. (1969) Prosody and phonology. Ph.D. dissertation, Massachusetts Institute of Technology.

Wu, Z. (1982) Putonghua yuju zhong de shengdiao bianhua [Tonal variations in Mandarin sentences]. Zhongguo Yuwen [Chinese Linguistics], 439-450.

Wu, Z. (1984) Putonghua sanzizu biandiao guilü [Rules of tone sandhi in trisyllabic words in Standard Chinese]. Zhongguo Yuyan Xuebao [Bulletin of Chinese Linguistics], 2, 70-92.

Wu, Z. (1988) Tone-sandhi patterns of quadro-syllabic combinations in Standard Chinese. Report of Phonetic Research, Institute of Linguistics (CASS), Beijing, China, PL-ARPR/1988, 1-13.

Wu, Z. (1990) Can poly-syllabic tone-sandhi patterns be the invariant units of intonation in spoken Standard Chinese? In Proceedings of ICSLP 90, pp. 12.10.1-4.

Xu, C. X. & Xu, Y. (2003) Recognizing focus in noise filled sentences. Journal of the Acoustical Society of America, 113, Pt. 2, 2327.

Xu, C. X. & Xu, Y. (in press) Effects of Consonant Aspiration on Mandarin Tones. Journal of the International Phonetic Association.

Xu, Y. (1993) Contextual Tonal Variation in Mandarin Chinese. Ph.D. dissertation, The University of Connecticut.

Xu, Y. (1994) Production and perception of coarticulated tones. Journal of the Acoustical Society of America, 95, 2240-2253.

Xu, Y. (1997) Contextual tonal variations in Mandarin. Journal of Phonetics, 25, 61-83.

Xu, Y. (1998) Consistency of tone-syllable alignment across different syllable structures and speaking rates. Phonetica, 55, 179-203.

Xu, Y. (1999) Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27, 55-105.

Xu, Y. (2001a) Fundamental frequency peak delay in Mandarin. Phonetica, 58, 26-52.

Xu, Y. (2001b) Sources of tonal variations in connected speech. Journal of Chinese Linguistics, monograph series #17, 1-31.

Xu, Y. (2002) Articulatory constraints and tonal alignment. In Proceedings of The 1st International Conference on Speech Prosody, Aix-en-Provence, France, pp. 91-100.

Xu, Y. & Liu, F. (2002) Segmentation of glides with tonal alignment as reference. In Proceedings of 7th International Conference On Spoken Language Processing, Denver, Colorado, pp. 1093-1096.

Xu, Y. & Sun, X. (2002) Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America, 111, 1399-1413.

Xu, Y. & Wang, Q. E. (2001) Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Communication, 33, 319-337.

Xu, Y., Xu, C. X., & Sun, X. (2003) Identifying intrinsic constituents of focus through "imitation via restoration". Journal of the Acoustical Society of America, 113, Pt. 2, 2327.

Yip, M. (1990) The Tonal Phonology of Chinese. New York: Garland Publishing.

Yip, M. (2002) Tone. Cambridge: Cambridge University Press.

Footnotes

1 Also, vocal fold length does not increase monotonically with frequency (Nishizawa, Sawashima, & Yonemoto, 1988).

2 In its recent development, the Fujisaki model has incorporated negative accent commands, which lower f0 rather than raise it as the positive commands do (Potisuk, Harper, & Gandour, 1999; Wang et al., 1994). The negative commands were introduced to account for tones such as L (Low) and R (Rising) in Mandarin and Thai, in which f0 sometimes drops too fast to be accounted for by elasticity. However, the mechanism of the automatic return to the central level after the cessation of a negative command is even less clear than that of a return after a positive command.

3 Here and throughout the paper, we use the most conventional understanding of the syllable as our working definition. That is, a syllable consists of all the segments that are commonly considered to belong to it, including the onset and coda consonants and the vowel. Acoustically, certain apparent landmarks (Stevens, 2002), such as the onset of stop and nasal closure, are treated as marking the syllable boundaries. We do not claim or even believe this to be the ultimate definition of the syllable, as some of our own studies already suggest otherwise (Xu & Liu, 2002; Liu & Xu, in press). For the purpose of the present paper, nevertheless, we found the conventional definition of the syllable convenient for our discussion.

4 It is often claimed that a sentence bears a broad focus even when there is no special emphasis on any part of the sentence (see Ladd, 1996, for detailed arguments).

5 Figure 5 also shows that the neutral tone after the L tone seems to behave somewhat differently than when it follows other tones. The L tone seems to raise the f0 of the following neutral tone, an effect that is maximally manifested in the second neutral-tone syllable. This raising effect of the L tone seems to be independent of the overall behavior of the neutral tone.

6 The "–7" is because unfocused "Lee . . . niece" is used to contrast both with focused "Lee" and with focused "niece," as shown below.

Tables

Table 1. Specific f0 events that can help to answer the above questions.

a) What are the pitch targets associated with local prominences in a declarative sentence: static [high], or dynamic [rise] or [fall]?

| f0 event | [high] | [rise] | [fall] |
|---|---|---|---|
| Location of maximum | around the end of accented syllable, but the exact location varies: before syllable offset when followed by a stressed syllable, but after syllable offset when followed by an unstressed syllable | always immediately after accented syllable | always well before offset of accented syllable |
| Location of initial minimum | consistently around onset of accented syllable | well after onset of accented syllable | consistently around onset of accented syllable |

b) Is focus realized with pitch specification only for the accented/stressed syllable or with pitch specifications both for the accented/stressed syllable and for post-focus syllables?

| f0 event | Specification for focused syllable only | Specification for both focused and post-focus syllables |
|---|---|---|
| Pitch range | expanded for focused syllable; unchanged for post-focus syllables | expanded for focused syllable; lowered and narrowed for post-focus syllables |

c) Do post-focus syllables have no pitch targets of their own, being implemented with only a flat low f0 contour, or do they still have their own pitch targets, which are implemented with reduced pitch range?

| f0 event | No target | Targets implemented with reduced pitch range |
|---|---|---|
| Contour | totally flat | similar to those on and before focus but much reduced in pitch range |

d) Does focus change the pitch target of a stressed syllable?

| f0 event | Same pitch target | Changed pitch target |
|---|---|---|
| Contour | similar f0 alignment | different f0 alignment |

e) Do syllables between pitch accents carry their own pitch targets, or is f0 only interpolated through these syllables?

| f0 event | No target, only interpolation | With their own pitch targets, possibly [mid] |
|---|---|---|
| Height of f0 max, min, mean | fully dependent on both preceding and following accents | only partially dependent on preceding accents; little dependence on following accent |
| Contour shape | straight or curved interpolation between preceding and following accents | becomes lower and less dependent on preceding accent over time; the lowest point in the last unstressed syllable becomes the starting point of the f0 contour of the following accented syllable |

Table 2. Mean values of various measurements in the accented syllable under the effects of focus, rate, accent location and position, together with probability values from four-factor repeated-measures ANOVAs. p values smaller than 0.05 are printed in boldface.

| Measurement | Focus: yes | Focus: no | p (focus) | Rate: normal | Rate: fast | p (rate) | Accent location: final | Accent location: non-final | p (accent location) | Position: word1 | Position: word3 | Position: word5 | p (position) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Maxf0 (st) | 11.0 | 8.2 | =0.0086 | 9.6 | 9.7 | =0.3839 | 9.6 | 9.7 | =0.1455 | 10.9 | 9.9 | 8.1 | <0.0001 |
| Minf0 (st) | 6.6 | 6.8 | =0.2845 | 6.5 | 6.9 | =0.2845 | 6.5 | 6.9 | =0.0001 | 7.7 | 6.8 | 5.6 | <0.0001 |
| Rise size (st) | 4.4 | 1.4 | =0.0092 | 3.0 | 2.7 | =0.0035 | 3.0 | 2.8 | =0.0594 | 3.2 | 3.0 | 2.4 | =0.1217 |
| Rise speed (st/s) | 23.4 | 9.5 | =0.0041 | 15.6 | 17.4 | =0.0511 | 17.8 | 15.1 | <0.0001 | 18.2 | 16.6 | 14.5 | =0.0356 |
| Accent-dur (ms) | 222.6 | 195.4 | =0.0001 | 232.2 | 185.8 | <0.0001 | 242.3 | 175.7 | <0.0001 | 188.5 | 208.0 | 230.5 | <0.0001 |

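Table 3. Mean values of various measurements in the post- and pre-accent syllables under the effects of focus, adjacency and position, together with probability values from three-factor repeated-measures ANOVAs. p values smaller than 0.05 are printed in boldface.

| Measurement | Focus: yes | Focus: no | p (focus) | Adjacency: close | Adjacency: far | p (adjacency) | Position: word2 | Position: word3 | Position: word4 | Position: word5 | p (position) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Maxf0-post-word1 (st) | 6.2 | 7.4 | =0.0016 | 7.1 | 6.6 | =0.0002 | 8.9 | 6.5 | 5.9 | 6.0 | <0.0001 |
| Maxf0-post-word3 (st) | 6.5 | 6.8 | =0.0205 | 7.3 | 6.0 | =0.0028 | | | 7.2 | 6.1 | =0.0309 |

| Measurement | Focus: yes | Focus: no | p (focus) | Position: word1 | Position: word2 | Position: word3 | Position: word4 | p (position) |
|---|---|---|---|---|---|---|---|---|
| Maxf0-pre-word3 (st) | 9.3 | 9.9 | =0.1256 | 9.9 | 9.3 | | | <0.0042 |
| Maxf0-pre-word5 (st) | 8.6 | 8.6 | =0.8775 | 10.0 | 9.5 | 7.8 | 7.2 | <0.0001 |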

Table 4. Potential pitch targets and their associated alignment patterns.

| f0 event | [high] | [rise] | [fall] | [low] |
|---|---|---|---|---|
| f0 maximum | after syllable offset | after syllable offset | around middle of syllable | around syllable onset |
| f0 minimum | around syllable onset | around middle of syllable | around syllable onset | around syllable offset |

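Table 5. Mean values of maxf0-to-C2 and peak location (= 100 × C1-to-max / accent-dur) under the effects of focus, rate, accent location (upper half), accent length (lower half) and position, together with probability values from four-factor repeated-measures ANOVAs. p values smaller than 0.05 are in boldface.

| Measurement | Focus: yes | Focus: no | p (focus) | Rate: normal | Rate: fast | p (rate) | Accent location: final | Accent location: non-final | p (accent location) | Position: word1 | Position: word5 | p (position) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Maxf0-to-C2 (ms) | 52.6 | 42.0 | =0.1033 | 50.5 | 44.2 | =0.0902 | 71.9 | 22.7 | <0.0001 | 8.1 | 86.6 | =0.0004 |
| Peak location (%) | 78.5 | 81.2 | =0.4467 | 80.9 | 78.8 | <0.2118 | 68.9 | 90.8 | <0.0001 | 97.2 | 62.5 | =0.0011 |

| Measurement | Focus: yes | Focus: no | p (focus) | Rate: normal | Rate: fast | p (rate) | Accent length: long | Accent length: short | p (accent length) | Position: word1 | Position: word5 | p (position) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Maxf0-to-C2 (ms) | 12.5 | -16.4 | =0.0036 | -5.7 | 1.8 | =0.0302 | 30.2 | -34.2 | <0.0001 | -52.8 | 48.9 | <0.0001 |
| Peak location (%) | 83.6 | 124.1 | =0.0051 | 107.9 | 99.9 | =0.0431 | 87.6 | 120.1 | =0.0165 | 130.6 | 77.1 | =0.0136 |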
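
The relative-location measure defined in the caption of Table 5 can be illustrated with a short sketch. The function names are hypothetical, and the code assumes that C1 marks the onset and C2 the offset of the accented syllable, so that maxf0-to-C2 equals accent-dur minus C1-to-max. Under that assumption a peak location above 100% goes together with a negative maxf0-to-C2, which is the pattern seen in the lower half of the table.

```python
def peak_location(c1_to_max_ms: float, accent_dur_ms: float) -> float:
    """Peak location in percent: 100 x C1-to-max / accent-dur (Table 5 caption)."""
    if accent_dur_ms <= 0:
        raise ValueError("accent duration must be positive")
    return 100.0 * c1_to_max_ms / accent_dur_ms


def maxf0_to_c2(c1_to_max_ms: float, accent_dur_ms: float) -> float:
    """Interval from the f0 peak to C2 in ms.

    Assumption (not stated in this section): C2 is the offset of the
    accented syllable, so the value is positive when the peak occurs
    before the syllable offset and negative when it occurs after it.
    """
    return accent_dur_ms - c1_to_max_ms


# Hypothetical tokens: (C1-to-max, accent-dur), both in ms.
early_peak = (160.0, 200.0)   # peak inside the accented syllable
late_peak = (250.0, 200.0)    # peak after the syllable offset

early_location = peak_location(*early_peak)
late_location = peak_location(*late_peak)
early_interval = maxf0_to_c2(*early_peak)
late_interval = maxf0_to_c2(*late_peak)
```

The same ratio, with C1-to-minf0 in the numerator, gives the valley location reported in Table 6.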

Table 6. Mean values of C1-to-minf0 and valley location (= 100 × C1-to-minf0 / accent-dur) under the effects of focus, rate, accent location, and position, together with probability values from four-factor repeated-measures ANOVAs. p values smaller than 0.05 are printed in boldface.

| Measurement | Focus: yes | Focus: no | p (focus) | Rate: normal | Rate: fast | p (rate) | Accent location: final | Accent location: non-final | p (accent location) | Position: word1 | Position: word3 | Position: word5 | p (position) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1-to-minf0 (ms) | 3.9 | 16.9 | =0.1977 | 12.2 | 8.6 | =0.0416 | 19.2 | 1.6 | <0.3204 | 14.3 | 7.9 | 8.9 | =0.6520 |
| Valley location (%) | 1.2 | 6.7 | =0.1879 | 4.2 | 3.7 | <0.6842 | 7.0 | 0.9 | <0.0365 | 5.8 | 3.4 | 2.7 | =0.6342 |

Figure Captions

Figure 1. In (a)-(c), Mandarin F, H and R tones (in syllable 3) are preceded by four different tones and followed by H tone. In (d), the R tone in syllable 3 is followed by L tone. Vertical lines indicate syllable boundaries. Adapted from Xu (1999).

Figure 2. Illustration of the pitch target implementation model. The vertical lines represent syllable boundaries. The dashed lines represent underlying pitch targets. The thick curve represents the f0 contour that results from asymptotic approximation of the pitch targets. Adapted from Xu & Wang (2001).
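
The idea of asymptotic approximation described in the caption of Figure 2 can be sketched numerically. The code below is only an illustration under simplifying assumptions, not the authors' quantitative model: each syllable has a linear target (a slope and an onset height, in semitones), f0 decays exponentially toward that target within the syllable, and the f0 reached at the syllable offset becomes the starting point of the next syllable. The function name, the `rate` constant and the example values are all hypothetical.

```python
import math


def approximate_targets(targets, f0_start, rate=25.0, step=0.01):
    """Return (time_s, f0_st) samples that approach each target in turn.

    targets:  list of (duration_s, slope_st_per_s, onset_height_st),
              one linear pitch target per syllable (slope 0 = static).
    f0_start: f0 at the onset of the first syllable, in semitones.
    rate:     assumed decay constant (1/s); larger = faster approach.
    step:     sampling interval in seconds.
    """
    contour = [(0.0, f0_start)]
    t_offset = 0.0
    f0_onset = f0_start
    for duration, slope, height in targets:
        # Distance between actual f0 and the target at syllable onset.
        gap = f0_onset - height
        n_steps = int(round(duration / step))
        for i in range(1, n_steps + 1):
            t = i * step
            target = height + slope * t
            # The gap shrinks exponentially, so f0 nears the target
            # asymptotically without an abrupt jump at the boundary.
            contour.append((t_offset + t, target + gap * math.exp(-rate * t)))
        # Carry the final f0 over as the next syllable's starting point.
        f0_onset = contour[-1][1]
        t_offset += duration
    return contour


# Two 200-ms syllables: a static high target, then a static low target.
contour = approximate_targets([(0.2, 0.0, 10.0), (0.2, 0.0, 5.0)], f0_start=5.0)
```

In this sketch the target is approached most closely at the end of each syllable, and the early part of a syllable still reflects the preceding target. A non-zero slope would stand in for a dynamic target such as [rise] or [fall].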

Figure 3. (a) Interaction of tone and focus in Mandarin in H H H H H (left) and H L H L H (right) sequences. The locations of focus are indicated by the labels around the curves. (b) Suppression of post-focus tones when syllable 2 carries four different tones. In all cases, the focus is on the first two syllables. Adapted from Xu (1999).

Figure 4. f0 downtrend introduced by downstep (a), and by both initial focus and downstep (b). Adapted from Xu (1999). The thin curves are f0 tracings of the tone sequence H H H H H, whereas the thick curves are those of H L H L H. The f0 interruptions at the beginning of syllable 5 in the H L H L H sequences are due to a voiceless stop [t].

Figure 5. Mean f0 contours of Mandarin sentences containing 0 or 3 neutral tone (N) syllables. In both graphs, the tone of syllable 1 alternates across H, R, L and F. In (a), the tone following syllable 1 is F. In (b), there are 3 neutral-tone syllables following syllable 1. Vertical lines in the graphs indicate syllable boundaries. Data from Chen and Xu (2002).

Figure 6. Time-normalized f0 curves of seven repetitions of "Nina may know my niece" said by eight subjects at "normal" rate with no narrow focus.

Figure 7. Mean f0 contours of all sentences produced at normal rate by 7 subjects. In each graph, the ordinate is the mean f0 in Hz averaged over 49 repetitions by 7 subjects, and the abscissa is time in ms. The duration of each syllable in an f0 curve is the grand average of 49 repetitions by 7 subjects. The thicker curves have narrow focus on one of the words as indicated by the underscore in the sentence printed in each graph. The open squares and circles indicate syllable boundaries, located at the first vocal pulse of the initial consonants. In the sentences containing the words "mimic" and "minimize," the gaps in f0 curves correspond to the closure or frication of the final consonants.

Figure 8. Post-focus maximum f0 broken down by adjacency to preceding focus and position in sentence when focus is on word 1 (left) and word 3 (right).

Figure 9. Mean f0 in semitones at different locations in the post-accent syllable broken down by focus and post-accent stress.

Figure 10. Percentage of discernible post-focus f0 peaks (left) and size of the peaks (right) in the post-focus stressed syllables. A peak is discernible if there is an f0 point between the onset and offset of the words "know" and "niece" that is higher than both the starting and ending f0 of the word.
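
The peak criterion stated in the caption of Figure 10 translates directly into a small check. The sketch below is a hedged illustration with hypothetical function names and made-up f0 values; it assumes each word is given as a sequence of f0 samples ordered from word onset to word offset.

```python
def has_discernible_peak(f0_points):
    """True if some interior f0 point exceeds both the first and last point.

    f0_points: f0 samples (e.g. in semitones) ordered from the onset to
    the offset of the word; the first and last samples serve as the
    starting and ending f0.
    """
    if len(f0_points) < 3:
        return False  # no interior point to compare
    start, end = f0_points[0], f0_points[-1]
    return any(f0 > start and f0 > end for f0 in f0_points[1:-1])


def percent_with_peak(tokens):
    """Percentage of tokens (lists of f0 samples) with a discernible peak."""
    if not tokens:
        raise ValueError("need at least one token")
    hits = sum(1 for token in tokens if has_discernible_peak(token))
    return 100.0 * hits / len(tokens)


# Hypothetical f0 tracks for one post-focus word, in semitones.
tokens = [
    [5.0, 6.0, 5.5],   # rises above both ends: peak
    [5.0, 4.5, 4.0],   # falls throughout: no peak
    [5.0, 6.0, 7.0],   # rises throughout: maximum is the end point, no peak
    [6.0, 6.4, 5.8],   # small peak
]
share = percent_with_peak(tokens)
```

Note that a monotonic rise or fall never counts as a peak under this criterion, because its maximum coincides with one of the end points.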

Figure 11. Mean values of maxf0-to-C2 (top) and peak location (bottom) broken down by focus, position, accent location (left) and accent length (right).

Figure 12. Mean duration of the accented syllable in word 1 and word 5 according to focus and accent location (left), and focus and accent length (right).

Figure 13. Results of regression analyses with accent duration as predictor and maxf0-to-C2 as dependent variable. Upper panels: r²; Lower panels: slope of the regression line. Left: results broken down by focus and accent location. Right: results broken down by focus and length of accented vowel.
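
The two statistics plotted in Figure 13, the slope of the regression line and r², come from an ordinary simple linear regression. As a minimal sketch (the function name and the data are hypothetical, and this is a generic least-squares computation rather than the authors' analysis script), they can be obtained as follows, with accent duration as predictor and maxf0-to-C2 as dependent variable.

```python
def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r2 is the squared Pearson correlation; define it as 0 for constant y.
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


# Hypothetical tokens: accent duration (ms) and maxf0-to-C2 (ms).
accent_dur = [150.0, 200.0, 250.0, 300.0]
maxf0_to_c2 = [10.0, 30.0, 50.0, 70.0]

slope, intercept, r2 = simple_regression(accent_dur, maxf0_to_c2)
```

With these made-up values the fit is perfect; real tokens would scatter around the line, giving r² below 1 as in the figure.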

Figure 14. Results of regression analyses on word 3 with accent duration as predictor and maxf0-to-C2 as dependent variable.

Figure 15. Results of regression analyses on f0 height at different locations in the unaccented syllable immediately after accented syllables.

Figure 16. Results of regression analyses on f0 height at different locations in the unaccented syllable immediately preceding accented syllables.

Figure 17. A brief sketch of a dual-module model of intonation. The model assumes a major division between communicative functions that convey meanings through intonation and articulatory mechanisms that implement these functions. Intonation-related communicative functions manifest themselves in parallel (column 1 from left) by separately specifying (a) local pitch targets, (b) pitch ranges and (c) articulatory effort (column 2). Taking these specifications as input, the articulatory module (column 3) applies physical forces to successively approach local targets at the specified pitch ranges with the specified amounts of effort. The timing control in the articulatory module synchronizes the local targets with the associated syllables. The resulting f0 thus continually approaches successive local targets within different pitch ranges at varying speeds (column 4). See Figure 2 for what the surface local f0 contours generated by this model may look like.

Figure 1. [Image not reproduced. Four panels, (a) to (d), each plotting f0 (axis ticks 60 to 160) against time (axis ticks 0 to 85), with one curve per preceding tone (H, L, R, F).]

Figure 2.

Figure 3. [Image not reproduced. (a) Two panels, for the H H H H H and H L H L H sequences, with curves labeled by focus location (None, Word 1, Word 2, Word 3). (b) Four panels, one for each tone of syllable 2 (H, R, L, F). All panels plot f0 (axis ticks 60 to 160) against time (axis ticks 0 to 85).]

Figure 4. [Image not reproduced. Two panels, (a) "No Narrow Focus" and (b) "Focus", each plotting f0 (axis ticks 60 to 160) against time (axis ticks 0 to 80) for the tone sequences H H H H H and H L H L H.]

Figure 5. [Image not reproduced. Two panels, (a) and (b), each plotting f0 (axis ticks 50 to 250) against Time (ms) (axis ticks 0 to 800), with one curve per tone of syllable 1 (H, L, R, F); following syllables are labeled F in (a) and N N N F in (b).]

Figure 6. [Image not reproduced. Eight panels of f0 curves against normalized time, with syllables labeled "Ni na may know my niece"; four panels have f0 axis ticks from 0 to 400 and four from 0 to 200. Every panel label was extracted as "S2".]

Figure 7. [Image not reproduced. Ten panels, each plotting mean f0 (axis ticks 100 to 200) against Mean time (ms) (axis ticks 0 to 1200), one panel per sentence: "Lee may know my mummy", "Ramona may know my niece", "Lee may know my niece", "Emily may know my niece", "Lamar may know my niece", "Nina may know my niece", "Lee may lure my niece", "Lee may know my nanny", "Lee may minimize my niece", "Lee may mimic my niece".]

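Figure 8. [Image not reproduced. Two bar charts of post-focus maxf0 (st), axis ticks 0 to 14, by position, adjacency and focus. The bar labels are listed below; their assignment to conditions agrees with the marginal means in Table 3.]

Left panel (focus on word 1):

| Position | Adjacency | Focus: no | Focus: yes |
|---|---|---|---|
| word2 | close | 9.1 | 10.7 |
| word2 | far | 8.8 | 7.2 |
| word3 | close | 7.4 | 5.9 |
| word3 | far | 7.2 | 5.5 |
| word4 | close | 6.9 | 4.9 |
| word4 | far | 6.8 | 5.0 |
| word5 | close | 6.7 | 5.2 |
| word5 | far | 6.7 | 5.2 |

Right panel (focus on word 3):

| Position | Adjacency | Focus: no | Focus: yes |
|---|---|---|---|
| word4 | close | 7.3 | 9.4 |
| word4 | far | 6.6 | 5.6 |
| word5 | close | 6.8 | 5.8 |
| word5 | far | 6.6 | 5.3 |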

Figure 9. [Image not reproduced. Line plot of F0 (st), axis ticks 4 to 12, against location in post-accent syllable (%), axis ticks 0 to 100, with four series: strong_no-focus, weak_no-focus, strong_post-focus, weak_post-focus.]

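Figure 10. [Image not reproduced. Two bar charts by position (word3, word5) and focus (no, yes). Left: No. of peaks (%), axis ticks 0 to 100; bar labels in extraction order: 83.4, 89.2, 71.6, 59.7. Right: peak rise size (st), axis ticks 0 to 1.5; bar labels in extraction order: 0.37, 1.21, 0.26, 0.77. The assignment of bar labels to conditions is not recoverable from the extracted text.]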

Figure 11. [Image not reproduced. Four bar charts by focus (no, yes) and position (word1, word5). Top row: maxf0-to-C2 (ms), axis ticks -200 to 100, by accent location (non-final, word final) on the left and by accent length (short, long) on the right. Bottom row: peak location (%), axis ticks 0 to 300, with the same left and right breakdown. Bar values are not reliably recoverable from the extracted text.]

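Figure 12.

[Panel 1 (Accent location): Accent duration (ms), y-axis 0 to 300, by Position (word1, word5), Focus (no, yes), and accent location (non-final, word-final). Data labels, in extraction order: 182, 204, 227, 252, 207, 244, 241, 259.]

[Panel 2 (Accent length): Accent duration (ms), y-axis 0 to 300, by Position (word1, word5), Focus (no, yes), and accent length (short, long). Data labels, in extraction order: 64, 73, 177, 191, 182, 204, 227, 256.]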

Figure 13.

Panel 1: maxf0-to-C2 regressed over accent-dur, r² (y-axis 0 to 0.8)

            non-final              word-final
            no focus   on-focus    no focus   on-focus
word 1      0.025      0.018       0.023      0.356
word 5      0.046      0.006       0.455      0.381

Panel 2: maxf0-to-C2 regressed over accent-dur, r² (y-axis 0 to 0.8)

            short vowel            long vowel
            no focus   on-focus    no focus   on-focus
word 1      0.074      0.161       0.025      0.018
word 5      0.027      0.069       0.046      0.006

Panel 3: maxf0-to-C2 regressed over accent-dur, slope of regression line (y-axis -0.2 to 1)

            non-final              word-final
            no focus   on-focus    no focus   on-focus
word 1      -0.202     0.11        0.11       0.351
word 5      0.431      0.107       0.587      0.364

Panel 4: maxf0-to-C2 regressed over accent-dur, slope of regression line (y-axis -0.6 to 1)

            short vowel            long vowel
            no focus   on-focus    no focus   on-focus
word 1      -0.1386    0.677       -0.202     0.11
word 5      -0.428     0.3         -0.431     0.107

Figure 14.

[Panel 1: maxf0-to-C2 regressed over accent-dur in word 3, r² (y-axis 0 to 0.8), by Focus (no focus, on-focus) and accent location (non-final, word-final). Data labels, in extraction order: 0.005, 0.011, 0.002, 0.477.]

[Panel 2: maxf0-to-C2 regressed over accent-dur in word 3, slope of regression line (y-axis -0.4 to 1), by Focus (no focus, on-focus) and accent location (non-final, word-final). Data labels, in extraction order: -0.161, 0.108, -0.046, 0.413.]

Figure 15.

Panel 1: post-pitch regressed over rise-size in word 1, r² (y-axis 0 to 1)

            no-focus                 post-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.684      0.734         0.473      0.838
100 ms      0.33       0.593         0.206      0.601
150 ms      0.122      0.529         0.039      0.308
200 ms      0.064      0.445         0.013      0.094

Panel 2: post-pitch regressed over rise-size in word 1, slope of regression line (y-axis -0.2 to 1.2)

            no-focus                 post-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.847      0.987         0.568      1.12
100 ms      0.489      0.856         0.282      0.618
150 ms      0.369      0.259         0.746      0.098
200 ms      0.193      0.558         0.049      0.163

Panel 3: post-pitch regressed over rise-size in word 3, r² (y-axis 0 to 1)

            no-focus                 post-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.186      0.622         0.561      0.912
100 ms      0.053      0.223         0.347      0.622
150 ms      0.022      0.059         0.175      0.29
200 ms      0.005      0.018         0.05       0.203

Panel 4: post-pitch regressed over rise-size in word 3, slope of regression line (y-axis -0.2 to 1.2)

            no-focus                 post-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.557      0.834         0.555      1.067
100 ms      0.295      0.658         0.257      0.936
150 ms      0.215      0.397         0.162      0.446
200 ms      0.12       0.24          0.099      0.345

Panel 5: post-pitch regressed over rise-size in word 5, r² (y-axis 0 to 1)

            no-focus   post-focus
50 ms       0.256      0.342
100 ms      0.055      0.106
150 ms      0.12       0.143
200 ms      0.22       0.001

Panel 6: post-pitch regressed over rise-size in word 5, slope of regression line (y-axis -0.6 to 1.2)

            no-focus   post-focus
50 ms       0.556      0.549
100 ms      -0.413     -0.225
150 ms      -0.512     -0.5
200 ms      -0.453     -0.036

Figure 16.

Panel 1: pre-pitch regressed over maxf0 in word 1, r² (y-axis 0 to 0.8)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.013      0.03          0.044      0.013
100 ms      0.038      0.034         0.022      0.081
start       0.015      0.038         0.00900    0.002

Panel 2: pre-pitch regressed over maxf0 in word 1, slope of regression line (y-axis -0.4 to 1)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       -0.03      -0.044        -0.243     -0.049
100 ms      0.361      0.295         0.052      0.135
start       0.295      0.367         0.061      0.044

Panel 3: pre-pitch regressed over maxf0 in word 3, r² (y-axis 0 to 0.8)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.0005     0.035         0.124      0.04
100 ms      0.042      0.02          0.143      0.032
start       0.348      0.285         0.00002    0.001

Panel 4: pre-pitch regressed over maxf0 in word 3, slope of regression line (y-axis -0.2 to 1)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       -0.012     -0.113        -0.046     -0.033
100 ms      0.167      0.138         -0.089     -0.057
start       0.952      0.899         -0.002     0.014

Panel 5: pre-pitch regressed over maxf0 in word 5, r² (y-axis 0 to 0.8)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       0.003      0.165         0.037      0.00001
100 ms      0.03       0.002         0.018      0.087
start       0.086      0.065         0.040      0.005

Panel 6: pre-pitch regressed over maxf0 in word 5, slope of regression line (y-axis -0.2 to 1)

            no-focus                 pre-focus
            stressed   unstressed    stressed   unstressed
50 ms       -0.006     -0.048        -0.06      0.003
100 ms      -0.088     0.021         -0.028     -0.062
start       0.244      0.186         0.074      -0.023

Figure 17.