
Haskins Laboratories Status Report on Speech Research 1990, SR-103/104, 1-20

The Role of Contrast in Limiting Vowel-to-vowel Coarticulation in Different Languages*

Sharon Y. Manuel†

Languages differ in their inventories of distinctive sounds and in their systems of contrast. Here we propose that this observation may have predictive value with respect to how extensively various phones are coarticulated in particular languages. This hypothesis is based on three assumptions: (1) there are "output constraints" on just how a given phone can be articulated; (2) output constraints are, at least in part, affected by language-particular systems of phonetic contrast; and (3) coarticulation is limited in a way that respects those output constraints. Together, these assumptions lead to the expectation that, in general, languages will tend to tolerate less coarticulation just where extensive coarticulation would lead to confusion of contrastive phones. This prediction was tested by comparing acoustic measures of anticipatory vowel-to-vowel coarticulation in languages which differ in how they divide up the vowel space into contrastive units. The acoustic measures were the first and second formant frequencies, measured in the middle and at the end of the target vowels /a/ and /e/, followed by /pV/, where /V/ was /i,e,a,o,u/. Two languages (Ndebele and Shona) with the phonemic vowels /i,e,a,o,u/ were found to have greater anticipatory coarticulation for the target vowel /a/ than does a language (Sotho) that has a more crowded mid- and low-vowel space, with the phonemic vowels /i,e,ɛ,a,ɔ,o,u/. The data were based on recordings from three speakers of each of the languages.

INTRODUCTION

Languages differ in their inventories of distinctive sounds and in their systems of contrast. The premise of this paper is that this observation may have predictive value with respect to how extensively certain phones are coarticulated in particular languages. We provide an empirical test of our theoretical perspective by comparing vowel-to-vowel coarticulation in languages that differ in how they divide the vowel space into contrastive units.

I would like to thank the faculty and students of the University of Zimbabwe for their generous help in the data collection. I am grateful to Suzanne Boyce, Harvey Gilbert, Marie Huffman, Rena Krakow, Harriet Magen, and Kenneth Stevens and three anonymous reviewers for valuable comments on an earlier draft of this paper, and to Louis Goldstein, Alvin Liberman, and Ignatius Mattingly for advice on my doctoral dissertation, on which this paper is based. I would also like to thank Tracy Sheppard, Linnea Bankey, and Yvonne Manning-Jones for help with manuscript preparation. This research was supported principally by NIH Grant HD-01994 to Haskins Laboratories and more recently by an NIH Postdoctoral Training Grant (DC-00005) to MIT.


Words are distinguished from one another by their phone composition, and since articulatory gestures (which ones, when they occur) and their acoustic consequences distinguish phones, we might expect speakers to exercise some effort to ensure that the acoustic consequences of articulatory gestures remain distinct (see, for example, Lindblom, 1983; Lindblom & Engstrand, 1989; Martinet, 1952, 1957; Stevens, 1989). That is, there may be output constraints (tolerances) on phones, definable in terms of articulatory or acoustic dimensions, that limit how much their articulatory gestures are allowed to stray from an ideally distinctive pattern. For example, a speaker who intends to say the word tick must make a tight tongue tip-alveolar closure; too weak a constriction will produce something that might be heard as sick or thick.

In speech, the articulatory requirements of one phone are often anticipated during the production of a preceding phone. This phenomenon, known as anticipatory coarticulation, results in contextually induced variability in the portion of the signal that we conventionally associate with a given phone. An example is the difference in how /t/ is produced in tea and tree. In the word tree the tongue may make a relatively more posterior contact with the roof of the mouth and may itself be retroflexed, in anticipation of the following retroflex consonant /r/. Similarly, in a vowel-nasal sequence such as in pan, the velum typically begins (and may complete) its lowering movement, associated with the nasal /n/, while the vocal tract is still open for the vowel /æ/ and well before the oral occlusion for the /n/ is achieved.

Since coarticulation affects the very primitives of contrast between phones, i.e., articulatory gestures and their acoustic consequences, extreme coarticulation would possibly put speakers at risk of blurring or even obliterating phonetic contrasts. We might expect, then, that speakers generally limit coarticulation such that it does not destroy the distinctive attributes of gestures. That is, coarticulation might be limited so that the output constraints on distinctive gestures (i.e., gestures that have consequences for distinctiveness, analogous to distinctive features) are not violated (see also Engstrand, 1988; Manuel & Krakow, 1984; Manuel, 1987a, 1987b; Martinet, 1952, 1957; Schouten & Pols, 1979; Tatham, 1984; Keating, 1990).

But what counts as a distinctive gesture? Clearly, what counts varies from phone to phone and from language to language. For example, if the oral tract is completely closed, it matters quite a bit whether or not the velar port is appreciably open, since this is the major articulatory difference (and is responsible for the major acoustic difference) between voiced oral and nasal stops (e.g., /d/ vs. /n/). On the other hand, vowels in English are not distinguished by virtue of the fact that they are nasalized or not (though of course they normally are relatively nonnasal, the exceptions being when they occur in a sequence with nasal consonants1). It is therefore not surprising that English tolerates the presence of a substantial velar lowering gesture during vowels in nasal contexts.

With respect to language-particular aspects of distinctiveness, we note that different languages essentially take a universally available articulatory/acoustic space and divide it up differently. For example, Swedish distinguishes [+round] from [-round] front vowels, whereas English does not. Similarly, vowels in French are distinctively [+nasal] or [-nasal], but as noted above, English vowels are nasalized only in the context of a nasal consonant.

In some languages, very similar articulatory gestures may result in linguistically distinct phones. In order to maintain distinctiveness between those gestures, each must have a fairly narrow constraint on its production. On the other hand, if a language makes only a single phone in a particular region of some phonetic dimension, we might expect a fair amount of variability in production to be tolerated (see Keating and Huffman, 1984, for related discussion). For example, Malayalam makes two distinctive voiceless coronal stops (alveolar and dental), whereas English only makes one distinctive voiceless stop in that area of the vocal tract. English speakers have been shown to exhibit more variability in production of English voiceless coronal stops than do Malayalam speakers for Malayalam alveolar or dental stops (Jongman, Blumstein, & Lahiri, 1985). This observation suggests that the output constraints on gestures associated with particular phones, and consequently the degree to which those phones are susceptible to coarticulatory influence of neighboring phones, can be predicted to a large extent by examining the way a language uses a particular phonetic space.2

The results of several previous studies can be indirectly or directly related to the role that contrast plays in constraining coarticulatory behavior in different languages (e.g., Chasaide, 1979; Lubker & Gay, 1982; Magen, 1984; Manuel & Krakow, 1984; Öhman, 1966). Öhman's early study of vowel-to-vowel coarticulation showed that for English and Swedish V1CV2 utterances, the articulators begin assuming configurations needed for V2 before the occlusion is reached for the medial consonant. However, similar effects were generally not found for Russian. As Öhman points out, Russian contrasts palatalized and unpalatalized consonants, and it is the position of the tongue body that distinguishes these two sets. Presumably in Russian the tongue body is not free to begin assuming the configuration for V2 during the V1 to C transition because the consonant itself makes (possibly conflicting) requirements on the tongue body.

Manuel and Krakow (1984) compared vowel-to-vowel coarticulation in English, a language with a relatively crowded vowel space (13 to 15 vowels, depending on dialect), and two Bantu languages (Shona and Swahili) for which the vowels are well spread out (the phonemic vowels of Shona and Swahili are /i,e,a,o,u/). The prediction in Manuel and Krakow was that since the vowels of English are closer together in the articulatory/acoustic space, the range of productions for each English vowel should be relatively small, while in contrast, since the vowels of Shona and Swahili are more spread apart, they could tolerate larger ranges of productions without danger of neutralizing phonemic differences. The main result of the experiment was as predicted. Vowel-to-vowel coarticulation (as reflected by values of the first and second formant frequencies measured in the middle of the vowels) was in fact stronger in Shona and Swahili than in English.

The results of the Manuel and Krakow study are consistent with the idea that proximity of contrastive phonetic units affects coarticulation. However, because the study was based on a single speaker of each language, we cannot conclude with certainty that Manuel and Krakow's results reflected language differences rather than simply speaker-to-speaker differences, given that speakers of a language may vary somewhat in amount and type of coarticulatory patterns (Lubker & Gay, 1982; Nolan, 1985). In addition, English vowels are often diphthongized, whereas Shona and Swahili vowels tend to be monophthongal. It is possible that the restricted degree of coarticulation in English was not due to constraints on the quality of vowel nuclei, but was instead due to demands of the diphthongal second part of the English vowels. Despite these confounding problems, the major finding was predicted by the assumption that distribution of vowels affects their output constraints, which in turn affect the amount of vowel-to-vowel coarticulation. We are unaware of any other hypothesis that has such predictive power for these results.

The current experiment is an expansion of the basic paradigm used by Manuel and Krakow. The three languages analyzed are all in the same family (Southern Bantu), and they are phonologically, morphologically, and syntactically similar to each other. Several speakers of each language were recorded, to allow comparison of coarticulation both between speakers of the same language, and across languages.

We included two languages (Shona and Ndebele) with the phonemic vowels /i,e,a,o,u/ and one language (Sotho) that has a more crowded vowel inventory /i,e,ɛ,a,ɔ,o,u/. It should be kept in mind that what is at issue here is not the number of vowels per se, but their distribution. As can be seen in Figure 1, the Sotho vowel space is more crowded in the low- and mid-vowel region than are the Shona and Ndebele vowel spaces. As a mnemonic for which languages have relatively less crowded (LC) and more crowded (MC) vowel spacing, we will use the shorthand Ndebele(LC), Shona(LC) and Sotho(MC).

[Figure 1 appears here: vowel charts plotting F1 (Hz) against F2 (kHz) for each language.]

Figure 1. Examples of phonemic vowels of Ndebele, Shona and Sotho. Data are from one speaker of each of the languages.


In Sotho(MC), the vowels /ɛ,ɔ/ intervene between phonemic /a/ and phonemic /e,o/. If this relative crowding affects output constraints and limits coarticulation in Sotho(MC), we would expect to find smaller vowel-to-vowel coarticulatory effects on /a/, /e/, and /o/ in Sotho(MC) than in Shona(LC) and Ndebele(LC). Specifically, we expect Sotho(MC) /a/ to show less movement into the /ɛ,ɔ/ space. For the vowel /e/, we expect less movement into the /ɛ/ space, and for /o/, less movement toward the /ɔ/ space. Here we examine anticipatory coarticulation for the target vowels /a/ and /e/. The acoustic measures of coarticulation were the first (F1) and second (F2) formant frequencies.

Method

A. Further specifics of languages and subjects

The languages we studied are characterized by open syllables, and the vowels are monophthongal (see Cole, 1955; Doke, 1954). Word stress is not distinctive, but rather it is fixed on the penultimate syllable. Tone is phonemic in all three of the languages. Phonetically, the vowels /i/ and /u/ are quite high in these languages. Similarly, the mid vowels /e/ and /o/ are also phonetically rather high, with /e/ approaching the quality of [ɪ]. The low vowel /a/ is more fronted than its English counterpart. The Sotho(MC) vowel /ɛ/ is phonetically slightly higher than the vowel in English bet and Sotho(MC) /ɔ/ is similar to the vowel in English caught (for dialects which distinguish the vowels in caught and cot). Of possible relevance to the present study is the claim that Sotho(MC) has a phonological rule which raises its mid vowels /e,ɛ,o,ɔ/ when they are followed in the next syllable by a higher vowel or a syllabic nasal, as well as by some palatal consonants. This process has the effect of raising the target vowel by a half step. For example, raised /e/ is higher than nonraised /e/ but not as high as /i/. Similarly, raised /ɛ/ is higher than nonraised /ɛ/, but lower than nonraised /e/.

The data reported here are from three adult male native speakers of each of the languages studied. All subjects were fluent speakers of English as a second language. The subjects were paid for their participation.

B. Materials

The /a/ and /e/ vowels that we analyzed were a subset of a larger data base. This larger database was comprised of nonsense trisyllables of the form /pV1pV2pV3/, where the vowels could be any one of the five vowels /a,e,i,o,u/. The consonant /p/ was chosen because its articulation is relatively independent of articulations involving the tongue. These nonsense trisyllables were spoken in carrier phrases which were selected so that the last phoneme of the word preceding the target trisyllable was an /a/, and the first syllable of the word following the trisyllable was /pa/. The carrier phrases and their glosses are shown below3:

Shona(LC): Taura pepapa pachena  "Say 'pepapa' clearly."

Ndebele(LC): Ngiakhuluma phephapha pakati  "I speak 'pepapa' in the middle."

Sotho(MC): Kebua phephapha phakisa  "Say 'pepapa' quickly."

For each combination of target vowel and following vowel context (e.g., target /e/ followed by the context vowel /i/), we selected for analysis three of the 10 possible types of trisyllables which contained that particular combination. If the target vowel was in V1, then V2 was the context vowel, and V3 was either /a/ (e.g., /pepipa/), or was identical to V2 (e.g., /pepipi/). When the context vowel was /a/, these two types were not distinct. If the target vowel was itself V2, then V3 was the context vowel, and V1 was always /a/ (e.g., /papepi/). We will pool data from these three types of trisyllables, as no relevant effects are found when the types are considered separately (see Manuel, 1987).
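As an illustration of this design (a minimal sketch in Python, not the authors' own materials-preparation procedure; the string-building helpers are ours), the following code enumerates the 125 /pV1pV2pV3/ trisyllables and the trisyllable types pooled for a given target-vowel/context-vowel combination:

```python
# Sketch of the stimulus design: 125 /pV1pV2pV3/ trisyllables, and the three
# types selected for analysis per (target vowel, context vowel) combination.
from itertools import product

VOWELS = ["i", "e", "a", "o", "u"]

# All 125 trisyllables read by the subjects.
all_trisyllables = ["p%sp%sp%s" % v123 for v123 in product(VOWELS, repeat=3)]

def analysis_types(target, context):
    """Trisyllable types pooled for one target-vowel/context-vowel combination."""
    types = {
        "p%sp%spa" % (target, context),             # target in V1, V3 = /a/
        "p%sp%sp%s" % (target, context, context),   # target in V1, V3 = context
        "pap%sp%s" % (target, context),             # target in V2, V1 = /a/, V3 = context
    }
    return sorted(types)  # a set: when context = /a/ the first two coincide

print(len(all_trisyllables))        # 125
print(analysis_types("e", "i"))     # ['papepi', 'pepipa', 'pepipi']
print(analysis_types("e", "a"))     # only two distinct types for the /a/ context
```

This makes explicit why, as noted below under the token counts, fewer tokens are available when the context vowel is /a/: only two distinct trisyllable types exist for that combination.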

C. Recordings

Subjects were told that the purpose of the experiment was to find out how speakers of different languages produce speech sounds. They were instructed to read five randomized lists of the 125 trisyllables, inserting each trisyllable into the appropriate carrier phrase. Subjects were asked to produce all syllables on a low tone, and to speak at a normal rate. Despite this instruction, some subjects spoke very rapidly, particularly the Sotho(MC) speakers (this may have been due to the semantic content of the carrier phrase or because, according to some of the subjects, in their culture there is a popular game in which high value is put on speaking rapidly).

The recordings were made in a sound-treated recording studio at the University of Zimbabwe. The speech was recorded on audio tape at 3 3/4 ips on a NAGRA tape recorder. At the end of each list, subjects were asked to repeat trisyllables that had obviously been misread.

We had hoped to record those Sotho(MC) vowels which were phonetically closest to the Shona(LC) and Ndebele(LC) mid vowels, /e/ and /o/. Generally, in Sotho(MC) all of the allophones of both /e/ and /ɛ/ are represented orthographically by e, and the allophones of /o/ and /ɔ/ as o; sometimes diacritics are employed to distinguish the phonemic vowels. Before reading the test materials, the Sotho(MC) speakers were presented with the five orthographic vowels a, e, i, o, u, and asked how they pronounced those vowels in isolation. For every speaker the quality of the isolated e was judged to be [e], and that of o to be [o]. We also monitored the speakers as they read the lists of trisyllables, and generally judged orthographic e and o to be produced as [e] and [o], respectively. However, subsequent listening to the audio recordings revealed that for two of the Sotho(MC) speakers there was a very large variability in the production of orthographic e, ranging from [ɪ] to [e], and one Sotho(MC) speaker produced a lower mid vowel, more like English /ɛ/. This observation indicates that, for the vowel /e/, we cannot be certain whether subjects were consistently producing a single phonemic vowel, and if not, which one in a given case. Though this results in some noise in the data, it does not crucially affect the ability to test the hypotheses under consideration. First, as shown in Figure 1, both Sotho(MC) /e/ and /ɛ/ have closer neighboring vowels than does the mid vowel /e/ of Shona(LC) and Ndebele(LC). Second, as will be discussed below, when we look at the context effects on target vowel /a/, we concentrate on the effects of the /i/ and /u/ contexts, with less attention to the mid-vowel contexts.

D. Acoustic measurements

The speech was low-pass filtered with a cutoff frequency of 5 kHz and digitized at a 10 kHz sampling rate. The first (F1) and second (F2) formant frequencies were measured using linear predictive (LPC) analysis in the ILS package. A 20 ms analysis window was moved in 5 ms increments over the trisyllable. The formant peaks for each analysis frame were calculated using the root-solving procedure.
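As a rough illustration of this kind of analysis (a minimal sketch, not the ILS implementation actually used; the LPC order of 12 and the Hamming window are our assumptions, since the paper does not report them), the following Python code estimates F1 and F2 by root-solving on autocorrelation-method LPC coefficients, using a 20 ms window moved in 5 ms steps over a 10 kHz signal:

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC via the normal (Yule-Walker) equations."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])       # predictor coefficients
    return np.concatenate(([1.0], -a))           # A(z) = 1 - sum a_k z^-k

def formants(frame, fs=10000, order=12, n_formants=2):
    """Return the lowest formant frequencies (Hz) from the roots of A(z)."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]            # keep one of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[:n_formants]                    # approximate F1, F2

def track_formants(signal, fs=10000, win_ms=20, step_ms=5):
    """Slide a 20 ms window in 5 ms steps over the signal, as in the paper."""
    win, step = int(fs * win_ms / 1000), int(fs * step_ms / 1000)
    return [formants(signal[i:i + win], fs)
            for i in range(0, len(signal) - win, step)]
```

In practice one would also discard roots with very wide bandwidths before labeling the lowest two remaining frequencies as F1 and F2; that refinement is omitted here for brevity.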

Here we will report on measures of F1 and F2 that were made at two points in the target vowels.4 The first measurement point was made in the middle of the vowel. This point was selected by examining the waveform and formant tracks and choosing a point, in the middle region of the vowel, which seemed to be most clearly associated with maximal F1 values. It was expected that this point approximated the time of maximal jaw opening. We expected the anticipatory effects of the following vowel to be seen more clearly at the end than in the middle of the target vowel. Therefore the second measurement point, the end point, was made as close to the /p/ closure (as observed in the waveform) as possible but with plausible F1 and F2 values.5 This procedure yielded four values for each token [two formant values (F1, F2) by two measurement points (middle, end)].

Occasionally this procedure failed to yield F1 or F2 at one of the selected measurement points; in these cases a point immediately preceding or following the desired point was selected. For some vowels the entire region of interest failed to yield formant values (this was particularly problematic for F1) and an attempt was made to determine the values by examining smoothed spectra in that region. Finally, for each speaker, the mean and standard deviation were calculated separately for each of the four measures for both target vowels. Values which were more than two standard deviations above or below the mean were omitted from further analysis. This last procedure resulted in a loss of from 0 to 3% of the values from each measure for any one subject. Thus, while five tokens of each type were recorded, not all (occasionally as few as two) were ultimately usable; some trisyllables were misread, and for others it was impossible to determine the formant frequencies. Since we have pooled data from three types of utterances, for each speaker, and for each combination of target vowel and context vowel, there are from 9 to 15 (with a mean of 13.4) tokens. The exception is when the context vowel was /a/, for which there are only two types of utterances, and from 7 to 10 (with a mean of 9.1) tokens per target for each speaker.
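The two-standard-deviation exclusion step might look like the following minimal sketch (our illustration; the data-frame column names are assumptions, not the authors' actual file format):

```python
import pandas as pd

def trim_outliers(df, value_col="hz", group_cols=("speaker", "measure", "target")):
    """Drop rows whose value lies more than 2 SD from its group mean.

    Grouping by speaker, measure (F1/F2 x middle/end) and target vowel mirrors
    the per-speaker, per-measure trimming described in the text.
    """
    g = df.groupby(list(group_cols))[value_col]
    z = (df[value_col] - g.transform("mean")) / g.transform("std")
    return df[z.abs() <= 2]
```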

Results

A. Non-context effects: Distribution of /a/ and /e/

Each speaker's average F1 and F2 values in the middle of target vowels /a/ and /e/ are plotted in the F1/F2 space in Figure 2. For comparison, values for these speakers' /i,o,u/ are also shown; no further analysis was done on these reference vowels. For all subjects except Sotho(MC) speaker 1, /e/ has a relatively high F2 and low F1 value. The general trend of a high F2 and low F1 for most of the speakers' /e/ vowels is not surprising, given the auditory impression that this usually was a fairly high vowel. As noted earlier, some Sotho(MC) orthographies are ambiguous as to whether orthographic e represents the phoneme /e/ or /ɛ/, and among those orthographic traditions which do distinguish the two, the conventions on use of diacritics are different. It seems not unlikely that Sotho(MC) speaker 1 produced the lower phoneme, while the other two speakers produced the higher one.


[Figure 2 appears here: F1 (Hz) vs. F2 (kHz) vowel plots, one panel per speaker.]

Figure 2. Average F1 and F2 values for the middle of target vowels /a/ and /e/ for the nine subjects. Values for the vowels /i/, /o/, and /u/ are from the medial vowel in /pipipi/, /popopo/, and /pupupu/, respectively, and are based on from two to five tokens each. Values for /a/ and /e/ are based on more contexts, as indicated in the text.

B. Expected effects on F1 and F2 of fronting/backing and raising/lowering contexts

In Figures 3a-c we show the expected acoustic effects of variation in production of the vowel /a/. In these figures, the subscript assigned to the vowel /a/ identifies the following vowel context. The locations of the points indicate schematically how the formant frequencies for the vowel /a/ are expected to be influenced by the different following vowel contexts. Fronting and backing of /a/ should cause changes in F2, as shown in Figure 3b. Raising of /a/ should primarily affect F1, with little effect on F2, as shown in Figure 3a. A combination of raising and front/back movement should be reflected in movement of both F1 and F2, as shown in Figure 3c. For target vowel /a/, we are interested in how much it moves toward the phonetic /ɛ/ and /ɔ/ spaces. All of the vowel contexts we used here (other than /a/ itself) are contexts which would potentially move target /a/ into the phonetic spaces of interest.

The potential context effects for the target vowel /e/ are shown in Figures 4a-c. We looked at target /e/ as a function of two context vowels (/i,u/) which might raise the target /e/, three vowels which might back /e/ (/a,u,o/), one fronting context (/i/), and one lowering context /a/. As shown in Figure 4a, for a front vowel like /e/, variation along the front/back dimension should be reflected acoustically in F2: at a given height, the more front the vowel is, the higher its F2. Changes in the height of /e/ have effects both in F1 and F2, since in general the higher a front vowel is, the lower its F1 and the higher its F2 (Fant, 1960). Thus if /e/ is sensitive to only the height of a following vowel, we might expect to see its contextually induced variants as shown in Figure 4b. Note that in this case a following /u/ actually results in a variety of /e/ which has a higher F2 than if the target /e/ is followed by /a/. Finally, if /e/ is affected by both the front/back and height character of the following vowel, we might expect a pattern somewhat like that of Figure 4c.

[Figure 3 appears here: schematic F1/F2 plots of the expected context-induced variants of /a/.]

Figure 3. Expected acoustic correlates of context-induced variation in the target vowel /a/: a) fronting-backing; b) raising; c) combination of raising and fronting-backing. The subscripts indicate the quality of the following vowel; for example, ai indicates the vowel /a/ when it is followed by the vowel /i/.

[Figure 4 appears here: schematic F1/F2 plots of the expected context-induced variants of /e/.]

Figure 4. Expected acoustic correlates of context-induced variation in the target vowel /e/: a) fronting-backing; b) raising-lowering; c) combination of raising and fronting-backing. Subscripts indicate the quality of the following context vowel (e.g., ei indicates the vowel /e/ when it is followed by the vowel /i/).


C. Obtained effects of vowel context: General

To get an overall impression of the anticipatory coarticulation effects, we averaged the data across the three speakers of each language. These averaged data are plotted in the F1/F2 space as a function of context and measurement point in Figures 5a-c (individual subject data are given in Table 1). In Figures 5a-c, for each language, there are two scatters of points, one scatter for the vowel /a/ (in the lower portion of the graphs), and one for the vowel /e/ (in the upper left portion of the graph). Each of these scatters is composed of two smaller sets of data, one for measurements made in the middle of the vowel (points enclosed by inner loops), and one for measurements made at the end of the vowel (points enclosed by outer loops). Each symbol represents the context in which the target vowel occurred (e.g., filled squares indicate values for the target vowels followed by the context vowel /i/).

D. Context effects on the vowel /a/

Beginning with the Ndebele(LC) data shown in Figure 5a, we see that all F1 values for target /a/ are lower at the end than in the middle of the vowel, presumably due to the labial closure. The consonant closure generally had a lowering effect on F2 as well, as can be seen by comparing the midpoint and endpoint F2 values for target vowel /a/ followed by context vowel /a/ (open circles).

In general, for Ndebele(LC), the following vowel context had a large effect on the target vowel /a/. Relative to the /a/ context, high vowels /i,u/ tend to lower F1 of target vowel /a/. The mid-vowel contexts /e,o/ also lower F1 of target vowel /a/, though to a lesser extent. Front vowels /i,e/ raise F2, and the back rounded vowels /o,u/ lower F2. Even in the middle of /a/ there is an effect of the following vowel. The effects of context on F2 increase as the measurement point moves closer to the end of the target vowel, and therefore to the contextual vowel itself. The vowel context effect on F1 does not appear to be much larger at the end than in the middle of the target vowel.

The vowel-context effects for Shona(LC) and Sotho(MC) are shown in Figures 5b and 5c, respectively. The influence of the context vowel is smaller in these languages, particularly in Sotho(MC), where there is very little effect of following vowel context at the steady-state point. In Sotho(MC) there is essentially no difference in F2 for target /a/ produced in the contexts of a following /a/, /o/, and /u/, whereas in both Ndebele(LC) and Shona(LC) there are some differences between these contexts, although they are small in the steady state. At later points in the target vowel, a front/back effect emerges for Sotho(MC), but it is much smaller than that seen in Ndebele(LC) or Shona(LC). Height effects in Sotho(MC) also appear to be relatively small. In general, Sotho(MC) /a/ appears to spread less into the F1/F2 space than does /a/ in Ndebele(LC) or in Shona(LC). In fact, most of the movement of Sotho(MC) /a/ is straight up the middle of the vowel space, and has the form expected if this movement were due mostly to the intervocalic /p/. In what follows, we will examine the F1 and F2 effects separately for the target vowel /a/.

[Figure 5 appears here: F1 (Hz) vs. F2 (Hz) plots of context effects, panels (a) Ndebele, (b) Shona, (c) Sotho, with symbols keyed to the context vowel.]

Figure 5. Acoustic effects of following context vowel on the target vowels /a/ and /e/ in the three languages. Inner loops enclose data from the middle of the target vowels, outer loops enclose measurements made at the end of the target vowels. Data are averaged over the three speakers of each language.


Table 1. Individual subject data.

Context vowel:          /i/          /e/          /a/          /o/          /u/
Measurement point:    MID   END    MID   END    MID   END    MID   END    MID   END

F2 of Target Vowel /a/ (Hz)
Speaker
Sotho(MC) #1         1256  1266   1212  1199   1248  1136   1227  1130   1218  1131
Sotho(MC) #2         1169  1151   1160  1180   1118  1017   1108  1021   1142  1063
Sotho(MC) #3         1348  1281   1339  1272   1275  1169   1296  1174   1282  1111
Shona(LC) #1         1397  1384   1380  1353   1346  1227   1314  1183   1276  1157
Shona(LC) #2         1408  1348   1384  1342   1361  1230   1310  1192   1301  1196
Shona(LC) #3         1386  1313   1344  1298   1208  1128   1260  1141   1224  1124
Ndebele(LC) #1       1235  1295   1209  1234   1200  1070   1145   901   1129   892
Ndebele(LC) #2       1413  1458   1369  1385   1301  1134   1244  1029   1221   997
Ndebele(LC) #3       1476  1458   1470  1444   1332  1176   1350  1082   1353  1098

F1 of Target Vowel /a/ (Hz)
Sotho(MC) #1          672   540    659   575    661   564    670   597    667   556
Sotho(MC) #2          660   552    686   563    674   602    692   607    701   625
Sotho(MC) #3          621   385    625   382    608   414    653   423    622   423
Shona(LC) #1          755   673    748   695    772   725    750   687    741   695
Shona(LC) #2          703   418    736   534    753   630    713   561    690   561
Shona(LC) #3          602   447    630   494    587   479    621   504    609   504
Ndebele(LC) #1        688   483    694   566    705   589    681   445    674   402
Ndebele(LC) #2        612   450    620   478    692   499    625   456    574   445
Ndebele(LC) #3        605   513    710   491    699   489    682   496    645   488

F2 of Target Vowel /e/ (Hz)
Sotho(MC) #1         1681  1546   1597  1461   1624  1463   1581  1397   1620  1419
Sotho(MC) #2         1847  1744   1813  1619   1746  1486   1774  1312   1871  1573
Sotho(MC) #3         1848  1634   1850  1649   1828  1628   1804  1358   1815  1435
Shona(LC) #1         2054  1965   2041  1921   2002  1864   2053  1717   2054  1480
Shona(LC) #2         1934  1790   1969  1763   1931  1612   1905  1459   1879  1419
Shona(LC) #3         1756  1635   1723  1596   1715  1560   1734  1468   1706  1463
Ndebele(LC) #1       1918  1754   1914  1726   1919  1580   1904  1518   1883  1468
Ndebele(LC) #2       1985  1885   1932  1804   1866  1394   1907  1377   1922  1245
Ndebele(LC) #3       1991  1777   1931  1741   1996  1497   1934  1501   1964  1501

F1 of Target Vowel /e/ (Hz)
Sotho(MC) #1          437   356    475   398    477   431    477   413    445   379
Sotho(MC) #2          326   311    349   331    373   365    358   343    318   298
Sotho(MC) #3          280   260    287   277    291   283    284   269    294   284
Shona(LC) #1          371   319    377   329    373   351    370   352    375   353
Shona(LC) #2          328   327    344   337    342   351    339   346    337   327
Shona(LC) #3          290   280    295   282    308   290    303   293    309   309
Ndebele(LC) #1        317   283    319   303    335   319    329   318    335   311
Ndebele(LC) #2        329   309    343   325    377   378    365   368    359   342
Ndebele(LC) #3        383   342    399   379    385   381    400   391    399   378

D.1 Front/back effects (F2) on target /a/. To simplify the analysis of front/back coarticulation, we concentrated on the two contexts that tended to yield the highest and lowest F2 values, that is /i/ and /u/, respectively. The F2 data for /a/ followed by /i/ and by /u/, averaged across speakers within each language, are shown in Figures 6a-c. In these figures, which can be thought of as highly schematic formant trajectory plots, F2 is shown on the vertical axis, and the two measurement points are shown on the horizontal axis. Individual subject data are shown in a similar fashion in Figures 6d-l.

Again, it is clear that on average, Ndebele(LC) speakers displayed a large amount of front/back coarticulation. The difference in the /i/-/u/ contexts was 141 Hz in the middle of the vowel, and 408 Hz at the end of the target vowel. These Ndebele(LC) data were submitted to an analysis of variance with repeated measure factors of the /i/ vs. /u/ contexts and the two measurement points. While the individual speaker data show that Ndebele(LC) speaker 2 contributes more to the average /i/-/u/ difference than do the other two Ndebele(LC) speakers, the front/back contrast is significant [F(1,2) = 107.6, p < 0.01]. The front/back contrast is much larger at the end of the vowel, and this was reflected in an interaction of the front/back effect with the measurement point [F(1,2) = 238.5, p < 0.01]. Simple main effects tests confirm that the front/back contrast is significant at the 0.05 level in the middle of the vowel, and at the 0.01 level at the end of the vowel.
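For concreteness, a repeated-measures analysis of this design can be sketched as follows, using the Ndebele(LC) speaker means for F2 of /a/ in the /i/ and /u/ contexts from Table 1 (a sketch with statsmodels, not the software used in the original study; run on these rounded speaker means it should closely approximate, though not necessarily exactly reproduce, the F(1,2) values reported above):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rows = [
    # speaker, context, point, F2 (Hz)  -- values from Table 1
    (1, "i", "mid", 1235), (1, "i", "end", 1295), (1, "u", "mid", 1129), (1, "u", "end", 892),
    (2, "i", "mid", 1413), (2, "i", "end", 1458), (2, "u", "mid", 1221), (2, "u", "end", 997),
    (3, "i", "mid", 1476), (3, "i", "end", 1458), (3, "u", "mid", 1353), (3, "u", "end", 1098),
]
df = pd.DataFrame(rows, columns=["speaker", "context", "point", "f2"])

# Speaker is the repeated-measures subject; context and measurement point are
# the within-subject factors, giving F tests with (1, 2) degrees of freedom.
result = AnovaRM(df, depvar="f2", subject="speaker",
                 within=["context", "point"]).fit()
print(result)
```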

For Shona(LC), the /i/-/u/ difference is about the same (130 Hz) as Ndebele(LC) in the middle of the vowel, but considerably smaller (189 Hz) than Ndebele(LC) at the end of the vowel. The Shona(LC) speakers all showed patterns similar to each other, and the front/back contrast was in fact significant [F(1,2) = 112, p < 0.01]. Although each Shona(LC) speaker had larger /i/-/u/ differences at the end of the vowel than in the middle, the interaction between measurement point and the /i/-/u/ difference was not statistically significant [F(1,2) = 6.16, p > 0.05].

On average, the Sotho(MC) speakers showed much less of an /i/-/u/ difference (44 Hz) in the middle of the target vowel than did speakers of either Shona(LC) or Ndebele(LC). While each speaker did show at least some tendency for the /i/ context to give higher F2 values than the /u/ context, no Sotho(MC) speaker showed as large a difference as even the smallest difference found among the Shona(LC) and Ndebele(LC) speakers. Furthermore, while there was a main effect of context [F(1,2) = 24.9, p < 0.05], simple main effects indicated that in the middle of the vowel, the /i/-/u/ contrast was not significant for Sotho(MC) speakers [F(1,2) = 14.1, p > 0.05]. At the end of the vowel, Sotho(MC) speakers did have a substantial /i/-/u/ difference (130 Hz), and this was statistically significant [F(1,2) = 30.4, p < 0.05]. Even so, this difference at the end of the vowel was only as large as the /i/-/u/ difference in the middle of the Shona(LC) /a/ vowel.

D.2 Context effects on F1 of target /a/. The averaged data shown in F1/F2 plots in Figures 5a-c indicate that there is some effect of context on the F1 value of target vowel /a/. For example, in the middle of target vowel /a/ for the Ndebele(LC) speakers, the high-vowel contexts (/i/ and /u/) lower F1 by about 65 Hz, relative to target /a/ followed by the low vowel /a/. The mid vowels /e/ and /o/ lower F1 by a smaller amount, about 30 Hz. A similar pattern can be seen at the end of the vowel, with additional lowering of F1 for the /o/ and /u/ contexts, compared to the /e/ and /i/ contexts. In the other languages, there is also a general tendency for F1 to lower when the target vowel is followed by high vowels. However, the details of the patterns are somewhat different. For example, for Shona(LC) speakers, at the end of target vowel /a/, the /u/ context has a higher F1 than does the /i/ context, whereas the opposite pattern was found in Ndebele(LC). In Sotho(MC), a following /a/ context actually gives a lower F1 value than does a following /u/. As can be seen in Table 1, there was a fair amount of subject-to-subject variability in the F1 data. This variability may have been due in part to general problems of obtaining F1 data, as discussed above and in Note 5.

As we did for F2, we simplified the F1 analysis by focusing on only some of the possible context contrasts. In this case we compared the average of the two high-vowel contexts (/i/ and /u/) to the data for the low-vowel context /a/. The results, averaged across the speakers within each language, are shown in Figures 7a-c.

For Ndebele(LC), the average difference between the high-vowel and low-vowel contexts was 66 Hz in the middle of the vowel, and 62 Hz at the end of the vowel. The average Shona(LC) data also show differences between the high-vowel and low-vowel contexts: 21 Hz in the middle of the vowel and 61 Hz at the end of the vowel. Despite the overall tendency for F1 to be lower in the high-vowel contexts, neither Ndebele(LC) nor Shona(LC) shows a statistically significant F1 effect for high- vs. low-vowel contexts. However, as there are only three speakers for each language, the statistical tests are particularly susceptible to speaker-to-speaker variability, which was rather large for the F1 measure in Ndebele(LC) and Shona(LC). When we combine the data from all six speakers of these two languages, the high vs. low effect does reach significance [F(1,5) = 10.04, p < 0.05].


[Figure 6 appears here: F2 (Hz) at the MID and END measurement points for /a/ in the /i/ and /u/ contexts; panels for each language average and each individual speaker.]

Figure 6. F2 of the target vowel /a/ followed by the front vowel /i/ (filled squares) and by the back vowel /u/ (open squares). Data are shown for measurements made in the middle and end of the target vowel. Plots a-c are averaged over the three speakers of each of the languages. Plots d-l are individual subject data.


[Figure 7 appears here: F1 (Hz) at the MID and END measurement points for /a/ in the high-vowel (/i/, /u/) and low-vowel (/a/) contexts; panels for each language average and each individual speaker.]

Figure 7. Open circles are F1 values for target vowel /a/ in the context of a following low vowel (/a/), and filled squares show the average of F1 values from the /i/ and /u/ contexts for target vowel /a/. Plots a-c are averaged over the three speakers of each of the languages. Plots d-l are individual subject data.


For the Sotho(MC) speakers, there was very little difference in the high and low contexts, either in the middle of the vowel (which was 7 Hz in the wrong direction) or at the end of the vowel, where the high-vowel contexts decreased F1 by only 13 Hz. In this case, the average data represent very well the individual subject data, as every Sotho(MC) speaker showed the same pattern (Figures 7f, i, l). Thus, the very small effects, which reversed in direction from the middle to the end of the vowel, gave an interaction of the context and measurement point factors [F(1,2) = 226.7, p < 0.01], and significant effects of the high/low context at both the middle of the vowel [F(1,2) = 23, p < 0.05] and end of the vowel [F(1,2) = 56.3, p < 0.05].

E. Context effects on target vowel /e/

Referring back to Figures 5a-c, we see that target vowel /e/ shows much less F1 fall as a function of movement toward the oral tract closure for /p/ than does target vowel /a/. This relatively small amount of F1 fall is presumably because /e/ is already a high vowel, with a lower F1 than target vowel /a/.

Perhaps the most striking aspect of the /e/ data, and of crucial relevance here, is that in none of the languages do the /e/ values encroach very much on the phonetic /ɛ/ space. The vowel /e/ remains relatively high, with a low F1, even when followed by /a/ in Ndebele(LC) and Shona(LC). The production of flanking /p/ consonants may have encouraged the subjects to maintain a relatively high jaw position during the target vowel, thus limiting or canceling the lowering effects of a following low vowel.

To show more clearly the context effects that do exist, in Figures 8a-c we have expanded the upper left portion of the F1/F2 plots that were shown in Figures 5a-c. The points enclosed by loops are from measures made in the middle of target vowel /e/, and the other points are from the measure at the end of the vowel. Again, beginning with Ndebele(LC), in the middle of target /e/ there is very little effect of context, though /i/ has a tendency to both lower F1 and raise F2. A similar but smaller effect can be seen in the middle of Shona(LC) /e/. In the middle of Sotho(MC) target /e/, both a following /i/ and /u/ give lower F1 and higher F2 values than a following /e/ context does. This pattern was found for Sotho(MC) speakers 1 and 2, and is consistent with what we would expect from a phonological rule that raises /e/ when it is followed by a high vowel (see Figure 4b).

If we look at the measurements made at the end of target /e/, a front/back effect emerges in all languages, such that for a given height of the context vowel, the back vowel contexts result in lower F2 values. This effect is strongest in Ndebele(LC). At the end of the vowel, on average each of the languages shows a height effect in that high front contexts yield lower F1 values than mid front vowels, and high back vowels yield lower F1 values than do mid or low back vowels. However, at a given height specification, the back vowels give lower F1 values. This may be due to rounding for the /o/ and /u/ contexts.

[Figure 8 appears here: expanded F1 (Hz) vs. F2 (Hz) plots of the /e/ region, panels (a) Ndebele, (b) Shona, (c) Sotho, with symbols keyed to the context vowel.]

Figure 8. Expanded view of Figure 5, showing the acoustic effects of following-vowel context on /e/. Loops enclose data from the middle of the target vowels; remaining points show measurements made at the end of the target vowels. Data are averaged over the three speakers of each language.


We are primarily interested in seeing if the Ndebele(LC) and Shona(LC) /e/ show more movement into the /ɛ/ phonetic space, a space that is used phonemically in Sotho(MC), but not in the other languages. The only vowel context which would be expected to cause both lowering and backing is /a/. Consequently, we focused our attention on the F1 and F2 values of /e/ followed by /a/ vs. /e/ followed by /e/.

The difference in the /e/-/a/ contexts for F1 was small, both in the middle and end of the target vowel, and did not differ substantially from language to language, especially when individual speaker differences were taken into account. In general, there was a high degree of speaker-to-speaker variability with respect to anticipatory coarticulation effects for target vowel /e/. The only language for which there was a significant effect of context on F1 was Shona(LC) [F(1,2) = 41.3, p < 0.05], but the /e/-/a/ context difference was actually smallest, and quite negligible, in this language (on average only 3 Hz in the middle of the vowel and 15 Hz at the end of the vowel for the /e/ vs. /a/ contexts). For F2, none of the languages had a statistically significant /e/-/a/ context effect, even though all Ndebele(LC) subjects had at least a 140-Hz higher F2 value for the /e/ than for the /a/ context, measured at the end of the vowel. On average, Shona(LC) had an 81-Hz, and Sotho(MC) a 50-Hz /e/-/a/ context difference at the end of the vowel. Possibly, with more speakers of each language, the context effects on F2, and even the small context effects on F1, would prove to be significant. When we pooled the data for the nine speakers together, there was a significant contrast of the /e/ and /a/ contexts at the end of the vowel for both F1 [F(1,8) = 14.5, p < 0.01] and F2 [F(1,8) = 9.4, p < 0.05].

Discussion

A. Inventories of contrast as predictors of output constraints

Our prediction was that Sotho(MC) speakers would exhibit smaller anticipatory coarticulation effects for /a/ and /e/ than would Shona(LC) or Ndebele(LC) speakers, because too much anticipation of an upcoming sound in Sotho(MC) would move the articulators (and therefore acoustics and perception) into competing distinctive vowel spaces. For example, we assumed that /a/ would have a tighter output constraint in Sotho(MC) than in Shona(LC) or Ndebele(LC), because of the proximity of Sotho(MC) /a/ to other phonemic Sotho(MC) vowels /ɛ, ɔ/.

When the target vowel was /a/, our prediction was borne out. Even when we take into account speaker-to-speaker variability, Shona(LC) and Ndebele(LC) show larger anticipatory coarticulation effects on /a/ than does Sotho(MC). Most of the Sotho(MC) /a/ movement is straight up the middle of the vowel space, and is mostly due to the oral tract closure associated with the intervocalic /p/. By limiting lateral movement, speakers of Sotho(MC) avoid excessive movement into the phonemic /ɛ/ and /ɔ/ spaces of this language.

We had expected to see similar effects for the target vowel /e/, with Shona(LC) and Ndebele(LC) exhibiting more movement of /e/ toward the phonetic /ɛ/ space. However, none of the languages studied showed much movement into the /ɛ/ space, and what effects there were did not seem to be less robust in Sotho(MC) than in the other languages. One reason for this result might be that the target vowel is flanked on both sides by the consonant /p/. This /p/ may itself impose constraints on the height of the jaw, and may limit the anticipatory lowering movement of the following /a/ during the target vowel /e/.

Thus far we have focused on the direct comparison of Sotho(MC), Shona(LC) and Ndebele(LC), which after all, do not differ very much in terms of their respective spacing of phonemic vowels. Actually, all of these languages might be expected to have relatively large coarticulatory effects as compared to a language like English (Very Crowded), which has many more vowels crowded into the vowel space. In the languages in the present study, the effects of V2 on V1 are clearly observable in the V1C transitions, but they also extend into the middle portions of V1, and in Shona(LC) and Ndebele(LC) these effects are remarkably large. Examples of vowel-to-vowel effects in Ndebele(LC), Sotho(MC), and English(VC) can be seen in Figure 9, which compares the medial /a/ in /papapa/ and /papapi/ for a single token for one speaker each of the three languages (see also Manuel and Krakow, 1984). The effects in English(VC) are very small, as we would expect given the proximity of English(VC) /æ/ and /ɑ/. Somewhat more coarticulation is seen in Sotho(MC), in which the nearest front vowel to /a/ is /ɛ/. Finally, it is not surprising that Ndebele(LC) shows very extensive effects, since the next closest front vowel to /a/ is /e/ in Ndebele(LC).

Given these observations, we fully expect that anticipatory coarticulation in Shona(LC) and Ndebele(LC) may extend further back into the vowel, and perhaps is present even at the beginning of V1. Interestingly, English schwa, a phone that is known to behave as if it had very lax output constraints, allows quite extensive anticipatory coarticulation. As Magen (1989) has recently shown for English trisyllables like /babəbi/ and /babəba/, the effects of the third vowel are seen throughout a medial schwa and can extend into the first vowel.

Strictly speaking, since we have not tested a number of seven-vowel languages against a number of five-vowel languages, our data alone do not allow us to generalize to other five- vs. seven-vowel comparisons. It is possible, then, that our results have nothing to do with output constraints or phonemic contrasts, but are due to some other (perhaps arbitrary) fact about the languages studied. Further testing of other languages is necessary to determine whether or not the present data reflect a general tendency, or are due to a fortuitous sampling accident.

Having noted this qualification, we observe that similar results have been found in other studies that reflect on the effect of contrast in constraining coarticulation (e.g., Lubker & Gay, 1982; Magen, 1984; Manuel & Krakow, 1984; and Öhman, 1966). Note, however, Nartey's (1979) cross-linguistic analysis of fricatives found that variation in fricative production was not correlated with the number or distribution of contrastive fricatives in a language. It may be that the precision needed to make fricatives, or the more categorical nature of consonant perception or production, or perhaps simply arbitrary amounts of pickiness in some of the languages studied, is such that any added constraint from the system of contrasts is negligible.

[Figure 9 appears here: F1 and F2 frequency (kHz) tracks over time (0.1 s scale) for the tokens described in the caption, one column per language (English, Sotho, Ndebele).]

Figure 9. First and second formants for the middle vowel /a/ in a single token of /papapa/ and /papapi/ from a speaker of each of the languages indicated. The rising F2, in anticipation of following /i/, is particularly striking for the Ndebele speaker. The Ndebele and Sotho tokens shown here are part of the present general analysis. The English token was recorded specifically for this illustration.


B. Other factors affecting output constraints

Are output constraints determined solely by the distribution of contrastive phones in a language? It would seem not. In the present study there appear to be differences between the two five-vowel languages in amount of coarticulation. Ndebele(LC) seems to have more front/back coarticulation on /a/ than does Shona(LC), and this is not obviously predicted by anything else we know about the languages. One way of thinking about the various contributions to output constraints is that minimally phonemes must have some audible, distinctive output, and that languages are free to restrict the output of particular phonemes even further. Ladefoged (1983) has pointed out that languages may choose to be more or less particular about how they make their phonemes. Our claim here, however, is that there are likely to be general constraints on the lack of fastidiousness.

There may be a number of other factors that contribute to acceptability, and the role of contrast may be to set maximal laxity for output constraints. One such factor is probably formal vs. informal styles of speech. In formal situations, people are generally more careful in their speech. But what is "careful" if not a more precise, narrowed range of production for particular phonemes? In situations where communication is difficult (e.g., noisy conditions, talking to children, nonnative speakers, or the hearing-impaired) people tend to "overarticulate" (Lindblom & Engstrand, 1989; Lindblom & MacNeilage, 1986). Overarticulation can be partially modeled as a slowing of articulation, but in all probability also involves a compression of the range of acceptable productions of phonemes, in terms of their spatial characteristics (Picheny, Durlach, & Braida, 1985, 1986). The very fact that people can vary their range of productions at will, and in formal or careful speech conditions tend to narrow those ranges toward clear exemplars of the phonemes, implies that at some level speakers have an awareness of the notion "best production" and the range of acceptable productions.

Are output constraints ever violated? Obviously they are, since phonological processes such as nasal place assimilation (e.g., input → [ɪmpʊt]) are common and neutralize the underlying difference between contrastive phonemes. However common these neutralizations are, though, it is equally obvious that they are not so rampant in any one language so as to lead to widespread or wholesale loss of phonemic contrast. Of course, another way of looking at these phenomena is to say that the output constraints themselves have been relaxed, not that the output constraints have been violated. It is not clear what, if any, data would distinguish between these two hypotheses. In any case, as pointed out by Martinet (1952, p. 129), there are a variety of forces at play in speech, and sometimes the forces which push or pull one phoneme into the territory of another "may simply be more powerful than the functional factors working toward conservation."

C. Incorporating output constraints into models of coarticulation

The concept of output constraints, and the role of contrast in setting output constraints, can potentially be of value in trying to understand particular instances of coarticulatory behavior, and can be used to try to restrict models of coarticulation. In this section we briefly explore how this concept might be incorporated into more formal models of coarticulation. Note that while the following discussion is limited to only a few types of models, the role of output constraints is (or should be) a concern for any coarticulation model.

We have suggested elsewhere (Manuel, 1987a, 1987b) that output constraints could be thought of as target spaces in some appropriate dimension(s) and that speakers generally try to move smoothly from one space to the next, crucially always trying to maximize efficiency (pick the easiest overall route, according to some measure of "ease" of articulation). The idea was that the movement from target to target is affected by the narrowness of the target spaces themselves: extremely narrow targets don't allow for much variability in the movement from one target to the next, whereas large targets allow various trajectories through a given space. This concept is shown schematically in Figure 10, which demonstrates the effect of the narrowness of the medial target on a non-speech "connect the circles" drawing task. The task is to start in Circle A, move through Circle B, and end up in Circle C or D. When Circle B is quite small (analogous to a very restrictive output constraint), the trajectory from A to B is rather insensitive to whether the following target is C or D. In contrast, when Circle B is large, the trajectory from A to B is highly affected by the location of the following target, that is, whether it is C or D. A similar proposal, suggesting interpolation between and through targets of various sizes, has been made by Keating (1990).


Working within a connectionist framework, Jordan (1990) has recently developed more explicit models for comparing the effect of loose output constraints (don't-care conditions on outputs) with strict output constraints (strongly-care conditions on outputs). Jordan's model learns a path through several target specifications, and Jordan demonstrates that varying these constraints affects just which path his model learns.

Figure 10. When Target B is large (upper panel), the trajectory from A to B is affected by the location of the next Target (C vs. D). In contrast, when Target B is small (lower panel), the trajectory from A to B is more restricted, and is minimally influenced by the location of the next target.
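To make the geometry of Figure 10 concrete, the following sketch (in Python) works through a one-dimensional version of the "connect the circles" task. It is an illustration only: the function names, the one-dimensional targets, and the "shortest overall route" heuristic are invented for this example and are not the paper's own model.

def clip(value, lo, hi):
    # Keep a value inside a target window [lo, hi].
    return max(lo, min(hi, value))

def realized_middle(start, middle_center, middle_width, end):
    # Pick the point inside the middle window that shortens the overall
    # path from start to end (a crude "easiest overall route" heuristic).
    lo = middle_center - middle_width / 2.0
    hi = middle_center + middle_width / 2.0
    ideal = (start + end) / 2.0   # where an unconstrained path would pass
    return clip(ideal, lo, hi)    # but it must stay within the window

start = 0.0                        # Circle A (e.g., a front/back articulator position)
end_front, end_back = 2.0, -2.0    # two alternative final targets, Circle C vs. Circle D

for width in (0.2, 2.0):           # narrow vs. wide middle target (Circle B)
    b_to_c = realized_middle(start, 0.0, width, end_front)
    b_to_d = realized_middle(start, 0.0, width, end_back)
    print(f"width={width}: through B toward C at {b_to_c:+.2f}, toward D at {b_to_d:+.2f}")

# Narrow B: the pass through B is nearly the same whichever target follows.
# Wide B: the pass through B is pulled toward the upcoming target.

On these made-up numbers, the narrow window yields nearly identical passes through B (+0.10 vs. -0.10), while the wide window lets the final target displace the pass through B by a full unit in either direction, which is the qualitative pattern the figure illustrates.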


The above accounts assume, at least implicitly, that given a starting point A and two following goals B and C, the actor/speaker somehow calculates an overall path through the two upcoming goals. But it may be that, for motor behaviors in general, the actual surface path is the result of combining an underlying invariant command to move from the starting point A to Goal B, and an underlying invariant command to move from Goal B to Goal C.

The idea that surface paths may reflect simultaneous input from distinct, invariant commands to move to different targets has been applied to speech, most notably in the coproduction models of, for example, Bell-Berti and Harris (1982), Browman and Goldstein (1990), and Fowler (1981, 1986). In coproduction models, coarticulation is achieved by allowing a particular articulator (or articulatory system) to respond simultaneously to invariant commands associated with adjacent or neighboring phones. For example, the anterior /k/ closure in /oki/ (vs. the more posterior closure for /aka/) can be described by assuming that while the tongue dorsum is still responding to a command (associated with the /k/) to make a velar closure, it also begins responding to commands, associated with the goal of making the following /i/, to move anteriorly. The actual movement of the tongue dorsum reflects both of these inputs. While the basic concept of simultaneous response to two phones is not new, it has recently been explicitly modeled (e.g., Saltzman, Rubin, Goldstein, & Browman, 1987).

In coproduction models, varying degrees of coarticulatory effects, such as those seen in different languages, can be achieved in two ways. First, the amount of temporal overlap of the commands can be increased. For example, given a sequence of phones XY, Y will have a greater effect on the production of X the more the commands for X and Y overlap. Second, the particular weights used to combine the various commands can be varied. Given simultaneous input from X and Y, the inputs could average, they could sum, or they could combine in some other linear or nonlinear fashion such that Y has a greater or smaller effect on the production of X (see Boyce, 1988; Munhall & Löfqvist, 1987; Saltzman, Goldstein, Browman, & Rubin, 1988; Saltzman & Munhall, 1989). Whatever the combinatorial algorithm is and however it is determined, once it is determined, coarticulation presumably proceeds in a mechanical way.
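The two degrees of freedom just described, temporal overlap and combination weights, can be made explicit with a toy calculation. The sketch below (Python) is an illustration under assumed values: the function, the weighted-average rule, and the F2-like numbers are invented for this example and are not taken from any published coproduction model.

def blended_position(target_v1, target_v2, v2_weight, overlap_fraction, t):
    # Articulator (or formant-like) value at normalized time t (0..1) within V1.
    # The V2 command is active only during the final overlap_fraction of V1,
    # and the two commands are combined by a simple weighted average.
    v2_active = t >= (1.0 - overlap_fraction)
    w2 = v2_weight if v2_active else 0.0
    w1 = 1.0 - w2
    return w1 * target_v1 + w2 * target_v2

v1_target, v2_target = 1200.0, 2300.0   # rough F2-like targets for /a/ and /i/ (made up)

# Late, lightly weighted V2 command: a small anticipatory rise near the end of V1.
print(blended_position(v1_target, v2_target, v2_weight=0.1, overlap_fraction=0.2, t=0.9))  # 1310.0
# Early, heavily weighted V2 command: a much larger rise at the same time point.
print(blended_position(v1_target, v2_target, v2_weight=0.4, overlap_fraction=0.8, t=0.9))  # 1640.0

On these made-up numbers, the same time point in V1 shifts from about 1310 to about 1640 (arbitrary units) as the V2 command starts earlier and is weighted more heavily, which is the sense in which either knob can produce more extensive anticipatory coarticulation.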

An important issue for development of such coproduction models is whether or not there are any general principles which determine the combinatorial algorithms for overlapping gesture commands. In this paper we have argued that the goals, or output constraints, of one phone essentially limit interference from another phone. To the extent that this is true, the algorithms combining commands for a sequence of phones would not be determined in an ad hoc, random fashion. Rather, we expect that the combinatorial algorithms would be such that, in a sequence of phones XY, the commands for X have the effect of suppressing the commands for Y. (For a discussion of suppression, see Saltzman et al., 1988; Saltzman & Munhall, 1989.) In some cases the output constraints on X will be so weak that commands for Y will not be suppressed very much, but in other cases the output constraints on X will lead to extreme suppression of the commands for Y. For example, in /aki/ there is a relatively lax output constraint on exactly where contact is made on the roof of the mouth for the /k/, and this constraint does not heavily suppress the forward movement for the following /i/ vowel. At the same time, since /k/ is a stop consonant, there is a strict constraint on the degree of oral tract constriction, and this constraint suppresses any contrary gestures from the following /i/ which would preclude making a complete oral closure.
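One way to read this proposal is that the weight given to the overlapping /i/ command is not free but is derived from the strictness of the /k/ output constraint along each dimension: lax along place of contact, strict along degree of constriction. The sketch below (Python) is only a schematic rendering of that idea; the function, the particular weighting rule, and the strictness values are all invented for illustration.

def blend_with_suppression(x_target, y_target, x_strictness):
    # x_strictness in [0, 1]: 1 means a very strict output constraint on X
    # (the overlapping Y command is almost fully suppressed); 0 means a very
    # lax constraint (the Y command blends in freely). The 0.5 cap simply keeps
    # X dominant while it is the phone currently being produced.
    y_weight = 0.5 * (1.0 - x_strictness)
    return (1.0 - y_weight) * x_target + y_weight * y_target

# Place of contact for /k/ (0 = velar, 1 = the fronted position /i/ calls for):
# the constraint is lax, so /i/ pulls the closure noticeably forward.
print(blend_with_suppression(x_target=0.0, y_target=1.0, x_strictness=0.2))   # 0.4
# Degree of constriction (0 = full closure, 1 = the open tract /i/ calls for):
# the constraint is strict, so the opening gesture of /i/ is almost fully suppressed.
print(blend_with_suppression(x_target=0.0, y_target=1.0, x_strictness=0.95))  # 0.025

Under these assumptions the same neighboring vowel has a large effect on one dimension of the /k/ and almost none on another, mirroring the verbal account above.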

The present results for Ndebele(LC), Shona(LC), and Sotho(MC) show that coarticulation can be both temporally and spatially very extensive. From the point of view of coproduction models, if anticipatory coarticulatory effects are found early in the first vowel of a V1CV2 utterance, we can conclude that the commands for the second vowel begin (at least) that early. In addition, since the first vowel clearly dominates production at least until the consonant is approached, those commands associated with the second vowel must have a lesser effect on the surface trajectory in that time period than do the commands associated with the first vowel itself. The fact that V2 has a greater or lesser effect on V1 in different languages could, in a mechanical sense, be modeled as being due to different tolerances or output constraints for particular phonetic gestures. These tolerances, which are predictable (at least in part) from certain general principles, lead to language-dependent amounts of suppression of the V2 gestures in different languages. That is, the determination of the combinatorial algorithm for neighboring phones is affected by requirements for the maintenance of intelligible speech, i.e., maintaining distinctions between phones.


Those requirements vary from one speaking style or condition to the next, and from language to language, partially because languages have different systems of contrast. Contrast affects output constraints, and output constraints determine how gestures associated with neighboring phonetic units are combined.

SUMMARY

The data presented here show that the vowel /a/ is more susceptible to anticipatory coarticulation with a following transconsonantal vowel in Shona(LC) and Ndebele(LC), which have no near phonemic neighbors to /a/, than in Sotho(MC), which does have relatively near neighboring and contrasting phonemic vowels. This result is consistent with the idea that coarticulation is limited by output constraints on phones, and that these output constraints are determined, in part, by the need to maintain phonological distinctions in a language.

REFERENCES

Bell-Berti, F., & Harris, K. S. (1982). Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449-454.

Boyce, S. E. (1988). The influence of phonological structure on articulatory organization in Turkish and in English: Vowel harmony and coarticulation. Doctoral dissertation, Yale University, New Haven, CT.

Browman, C. P., & Goldstein, L. (1990). Gestural structures and phonological patterns. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.

Chasaide, A. (1979). Laterals of Gaoth-Dobhair Irish and Hiberno-English. Occasional Papers in Linguistics and Language Learning, 6, 55-78. The New University of Ulster.

Cole, D. T. (1955). An introduction to Tswana grammar. London: Longmans, Green and Co.

Doke, C. M. (1954). The Southern Bantu languages. London: Oxford University Press.

Engstrand, O. (1988). Articulatory correlates of stress and speaking rate in Swedish VCV utterances. Journal of the Acoustical Society of America, 83, 1863-1875.

Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.

Flege, J. E. (1989). Differences in inventory size affect the location but not the precision of tongue positioning in vowel production. Language and Speech, 32, 123-147.

Fowler, C. A. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 46, 127-139.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.

Jongman, A., Blumstein, S. E., & Lahiri, A. (1985). Acoustic properties for dental and alveolar stops. Journal of Phonetics, 13, 235-251.

Jordan, M. I. (1990). Serial order: A parallel distributed processing approach. In J. L. Elman & D. E. Rumelhart (Eds.), Advances in connectionist theory: Speech. Hillsdale, NJ: Lawrence Erlbaum Associates.

Keating, P. A. (1990). The window model of coarticulation: Articulatory evidence. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech (pp. 451-470). Cambridge: Cambridge University Press.

Keating, P. A., & Huffman, M. (1984). Vowel variation in Japanese. Phonetica, 41, 191-207.

Ladefoged, P. (1983). The limits of biological explanation in phonetics. UCLA Working Papers in Phonetics, 57, 1-10.

Lindblom, B. (1983). Economy of speech gestures. In P. F. MacNeilage (Ed.), The production of speech (pp. 217-245). New York: Springer-Verlag.

Lindblom, B., & Engstrand, O. (1989). In what sense is speech quantal? Journal of Phonetics, 17, 107-121.

Lindblom, B., & MacNeilage, P. (1986). Action theory: Problems and alternative approaches. Journal of Phonetics, 14, 117-132.

Lubker, J. F., & Gay, T. (1982). Anticipatory labial coarticulation: Experimental, biological and linguistic variables. Journal of the Acoustical Society of America, 71, 437-448.

Magen, H. (1984). Vowel-to-vowel coarticulation in English and Japanese. Journal of the Acoustical Society of America, Suppl. 1, 75, S41.

Magen, H. S. (1989). An acoustic study of vowel-to-vowel coarticulation in English. Unpublished doctoral dissertation, Yale University, New Haven, CT.

Manuel, S. Y. (1987a). Acoustic and perceptual consequences of vowel-to-vowel coarticulation in three Bantu languages. Unpublished doctoral dissertation, Yale University, New Haven, CT.

Manuel, S. Y. (1987b). Output constraints and cross-language differences in coarticulation. Journal of the Acoustical Society of America, Suppl. 1, 82, S115.

Manuel, S. Y., & Krakow, R. A. (1984). Universal and language particular aspects of vowel-to-vowel coarticulation. Haskins Laboratories Status Report on Speech Research, SR-77/78, 69-78.

Martinet, A. (1952). Function, structure, and sound change. Word, 8, 1-32. Reprinted in P. Baldi & R. Werth (Eds.), Readings in historical phonology: Chapters in the theory of sound change (pp. 121-159). University Park, PA: Pennsylvania State University Press, 1978.

Martinet, A. (1957). Phonetics and linguistic evolution. In B. Malmberg (Ed.), Manual of phonetics (pp. 252-272). Amsterdam: North-Holland.

Munhall, K. G., & Löfqvist, A. (1987). Gestural aggregation in speech. PAW Review, 2, 13-15. (University of Connecticut Press).

Nartey, J. N. A. (1979). A study of phonemic universals, especially concerning fricatives and stops. UCLA Working Papers in Phonetics, 46.

Nolan, F. (1985). Idiosyncrasy in coarticulatory strategies. Cambridge Papers in Phonetics and Experimental Linguistics, 4, 1-9.

Ohman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.

Picheny, M. A., Durlach, N. I., & Braida, L. D. (1985). Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. Journal of Speech and Hearing Research, 28, 96-103.

Picheny, M. A., Durlach, N. I., & Braida, L. D. (1986). Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech. Journal of Speech and Hearing Research, 29, 434-446.

Saltzman, E. L., & Munhall, K. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.

Saltzman, E. L., Goldstein, L., Browman, C. P., & Rubin, P. (1988). Dynamics of gestural blending during speech production. Neural Networks, 1, 316.


Saltzman, E. L., Rubin, P., Goldstein, L., & Browman, C. P. (1987). Task-dynamic modelling of interarticulator coordination. Journal of the Acoustical Society of America, Suppl. 1, 83, S15.

Schouten, M. E. H., & Pols, L. C. W. (1979). Vowel segments in consonantal contexts: A spectral study of coarticulation - Part 1. Journal of Phonetics, 7, 1-23.

Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.

Tatham, M. A. A. (1984). Towards a cognitive phonetics. Journal of Phonetics, 12, 37-47.

FOOTNOTES

*Journal of the Acoustical Society of America, 88(3), 1286-1298 (1990).

†Now at Wayne State University, Department of Communication Disorders and Sciences, Detroit, MI.

1As is well known, in a word like can't there may be no nasal murmur in the signal. That is, there may be no period of time where there is simultaneously (1) a wide velopharyngeal opening, (2) oral closure, and (3) substantial vocal fold vibration. In the absence of a nasal murmur, the nasal quality of the vowel itself can distinguish cat from can't.

2It may well be that the range of normal token-to-token variability (vs. coarticulation-induced variability) does not put sufficient pressure on output constraints to test the hypothesis that output constraints vary as a function of phoneme distribution. For vowels spoken in a single context, Flege (1989) found that speakers of Spanish (a language with vowels which are well spread apart in the acoustic and articulatory space) do not show more token-to-token variability for vowel height than do speakers of English, a language with a more crowded vowel space.

3Ndebele and Sotho contrast voiceless ejective [p'] (orthographicp) with voiceless aspirated [ph] (orthographic ph). Consistentwith the orthographic traditions of the three languages, thestimulus lists for these two languages used ph, while those forShona used p, which also represents an aspirated [ph].

4In Manuel (1987a) we report data from an additional measurement point, made at 20 ms before the end of the target vowel. While there are differences in the values obtained at this point and the measurement point made at the end of the vowel, the conclusions reached are not affected by the omission of those data.

5At points very close to the end of the vowel, where there is a rapid change in frequency, and amplitude is dropping, the LPC technique sometimes picks spurious values for the formants. This was particularly true for F1, which tended to be "lost" earlier than F2. Occasionally the end point was actually made as much as 20 ms, though more often it was 5 or 10 ms, from closure. In addition, for some speakers F1 was hard to determine even in the middle of the vowel, particularly for the vowel /a/, in which F1 had a wide bandwidth, perhaps due to tracheal coupling. Note that the /p/ in these utterances was often heavily aspirated.