

The acoustic signature for intelligibility test words

Gary Weismer, Ray D. Kent, Megan Hodge, and Ruth Martin
Department of Communicative Disorders, Waisman Center on Mental Retardation, University of Wisconsin-Madison, Madison, Wisconsin 53706

(Received 16 December 1987; accepted for publication 30 June 1988)

As part of a research program that aims to develop an explicit acoustic basis for a single-word intelligibility test, an initial attempt to characterize the formant trajectories and segment durations of seven test words produced by 30 normal speakers is described. These characterizations are referred to as "acoustic signatures." The data indicate that: (1) formant trajectories show two sex effects, namely, that females are more variable as a group than males and tend to have greater slopes for the transitional segment of the second-formant trajectories, and that these effects are consistent across words; (2) Bark transformations of the frequency data do not seem to eliminate the interspeaker differences in formant trajectories, nor do they eliminate either of the sex effects described above; and (3) segment durations have different variabilities depending on the syllabic structure of the word; no sex effect was noted here. The discussion focuses on the appropriate form for the acoustic signatures, as well as factors that should be considered in selecting words for signature development. To demonstrate the potential application of these data, formant trajectory and segment duration data from 18 speakers with amyotrophic lateral sclerosis and varying degrees of dysarthria are compared to the acoustic signature for the word wax.

PACS numbers: 43.70.Dn

INTRODUCTION

The formal assessment of speech intelligibility in persons with motor speech disorders can be traced back to Tikofsky's (1970) and Tikofsky and Tikofsky's (1964) efforts to design word lists that were maximally sensitive to variations in the severity of dysarthria. The general framework for Tikofsky's approach was based on standard tests of speech intelligibility used in the audiology clinic, or to test communication systems. At about the same time, Darley et al. (1969a,b) were using a 7-point scale to rate intelligibility of dysarthric speakers. As in Tikofsky's work, the primary purpose of this scaling was to index the severity of the intelligibility deficit. More recently, Yorkston and Beukelman (1981) have described an intelligibility test designed specifically for the assessment of dysarthria. Like Tikofsky's test, the Yorkston and Beukelman instrument is based on the standard approach to assessment of speech intelligibility and essentially provides a way to index the severity of the speech disorder.

Whereas these tests have some value in that they provide researchers with an apparently easily understood and common index of severity of involvement (i.e., for the replication of studies or the blocking of subjects into "equivalent-severity" subgroups), they do not contribute to an understanding of the basis of the speech disorder [see Kent et al. (in press) for further discussion of this issue]. Similarly, clinicians who treat the dysarthrias can use currently available intelligibility tests for little more than an index of severity, which could probably be as easily derived from listening to a short sample of a patient's spontaneous speech. There have been a few attempts to understand the basis of intelligibility deficits, but these have not been formalized as tests. This general approach has been described previously by Monsen (1978) for the speech of the hearing impaired, and by Ansel (1985) for a group of dysarthric speakers, although neither of these investigators attempted to formalize their analyses with a standard set of materials or measures. Ansel (1985) found that the scaled intelligibility of 16 adults with mixed cerebral palsy was predicted by multiple regression analysis with 62.6% accuracy by acoustic measurements relating to one consonant contrast (fricative-affricate) and three vowel contrasts (front-back, high-low, and tense-lax). Other measured contrasts, such as those related to consonant voicing effects and stop-nasal distinctions, did not seem to contribute in a significant way to variability in the intelligibility estimates. Whereas these results are interesting, the susceptibility of the multiple regression technique to sampling error and the known problems with equal-appearing interval scales suggest caution in the interpretation of the findings.

In the present article, we describe one component in the development of an intelligibility test that has been designed to have an explanatory function. The basis of the explanatory function is found in the explicit acoustic underpinnings of the test items. The test consists of word sets designed to contrast acoustic characteristics that are either known to play a direct role in segmental distinctions or that are correlates of articulatory problems commonly found in the dysarthrias. Ultimately, the test items should have a sufficiently detailed acoustic basis to permit an explanation of an intelligibility deficit in terms of specific articulatory deficits.

An initial step in the development of such a test is to specify the normal acoustic profiles of each of the test words. Whereas the literature contains a fair amount of information on certain acoustic characteristics of phone-sized segments, to our knowledge there are no (relatively) large-scale acoustic data bases for specific words. Moreover, the characteristics of formant trajectories, which we consider to be of great importance because they reflect the dynamic articulatory processes that are so often disturbed in the dysarthrias (see Kent and Netsell, 1975; Kent et al., 1975; Hardcastle et al., 1985), have received only a small amount of attention in the acoustic phonetics literature, and then only for a limited class of sound sequences. Ansel's (1985) finding of a considerable influence of vocalic characteristics on the intelligibility of dysarthric speakers suggests that the acoustic characteristics of formant trajectories be incorporated into the kind of test under development. The inability to formulate a set of general rules for the specification of formant trajectories across phonetic contexts (Broad and Clermont, 1987) mandates an empirically based account of formant trajectories in the individual words of the intelligibility test.

J. Acoust. Soc. Am. 84 (4), October 1988. 0001-4966/88/101281-11$00.80. © 1988 Acoustical Society of America.

Our initial strategy is to specify the typical characteristics of trajectories and segment durations in a population of normal, geriatric speakers. Geriatric speakers are used because so many dysarthric speakers are elderly, and there are known differences in acoustic-phonetic characteristics between normal young adults and geriatrics (see Weismer and Fromm, 1983; Weismer, 1984; Kent and Burkhard, 1981). An attempt is made to represent specific acoustic measurements across a group of speakers in terms of a statistical or graphical summary; we refer to these summaries as acoustic signatures, and the set of such summaries for all relevant acoustic characteristics of a test word constitutes the overall signature for that word. Ultimately, these normal signatures should be used to compute acoustic or auditory distances (Lindblom, 1978; Syrdal and Gopal, 1986) from dysarthric word productions. The profile of these distances for a given dysarthric speaker would then furnish the raw material for explanation of an intelligibility deficit. The threefold purpose of the present report, therefore, is to: (1) describe for selected words the group characteristics of formant trajectories and segment durations for 15 male and 15 female geriatric speakers; (2) consider alternate representations of the data used to construct the acoustic signatures; and (3) evaluate which characteristics of formant trajectories and segment durations are likely to be sensitive to disordered articulation. The sensitivity of acoustic signatures is evaluated in a preliminary way by comparing the normal data to data obtained from speakers with amyotrophic lateral sclerosis (ALS).

I. METHOD

A. Subjects

Thirty normal geriatric subjects (15 male, 15 female) participated in the present investigation. By "geriatric" we mean individuals who fell between the ages of 65 and 80 (males, 68-80 years; females, 65-80 years), and by "normal" we mean individuals who had no past or current medical problems that might be attributed to neurological disease. Some of the subjects had previous histories of other disease processes (such as heart disease, cancer, and so forth) and were taking medication for, or as a consequence of, those diseases. None of the medications was in the phenothiazine or butyrophenone categories. Pure tone thresholds were determined at 0.5, 1, 2, and 4 kHz for each subject but were not used as a criterion for participation in the study. The majority of subjects had thresholds of 35 dB or better at each of the test frequencies, but some subjects had thresholds up to 60 dB for the higher test frequencies. If subjects were able to repeat sentences presented via a loudspeaker at a comfortable listening level, we regarded their functional hearing as adequate for this study. All of the geriatric subjects were living independently at the time of testing, and all were able to visit the laboratory with no assistance. The majority of subjects spoke with a dialect heard in Wisconsin, and there was no apparent tendency for speakers with other dialects to favor either the male or female groups.

Data described in Sec. III B 2 were obtained from 18 males with medical diagnoses of amyotrophic lateral sclerosis (ALS). ALS is a progressive disease characterized by degeneration of anterior horn cells of the spinal cord, brainstem nuclei, and fibers of the pyramidal tract. The bulbar (brain stem) lesions are thought to produce a flaccid dysarthria, whereas the combination of bulbar and pseudobulbar (pyramidal tract) lesions, often associated with later stages of the disease, results in a "mixed" flaccid-spastic dysarthria (Darley et al., 1975). The ALS individuals in the present study ranged in age from 27 to 72 years and had dysarthrias and disease stages ranging from very mild to very severe. A full report of the acoustic and intelligibility characteristics of these subjects' speech is currently in preparation.

B. Speech sample

Each subject produced one of two forms of a word list that has been described by Kent et al. (in press). Briefly, both forms of the list include 98 items, among which are target words and several dummy words for use in intelligibility testing. Each target word is part of a four-word set that is designed to probe specific linguistic contrasts. For example, the target word ship is part of a set that includes the words sheep, chip, and tip. This set is therefore designed to be sensitive to one vowel contrast and two manner contrasts (fricative-affricate and fricative-stop). When subjects produced the target words, they were unaware of the other three members of the word set.

Subjects read each test word a single time from a printed card. There were two decks of cards (one deck per test form) containing the entire set of words; either deck might be used for a patient, and the order of presentation of the cards was either front to back (i.e., 1 to 98) or back to front (98 to 1). Full acoustic analyses are described below for seven words (coat, wax, sigh, sip, ship, row, sew). These words were chosen for initial exploration of the signature concept because they sampled a fairly wide variety of vocalic nucleus types.

C. Preparation and measurement of spectrograms

Wideband (300-Hz filter) spectrograms were made for all productions of the seven words by the 30 normal subjects and 18 patients with ALS. The spectrograms were prepared on a Kay 7800 digital sonagraph with scale expansions of either 0-4.0 kHz or 0-5.0 kHz. Subsequently, spectrograms were marked for further analysis by (1) segmenting the words in time according to conventional criteria (Peterson and Lehiste, 1960; Klatt, 1976) for the determination of segment durations and (2) tracing the midpoint of F1 and F2 from the first to last clearly defined glottal pulse of the vocalic nucleus.

The marked spectrograms were analyzed further by means of a computer program (SPECTI0) that processed and stored data entered on a graphics tablet. The program accepts single-touch inputs for duration measures (one touch per segment boundary) or samples ten points per second for continuous tracing of formant trajectories. All data are stored in a file that includes an identifier for each measurement and a listing of x-y points. In the case of formant trajectories, consecutive time points are typically separated by 3-8 ms, depending on the experimenter's tracing speed. Slope computations, described below, were derived from the time-frequency listings in these files.

The frequency data in the files were also converted to Bark according to the formula reported by Syrdal and Gopal (1986) (end correction was not applied to these data). Trajectories were examined on the Bark scale to see if variability across subjects was reduced relative to trajectories on the frequency scale. The comparison of Bark and frequency scale representations was deemed especially important for the examination of male-female trajectory differences.
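The conversion step can be illustrated with a minimal sketch. The formula itself is not reproduced in this report, so the sketch assumes the widely used Zwicker and Terhardt (1980) approximation of the Bark scale, applied without any low-frequency end correction; the function names and the (time, frequency) list layout are hypothetical and are not taken from the original analysis program.

```python
import math


def hz_to_bark(freq_hz):
    """Convert a frequency in Hz to Bark.

    Assumption: Zwicker and Terhardt (1980) approximation,
    with no low-frequency end correction.
    """
    khz = freq_hz / 1000.0
    return 13.0 * math.atan(0.76 * khz) + 3.5 * math.atan((khz / 7.5) ** 2)


def trajectory_to_bark(points):
    """Convert a list of (time_ms, freq_hz) pairs to (time_ms, bark) pairs."""
    return [(t, hz_to_bark(f)) for t, f in points]


# Hypothetical F2 trace: three (time, frequency) samples.
f2_trace = [(0.0, 1000.0), (5.0, 1050.0), (10.0, 1100.0)]
bark_trace = trajectory_to_bark(f2_trace)
```

Because the transform is monotonic, the ordering of the trajectory samples is preserved; only the spacing between frequency values changes.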

D. Derivation of trajectory slopes

For each word, transitional segments of the F1 and F2 trajectories were identified for slope computation. The selection of the particular transitional segment to use in the slope computation was guided by considerations of underlying articulatory behavior and qualitative examination of the acoustic trajectories. The transitional segments used for slope computation are bounded by arrows on the stylized trajectories shown for each word in Fig. 1. The actual slope computations were determined by a rule that defined onset and offset of the transitional segment. The onset of the segment was defined as the first point in time from which a 20-ms increment was accompanied by at least a 20-Hz change; the offset of the segment was defined as a succeeding point in time from which a 20-ms increment did not have a corresponding 20-Hz or greater change. This criterion was determined empirically and has been used successfully in previous studies (Weismer et al., 1985). The two factors that contributed to the slope computation, the transition extent (the amount of frequency change along the transition) and the transition duration, were also recorded and tabulated. In the present report, only data on slope and transition extent are described.

FIG. 1. Schematic displays of F1-F2 trajectories for the seven words studied in the present investigation (panels: coat, sip-ship, wax, sigh, row, sew). A single pair of trajectories is shown for sip and ship because there is little difference in trajectory shape between these words. Arrows indicate the boundaries of the transitional segment for which transition extent and slope measures were derived.

FIG. 2. Group (N = 15) F1-F2 trajectories for coat, plotted as frequency against time (ms). Left panel, males; right panel, females.

Slopes for Bark-transformed trajectories were computed from the same transitional segments shown in Fig. 1, but with a criterion of 0.10 Bark/20 ms. This criterion was based on simultaneous examination of Bark trajectory listings and plots of Bark-transformed trajectories.
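The onset/offset rule can be sketched in code. What follows is an illustration under stated assumptions, not the original program: the traced points are taken to be (time in ms, frequency) pairs with strictly increasing times, the value 20 ms ahead of each point is obtained by linear interpolation, "change" is read as absolute change, and the search runs over the whole trace rather than only the arrow-bounded region of Fig. 1. All function names are hypothetical.

```python
from bisect import bisect_right


def interp(points, t):
    """Linearly interpolate the value at time t; None outside the traced span."""
    times = [p[0] for p in points]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(points):
        return points[-1][1]
    (t0, f0), (t1, f1) = points[i - 1], points[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def transition_segment(points, window_ms=20.0, min_change=20.0):
    """Return (onset_index, offset_index), or None if no point meets the criterion."""

    def changes(i):
        t, f = points[i]
        ahead = interp(points, t + window_ms)
        return ahead is not None and abs(ahead - f) >= min_change

    onset = next((i for i in range(len(points)) if changes(i)), None)
    if onset is None:
        return None
    # Offset: first later point whose 20-ms increment falls below the criterion.
    offset = next(
        (i for i in range(onset + 1, len(points)) if not changes(i)),
        len(points) - 1,
    )
    return onset, offset


def slope(points, onset, offset):
    """Return (slope, transition extent, transition duration) for the segment."""
    (t0, f0), (t1, f1) = points[onset], points[offset]
    extent = f1 - f0
    duration = t1 - t0
    return extent / duration, extent, duration


def synthetic_f2(t):
    """Hypothetical trace: steady 1000 Hz, 5 Hz/ms rise from 40 to 100 ms, steady 1300 Hz."""
    if t <= 40:
        return 1000
    if t <= 100:
        return 1000 + 5 * (t - 40)
    return 1300


trace = [(t, synthetic_f2(t)) for t in range(0, 165, 5)]
segment = transition_segment(trace)
result = slope(trace, *segment)
```

For Bark-transformed trajectories the same functions would be called with `min_change=0.10`, matching the 0.10 Bark/20 ms criterion.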

II. RESULTS

A. Formant trajectories: Frequency scale

Group plots of F1 and F2 trajectories for the seven words are shown in Figs. 2-8. In each plot, there are 15 trajectories per formant, corresponding to the 15 speakers in a group. Whereas these plots do not show the variability of trajectories within subjects, that variability is typically much smaller than the between-subject variability evident here.² Note that there is no attempt in Figs. 2-8 to identify the pairing of F1 and F2 by subject. Also, time and frequency scales are matched only for comparisons of the same word across gender. Slope data, in Hz/ms, are reported in Table I for each trajectory.

Several general observations are suggested by examination of these plots. For example, there is a clear tendency for the female trajectories to be more variable than the male trajectories, especially for F2. This variability difference is most obvious for coat, wax, sew, row, and sip. Variability differences are not so marked for F1, but, when they do occur, it is typically the female subjects who are less consistent as a group. The female subjects also appear to produce F2 trajectories with consistently greater slope than the male subjects. In some cases, such as wax, sew, and sigh, the slope differences are quite large. The fact that the F2 slope difference favors females for every word suggests that it is a noteworthy observation. A sex effect does not seem to characterize the F1 data.

FIG. 3. Group F1-F2 trajectories for wax (panels: male, female).

FIG. 4. Group F1-F2 trajectories for sew (panels: male, female).

FIG. 5. Group F1-F2 trajectories for row (panels: male, female).

FIG. 6. Group F1-F2 trajectories for sigh (panels: male, female).

FIG. 7. Group F1-F2 trajectories for sip (panels: male, female).

FIG. 8. Group F1-F2 trajectories for ship (panels: male, female).

TABLE I. Group mean slopes, in Hz/ms, of the transitional segment of the trajectories for males and females. Values in parentheses are the group standard deviations.

              F1                           F2
Word   Males         Females        Males         Females
coat   -1.97(0.33)   -2.51(0.55)    -1.96(0.47)   -2.66(0.78)
wax     2.65(0.56)    3.16(0.82)     4.93(0.82)    7.42(1.06)
sew    -2.03(0.33)   -1.98(0.51)    -2.42(0.48)   -3.62(0.96)
row     2.24(0.68)    2.41(1.18)    -1.79(0.47)   -2.16(0.42)
sigh   -2.47(0.45)   -2.49(0.72)     2.43(0.34)    3.89(0.67)
sip     2.77(0.63)    2.45(0.52)    -3.46(1.3)    -4.59(1.5)
ship    2.33(0.40)    2.02(0.54)    -4.07(1.3)    -5.91(2.06)

For many of the trajectory plots, the variability of the slope measures (Table I) seems to be relatively small. We considered it possible that this variability could be reduced even further by subgroupings of trajectories based on overall duration of the vowel nucleus, which varied in a substantial way within a group. These subgroupings were based on a visual inspection of the trajectory plots, as illustrated in Fig. 9. Starting from the upper left-hand panel, clockwise inspection of these plots for coat shows trajectory clusters of short, medium, and long duration; the panel in the lower left-hand corner displays mean trajectories for each of the duration clusters. Mean trajectories were computed by forcing each trajectory in a cluster to begin at time zero, and then determining averages across the cluster trajectories at successive 5-ms increments. End effects, due to fewer and fewer values being available for computation toward the end of the trajectories, have been eliminated in Fig. 9. The mean trajectories appear to be reasonable summaries of the trajectories in each cluster.
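The averaging procedure can be sketched as follows. The report does not say how values at the 5-ms steps were obtained from the irregularly spaced tracing points, nor how end effects were removed; this sketch assumes linear interpolation between traced points and assumes that the mean is simply truncated at the duration of the shortest trajectory in the cluster. The function names and data layout are hypothetical.

```python
from bisect import bisect_right


def interp(points, t):
    """Linearly interpolate the value at time t (t must lie within the trace)."""
    times = [p[0] for p in points]
    i = bisect_right(times, t)
    if i == len(points):
        return points[-1][1]
    (t0, f0), (t1, f1) = points[i - 1], points[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def mean_trajectory(trajectories, step_ms=5.0):
    """Average a cluster of (time_ms, freq) traces at successive step_ms increments."""
    # Force every trajectory to begin at time zero.
    shifted = [[(t - traj[0][0], f) for t, f in traj] for traj in trajectories]
    # Assumption: avoid end effects by stopping at the shortest trajectory.
    shortest = min(traj[-1][0] for traj in shifted)
    n_steps = int(shortest // step_ms)
    mean = []
    for k in range(n_steps + 1):
        t = k * step_ms
        values = [interp(traj, t) for traj in shifted]
        mean.append((t, sum(values) / len(values)))
    return mean


# Two hypothetical traces with different start times and durations.
cluster = [
    [(10, 1000), (20, 1100), (30, 1200)],
    [(0, 1200), (10, 1300), (20, 1400), (30, 1500)],
]
cluster_mean = mean_trajectory(cluster)
```

Truncating at the shortest trajectory keeps every averaged point based on the full cluster, at the cost of discarding the tails of the longer trajectories.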

Trajectories in the short cluster for coat ranged in duration between 86 and 123 ms, in the medium cluster between 137 and 174 ms, and in the long cluster between 192 and 235 ms. The mean frequency change along the transitional segment increased with the duration of the cluster (X̄ = 112, 227, and 264 Hz for the short, medium, and long clusters, respectively); also, the mean starting frequency of the trajectory (the "locus" at release of the /k/) was substantially higher for the long cluster (X̄ = 1350 Hz) than for the medium (X̄ = 1172 Hz) and short (X̄ = 1121 Hz) clusters.

Table II reports F2 locus, transition extent, and trajectory duration data for all words according to the temporal clustering described above. The F1 data were not summarized in this way. The selection of three clusters for each word is arbitrary but seemed to be the most reasonable way to partition the trajectories in time. Conclusions drawn from these data must be regarded as suggestive only because of the unequal and small N's in the duration clusters. The columns containing mean duration data indicate that the visual clustering was generally successful in that differences in mean trajectory duration across clusters were relatively large when compared to the associated standard deviations. Differences in mean starting frequencies of trajectories ("loci") were not clearly tied to duration cluster, except in the case of wax, where both subject groups had higher starting frequencies for the short cluster and decreasing (female group) or roughly equal (male group) starting frequencies across the medium and long clusters. The transition extent data show a more orderly pattern across clusters, with a clear tendency for the long clusters to be associated with greater transition extents when compared to the short clusters. A monotonic increase in transition extent across the short, medium, and long clusters is observed only for sigh in males and coat and wax for females. Although there are exceptions to this pattern, it seems reasonable to expect a transition that spans a greater frequency range when the trajectory is long, as compared to short.

FIG. 9. The F2 trajectories for females' production of coat, partitioned into short, medium, and long duration clusters (frequency against time in ms). Lower left-hand panel shows the mean trajectories for the three respective clusters. See text for additional details.

Some additional comment should be made about the data for row. As shown in Fig. 10, some of the row trajectories had extensive on-glides, whereas others did not. The duration clusters reported in Table II are not partitioned according to type of trajectory, so the cluster values reported for mean starting frequency and mean transition extent may be confounded by the variable presence of on-glide trajectories across the duration clusters. In fact, slopes of the transition segment from the on-glide trajectories were always somewhat shallower than slopes from the remaining row trajectories.

The F2 transition slope data for the temporal clusters are reported in Table III. There seems to be little relation- ship between temporal cluster and mean transition slope for males, whereas females have the greatest slope in the short cluster for every word except coat and row. Consistent with the group data reported in Table I, slopes from the female

TABLE II. Mean duration, mean starting frequency (locus), and mean transition extent (TE) for F2 according to temporal cluster for each word and two groups. The number following each cluster category indicates the N in that cluster; standard deviations, computed only for N > 2, are given in parentheses.

                      Males                                        Females
Word   Cluster   N    Duration   Start freq   TE            N    Duration   Start freq   TE

coat   short     4    120(3)     1019(63)     142(40)       6    113(12)    1121(152)    112(88)
       med       9    148(9)     994(84)      122(44)       4    159(16)    1172(151)    227(114)
       long      2    187(---)   1052(---)    163(---)      5    215(18)    1350(216)    264(100)

wax    short     2    187(---)   872(---)     670(---)      3    172(2)     1029(264)    1202(398)
       med       5    248(10)    632(66)      1165(107)     8    231(16)    793(190)     1441(278)
       long      8    317(26)    660(119)     1059(232)     4    354(36)    437(77)      1833(263)

sew    short     10   260(26)    1227(92)     374(109)      3    132(35)    1695(243)    678(115)
       med       2    327(---)   1142(---)    242(---)      7    282(19)    1477(176)    508(194)
       long      3    393(21)    1251(44)     375(59)       5    377(31)    1710(103)    692(135)

row    short     6    343(55)    911(130)     233(115)      4    386(13)    1083(148)    399(111)
       med       6    450(23)    943(107)     189(121)      6    430(13)    1168(76)     404(142)
       long      3    558(50)    948(123)     228(56)       5    549(29)    1147(181)    407(139)

sigh   short     5    335(20)    1350(82)     554(45)       8    350(20)    1721(111)    937(158)
       med       9    422(20)    1327(151)    685(133)      5    424(20)    1711(95)     812(118)
       long      1    539(---)   1309(---)    709(---)      2    569(---)   1573(---)    1153(---)

sip    short     1    46(---)    1510(---)    186(---)      9    55(10)     2024(83)     226(71)
       med       7    64(5)      1537(103)    179(68)       3    84(3)      2047(38)     210(134)
       long      7    87(6)      1690(127)    175(69)       3    105(2)     1971(91)     347(42)

ship   short     4    63(5)      1702(118)    239(64)       8    62(8)      2206(157)    445(162)
       med       10   87(7)      1810(165)    352(141)      2    84(---)    2303(---)    241(---)
       long      1    110(---)   1674(---)    355(---)      5    109(7)     2267(140)    552(106)

1285 J. Acoust. Soc. Am., Vol. 84, No. 4, October 1988 Weismer et al.: Acoustic signatures 1285


FIG. 10. The F1-F2 trajectories for males' production of row, separated into on-glide and non-on-glide categories. (Panel titles: "ROW: MALE, ON-GLIDE" and "ROW: MALE, NO ON-GLIDE"; horizontal axis: time.)

clusters are typically greater than slopes from the corresponding male clusters.

B. Formant trajectories: Bark scale

Table IV reports slopes of Bark-transformed F2 trajectories, as well as a comparison of relative variability (RV) for group slope values expressed in Bark and frequency. Group plots of Bark trajectories for each word are not shown because they provide little information gain relative to the frequency plots in Figs. 2-8. One reason for performing the Bark transforms was to determine if group variability in frequency slope might be reduced when the data are computed on a scale approximating the auditory representation of frequency. Because the frequency and Bark scales are so different, variability was compared by determining the respective

TABLE III. Mean slope estimates of F2 trajectories partitioned into temporal clusters. The numbers in parentheses following each mean indicate the number of trajectories within the cluster. Values not given when N = 1 in cluster.

Word   Cluster   Males         Females

coat   short     -1.86(4)      -2.43(6)
       med       -2.32(9)      -2.94(5)
       long      -1.72(2)      -2.36(4)

wax    short     5.07(2)       8.26(3)
       med       5.48(5)       7.06(8)
       long      4.55(8)       7.51(4)

sew    short     -2.32(10)     -5.21(3)
       med       -2.11(2)      -3.22(7)
       long      -2.97(3)      -3.23(5)

row    short     -2.09(2)      -2.24(8)
       med       -1.86(8)      -2.43(2)
       long      -1.70(5)      -1.93(5)

sigh   short     2.44(5)       4.28(8)
       med       2.41(9)       3.34(5)
       long      (1)           3.71(2)

sip    short     (1)           -5.03(9)
       med       -3.41(7)      -3.33(3)
       long      -3.17(7)      -4.43(3)

ship   short     -4.08(4)      -6.97(8)
       med       -4.09(10)     -3.61(2)
       long      (1)           -5.14(5)

coefficients of variation, which are expressed in Table IV as percentages.
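The relative variability measure can be sketched in a few lines. This is a minimal illustration, assuming RV is the standard deviation divided by the magnitude of the mean, times 100, as the Table IV caption states. The function name `relative_variability` is ours, and the inputs are the rounded male values for wax printed in Table IV, so the result need not reproduce the tabled RV, which was presumably computed from unrounded data.

```python
def relative_variability(mean: float, sd: float) -> float:
    """Coefficient of variation as a percentage: s.d. / |mean| x 100.

    The magnitude of the mean is used so that falling (negative) slopes
    give a positive RV, as in Table IV.
    """
    if mean == 0:
        raise ValueError("relative variability is undefined for a zero mean")
    return abs(sd / mean) * 100.0


# Rounded male group values for wax from Table IV (slope in Bark/ms).
rv_wax_males = relative_variability(mean=0.028, sd=0.005)
print(f"RV (Bark), wax, males, from rounded values: {rv_wax_males:.1f}%")
```

Because RV is unitless, it allows the Bark and frequency slopes to be compared even though the two scales differ greatly in magnitude.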

The Bark slopes appear to be more consistent than the frequency slopes across words, with the exception of the slope for wax, which is disproportionately high in the Bark scale. The unusual Bark slope for wax is a reflection of the steep F2 change within a frequency range where the Bark increments are relatively fine-grained. The nonlinear transform between frequency and Bark also explains why some of the differences in absolute frequency slope (e.g., sigh versus sew, females, Table I) are reversed when expressed in Bark. One feature of the frequency slopes, the consistently steeper transitions for females as compared to males, is generally preserved in the Bark data; that is, the apparent sex effect is not normalized by the "auditory spectra" representation.
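The effect of the nonlinear transform on slope can be illustrated with a short sketch. The Bark formula used for the present data is not restated in this section; the code below assumes the arctangent approximation of Zwicker and Terhardt, which is one published analytic form and may differ from the one actually applied. The function names and the two example transitions are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (assumed Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_slope(f_start_hz: float, f_end_hz: float, duration_ms: float) -> float:
    """Average slope in Bark/ms of a transition between two frequencies."""
    return (hz_to_bark(f_end_hz) - hz_to_bark(f_start_hz)) / duration_ms


# Hypothetical transitions: the same 500-Hz rise over 100 ms (5 Hz/ms),
# placed low and high in the F2 range.
low = bark_slope(700.0, 1200.0, 100.0)
high = bark_slope(1700.0, 2200.0, 100.0)
print(f"700 -> 1200 Hz:  {low:.4f} Bark/ms")
print(f"1700 -> 2200 Hz: {high:.4f} Bark/ms")
```

Under this approximation, identical slopes in Hz/ms yield a larger Bark slope when the transition lies lower in frequency, which is the kind of rescaling that can enlarge or reverse slope differences between words.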

The data in Table IV do not indicate that the Bark transform reduces intersubject variability of trajectory slopes relative to variability of frequency slopes. Relative variability appears to be similar for the two scales, with values ranging from 14% to 37.5% for males and from 14.3% to 36.1% for females. Sip and ship have the greatest relative variability values, whereas sigh, wax, and row (females) have the smallest values.

As stated in the Introduction, the Bark transform was also done to explore the feasibility of eliminating male-female differences in overall trajectory parameters. This would simplify the eventual goal of a stable graphical or statistical summary of normal trajectory characteristics. Unfortunately, examination of F1-F2 Bark plots did not suggest that the Bark transform substantially reduced across-sex variability in formant trajectories. An example of Bark-transformed formant trajectories for the word wax is provided in the top two panels of Fig. 11. Note that the Bark plots suggest the same slope and variability differences across sex that were seen in the frequency scale plots for wax (Fig. 3).

Because the F1-F2 pairings cannot be identified in the top panels of Fig. 11, it is possible that some of the variability differences between males and females might be reduced by plotting the time history of the difference between Bark F2 and Bark F1. Syrdal and Gopal (1986) have claimed that Bark differences between adjacent formants (or between F1 and F0) may reduce interspeaker variability in spectral characteristics of the steady-state portion of vowels. The bottom two panels of Fig. 11 show the F2-F1 Bark difference as a function of time for the word wax (Bark differences computed every 5 ms). This plot still shows the greater variability for females, which is especially obvious for the F2-F1 Bark differences at the onset of the trajectories. The greater F2 and Bark F2 slopes for females are also preserved in the F2-F1 Bark difference plots.
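The F2-F1 Bark difference track can be sketched as follows. This is only an illustration: the formant values are hypothetical (they are not measurements from this study), the 5-ms step follows the text, and the Bark conversion again assumes the Zwicker-Terhardt arctangent approximation rather than a formula confirmed for these data.

```python
import math
from typing import List, Sequence

STEP_MS = 5  # sampling interval of the Bark differences, as stated in the text


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (assumed Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_difference_track(f1_hz: Sequence[float], f2_hz: Sequence[float]) -> List[float]:
    """Bark(F2) - Bark(F1) at each sample of two time-aligned formant tracks."""
    if len(f1_hz) != len(f2_hz):
        raise ValueError("F1 and F2 tracks must have the same number of samples")
    return [hz_to_bark(f2) - hz_to_bark(f1) for f1, f2 in zip(f1_hz, f2_hz)]


# Hypothetical F1 and F2 samples (Hz) for the first 20 ms of a rising transition.
f1_track = [350.0, 450.0, 560.0, 640.0, 680.0]
f2_track = [700.0, 950.0, 1250.0, 1500.0, 1650.0]

diff_track = bark_difference_track(f1_track, f2_track)
for i, d in enumerate(diff_track):
    print(f"{i * STEP_MS:3d} ms  {d:.2f} Bark")
```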

C. Segment durations

Segment durations are reported for groups in Table V. The reported values are quite similar across groups, with perhaps a slight tendency toward longer segment durations among the female subjects. The segment durations are certainly greater than those in connected speech samples


TABLE IV. Bark-transformed trajectory slopes (Bark/ms) and relative variation (RV: s.d./mean × 100, expressed as a percentage) for F2 group data. RV data given for Bark and frequency slopes; values in parentheses are the group standard deviations in Bark.

                Males                                       Females
Word   Slope (Bark)     RV (Bark)   RV (freq)     Slope (Bark)     RV (Bark)   RV (freq)

coat   -0.012(0.003)    24.0        25.0          -0.016(0.005)    30.5        28.8
wax    0.028(0.005)     16.3        16.7          0.038(0.008)     21.4        14.3
sew    -0.014(0.003)    23.8        19.6          -0.018(0.005)    28.8        26.6
row    -0.011(0.003)    29.1        26.5          -0.012(0.002)    17.5        19.6
sigh   0.010(0.002)     18.1        14.0          0.013(0.002)     16.5        17.2
sip    -0.015(0.005)    33.4        37.5          -0.015(0.004)    26.1        32.7
ship   -0.016(0.005)    31.6        32.6          -0.019(0.007)    36.1        35.0

(Umeda, 1975, 1977; Klatt, 1975), but group variability is still quite small for most of the measured segments.

III. DISCUSSION

Syrdal and Gopal (1986, p. 1095), when discussing the problem of vowel normalization, have stated that, "...it is essential for studies of speech development or for the evaluation of speech produced by special populations to be able to compare speech samples from different speakers, such as children and adults or normal and impaired speakers in a phonetically meaningful way." The work reported here is a preliminary attempt to explore some issues concerning the specification of the acoustic deficit in dysarthric speakers who have intelligibility deficits. A first step toward this goal is the description and summary of selected speech acoustic characteristics of intelligibility test words produced by normal speakers. In the present report, a heavy emphasis has been placed on description, whereas various types of summary have been considered in a more cursory way. The general utility of these observations is to guide future development of acoustic signatures of intelligibility test words, as discussed below.

A. The form of an acoustic signature

Two types of acoustic data are described in the current report, one purely temporal (segment durations) and the other temporo-spectral (formant trajectories). The form of the temporal data for the acoustic signature would seem to be a fairly straightforward depiction of the actual means and standard deviations. The signature might take the form of the normal range for each segment duration, perhaps captured by an interval spanning plus and minus two standard deviations about the mean, or by an explicitly computed confidence interval. The distance of a dysarthric segment duration from the signature might then be indexed as a specific number of milliseconds from the boundary of the signature, or by some categorical index. Specific temporal distances have been used previously to index phonetic dissimilarity between two phonemes (Monsen, 1976), but categorical indices have apparently not been used for a corresponding purpose. Part of the difficulty in developing such an index in signature analysis, where number codes would represent categories like "close," "somewhat different," "very different," and so forth, is the uncertainty about the relationship of the codes to the perception of segment durations. Although these relationships may be equally unclear for specific temporal differences, whatever nonlinearities may exist between the measurement of time and perception of segment duration would still be captured by this approach, albeit in a somewhat inefficient way.

FIG. 11. Bark-transformed F1-F2 trajectories for males' and females' production of wax (top panels); corresponding F2-F1 Bark differences as a function of time (bottom panels). (Panel titles: "WAX: MALE" and "WAX: FEMALE"; horizontal axis ticks run from 0 to 400.)

TABLE V. Mean segment duration data for groups: N = 15 for all values; the number in parentheses is the standard deviation.

Word   Segment   Males      Females

coat   VOT       86(21)     85(23)
       /ou/      149(23)    162(40)
       /t/       119(7)     112(29)

wax    /wae/     281(54)    273(66)
       /k/       94(22)     124(32)
       /s/       272(54)    299(56)

sew    /s/       199(44)    201(37)
       /ou/      300(63)    297(70)

row    /rou/     446(89)    478(76)

sigh   /s/       179(42)    204(50)
       /ai/      405(62)    410(79)

sip    /s/       201(72)    219(31)
       /ɪ/       76(15)     77(23)
       /p/       145(33)    157(34)

ship   /ʃ/       176(57)    203(52)
       /ɪ/       90(18)     88(25)
       /p/       133(38)    166(46)
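A duration signature of the kind described above can be sketched as an interval of the mean plus and minus two standard deviations, with the distance of a patient's value expressed in milliseconds from the nearest boundary. The function names are ours and the signed-distance convention (positive above, negative below) is an assumption made for illustration; the group values are the male data for /wae/ in wax from Table V, and the test values are two of the ALS durations from Table VI.

```python
from typing import Tuple


def signature_interval(mean: float, sd: float, k: float = 2.0) -> Tuple[float, float]:
    """Normal range taken as mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


def distance_from_signature(value: float, interval: Tuple[float, float]) -> float:
    """Signed distance, in the units of `value`, from the nearest boundary.

    Zero inside the interval, positive above the upper boundary,
    negative below the lower boundary.
    """
    lower, upper = interval
    if value > upper:
        return value - upper
    if value < lower:
        return value - lower
    return 0.0


# Male group values for /wae/ in wax (Table V): mean 281 ms, s.d. 54 ms.
wae_signature = signature_interval(281.0, 54.0)
print(wae_signature)                                  # (173.0, 389.0)
print(distance_from_signature(411.0, wae_signature))  # 22.0
print(distance_from_signature(329.0, wae_signature))  # 0.0
```

A categorical index could be layered on top of the signed distance, but the mapping from milliseconds to categories would face the perceptual uncertainty noted in the text.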

The question of what form the trajectory signatures should take is substantially more complicated. Issues that should be considered include, at a minimum: (1) how the frequency scale should be represented; (2) how the temporal variability of trajectories across speakers should be handled; and (3) how sex differences should be accounted for.

1. Frequency representation

An optimal frequency representation for trajectories would be one that greatly reduced interspeaker differences, especially across sex, and that provided the best predictive framework for relating acoustic measurements to speech intelligibility. Comparison of frequency- and Bark-scaled trajectories for the seven words described above failed to show a clear reduction in variability of Bark-scaled trajectory characteristics. The greater interspeaker variability of females, as compared to males, as well as greater transition slopes among females were observed in both frequency- and Bark-scaled trajectories. When F2-F1 Bark differences were computed for a selected word, there was little evidence that interspeaker variability was reduced relative to the frequency or Bark trajectories. This latter result should not necessarily be seen as inconsistent with the findings of Syrdal and Gopal (1986), who claimed that Bark differences greatly reduce interspeaker variability. Syrdal and Gopal's claim was based on a discriminant analysis in which three Bark differences (F1-F0, F2-F1, F3-F2) were the variables used to classify speaker group (men, women, and children). The example shown above was based only on a single Bark difference. It may be the case that a three-dimensional plot, with time as one dimension and two Bark differences as the remaining dimensions, would more clearly show a reduction of interspeaker trajectory variability.

Even though the Bark transform did not clearly reduce interspeaker variability of trajectories, there may be good reason to develop the trajectory signatures within the framework of the Bark scale. Because the purpose of the acoustic signatures is to provide a physical link to an intelligibility score (a measure based on auditory analysis), the eventual difference scores that would index the difference between a dysarthric trajectory and the signature might be best expressed in dimensions consistent with auditory analysis. The reversal or enhancement of certain slope differences when frequency data are transformed to Bark (see Sec. II) supports this view.

2. Temporal representation

When trajectories were partitioned according to overall duration, the only systematic effect was an increase in transition extent with increased trajectory duration. This would explain, in part, the lack of a clear influence of trajectory duration on transition slope. It may be that a more productive clustering in the temporal dimension would be based on transition duration, which covaries strongly but not perfectly with overall trajectory duration (Weismer et al., 1985). The lack of strict prediction between transition and trajectory durations may explain why the females tended to have the greatest slopes in the short clusters, but still showed the increase in transition extent with increased trajectory duration.

Because we feel that transition slope measures will be useful as a predictor of speech intelligibility deficits (see footnote 3), we would reject a simple time normalization which would tend to distort slope relationships between trajectories. In principle, nonlinear time normalizations could eliminate temporal variance in the trajectory signature while preserving important spectral features, but the development of such normalization procedures would probably not be cost effective. A slope signature would partially normalize the frequency- (or Bark-) time relationship and is more directly interpretable in articulatory terms than temporo-spectral measures derived from normalized trajectories.
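The distortion introduced by a simple (linear) time normalization can be shown with a small sketch. The two transitions below are hypothetical: they cover the same 600-Hz extent in 60 and 120 ms, so their slopes differ by a factor of 2. The function names and the 100-ms target duration are assumptions made for illustration only.

```python
from typing import List, Sequence


def mean_slope(times_ms: Sequence[float], freqs_hz: Sequence[float]) -> float:
    """End-to-end slope of a trajectory segment (frequency change / time change)."""
    return (freqs_hz[-1] - freqs_hz[0]) / (times_ms[-1] - times_ms[0])


def linear_time_normalize(times_ms: Sequence[float], target_ms: float) -> List[float]:
    """Linearly rescale a time axis so the segment spans exactly target_ms."""
    start = times_ms[0]
    scale = target_ms / (times_ms[-1] - start)
    return [(t - start) * scale for t in times_ms]


# Hypothetical transitions with the same 600-Hz extent but different durations.
freqs = [1000.0, 1200.0, 1400.0, 1600.0]
fast_t = [0.0, 20.0, 40.0, 60.0]
slow_t = [0.0, 40.0, 80.0, 120.0]

raw_fast = mean_slope(fast_t, freqs)
raw_slow = mean_slope(slow_t, freqs)
norm_fast = mean_slope(linear_time_normalize(fast_t, 100.0), freqs)
norm_slow = mean_slope(linear_time_normalize(slow_t, 100.0), freqs)

print(f"raw slopes (Hz/ms):        fast {raw_fast:.2f}, slow {raw_slow:.2f}")
print(f"normalized slopes (Hz/ms): fast {norm_fast:.2f}, slow {norm_slow:.2f}")
```

After normalization the two slopes coincide, so the original 2:1 slope relationship, which is the quantity of interest here, is lost.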

3. Sex differences

The statistical or graphical basis of an acoustic signature would presumably become more stable as additional subjects were added to a data base. At the outset of this study, it was hoped that temporal and temporo-spectral differences between the sexes would be minimal or subject to straightforward normalization schemes, thus allowing the data to be pooled. The data suggest, however, that signatures based on data pooled across sex appear to be reasonable only for segment durations. Interspeaker variability of formant trajectories was consistently greater in the female group than in the male group, as were the slopes of transitional segments. Although we have no obvious explanation for the relatively large variability of trajectory characteristics across female subjects, it is perhaps worthwhile to note our impression that many of the females seemed to use a much more formal style of pronunciation than the males. It is possible that greater variation of speaking style within the female group resulted in the greater variability in formant trajectory characteristics. Whereas this is a testable hypothesis (see the approach of Picheny et al., 1985), it seems more prudent to construct the trajectory signatures separately for males and females than to study ways in which style factors (if the hypothesis is correct) might be controlled for clinical intelligibility tests.

The consistently greater transition slopes observed for females, as compared to males, are probably a dynamic reflection of the sex-related differences in vocal tract size. If it is assumed that the average change in vocal tract geometry for a particular vocal tract gesture is similar in magnitude and time for males and females, females should have consistently greater slopes simply by virtue of their shorter vocal tracts. However, because we know of no kinematic data that bear explicitly on possible differences between males and females for the same articulatory sequences, we cannot rule out the possibility that females produce articulatory transitions differently from males.
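The vocal tract size argument can be made concrete with a deliberately idealized sketch. It assumes a uniform tube closed at one end (a textbook model of the neutral vocal tract, not an analysis performed in this study), a nominal speed of sound, and hypothetical tract lengths of 17.5 and 15 cm. Under these assumptions all resonances scale inversely with tract length, so a transition of the same duration covers proportionally more hertz in the shorter tract.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # nominal value; an assumption of this sketch


def neutral_tube_formant(n: int, length_cm: float) -> float:
    """n-th resonance (Hz) of a uniform tube closed at one end: (2n - 1) c / (4 L)."""
    return (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)


# Hypothetical vocal tract lengths, chosen only for illustration.
LONG_TRACT_CM = 17.5
SHORT_TRACT_CM = 15.0

f2_long = neutral_tube_formant(2, LONG_TRACT_CM)
f2_short = neutral_tube_formant(2, SHORT_TRACT_CM)
scale = f2_short / f2_long  # equals LONG_TRACT_CM / SHORT_TRACT_CM

# A hypothetical 500-Hz, 100-ms transition in the longer tract, rescaled
# to the shorter tract with the same duration.
slope_long = 500.0 / 100.0
slope_short = slope_long * scale
print(f"F2 (neutral tube): {f2_long:.0f} Hz vs {f2_short:.0f} Hz")
print(f"slope: {slope_long:.2f} Hz/ms vs {slope_short:.2f} Hz/ms")
```

The sketch says nothing about whether the two sexes actually produce the same articulatory gestures, which is the open question raised in the text.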

B. Criteria for "good" acoustic signatures

Once the form of an acoustic signature is selected, the criteria for a "good" acoustic signature should be considered. This is important because it would be inefficient to develop acoustic signatures for all 98 words in our intelligibility list. By a good signature, we mean one that (1) shows reasonably small interspeaker variabilities for the measures of interest and (2) can be shown to be sensitive to the acoustic deficits associated with dysarthric speech production. Ideally, we would like to have a small subset of words that met both these criteria and could be used to predict general speech intelligibility deficits with good accuracy. It may not be realistic, however, to develop a single subset of words that would be equally effective for the various dysarthrias that are often associated with different kinds of articulatory disorder.

1. The variability criterion

Whereas certain segment durations show a good deal more variability than others (see Table V), some of that variability may be associated with articulatory events that are relatively unimportant for our purposes. For example, the duration of the vocalic segment in words with open syllables (row, sigh, sew) is typically more variable than the vocalic segment in CVC words (such as sip and ship). Much of the variation in the vocalic segments of open syllables derives from varying lengths of the relatively steady-state portions, which do not figure prominently in our scheme of predicting intelligibility from selected acoustic measures. We, therefore, would not reject the use of open syllable words for signature development because their vocalic durations tend to be variable across speakers; in fact, as suggested below, such words may be more stable for certain measures that are prominent in our approach.

The variability criterion for selecting good signatures may be more profitably applied in the case of the slope measures. The relative variability data in Table IV suggest that words with presumably simple articulatory gestures throughout the vocalic segment (sip and ship) have slopes that are a good deal more variable across speakers than words with more complicated articulatory gestures (see, especially, wax and sigh). It may be that the latter words, whose rapidly changing vocal tract geometries may actually be the stable elements in the articulatory stream (Kent, 1983), would be preferred for signature development over words like ship, where the highly variable slope measures cast doubt on the representativeness of a signature. Additional work is required to determine if the pattern of relative variability data seen in Table IV is observed for a larger set of words that sample different kinds of articulatory events. There may also be words, such as row (see Fig. 10), that are not good candidates for signature development because of varying styles of production that do not affect phonetic acceptability but do influence the key acoustic measures.
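One way to apply the variability criterion is simply to rank the candidate words by relative variability. The sketch below uses the RV (Bark) values from Table IV; the dictionary layout and the function name `rank_by_rv` are ours, and ranking on the Bark rather than the frequency RV is an arbitrary choice for illustration.

```python
from typing import Dict, List, Tuple

# RV (Bark) of F2 slope from Table IV: word -> (males, females), in percent.
RV_BARK: Dict[str, Tuple[float, float]] = {
    "coat": (24.0, 30.5),
    "wax": (16.3, 21.4),
    "sew": (23.8, 28.8),
    "row": (29.1, 17.5),
    "sigh": (18.1, 16.5),
    "sip": (33.4, 26.1),
    "ship": (31.6, 36.1),
}

MALES, FEMALES = 0, 1


def rank_by_rv(rv: Dict[str, Tuple[float, float]], group: int) -> List[str]:
    """Words ordered from least to most variable slope for one group."""
    return sorted(rv, key=lambda word: rv[word][group])


males_ranked = rank_by_rv(RV_BARK, MALES)
females_ranked = rank_by_rv(RV_BARK, FEMALES)
print("males:  ", males_ranked)
print("females:", females_ranked)
```

The rankings differ between the groups for some words (row, for example), which is one more reason to treat male and female trajectory signatures separately.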

2. The sensitivity criterion

The application of the signatures in a research setting would involve a comparison of acoustic measures derived from a patient's production(s) of a test word with the normal signature for that word. A "good," or productive, signature will be one that shows only some of the measures derived from dysarthric speakers to be captured by the signature. The measures that are not captured should be the ones that have an important role in determining speech intelligibility deficits. An example of this is given in Table VI, which reports segment duration and F2 transition slope data for 18 ALS subjects. Values above the upper limit of the signature are indicated by an asterisk, and values below the lower limit are indicated by a double asterisk. The zeros for segment duration indicate that the segment (either the stop closure for /k/, or the frication for /s/) was not observed, and therefore could not be measured. These data produce the desired effect, that is, separating out certain measures that are not captured by the signature. The task for the future is to link this pattern of results with the speech intelligibility scores for each subject.³
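The comparison of a patient's measures with the signature can be sketched as a simple flagging step. The limits below are the signature limits for wax as read from Table VI (they agree with the male mean plus and minus two standard deviations in Tables IV and V), and the three subjects are rows 1, 4, and 18 of Table VI. The dictionary keys and the function name `flag` are hypothetical.

```python
from typing import Dict, Tuple

# Signature limits for wax (durations in ms, F2 slope in Bark/ms).
LIMITS: Dict[str, Tuple[float, float]] = {
    "wae": (173.0, 389.0),
    "k": (50.0, 138.0),
    "s": (164.0, 380.0),
    "f2_slope": (0.018, 0.038),
}

# Three ALS subjects from Table VI; a duration of 0 means the segment
# was not observed and so falls below the lower limit.
SUBJECTS: Dict[int, Dict[str, float]] = {
    1: {"wae": 329.0, "k": 101.0, "s": 162.0, "f2_slope": 0.0381},
    4: {"wae": 535.0, "k": 0.0, "s": 0.0, "f2_slope": 0.0260},
    18: {"wae": 360.0, "k": 100.0, "s": 225.0, "f2_slope": 0.0349},
}


def flag(value: float, limits: Tuple[float, float]) -> str:
    """Classify a measure as 'above', 'below', or 'within' the signature."""
    lower, upper = limits
    if value > upper:
        return "above"
    if value < lower:
        return "below"
    return "within"


flags = {
    subject: {name: flag(value, LIMITS[name]) for name, value in measures.items()}
    for subject, measures in SUBJECTS.items()
}
for subject, result in flags.items():
    outside = {name: f for name, f in result.items() if f != "within"}
    print(subject, outside)
```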

IV. SUMMARY AND CONCLUSIONS

The purpose of the present study was to examine certain acoustic measures that may eventually be used to account for, and predict, intelligibility deficits in dysarthric speakers. Formant trajectories and segment durations were examined for 15 male and 15 female normal geriatric individuals. The speech material was a subset of words from an intelligibility

TABLE VI. Values for segment duration (ms) and F2 slopes (Bark/ms) for 18 males having ALS who spoke the word "wax." Signature limits are given at bottom of table. Intelligibility data are expressed as percentages.

ALS subject   wae      k        s        Slope      Intelligibility

 1            329      101      162**    0.0381*    85
 2            411*     111      238      0.0369     93
 3            344      109      145**    0.0459*    94
 4            535*     0**      0**      0.0260     65
 5            446*     113      198      0.0331     95
 6            506*     74       371      0.0394*    83
 7            710*     0**      0**      0.0259     47
 8            204      85       225      0.0326     93
 9            259      138      259      0.0507*    99
10            340      89       311      0.0496*    93
11            256      144*     141**    0.0294     95
12            358      0**      294      0.0419*    63
13            238      100      187      0.0195     97
14            468*     0**      0**      0.0449*    45
15            453*     90       257      0.0289     99
16            826*     230*     0**      0.0454*    79
17            348      75       247      0.0237     94
18            360      100      225      0.0349     97

wae limits = 173-389
k limits = 50-138
s limits = 164-380
F2 slope limits = 0.018-0.038

Limits based on mean ± 2 s.d.

* Exceeds upper signature boundary.
** Falls below lower signature boundary.


test designed to sample a variety of phonetic contrasts. Our examination of the data suggested certain ways in which the acoustic measures might be best used in representing "normal" speech production behavior. We call these representations "acoustic signatures" and discuss various ways in which the signatures might be configured and applied. Our conclusions are that: (1) not all intelligibility words would be equally good candidates for signature development; (2) the signatures for segment durations can probably be pooled across sex, but signatures for trajectories should not be pooled because of intersex differences in trajectory variability and slope; (3) whereas the sex differences described in (2) do not seem to be eliminated when the frequency data are transformed to Bark, the Bark scale is preferred for signature development because distances between a dysarthric acoustic measure and the normal signature can be expressed in auditory terms; and (4) the ultimate criterion for goodness of a signature is its sensitivity to the speech production characteristics of dysarthric speakers. An example of how a signature might be used with dysarthric speech samples is presented to demonstrate the direction of our work. The continued development of signatures should also account for measures not studied here, such as obstruent noise spectra and fundamental frequency contours. More generally, any measure that may contribute to an intelligibility deficit may have to be studied for consideration of signature development.

ACKNOWLEDGMENTS

We would like to thank Jane F. Kent, Kim Corbin, Benjamin Brooks, Robert Sufit, Yvonne Sinsset, and Jay Rosenbek for assistance with various aspects of this study. Don Robin, Betty Tuller, and one anonymous reviewer made valuable comments on an earlier version of the manuscript. The work reported herein was supported by NIH Award No. NS 22458.

¹In a few cases, more than one interval met the criterion for a transitional segment within a single trajectory. In these cases, the steepest slope was used in the analysis.

²This statement is based on our previous observations of formant trajectories from repeated productions by normal speakers (Weismer and Kimelman, 1985; Weismer et al., 1985; Weismer et al., 1986). In each of these studies, the within-subject repetition variability of formant trajectory characteristics was far smaller than the variability observed across individuals in the current study. These previously reported data made use of the same measurement criteria as the present study but were limited to male speakers.

³One of the long-term goals of the present research program is to develop a multiple regression model of intelligibility in dysarthric speakers, using acoustic measures as predictor variables (see Monsen, 1978, for a similar approach in hearing-impaired speakers). As suggested in the Introduction, we consider it likely that formant trajectory characteristics would be an important component of such a model. To lend some preliminary empirical support to this assumption, we computed a simple correlation coefficient between overall intelligibility (as assessed by the test described in Kent et al., in press), and the absolute value of the mean F2 slope across 12 words from the intelligibility test (the seven words reported on above plus hail, shoot, ate, cash, and blend). This correlation was computed for 25 speakers with ALS, including the 18 speakers from Table VI; the 12 words were used in this analysis because those are the ones for which complete formant trajectory analyses are available. The resulting correlation was 0.76, which we take as reasonable evidence that an average measure of formant trajectory slope would be a productive predictor in a regression model of intelligibility. We emphasize the average value of the measure, that is, computed across a set of words in the test, because some words may not show a relationship between F2 slope and speech intelligibility that would be consistent with the more global tendency. For example, the data in Table VI suggest that none of the ALS speakers had an F2 slope for wax that was less than the normal range, but several ALS speakers had F2 slopes that exceeded the normal range. We are unsure how to interpret this latter fact, but it may be that some ALS speakers attempt to compensate for a weakened tongue by producing relatively large displacements and velocities of the jaw in words that require a substantial change in the inferior-superior dimension of the vocal tract. Moreover, the F2 slope for wax is also likely to be strongly affected by labial function, unlike most of the other words studied here.
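The simple correlation described in this footnote can be sketched as follows. The function `pearson_r` is a plain implementation of the product-moment correlation. The data are the wax F2 slopes and intelligibility scores of the 18 ALS speakers in Table VI only, so the result is not comparable to the 0.76 reported above, which was based on the mean absolute slope across 12 words and 25 speakers.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Table VI, subjects 1-18: F2 slope for wax (Bark/ms) and intelligibility (%).
wax_slopes = [0.0381, 0.0369, 0.0459, 0.0260, 0.0331, 0.0394, 0.0259, 0.0326, 0.0507,
              0.0496, 0.0294, 0.0419, 0.0195, 0.0449, 0.0289, 0.0454, 0.0237, 0.0349]
intelligibility = [85, 93, 94, 65, 95, 83, 47, 93, 99,
                   93, 95, 63, 97, 45, 99, 79, 94, 97]

r_wax = pearson_r([abs(s) for s in wax_slopes], intelligibility)
print(f"r (wax only, 18 speakers) = {r_wax:.2f}")
```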

Ansel, B. A. (1985). "Acoustic predictors of speech intelligibility in cerebral palsied dysarthric adults," unpublished doctoral dissertation, University of Wisconsin-Madison, Madison, WI.

Broad, D. J., and Clermont, F. (1987). "A methodology for modeling vowel formant contours in CVC context," J. Acoust. Soc. Am. 81, 155-165.

Darley, F. L., Aronson, A. E., and Brown, J. R. (1969a). "Clusters of deviant speech dimensions in the dysarthrias," J. Speech Hear. Res. 12, 462-496.

Darley, F. L., Aronson, A. E., and Brown, J. R. (1969b). "Differential diagnostic patterns of dysarthria," J. Speech Hear. Res. 12, 246-269.

Darley, F. L., Aronson, A. E., and Brown, J. R. (1975). Motor Speech Disorders (Philadelphia, PA).

Hardcastle, W. J., Morgan Barry, R. A., and Clark, C. J. (1985). "Articulatory and voicing characteristics of adult dysarthric and verbal dyspraxic speakers: An instrumental study," Br. J. Disord. Commun. 20, 249-270.

Kent, R. D. (1983). "Segmental organization of speech," in The Production of Speech, edited by P. F. MacNeilage (Academic, New York), pp. 57-89.

Kent, R. D., and Burkhard, R. (1981). "Changes in the acoustic correlates of speech production," in Aging: Communication Processes and Disorders, edited by D. S. Beasley and G. A. Davis (Grune & Stratton, New York), pp. 47-62.

Kent, R. D., and Netsell, R. (1975). "A case study of an ataxic dysarthric: Cineradiographic and spectrographic observations," J. Speech Hear. Disord. 40, 115-134.

Kent, R. D., Netsell, R., and Bauer, L. L. (1975). "Cineradiographic assessment of articulatory mobility in the dysarthrias," J. Speech Hear. Disord. 40, 467-480.

Kent, R. D., Weismer, G., Kent, J. F., and Rosenbek, J. C. (in press). "Toward explanatory intelligibility testing in dysarthria," J. Speech Hear. Disord.

Klatt, D. H. (1975). "Vowel lengthening is syntactically determined in a connected discourse," J. Phon. 3, 129-140.

Klatt, D. H. (1976). "Linguistic uses of segmental duration in English: Acoustic and perceptual evidence," J. Acoust. Soc. Am. 59, 1208-1221.

Lindblom, B. (1978). "Phonetic aspects of linguistic explanation," Stud. Linguist. XXXII, 137-153.

Monsen, R. B. (1976). "The production of English stop consonants in the speech of deaf children," J. Phon. 4, 29-41.

Monsen, R. B. (1978). "Toward measuring how well hearing-impaired children speak," J. Speech Hear. Res. 21, 197-219.

Peterson, G. E., and Lehiste, I. (1960). "Duration of syllabic nuclei in English," J. Acoust. Soc. Am. 32, 693-703.

Picheny, M. A., Durlach, N. I., and Braida, L. D. (1985). "Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech," J. Speech Hear. Res. 28, 96-103.

Syrdal, A. K., and Gopal, H. S. (1986). "A perceptual model of vowel recognition based on the auditory representation of American English vowels," J. Acoust. Soc. Am. 79, 1086-1100.

Tikofsky, R. S. (1970). "A revised list for the estimation of dysarthric single word intelligibility," J. Speech Hear. Res. 13, 59-64.

Tikofsky, R. S., and Tikofsky, R. P. (1964). "Intelligibility measures of dysarthric speech," J. Speech Hear. Res. 7, 325-333.

Umeda, N. (1975). "Vowel duration in American English," J. Acoust. Soc. Am. 58, 434-445.

Umeda, N. (1977). "Consonant duration in American English," J. Acoust. Soc. Am. 61, 846-858.

Weismer, G. (1984). "Articulatory characteristics of Parkinsonian dysarthria," in The Dysarthrias: Physiology-Acoustics-Perception-Management, edited by M. R. McNeil, J. C. Rosenbek, and A. Aronson (College-Hill, San Diego, CA), pp. 101-130.

Weismer, G., and Fromm, D. (1983). "Acoustic characteristics of geriatric utterances: Segmental and nonsegmental characteristics which relate to laryngeal function," in Vocal Fold Physiology, edited by D. M. Bless and J. H. Abbs (College-Hill, San Diego, CA), pp. 317-332.

Weismer, G., and Kimelman, M. D. Z. (1985). "Vowel acoustics in Parkinsonian dysarthria," J. Acoust. Soc. Am. Suppl. 1 77, S87.

Weismer, G., Kimelman, M. D. Z., and Gorman, S. (1985). "More on the speech production deficit associated with Parkinson's disease," J. Acoust. Soc. Am. Suppl. 1 78, S55.

Weismer, G., Mulligan, M., and DePaul, R. (1986). "Selected acoustic characteristics of the dysarthria associated with amyotrophic lateral sclerosis," paper presented at the 3rd Clinical Dysarthria Conference, Tucson, AZ.

Yorkston, K. M., and Beukelman, D. R. (1981). Assessment of Intelligibility of Dysarthric Speech (C. C. Publications, Tigard, OR).
