Fundamental frequency stability characteristics of elderly women’s voices

Fundamental frequency stability characteristics of elderly women's voices

Sue Ellen Linville and Edward W. Korabic

Department of Speech Pathology and Audiology, Marquette University, Milwaukee, Wisconsin 53233

(Received 22 January 1986; accepted for publication 9 December 1986)

The purpose of this investigation was to gather information on how much variability on measures of jitter and fundamental frequency standard deviation (Fo s.d. ) can be expected within individual elderly women when phonating sustained vowels "as steadily as possible." Fifteen repeat productions of the vowels/i/,/a/, and/u/from 18 elderly women (69-90 years) were analyzed for Fo s.d. and jitter. Results indicate that intraspeaker variability on jitter and Fo s.d. measures in elderly women's sustained vowel productions can be quite considerable in some cases. This is a factor which needs to be considered in establishing normative data on elderly speakers' vocal capabilities.

PACS numbers: 43.70.Fq, 43.70.Dn

INTRODUCTION

Previous research has suggested that elderly speakers generally demonstrate reduced stability of fundamental frequency (Fo) in comparison with young speakers (Wilcox, 1978; Stoicheft, 1981; Linville and Fisher, 1985a,b). In addition, studies which have examined frequency stability from sustained vowels produced "as steadily as possible" have indicated that elderly speakers not only demonstrate reduced frequency stability but also show more variability on frequency stability measures than young speakers (Wilcox, 1978; Linville and Fisher, 1985a, b). However, since these studies have been confined to a single production from each speaker, little information is available on the variability individual elderly speakers demonstrate when attempting to phonate "as steadily as possible." This information is critical since high intraspeaker variability on frequency stability measures would necessitate multiple samples from any given speaker in order to assure the validity of a sample selected as "most stable." The purpose of this investigation was to gather information on how much variability on measures ofjitter and fundamental frequency standard deviation (Fo s.d. ) can be expected within individual elderly women when phonating sustained vowels "as steadily as possible."

I. METHOD

A. Speakers

Speakers were 18 elderly women ranging in age from 69-90 years (mean = 75.58, s.d. = 5.95). Speaker selection was restricted to individuals who had never smoked and who

had no professional singing or acting experience. Speakers had no history of voice disorders, or present complaint of voice disorders. In addition, all speakers were judged (by a speech pathologist) to have normal vocal quality. All the ladies were physically active and participating members of the community. None displayed a history of neurological or chronic respiratory disease. Only one subject was taking medication (Prednisone) at the time of testing.

B. Recording procedures

Speakers were recorded individually in a quiet room using a high-quality reel-to-reel audio tape recorder (Sony-- model TC630). Prior testing of the tape recorder revealed the frequency response to be within specifications.

Speakers were required to sustain the vowels/i/,/a/, and/u/, as steadily as possible, for a minimum of 3 s, at a comfortable pitch and intensity level. Twenty-five repeat trials of each vowel were tape recorded at a mouth-to-micro- phone distance of 6 in. Vowel order was randomized across speakers. Prior to recording each vowel group, the speaker practiced producing that vowel "as steadily as possible" by monitoring productions on the CRT of a Visi-Pitch (Kay Elemetrics•model 6087). -

C. Acoustic analysis

The fundamental frequency stability analysis system and the reliability and validity of the system have been reported previously (Korabic et al., 1986). Briefly, the output of the tape recorder was inputed to a Visi-Pitch (Kay Elemetrics•model 6087) which extracted the cycle-to-cycle fundamental frequency of the voice sample. A microprocessor (TRS-80 model 1 with expansion interface) sampled the extracted frequency information from the Visi-Pitch (square wave output from L.A.P. connector on the front panel) and stored the sampling information in memory at a two consecutive memory location as a 2-byte word. The sampling rate of the microprocessor was determined to be approximately 53.7 kHz (i.e., sample period of 18.602 ps). To reduce measurement error, the sampling rate was artifac- tually increased by playing back the recorded speech samples at half speed. The result was an effective sampling rate of approximately 107.5 kHz (i.e., sample period of 9.301 ps).

Sampling information for 100 consecutive cycles was retrieved from memory and period and Fo computed for each cycle. The absolute difference in duration between adja- cent cycles (jitter) was then determined along with mean period, mean jitter, mean F o, percent jitter (mean jitter di-

1196 J. Acaust. Sac. Am. 81(4), April 1987; 0001-4966/87/041196-04500.80; ¸ 1987 Acaust. Sac. Am.; Letters to the Editor 1196

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 150.135.239.97 On: Wed, 17 Dec 2014 23:00:25

vided by mean period times 100), and Fo s.d. The tape recorder was calibrated periodically to assure

that it was free from hum fields, tape drive flutter, and other problems that would interfere with accurate jitter analysis. Analysis of jitter-free square waves from a function gener- ator (Tektronix--model FG 501 ) revealed that the microprocessor measured the extracted frequency information from the Visi-Pitch with a resolution of 9.301/•s, which cor- responds to one sampling period. In addition, reliability of the analysis system was tested by six repeat analyses of the same taped voice segment and found to be acceptable (within ___ one sample period ) .

II. RESULTS

In Fig. 1, jitter values (%) are plotted for each speaker, for each of the three vowels. Each point represents a speaker's mean value for her 15 steadiest productions of that vowel. Variability for each speaker around the mean (her standard deviation for the 15 productions) is indicated by lines extending from the points. Speakers demonstrated the high- est jitter when producing/a/(2.35% ), an intermediate level when producing/u/(1.33%), and the least jitter when producing/i/(1.25%). The t tests indicated that differences between /a/ and /u/ and /a/ and /i/ were significant ( p < 0.01 ). The difference between/i/and/u/was not significant. In addition, significant heterogeneity of variance was present (Fmax -- 3.55, p <0.01), with /i/ showing greater variability in jitter (s.d.--2.03) than /a/ (s.d.- 1.60) and/u/(s.d.- 1.08).

From visual examination of the data in Fig. 1 it is not apparent that/i/has the greatest jitter variability of the

three vowels. In fact, it appears that, with the exception of subject 18,/i/tended to have quite low variability. When speaker 18 was eliminated from the analysis, heterogeneity of variance remained (Fmax = 7.00), but/a/showed the greatest variability (s.d. = 1.52), followed by /u/ (s.d. = 0.99) and/i/(s.d. = 0.57). Mean jitter differences for the three vowels also were altered, with the mean for/i/ (M = 0.80% ) falling to a level significantly lower than/u/ (M = 1.21% ). Jitter levels for/a/(M = 2.21% ) remained significantly higher than the other two vowels. Findings from the analysis on subjects 1-17 are in keeping with results reported previously for speakers 1-10 (Linville and Kora- bic, 1985).

The Fo s.d. values (in semitones) are displayed in Fig. 2. Because of a high correlation between Fo s.d. and jitter (r = 0.90), findings for these measures tended to parallel one another. Results of the t-test analysis including all 18 speakers indicated that Fo s.d. values were significantly higher for/a/(Mean = 0.360), than for/u/(Mean = 0.236), and/i/(0.225 ) ( p < 0.01 ). The difference between/i/and /u/was not significant. Once again, significant heterogeneity of variance was present among the three vowels (gma x =2.86, p<0.01), with /i/ (s.d. =0.23) and /a/ (s.d. = 0.23) showing greater variability than /u/ (s.d. = 0.14). With subject 18 eliminated from the analysis, Fo s.d. values for/a/(M = 0.348) were significantly higher than/u/ (M = 0.225) which, in turn, were significantly higher than/i/(M = 0.175 ). Heterogeneous variances still were evident (Fma x = 9.46, p <0.01 ), with/a/showing the greatest variability (s.d. = 0.23) followed by /u/ (s.d. =0.13) and/i/(s.d. =0.07).

8.400'

7.200

6.600

o•.• 6.000 5.400'

4.800'

4.200'

3.000'

2.400'

1.800'

1.200'

0.600'

T 1-

T

'1 i• - - i• •- .,.

ß ...% ,i i i i i. i i ' i ] [ i i I Sl s2 s3 s4 s5 s6 s7 s8 s9 s10 Sll s12 s13 s14 515

]-

a a .L

•a T

I ,I S16 S17 S18

FIG. 1. Jitter values (%) for each speaker for the vowels/i/,/a/, and/u/. Each point represents a speaker's mean value for her 15 steadiest productions. One standard deviation in either direction is indicated by lines.

1197 J. Acoust. Soc. Am., Vol. 81, No. 4, April 1987 Letters to the Editor 1197


1-

'• a U T i -ra

II I I I I I I I I I I I I S1 S2 S3 S4 S6 S6 S7 S8 S9 S10 Sl 1 S12 S13

u i T a T " T! I u .L T

" .L T-,. J..L

i

I I I I I S14 S15 S16 S17 S18

FIG. 2. The Fo s.d. values (ST) for each speaker for the vowels/i/,/a/, and/u/. Each point represents a speaker's mean value for her 15 steadiest productions. One standard deviation in either direction is indicated by lines.

Interestingly, both jitter ( % ) and F o s.d. were correlat- ed significantly ( p < 0.01 ) with mean F o in these speakers (jitter r = -0.36; Fo s.d. r = --0.30). In other words, there was a tendency for frequency instability to increase as mean Fo decreased, despite the fact that both the frequency stability measures are adjusted to take into account the mean F o level. This correlation might be explained by the fact that the low vowel/a/showed the greatest frequency instability and consistently was produced by speakers at a significantly lower mean Fo level (M = 197 Hz) than/i/(M = 218 Hz) and/u/(221 Hz). Findings of lower mean F o values for low vowels have been reported previously (House and Fair- banks, 1953; Lehiste and Peterson, 1961; Wilcox and Horii, 1980).

Another question of some interest in this study con- cerned the extent to which frequency stability measures might vary as a function of the point in time during the midportion of phonation when sampling occurred. A discrimi- nant analysis was performed to determine whether the first 50 cycles analyzed could be differentiated from the second 50 cycles on the basis of the frequency stability measures. Jitter and Fo s.d. measures did not differ significantly in the two sections analyzed, suggesting that frequency stability measures obtained from the midportion of a sustained vowel do not differ in a systematic manner as a function of the particular segment sampled.

III. DISCUSSION

It is clear from Figs. 1 and 2 that this group of 18 elderly ladies was extremely heterogeneous in both the magnitude of frequency instability displayed as well as the variability demonstrated on these measures. It is difficult to speculate on precise factors responsible for these differences, other than to note that chronological age did not appear to be responsible. In fact, chronological age showed a small negative correlation with both jitter (r= -0.22) and F o s.d. (r = -- 0.13) indicating a slight tendency for frequency to

be more stable in the older ladies in this sample. Health his- tories of these women also provide little insight into these results, being strikingly unremarkable for all subjects.

With notable exceptions in individual speakers, intraspeaker variability tended to be greatest on the low vowel /a/, with less variability on high vowels/i/and/u/. Of the two high vowels,/i? most consistently showed less variability. The low vowel/a/consistently showed higher jitter and F o s.d. levels than the high vowels/u/and/i/. Of/i/and /u/, /i/ tended to show lower jitter and Fo s.d. levels. These findings suggest that ?i/would tend to require fewer trials to obtain a valid estimate of optimal Fo stability performance than the other vowels investigated.

Differences in jitter magnitude among these three vowels have been reported previously, although results are equivocal (Horii, 1980; Wilcox and Horii, 1980; Sorenson and Horii, 1983). While there is a consistent trend for/u/to show low levels of jitter in all the studies, findings for/i/and /a/differ considerably from report to report. Comparisons among the studies are hampered by differences in the instru- mentation used to assess jitter magnitude as well as differences in the population studied, and the task required of subjects. Furthermore, it is possible that conflicting reports of jitter magnitude differences among vowels have resulted, in part, from sampling error.

Overall results of this investigation indicate that intraspeaker variability on jitter and F o s.d. measures in elderly women's sustained vowel productions can be quite considerable in some cases, even when speakers are asked to phonate "as steadily as possible." The variability observed in this study is even more striking given the fact that only the 15 steadiest of 25 total "maximally steady" productions were included in the analysis. These results suggest that a large number of trials (perhaps 15 or more) need to be obtained from an elderly speaker before attempting to select the steadiest production for that speaker. The exact number of trials necessary to ascertain the steadiest production would vary from speaker to speaker. Of course, intraspeaker vari-

1198 J. Acoust. Soc. Am., Vol. 81, No. 4, April 1987 Letters to the Editor 1198


ability may well be considerably less in young speakers. Re- search is currently underway in our laboratory to determine the extent to which variability on frequency stability measures is age related.

ACKNOWLEDGMENTS

We would like to express our thanks to Deb Fabrycki, Katie Bronson, Cathy Armstrong, Nancy Bloom, and Peg Paulus for assistance recording subjects and analyzing data for this study.

Horii, Y. (1980). "Vocal shimmer in sustained phonation," J. Speech Hear. Res. 23, 202-209.

House, A., and Fairbanks, G. (1953). "Influence of consonantal environ- ment on secondary characteristics of vowels," J. Acoust. Soc. Am. 25, 105-113.

Korabic, E., Kuos, S., and Linville, S.E. (1986) "Analysis ofpitch pertur- bation (jitter) using a Visi-Pitch and a microprocessor," J. Comput. Us- ers Speech Hear. 2 (2), 86-93.

Lehiste, I., and Peterson, G. (1961). "Some basic considerations on the analysis on intonation," J. Acoust. Soc. Am. 33, 419-425.

Linville, S.E., and Fisher, H. (1985a). "Acoustic characteristics of women's voices with advancing age," J. Gerontol. 40, 324-330.

Linville, S.E., and Fisher, H. (1985b). "Acoustic characteristics of per- ceived versus actual vocal age in controlled phonation by adult females," J. Acoust. Soc. Am. 78, 40-48.

Linville, S.E., and Korabic, E. (1985). "Fundamental frequency stability characteristics of elderly women's voices," Paper presented to the Ameri- can Speech-Language-Hearing Association, Washington, DC.

$orenson, D., and Horii, Y. (1983). "Frequency and amplitude perturba- tion in the voices of female speakers," J. Commun. Disord. 16, 57-61.

Stoicheft, M. (1981). "Speaking fundamental frequency characteristics of nonsmoking female adults," J. Speech Hear. Res. 24, 437-441.

Wilcox, K. (1978). "Speaker age and vowel differences of vocal jitter in sustained phonation," M.A. thesis, Purdue University, Indiana.

Wilcox, K., and Horii, Y. (1980). "Age and changes in vocal jitter," J. Gerontol. 35, 194-198.

Effects of acoustic modification on consonant recognition by elderly hearing-impaired subjects

Sandra Gordon-Salant

Department of Hearing and Speech Sciences, University of Maryland, College Park, Mary/and 20742 ß

(Received 19 May 1986; accepted for publication 24 November 1986)

In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986) ], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to performance of elderly normal-threshold listeners (reported previously) for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels with a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously from normal-hearing listeners. Differences in performance between the three subject groups as a function of level were observed also.

PACS numbers: 43.71.Ky, 43.71.Es, 43.66.Sr

INTRODUCTION

A variety of techniques have been described in which the speech signal is transformed acoustically to improve speech intelligibility. Elderly listeners are potential users of speech enhancement techniques because of the prevalence of their speech recognition problems (Dubno et al., 1984; Jo- kinen, 1973). One speech enhancement scheme incorporates some of the acoustic modifications found in naturally produced "clear speech," including increases in the consonant energy relative to the vowel energy (consonant-vowel ratio, or CVR) and increases in the duration of individual speech sounds (Chen, 1980; Picheny and Durlach, 1979).

A recent investigation (Gordon-Salant, 1986) evaluat- ed the effectiveness of computer-generated increments of consonant duration and CVR on speech intelligibility for young and elderly normal-hearing subjects. The stimuli were syllables of the form CV, where C was one of 19 conso- nants and Vwas one of three vowels. They were presented in unmodified form, with consonant duration increased by 100%, with the CVR increased by 10 dB, and with a combi- nation of both the duration and CVR increments. The results showed that the CVR increment and combined incre-

ment improved consonant recognition in noise, whereas the duration increment did not improve performance for either

1199 J. Acoust. Soc. Am. 81(4), April 1987; 0001-4966/87/041199-04500.80; ¸ 1987 Acoust. Soc. Am.; Letters to the Editor 1199


Documents

Fundamental frequency stability characteristics of elderly women’s voices