

potential acoustical consequence of the type of asymmetries demonstrated here would be in individual differences in vocalic transitions to and from /s/ and /l/.

Looked at from another perspective, spatial and temporal asymmetries in alveolar contact patterns could also be considered evidence that the longitudinal axis of the vocal tract is not always the same as the anatomical midline. Further, the fact that /l/ can be unilateral and asymmetries are not always the same for /s/ and /l/ within a given subject raises the possibility that the longitudinal axis of the vocal tract may change according to phonetic context. The acoustically relevant spatial feature of the vocal tract for voiced sounds is the area function, not its curvature or twists. However, symmetry may become an important factor when estimates of area function are extrapolated from physiological data taken in a single anatomical plane. In studying defective speech, where the anatomical and neuromuscular bases for articulatory asymmetries may be much more pronounced, it is especially necessary to be cautious in interpreting physiological data taken in a single anatomical plane. This caution also holds for articulatory interpretations of acoustical signs of abnormality in speech.

The acoustical consequences of individual differences in subtleties of articulation provide some of the cues for speaker recognition by voice. Coarticulatory effects for /l/ have been shown to be distinctive and to separate the speech of individuals (Nolan, 1983). Since /l/ has so many variants, and can in addition differ in degree and type of lateralization, contexts involving /l/ would seem to be fruitful for further study of speaker recognition. Asymmetries in articulation on and near /s/ and /l/ also provide a good intrinsic motivation for using /s/ and /l/ transitions for speaker recognition analysis, because they may, in part, be anatomically based. In addition, asymmetries are free to influence speaker-characteristic speech patterns, because most people are totally unaware of asymmetries in articulation, and do not consciously manipulate them to achieve phonetic distinctions.

ACKNOWLEDGMENT

This research was supported by Grant number DE-03631 from the National Institute of Dental Research.

Fant, G. (1973). "Acoustic description and classification of phonetic units," in Speech Sounds and Features (MIT, Cambridge, MA).

Hamlet, S. L., Cullison, B. O., and Stone, M. L. (1979). "Physiological control of sibilant duration: Insights afforded by speech compensation to dental prostheses," J. Acoust. Soc. Am. 65, 1276-1285.

Hardcastle, W. J. (1976). Physiology of Speech Production (Academic, New York).

Heffner, R-M. S. (1952). General Phonetics (University of Wisconsin, Madison).

Ladefoged, P. (1971). Preliminaries to Linguistic Phonetics (University of Chicago, Chicago).

Ladefoged, P. (1982). A Course in Phonetics (Harcourt Brace Jovanovich, New York).

Malmberg, B. (1963). Phonetics (Dover, New York).

McCutcheon, M. J., Hasegawa, A., and Fletcher, S. G. (1980). "Effects of palatal morphology on /s, z/ articulation," J. Acoust. Soc. Am. Suppl. 1 67, S94.

McGlone, R. E., and Proffit, W. R. (1974). "Comparison of lingual pressure patterns in lisping and normal speech," Folia Phoniatr. 26, 389-397.

Nolan, F. (1983). The Phonetic Bases of Speaker Recognition (Cambridge U. P., London).

Shadle, C. H. (1985). "Acoustic characteristics of fricatives and fricative-like models," J. Acoust. Soc. Am. Suppl. 1 77, S85.

Umeda, N. (1977). "Consonant duration in American English," J. Acoust. Soc. Am. 61, 846-858.

Visual feedback during speech production

Nancy Tye-Murray
Department of Otolaryngology--Head and Neck Surgery, University Hospitals, University of Iowa, Iowa City, Iowa 52242

(Received 10 September 1985; accepted for publication 12 December 1985)

The question of whether visual information can affect ongoing speech production arises from numerous studies demonstrating an interaction between auditory and visual information during speech perception. In a preliminary study, the effect of delayed visual feedback on speech production was examined. Two of the 13 subjects demonstrated speech errors that were directly related to the delayed visual signal. However, in the main experiment, providing immediate visual feedback of the articulators did not diminish the effects of delayed auditory feedback for 11 speakers.

PACS numbers: 43.70.Bk, 43.71.Ky

INTRODUCTION

Visual information can enhance speech understanding for hard-of-hearing individuals and influence the perception of acoustically unambiguous consonants (e.g., McGurk and MacDonald, 1976) and vowels (e.g., Summerfield and McGrath, 1984). For instance, McGurk and MacDonald (1976) reported that coupling the visual articulation of the speech segment /ga-ga/ with the acoustic version of /ba-ba/ resulted in a percept of /da-da/. This finding suggests that listeners integrate visual and aural information to achieve a congruent perception. In this case, /d/ represents an intermediate place of articulation.

1169 J. Acoust. Soc. Am. 79(4), April 1986; 0001-4966/86/041169-03$00.80; © 1986 Acoust. Soc. Am.; Letters to the Editor 1169



An issue that arises, relevant to the understanding of perceptual and production processes and to the development of speech-training devices for the deaf, is whether visual information can affect oral behavior (see Fletcher, 1984; Meltzoff and Moore, 1977), particularly during speech production.

In a preliminary study, we evaluated the effects of delayed visual feedback (DVF) on ongoing speech production. Thirteen speakers were instructed to monitor a video display of their head and neck on a 20- × 16-in. television screen. The signal was delayed 2 s via two coupled reel-to-reel video tape recorders. The video camera was concealed behind a wall panel immediately above the television set. Subjects recited two different nursery rhymes from memory, each two times successively, while listening to amplified pink noise sufficient to mask their acoustic outputs (109 dB).

Interesting observations were made for two of the subjects. Permanent split-screen video image recordings of their productions under DVF were obtained, with the visual feedback image on one-half of the screen and the real-time productions on the second half. During recitation of the rhyme Jack Spratt ("Jack Spratt could eat no fat, his wife could eat no lean, and so between..."), one subject substituted the word "see" for "so," and then immediately corrected her production.¹ A visual examination of the split screen revealed that the production of "so" overlapped with the delayed visual signal of "eat" in "...could eat no lean...." Apparently, the subject's seeing her lips spread for production of /i/ resulted in her articulators moving to an /i/ rather than /o/ posture. Likewise, during the production of "keep" in the rhyme Peter Pumpkin Eater ("Peter, Peter Pumpkin Eater, had a wife but could not keep her..."), another subject substituted the word "keed" for "keep." The permanent video signal revealed that this subject's production overlapped with the delayed signal of "Eater." Here, it appears that the absence of /p/ articulatory activity following an /i/ posture resulted in her articulators moving to a nonbilabial closure. These findings demonstrate visual information as a carrier of usable and, in some cases, conflicting articulatory information. [See also the discussion of delayed auditory feedback (DAF) by Zimmermann et al., submitted for publication.] However, the remaining subjects did not demonstrate such gross articulatory errors, nor errors that were readily relatable to the visual feedback. The long delay time of the monitored signal, considerably longer than the 200-ms interval employed in studies of DAF, may have minimized the effects of the mismatched visual signal.

Instead of asking whether a temporally displaced visual signal would affect articulation in the absence of auditory feedback, the question in this investigation was whether accurate visual feedback, typically considered as "spatial" information, would restore disruptions in the time base of speech resulting from DAF. Presenting immediate visual feedback (IVF) corresponding with the speaker's oral activity might counteract the disorienting and disruptive effects of a temporally displaced auditory signal (see Yates, 1963 for a review), and might result in more normal speech rates.

I. METHODS

A. Subjects and materials

Eleven paid student volunteers with normal hearing participated in this experiment. One practice and eight test sentences were constructed, representative examples of which appear below. They required production of highly visible phonemes, such as /p, m, b, f, ð/.

1. Pam fears a bomb blast in March.

2. Brenda made her famous mustard before lunch.

3. We fixed some pancakes for everyone this morning.

B. Procedures

Subjects were tested individually, seated at a small table in a quiet room. Five subjects began by reciting the sentences with DAF only, and then recited them with DAF plus IVF. The remaining six subjects recited the sentences with DAF plus IVF first. Subjects also recorded the sentences with instant auditory feedback (IAF) plus IVF for the control condition. In between conditions, subjects removed their headphones and spoke about a general topic to prevent any habituation to DAF.

Before the DAF condition, subjects read the following instructions:

"Read the sentence on each card silently to yourself. Once you are confident that you have memorized the sentence, look up and recite it aloud. You will hear your own voice slightly delayed in time as you speak."

In the DAF-plus-IVF condition, a circular mirror with an 8-in. diameter was placed on a small stand in front of the subject and adjusted to reflect the area between the nose and lower neck. In addition to receiving the above instructions, subjects were also asked to "monitor your mouth and jaw as closely as possible."

The delayed auditory signal of 200 ms was achieved with a Phoenix Audio Laboratory DAF device. Subjects spoke into a microphone attached to the Tandberg headphones. The microphone was positioned to ensure that lip and jaw behavior were not obscured. Sentence durations were measured with a Data Precision 6000 Digital Oscilloscope.
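The effect of the hardware DAF device can be sketched as a simple digital delay line. The code below is an illustrative simulation only, not a description of the Phoenix device; the 8-kHz sampling rate is an assumed value chosen for the example.

```python
# Illustrative sketch of a 200-ms auditory delay line (NOT the actual
# hardware used in the study): the delayed signal is the original
# signal preceded by 200 ms of silence.
# The 8-kHz sampling rate is an arbitrary assumption for this example.

SAMPLE_RATE = 8000          # samples per second (assumed)
DELAY_MS = 200              # delay interval used in the DAF condition

def delay_signal(samples, delay_ms=DELAY_MS, rate=SAMPLE_RATE):
    """Return the signal delayed by delay_ms, padded with silence."""
    n = int(rate * delay_ms / 1000)   # 200 ms at 8 kHz -> 1600 samples
    return [0.0] * n + list(samples)

speech = [0.1, 0.5, -0.3]             # a toy "speech" fragment
delayed = delay_signal(speech)
# The speaker hears the first real sample only after 1600 silent samples,
# i.e., 200 ms after producing it.
```

At a 200-ms delay, the auditory feedback a speaker hears lags roughly one syllable behind the articulation that produced it, which is why this interval is so disruptive.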

II. RESULTS

All subjects demonstrated longer sentence durations in the DAF than in the IAF-plus-IVF condition. The group mean sentence duration for each of the three conditions appears in Table I. A one-way analysis of variance revealed a significant condition effect [F(2,263) = 32.32, p < 0.001]. A post hoc t-test indicated that sentences spoken in the IAF-plus-IVF condition were significantly shorter than those spoken in either the DAF (p < 0.01) or DAF-plus-IVF (p < 0.01) condition. There was no significant difference between the DAF and DAF-plus-IVF conditions.

TABLE I. Mean sentence duration in ms for each of the three conditions (N = 88).

  DAF            DAF + IVF      IAF + IVF
  3323           3559           2182
  (SD = 1285)    (SD = 1608)    (SD = 375)
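The structure of the one-way analysis of variance used here can be sketched as follows. The duration values in the example are synthetic and hypothetical, not the study's data; the point is only how the F statistic and its degrees of freedom are formed for three conditions.

```python
# Sketch of the one-way ANOVA F statistic for three conditions.
# The durations below are SYNTHETIC illustrations, not the study's data.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-groups sum of squares: weighted squared deviations of
    # each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: squared deviations of each value
    # from its own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical sentence durations (ms) for DAF, DAF + IVF, and IAF + IVF
daf     = [3300, 3400, 3250, 3380]
daf_ivf = [3500, 3600, 3550, 3580]
iaf_ivf = [2150, 2200, 2180, 2190]
f, dfb, dfw = one_way_anova_f([daf, daf_ivf, iaf_ivf])
# With three conditions, df_between = 2, matching the reported F(2, 263);
# the within-groups df depends on the total number of observations.
```

A large F, as reported in the study, indicates that the variation among condition means is large relative to the variation within conditions.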

III. DISCUSSION

While the present results do not dismiss the possibility that visual information can affect ongoing speech production, they indicate that the effects of DAF are not diminished by presenting IVF of the speaker's visible articulators. In fact, although not statistically significant, DAF + IVF sentence durations were longer, on the average, than were DAF sentence durations alone.

The fact that the visual and auditory signals reflected speech events distanced by 200 ms means that when one signal carried information about opening gestures (or vowels), the other often carried information about closing gestures (or consonants). These two signals were probably too disparate to be integrated into a single percept. It appears that only one signal, the auditory signal, was influential, even though the visual signal corresponded with oral sensory information and the speaker's conscious awareness, and speakers were explicitly instructed to attend to the visual signal. Perhaps more specific instructions on how to use the visual information might have resulted in a different finding.

The apparent primacy of the auditory signal during speech production may be explained in at least two ways. First, the auditory signal may be a more effective conveyor of articulatory information than the visual signal. This possibility could be tested in several ways. For example, it may be that magnifying the visual signal would increase the visibility of tongue tip activity and thereby render it more useful. In addition, a magnified visual image might be qualitatively more comparable to an amplified acoustic signal. Second, the speaker is presumably more practiced in monitoring his (her) auditory output (although Borden, 1979 and Siegel and Pick, 1984, among others, have argued that adult speakers do not typically monitor their speech outputs), since one usually does not speak before a mirror, and subjects did not receive a practice session. It will be interesting in future work to determine whether hearing-impaired individuals who are shown to be susceptible to DAF are more responsive to IVF, as they are more likely to rely on visual information during normal speech listening.

ACKNOWLEDGMENTS

This work was supported by NIH Grant NSO-7555. I thank the editor and two reviewers for their helpful comments and suggestions for future research, and Herb Toussaint for his technical assistance.

¹This finding suggests that auditory information may not be critical for either detecting or correcting a gross mispronunciation.

Borden, G. J. (1979). "An interpretation of research on feedback interruption in speech," Brain Lang. 7, 307-319.

Fletcher, S. (1984). "Vision in lip positioning skills of children with normal and impaired hearing," presented at the American Speech-Language-Hearing Association Convention, November 1984, San Francisco.

McGurk, H., and MacDonald, J. (1976). "Hearing lips and seeing voices," Nature 264, 746-748.

Meltzoff, A. N., and Moore, M. K. (1977). "Imitation of facial and manual gestures by human neonates," Science 198, 75-78.

Siegel, G. M., and Pick, H. L., Jr. (1984). "Auditory feedback in speech," presented at the American Speech-Language-Hearing Association Convention, November 1984, San Francisco.

Summerfield, A. Q., and McGrath, M. (1984). "Detection and resolution of audiovisual incompatibility in the perception of vowels," Q. J. Exp. Psychol. 36A, 51-74.

Yates, A. J. (1963). "Delayed auditory feedback," Psychol. Bull. 60, 213-232.

Zimmermann, G. N., Kelso, J. A. S., Brown, C., and Forrest, K. (submitted for publication). "The association between acoustic and auditory events in a delayed auditory feedback paradigm."
