
Journal of Psycholinguistic Research, Vol. 20, No. 1, 1991

Filled Pauses and Gestures: It's Not Coincidence

Nicholas Christenfeld, 1,2 Stanley Schachter, 1 and Frances Bilous 1

Accepted September 24, 1990

Though filled pauses and gestures frequently accompany speech, their function is not well understood. We suggest that it may be helpful in furthering our knowledge of these phenomena to examine their relationship to each other. To this end, we carried out two studies examining whether they tend to occur together, or to occur at separate times. Both faculty colloquium speakers and undergraduate subjects used filled pauses less frequently when they were gesturing than when they were not gesturing. This effect held for 30 out of 31 subjects. We suggest that detailed theories may be premature, but speculate that gestures may be an indication that the speech production apparatus has completed its search for the next word, phrase or idea and is ready to continue.

When people talk, no matter what the content or purpose of their speech, they tend to do two things. They wave their arms about and they say "um." In studies of the speech mannerisms of lecturers in a variety of academic disciplines (Schachter, Christenfeld, Ravina, & Bilous, 1990), we noticed what appeared to be a consistent dissociation between these two phenomena. It was our observation that lecturers rarely seemed to use filled pauses (the term for such interruptions in the flow of speech as "uh," "ah," "er," and "um") while gesturing. It is the purpose of the present studies to examine this relationship and, after presenting the evidence, to speculate briefly on the implications of our findings for theories of gesture and speech disfluency.

We thank Bernard Ravina, Julie Odegard, and Kathrin Wanner for their help with the data collection and Barbara Landau and Robert Krauss for comments on an earlier draft of this manuscript. This research was supported by a grant to Stanley Schachter, made for other purposes, from the Russell Sage Foundation.

1 Columbia University.

2 Address all correspondence to Nicholas Christenfeld, Department of Psychology, Columbia University, 406 Schermerhorn Hall, New York, New York 10027.

0090-6905/91/0100-0001$06.50/0 © 1991 Plenum Publishing Corporation

"Ums" and gestures share a somewhat nebulous relationship with verbal output. It is fairly clear that they are both products of the general speech production system, since they both are obviously related to verbal output (McNeill, 1985; Rochester, 1973). However, it is not clear for either whether they serve a role in helping the listener understand the message or in helping the speaker produce it, or whether they are simply functionless byproducts of the speech apparatus. Except in the case of a few very specific tasks (Birdwhistell, 1970), gestures do not seem to help the receiver of a message understand it better (Krauss, Apple, Morency, Wenzel, & Winton, 1981). Furthermore, people often gesture when they are speaking on the telephone or over an intercom, when the gestures cannot possibly be of use to the listener (Cohen, 1977). In turn, there is no real evidence that gestures help the speaker formulate the message. The basic questions about why people gesture have not been answered.

As to filled pauses, a variety of research has suggested that "ums" are indications of time out while the speech production apparatus searches for the next word, phrase, or idea (Goldman-Eisler, 1968; Rochester, 1973). This suggests that "ums" have a purpose for the speaker as a means of stalling for time to think. However, that end could just as well be served with silent pauses, and in the research literature, there are no hypotheses of which we know to account for the use of filled rather than silent pauses. As far as the listener goes, there are no indications that "ums" serve any particular function. There is some evidence, in both field studies and experiments, that listeners are almost entirely insensitive to the use of "ums" and that their impressions of both the speaker and the message are unaffected by the frequency with which filled pauses are used (Schachter, Christenfeld, & Rodstein, 1990). Finally, it has been suggested that "ums" serve a floor-keeping function (Maclay & Osgood, 1959), that is, they indicate to a listener that the speaker has more to say. Perhaps this is so in conversation, but in formal lectures, where there is no possibility of interruption, filled pauses are used with astonishing frequency. 3

Because the function of "ums" and gestures is not clear, and they are both such common companions of speech, it may provide some insight into the nature of these phenomena to examine their relationship to each other. Many people have looked at the relationship between gestures and verbal output; however, it is hard to make a clear prediction about the relationship between filled pauses and gestures from this work because different researchers have focused on different types of movements and disfluencies. Schegloff (1984) and a number of others (Butterworth and Beattie, 1978; Krauss and Morrel-Samuels, 1988) have found that gestures seem slightly and consistently to precede their lexical affiliates. These researchers suggest that this may be because gestures are easier to produce since they are selected from a smaller repertoire.

3 As can be derived from Table I, colloquium speakers average 3.17 "ums" per minute during a 50-min lecture.

Butterworth and Goldman-Eisler's (1979) work on the timing of gestures and pauses was concerned specifically with the onset of what they term speech-focused movements (SFMs) and silent pauses. They found that SFMs are as likely to begin during a silent pause as during the act of speaking. Whether this is the case for filled pauses one cannot say, for while some people have found similarities between filled and silent pauses (Beattie & Butterworth, 1979), others have not. For example, in Mahl's extensive work on the effects of anxiety (reviewed in Mahl, 1987), he has found that most disfluencies increase with anxiety, but that there is consistently no effect on the rate of filled pauses. In fact, filled pauses proved so resistant to these manipulations that he threw them out of his index of speech disturbances. Other researchers have tried a number of manipulations that affect silent pauses but not filled pauses (Greene & Lindsey, 1989) or filled pauses but not silent pauses (Vrolijk, 1974). Butterworth and Goldman-Eisler were concerned only with silent pauses and only with the onset of SFMs. It is difficult, therefore, to extend their results to make a prediction about the co-occurrence of filled pauses and gestures.

Ragsdale and Silvia (1982) also examined the temporal relation of body movements and speech disturbances, but they excluded filled pauses from their measure, and included much more general movements, such as posture shifts and movement of the feet. They did find a fairly strong association of movements and speech errors, with the movement coming just before or simultaneously with the disfluency. Dittman and Llewellyn (1969) reported a similar finding, but they were concerned with the overlap of gestures and what they termed starts, the beginning of a phonemic clause, a silent pause, or a filled pause. Hadar, Steiner, and Rose (1984) found that movements instead tend to follow disfluencies, but their data were based only on movements of the head, and they were only interested in silent pauses and general repetitions.

Because the existing literature does not explore the temporal relationship between filled pauses and hand gestures, and because we saw indications of such a relationship, we conducted two studies to address the issue directly.

STUDY 1: OBSERVATION OF FORMAL TALKS

The first study involved systematic observation of 18 successive speakers at Columbia University's Psychology Colloquium. This is a biweekly affair in which outside speakers present their most recent research and thinking to the faculty and graduate students of the psychology department as well as to any interested outsiders. Typically, the audience consists of some 40-60 people and the talk takes roughly one hour.

Two observers sitting toward the back of the room systematically noted the speakers' gesturing behavior and tallied their use of filled pauses. One of the observers--the gesture coder--recorded the amount of time each speaker spent gesturing. Pointing, scratching, and fiddling with clothes (deictics and self-manipulations) were not counted as gestures. Self-manipulations were not counted, since they seem fairly clearly not to be related to speaking (this distinction is discussed in Freedman, 1972), and pointing was not included since it seemed to be simply a function of the amount of data and type of data presentation the speaker chose. If, for example, the speaker used slides, he or she was likely to point to the portion of the figure or table under discussion. All other hand-arm movements were counted as gestures. This first observer, using a stopwatch held in one hand, simply kept a cumulative record of the time that each speaker spent gesturing, starting the watch when the speaker's hands started moving and stopping it when they returned to a rest position. This observer also used a continuous hand signal to indicate whether or not the subject was gesturing. This was a simple thumb up or thumb down signal with the non-stopwatch hand.

The second observer--the "um" coder--kept track of filled pauses. He listened to the talk and, relying on the hand signal from the first observer, recorded whether or not each "um" occurred during a gesture. If he heard an "um," and the gesture observer's thumb was up, he tallied it as an "um" while gesturing, and if the thumb was down, as an "um" while not gesturing. The second observer also kept track of the length of the talk, which was simply the elapsed time from the start to the end of the talk, excluding questions from the floor, film clips, and other external impediments to speech. The lecturer was unaware that these observations were being made.

For seven of the colloquia, we had a second observer record gestures. Before coding any of the colloquia, these gesture coders had practiced coding for many hours. They had coded 11 previous colloquia, as well as practiced their coding skills on videotapes of people speaking. With these videotapes, the coders practiced determining when gestures started and stopped, and also practiced indicating this with the thumb signal. They were trained to consider a gesture as starting when the hands left a neutral, resting position--hanging at the speaker's sides, folded in his or her lap, etc.--and to consider it over when the hands returned to a resting position. This is not a simple matter, since it requires making rapid decisions about whether a gesture is starting, or if the speaker is simply adjusting clothing, scratching, or pointing at some specific object. However, with practice these determinations can be made reliably.

To assess the reliability, we used the intraclass correlation, which is based on the analysis of variance, to arrive at an estimate of the part of the measurement that is attributable to true differences between subjects and the part that is due to error. Unlike the Pearson correlation coefficient, this measure is directly interpretable as the percent of variance attributable to the true differences between subjects. [See Lord & Novick (1968) and Fleiss (1986) for a more extensive discussion of this procedure.] For the seven colloquia in this study for which we had a second gesture coder also record the percent of time that the speaker spent gesturing, the reliability was R = .99.
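The text does not say which form of the intraclass correlation was computed. As a minimal sketch, assuming the one-way random-effects form built from the between-subject and within-subject mean squares of an analysis of variance, the calculation could look like the following. The function name `intraclass_correlation` and the seven pairs of gesturing percentages are hypothetical and are not data from the study.

```python
def intraclass_correlation(ratings):
    """One-way random-effects intraclass correlation (an assumed form).

    `ratings` is a list of rows, one row per subject, each row holding
    the scores that the k coders gave to that subject.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of coders per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Mean square between subjects (true differences plus error).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Mean square within subjects (disagreement between coders, i.e. error).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical percent-of-time-gesturing scores from two coders for 7 talks.
hypothetical_percent_gesturing = [
    (11.4, 11.9), (33.3, 32.8), (14.8, 15.1), (7.3, 7.0),
    (4.1, 4.4), (26.5, 27.2), (18.2, 17.9),
]
icc_value = intraclass_correlation(hypothetical_percent_gesturing)
print(f"R = {icc_value:.2f}")
```

When the coders differ by much less than the speakers differ from one another, as in these made-up scores, the within-subject mean square is small relative to the between-subject mean square and the coefficient approaches 1.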

The "um" coder similarly had extensive experience at his job. He had coded 20 previous colloquia, and hours of other speech, as well as practicing the system with one of the gesture coders on the videotapes. He was trained to regard any sound such as "um," "er," "uh," "ah," and the like as a filled pause, but to exclude any sound that formed part of a word, however garbled or incomplete. (This task soon became second nature, and our coder had to make a special effort to stop coding these filled pauses when off duty.) The only real ambiguity occurred between the indefinite article a and a filled pause, but almost always this could be resolved by paying some attention to the context. If a speaker said "um" several times in succession, these were each counted as individual occurrences of a filled pause. Although none of the colloquia in the present study was coded by more than one "um" counter, 10 previous ones were. For these, the reliability of the rate of "ums" per minute was calculated as R = .99. This reliability, although almost disturbingly high, is in line with the reliability for similar codings reported by Feldstein, Brenner and Jaffe (1963), Mahl (1987), and Panek and Martin (1959). This kind of reliability is, in fact, not hard to achieve if you are willing to sacrifice all understanding of what speakers are actually saying.

The speakers spoke for an average of 54 min, and gestured for 20% of the time that they were speaking. They averaged 3.17 "ums" per minute during the talk.

These data, then, provide measures of the amount of time each speaker spent gesturing. In addition, these data indicate how many "ums" the speaker used while gesturing, and how many while not gesturing. One can then compute the average number of "ums" used per minute while gesturing, and while not gesturing. If the two phenomena are unrelated, these two rates should be equal. If "ums" and gestures tend to occur together, the rate of "ums" while gesturing should be higher than the rate while not gesturing, and if they tend to occur separately, the rate should be lower while gesturing. The last two columns of Table I present the relevant data. The average subject used only 1.33 "ums" per minute while gesturing, and 3.68 "ums" per minute when not gesturing. Every one of the 18 speakers had a lower "um" rate while gesturing. The two rates are significantly different, with a paired t(17) = 5.32 with p < .0001.

Table I. "Um" Rates While Gesturing and Not Gesturing for Colloquium Speakers

          Minutes   Minutes     Total   "Ums"/min   "Ums"/min
Subject   talking   gesturing   "ums"   gesturing   not gesturing
1         45.8       5.2        131     1.45         3.04
2         53.2      17.7        118     0.57         3.04
3         53.2       7.9        141     0.76         3.05
4         63.0       4.6        204     1.09         3.41
5         44.0       1.8        182     1.14         4.27
6         45.0       2.6        181     1.14         4.20
7         50.6      13.4        209     0.90         5.29
8         53.8       9.8         55     0.36         1.17
9         61.1      11.0        160     2.45         2.65
10        55.0      11.6        591     3.89        12.57
11        56.4       7.3        132     0.54         2.61
12        59.6      10.3         75     0.19         1.48
13        75.0      16.7        172     0.72         2.74
14        48.2      12.9        135     2.39         2.95
15        59.2      18.1        181     1.99         3.53
16        50.0      15.7        230     2.67         5.49
17        42.4      13.7         92     1.24         2.61
18        54.0      13.8         96     0.51         2.21
Average   53.9      10.8        171     1.33         3.68
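The paired comparison reported here can be retraced from the last two columns of Table I. The sketch below is only an illustration: it works from the rounded rates as printed (reading the entries for subject 18 as 0.51 and 2.21, which is an assumption about a damaged row), and the helper name `paired_t` is hypothetical. It computes the mean of the per-speaker differences divided by its standard error.

```python
import math

# "Ums" per minute from Table I, one entry per colloquium speaker.
# Subject 18 is read as 0.51 and 2.21 (an assumption about a damaged row).
ums_gesturing = [1.45, 0.57, 0.76, 1.09, 1.14, 1.14, 0.90, 0.36, 2.45,
                 3.89, 0.54, 0.19, 0.72, 2.39, 1.99, 2.67, 1.24, 0.51]
ums_not_gesturing = [3.04, 3.04, 3.05, 3.41, 4.27, 4.20, 5.29, 1.17, 2.65,
                     12.57, 2.61, 1.48, 2.74, 2.95, 3.53, 5.49, 2.61, 2.21]


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var_diff / n), n - 1


t_stat, df = paired_t(ums_gesturing, ums_not_gesturing)
print(f"t({df}) = {t_stat:.2f}")
```

Because the test works on within-speaker differences, the very talkative speaker 10 does not dominate the result the way that speaker would in a comparison of two independent groups.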

Although these findings are remarkably strong and consistent, there is always the possibility that some sampling or methodological artifact may be responsible for these data. First, this is a narrowly selected population of subjects, for they are almost all practiced speakers who make their living lecturing at universities. Second, there is a possibility of inadvertent bias in the observations, for the observers could both hear and see the speakers since there was no discreet way that we could manage to have the observer of filled pauses sit with his back to the speaker. In order to examine the phenomenon with other sorts of subjects in a context where we could rule out some of the possible sources of bias, we analyzed videotapes that had been made in an earlier experiment 4 (Krauss & Morrel-Samuels, 1988). In addition, two pairs of observers were used in order to check on the reliability of the observation techniques.

STUDY 2: OBSERVATIONS OF UNDERGRADUATE SPEAKERS

In this study, undergraduate subjects were simply asked to describe various pictures and sounds to a confederate. Thirteen videotapes made of these subjects were coded for gestures and filled pauses. The observation techniques were the same as in the colloquium study except that the observer of filled pauses sat with his back to the monitor so that, using earphones, he could hear the speech but not see the videotaped subjects, while the observer of gestures faced the screen and systematically timed and signaled gestures but, lacking earphones, could not hear what was said.

The data for the 13 subjects are given in Table II, which, for each subject, presents the average for the two pairs of coders. The subjects spoke for an average of 11.4 min, and spent 33% of that time gesturing. In 12 of the 13 cases the rate of "ums" was lower while gesturing. On average, subjects used 3.00 "ums" per minute while they were gesturing and 4.59 "ums" per minute while they were not gesturing. This is significantly different, with t(12) = 2.81, p < .02.

4 We are grateful to Dr. Krauss for giving us access to his tapes. His research was supported in part by National Science Foundation grant BNS-8616131.

Table II. "Um" Rates While Gesturing and Not Gesturing for Undergraduate Subjects

          Minutes   Minutes     Total   "Ums"/min   "Ums"/min
Subject   talking   gesturing   "ums"   gesturing   not gesturing
1          8.3      1.5          36.0   2.27         4.82
2         10.6      2.8          40.0   3.00         4.03
3         12.9      6.5          27.5   1.23         3.06
4         11.3      4.3          27.0   1.52         2.94
5         15.4      7.9          48.5   2.96         3.35
6         14.3      3.7         178.0   8.60        13.72
7         10.5      1.7          51.0   7.81         4.31
8          6.3      1.6          18.5   0.31         3.80
9         13.4      7.0          66.0   4.45         5.42
10        18.5      3.4         111.0   3.53         6.58
11        15.6      7.2          32.0   1.88         2.20
12         5.9      0.1           8.0   0.00         1.39
13         5.1      2.1          15.5   1.45         4.11
Average   11.4      3.8          50.7   3.00         4.59

For these data, more extensive reliability estimates could be computed. Again, for the percent of time spent gesturing, the reliability was R = .99. For the rate of "ums" per minute it was also R = .99. For the rate of "ums" per minute while the speaker was gesturing, comparing the two pairs of coders produced a reliability of R = .95. For the rate while the speaker was not gesturing, the reliability was R = .99. It should be borne in mind that these last reliabilities depend on the gesture coders agreeing on when the speaker was gesturing, the "um" coders agreeing on when the speaker said "um," and also on picking up the signal correctly from the gesture coders. In any case, it is almost excessively clear that this can be measured reliably. Once again, we hasten to point out that our coders had spent well over 100 hours honing their coding skills.

DISCUSSION

Taken together, these studies leave little doubt that people tend to say "um" less frequently while they are gesturing. Of the 31 subjects, 30 showed this trend. Furthermore, the effect existed for two different speaking tasks, for experienced and inexperienced speakers, and for a large range of ages. The findings indicate that "ums" and gestures are systematically, and not randomly, distributed in the flow of speech.
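As an illustrative aside that is not part of the analyses reported in the two studies, the count of 30 out of 31 subjects can be set against what chance alone would produce. The sketch below assumes a simple two-sided sign test, in which each subject is equally likely to show a higher or a lower rate while gesturing if the two phenomena are unrelated. The function name `sign_test_two_sided` is hypothetical.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided sign test p-value under a 50/50 null hypothesis."""
    # Use the larger of the two tail counts so the test is symmetric.
    k = max(successes, n - successes)
    # Probability of a split at least as lopsided as k out of n, one tail.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_two_sided(30, 31)
print(f"p = {p_value:.2e}")
```

The sign test ignores the size of each difference and uses only its direction, so it is a deliberately conservative companion to the paired t-tests reported for the two studies.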

In spite of the strength of the finding, one cannot draw any firm conclusions about the two phenomena and the nature of their relationship. Since the findings are based on correlational studies, one must make assumptions about one of the factors in order to conclude anything about the other.

If we take as a fact that "ums" signal time out while there is an ongoing search for the next word or phrase, then the present finding has implications for the placement of gestures. Since people tend not to gesture while they are "umming," gestures should be an indication that no search is in progress. Perhaps gestures are only initiated when a search has been successful. The gestures may be linked to specific words, in which case they clearly cannot start until the word has been found, or it may be that gestures are simply held in check until the verbal channel is ready to continue. In either case, we should anticipate that pauses would tend to precede gestures immediately.

This idea of gestures is very different from the common-sense idea that people grope for words by waving their hands. If this were the case, then gestures, at least part of the time, should be a sign that a search is underway. One might then expect that, sharing the same cause, gestures and "ums" would tend to co-occur. The fact that they do not suggests, but by no means proves, that gestures are not often used to grope for words.

However, it seems to us that our understanding of gestures is still so primitive that we are loath to linger over speculation about the theoretical implications of these findings. The facts are firmly established. Perhaps they will be useful in furthering our eventual understanding of filled pauses and gestures.

REFERENCES

Beattie, G. W., & Butterworth, B. L. (1979). Contextual probability and word frequency as determinants of pauses and errors in spontaneous speech. Language and Speech, 22, 201-211.

Birdwhistell, R. L. (1970). Kinesics and context. Philadelphia: University of Pennsylvania Press.

Butterworth, B., & Beattie, G. W. (1978). Gesture and silence as indicators of planning in speech. In N. R. Campbell & P. T. Smith (Eds.), Recent advances in the psychology of language: Formal and experimental approaches (pp. 347-360). New York: Plenum Press.

Butterworth, B., & Goldman-Eisler, F. (1979). Recent studies in cognitive rhythm. In A. W. Siegman & S. Feldstein (Eds.), Of speech and time: Temporal speech patterns in interpersonal contexts (pp. 211-224). Hillsdale, NJ: Erlbaum.

Cohen, A. A. (1977). The communicative function of hand illustrators. Journal of Communication, 27, 54-63.

Dittman, A. T., & Llewellyn, L. G. (1969). Body movement and speech rhythm in social conversation. Journal of Personality and Social Psychology, 11, 98-106.

Feldstein, S., Brenner, M. S., & Jaffe, J. (1963). The effect of subject sex, verbal interaction and topical focus on speech disruption. Language and Speech, 6, 229-239.

Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: John Wiley.

Freedman, N. (1972). The analysis of movement behavior during the clinical interview. In A. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 153-175). Elmsford, NY: Pergamon Press.

Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. London: Academic Press.

Greene, J. O., & Lindsey, A. E. (1989). Encoding processes in the production of multiple-goal messages. Human Communication Research, 16, 120-140.

Hadar, U., Steiner, T. J., & Rose, F. C. (1984). The relationship between head movements and speech dysfluencies. Language and Speech, 27, 333-342.

Krauss, R. M., Apple, W., Morency, N., Wenzel, C., & Winton, W. (1981). Verbal, vocal, and visible factors in judgments of another's affect. Journal of Personality and Social Psychology, 40, 312-320.

Krauss, R. M., & Morrel-Samuels, P. (1988, February). Some things we do and don't know about hand gestures. Paper presented at meeting of the American Association for the Advancement of Science, Boston.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Maclay, H., & Osgood, C. E. (1959). Hesitation phenomena in spontaneous English speech. Word, 15, 19-44.

Mahl, G. F. (1987). Explorations in nonverbal and vocal behavior. Hillsdale, NJ: Erlbaum.

McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350-371.

Panek, D. M., & Martin, B. (1959). The relationship between GSR and speech disturbances in psychotherapy. Journal of Abnormal and Social Psychology, 58, 402-405.

Ragsdale, J. D., & Silvia, C. F. (1982). Distribution of kinesic hesitation phenomena in spontaneous speech. Language and Speech, 25, 185-190.

Rochester, S. R. (1973). The significance of pauses in spontaneous speech. Journal of Psycholinguistic Research, 2, 51-81.

Schachter, S., Christenfeld, N. J. S., Ravina, B., & Bilous, F. (1990). Speech disfluency and the structure of knowledge. Journal of Personality and Social Psychology, (in press).

Schachter, S., Christenfeld, N. J. S., & Rodstein, B. (1990). On the perception of pauses in spontaneous speech. Unpublished manuscript.

Schegloff, E. A. (1984). On some gestures' relation to talk. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp. 266-296). Cambridge, England: Cambridge University Press.

Vrolijk, A. (1974). Habituation as a mode of treatment of speaking anxiety. Gedrag Tijdschrift voor Psychologie, 2, 332-338.