Dept. for Speech, Music and Hearing, Quarterly Progress and Status Report. The identification of the mood of a speaker by hearing impaired listeners. Öster, A-M. and Risberg, A. journal: STL-QPSR, volume: 27, number: 4, year: 1986, pages: 079-090. http://www.speech.kth.se/qpsr


A. THE IDENTIFICATION OF THE MOOD OF A SPEAKER BY HEARING IMPAIRED LISTENERS

Anne-Marie Öster and Arne Risberg

Abstract
Recordings were made when two professional actors, one male and one female, read a number of sentences in the moods angry, astonished, sad, afraid, happy and positive. Based on listening tests with normal-hearing adults, a set of sentences was selected on which the listeners agreed on the mood of the speaker. From these sentences, a test list was compiled. In the list, the number of different moods was reduced to four: angry, astonished, sad and happy. An analysis was made of the median fundamental frequency and the total range of fundamental frequency variation in the test sentences.

Normal-hearing children, age ten, and hearing impaired children and adults were tested with this list. For the normal-hearing children, the confusions were few, but many of the hearing impaired subjects had great difficulties in identifying the speakers' moods. A test was also made in which normal-hearing persons listened to the test sentences low-pass filtered with a cutoff frequency of 500 Hz. This considerably reduced the subjects' abilities to identify the moods, and about the same confusions were made as by the hearing impaired listeners. A plausible explanation of both the results of the normal-hearing listeners in the filtering situation and the results from the hearing impaired subjects seems to be reduced frequency discrimination ability.

INTRODUCTION
A hearing impairment results in difficulties in detecting and identifying the acoustic elements of the different speech sounds. This difficulty can be explained by a reduced useful dynamic and frequency range, degradation in frequency selectivity, reduced ability to detect frequency and amplitude changes, etc. Hearing impaired persons' difficulties in understanding speech are often measured by means of lists of monosyllabic words, but sometimes sentences are used, which ought to give a more valid measure of a person's difficulties in communicating with others. In a communication situation, however, the true meaning of the communication is also transmitted by how something is said: how words are emphasized, the speaker's mood and attitude toward what is said, etc. This type of information is transmitted by tempo, rhythm and intonation, changes in voice quality, etc. Hearing impaired persons' abilities to identify this type of information have been the topic of only a few studies.


Fourcin (1980) used synthetic speech stimuli to study hearing impaired children's abilities to identify intonation contours in statements and questions. Risberg & Agelfors (1978) studied hearing impaired persons' abilities to identify which word was emphasized in a sentence. In both cases, the information is mainly transmitted by means of changes in the fundamental frequency. The results of the studies showed that many of the subjects had difficulties in using this information.

The acoustic correlates of a speaker's different moods have been studied by, among others, Cowan (1936); Fairbanks & Hoaglin (1941); Fairbanks & Pronovost (1939); Lieberman & Michaels (1962); Williams & Stevens (1972). They all found that the most important factors in signaling the speaker's mood are the mean fundamental frequency and its range, but that other factors also contribute, e.g., intensity, voice quality, formant frequency changes, etc. As the above-mentioned experiments of Fourcin (1980) and the experiments of Risberg & Agelfors (1984) have shown that hearing impaired persons' abilities to use information in fundamental frequency changes are reduced, it is also possible that they have difficulties in identifying the speaker's mood. The aim of this study is to shed some light on this problem.

METHOD
Recording of speech material

In studies of the acoustic correlates of a speaker's mood and the listeners' abilities to identify these, two different types of material can be used. The first is "field" recordings from actual situations where the speaker's mood is evident from the situation. The second type of material is recordings of professional actors simulating specific moods. The first type of material might be more realistic than the second but has several drawbacks, e.g., limited possibilities to select the speech material, poor control of the acoustic situation, etc. Williams & Stevens (1972) compared the recording of a speaker reporting from a dramatic event (the Hindenburg disaster) with the recording of an actor simulating the reporter's emotional state during the event. They found differences in details but general agreement in the mode of speaking and in the fundamental frequency range and variation. In the study presented here, it was decided to use recordings of professional actors.

Two speakers were used: one male and one female professional actor. They were asked to read the sentences in Table I in the moods "angry", "astonished", "sad", "afraid", "happy" and "positive".

In studies of this type, it is necessary to select semantically neutral sentences. As it was planned to use the material with children, it was also necessary to use simple sentences which referred to the children's interests. Some of the sentences in Table I might not be ideal in tests with naive listeners, as they might cause difficulties for the actors to express the intended mood. In the listening tests,


Table I: Sentences used in the experiment.

I. Fröken kom för sent till skolan (The teacher was late to school)
II. Dom kommer på torsdag (They are coming on Thursday)
III. Det var Olle som vann tävlingen (It was Olle who won the competition)
IV. Sommarlovet börjar sent i år (Summer vacation starts late this year)
V. Bollen studsade in genom fönstret (The ball bounced in through the window)
VI. Det finns en råtta i skafferiet (There is a mouse in the pantry)

these sentences might also have caused some interaction between the most likely moods, based on the meaning of the sentences, and the actor's intended mood. Both actors read the sentences in the six different moods. The recordings were made with a high-quality microphone and tape recorder in an anechoic room.

Selecting stimuli for the test list
It was apparent that the actors, more or less successfully, had been able to achieve the intended mood in the different sentences. For some sentences, it was apparent that they had been unsuccessful, and in some, the acoustic quality was unsatisfactory. The first author selected 72 sentences from the recordings. Each intended mood was presented 12 times. To select the stimuli for the final test list, 23 members of the Dept. of Speech Communication & Music Acoustics listened to the tape over headphones. On the answering sheet with all the sentences in the test, they marked which of the six moods they thought was intended by the speaker. The results of the listening test are shown in Table II.

The table shows the disagreement between the listeners for the 72 different sentences. For the sentences with the intended mood "sad" (mood no 3), for example, all listeners agreed on sentences 34 and 42. On sentence 6, one listener identified the mood as no 2, "astonished"; on sentence 11, one listener also disagreed and identified the intended mood as no 1, "angry", and so on. The total per cent confusions made in the test on the sentences are shown in Fig. 1.
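The tallying described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name `tally_confusions` is hypothetical, and the response list is constructed from the description of sentence 6 (23 listeners, one of whom marked mood no 2), assuming the other 22 marked the intended mood.

```python
from collections import Counter

def tally_confusions(intended, responses):
    """Return (percent_agreement, confusions) for one test sentence.

    intended  -- mood number (1-6) intended by the actor
    responses -- list of mood numbers marked by the listeners
    """
    counts = Counter(responses)
    agree = counts.get(intended, 0)
    # Keep only the answers that differ from the intended mood.
    confusions = {mood: n for mood, n in sorted(counts.items()) if mood != intended}
    return 100.0 * agree / len(responses), confusions

# Assumed answers for sentence 6 (intended mood 3, "sad"):
# 22 listeners marked mood 3 and one marked mood 2 ("astonished").
responses = [3] * 22 + [2]
percent, confusions = tally_confusions(3, responses)
print(round(percent, 1), confusions)
```

A sentence could then be kept or rejected by comparing `percent` with an agreement threshold chosen by the experimenter.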


[Table II body not recoverable from the source. For each intended mood (among them mood no 1 "Angry", mood no 2 "Astonished", mood no 3 "Sad" and mood no 4 "Afraid") it lists the columns "Sentence no", "Sentence type" and "Confusions".]

Table II. Number of confusions and type of confusions made by a group of 23 normal-hearing subjects on the test tape. "Sentence no" is the number of the sentence on the tape. "Sentence type": M = male, F = female speaker, I-VI from Table I. In the column "Confusions", the number of confusions and which confusions were made are shown. The different moods are numbered 1-6. Sentences marked * were used in the final test tape.


[Fig. 2 plot not recoverable from the source: fundamental frequency (Hz, 60 to 300) against time (0 to 2.5 sec); recovered panel labels: ANGRY, AFRAID, HAPPY, POSITIVE.]

Fig. 2. Fundamental frequency variations in the sentence "Dom kommer på torsdag" (They are coming on Thursday) in the four different moods for the male speaker.

[Fig. 3 scatter plot not recoverable from the source: horizontal axis "Fundamental frequency, Hz, median value" (100 to 200); legend entries ANGRY, ASTONISHED, SAD, AFRAID, HAPPY.]

Fig. 3. Relation between median value and total range in the different moods for the male speaker. The figure shows results for sentences where more than 75% of 23 normal-hearing listeners agreed on the mood.
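The two measures plotted in Fig. 3 can be illustrated with a short Python sketch. The text does not define "total range" precisely; the sketch assumes it is the difference between the highest and lowest fundamental frequency value in the sentence, and that unvoiced frames are coded as 0. The function name and the contour values are hypothetical.

```python
from statistics import median

def f0_summary(f0_hz):
    """Median and total range (max - min) of the voiced F0 samples, in Hz.

    Unvoiced frames are assumed to be coded as 0 (or None) and are skipped.
    """
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("no voiced frames")
    return median(voiced), max(voiced) - min(voiced)

# Hypothetical F0 contour in Hz; 0 marks unvoiced frames.
contour = [0, 110, 125, 140, 180, 0, 150, 120, 100, 0]
med, total_range = f0_summary(contour)
print(med, total_range)
```

The median is less sensitive than the mean to isolated extreme values in the contour, which may be one reason for reporting it alongside the range.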


test, 22 normal-hearing adult visitors at the Department listened to the tape over headphones and selected one of the four moods marked on the answering sheets for each test sentence. The result is shown as per cent confusions in the matrices in Fig. 5. In the next experiment, the test tape was presented over a loudspeaker in a normal classroom to 20 normal-hearing ten-year-old children. The results are shown in Fig. 6. In the last experiment, ten normal-hearing members of the Department listened to the test tape when the signal was low-pass filtered with a cutoff frequency of 500 Hz, damping 70 dB/oct. The results are shown in Fig. 7.
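To give a feeling for how steep a 70 dB/oct slope is, the following Python sketch computes the attenuation of an idealised low-pass filter. It assumes a flat passband up to the 500 Hz cutoff and a straight-line slope in dB per octave above it; a real filter would deviate from this near the cutoff. The function name is hypothetical.

```python
import math

def ideal_attenuation_db(freq_hz, cutoff_hz=500.0, slope_db_per_oct=70.0):
    """Attenuation of an idealised low-pass filter: 0 dB up to the cutoff,
    then a straight slope in dB per octave above it."""
    if freq_hz <= cutoff_hz:
        return 0.0
    octaves_above = math.log2(freq_hz / cutoff_hz)
    return slope_db_per_oct * octaves_above

for f in (250, 500, 1000, 2000):
    print(f, "Hz:", ideal_attenuation_db(f), "dB")
```

Under these assumptions a component at 1000 Hz, one octave above the cutoff, is already attenuated by 70 dB, so the listeners receive essentially only the part of the signal below 500 Hz.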

[Fig. 5 confusion matrices not recoverable from the source: panels TOTAL, MALE VOICE and FEMALE VOICE, each with the moods ANGRY, ASTONISHED, SAD and HAPPY.]

Fig. 5. Confusions in per cent between different moods of the speaker for 22 normal-hearing adults on the test list with four moods.
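A confusion matrix in per cent of the kind shown in these figures can be built from the individual answers as in the following Python sketch. The function name and the response data are hypothetical; the sketch assumes each row (intended mood) is normalised to 100 per cent.

```python
MOODS = ("angry", "astonished", "sad", "happy")

def confusion_percent(trials):
    """trials: iterable of (intended, response) pairs.

    Returns {intended: {response: percent}}; each non-empty row sums to 100.
    """
    counts = {i: {r: 0 for r in MOODS} for i in MOODS}
    for intended, response in trials:
        counts[intended][response] += 1
    matrix = {}
    for intended, row in counts.items():
        total = sum(row.values())
        matrix[intended] = {
            r: (100.0 * n / total if total else 0.0) for r, n in row.items()
        }
    return matrix

# Hypothetical answers: four presentations of each intended mood.
trials = (
    [("angry", "angry")] * 3 + [("angry", "sad")]
    + [("astonished", "astonished")] * 4
    + [("sad", "sad")] * 4
    + [("happy", "happy")] * 2 + [("happy", "angry")] * 2
)
matrix = confusion_percent(trials)
print(matrix["angry"])
print(matrix["happy"])
```

Running the same function separately on the answers to the male and the female voice would give the per-speaker panels.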

[Fig. 6 confusion matrices not recoverable from the source: panels TOTAL, MALE VOICE and FEMALE VOICE, each with the moods ANGRY, ASTONISHED, SAD and HAPPY.]

Fig. 6. Confusions in per cent between different moods of the speaker for 20 normal-hearing children.

[Fig. 7 confusion matrices not recoverable from the source: panels MALE VOICE and FEMALE VOICE, with the moods ANGRY, ASTONISHED, SAD and HAPPY.]

Fig. 7. Confusions in per cent between different moods of the speaker for 10 normal-hearing adults. The test tape was low-pass filtered with a cutoff frequency of 500 Hz.


Tests with hearing impaired subjects
Two groups of hearing impaired subjects were tested. The first was a group of 18 children from the School for the Partially Hearing in Stockholm. They were between 11 and 14 years old, with a mean of 13 years. Their hearing losses were between 40 and 97 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 76 dB. In all cases, the hearing impairment was congenital or early acquired. The method of communication used in the school is oral, and the children always used hearing aids. The children listened to the test tape over headphones (TDH39). Before the actual test, they were carefully trained with the four training sentences until they understood the task. The results are shown in Fig. 8. The children were also tested with a list of three-word sentences where the emphasis was placed on the first, the second or the third word. The main acoustic difference between these test sentences lies in changes in the fundamental frequency (Risberg & Agelfors, 1978). The children's abilities to detect small changes in a sinusoidal signal were also measured (Risberg & Agelfors, 1984).
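The hearing-loss figures quoted above can be illustrated with a small Python sketch. It assumes that the reported loss is the average of the pure-tone thresholds at 500, 1000 and 2000 Hz and that the "best ear" is the ear with the lower average; the text does not spell this out. The function names and the threshold values are hypothetical.

```python
def pure_tone_average(thresholds_db, freqs_hz=(500, 1000, 2000)):
    """Mean hearing threshold (dB) over the given audiometric frequencies."""
    return sum(thresholds_db[f] for f in freqs_hz) / len(freqs_hz)

def best_ear_loss(left, right):
    """Average loss of the better ear, i.e. the ear with the lower average."""
    return min(pure_tone_average(left), pure_tone_average(right))

# Hypothetical audiogram: frequency in Hz -> threshold in dB.
left = {500: 70, 1000: 80, 2000: 90}
right = {500: 60, 1000: 75, 2000: 90}
print(pure_tone_average(left), pure_tone_average(right), best_ear_loss(left, right))
```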

[Fig. 8 confusion matrices not recoverable from the source: panels TOTAL, MALE VOICE and FEMALE VOICE, each with the moods ANGRY, ASTONISHED, SAD and HAPPY.]

Fig. 8. Confusions in per cent between different moods of the speaker for 18 hearing impaired children.

The other group of hearing impaired subjects consisted of 45 patients at the Rehabilitation Clinic of the South Hospital in Stockholm. The patients' ages varied from 26 to 74, with a mean of 55 years. Their hearing losses were between 10 and 88 dB for the frequencies 500, 1000 and 2000 Hz in the best ear, with a mean of 38 dB. The cause of hearing impairment was in most cases presbyacusis or noise-induced hearing loss. This group listened to the test with their personal hearing aids, with the sentences presented from a loudspeaker in an ordinary room. Before testing, they were trained with the four training sentences. For 24 of the subjects, the same test tape was presented twice, with an interval of three weeks between the two test sessions.


The confusions made in the first test session by the total group of 45 patients are shown in Fig. 9.

[Fig. 9 confusion matrices not recoverable from the source: panels TOTAL, MALE VOICE and FEMALE VOICE, each with the moods ANGRY, ASTONISHED, SAD and HAPPY.]

Fig. 9. Confusions in per cent between different moods of the speaker for 45 hearing impaired adults.

DISCUSSION
The final test list with 16 sentences and with the four moods "angry", "astonished", "sad" and "happy" seemed to be satisfactory. In the test with normal-hearing listeners, the number of disagreements was low. Eighteen of the 22 adult listeners agreed with the intended mood on all 16 sentences, two disagreed on one sentence, one on two sentences and one on three sentences. For the test with the normal-hearing ten-year-old children, the number of disagreements was higher. Six of them agreed with the intended mood on all 16 sentences, seven agreed on 15, four on 14 and three on 13 of the sentences. The main disagreement was on sentence V, "The ball bounced in through the window", pronounced by the female voice in the mood "angry" and identified as "sad"; for the same sentence, the stimulus in the mood "happy" for the male voice was identified as "angry". Sentence III, "It was Olle who won the competition", pronounced in the mood "happy", was for the male voice often identified as "angry". The speaker's mood was in this stimulus expressed in a boisterous way that in many respects resembled the way he expressed "angry". It is possible that the first sentence in particular was, for the children, too loaded with associations that influenced them. In continued work in this area, it is necessary to put more effort into selecting semantically neutral sentences, especially if the test is to be used with children.
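The listener counts given above can be turned into an overall agreement score with a short Python sketch. It assumes that every listener answered all 16 sentences; the function name is hypothetical, and the resulting percentages are derived here from the counts in the text rather than reported by the authors.

```python
def percent_correct(distribution, n_sentences=16):
    """distribution maps 'sentences agreed on' -> 'number of listeners'.

    Returns (number_of_listeners, overall per cent agreement).
    """
    listeners = sum(distribution.values())
    correct = sum(score * n for score, n in distribution.items())
    return listeners, 100.0 * correct / (listeners * n_sentences)

# Counts taken from the paragraph above.
adults = {16: 18, 15: 2, 14: 1, 13: 1}
children = {16: 6, 15: 7, 14: 4, 13: 3}

for name, dist in (("adults", adults), ("children", children)):
    n, pct = percent_correct(dist)
    print(name, n, round(pct, 1))
```

With these counts the adults agree with the intended mood on 345 of 352 responses and the children on 296 of 320.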

Many of the hearing impaired subjects, both children and adults, had difficulties in identifying the speaker's mood, see Figs. 8 and 9. For the children, the per cent correct identification was 63% and for


References
Cowan, M. (1936): "Pitch and intensity characteristics of stage speech", Arch. of Speech, Suppl., Dec., pp. 3-92.

Fairbanks, G. & Hoaglin, L.W. (1941): "An experimental study of the durational characteristics of the voice during the expression of emotion", Speech Monograph, 8, pp. 85-91.

Fairbanks, G. & Pronovost, W. (1939): "An experimental study of the pitch characteristics of the voice during the expression of emotion", Speech Monograph, 6, pp. 87-104.

Fastl, H. & Weinberger, M. (1981): "Frequency discrimination of pure tones and complex tones", Acustica, 49, pp. 77-78.

Fonagy, I. (1981): "Emotion, voice and music", pp. 51-79 in (J. Sundberg, ed.): Research aspects on singing, Proc. from a seminar organized by the committee for the acoustics of music, Publ. issued by the Royal Swedish Academy of Music, no 33, Stockholm.

Fourcin, A.J. (1980): "Speech pattern audiometry", pp. 170-208 in (H.A. Beagley, ed.): Auditory investigation; the Scientific and Technological Basis, Clarendon Press, Oxford.

Huttar, G.L. (1967): "Some relations between emotions and the prosodic parameters of speech", Speech Comm. Lab., Inc, St. Barbara/CA, Monograph no 1, July 1967.

Huttar, G.L. (1968): "Relations between prosodic variables and emotions in normal American English utterances", J. Speech Hearing Res., 11, pp. 481-487.

Lieberman, P. & Michaels, S.B. (1962): "Some aspects of fundamental frequency and envelope amplitude as related to the emotional content of speech", J.Acoust.Soc.Am., 34:7, pp. 922-927.

Risberg, A. & Agelfors, E. (1978): "On the identification of intonation contours by hearing impaired listeners", STL-QPSR 2-3/1978, pp. 51-61.

Risberg, A. & Agelfors, E. (1984): "On the relation between frequency discrimination ability and the degree of hearing loss", STL-QPSR 4/1984, pp. 59-70.

Williams, C.E. & Stevens, K.N. (1972): "Emotions and speech: Some acoustical correlates", J.Acoust.Soc.Am., 52:4, part 2, pp. 1238-1250.