Accuracy of Cochlear Implant Recipients in Speech Reception in the Presence of Background Music

Kate Gfeller, PhD; Christopher Turner, PhD; Jacob Oleson, PhD; Stephanie Kliethermes, MS; Virginia Driscoll, MA

Objectives: This study examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of 3 contrasting types of background music, and compared performance based upon listener groups: CI recipients using conventional long-electrode devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing adults.

Methods: We tested 154 long-electrode CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 normal-hearing adults on closed-set recognition of spondees presented in 3 contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test.

Outcomes: Signal-to-noise ratio thresholds for speech in music were examined in relation to measures of speech recognition in background noise and multitalker babble, pitch perception, and music experience.

Results: The signal-to-noise ratio thresholds for speech in music varied as a function of category of background music, group membership (long-electrode, Hybrid, normal-hearing), and age. The thresholds for speech in background music were significantly correlated with measures of pitch perception and thresholds for speech in background noise; auditory status was an important predictor.

Conclusions: Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.

Key Words: cochlear implant, music, pitch, preserved hearing, speech perception.

Annals of Otology, Rhinology & Laryngology 121(12):782-791. © 2012 Annals Publishing Company. All rights reserved.


From the Iowa Cochlear Implant Clinical Research Center (Gfeller, Turner, Driscoll), the Department of Communication Sciences and Disorders (Gfeller, Turner), the School of Music (Gfeller), the Department of Otolaryngology–Head and Neck Surgery (Turner), and the Department of Biostatistics (Oleson, Kliethermes), University of Iowa Hospitals and Clinics, University of Iowa, Iowa City, Iowa. This study was supported by grant 2 P50 DC00242 and grant 1R01 DC000377 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health, by grant RR00059 from the General Clinical Research Centers Program, National Center for Research Resources, National Institutes of Health, and by the Iowa Lions Foundation.

Correspondence: Kate Gfeller, PhD, Dept of Otolaryngology, 200 Hawkins Dr, 21035 Pomerantz Family Pavilion, Iowa City, IA 52242-1076.

Cochlear implants (CIs) were designed to support speech recognition by persons with profound hearing loss, and are quite effective at doing so in quiet listening environments.1 Most current-generation CIs transmit only broad features of the spectral envelope in terms of the place-frequency mapping of the multiple-electrode array; the speech processor stimulation rates are unrelated to the precise frequency components of the input signals. Although this processing approach is effective in transmitting salient features of speech in quiet, the lack of fine structure and the poor frequency resolution have negative implications for speech perception in noisy listening conditions.2-5 Turner et al4 and Stickney et al6 provided evidence that patients with CIs tend to do quite poorly in recognizing speech in background noise, in particular when the background signal is not steady-state. One explanation is implant patients' inability to use pitch cues to separate the target from the background, as supported by the studies of Qin and Oxenham.3

Frequency selectivity and related abilities such as pitch perception have been shown to be important factors in the ability to recognize speech in backgrounds of noise or competing talkers.3 Henry et al7 showed that frequency resolution tends to be poorest in profoundly deaf CI recipients using electric stimulation, somewhat better in listeners with various degrees of sensorineural hearing loss, and best in normal-hearing (NH) listeners. This finding corresponds with the abilities of these subject groups to understand speech in background noises.3,8-10 Patients with CIs who have preserved low-frequency acoustic hearing can use the better frequency resolution of the acoustic hearing to help them understand speech in background noise better than can CI patients reliant upon the electric stimulation transmitted via the traditional long-electrode (LE) implant.4

Although considerable research has been conducted regarding speech perception of CI recipients in conditions of broadband noise or multiple talkers, less is known regarding another common acoustic signal that often competes with speech: background music. Background music often comprises a broader frequency range (fundamentals and harmonics) and greater fluctuations in amplitude and timbre than are experienced with human speech. Recipients of CIs are quite likely to encounter background music in a variety of everyday situations. For example, background music is frequently played to create the appropriate "ambience" at social gatherings and in places of business. Music scores are also common in movies and television, played in conjunction with theatrical dialog, in order to establish the mood of the scene, or as a cue to signal characters or events.11 Even though these uses of background music are intended to enhance a mood or offer additional communication regarding the situation, music is nevertheless a competing acoustic signal that can mask or interfere with reception of the target speaker. To date, however, few studies have examined the challenges of speech perception in background music; some studies have focused on comparing NH or hearing-impaired listeners with conventional hearing aid users, and several studies comprised survey data with self-reports.12-14 Additional data on perceptual accuracy in the presence of real-world background music would help illuminate the auditory capabilities of CI recipients in this common, everyday listening environment.

This study examines the following questions regarding speech recognition of CI recipients in the listening condition of competing real-world music. First, does usable low-frequency acoustic hearing assist CI recipients in recognizing speech in different types of competing background music? Second, what participant characteristics (age, speech perception, musical experience) are influential in recognizing speech in complex listening situations, such as speech in background music? Third, is better speech recognition in the presence of competing sound related to more accurate pitch perception? Finally, how does speech reception in background music relate to speech recognition in other complex listening conditions (eg, noise or multitalker babble)?

METHODS

Participants. The participants included 175 adult CI recipients and 49 NH adults. The implant recipients used a variety of internal devices and speech processing strategies. We divided the participants into an LE group of 154 patients, who used a variety of devices (all 22-mm internal arrays) and strategies, and a Hybrid device group, who received acoustic plus electric stimulation (n = 21). Types of internal electrode arrays used by LE CI recipients in this study included the Freedom, CIIHF1/2, Clarion, 90K, CI24M, CI22, Contour, HiFocus, and Ineraid. The Ineraid users underwent sequential implantation of the 90K and CI24M devices. Types of external signal processing strategies used by LE CI recipients in this study included HiRes-S/P, ACE/(RE), continuous interleaved sampling (CIS), SPEAK, and Conditioner. Because none of the LE devices or strategies differed significantly in relation to the outcomes of the Speech Reception Threshold in Music Test (SRTM), no further detail regarding specific devices or strategies is covered in this report. Preliminary analyses of the results indicated that none of the devices or strategies within the LE group were superior with regard to accuracy on pitch ranking or the primary dependent variable. Consequently, the data were collapsed into one group (LE group) for subsequent analyses.

The CI recipients in the Hybrid group underwent implantation with a 10-mm internal electrode array. The technical features of the Hybrid device and specific selection criteria have been described.15-17 Briefly, candidates for the Hybrid device are drawn from a population that differs with regard to auditory profile from typical LE candidates, in that they should have good residual acoustic hearing for low frequencies; however, like LE candidates, they have extremely poor hearing for high frequencies. Selection criteria for the Hybrid device include severe to profound hearing loss for frequencies over 1,500 Hz, consonant-nucleus-consonant (CNC) word recognition scores between 10% and 60% in the ear to be implanted, and CNC word recognition scores for the contralateral ear equal to or better than the scores for the ear to be implanted (although not more than 80%). The Hybrid device stimulates 6 to 10 channels in the basal end of the cochlea with high-frequency information using a CIS processing strategy; low-frequency information is perceived via the patient's residual hearing. Eighteen patients in the Hybrid group also used hearing aids in conjunction with their CI, which amplified their preserved acoustic hearing in the implanted and/or contralateral ear.

Although the LE and Hybrid groups differed with regard to the length of the internal array and the number of electrodes, these two groups also differed considerably with regard to hearing history (eg, duration of profound deafness; Table 1) and residual hearing. The LE CI users had, for the most part, essentially no residual hearing in the implanted ear after the implantation surgery. The preoperative pure tone thresholds for this group averaged 80 and 94 dB hearing level (HL) for 250 and 500 Hz, respectively. Because implantation surgery usually destroys residual hearing, postoperative threshold testing for these patients rarely produces any responses. In contrast, the Hybrid users had substantial residual hearing in the low frequencies. The mean unaided thresholds for this group were 32.5 and 45.25 dB HL for 250 and 500 Hz, respectively. When hearing aids were used, these thresholds were even more sensitive, averaging 17.5 dB HL at 250 Hz and 25 dB HL at 500 Hz.

TABLE 1. MEANS AND STANDARD DEVIATIONS FOR DEMOGRAPHIC VARIABLES AND SPEECH IN QUIET SCORES

Group | Age at Testing (y) | Months of Implant Use | Duration of Profound Deafness (y) | Pure Tone Average (dB) at 250 Hz | Pure Tone Average (dB) at 500 Hz | Hearing in Noise Test Score (in Quiet)
Long electrode (n = 154), Mean ± SD | 62.1 ± 14.5 | 80.5 ± 56.4 | 10.4 ± 12.1 | 79.9 ± 21.2 | 93.7 ± 16.8 | 87.6 ± 19.1
Long electrode (n = 154), Range | 21-84 | 11-265 | 0-50 | 20-105 | 35-110 | 0-100
Hybrid (n = 21), Mean ± SD | 60.1 ± 11.2 | 29.6 ± 26.2 | NA | 32.5 ± 11.6 | 45.3 ± 16.7 | NA
Hybrid (n = 21), Range | 33-81 | 2-99 | NA | 10-50 | 15-95 | NA
Normal hearing (n = 49), Mean ± SD | 26.2 ± 5.6 | NA | NA | NA | NA | NA
Normal hearing (n = 49), Range | 19-42 | NA | NA | NA | NA | NA

NA — not available or not applicable.

An NH group, who served as a normative reference for performance on the primary dependent variable (the SRTM), was recruited through advertisements. The participants were required to pass a hearing screen in order to be included in the study. None had extensive formal music training (eg, extensive high school or college-level instruction) as determined through a music background questionnaire.11 Table 1 provides the means and standard deviations for the comparison groups regarding demographic variables: age at time of testing, months of CI use, duration of profound deafness, pure tone thresholds, and scores (LE group) for speech perception (HINT) in quiet. A precise duration of profound deafness was not available for 23 patients in the LE group, and profound deafness was not a relevant variable for the Hybrid group, given that criteria for implantation with that device require that the individual possess better residual hearing. All participants gave informed consent in compliance with ethical standards for human subjects.

Speech Reception Threshold in Music Test. The primary dependent variable in this study was the SRTM, designed to examine the ability of the listener to accurately recognize spondees against 3 contrasting types of background music. This test was modeled after a speech reception in noise task (speech reception threshold; SRT) reported by Turner et al4 and described below, which examined spondee recognition in broadband noise or multitalker babble. The SRTM requires identification of a spondee word, spoken by a female talker, in the presence of background music. In our study, the 12 spondee items were homogeneous in difficulty, were digitized from a commercial recording,18 and were presented in random order during each presentation. The fundamental frequency of the spondee items ranged from 212 to 250 Hz, and the spondees ranged in duration from 1.12 to 1.63 seconds. The spondees were presented at 65 dBC sound pressure level (SPL) in sound field at 45° azimuth, 1 foot (30 cm) to the listener's left, against the background music. All spondees were equalized to the same root mean square level.
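As a concrete illustration of the equalization step, the sketch below scales a set of mono waveforms to a shared root mean square level. This is a minimal sketch, not the study's actual processing chain; the target level and the synthetic signals are hypothetical.

```python
import numpy as np

def equalize_rms(signals, target_rms=0.05):
    """Scale each mono waveform to a common RMS level.

    signals: list of 1-D float arrays (e.g., digitized spondee recordings).
    target_rms: hypothetical shared RMS value in linear amplitude units.
    """
    out = []
    for x in signals:
        rms = np.sqrt(np.mean(x ** 2))
        out.append(x * (target_rms / rms))
    return out

# Two synthetic "recordings" at different levels end up at the same RMS.
rng = np.random.default_rng(0)
raw = [0.1 * rng.standard_normal(22050), 0.4 * rng.standard_normal(22050)]
for y in equalize_rms(raw):
    print(np.sqrt(np.mean(y ** 2)))  # both print 0.05
```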

Spondee recognition was used as the target speech for several reasons. Prior studies4 indicated that nearly all CI recipients easily recognized each of these spondee words in quiet, so this task primarily measured the ability of the listeners to perceive speech in noise, rather than their underlying speech recognition ability. The use of spondees rather than sentences can minimize potentially influential variables such as contextual cues and cognitive abilities. In addition, the use of spondees permits a more direct comparison with the speech perception in noise (SRT) data from our laboratories, reported below.

The word "music" represents a vast universe of structural combinations of pitches, rhythms, and timbres that vary with regard to masking properties and structural complexity in relation to cognitive processing. Clearly, it is not feasible to examine all possible structural (pitch, timbre, rhythm, intensity) combinations comprising real-world music, or the specific interaction between those features and spoken communication. In this experiment, we selected 3 real-world excerpts that contrast with one another on the parameters of timbral blends, presence or absence of linguistic information (lyrics), size of musical ensemble (1 instrument, 3-person instrumental-vocal combo, full symphonic orchestra), and genre. The excerpts, of approximately 2 seconds in length, included 1) a classical piano solo by Beethoven, with a structurally predictable and repetitive arpeggiated bass line against a simple solo melody line in the right hand ("piano"); 2) an orchestral composition by 20th-century composer Igor Stravinsky (representing the full range of frequencies heard in contemporary orchestras), made up of large, dissonant chords played in an irregular rhythmic pattern ("orchestra"); and 3) popular music with a linguistic component, that is, a solo male vocalist (Billy Joel) singing lyrics (which, hypothetically, could be confused with words in the spondee list) against a small ensemble of drum set (bass drum with cymbal) and muted guitar ("vocal"). The small ensemble played in a predictable rhythmic pattern of duple meter. All musical excerpts were calibrated to be presented at the same loudness during testing. Spectrograms for the 3 contrasting background music options appear in Fig 1.

Fig 1. Spectrograms of 3 contrasting background music stimuli. Vertical axes display frequency in increments of 2,000 Hz. Horizontal axes represent time in hundredths of milliseconds. A) Piano solo. B) Orchestra. C) Vocal.

After each SRTM item (spondee plus background music), the listeners responded on a touch screen with the spondee that they thought had been presented. The listeners were required to respond on each trial and were instructed to guess if they were not sure of the correct answer. The level of the background music was increased by 2 dB after a correct response and decreased by 2 dB after an incorrect response, thus converging on the 50%-correct level of performance. The adaptive procedure continued until 14 reversals had occurred, and then the average signal-to-noise ratio (SNR) of the final 10 reversals was taken as the SNR-in-noise value. (A lower SNR means that the listener could understand speech in more-adverse conditions.) At least 4 experimental runs of this SNR in background music procedure were obtained from each subject; the final value was taken as the average of the final 3 experimental runs.
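The adaptive rule described above is a standard 1-up/1-down staircase, which converges on the 50%-correct point. A minimal sketch follows, assuming a hypothetical present_trial(snr) callback that plays one spondee-plus-music trial at the given SNR (in dB) and returns True for a correct response; the starting SNR is also an assumption.

```python
def run_srtm_track(present_trial, start_snr=10.0, step=2.0,
                   n_reversals=14, n_average=10):
    """One adaptive SRTM run: a 2-dB one-up/one-down track on the SNR."""
    snr = start_snr
    prev_dir = None                  # -1 = music raised (harder), +1 = lowered
    reversal_snrs = []
    while len(reversal_snrs) < n_reversals:
        # Correct -> raise the music by 2 dB (SNR falls);
        # incorrect -> lower the music by 2 dB (SNR rises).
        direction = -1 if present_trial(snr) else +1
        if prev_dir is not None and direction != prev_dir:
            reversal_snrs.append(snr)        # track reversed: record the SNR
        prev_dir = direction
        snr += direction * step
    # Mean SNR over the final 10 reversals estimates the 50%-correct point.
    return sum(reversal_snrs[-n_average:]) / n_average
```

Per the procedure above, at least 4 such runs would be collected per listener, with the reported threshold taken as the mean of the final 3.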

Speech-in-Noise Tasks. The 2 tasks in the SRT test are described in detail in Turner et al.4 Briefly, the speech reception tasks required the participant to identify a spondee word spoken by a female talker in the presence of either steady-state broadband sound (steady-state white noise that had been low-pass filtered at –12 dB per octave above 400 Hz, to generally simulate the long-term speech spectrum) or babble consisting of sentences by a male talker and a female talker mixed together at equal root mean square amplitudes. The spondees were presented at 65 dB SPL in sound field, 1 foot (30 cm) to the listener's left, against the background sound. The same sample of noise background was presented on each trial. The level of the background was increased by 2 dB after a correct response and decreased by 2 dB after an incorrect response, thus converging on the 50%-correct level of performance. The speech recognition task was otherwise similar to that used in the SRTM.
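The steady-state masker described here, white noise rolled off at –12 dB per octave above 400 Hz, can be approximated with a second-order Butterworth low-pass filter, since each filter order contributes roughly –6 dB per octave. A sketch follows, with the sampling rate and duration assumed:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 44100                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
white = rng.standard_normal(5 * fs)          # 5 s of steady-state white noise

# A 2nd-order low-pass rolls off at about -12 dB/octave above its cutoff,
# approximating the long-term speech spectrum as described in the text.
b, a = butter(2, 400 / (fs / 2), btype="low")
speech_shaped = lfilter(b, a, white)
```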

Pitch Ranking. The pitch ranking task was described in detail by Gfeller et al.19 Briefly, this test measured how accurately the participant could determine the direction of a pitch change (higher or lower). Each trial consisted of 3 tones presented sequentially. The first 2 tones were the same frequency, and the third tone was a different frequency. The response task was to indicate via touch screen whether the final tone was higher or lower in pitch than the first 2. No feedback was given on accuracy.

The stimuli in this task were pure tones ranging from 131 Hz to 1,048 Hz and were presented in a total of 540 fundamental frequency pairs, with interval sizes of 1, 2, 3, or 4 semitones. The tones were generated digitally at a 44.1-kHz sampling rate and were presented through the output of a DigiDesign Audiomedia III sound card. The tones were 500 ms in duration, with 25-ms rise-fall times. The time interval between tones of each trial sequence was 300 ms. The tones were roved across an 8-dB range around the average presentation level to minimize loudness cues.

Pitch ranking was calculated by dividing the number of correct responses by the total number of trials (6) at each combination of base frequency and interval size. The pitch ranking measure is modeled as a function of the size of the interval (difference in frequency between 2 sequential fundamental frequencies) and the base frequency class of the 2 sequential fundamental frequencies. The stimuli were presented in sound field at suprathreshold levels (average level, 87 dB SPL).
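Scoring therefore reduces to a proportion correct per (base frequency, interval size) cell. A small sketch under the assumption that each trial is logged as one row; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical trial log: one row per pitch ranking trial.
trials = pd.DataFrame({
    "base_hz":   [131, 131, 131, 262, 262, 262],
    "semitones": [1, 1, 1, 2, 2, 2],
    "correct":   [1, 0, 1, 1, 1, 0],
})

# Proportion correct at each combination of base frequency and interval
# size; in the study each such cell contained 6 trials.
scores = trials.groupby(["base_hz", "semitones"])["correct"].mean()
print(scores)
```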

The LE group was tested with only their CIs. Thirty-one bilateral LE CI users were tested using both devices; the contralateral ears of monaural LE recipients were plugged during testing. All of the Hybrid participants were tested in the Hybrid CI "condition" (Hybrid plus hearing aid in the ipsilateral ear), except for 4, who, because of a loss of residual hearing in the implanted ear, used either no hearing aid or only a contralateral hearing aid and were tested with only the CI.17 Those who used the Hybrid device had their contralateral ear plugged during testing. The Hybrid patients had symmetrical hearing losses. The use of a contralateral earplug resulted in a conveyed signal at least 35 dB, and up to 70 dB, more intense in the implanted ear than in the plugged ear, thus reducing the sensation level of the opposite ear so that pitch discrimination from that ear would offer minimal, if any, assistance in the task. These patients had significant hearing loss in the opposite ear; thus, hearing from the plugged ear was likely not a serious concern.

Auditory Profile and Musical Experience Variables. In order to characterize the 2 CI groups on potentially influential variables, we gathered information on age at time of testing, duration of profound deafness, auditory thresholds at 250 and 500 Hz (with and without hearing aids), months of implant use, HINT scores in background noise (Table 1), and CNC word scores. We also gathered information on amount of music training and listening habits before and after implantation using the Iowa Music Background Questionnaire.12 Two values for music training were documented: a score for music training during elementary school, consisting of length of participation in music lessons, classes, and ensemble participation, and a score for music training (classes and lessons) during high school and beyond (eg, college, community involvement). Music listening habits, or the typical amount of music listening time per week, were documented for the periods before and after cochlear implantation.

Correlations Among Speech in Background Music, Speech in Noise, and Pitch. Correlational analyses were used to examine the relations among the primary dependent variable (SRTM score), pitch ranking, and speech in background noise (SRT) for the CI participants. The relations between SRTM score, pitch ranking, and speech in background noise or babble were considered of interest because prior studies indicated a benefit of frequency information in segregation of the target speaker from background noise; thus, more accurate pitch perception, which is better conveyed through acoustic as opposed to electric hearing, may improve speech recognition in various noisy listening environments.3,8-10 Scores and thresholds were not available for all CI recipients on each of the measures used in the correlational analyses; thus, the sample size is noted for each measure.

Statistical Methods. The first research question examined whether low-frequency acoustic hearing in CI users assists them in recognizing speech in different types of competing background music. Because each subject was measured on each musical stimulus, a repeated-measures analysis of variance (ANOVA) was employed with use of PROC MIXED in SAS version 9.2. Accuracy on speech in background music (SRTM) was investigated as a function of the following predictors: group membership (a between-subjects factor), type of musical stimulus, age at time of testing, and 2-way interactions between the predictors. Additional hearing covariates (eg, months of CI use, speech perception scores) were not included in the analyses because we did not have background hearing information on the NH subjects. The predictor variables were included as fixed effects in the model. Adjusting for hearing covariates was nonetheless an important consideration in this study; therefore, a follow-up repeated-measures ANOVA using only the Hybrid and LE groups was conducted with the additional hearing covariates to clarify the potentially influential factors of auditory profile and musical background.
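The model itself was fit with SAS PROC MIXED; as a rough analogue, the sketch below fits a comparable mixed model in Python with statsmodels, where a per-subject random intercept stands in for the compound symmetry correlation structure reported in the Results. The data frame, column names, and synthetic values are all hypothetical, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical illustration data: 60 subjects x 3 music types each.
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), 3),
    "group":   np.repeat(rng.choice(["LE", "Hybrid", "NH"], n), 3),
    "music":   np.tile(["piano", "orchestra", "vocal"], n),
    "age":     np.repeat(rng.uniform(20, 80, n), 3),
})
df["srtm"] = 0.2 * df["age"] + rng.normal(0, 5, len(df))

# Fixed effects for group, music, their interaction, and age; the random
# intercept per subject induces equal within-subject correlation across
# the 3 music conditions (a compound-symmetry-like structure).
model = smf.mixedlm("srtm ~ C(group) * C(music) + age",
                    data=df, groups=df["subject"])
print(model.fit().summary())
```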

RESULTS

The data set contained 224 individuals, but only 202 individuals were used in the SRTM analysis, because 22 individuals were excluded as a result of missing SRTM scores or missing predictor values. The final ANOVA model used to assess the influence of low-frequency acoustic hearing in recognition of differing background music included group membership, type of musical stimuli, age at time of testing, and an interaction between group and music. Type of music (p < 0.0001), group (p < 0.0003), age at time of testing (p < 0.0001), and the interaction between group and type of music (p < 0.0001) were all significant predictors of SRTM thresholds.

A compound symmetry correlation structure was used to account for within-subject measurements. The estimated total variance in SRTM score was 43.75, and the estimated variance from the compound symmetry was 30.17; thus, the within-subject correlation between a subject's scores on the 3 types of music (piano, orchestra, vocal) accounted for 68% of the overall variability in the sample. The parameter estimate on age was 0.1941; that is, when we compare 2 individuals within the same group and on the same music stimulus who differ in age by 1 year, on average the older individual will have an SRTM score 0.1941 dB higher than the individual who is 1 year younger. Therefore, as age increased, thresholds were on average larger (poorer). Similarly, someone 10 years older would be expected to have thresholds about 2 dB larger (1.941 dB), conditional on music stimulus and group. No significant difference was detected between groups in the relationship of SRTM score to age (ie, no significant age-by-group interaction), despite the differing age ranges in each group.

Speech recognition for all groups, as well as the extent to which acoustic hearing was beneficial, varied depending upon the particular type of background music. Because the interaction between group and music was statistically significant, the difference in speech recognition by music type needed to be assessed 1 group at a time. We report the follow-up tests using the Tukey-Kramer adjusted p value for multiple comparisons. For the LE, Hybrid, and NH groups, all 3 types of music were significantly different from one another in their effect on SRTM scores (p < 0.0001; Fig 2). A negative threshold for the SNR indicates greater ability to recognize speech at a lower mean threshold than the average background level. The data indicated that all 3 groups had the best SNR thresholds on piano, the second best on vocal, and the poorest on orchestra. All 3 groups had negative mean SRTM scores for both vocal and piano music types; only the NH group had a negative mean threshold for the orchestra music. However, even the NH group experienced difficulty with speech reception in the orchestral listening condition.

There was variation in the results when we compared groups within each music type. The NH group had the best thresholds for all 3 music types. Although the Hybrid group had lower mean thresholds than the LE group in the vocal and orchestra conditions, the difference was not significant (with an a priori significance level of 0.05). None of the groups were significantly different from one another when listening in the orchestra condition. In the piano condition, the LE and Hybrid groups had significantly poorer thresholds than the NH group (p < 0.0001); there was no significant difference between the LE and Hybrid groups (p = 0.9943; Fig 2). In the vocal music condition, the thresholds for the LE and Hybrid groups were poorer than the thresholds for the NH group, but only the LE group difference was significant (p < 0.0001 and p = 0.0903, respectively). Although the thresholds for the Hybrid group were not statistically different from those for the LE group (p = 0.5651), the distributions (Fig 2) show a potential difference in SRTM scores between the two groups (covariate-adjusted mean thresholds of –5.66 and –2.62 for the Hybrid and LE groups, respectively).

The unaccounted-for variability in both groups, coupled with the small sample size in the Hybrid group, may have contributed to the inability to detect a significant difference; therefore, a secondary analysis was conducted between the LE and Hybrid groups, with the inclusion of additional hearing predictors not available for the NH group, to help explain more of the variability in SRTM thresholds. Additional factors considered were months of implant use, CNC word scores, music training during elementary school, music training during high school and beyond, music listening habits before and after cochlear implantation, and all reasonable 2-way interactions of the predictors. The final repeated-measures ANOVA model included the predictors music type, group, age, CNC word scores, music listening habits before and after implantation, the interaction between CNC word score and music, and the interaction between music and group.

For the two CI groups, the significant predictors of SRTM scores were music type (p < 0.0001), group (p = 0.0098), age (p < 0.0001), and CNC word scores (p < 0.0001). Music listening habits before cochlear implantation were not significant (p = 0.0958). Significant interactions were CNC word scores by music (p < 0.0001) and music by group (p = 0.0042). On average, older individuals had higher SRTM results. The effects of music, group, and CNC word scores cannot be assessed without evaluating the interaction. A significant difference in SRTM scores between the LE and Hybrid groups was now found for the vocal (p = 0.0170) and orchestra (p = 0.0383) music conditions, but the piano condition was still not significant (p = 0.9984; Fig 3). In general, the better the CNC word score, the smaller (better) the SRTM score. However, the CNC word score was the most influential in predicting the thresholds in the piano condition, and the least influential in predicting the thresholds in the orchestra condition.

Fig 2. Box plots for raw signal-to-noise ratio thresholds for the Speech Reception Threshold in Music Test.

Fig 3. Covariate-adjusted mean signal-to-noise ratio thresholds for the Speech Reception Threshold in Music Test for cochlear implant users, with 95% confidence intervals.

Correlations Between Speech Recognition in Complex Listening Conditions and Pitch Ranking. Additional research questions involved the relations among speech recognition in competing music, speech in background noise, and accurate pitch perception. Data for the LE and Hybrid groups were consolidated for correlation analyses between the SRTM score, SRT for noise or babble, and pitch ranking. As Table 2 indicates, there were statistically significant correlations between pitch ranking and all measures except speech in steady-state background noise. The strongest relations were between pitch ranking and speech in babble, and between pitch ranking and the SRTM orchestra condition, which was the most challenging competing condition in terms of music background.

TABLE 2. CORRELATIONS BETWEEN PITCH RANKING TASK AND SPEECH RECOGNITION IN NOISE OR MUSIC FOR COCHLEAR IMPLANT RECIPIENTS

Pitch Ranking Task | SRT in Noise | SRT in Babble | SRTM Piano | SRTM Vocal | SRTM Orchestra
Overall | r = –0.16; p = 0.071 (n = 132) | r = –0.30; p = 0.0018 (n = 127) | r = –0.21; p = 0.008 (n = 168) | r = –0.23; p = 0.003 (n = 166) | r = –0.31; p < 0.0001 (n = 167)
Long electrode | r = –0.24; p = 0.0013 (n = 120) | r = –0.28; p = 0.0013 (n = 116) | r = –0.26; p = 0.0013 (n = 151) | r = –0.24; p = 0.0093 (n = 150) | r = –0.30; p = 0.0002 (n = 151)
Hybrid | r = 0.35; p = 0.2579 (n = 12) | r = –0.08; p = 0.8054 (n = 11) | r = 0.03; p = 0.9218 (n = 16) | r = 0.027; p = 0.9200 (n = 16) | r = 0.09; p = 0.9749 (n = 1)

SRT — speech reception threshold; SRTM — Speech Reception Threshold in Music.

Correlations Between SRTM Scores and Speech in Noise. In considering overall means for the SRT, the mean threshold for noise was slightly lower than the mean threshold for babble. The overall SRTM (all types of music) and SRT mean scores were highly correlated (r = 0.70; p < 0.0001). This correlation differed by group. Within the LE group, SRTM and SRT scores were highly correlated (r = 0.73; p < 0.0001), whereas there was not a significant correlation within the Hybrid group (r = 0.41; p = 0.19). The lack of significant correlation in the Hybrid group could be related to the small sample size; that is, only a small number of Hybrid participants had both SRTM and SRT scores available for analysis (n = 12). With a significant correlation, as the SRT scores decreased (which indicates more accurate recognition in adverse listening conditions), the SRTM scores were expected to decrease as well.

The pairwise correlations within the subgroups as a function of music type were all relatively strong, and in a positive direction. Within the LE group, the SRT in noise was most highly correlated with the SRTM scores for piano, followed by orchestra, and then vocal (all p < 0.0001). For the Hybrid group, the only statistically significant correlation was orchestra with SRT in noise (p = 0.031); the correlation between vocal and SRT in noise neared significance (p = 0.054). The small number of observations was an issue in analyses within the Hybrid group.

DISCUSSION

As these data indicate, the extent to which background music causes a problem with regard to speech recognition varies as a function of the structural properties of the music, as well as the type of auditory input (NH, acoustic plus electric, electric) and auditory status (residual hearing, speech perception abilities). This study used 3 structurally different excerpts of real-world music as competing stimuli. For example, even the NH listeners experienced considerable difficulty in the orchestra condition. These results emphasize that persons with a healthy hearing mechanism, as well as those with compromised hearing, can experience communicative difficulties in commercial or social situations (eg, restaurants, parties) in which background music is played, depending upon the characteristics of the music.

Future studies that systematically examine specific acoustic variables of background music could help determine particular acoustic elements that create more challenging listening conditions for CI recipients than for others. For example, Ekström and Borg14 found that music had a lower masking effect for both NH listeners and hearing-impaired listeners (hearing aid users) when the music was played at a slower tempo and in a higher frequency range. The clinical application of such information, unfortunately, is tempered by the reality that real-world music as heard in commercial or social settings is naturally quite variable and includes ongoing and rapid changes in structural features (dynamic range, instrumental combinations, variations in tempo20). Furthermore, the musical selections and sound system in public places are unlikely to be under listener control. Thus, practically speaking, the findings of the present study emphasize the importance of effective counseling for CI recipients regarding proactive management of differing and ever-changing listening environments (eg, direct requests for a quieter location in restaurants, or that the loudness of the music be turned down).

With regard to group differences (LE and Hybrid), after we adjusted for hearing covariates, the Hybrid group did achieve a statistically significant difference from the LE group at the 0.05 level. This finding helps to confirm the notion that the pattern of distributions was different. The mean thresholds were –1.91 dB for the LE group and –5.30 dB for the Hybrid group, a difference that has possible clinical implications. It is also important to note that the LE group thresholds ranged from –21.6 to 31.56 dB, whereas the Hybrid thresholds had much less variation in scores, ranging from –18.68 to 9.28 dB. Thus, on average, the bottom tail of participants using the Hybrid device performed better than did the bottom tail of participants using the LE device. Among the worst-performing individuals (the top tail), only 5% scored worse than 7.24 dB in the Hybrid group, whereas the 5% cutoff for the LE group was 9.04 dB. Furthermore, the 25th, 50th, and 75th percentiles for the Hybrid group were –12.6, –5.56, and 1.14 dB, respectively, whereas the same percentiles were –7.24, –1.76, and 2.16 dB, respectively, for the LE group. The worst-performing participants were considerably worse in the LE group than in the Hybrid group. Thus, the lack of significance in the initial model that included NH appears to be an artifact of the small Hybrid sample size and the large variability exhibited by LE participants.

In general, CI recipients were hampered in word recognition by background music more than were NH listeners, but those CI recipients who as a group had better residual hearing (and thus access to acoustic stimulation, ie, the Hybrid group) had better mean thresholds than did those in the LE group for the two more challenging music conditions. There was a significant advantage for the Hybrid users in the vocal music condition.

These results, although less straightforward than those of some experiments reviewed by McDermott,21 suggest possible real-world listening advantages of preserving low-frequency residual acoustic hearing, particularly in supporting some complex listening situations. The benefits of preserved acoustic hearing are of particular importance given the current challenges in providing better pitch discrimination through electric stimulation.21 This idea is emphasized by the lack of difference (as noted in the preliminary analyses) among unilateral or bilateral CI users of different device types or strategies (ACE, SPEAK, CIS).

The correlations between the SRTM and speech in noise (SRT) measures (r = 0.70), as well as the correlations of SRTM and SRT scores with pitch ranking (Table 2), provide additional support to prior studies that indicated that the ability of patients to separate voices in multitalker situations and complex background music is related to their ability to accurately perceive pitch, and that preservation of low-frequency acoustic hearing can be quite helpful.4,22 Segregation of voices by use of voice pitch has been shown to be important by a number of researchers.23,24 However, residual hearing (and its better pitch sensations) may be only one tool that listeners use to separate speech from music. There may be other cues or abilities that CI listeners can use to help in this task, which would account for the high levels of performance seen by some LE users. Some studies have shown that acoustic plus electric stimulation is less effective in identifying voices or gender than is the LE device.25

This study did not examine directly the specific contributions of residual hearing (eg, pure tone averages, testing the hearing aid condition only) to perceptual acuity. Some preliminary studies of bimodal conditions (LE CI plus hearing aids) suggest that there may be a synergistic effect of bimodal stimulation26-28; thus, tests that compare various conditions (LE CI plus hearing aid, short electrode alone, hearing aid alone, and short electrode plus hearing aids on the ipsilateral and/or contralateral side) would help to further elucidate the relative contributions of electric and acoustic stimulation and implant technology.

Consistent with prior studies,13,29 age at time of testing was a significant predictor of poorer performance on the SRTM. More advanced age is associated with greater difficulty in spectrally complex listening tasks.30,31 These findings suggest a particular need for accommodations in acoustic environments (eg, eliminating or turning down the volume of background music) to support optimal communication of CI recipients who are more advanced in age, particularly with regard to real-world listening tasks such as listening to a conversational partner in the presence of ambient music in a commercial setting or background music in a movie or at a party.


REFERENCES

1. Wilson B. Cochlear implant technology. In: Niparko JK, Kirk KI, Mellon NK, Robbins AM, Tucci DL, Wilson BS, eds. Cochlear implants: principles and practices. New York, NY: Lippincott, Williams & Wilkins, 2000:109-18.

2. Gantz BJ, Turner C, Gfeller KE, Lowder MW. Preservation of hearing in cochlear implant surgery: advantages of combined electrical and acoustical speech processing. Laryngoscope 2005;115:796-802.

3. Qin MK, Oxenham AJ. Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J Acoust Soc Am 2003;114:446-54.

4. Turner CW, Gantz BJ, Vidal C, Behrens A, Henry BA. Speech recognition in noise for cochlear implant listeners: benefits of residual acoustic hearing. J Acoust Soc Am 2004;115:1729-35.

5. Nelson PB, Jin S-H. Factors affecting speech understanding in gated interference: cochlear implant users and normal-hearing listeners. J Acoust Soc Am 2004;115:2286-94.

6. Stickney GS, Zeng F-G, Litovsky R, Assmann P. Cochlear implant speech recognition with speech maskers. J Acoust Soc Am 2004;116:1081-91.

7. Henry BA, Turner CW, Behrens A. Spectral peak resolution and speech recognition in quiet: normal hearing, hearing impaired, and cochlear implant listeners. J Acoust Soc Am 2005;118:1111-21.

8. Turner CW. Hearing loss and the limits of amplification. Audiol Neurootol 2006;11(suppl 1):2-5.

9. Fu QJ, Shannon RV, Wang X. Effects of noise and spectral resolution on vowel and consonant recognition: acoustic and electric hearing. J Acoust Soc Am 1998;104:3586-96.

10. Friesen LM, Shannon RV, Baskent D, Wang X. Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants. J Acoust Soc Am 2001;110:1150-63.

11. Gfeller KE. Music: a human phenomenon and therapeutic tool. In: Davis WB, Gfeller KE, Thaut MH, eds. An introduction to music therapy theory and practice. 3rd ed. Silver Spring, Md: American Music Therapy Association, 2008:41-75.

12. Gfeller K, Christ A, Knutson JF, Witt S, Murray KT, Tyler RS. Musical backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant recipients. J Am Acad Audiol 2000;11:390-406.

13. Gfeller K, Jiang D, Oleson JJ, Driscoll V, Knutson JE. Temporal stability of music perception and appraisal scores of adult cochlear implant recipients. J Am Acad Audiol 2010;21:28-34.

14. Ekström S-R, Borg E. Hearing speech in music. Noise Health 2011;13:277-85.

15. Gantz BJ, Turner CW. Combining acoustic and electric hearing. Laryngoscope 2003;113:1726-30.

16. Woodson EA, Dempewolf RD, Gubbels SP, et al. Long-term hearing preservation after microsurgical excision of vestibular schwannoma. Otol Neurotol 2010;31:1144-52.

17. Gantz BJ, Hansen MR, Turner CW, Oleson JJ, Reiss LA, Parkinson AJ. Hybrid 10 clinical trial: preliminary results. Audiol Neurootol 2009;14(suppl 1):32-8.

18. Harris RW. Speech audiometry materials compact disk. Provo, Utah: Brigham Young University, 1991.

19. Gfeller K, Turner C, Oleson J, et al. Accuracy of cochlear implant recipients on pitch perception, melody recognition, and speech reception in noise. Ear Hear 2007;28:412-23.

20. Chasin M. Music and hearing aids. Hearing J 2003;56:36-38, 40, 41.

21. McDermott H. Benefits of combined acoustic and electric hearing for music and pitch perception. Semin Hear 2011;32:103-14.

22. Qin MK, Oxenham AJ. Effects of introducing unprocessed low-frequency information on the reception of envelope-vocoder processed speech. J Acoust Soc Am 2006;119:2417-26.

23. Assmann PF, Summerfield Q. Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. J Acoust Soc Am 1990;88:680-97.

24. Brokx JPL, Nooteboom SG. Intonation and the perceptual separation of simultaneous voices. J Phonetics 1982;10:23-36.

25. Dorman MF, Gifford RH, Spahr AJ, McKarns SA. The benefits of combining acoustic and electric stimulation for the recognition of speech, voice and melodies. Audiol Neurootol 2008;13:105-12.

26. Dillier N. Combining cochlear implants and hearing instruments. In: Proceedings of the Third International Pediatric Conference, Chicago, Ill, 2004:163-72.

27. Kong YY, Stickney GS, Zeng FG. Speech and melody recognition in binaurally combined acoustic and electric hearing. J Acoust Soc Am 2005;117:1351-61.

28. El Fata F, James CJ, Laborde ML, Fraysse B. How much residual hearing is "useful" for music perception with cochlear implants? Audiol Neurootol 2009;14(suppl 1):14-21.

29. Gfeller K, Turner C, Mehr M, et al. Recognition of familiar melodies by adult cochlear implant recipients and normal-hearing adults. Cochlear Implants Int 2002;3:29-53.

30. Gordon-Salant S, Fitzgibbons PJ. Recognition of multiply degraded speech by young and elderly listeners. J Speech Hear Res 1995;38:1150-6.

31. Martin JS, Jerger JF. Some effects of aging on central auditory processing. J Rehabil Res Dev 2005;42(suppl 2):25-44.