American Auditory Society Annual Meeting, March 8-10, 2012

Nonlinear Frequency Compression: Balancing Start Frequency and Compression Ratio

Joshua M. Alexander

Department of Speech, Language, and Hearing Sciences

Purdue University, West Lafayette, IN 47907

http://www.TinyURL.com/PurdueEar

Research Question

Listeners with hearing aids often have limited access to important high-frequency speech information. For moderately impaired listeners, this can occur because the miniature receivers cannot provide sufficient high-frequency amplification, or cannot do so without audible whistling and overtones caused by feedback. For more severely impaired listeners, the inner hair cells that code these frequencies may be absent or non-functioning. Frequency-lowering techniques, including nonlinear frequency compression (NFC), have been suggested as a means of re-introducing high-frequency speech cues to these listeners.

Compared to other methods, NFC is unique in that the low-frequency spectrum below a programmable start frequency is left unaltered to help preserve signal quality. The high-frequency spectrum is compressed toward the start frequency by an amount determined by the compression ratio (CR). CR corresponds very closely to bandwidth reduction (i.e., reduction in spectral resolution) on a normal-hearing Equivalent Rectangular Bandwidth (ERB_N) scale (Moore, 2003). When implementing any frequency-lowering algorithm, the upper frequency limit of aided audibility (the “max output frequency”) is critical because it indicates both the frequency range that should be targeted for lowering and where that range can be moved. Because the NFC start frequency and CR both influence how frequencies are remapped, there are infinitely many ways the unaidable high-frequency spectrum can be repackaged into the audible range of the listener.
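
The remapping described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulator's code: it assumes the power-law (log-frequency) input-output rule often associated with Simpson et al. (2005), and the ERB_N-number formula of Glasberg and Moore (1990). The function names and the example settings (1550-Hz start, CR of 2, 6000-Hz input) are illustrative only.

```python
import math

def nfc_map(f_in_hz, start_hz, cr):
    """Map an input frequency to an output frequency (assumed power-law rule)."""
    if f_in_hz <= start_hz:
        return f_in_hz  # spectrum below the start frequency is left unaltered
    # Above the start, the distance from the start on a log-frequency axis
    # is divided by the compression ratio.
    return start_hz * (f_in_hz / start_hz) ** (1.0 / cr)

def erb_number(f_hz):
    """ERB_N-number (assumed Glasberg and Moore, 1990, formula)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

# Illustrative settings, not taken from any condition in the study.
start_hz, cr = 1550.0, 2.0
f_in = 6000.0
f_out = nfc_map(f_in, start_hz, cr)

# Width of the compressed band before and after lowering, in ERB_N units.
erb_in = erb_number(f_in) - erb_number(start_hz)
erb_out = erb_number(f_out) - erb_number(start_hz)
erb_ratio = erb_in / erb_out

print(f"{f_in:.0f} Hz is lowered to {f_out:.0f} Hz")
print(f"nominal CR = {cr:.2f}, bandwidth reduction on the ERB_N scale = {erb_ratio:.2f}")
```

If the assumed mapping holds, the ERB_N bandwidth ratio lands near the nominal CR, which is the sense in which CR tracks the reduction in spectral resolution.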

This project examines the perceptual tradeoffs that occur when trying to optimize the choice of NFC start frequency and CR to fit a moderately-severe to profound high-frequency hearing loss and a mild to moderate high-frequency loss. On the one hand, lower start frequencies might be detrimental for phonemes that rely heavily on formant frequency, especially vowels. On the other hand, lower start frequencies could be beneficial because a) they allow a greater amount of high-frequency information to be lowered if CR is fixed, or b) they allow for lower CRs (less reduction in spectral resolution) if the input bandwidth (the “max input” frequency) is fixed. Similarly, it is uncertain whether CR should be kept low to maintain spectral resolution or should be increased so that a greater amount of high-frequency information can be lowered into the range of audibility, and whether this effect depends on start frequency (e.g., lower CRs might be best for low start frequencies, but less important for high start frequencies where formant frequencies are less critical).

Supported by NIDCD grant 1RC1DC010601-01

Listeners

Group 1: Simulation of moderately-severe to profound high-frequency loss

14 (6 male, 8 female) listeners with sensorineural loss, ages 47-83 years (median = 70 years)

Average Thresholds

Freq. (Hz)   250    500    1000   2000   3000   4000   6000   8000
dB HL        17.1   22.5   28.9   37.5   47.1   55.7   68.2   68.6

Group 2: Mild to moderate high-frequency loss

13 (6 male, 7 female) listeners with sensorineural loss, ages 27-82 years (median = 62 years)

Average Thresholds

Freq. (Hz)   250    500    1000   2000   3000   4000   6000   8000
dB HL        22.3   24.2   28.5   40.4   44.6   48.8   52.3   50.4

Hearing Aid Simulator

Nonlinear Frequency Compression (NFC)

Only the part of the input spectrum above the start frequency was subjected to NFC. The upper limit of the input band used for NFC (denoted “max input”, or “BW” for input bandwidth) varied by condition. The compression ratio (“CR”) was set precisely so that the max input frequency for NFC was lowered to a max output frequency of 3273 Hz (Group 1) or 4996 Hz (Group 2).
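
Setting CR so that the max input frequency lands exactly on the max output frequency amounts to solving the frequency mapping for CR. The sketch below does this under an assumed power-law (log-frequency) rule, which may differ in detail from the rule used in the simulator. The (start, max input, max output) triples are illustrative combinations of frequencies mentioned in this document, and the resulting values should not be read as the CRs used in the study.

```python
import math

def solve_cr(start_hz, max_in_hz, max_out_hz):
    """CR that maps max_in_hz onto max_out_hz under the assumed power-law rule."""
    return math.log(max_in_hz / start_hz) / math.log(max_out_hz / start_hz)

def nfc_map(f_in_hz, start_hz, cr):
    """Assumed input-output rule: identity below the start, compressed above it."""
    if f_in_hz <= start_hz:
        return f_in_hz
    return start_hz * (f_in_hz / start_hz) ** (1.0 / cr)

# (start, max input, max output) in Hz; the pairings are illustrative only.
examples = [
    (1550.0, 9130.0, 3273.0),
    (2239.0, 9130.0, 3273.0),
    (2756.0, 9130.0, 4996.0),
]

for start, max_in, max_out in examples:
    cr = solve_cr(start, max_in, max_out)
    check = nfc_map(max_in, start, cr)  # should reproduce max_out
    print(f"start={start:.0f} Hz, max input={max_in:.0f} Hz: "
          f"CR={cr:.2f}, max input maps to {check:.1f} Hz")
```

With the max input and max output held fixed, raising the start frequency raises the required CR, which is the tradeoff between start frequency and spectral resolution examined in this project.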

NFC was carried out in MATLAB using techniques described by Simpson et al. (2005). Short-time fast-Fourier transform segments (5.8 ms) were used to compute the instantaneous frequencies of the input. Input frequencies targeted for frequency remapping were synthesized at lower output frequencies using phase-vocoding, with overlap-and-add (Allen, 1977) used for signal reconstruction. The processed signal was recombined, after an appropriate delay, with the unprocessed signal (the input signal low-pass filtered at the start frequency). The combined signal was then subjected to wide dynamic range (amplitude) compression.
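
The framing and overlap-add reconstruction step can be illustrated in isolation. The Python sketch below covers only that step; the instantaneous-frequency estimation and phase-vocoder resynthesis are omitted and replaced by a placeholder that passes each segment through unchanged. The sampling rate (22050 Hz), the 128-sample segment (about 5.8 ms at that rate), the 50% overlap, and the Hann window are all assumptions, since the source gives only the segment duration.

```python
import math

FS = 22050      # assumed sampling rate in Hz (not stated in the source)
N = 128         # 128 / 22050 s is about 5.8 ms
HOP = N // 2    # assumed 50% overlap

# Periodic Hann window: copies shifted by N/2 sum to exactly 1,
# so unmodified segments add back up to the original signal.
WINDOW = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

def process_frame(frame):
    """Placeholder for the per-segment frequency remapping (identity here)."""
    return frame

def overlap_add(signal):
    """Window the signal into overlapping segments, process, and sum them."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, HOP):
        frame = [signal[start + n] * WINDOW[n] for n in range(N)]
        frame = process_frame(frame)
        for n in range(N):
            out[start + n] += frame[n]
    return out

# A 1-kHz test tone, eight segments long.
x = [math.sin(2.0 * math.pi * 1000.0 * t / FS) for t in range(8 * N)]
y = overlap_add(x)

# The first and last half-segments are covered by only one window, so compare the interior.
max_err = max(abs(a - b) for a, b in zip(x[HOP:-HOP], y[HOP:-HOP]))
print(f"largest interior reconstruction error: {max_err:.2e}")
```

In a full implementation, `process_frame` would transform each segment, move energy above the start frequency to its lower output frequency, and return the resynthesized segment before summation.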

Wide Dynamic Range Compression

To control output levels, wide dynamic range compression was simulated in MATLAB. The amplified speech was presented monaurally via circumaural BeyerDynamic DT150 headphones. Using a transfer function obtained on KEMAR, listeners’ audiometric thresholds were converted to estimated dB SPL at the tympanic membrane. These values were entered into the DSL m(I/O) v5.0a algorithm for adults, which generated individualized prescriptive values for compression threshold and compression ratio for each channel, as well as target values for the real-ear aided responses. Gain was automatically tuned to targets using the ‘carrot passage’ from Audioscan®.

Stimuli were scaled to 60 dB SPL, band-pass filtered into 8 channels, and then processed with wide dynamic range compression. Center and crossover frequencies were based on the recommendations of the DSL algorithm: 315, 500, 800, 1250, 2000, 3150, 5000, and 8000 Hz. Channels beyond the max output frequency were not amplified.

Output compression limiting was used to keep the output from exceeding recommended broadband output limiting targets (BOLT) or 105 dB SPL, whichever was less. Signals were summed across channels and subjected to a final stage of output compression limiting to control the final presentation level and prevent peak clipping.
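
The level-dependent behavior of one compression channel, followed by the output ceiling, can be sketched with a static input-output rule. This Python sketch is a simplification under stated assumptions: the gain, compression threshold, compression ratio, and BOLT values are hypothetical (in the study they came from the DSL m(I/O) prescription for each listener), attack and release dynamics are ignored, and the limiter is reduced to a hard cap on level rather than true compression limiting.

```python
def wdrc_output_db(input_db, gain_db, threshold_db, ratio):
    """Static input/output rule for one channel (levels in dB SPL)."""
    if input_db <= threshold_db:
        # Linear region: every input dB yields one output dB.
        return input_db + gain_db
    # Compression region: output grows by 1/ratio dB per input dB.
    return threshold_db + gain_db + (input_db - threshold_db) / ratio

def limit_db(output_db, bolt_db, ceiling_db=105.0):
    """Cap the level at the BOLT or at 105 dB SPL, whichever is less."""
    return min(output_db, bolt_db, ceiling_db)

# Hypothetical channel settings, for illustration only.
gain_db, threshold_db, ratio, bolt_db = 20.0, 50.0, 2.0, 108.0

for level in (40.0, 60.0, 80.0, 100.0, 140.0):
    out = wdrc_output_db(level, gain_db, threshold_db, ratio)
    print(f"input {level:5.1f} dB SPL -> WDRC {out:5.1f} -> limited {limit_db(out, bolt_db):5.1f}")
```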

Test Stimuli

Practice blocks using different talkers preceded each test block. Practice blocks had half the number of talkers as the test blocks and included feedback about the correct response (test blocks did not).

• Consonants
  – 240 nonsense syllables (vCv) presented in speech-shaped noise at 10 dB SNR
    • 20 consonants x 3 vowel contexts (/a/, /i/, /u/) x 4 adult talkers (2 males, 2 females)

• Vowels
  – 144 nonsense syllables (/hVd/) presented in speech-shaped noise at 5 dB SNR (Hillenbrand et al., 1995)
    • 12 vowels x 12 talkers (4 adult males, 4 adult females, 2 boys, 2 girls)

• Fricatives and Affricates
  – 108 nonsense syllables (/iC/) presented in speech-shaped noise at 10 dB SNR
    • 9 fricatives and affricates x 3 adult female talkers x 4 renditions
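
As a quick consistency check, the token counts for each stimulus set follow from multiplying out the factors listed above. The dictionary keys in this short Python sketch are just labels chosen for illustration.

```python
from math import prod

# Factors per stimulus set, as listed above.
designs = {
    "consonants (vCv)": (20, 3, 4),            # consonants x vowel contexts x talkers
    "vowels (hVd)": (12, 12),                  # vowels x talkers
    "fricatives/affricates (iC)": (9, 3, 4),   # sounds x talkers x renditions
}

totals = {name: prod(factors) for name, factors in designs.items()}
for name, total in totals.items():
    print(f"{name}: {total} tokens")
```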

Conditions

There was 1 control condition with no NFC and 6 experimental conditions with NFC. All test conditions were low-pass filtered at max output. A within-subjects, Latin square design was used. To help orient listeners to the tasks, all listeners had exposure to one session without any filtering or processing (wideband) before beginning the randomized test sequence.
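
One simple way to build condition orders for a Latin square design with 7 conditions is a cyclic square, sketched below in Python. The source does not say how its square was constructed, so this is an assumption made for illustration; a cyclic square guarantees that each condition appears once in every serial position but does not balance carryover effects. The condition labels and the rule for assigning listeners to rows are hypothetical.

```python
def cyclic_latin_square(n):
    """n x n square in which each symbol occurs once per row and once per column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]

# 1 control plus 6 NFC conditions; the labels are placeholders.
conditions = ["control"] + [f"NFC{i}" for i in range(1, 7)]

square = cyclic_latin_square(len(conditions))
orders = [[conditions[k] for k in row] for row in square]

def order_for_listener(listener_index):
    """Hypothetical assignment: listeners cycle through the rows of the square."""
    return orders[listener_index % len(orders)]

for i in range(3):
    print(f"listener {i + 1}: {order_for_listener(i)}")
```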

[Figure: NFC conditions for each group. Panels: Moderately-Severe to Profound; Mild to Moderate]

Results

For each of the figures below, proportion correct for each condition is assessed against performance for the low-pass filtered controls (purple dotted line). Error bars indicate the 95% confidence interval of the difference, after Bonferroni correction (* for p ≤ 0.05, ** for p ≤ 0.01, *** for p ≤ 0.001). “Best NFC Setting” corresponds to the highest performance across the NFC conditions for each listener.
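
The Bonferroni step and the star notation can be sketched as follows in Python. The number of comparisons (six, one per NFC condition against the control) is an assumption about how the correction was applied, and the p-values are made up purely to exercise the code; they are not results from this study.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def stars(p):
    """Star notation used in the figures."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""

def ci_level(alpha, m):
    """Per-comparison confidence level that gives a familywise level of 1 - alpha."""
    return 1.0 - alpha / m

# Made-up uncorrected p-values for six comparisons against the control.
raw_p = [0.0001, 0.004, 0.02, 0.3, 0.008, 0.6]
adjusted = bonferroni(raw_p)
labels = [stars(p) for p in adjusted]

for p, q, s in zip(raw_p, adjusted, labels):
    print(f"raw p = {p:.4f}, corrected p = {q:.4f} {s}")
print(f"per-comparison confidence level: {ci_level(0.05, len(raw_p)):.4f}")
```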

[Figure panels, one column per group: Moderately-Severe to Profound; Mild to Moderate]

Moderately-Severe to Profound: Within-subjects ANOVA indicated that start, BW, and their interaction were statistically significant. The effect of BW depended on the start. There was no effect of BW (CR) for the 2239-Hz start, but for the 1550-Hz start, performance for the two larger input BWs (and thus higher CRs) was significantly worse than for the 4996-Hz input BW.

Mild to Moderate: Within-subjects ANOVA indicated that start and BW were both statistically significant. Performance for the smaller BW (lower CRs) was significantly better than for the larger BW (higher CRs). Performance for the lowest start was significantly worse than for the other two.

Moderately-Severe to Profound: Within-subjects ANOVA indicated that start, BW, and their interaction were statistically significant. The effect of BW depended on the start. There was no effect of BW (CR) for the 2239-Hz start, but for the 1550-Hz start, performance for the 9130-Hz input BW was significantly worse than for the 7063-Hz input BW.

Mild to Moderate: Within-subjects ANOVA indicated that only start was statistically significant. Performance for the lowest start was significantly worse than for the other two.

Moderately-Severe to Profound: Within-subjects ANOVA indicated that there were no statistically significant main effects or interaction.

Mild to Moderate: Within-subjects ANOVA indicated that only start was statistically significant. Performance for the 1550-Hz start was significantly worse than for the 2756-Hz start.

Feature Analysis (VCV)

Plotted below are the differences in relative information transmitted for each condition and feature compared to the low-pass filtered controls.
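
Relative information transmitted is conventionally computed from a stimulus-response confusion matrix collapsed by feature (Miller and Nicely, 1955). The source does not spell out its computation, so the Python sketch below assumes that standard definition: transmitted information divided by the stimulus entropy. The 2 x 2 voicing matrices are hypothetical counts used only to show the calculation.

```python
import math

def relative_info_transmitted(confusions):
    """Transmitted information T(x;y) divided by stimulus entropy H(x).

    confusions[i][j] is the number of times stimulus category i
    drew response category j.
    """
    total = sum(sum(row) for row in confusions)
    n_cols = len(confusions[0])
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(row[j] for row in confusions) / total for j in range(n_cols)]

    transmitted = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                transmitted += p * math.log2(p / (row_p[i] * col_p[j]))

    stimulus_entropy = -sum(p * math.log2(p) for p in row_p if p > 0)
    return transmitted / stimulus_entropy

# Hypothetical voicing confusions (rows: voiced/voiceless stimuli; columns: responses).
control = [[40, 20], [20, 40]]
nfc = [[50, 10], [10, 50]]

difference = relative_info_transmitted(nfc) - relative_info_transmitted(control)
print(f"difference in relative information transmitted: {difference:+.3f}")
```

A positive difference means the feature was transmitted better in the NFC condition than in the control; a negative difference means it was transmitted worse.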

[Figure panels, one per group: Moderately-Severe to Profound; Mild to Moderate]

Moderately-Severe to Profound: A lower start substantially increased errors for place of articulation and nasality, especially with larger BWs (higher CRs).

Mild to Moderate: Interestingly, place of articulation and voicing were better for the two higher starts despite no significant difference in overall performance compared to the control. This indicates that errors made with NFC were more systematic, while errors for the control were more random.

/ʃ/ for /s/ Confusions

[Figure panels, one per group: Moderately-Severe to Profound; Mild to Moderate]

Moderately-Severe to Profound: Compared to the low-pass filtered control, /ʃ/ for /s/ errors were substantially increased for all conditions, especially with the lower start and the larger BWs (higher CRs).

Mild to Moderate: /ʃ/ for /s/ errors were comparable to the low-pass filtered control, except when the start was low and the BW (CR) was large.

Personal Best NFC Settings

Plotted for each set of test stimuli are the number of listeners who had their best performance at each of the NFC conditions. Listeners participated in an additional 1-hour task that involved /s/-/ʃ/ discrimination for each of the NFC conditions. The conditions that yielded best performance on this task did NOT predict the conditions that yielded best performance in the main experiment.

[Figure panels, one per group: Moderately-Severe to Profound; Mild to Moderate]

Discussion

Overall, the results demonstrate that improvements in fricative/affricate identification should be expected when using NFC for a variety of hearing losses. However, in some cases this might come at the expense of a decrease in vowel and non-fricative consonant identification. The results also indicate that low start frequencies should be avoided and that, in cases where the bandwidth of audibility is restricted, it is better to accept an increase in CR (which reduces spectral resolution of the lowered signal) in exchange for a higher start frequency. When this happens, a slightly lower CR can be maintained by bringing less high-frequency information down into the range of audibility. This strategy can help preserve vowel and non-fricative consonant identification. However, if the reduction in high-frequency information is too great, fricative/affricate identification might not be optimized. In cases where the bandwidth of audibility is less restricted, attempts should be made to keep the start frequency above the range of most second formants. If this is done and if a sufficient amount of high-frequency information is brought down, CR seems to be less important. Finally, attempts to identify the best NFC setting on an individual basis using a /s/-/ʃ/ discrimination task, or rules that simply maximize input bandwidth, are limited. Recommendations also need to consider the start frequency, the CR, and the interaction between the two.