NOISE ROBUST ALGORITHMS TO IMPROVE CELL PHONE SPEECH
INTELLIGIBILITY FOR THE HEARING IMPAIRED
By
MEENA RAMANI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2008
To Appapa, Ammama, Appa, Amma and Hari
I dedicate this dissertation to my incredible family who have been a constant source of
support and inspiration.
ACKNOWLEDGMENTS
I would like to thank my advisor Dr. John G. Harris for his encouragement, patience
and guidance. He taught me to ask the right questions and get to the root of the problem
and that is something I will always be grateful for. I also thank him for making the hybrid
group a home away from home for all of us.
I would like to thank Dr. Alice E. Holmes for meeting with me every week and
helping me understand the fascinating field of audiology. I also thank Dr. Holmes for
access to the Shands speech and hearing clinic where I met amazing people who further
strengthened my resolve to work on hearing loss compensation.
I would like to thank Dr. Hans van Oostrom and Dr. Clint Slatton for being part of
my committee and providing me with helpful insights. I would like to thank the Motorola
iDEN group for funding the research in Chapters 2 and 3.
Over the course of my inter-disciplinary research, I had the opportunity to work with
several audiology students who have helped me look at hearing loss from a non-engineering
perspective. I thank Sharon Powell, Ryan Baker, Shari Kwon and Brittany Sakowicz for
that. I also thank them for helping me run the subjective evaluation tests and for helping
me collect the hearing aid fitting data.
I feel extremely blessed to be part of the hybrid group where I get to interact with
brilliant people on a day to day basis. Apart from being extremely knowledgeable
researchers, they are also some of the nicest people I have met. I thank Kwansun,
Xiaoxiang, Jeremy, Ismail, Mark, Harsha, Du, Christy and the many others for making my
PhD life extra special.
Finally, I would like to thank my family for their unwavering faith in me.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
CHAPTER
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1 Sensorineural Hearing Impairment . . . . . . . . . . . . . . . . . . . . . 14
    1.1.1 Causes of Sensorineural Hearing Loss . . . . . . . . . . . . . . . . 15
    1.1.2 Perceptual Measure of Sensorineural Hearing Loss . . . . . . . . . . 15
    1.1.3 Characteristics of Sensorineural Hearing Loss . . . . . . . . . . . . 16
    1.1.4 Modeling Sensorineural Hearing Loss . . . . . . . . . . . . . . . . . 18
1.2 Speech Intelligibility and Quality . . . . . . . . . . . . . . . . . . . . 19
    1.2.1 Factors Influencing Speech Intelligibility and Quality . . . . . . . 19
    1.2.2 Speech Intelligibility Measures . . . . . . . . . . . . . . . . . . . 20
    1.2.3 Speech Quality Measures . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Cell Phone Speech Intelligibility . . . . . . . . . . . . . . . . . . . . . . . . 21
2 HEARING LOSS COMPENSATION ALGORITHMS . . . . . . . . . . . . . . . 31
2.1 Review of Existing Hearing Loss Compensation Algorithms . . . . . . . . . 31
    2.1.1 Threshold-Only Gain Prescription Procedures . . . . . . . . . . . . . 32
    2.1.2 Suprathreshold Gain Prescription Procedures . . . . . . . . . . . . . 33
2.2 Development of Recruitment Based Compensation . . . . . . . . . . . . . . 34
2.3 Parameter Analysis of RBC . . . . . . . . . . . . . . . . . . . . . . . . 36
    2.3.1 Dynamic Constants of Compression . . . . . . . . . . . . . . . . . . 36
    2.3.2 Filter Bank Analysis . . . . . . . . . . . . . . . . . . . . . . . . 38
    2.3.3 Real-Time Implementation Issues . . . . . . . . . . . . . . . . . . . 39
2.4 Performance Analysis of the RBC Algorithm . . . . . . . . . . . . . . . . 39
    2.4.1 Performance of Algorithm in Terms of Speech Quality . . . . . . . . . 40
    2.4.2 Performance of Algorithm in Terms of Speech Intelligibility . . . . . 41
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 NOISE ROBUST HEARING ENHANCEMENT ALGORITHMS . . . . . . . . 60
3.1 Effects of Noise on Cell Phone Speech . . . . . . . . . . . . . . . . . . 60
3.2 Development of Noise Robust Recruitment Based Compensation . . . . . . . . 61
    3.2.1 Single Microphone Noise Estimation . . . . . . . . . . . . . . . . . 62
    3.2.2 Calculating the Noise Masking Threshold . . . . . . . . . . . . . . . 62
    3.2.3 Derivation of Noise Robust Recruitment Based Compensation . . . . . . 63
3.3 Performance Analysis of the NR-RBC Algorithm . . . . . . . . . . . . . . . 64
    3.3.1 Performance of Algorithm in Terms of Speech Quality . . . . . . . . . 64
    3.3.2 Performance of Algorithm in Terms of Speech Intelligibility . . . . . 65
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4 ACCLIMATIZATION MODELING FOR THE AIDED HEARING IMPAIRED 76
4.1 Development of the Fitting Satisfaction Scale . . . . . . . . . . . . . . 77
4.2 Hearing Aid Fitting Data . . . . . . . . . . . . . . . . . . . . . . . . . 78
    4.2.1 Hearing Aid Fitting Data Collection . . . . . . . . . . . . . . . . . 78
    4.2.2 Multi-Session Hearing Aid Fitting Data Analysis . . . . . . . . . . . 78
4.3 Modeling the Acclimatization Effect . . . . . . . . . . . . . . . . . . . 79
4.4 Performance Analysis of Model . . . . . . . . . . . . . . . . . . . . . . 80
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 CONCLUSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
APPENDIX
A SURVEY OF HEARING-IMPAIRED CELL PHONE USERS . . . . . . . . . . 102
A.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
    A.2.1 Cell Phone Usage . . . . . . . . . . . . . . . . . . . . . . . . . . 102
    A.2.2 Electromagnetic Interference . . . . . . . . . . . . . . . . . . . . 103
    A.2.3 Cell Phone Speech and Ringer Level . . . . . . . . . . . . . . . . . 103
    A.2.4 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . 103
B CELL PHONE HEARING EVALUATION QUESTIONNAIRE . . . . . . . . . 106
C ANALYSIS OF THE FOCUS GROUP DISCUSSIONS . . . . . . . . . . . . . . 110
C.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
C.2 Focus Group Main Themes . . . . . . . . . . . . . . . . . . . . . . . . . 110
    C.2.1 Aided Cell Phone Listening Problems . . . . . . . . . . . . . . . . . 110
    C.2.2 Ideal Hearing Aid Compatible Cell Phone . . . . . . . . . . . . . . . 111
    C.2.3 Comments on a Cell Phone Assistive Listening Device . . . . . . . . . 111
D PHYSIOLOGY OF HEARING . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
LIST OF TABLES
Table page
1-1 Mean opinion score 5 point scale . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2-1 The kf constant for POGO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2-2 The kf constant for NAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3-1 Sources of cell phone noise and noise-reduction methods . . . . . . . . . . . . . 67
3-2 Critical bands and FFT bins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4-1 Speech intelligibility based fitting satisfaction scale . . . . . . . . . . . . . . . . 81
4-2 Phonak hearing aid fitting parameters . . . . . . . . . . . . . . . . . . . . . . . 82
LIST OF FIGURES
Figure page
1-1 Effects of aging on hearing thresholds . . . . . . . . . . . . . . . . . . . . . . . . 23
1-2 Matlab audiogram graphic user interface (GUI) . . . . . . . . . . . . . . . . . . 24
1-3 Decreased audibility characteristic of sensorineural hearing loss (SNHL) . . . . . 24
1-4 Decreased dynamic range characteristic of SNHL . . . . . . . . . . . . . . . . . 25
1-5 Decreased frequency resolution characteristic of SNHL . . . . . . . . . . . . . . 25
1-6 Decreased temporal resolution characteristic of SNHL . . . . . . . . . . . . . . . 26
1-7 Spectrograms of cell phone speech for normal-hearing and simulated SNHL . . . 27
1-8 Simulated SNHL model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1-9 Speech intelligibility (SI) as a function of bandwidth . . . . . . . . . . . . . . . 28
1-10 Hearing in noise test (HINT) Matlab GUI . . . . . . . . . . . . . . . . . . . . . 29
1-11 Speaker response for the Motorola i265 . . . . . . . . . . . . . . . . . . . . . . . 29
1-12 Mean opinion score (MOS) speech quality ratings for cell phone vocoders . . . . 30
1-13 Nature of cell phone hearing problems . . . . . . . . . . . . . . . . . . . . . . . 30
2-1 Classification of existing hearing aid fitting methods . . . . . . . . . . . . . . . . 43
2-2 Gains prescribed by the Fig6 method . . . . . . . . . . . . . . . . . . . . . . . . 44
2-3 Input-Output curve at 2 kHz obtained from the visual input output locator . . . 44
2-4 Variation of desired sensation level (DSL) prescribed gain with hearing loss . . . 45
2-5 Recruitment based compensation system . . . . . . . . . . . . . . . . . . . . . . 45
2-6 Computation of gain based on loudness recruitment . . . . . . . . . . . . . . . . 46
2-7 Estimated dependence of recruitment range on hearing loss . . . . . . . . . . . 46
2-8 Compression input-output and gain curves . . . . . . . . . . . . . . . . . . . . 47
2-9 Effect of variation of filter bank size on speech intelligibility . . . . . . . . . . . 48
2-10 Average MOS scores for the hearing-impaired . . . . . . . . . . . . . . . . . . . 49
2-11 Audiogram on the phone Java midlet . . . . . . . . . . . . . . . . . . . . . . . . 50
2-12 Audiograms of all the hearing-impaired listeners . . . . . . . . . . . . . . . . . . 50
2-13 Hearing loss simulation system . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2-14 The PESQ objective speech quality score for normal-hearing and hearing-impaired 51
2-15 Spectrogram of SNHL and linear-amplified speech . . . . . . . . . . . . . . . . . 52
2-16 Spectrogram of normal-hearing and linear-amplified speech . . . . . . . . . . . 53
2-17 Average MOS scores for the hearing-impaired . . . . . . . . . . . . . . . . . . . 54
2-18 Average MOS scores for the normal-hearing . . . . . . . . . . . . . . . . . . . . 55
2-19 Speech intelligibility index (SII) scores for normal-hearing as a function of SNR 56
2-20 Average HINT scores of the hearing-impaired for wide band speech . . . . . . . 57
2-21 Average HINT scores of the hearing-impaired for cell phone speech . . . . . . . 58
2-22 Average HINT scores of the normal-hearing for cell phone speech . . . . . . . . 59
3-1 Noise robust recruitment based compensation (NR-RBC) system . . . . . . . . . 68
3-2 The PESQ objective speech quality score for various HA fitting algorithms . . . 69
3-3 Spectrogram of SNHL and linear-amplified speech . . . . . . . . . . . . . . . . 70
3-4 Spectrogram of normal-hearing and linear-amplified speech . . . . . . . . . . . 71
3-5 Average NR-RBC MOS scores for the hearing-impaired listener . . . . . . . . . 72
3-6 Average NR-RBC MOS scores for the normal-hearing listener . . . . . . . . . . 73
3-7 The SII scores for simulated normal-hearing as a function of SNR . . . . . . . . 74
3-8 The HINT scores for NR-RBC for hearing-impaired . . . . . . . . . . . . . . . . 74
3-9 The HINT scores for NR-RBC for normal-hearing . . . . . . . . . . . . . . . . . 75
4-1 Comparison of Claro gain parameters from initial to final fitting . . . . . . . . . 83
4-2 Comparison of Savia gain parameters from initial to final fitting . . . . . . . . . 84
4-3 Comparison of Savia compression parameters from initial to final fitting . . . . . 85
4-4 Comparison of Savia compression parameters from initial to final fitting . . . . . 86
4-5 Phonak Savia maximum trend in fitting parameter variation . . . . . . . . . . . 87
4-6 Phonak Claro maximum trend in fitting parameter variation . . . . . . . . . . . 87
4-7 Phonak Extra maximum trend in fitting parameter variation . . . . . . . . . . . 88
4-8 Phonak Valeo maximum trend in fitting parameter variation . . . . . . . . . . . 89
4-9 Phonak Eleva maximum trend in fitting parameter variation . . . . . . . . . . . 90
4-10 Phonak Perseo maximum trend in fitting parameter variation . . . . . . . . . . 91
4-11 Structure of the MLP used to model multi-session fitting trends . . . . . . . . . 92
4-12 Phonak Savia neural network modeling results for 40dB gain parameter . . . . . 93
4-13 Phonak Savia neural network modeling results for 60dB gain parameter . . . . . 94
4-14 Phonak Savia neural network modeling results for 80dB gain parameter . . . . . 95
4-15 Phonak Savia neural network modeling results for CR parameter . . . . . . . . . 96
4-16 Phonak Savia neural network modeling results for TK parameter . . . . . . . . 97
4-17 Phonak Savia neural network modeling results for MPO parameter . . . . . . . 98
A-1 Degree of hearing impairment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A-2 Degree of hearing impairment for survey participants . . . . . . . . . . . . . . . 105
D-1 Structure of the human ear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
D-2 Organ of corti . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
D-3 Electron micrograph of the organ of corti . . . . . . . . . . . . . . . . . . . . . . 115
D-4 Frequency sensitivity of the basilar membrane . . . . . . . . . . . . . . . . . . . 116
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
NOISE ROBUST ALGORITHMS TO IMPROVE CELL PHONE SPEECH
INTELLIGIBILITY FOR THE HEARING IMPAIRED
By
Meena Ramani
May 2008
Chair: John G. Harris
Major: Electrical and Computer Engineering
Cell phone speech can lead to a difficult listening environment because of the
environmental noise, the reduced bandwidth, the packet drop-offs and the vocoder
artifacts. This is especially true for hearing-impaired listeners who require a 9 dB
improvement in signal to noise ratio (SNR) compared to normal-hearing listeners in
order to understand speech in noise. This research explored various means to improve cell
phone speech intelligibility for the hearing-impaired and resulted in the development of
three novel hearing enhancement algorithms.
The first algorithm we developed is the recruitment based compensation (RBC)
fitting method. RBC is a hearing enhancement algorithm aimed at improving speech
intelligibility (SI) for unaided listeners with sensorineural hearing loss. It is a fitting
algorithm which adjusts the gain parameters of the cell phone based on the individual's
threshold of hearing. It provides multiple band gain and compression to make cell phone
speech audible and within the reduced dynamic range of the hearing-impaired individual.
Subjective hearing in noise tests (HINT) run on hearing-impaired subjects reveal that
RBC shows a 15 dB improvement in SNR when compared to linear amplification which
is typical of the cell phone volume control. RBC also shows a 6 dB improvement in SNR
when compared to the desired sensation level (DSL) fitting method which is a popular
audiology option.
The second algorithm we developed is the noise robust recruitment based
compensation (NR-RBC) algorithm. NR-RBC is derived from RBC but uses the masked
thresholds in noise instead of the thresholds in quiet. NR-RBC provides hearing loss
compensation and automatic volume control in noisy environments. The objective speech
intelligibility index (SII) scores indicate that NR-RBC makes more of the speech usable when
compared to all the other fitting methods. Both RBC and NR-RBC received a speech
quality mean opinion score (MOS) of “Good.”
Though RBC and NR-RBC were designed with the hearing-impaired in mind, both
algorithms also prove beneficial to normal-hearing listeners with slight modifications.
For normal-hearing listeners, RBC yielded a 3 dB improvement in SNR when compared
to DSL, NR-RBC a 13 dB improvement in SNR, and both earned a speech quality rating
of “Good.”
For the aided hearing-impaired population, the hearing aid fitting acclimatization
method was developed to improve speech intelligibility. Acclimatization occurs because
of the plasticity of the auditory cortex. Acclimatization modeling was carried out using
neural networks which were trained with multi-session Phonak hearing aid fitting data.
This method is to be used in conjunction with existing hearing loss fitting algorithms and
predicts the effect of hearing aid acclimatization. The mean square error (MSE) between
the predicted values and the optimal values, averaged across the parameters, is lower than
the corresponding error for the initial settings.
CHAPTER 1
INTRODUCTION
The sense of hearing plays a pivotal role in human interaction and communication.
Acoustic pressure waves are transduced by the cochlea into electrical neural signals
which are processed by the brain to provide a meaningful cognitive experience. Hearing
impairment can reduce the ability to communicate successfully. The inability to
understand what is being said can result in social and emotional isolation [1].
The telephone, one of the most important inventions of the 19th century, was the result
of Alexander Graham Bell’s work on communication devices for the hearing-impaired.
Telephones have now become an integral part of human communication and provide
easy means of long-distance communication. The invention of the wireless cell phone has
made communication easier still. Cell phones are the modern day Swiss Army
knives and are packed with a myriad of hardware and software functionalities. As of June
2007, there are 243 million [2] cell phone subscribers in the United States and this number
is growing.
In the United States alone there are 28 million [3] people who are hearing-impaired.
Yet fewer than 8% of them use hearing aids, though they could obtain significant improvement
with them. This is mainly because of the high costs and the stigma attached to using
hearing aids. Studies have shown that hearing-impaired listeners require a 9 dB
improvement in signal to noise ratio (SNR) [4] when compared to normal-hearing
listeners in order to understand conversational speech. Hearing aids can help satisfy
this requirement to a certain extent. Unfortunately hearing aids and cell phones are not
completely compatible because of electromagnetic interference (EM) [5]. The amount of
interference depends on the amount of radio-frequency (RF) emission produced by the
particular cell phone and the immunity of the particular hearing aid. The IEEE C63.19
standard [6] provides a rating scale which serves as a measure of the compatibility between
cell phones and hearing aids. Consumers can look for this rating while purchasing a cell
phone or hearing aid. Appendix A has the results of a survey conducted at University
of Florida which indicates that in order to avoid the EM interference, most aided
hearing-impaired listeners prefer to remove their hearing aid in order to use the cell
phone.
Cell phone speech can sometimes be difficult to understand because of environmental
noise, the reduced signal bandwidth (300–3400 Hz), packet drop-offs and vocoder
artifacts. The environmental noise masks the speech while the reduced signal bandwidth
and vocoder artifacts result in a loss of naturalness and intelligibility [7]. Hearing-impaired
listeners often find cell phone speech to be unintelligible. Modern hearing aids are
extremely low-power digital signal processor (DSP) based systems that provide gain
and compression based on the individual's hearing loss through a process referred to as
hearing-aid fitting. In addition, hearing aid DSPs also run feedback cancellation and
noise reduction algorithms. In order to improve cell phone speech intelligibility for the
hearing-impaired, powerful hearing enhancement algorithms can be run on the cell phones.
This chapter will provide an introduction to sensorineural hearing-impairment and cell
phone speech intelligibility.
1.1 Sensorineural Hearing Impairment
Hearing impairment can be categorized both according to the type and the severity
of the loss. The loss can be conductive or sensorineural. In conductive loss, the acoustical
energy is attenuated uniformly by the outer and middle ears before reaching the cochlea.
The signal processing solution for conductive loss is linear amplification. Sensorineural
hearing loss occurs as a result of damage to the outer hair cells (OHC) and inner hair
cells (IHC) of the cochlea [8]. Because of the nonlinear nature of this loss, simple linear
amplification will not restore normal hearing. Hearing loss can be categorized based on
the severity as mild (25–40 dB HL), moderate (40–70 dB HL), severe (70–95 dB HL) and
profound (≥95 dB HL). Hearing aids can help people with mild to severe hearing loss.
Hearing aid algorithms attempt to imitate the action of the OHCs, replacing the damaged
or dead OHCs [9]. Cochlear implants have to be used if there is significant IHC loss, as is the
case with profound hearing loss. Appendix D provides a short description of how we hear
and describes the roles of the IHCs and the OHCs.
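As a concrete illustration, the severity categories above can be expressed as a small lookup. This is a sketch of the categorization only: the function name and boundary handling are our own, and a threshold sitting exactly on a boundary (e.g. 40 dB HL) is assigned here to the higher category.

```python
def classify_hearing_loss(threshold_db_hl):
    """Categorize a single audiogram threshold (in dB HL) by severity.

    Ranges follow the text: mild (25-40 dB HL), moderate (40-70),
    severe (70-95), profound (>= 95); thresholds at or below
    20 dB HL count as normal hearing.
    """
    if threshold_db_hl <= 20:
        return "normal"
    if threshold_db_hl < 25:
        return "slight"     # gap between normal and mild in the text
    if threshold_db_hl < 40:
        return "mild"
    if threshold_db_hl < 70:
        return "moderate"
    if threshold_db_hl < 95:
        return "severe"
    return "profound"
```

Per the text, hearing aids are typically useful for the "mild" through "severe" outcomes, while "profound" losses call for cochlear implants.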
1.1.1 Causes of Sensorineural Hearing Loss
Hearing loss due to aging also known as presbycusis is the most common type of
sensorineural hearing loss. It is a predominantly high frequency loss. Figure 1-1 shows
the effects of aging on the thresholds of hearing. Presbycusis occurs due to wear and tear
of the hair cells of the cochlea. Losses up to 60 dB HL can be assumed to be caused by
damage to the OHCs. For losses greater than 80 dB HL, both the IHCs and OHCs have to
be damaged.
Sensorineural hearing loss caused due to exposure to loud sounds is called noise
induced hearing loss (NIHL) [10]. Sounds at high intensities fatigue the hair cells of the
cochlea and depending on the duration of exposure this may cause permanent damage.
Portable audio devices like iPods can produce sound levels which can cause irreversible
damage even when played for only a couple of minutes [11]. Cell phones and Bluetooth headsets
also produce sound levels which can cause considerable damage. Recently there has been
a lot of effort on the part of portable audio device manufacturers to educate the public on
safe listening practices. Safe listening levels for music and speech have been estimated [11]
using existing noise exposure standards [12], [13].
1.1.2 Perceptual Measure of Sensorineural Hearing Loss
Sensorineural hearing loss can be measured using several perceptual tests. The most
commonly used one is the audiogram [14]. The audiogram measures the threshold of
hearing in quiet. It is obtained by playing pure tones or narrow bands of noise, typically
between 250–8000 Hz, at various intensity levels until each stimulus is just audible.
The thresholds of hearing thus obtained are compared to the average normal hearing
thresholds and the difference is reported in dB HL (hearing level). People with ‘perfect’
hearing will have an audiogram of 0 dB HL. Normal hearing is defined as having all points
on an audiogram at or below 20 dB HL. Figure 1-2 shows an audiogram of a person with
a mild hearing loss measured using a Matlab GUI. On average, a pure-tone audiogram
takes 5 minutes to measure.
1.1.3 Characteristics of Sensorineural Hearing Loss
Sensorineural hearing loss is characterized by four main effects: decreased audibility,
decreased dynamic range or loudness recruitment, decreased frequency resolution and
decreased temporal resolution [15], [8].
Decreased audibility. Sensorineural hearing loss results in the decreased audibility
of high frequencies. This is because the basal OHCs, which lie closest to the oval window,
are the first to wear out. Appendix D describes the mechanics behind how we hear and
how hearing loss occurs. Figure 1-3 shows the hearing thresholds for a hearing-impaired
and a normal-hearing listener and it can be seen that the hearing-impaired listener has
higher thresholds of hearing especially at high frequencies.
This decreased audibility results in low speech intelligibility because the consonants
and the second and third formants of speech will not be audible. Since the loudness of speech
is dominated by the low frequency components, the hearing-impaired listeners do not
realize that they are missing out on part of the signal [16]. Even though cell phone speech
is band limited to 300–3400 Hz, for 90% of hearing-impaired listeners the degree of hearing
loss worsens from 500 Hz–4 kHz [17] and this detrimentally affects the cell phone speech
intelligibility. The audiogram provides a direct measure of the decreased audibility and
is used in all hearing aid fitting algorithms. Decreased audibility can be compensated by
providing a frequency dependent gain.
Loudness recruitment. The uncomfortable listening level (UCL) is the level at
which the sound is painful to listen to. For conductive hearing loss, the threshold of
hearing and the UCL increase by the same amount. For sensorineural hearing loss only the
threshold of hearing increases. This implies that sound levels which are uncomfortable
for normal-hearing listeners are also uncomfortable for sensorineural hearing loss
listeners [18]. This results in a decreased dynamic range of speech and this phenomenon
is called loudness recruitment. Loudness recruitment is measured using loudness scaling
experiments. Figure 1-4 shows typical loudness growth curves measured using a six point
loudness scale for a normal-hearing and a hearing-impaired listener. Decreased dynamic
range can be compensated by providing compression.
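The idea of compensating a reduced dynamic range with compression can be sketched as a linear-in-dB mapping that squeezes the normal range between threshold and UCL into the impaired listener's narrower range. The function below is an illustrative sketch, not the RBC prescription; the default levels (0 dB HL normal threshold, 50 dB HL impaired threshold, 100 dB UCL) are assumptions chosen for the example.

```python
def compress_db(level_db, normal_thr=0.0, impaired_thr=50.0, ucl=100.0):
    """Map an input level (dB) from the normal dynamic range
    [normal_thr, ucl] onto the impaired range [impaired_thr, ucl].

    The slope (< 1) is the dB-out per dB-in; its reciprocal
    is the compression ratio.
    """
    slope = (ucl - impaired_thr) / (ucl - normal_thr)
    return impaired_thr + (level_db - normal_thr) * slope

# A level midway through the normal range lands midway through
# the impaired range:
print(compress_db(50.0))   # 75.0
```

Note how the mapping pins both ends: the normal threshold maps to the impaired threshold, and the UCL maps to itself, consistent with recruitment (loud sounds remain equally loud for both listeners).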
Decreased frequency resolution. Decreased frequency resolution [19] refers to
the decrease in frequency sensitivity and frequency selectivity [20]. The OHCs increase
the sensitivity of the cochlea to the particular frequency that the portion of the basilar
membrane is tuned to. When the OHCs are damaged this sensitivity decreases. Frequency
resolution can be measured using psychoacoustic tuning curves. Psychoacoustic curves
are measured by playing an audible pure tone (probe) and varying the level of a narrow
band of noise (masker) until the tone is barely audible. Figure 1-5 shows the psychoacoustic
curves for a normal-hearing and hearing-impaired listener for a 4 kHz tone with a 40 dB
masker.
The tuning curve for the hearing-impaired listener is flat and broad (Figure 1-5) [21].
Because of this, the high energy, low frequency parts of speech will mask more of the
weaker high frequency components. This is known as upward spread of masking [22].
Most environmental noise is low frequency, and because of the upward spread of masking,
hearing-impaired listeners have a difficult time understanding speech in noise. Also, it has
been shown that at high intensity levels even normal-hearing listeners have poor frequency
resolution because of saturation of the hair cells. Hearing-impaired listeners must
always listen at high intensity levels, and this further worsens their frequency resolution [23].
Decreased frequency resolution can be compensated for to a certain extent by using sharp
and narrow filter banks while processing the speech.
Decreased temporal resolution. Temporal resolution refers to the ability to
distinguish consecutively occurring sounds. Speech has a lot of temporal intensity
variations and often the intense sounds can mask the weak sounds which occur immediately
after it. This effect is more pronounced for those with hearing impairment [24]. While
listening to speech in a noisy environment, normal-hearing listeners extract most
information from speech when the noise is low in magnitude. But because of reduced
temporal resolution, these speech regions will be masked for the hearing-impaired [25].
Temporal resolution is measured using psychoacoustic tuning curves. Figure 1-6 shows the
psychoacoustic curves for the normal-hearing and hearing-impaired for a 4 kHz probe tone
with a 40 dB masker. Decreased temporal resolution can be compensated by varying the
gain so as to get normal masking threshold.
1.1.4 Modeling Sensorineural Hearing Loss
Algorithms which simulate sensorineural hearing loss [26], [27], [28] help in the
development and testing of compensatory techniques. For our research, we used a model
based on both Moore [26] and Duchnowski [28]. The model simulates the decreased
audibility and the loudness recruitment aspects of hearing loss. Spectrograms of cell phone
speech at a normal conversational level for both normal-hearing and a typical mild to
severe SNHL hearing loss of [10 20 30 60 80 90] dB HL are shown in Figure 1-7.
The high frequency consonant information of speech is completely missing for the
hearing-impaired and this results in low speech intelligibility (Figure 1-7b). The high
energy low frequency part of speech is still present and makes the speech audible but
unintelligible. Figure 1-8 is the setup used to model the hearing loss.
The algorithm uses multiple filter bands and calculates the Hilbert transform for each
filter band output. The envelope of the bandlimited speech obtained from the Hilbert
transform is then raised and smoothed to obtain the effect of loudness recruitment. The
modified envelope is then multiplied with the fine structure within the original envelope,
to generate simulated lossy speech for that band. The outputs of all the filter bands are
finally summed together to get the simulated lossy speech. The Matlab simulation used
30 filter banks with center frequencies equally spaced in mel frequency between 100 Hz to
8000 Hz.
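The per-band processing described above can be sketched in a few lines of Python with scipy. This is a simplified reconstruction, not the exact model of [26], [28]: in particular, the mapping from band loss to the envelope-expansion exponent and the 50 Hz envelope-smoothing cutoff are our own assumptions.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt, sosfiltfilt

def simulate_snhl(x, fs, band_losses_db, f_lo=100.0, f_hi=8000.0):
    """Simplified SNHL simulation: split the signal into mel-spaced
    bands, expand each band's Hilbert envelope according to the
    band's hearing loss, and recombine envelope and fine structure."""
    n_bands = len(band_losses_db)
    # Mel-spaced band edges between f_lo and f_hi
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(f_lo), mel(f_hi), n_bands + 1))
    # Assumed 50 Hz low-pass for envelope smoothing
    smoother = butter(2, 50.0, fs=fs, output="sos")
    out = np.zeros_like(x, dtype=float)
    for k in range(n_bands):
        sos = butter(4, [edges[k], edges[k + 1]], btype="band",
                     fs=fs, output="sos")
        band = sosfilt(sos, x)
        env = np.abs(hilbert(band))                  # Hilbert envelope
        env = np.maximum(sosfiltfilt(smoother, env), 1e-9)
        fine = band / env                            # fine structure
        p = 1.0 + band_losses_db[k] / 100.0          # assumed loss-to-exponent map
        out += (env ** p) * fine                     # expanded envelope x fine structure
    return out
```

Raising the smoothed envelope to a power greater than one steepens the loudness growth, which is the envelope-expansion view of recruitment; summing the modified bands yields the simulated lossy speech.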
1.2 Speech Intelligibility and Quality
Speech intelligibility indicates the degree to which speech is understood by the
listener [29] and speech quality indicates whether the speech meets the expectations of the
listener. Subjective measures of evaluating speech intelligibility and quality are based on
scores obtained via listening experiments. Objective measures of intelligibility and quality
rely on signal-to-noise measurements and models of human speech perception.
1.2.1 Factors Influencing Speech Intelligibility and Quality
Bandwidth. The frequency response of speech, both its shape and bandwidth,
affects its intelligibility [30]. Measurements show that the intelligibility of speech
decreases with decreasing bandwidth. It is also important for the frequency response to
be reasonably flat throughout its range. For single words, narrow band (NB) speech yields
an accuracy of only 75%, while wide band (WB) speech results in a 97% accuracy [31].
This loss of intelligibility increases when multiple-word speech sounds are used to test
intelligibility (Figure 1-9).
Masking. Noise is any unwanted signal that interferes with speech and a decrease in
the signal-to-noise ratio is the most common cause for a decrease in speech intelligibility.
Masking is the phenomenon where the perception of speech is affected by the presence
of noise [32]. Only noise which falls within the same critical bandwidth as speech can
contribute to the masking of speech. Environmental noise is predominantly low frequency
and is a strong masker which at high sound pressure levels can mask both the speech
vowels and consonants [33], [34].
Distortion. Speech distortion is an unfavorable byproduct of certain signal
processing techniques like coding, spectral subtraction, peak clipping and compression [35].
Independent multi-band operations change the temporal and spectral envelope of speech
and this detrimentally affects the speech cues resulting in low SI [36]. To avoid audible
artifacts, multi-band techniques are usually followed by some post-processing, such as
envelope smoothing.
1.2.2 Speech Intelligibility Measures
The most commonly used subjective measure of speech intelligibility is the hearing
in noise test (HINT). The speech intelligibility index (SII) is the most commonly used
objective measure of intelligibility.
Subjective: Hearing in noise test. The hearing in noise test is a standard test of
intelligibility commonly used in audiology [37]. Listeners are placed in a 65 dBA constant
noise environment and speech sentences at various signal levels are presented to them via
headphones. The listener then has to repeat what he heard. The intensity of the next
sentence is adaptively varied by ± 2 dB or ± 4 dB based on their response. It is stipulated
that after 10 sentences, the final sentence intensity level converges to a level at which the
listener recognizes 50% of the sentences correctly. This method of scoring intelligibility is
called the reception threshold for sentences (RTS). A Matlab GUI was used to automate
the test (Figure 1-10). The result of the HINT is a SNR value based on RTS. The lower
the SNR value, the higher the speech intelligibility.
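The adaptive level tracking described above can be sketched as a simple up-down staircase. The step schedule (4 dB for the first four sentences, 2 dB afterwards) and the averaging of the later presentation levels are illustrative assumptions here, not the exact clinical HINT protocol.

```python
def hint_track(responses, start_snr=0.0, big_step=4.0, small_step=2.0, switch_after=4):
    """Up-down staircase: the presentation level drops after a correct
    response and rises after an incorrect one, with a larger step for the
    first few sentences. The RTS estimate averages the later levels."""
    snr, levels = start_snr, []
    for i, correct in enumerate(responses):
        levels.append(snr)
        step = big_step if i < switch_after else small_step
        snr += -step if correct else step
    tail = levels[switch_after:] or levels   # fall back if the run is short
    rts = sum(tail) / len(tail)
    return levels, rts
```

A run of mostly correct responses drives the level down toward the 50%-correct point; the returned average acts as the RTS estimate.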
Objective: Speech intelligibility index. The speech intelligibility index (SII)
is the ANSI S3.5-1997 standard for the objective measurement of speech understanding.
Like the articulation index, it varies in value from 0 (speech is inaudible) to 1 (speech
is audible and useful). The SII is not a direct measure of SI, but when it is used
with empirically derived transfer functions, it can be translated to a speech recognition
percent-correct score. SII and speech understanding have a monotonic relationship: the
higher the SII value, the higher the speech understanding. The SII is calculated as shown in
Equation 1–1.
SII = \sum_{i=1}^{n} A_i I_i    (1–1)
In this formula, n refers to the number of frequency bands used which can vary from 6
octave bands to 21 critical bands. Ii is the band importance function and Ai refers to the
band audibility, which ranges from 0 to 1 and indicates the proportion of speech cues that
are audible in a given frequency band. Details about Ii and Ai are available in the ANSI
SII standard [38].
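Given per-band audibility and importance values, Equation 1–1 reduces to a weighted sum. A minimal sketch follows; the importance values below are illustrative only, not the ANSI band-importance table.

```python
def speech_intelligibility_index(audibility, importance):
    """SII = sum over bands of A_i * I_i (Equation 1-1)."""
    assert len(audibility) == len(importance)
    return sum(a * i for a, i in zip(audibility, importance))

# Toy six-band example: A_i is the audible proportion of speech cues per
# band; I_i weights each band (illustrative values, not the ANSI table).
A = [1.0, 1.0, 0.8, 0.5, 0.2, 0.0]
I = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]
sii = speech_intelligibility_index(A, I)
```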
1.2.3 Speech Quality Measures
The mean opinion score (MOS) is the standard listening test used to measure speech
quality. The perceptual evaluation of speech quality (PESQ) score is an ITU standard
which provides an objective measure of speech quality.
Subjective: Mean opinion score. The mean opinion score is a subjective listening
test where sentences are played to the listener at a comfortable listening level. The listener
then rates the quality of the sentences using a 5 point scale, as shown in Table 1-1. A
Matlab GUI was created to automate the MOS test.
Objective: Perceptual evaluation of speech quality. The perceptual evaluation
of speech quality is the ITU-T P.862 [39] recommended standard for the objective
measurement of the speech quality of narrow band systems. PESQ compares the original
signal to a modified version of the same and predicts the perceived quality that would be
given to the modified signal by subjects in a subjective listening test. The range of the
PESQ score is -0.5 (extremely low quality) to 4.5 (excellent quality).
1.3 Cell Phone Speech Intelligibility
The telephone bandwidth was restricted to 300–3400 Hz more than 60 years ago
because of the limitations of the transducers then available. Even though the present day
transducers operate on a wider frequency band, cell phone speech is still restricted to
the narrow 3 kHz bandwidth because of all the existing NB infrastructure. This reduced
cell phone bandwidth makes it difficult to distinguish between consonants like ‘f’ and ‘s’
because the distinguishing F2 information lies above 3 kHz. The elimination of frequencies
below 250 Hz results in a loss in naturalness and comfort [7]. Figure 1-11 shows the
frequency response of the Motorola i265 cell phone loudspeaker. It can be noted that the
response is not flat across frequencies, and this further results in a loss in intelligibility.
Overall, the frequency response of the cell phone, both the bandwidth and the shape,
results in speech with reduced quality and intelligibility.
In addition, cell phone speech also has vocoder artifacts. In any vocoder,
the input speech is first divided into overlapping frames. A set of model parameters is
then estimated for each frame, quantized and then transmitted. At the receiver, the
decoder reconstructs the model parameters and uses them to generate a synthetic speech
signal. The advanced multi-band excitation (AMBE) vocoder is commonly used with
Motorola handsets [40]. Figure 1-12 shows the MOS speech quality ratings for the most
commonly used vocoders. Depending on the data rate, AMBE has an average MOS score
of 3.2 to 3.7.
Clarity and the EAR Foundation conducted a research study among a random group
of 458 baby boomers between the ages of 41 and 60 [41]. 53% of the baby boomers reported
having at least a ‘mild’ loss, and over 57% had trouble hearing on their
cell phones. 40% of those who had problems using the cell phone said they would use the
cell phone more often if they could hear conversations more clearly.
Figure 1-13 lists the nature of the cell phone hearing problems.
In order to better understand the cell phone needs of the hearing-impaired, focus
groups and surveys on cell phone hearing were carried out at the University of Florida.
A ‘Cell phone hearing evaluation’ questionnaire, available in Appendix B, was created
and handed out to 84 patients at the Shands speech and hearing clinic in Gainesville.
Appendix A has the results from the questionnaire-based survey and Appendix C discusses
the main themes observed at the two focus groups that were conducted. The results from
both indicate the necessity of running algorithms on the cell phone in order to
enhance hearing and improve speech intelligibility.
Table 1-1: Mean opinion score 5 point scale
MOS  Quality
1    Bad
2    Poor
3    Fair
4    Good
5    Excellent
Figure 1-1. Effects of aging on hearing thresholds [42]
Figure 1-2. Matlab Audiogram GUI for a mild hearing loss
Figure 1-3. Thresholds of hearing for the normal-hearing and hearing-impaired
Figure 1-4. Loudness growth curve for the normal-hearing and hearing-impaired
Figure 1-5. Psychoacoustic tuning curve showing decreased frequency resolution
Figure 1-8. Simulated sensorineural hearing loss model
Figure 1-9. Speech intelligibility measured using articulation index as a function of bandwidth [31]
Figure 1-10. Hearing in Noise Test Matlab GUI
Figure 1-11. Speaker response for the Motorola i265
Figure 1-12. MOS speech quality ratings for cell phone vocoders [40]
Figure 1-13. Nature of cell phone hearing problems [41]
CHAPTER 2
HEARING LOSS COMPENSATION ALGORITHMS
Hearing aids have to be customized to each user’s unique hearing loss. This is
achieved by adjusting the gain and compression values of the hearing aid digital signal
processor (DSP) using a prescriptive algorithm. This process is referred to as hearing
aid fitting and the prescriptive algorithm used is called the hearing loss compensation
algorithm or the hearing aid fitting algorithm. As mentioned in Chapter 1 and in
Appendix D, mild to moderately severe sensorineural hearing loss (SNHL) is primarily
caused by damage to the outer hair cells of the cochlea. So in effect, the hearing loss
compensation algorithm has to imitate the outer hair cells (OHC) [9]. In order to run
hearing enhancement algorithms on the cell phone for the hearing-impaired, the DSP of
the phone has to be fit to the listener’s hearing loss. This chapter will provide a brief
review of the existing fitting algorithms and will detail the development of a new hearing
loss compensation algorithm for cell phone speech, the recruitment based compensation
(RBC) method. Speech processed by the new algorithm will be shown to have higher
intelligibility and quality than the existing methods.
2.1 Review of Existing Hearing Loss Compensation Algorithms
There are a number of existing hearing aid fitting algorithms [15] which vary in their
rationale behind gain prescription. Some algorithms prescribe gain so that the speech
is always at a most comfortable level (MCL) [43], others use loudness normalization
or loudness equalization [44] as the rationale. Loudness normalization is a means of
prescribing gain so as to make the loudness growth curve of the hearing-impaired the
same as that for normal-hearing. Loudness equalization is based on the principle of
equalizing the loudness information across frequencies. Intelligibility is assumed to be
maximized when all the bands of speech are perceived to have the same loudness [45].
Figure 2-1 shows the basic classification of the hearing aid gain fitting algorithms. All
these algorithms have been implemented in Matlab.
2.1.1 Threshold-Only Gain Prescription Procedures
The threshold-only algorithms are simple linear prescription algorithms. They provide
the same amount of gain for all input intensity levels based on the audiogram [46]. Just
mirroring the audiogram would result in an ineffective fitting, since the output would reach
uncomfortably loud levels when the input signal is high in intensity. This is because of the
decreased dynamic range aspect of SNHL. Since threshold-only algorithms do not include
compression in the prescription, they should be followed by output limiting compression to
prevent the sounds from getting too loud.
Half-Gain rule. Lybarger in 1944 made the observation that while mirroring the
audiogram resulted in an uncomfortable fit, providing half the gain of the audiogram
resulted in speech being at the most comfortable level. The formula for fitting is given by
Equation 2–1.
IGf = 0.5 Hf (2–1)
Here IGf is the gain and Hf is the frequency dependent hearing loss.
Prescription of gain and output. The prescription of gain and output (POGO) [47]
is a half-gain rule with an attenuation term at the low frequencies. This is done to decrease
the upward spread of masking. The formula for fitting is given by Equation 2–2.
IGf = 0.5 Hf + kf (2–2)
Here IGf is the gain, Hf is the frequency dependent hearing loss, and kf is as shown in
Table 2-1. POGO can be used for hearing losses up to 80 dB HL.
National acoustic lab-revised. The national acoustic lab (NAL) [48] of Australia
published the national acoustic lab-revised (NAL-R) formula in 1983. It is the most
popular of the threshold-only based fitting methods. The aim of the NAL-R procedure is
to maximize listener intelligibility at the MCL by equalizing loudness. The NAL-R fitting
formula is given by Equation 2–3. Table 2-2 indicates how the constant kf varies with
frequency.
H3FA = (H500 + H1k + H2k) / 3
X = 0.15 H3FA
IGf = X + 0.31 Hf + kf (2–3)
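The three threshold-only rules above can be sketched directly from Equations 2–1 to 2–3 and Tables 2-1 and 2-2. The example audiogram below is hypothetical and chosen only to exercise the formulas.

```python
def half_gain(hl):
    """Lybarger half-gain rule (Equation 2-1): IG_f = 0.5 * H_f."""
    return {f: 0.5 * h for f, h in hl.items()}

POGO_K = {250: -10, 500: -5, 1000: 0, 2000: 0, 4000: 0}             # Table 2-1
NAL_K = {250: -17, 500: -8, 1000: 1, 2000: -1, 3000: -2, 4000: -2}  # Table 2-2

def pogo(hl):
    """POGO (Equation 2-2): IG_f = 0.5 * H_f + k_f."""
    return {f: 0.5 * hl[f] + POGO_K[f] for f in POGO_K if f in hl}

def nal_r(hl):
    """NAL-R (Equation 2-3)."""
    h3fa = (hl[500] + hl[1000] + hl[2000]) / 3.0   # three-frequency average
    x = 0.15 * h3fa
    return {f: x + 0.31 * hl[f] + NAL_K[f] for f in NAL_K if f in hl}

# hypothetical sloping audiogram (dB HL at each frequency in Hz)
audiogram = {250: 20, 500: 30, 1000: 40, 2000: 50, 3000: 55, 4000: 60}
```

Note how POGO's negative kf values pull down the low-frequency gains relative to the half-gain rule, while NAL-R adds a term tied to the three-frequency average loss.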
2.1.2 Suprathreshold Gain Prescription Procedures
Suprathreshold fitting methods are those which prescribe both gain and compression
using both the audiogram and the loudness growth curves. Unlike threshold-only based
methods, suprathreshold methods vary the gain based on the input intensity.
Fig6 fitting method. The Fig6 [49] procedure follows the loudness normalization
rationale for medium and high level input signals. Fig6 prescribes gains for three different
input intensity levels (40 dB SPL, 65 dB SPL and 95 dB SPL) based on the audiogram
and average loudness growth data. The three levels of speech represent the different levels
of conversational speech with 40 dB SPL representing soft speech, 65 dB SPL representing
conversational speech and 95 dB SPL loud speech.
Figure 2-2 shows the targets as prescribed by Fig6. The 95 dB SPL curve provides
little gain for the low frequency sounds, which are more intense than the high frequency
sounds even for conversational level speech. The 65 dB SPL and 40 dB SPL curves
provide more gain at the high frequencies. It can be seen that the amount of gain
decreases as the input level increases.
Independent hearing aid fitting forum method. The independent hearing aid
fitting forum (IHAFF) [50] technique is based on loudness normalization and uses loudness
scaling experiments instead of average loudness growth curves. The loudness scaling
procedure used is the contour test and involves playing pulsed warble tones in ascending
order from 5 dB till the subject indicates that the stimulus is at the MCL.
At each level the subject uses a 7 point rating scale to describe its loudness. The
seven loudness categories for warble tones are condensed to three categories for speech
shown as the shaded horizontal bars in Figure 2-3. The visual input output locator
(VIOLA) program then plots for each frequency an input-output curve with 2 compression
thresholds and 2 compression ratios. An example input-output graph is shown in
Figure 2-3. The diagonal line across the graph represents the 0 dB gain. The distance
between the IHAFF prescribed targets (the asterisks) and the diagonal line is the gain to
amplify soft, average and loud input speech for that frequency.
Desired sensation level fitting method. The desired sensation level (DSL) [51],
[52] aims at making speech comfortably loud and audible. The gain for different hearing
loss and frequency as used in the DSL 4.0 computer program is shown in Figure 2-4. The
compression ratio prescribed by DSL is larger than that required to normalize loudness
and is prescribed so as to fit the extended dynamic range from the normal-hearing
threshold to the hearing-impaired UCL into the reduced dynamic range of the hearing
impaired. DSL is the most popular suprathreshold fitting method.
2.2 Development of Recruitment Based Compensation
Hearing loss compensation algorithms have to provide gains which vary with
frequency and input levels [53]. For our novel method this is achieved by using filter banks
as shown in Figure 2-5. Here S(n) is the incoming cell phone speech signal which is to be
enhanced. Processing is carried out in the time domain using a frame-by-frame approach.
S(n) is fed to a filter bank which has 14 bands with center frequencies equally spaced in
mel frequency between 300 and 3400 Hz. The gain computation block uses the energy per band
and the user’s hearing thresholds to prescribe a gain and compression term for each band
as per the RBC formula. The signals from each band are finally combined together to get
the enhanced speech signal Se(n). The RBC gain block should be followed by an output
limiting block which ensures that the sounds never get painfully loud. The compression
ratio and threshold for this stage are fixed at 10:1 and 110 dB SPL, respectively. Loudness
normalization is a method of prescribing gains so as to make the loudness growth curve
for the hearing-impaired the same as that for the normal-hearing. Figure 2-6 shows the
loudness relationship for the normal-hearing and the hearing-impaired. The blue line
shows the relationship between the sound levels judged to be at equal loudness by a
normal listener.
Tn represents the normal threshold of audibility and serves as a reference for the
typical impaired loudness growth which is shown in the red solid line. At the impaired
threshold, Ti, the perceived loudness is assumed to be equal to that of a signal at the
normal threshold Tn. At Tc, the threshold of complete recruitment, the loudness for the
impaired and the normal listener becomes the same.
In 1959, Hallpike and Hood [54] showed that the range of recruitment, Tc − Ti, is
a fairly orderly function of the hearing loss, Ti − Tn, and is independent of frequency for
unilateral hearing loss. Miskolczy-Fodor [55] further reported this behavior for presbycusis.
Both these relationships are as shown in Figure 2-7.
Let α be defined as the angle between the recruitment curve and the horizontal axis
(Figure 2-6). The relation between α and hearing loss can be described by Equation 2–4.
α = 47 + 0.45HL (2–4)
Here hearing loss HL = Ti − Tn. If R = Tc − Ti is used to represent the recruitment
range, the relation between the recruitment range and hearing loss can be described by
Equation 2–5
R = HL / (tan α − 1) (2–5)
In order to achieve loudness normalization, the algorithm amplifies the signal in
each channel such that the output level is related to the input level by the solid line.
As the level is increased, the gain decreases until at Tc the gain becomes one. From
the audiogram, we can compute α and R, and hence the gain factor per channel can
be computed. This approach, which is based on the frequency independent relationship
between recruitment and hearing loss, is called the recruitment based compensation (RBC)
method.
The gain for each channel GdB(w) is calculated as indicated by Equation set 2–6.

GdB(w) = Pout(w) − Pin(w)
Pout(w) = m(w) Pin(w) + b(w)
m(w) = R(w) / (R(w) + HL(w))
b(w) = (1 − m(w)) Tc(w)
Tc(w) = R(w) + HL(w) + Tn(w)
R(w) = HL(w) / (tan α(w) − 1)
α(w) = 47 + 0.45 HL(w) (2–6)
Here HL(w) is the hearing loss at the center frequency of each band which is obtained
by linear interpolation of the audiogram. The RBC algorithm includes compression as
part of the prescription, and the compression ratio for each channel is given by
CR(w) = 1 / m(w). Compression is restricted to between 1.1 and 3 for the hearing-impaired
in order to prevent any artifacts.
While the existing algorithms which are also based on the concept of loudness
normalization require the loudness growth curve for each frequency or the average loudness
growth curves, all the RBC method requires is the audiogram of the hearing-impaired
person.
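A per-band gain computation following Equation set 2–6 might look like the following sketch. Here α is in degrees; the guard for zero loss and the point at which the compression-ratio clamp is applied are implementation choices of this sketch, not fixed by the text.

```python
import math

def rbc_gain_db(p_in_db, hl_db, tn_db=0.0, cr_min=1.1, cr_max=3.0):
    """Per-band RBC gain from Equation set 2-6 (alpha in degrees)."""
    if hl_db <= 0.0:
        return 0.0                                       # no loss, no gain
    alpha = 47.0 + 0.45 * hl_db                          # Eq. 2-4
    r = hl_db / (math.tan(math.radians(alpha)) - 1.0)    # Eq. 2-5, recruitment range
    m = r / (r + hl_db)                                  # slope of the I/O line
    m = min(max(m, 1.0 / cr_max), 1.0 / cr_min)          # clamp CR = 1/m to [1.1, 3]
    tc = r + hl_db + tn_db                               # complete-recruitment level
    b = (1.0 - m) * tc                                   # line passes through Tc
    return (m * p_in_db + b) - p_in_db                   # G_dB = Pout - Pin
```

Because the intercept is computed from the clamped slope, the input-output line still passes through Tc, so the gain shrinks to zero as the input level approaches the threshold of complete recruitment.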
2.3 Parameter Analysis of RBC
2.3.1 Dynamic Constants of Compression
Compression is used to reduce the dynamic range of speech so that it can fit within
the reduced dynamic range of the hearing-impaired listener [56]. Compression can be
carried out either in a single band or in multiple bands. In multi-band processing, each
band usually has different compression characteristics and the degree of compression either
increases or decreases with frequency. Typically 2 or 3 bands are used. Increasing the
number of compression bands beyond 3 can result in audible distortion. Fitting algorithms
should always have output limiting compression to make sure that the sounds never get
painfully loud.
The attack and release times are the dynamic constants of compression and specify
how quickly a compressor operates. If the effect of compression is instantaneous, audible
artifacts are produced because of the sudden change in levels. ANSI S3.22 defines the
attack time as the time taken for the output to stabilize within 3 dB of its final level after
the input changes from 55 to 90 dB SPL. The release time is defined as the time taken for
the output to stabilize within 4 dB of its final level after the input falls from 90 to 55 dB
SPL. Experimentally, an attack time of 6 ms and a release time of 20 ms were found to be
ideal. The implementation of compression with the attack and release time constants is given
by Equations 2–7.
Gave(w) = βattackGi(w) + (1− βattack)Gave(w)
Gave(w) = βreleaseGi(w) + (1− βrelease)Gave(w) (2–7)
Here Gave(w) is the average smoothed gain per band, Gi(w) is the instantaneous gain per
band, βattack and βrelease are the attack and release constants as defined in ANSI S3.22.
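The one-pole smoothing of Equations 2–7 can be sketched as below. In this sketch the attack constant is applied when the target gain falls (input level rising) and the release constant otherwise; the β values are illustrative defaults, not the ANSI S3.22-derived constants.

```python
def smooth_gain(g_inst, beta_attack=0.5, beta_release=0.1, g0=0.0):
    """One-pole smoothing of per-frame gains (Equations 2-7):
    G_ave = beta * G_i + (1 - beta) * G_ave, with the attack constant
    used when the gain must drop and the release constant otherwise."""
    g_ave, out = g0, []
    for g in g_inst:
        beta = beta_attack if g < g_ave else beta_release
        g_ave = beta * g + (1.0 - beta) * g_ave
        out.append(g_ave)
    return out
```

With a larger attack constant the smoothed gain tracks downward steps quickly but recovers slowly, which is the usual behavior wanted to avoid level overshoots.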
Compression ratio is the inverse of the slope of the input-output curve. The
compression ratio usually varies from 1.1:1 to 3:1. Compression threshold is the SPL
above which compression kicks in. If loudness is to be normalized completely, compression
should kick in from the threshold of normal-hearing which is 0 dB SPL [57]. But useful
speech sounds rarely occur below 30 dB SPL. When the compression threshold is > 50 dB
SPL it is termed high-threshold, and when the compression threshold is < 50 dB SPL it
is termed low-threshold. Wide dynamic range compression (WDRC) refers to systems
which have a low threshold. Figure 2-8 shows typical WDRC characteristics with output
limiting.
Up to the compression threshold of 50 dB, the gain is linear. Compression is effective
from the threshold to the threshold of complete recruitment Tc (80 dB in this
example). The gain after Tc is linear until it reaches the output limiting compressor's
threshold.
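The static input-output curve just described (linear up to the 50 dB compression threshold, compressive up to Tc = 80 dB, linear again, then 10:1 output limiting above 110 dB SPL) can be sketched as a piecewise function; the 20 dB linear gain below threshold is an illustrative value.

```python
def wdrc_output_db(p_in, gain=20.0, ct=50.0, tc=80.0, cr=2.0,
                   lim_ct=110.0, lim_cr=10.0):
    """Static WDRC input-output curve with output limiting:
    linear (fixed gain) below ct, slope 1/cr between ct and tc,
    linear again above tc, and lim_cr:1 limiting above lim_ct."""
    if p_in <= ct:
        out = p_in + gain                                 # linear region
    elif p_in <= tc:
        out = (ct + gain) + (p_in - ct) / cr              # compression region
    else:
        out = (ct + gain) + (tc - ct) / cr + (p_in - tc)  # linear above Tc
    if out > lim_ct:                                      # output limiting stage
        out = lim_ct + (out - lim_ct) / lim_cr
    return out
```

Each branch is anchored to the previous one's endpoint, so the curve is continuous at ct and tc.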
2.3.2 Filter Bank Analysis
The Matlab implementation of the RBC algorithm used 14 filter banks with center
frequencies equally spaced in mel frequency between 100 Hz and fs/2. The number of filter
banks was chosen after listening experiments showed that 14 was the optimal number for
maximum cell phone speech intelligibility for the hearing-impaired (Figure 2-9).
In order to use RBC for the normal-hearing population, a filter bank size of 5
was found to be optimal in subjective listening tests. This is because
normal-hearing listeners can hear the artifacts caused by multi-band gain. Also,
since the typical normal-hearing population has little or no loudness recruitment, the
compression parameters were varied from 1.1 to 1.5.
Compensation for reduced temporal and frequency resolution. The current
algorithm overcomes the decreased audibility and the loudness recruitment aspects of
sensorineural hearing loss by providing frequency and level dependent gain. By including
14 filter bands in 300–3400 Hz bandwidth we increase the frequency resolution. None of
the existing fitting algorithms include processing to overcome the reduced resolution in
time and frequency.
Dead regions of the cochlea. The RBC algorithm limits the amount of gain
per band based on the frequency and on the threshold of hearing for that band. If the
band has a loss ≥ 80 dB HL then the gain for that band is set to zero. This is done
because speech intelligibility decreases when the listening levels are loud. Also, high
frequency bands with high thresholds are penalized more than low frequency bands with
high thresholds. Severe losses of ≥ 80 dB HL usually occur due to damage to both the
IHCs and OHCs. An area of non-functioning IHCs is referred to as a ‘dead region’. The
threshold equalizing noise (TEN) test [58] can be used clinically to detect dead regions
of the cochlea. It is similar to measuring thresholds of hearing in noise and measures the
threshold for detecting a tone in a threshold-equalizing noise. The dead frequency regions
are extrapolated to the filter band frequencies. Any band which lies in a dead region has
its gain set to zero. The immediately neighboring frequency bands are also given a gain
lower than usual.
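The dead-region gain rule might be sketched as follows. The 6 dB neighbor attenuation is a hypothetical choice; the text only states that neighboring bands receive a gain lower than usual.

```python
def apply_dead_region_rules(gains_db, hl_db, dead_bands, neighbor_atten_db=6.0):
    """Zero the gain of bands with HL >= 80 dB HL or inside a dead region,
    and attenuate the immediate neighbors of dead-region bands."""
    out = list(gains_db)
    for i in range(len(out)):
        if hl_db[i] >= 80.0 or i in dead_bands:
            out[i] = 0.0
    for i in dead_bands:
        for j in (i - 1, i + 1):                 # first nearest neighbors
            if 0 <= j < len(out) and out[j] > 0.0:
                out[j] = max(out[j] - neighbor_atten_db, 0.0)
    return out
```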
2.3.3 Real-Time Implementation Issues
Audiogram on the phone. A Java midlet [59] was created to measure the
thresholds of hearing using the cell phone. Since the cell phone is to be used as an
assistive listening device for cell phone conversations and not a hearing aid, calibration is
not a key issue. The midlet plays tones at different levels and the listener presses a key
to indicate having heard the sound. The Motorola ROKR E2 has 7 volume steps, and by
playing scaled tone wave files a volume range of 3–65 dB was achieved. Figure 2-11
shows a depiction of how the audiogram on the phone would look.
2.4 Performance Analysis of the RBC Algorithm
The performance of RBC was compared with linear amplification (LA), a simple high
pass filter (HPF) [60], the DSL method, the HG method, the POGO method and the
NAL-R method. The speech database unless otherwise mentioned is the standard HINT
database. Cell phone speech was obtained by bandlimiting the speech to 300–3400 Hz
and then passing it through an AMBE vocoder/decoder block to introduce the vocoder
effects.
Experimental Setup. The HINT and the MOS listening tests were run at the
speech and hearing clinic at the Gainesville Shands hospital, in a sound treated booth,
using Sennheiser HDA 200 headphones. 10 hearing-impaired patients with a pure
tone average (500 Hz, 1 kHz, 2 kHz) of 40–70 dB HL were recruited, along with 10
normal-hearing listeners. Figure 2-12 shows the audiograms of all the hearing-impaired
subjects. Output limiting compression was provided for all the algorithms with a
compression ratio of 10:1 and a compression threshold of 110 dB SPL.
2.4.1 Performance of Algorithm in Terms of Speech Quality
Both the subjective mean opinion score (MOS) test and the objective perceptual
evaluation of speech quality (PESQ) scores were used to measure speech quality.
PESQ speech quality measurement for hearing-impaired and normal
hearing. To evaluate the performance of the new algorithm using PESQ the setup
shown in Figure 2-13 was used. Typical mild to severe sensorineural hearing loss and
typical normal-hearing were simulated using the Matlab hearing loss simulation block.
The audiograms used were: [10 20 30 60 80 90] dB HL and [5 10 15 20 20 20] dB HL
respectively. The unprocessed cell phone speech was passed through the hearing loss block
to generate simulated loss speech. Speech preprocessed by the various fitting algorithms
was passed through the hearing loss block to generate compensated speech.
The objective PESQ scores were obtained using the original cell phone speech as the
reference signal and comparing it to both the simulated loss speech and the compensated
speech (Figure 2-14).
PESQ is sensitive to distortion due to compression. The typical mild to severe SNHL
modeled here would provide output levels at high frequencies which would turn on the
output limiting compression. This results in low PESQ scores. Among the
compression-based systems, RBC does best, followed by DSL and NAL-R. The
scores also reveal that RBC outdid linear amplification and the other fitting algorithms for
normal-hearing subjects, with an average PESQ score greater than “4-Good.”
Spectrograms for hearing-impaired and normal-hearing. The spectrograms for
the typical mild to severe SNHL simulated speech, linear-amplified speech and the speech
compensated using the RBC method were obtained (Figure 2-15). The simulated hearing
loss block was used to generate the speech (Figure 2-13).
For the hearing-impaired, a lot of high frequency information is missing (Figure 2-15a).
Linear amplification does not help because of the reduced dynamic range aspect of the
hearing loss (Figure 2-15b). Compression results in more high frequencies and this helps
improve intelligibility (Figure 2-15c).
The spectrograms for the typical normal-hearing simulated speech, linearly amplified
speech and the speech compensated using the RBC method were also obtained (Figure 2-16).
There is more high frequency information because of frequency dependent gain and this
helps improve intelligibility.
These results show that for both the normal-hearing and the hearing-impaired, RBC
gives better speech quality and more useful frequencies than a simple linear gain, which is
what the cell phone’s volume control applies.
Subjective speech quality measurement for hearing-impaired and normal
hearing: MOS. The MOS test provides subjective rating of speech quality in the absence
of noise. For the unaided hearing-impaired listeners speech was played at 75 dBA. For
normal-hearing listeners speech was played at 65 dBA. The Matlab MOS GUI was used to
run this test. Figure 2-17 shows the average of the MOS scores for the hearing-impaired.
For the hearing-impaired, RBC has an average MOS score greater than “4-Good.”
Figure 2-18 shows the average of the MOS scores for the normal-hearing. For the
normal-hearing, RBC has an average MOS score greater than “4-Good.”
2.4.2 Performance of Algorithm in Terms of Speech Intelligibility
Both the subjective HINT test and the objective SII scores were used to measure
speech intelligibility.
Objective speech intelligibility measurement for normal-hearing: SII. The
speech intelligibility index (SII) was measured using the simulated hearing loss model for
normal-hearing. Cell phone bandwidth speech, both unprocessed and processed by the
various fitting methods, was passed through the simulated hearing loss block. The SII
standard does not give a valid score for hearing-impaired speech. Figure 3-7 shows the
variation of SII with SNR from -30 to 30 dB.
For the normal-hearing, RBC does marginally better than linear amplification at
all SNRs. An SII of 0.5 does not mean that speech is understandable 50% of the time; it
means that about 50% of the speech cues are audible. For conversational speech, an SII of
0.5 corresponds to about 100% intelligibility for normal-hearing listeners.
Subjective speech intelligibility measurement for hearing-impaired and
normal hearing: HINT. The Matlab HINT GUI was used in this test. The 10
hearing-impaired and normal-hearing subjects listened to both unprocessed HINT
sentences and sentences processed by the different algorithms at various signal levels
in the presence of a constant 65 dBA noise. Figure 2-20 shows the averaged SNR for the
10 hearing-impaired subjects, with reference to the baseline (linear gain), for wide band
speech. These scores show that RBC does best, followed by DSL, NAL-R
and half-gain. When compared to the linear gain technique, RBC provides up to 15 dB
improvement in SNR. The difference between RBC and DSL for wideband speech is about
3 dB.
Figure 2-21 shows the averaged HINT results with narrow band cell phone speech
input. These scores indicate that RBC does the best followed by NAL-R and half-gain.
Half-gain prescribes a higher gain than all the other fitting methods being tested. For loud
input levels this will lead to a decrease in intelligibility, but in the HINT the level of
speech is reduced, so the extra gain helps half-gain do better. The difference between
RBC and linear gain is 15 dB. The difference between RBC and DSL for cell phone speech
is about 6 dB.
Figure 2-22 shows the averaged HINT results with narrow band cell phone speech
input. These scores show that RBC does the best followed by NAL-R and HPF. The
difference between RBC and linear gain is 6 dB. The difference between RBC and DSL for
cell phone speech is about 3 dB.
2.5 Summary
This chapter introduced a new hearing enhancement algorithm called recruitment
based compensation. RBC is based on loudness normalization and is used to fit the
cell phone to the user’s hearing thresholds. The RBC stage is followed by an output
limiting compressor to prevent damaging loud sound outputs. The performance of RBC
was measured in terms of objective and subjective measures of speech intelligibility and
quality. RBC was found to show consistently good performance.
Table 2-1: The kf constant for POGO
Freq (Hz)  250  500  1000  2000  4000
kf (dB)    -10   -5     0     0     0
Table 2-2: The kf constant for NAL-R
Freq (Hz)  250  500  1000  2000  3000  4000
kf (dB)    -17   -8     1    -1    -2    -2
Figure 2-1. Classification of existing hearing aid fitting methods
Figure 2-2. Gains prescribed by the Fig6 method
Figure 2-3. Input-Output curve at 2 kHz obtained from the visual input output locator
Figure 2-4. DSL prescribed gain for different hearing loss
Figure 2-5. Recruitment based compensation system
Figure 2-6. Computation of gain based on loudness recruitment
Figure 2-7. Estimated dependence of recruitment range on hearing loss
None RBC:2 RBC:3 RBC:4 RBC:5 RBC:7 RBC:9 RBC:12 RBC:14 RBC:20 RBC:32−30
−25
−20
−15
−10
−5
0
5
10
15
20
Algorithm:No of filter bands
Ave
SN
R w
rt b
asel
ine(
dB)
a
250 500 1000 2000 4000 80000
10
20
30
40
50
60
70
80
90
100
Frequency (Hz)
HL
(d
B)
Left ear
Right ear
b
Figure 2-9. Subjective HINT results for hearing-impaired A) Average HINT scores withvarying filter bank size B) Average audiogram of the hearing-impaired listeners
Figure 2-10. Subjective MOS results A) Average MOS scores for the hearing-impaired B) Audiogram of the hearing-impaired listeners
Figure 2-11. Audiogram on the phone Java midlet
Figure 2-12. Audiograms of all the hearing-impaired listeners
Figure 2-13. Hearing loss simulation system
Figure 2-14. The PESQ objective speech quality score for normal-hearing and hearing-impaired
Figure 2-15. Spectrogram of hearing-impaired for A) Typical mild to severe SNHL B) Linear-amplified speech C) RBC amplified speech
Figure 2-16. Spectrogram of normal-hearing for A) Typical normal-hearing B) Linear-amplified speech C) RBC amplified speech
Figure 2-17. Subjective MOS results A) Average MOS scores for the hearing-impaired B) Audiogram of the hearing-impaired listeners
Figure 2-18. Subjective MOS results A) Average MOS scores for the normal-hearing B) Audiogram of the normal-hearing listeners
Figure 2-20. Subjective HINT results with wide band speech for the hearing-impaired A) Average HINT scores B) Audiogram of the hearing-impaired listeners
Figure 2-21. Subjective HINT results with cell phone speech for the hearing-impaired A) Average HINT scores B) Audiogram of the hearing-impaired listeners
Figure 2-22. Subjective HINT results with cell phone speech for the normal-hearing A) Average HINT scores B) Audiogram of the normal-hearing listeners
CHAPTER 3
NOISE ROBUST HEARING ENHANCEMENT ALGORITHMS
Environmental noise detrimentally affects the intelligibility of speech [61], and this
effect is more pronounced for people with hearing impairment. Speech is a highly
redundant signal. In a moderately noisy environment, a normal-hearing listener can
understand what is being said even if some parts of the speech are masked by noise,
by virtue of this redundancy. Hearing-impaired listeners work with a less redundant
speech signal because of the nature of their hearing loss [62]; even if the environmental
noise masks only a small portion of the remaining speech, intelligibility is degraded
significantly. The cochlea analyzes sound by means of a group of highly overlapping
narrow band filters. These filters are called the critical bands and play an important role
in noise masking. Primarily, only noise that falls within the same critical band as the
speech can mask that speech, but because of the highly overlapped structure of the
critical filter bank, the same noise also masks, to a lesser extent, signals in higher
frequency bands. This effect is called the upward spread of masking, and it increases with
noise intensity, which is also why low frequency sounds are better speech maskers. For
the hearing-impaired, the critical bands are broader and hence the upward spread of
masking increases. This is why hearing-impaired listeners require a 9 dB increase in SNR,
compared to normal-hearing listeners, in order to understand speech in noise [25]. This
chapter discusses the development of a noise robust recruitment based compensation
(NR-RBC) algorithm.
3.1 Effects of Noise on Cell Phone Speech
Cell phone noise can be classified by where it originates: transmitter side noise or
receiver side noise. Transmitter side environmental noise is picked up along with the
speech and transmitted as part of the outgoing signal; the channel and vocoder also
produce artifacts that are transmitted. At the receiver end, the incoming signal is
processed to remove or reduce the noise before it is played.
The receiver side environmental noise can also mask the incoming cell phone speech. Table 3-1
lists cell phone noise sources and suggests possible ways to reduce them.
3.2 Development of Noise Robust Recruitment Based Compensation
In order to reduce the effects of environmental noise masking at the listener's end,
the hearing enhancement algorithms have to be tuned to the noise. Hearing aid fitting
methods such as DSL and NAL-R assume that a single frequency response is enough for
speech intelligibility under all listening conditions, but recent studies show that different
responses are desirable under different listening conditions. The factors that influence
the best setting are the noise spectrum and the noise level. In addition to the frequency
response, the best compression parameters also change with noise.
The RBC algorithm uses the thresholds measured in quiet (the audiogram) to prescribe
the gains. If masked thresholds of hearing are calculated, they can be used in place of
the thresholds in quiet in the RBC estimation method. The algorithm then varies the
gain and compression based on both the thresholds of hearing and the environmental
noise. This modified algorithm is called the noise robust recruitment based compensation
(NR-RBC) method.
Figure 3-1 shows the block diagram of the procedure. Here S(n) is the incoming
cell phone speech signal which is to be enhanced. Processing is carried out in the time
domain using a frame-by-frame approach. S(n) is fed to a filter bank which has 18 bands
with center frequencies and bandwidth as shown in Table 3-2. The environmental noise
is picked up by the cell phone’s calibrated microphone and is referred to as Y (n). The
microphone also picks up the user’s voice. In order to identify which frames contain noise,
Y (n) passes through a voice activity detection block. The noise frames are then fed to a
noise estimation block which provides an estimate of the noise for each octave-band. This
octave-band noise estimate is used to compute the noise masked thresholds. The gain
computation block uses the energy per band, the noise estimate and the user’s hearing
thresholds to prescribe a gain and compression term for each band as per the NR-RBC
formula. The signals from each band are finally combined to produce the enhanced
speech signal Se(n).
3.2.1 Single Microphone Noise Estimation
The cell phone's microphone signal is referred to as Y(n) (Figure 3-1). During
pauses in the conversation, Y(n) picks up the environmental noise. Using a voice activity
detection system, the frames can be classified as noise or speech; when a noise flag is set,
the noise power estimate is updated. Using this single microphone system, an estimate
of the environmental noise N(w) at the listener's end is calculated during pauses in the
conversation. Techniques such as minima controlled recursive averaging (MCRA) [63]
and others [64] are available for robust estimation of noise. We used a simple voice
activity detector based on spectral distance.
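A minimal sketch of this stage is shown below. The function name, the energy-based voice activity decision, and the constants are illustrative stand-ins for the spectral-distance VAD and MCRA-style estimators cited above; they are not the thesis implementation.

```python
import numpy as np

def update_noise_estimate(frame, noise_psd, alpha=0.9, vad_margin_db=3.0):
    """Recursively update a per-bin noise PSD estimate during speech pauses.

    A frame is flagged as noise when its energy rises less than vad_margin_db
    above the current noise estimate (a crude, energy-based stand-in for a
    spectral-distance VAD). alpha controls the memory of the recursive average.
    """
    psd = np.abs(np.fft.rfft(frame)) ** 2
    if noise_psd is None:
        return psd, True                      # bootstrap: treat the first frame as noise
    level_rise = 10.0 * np.log10((psd.mean() + 1e-12) / (noise_psd.mean() + 1e-12))
    is_noise = level_rise < vad_margin_db
    if is_noise:
        noise_psd = alpha * noise_psd + (1.0 - alpha) * psd
    return noise_psd, is_noise
```

In a handset, Y(n) would be split into frames (for example 256 samples at 8 kHz) and this update run once per frame, so the estimate tracks slowly varying environmental noise while ignoring speech frames.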
3.2.2 Calculating the Noise Masking Threshold
The noise masking threshold is calculated for the incoming cell phone speech. If the
environmental noise lies below the noise masking threshold then the gain prescription
formula is the same as for RBC. The noise masking threshold can be obtained by modeling
the frequency selectivity and masking properties of the cochlea [65], [66]. As a first step
a critical band analysis has to be carried out in order to know which speech bands of
the incoming cell phone signal will be affected by the environmental noise. This can be
achieved by passing the noise through a filter bank structure similar to the one used in
RBC (Figure 2-5). While this will lead to an accurate analysis, it will be computationally inefficient
to implement on the phone. A way to work around this is to group together the FFT bins
based on the critical band center frequencies and bandwidth. Table 3-2 lists the critical
band number, center frequency, bandwidth and the FFT bin details for an FFT size of 256
and a sampling frequency of 8000 Hz.
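The bin-grouping workaround can be sketched as follows (a Python/NumPy illustration; `critical_band_power` and `BAND_EDGES_HZ` are hypothetical names, with the band edges taken from the FFT band ranges in Table 3-2):

```python
import numpy as np

# Upper edge (Hz) of each of the 18 critical bands covering 0-4000 Hz,
# taken from the FFT critical band ranges listed in Table 3-2.
BAND_EDGES_HZ = [94, 187, 312, 406, 500, 625, 781, 906, 1094, 1281,
                 1469, 1719, 2000, 2312, 2687, 3125, 3687, 4000]

def critical_band_power(psd, fs=8000):
    """Sum rfft power-spectrum bins into the 18 critical bands."""
    n_bins = len(psd)                              # rfft of N samples -> N/2 + 1 bins
    freqs = np.arange(n_bins) * fs / (2.0 * (n_bins - 1))
    # Band index per bin; the Nyquist bin (4000 Hz) is clipped into band 18
    band_idx = np.clip(np.searchsorted(BAND_EDGES_HZ, freqs, side="right"), 0, 17)
    powers = np.zeros(len(BAND_EDGES_HZ))
    np.add.at(powers, band_idx, psd)               # scatter-add each bin into its band
    return powers
```

Because every FFT bin lands in exactly one band, total power is preserved, which keeps the later threshold computation consistent with the frame energy.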
Critical band analysis is carried out on the power spectrum of the signal over the
FFT bins which correspond to each critical band. 18 critical bands cover the cell phone
frequency range up to 4000 Hz. Since the critical bands are highly overlapped structures
the critical band power signal has to be convolved with a spreading function in order to
estimate the effects of masking across critical bands. The spreading function proposed by
Schroeder [67] is given by Equation 3–1:

    10 log10 SF(i) = 15.81 + 7.5 (i + 0.474) − 17.5 (1 + (i + 0.474)^2)^(1/2)        (3–1)

where i is the separation in critical bands and SF(i) is the spreading function.
The next step involves the calculation of the noise masking threshold, given by
Equation 3–2, which includes an offset term Oi specified in Table 3-2:

    Ti = 10^(log10(Ci) − Oi/10)        (3–2)

where Ci is the spread critical band power from the previous step.
The noise spread threshold then has to be converted back to the Bark (critical band)
domain by renormalization. The Bark-domain thresholds are compared with the
individual's absolute thresholds of hearing, HL(w), also on the Bark scale. The noise
masking threshold T(w) for any critical band whose noise threshold is lower than the
absolute threshold is replaced by the absolute threshold.
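The steps above can be sketched in a few lines (Python/NumPy; the function name is illustrative, the spreading is applied as a matrix over bark separations, and the renormalization step is omitted for brevity):

```python
import numpy as np

def masking_threshold(band_power, offsets_db, hearing_thresh):
    """Per-band noise masking threshold, a sketch of Eqs. 3-1 and 3-2.

    band_power:     noise power per critical band, linear units
    offsets_db:     offset O_i per band from Table 3-2 (the negative dB values
                    are applied here as a reduction below the spread spectrum)
    hearing_thresh: absolute thresholds of hearing, same linear units
    """
    band_power = np.asarray(band_power, dtype=float)
    n = len(band_power)
    # Eq. 3-1: Schroeder spreading function over bark distance (maskee - masker)
    x = np.subtract.outer(np.arange(n), np.arange(n))
    sf_db = 15.81 + 7.5 * (x + 0.474) - 17.5 * np.sqrt(1.0 + (x + 0.474) ** 2)
    spread = 10.0 ** (sf_db / 10.0) @ band_power
    # Eq. 3-2: shift the spread spectrum by the per-band offset
    thresh = spread * 10.0 ** (np.asarray(offsets_db) / 10.0)
    # Any band whose noise threshold falls below the absolute threshold of
    # hearing is replaced by the absolute threshold.
    return np.maximum(thresh, np.asarray(hearing_thresh, dtype=float))
```

The asymmetry of the spreading function reproduces the upward spread of masking: a masker in one band raises the threshold of higher bands far more than that of lower bands.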
3.2.3 Derivation of Noise Robust Recruitment Based Compensation
If the environmental noise is below the noise masking threshold for the speech, then the
usual RBC formula is used. If the noise is greater than this threshold, the new gain is
calculated as follows.
The gain for each channel, GdB(w), is calculated by Equation set 3–3:

    HLn(w) = T(w) + (N(w) − T(w))/2,   when T(w) < N(w)        (3–3)

    GdB(w) = Pout(w) − Pin(w)

    Pout(w) = m(w) Pin(w) + b(w)

    m(w) = R(w) / (R(w) + HLn(w))

    b(w) = (1 − m(w)) Tc(w)

    Tc(w) = R(w) + HLn(w) + Tn(w)

    R(w) = HLn(w) / (tan α(w) − 1)

    α(w) = 47 + 0.45 HLn(w)
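Equation set 3–3 can be sketched per band as follows (Python; the function name and the scalar, single-band interface are illustrative, and α is treated here as an angle in degrees, which is an assumption consistent with Figure 2-7):

```python
import numpy as np

def nr_rbc_gain_db(p_in, hl, noise, masked_thresh, t_n=0.0):
    """Per-band NR-RBC gain in dB, following Equation set 3-3.

    p_in:          input band level Pin(w) (dB)
    hl:            hearing loss in the band (dB HL), used when the noise
                   stays below the masking threshold (plain RBC case)
    noise:         estimated environmental noise level N(w) (dB)
    masked_thresh: noise masking threshold T(w) (dB)
    t_n:           normal threshold of hearing Tn(w) (dB)
    """
    # Effective loss: plain hearing loss, or the noise-raised value of Eq. 3-3
    if noise > masked_thresh:
        hl_n = masked_thresh + (noise - masked_thresh) / 2.0
    else:
        hl_n = hl
    if hl_n <= 0.0:
        return 0.0                    # no effective loss in this band -> unity gain
    # alpha is treated as an angle in degrees; the formula is sensible for
    # losses below roughly 95 dB, where alpha stays under 90 degrees.
    alpha_deg = 47.0 + 0.45 * hl_n
    r = hl_n / (np.tan(np.radians(alpha_deg)) - 1.0)   # recruitment range R(w)
    m = r / (r + hl_n)                                 # compressive slope m(w)
    t_c = r + hl_n + t_n                               # Tc(w)
    b = (1.0 - m) * t_c
    return (m * p_in + b) - p_in                       # GdB = Pout - Pin
```

Note that as the noise rises above the masking threshold, the effective loss HLn grows, so the prescribed gain and compression both adapt to the environment.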
3.3 Performance Analysis of the NR-RBC Algorithm
The performance of NR-RBC was compared with RBC and the other existing
algorithms in terms of subjective and objective measures of speech intelligibility and
quality.
Experimental setup. The HINT and the MOS listening tests were run at the
Shands speech and hearing clinic in a sound treated booth using a modified headset. The
modified device was shaped to look like a cell phone. 10 hearing-impaired patients with a
pure tone average (500 Hz, 1 kHz, 2 kHz) of 40–70 dB HL and 10 normal-hearing listeners
were recruited.
3.3.1 Performance of Algorithm in Terms of Speech Quality
Both the subjective mean opinion score (MOS) test and the objective perceptual
evaluation of speech quality (PESQ) scores were used to measure speech quality.
PESQ speech quality measurement for hearing-impaired and normal
hearing. To evaluate the performance of the new algorithm using PESQ the setup
shown in Figure 2-13 was used. A typical mild to severe sensorineural hearing loss and
typical normal hearing were simulated using the Matlab hearing loss simulation block.
The audiograms used were [10 20 30 60 80 90] dB HL and [5 10 15 20 20 20] dB HL, respectively.
The unprocessed cell phone speech was passed through the hearing loss block to generate
speech with simulated loss. Then speech preprocessed by the various fitting algorithms
were passed through the hearing loss block to generate compensated speech.
Among the HA fitting algorithms that include compression, NR-RBC has the
maximum PESQ score. For the normal-hearing, NR-RBC had a slightly lower PESQ score;
this could be due to processing artifacts. To understand this, the spectrograms of simulated
normal-hearing speech were calculated.
Spectrograms for hearing-impaired and normal hearing. Figure 3-3 shows
the spectrograms for the hearing-impaired speech, the linearly amplified speech and the
speech compensated using the NR-RBC method. These results show that for the
hearing-impaired, NR-RBC yields better speech quality and intelligibility than a linear
gain alone, which is what the cell phone's volume control applies. Figure 3-4 shows
the corresponding spectrograms for the normal-hearing case; these results likewise show
that NR-RBC yields better speech quality and intelligibility than a linear gain alone.
Subjective speech quality measurement: MOS. The MOS test provides a
subjective rating of speech quality in the absence of noise. For the unaided hearing-impaired
listeners, speech was played at 75 dBA; for the normal-hearing listeners, speech was played
at 65 dBA. The MOS test was conducted in a room with a noise floor of 46 dBA. The
Matlab MOS GUI was used to run this test.
Figure 3-5 shows the average of the MOS scores for the hearing-impaired. For the
hearing-impaired, NR-RBC has an average MOS score greater than ‘4-Good’. Figure 3-6
shows the average of the MOS scores for the normal-hearing. For the normal-hearing,
NR-RBC has an average MOS score greater than ‘4-Good’.
3.3.2 Performance of Algorithm in Terms of Speech Intelligibility
Both the subjective hearing in noise test (HINT) and the objective speech intelligibility
index (SII) scores were used to measure speech intelligibility.
Objective speech intelligibility measurement: SII. The SII was measured
using the simulated hearing loss model for normal-hearing. Cell phone bandwidth speech
both unprocessed and processed by the various fitting methods were passed through the
simulated hearing loss block. The sentences were then normalized and passed through
the SII measurement block while varying the SNR. It can be seen that NR-RBC has the
highest SII score at all SNRs.
Subjective speech intelligibility measurement: HINT. The Matlab HINT GUI
was used in this test. From the HINT tests in Chapter 2 we know that RBC and DSL
show the best performance, so the performance of NR-RBC was compared to RBC and DSL
with varying filter bank sizes.
Figure 3-8 shows the averaged SNR for the 10 hearing-impaired subjects, with
reference to the baseline (linear gain). It is clear that NR-RBC outperforms RBC and
DSL, with its best performance at N = 14. The difference between NR-RBC
and the best DSL is about 20 dB.
Figure 3-9 shows the averaged SNR for the 10 normal-hearing subjects, with reference
to the baseline (linear gain). It is clear that NR-RBC outperforms RBC and DSL, with
its best performance at N = 8. The difference between NR-RBC and the best DSL is about
13 dB.
3.4 Summary
Environmental noise degrades speech intelligibility for both normal-hearing and
hearing-impaired listeners, though the degradation is greater for the hearing-impaired.
The recruitment based compensation algorithm was modified to account for noise by
introducing the noise masked threshold, leading to the noise robust recruitment based
compensation (NR-RBC) method. This creates a fitting system that varies the gains
based on both the environment and the hearing thresholds. NR-RBC was found to
perform well in terms of speech intelligibility and quality for both the hearing-impaired
and the normal-hearing.
Table 3-1: Sources of cell phone noise and noise-reduction methods

Cell phone noise                  Ways to reduce noise
Receiver side environment noise   To prevent it from being transmitted: beamforming and AGC
                                  To help you hear better: RBC algorithm (automatically adjust gain)
                                  To help you hear better: occlude contra-lateral ear
Transmitted environment noise     To reduce its effects: spectral subtraction
Vocoder and channel noise         To reduce its effects: spectral subtraction
Table 3-2: Critical bands and FFT bins (FFT size N = 256, fs = 8 kHz)

Critical band   Center freq   Bandwidth   FFT critical band   Offset term
number          (Hz)          (Hz)        range (Hz)          (dB)
1               50            80          0-94                -17
2               150           100         94-187              -18
3               250           100         187-312             -19
4               350           100         312-406             -20
5               450           110         406-500             -21
6               570           120         500-625             -22
7               700           140         625-781             -23
8               840           150         781-906             -24
9               1000          160         906-1094            -25
10              1170          190         1094-1281           -25
11              1370          210         1281-1469           -25
12              1600          240         1469-1719           -25
13              1850          280         1719-2000           -25
14              2150          320         2000-2312           -25
15              2500          380         2312-2687           -24
16              2900          450         2687-3125           -23
17              3400          550         3125-3687           -22
18              4000          700         3687-4000           -19
Figure 3-3. Spectrogram in noise of hearing-impaired for A) Typical mild to severe SNHL B) Linear-amplified speech C) NR-RBC amplified speech
Figure 3-4. Spectrogram in noise of normal-hearing for A) Typical simulated normal-hearing B) Linear-amplified speech C) NR-RBC amplified speech
Figure 3-5. Subjective MOS results A) Average MOS scores for the hearing-impaired B) Audiogram of the hearing-impaired listeners
Figure 3-6. Subjective MOS results A) Average MOS scores for the normal-hearing B) Audiogram of the normal-hearing listeners
Figure 3-7. The SII scores for simulated normal-hearing as a function of SNR
Figure 3-8. The HINT scores for RBC (A1), NR-RBC (A2) and DSL (A3) with variation of filter size for hearing-impaired
Figure 3-9. The HINT scores for RBC (A1), NR-RBC (A2) and DSL (A3) with variation of filter size for normal-hearing
CHAPTER 4
ACCLIMATIZATION MODELING FOR THE AIDED HEARING IMPAIRED
The auditory cortex undergoes physiological and anatomical changes over time
when presented with altered auditory inputs. In the case of a person with mild to severe
SNHL, the altered auditory signal will contain little or no high-frequency content.
Moore [68] reviewed studies showing evidence of plasticity in the auditory system of the
adult brain. Because of this brain plasticity, it takes time for aided hearing-impaired
listeners to fully use the high-frequency information that they were previously not used
to hearing. This is known as the acclimatization effect. The time period for
acclimatization is defined as the period between when the hearing loss was noticed and
when the hearing aid was fitted.
Acclimatization is more pronounced for new hearing aid users and affects the hearing
aid fitting procedure. For a first-time hearing aid user, the audiologist will first measure
the amount of loss, discuss the various hearing aid options (styles, binaural or monaural)
and then choose a make and model of hearing aid. Ear mold measurements of the
patient are then made. The hearing aid arrives after 2-3 weeks and the patient is
'fit' with it. Fitting is the procedure by which the hearing aid parameters are tuned for
the patient's hearing loss. Usually, each hearing aid is accompanied by a CD with the
company's proprietary fitting software, which also allows selecting certain established
fitting procedures such as DSL and NAL. As long as the initial fitting parameters do not
cause any discomfort, they will not be modified during the first visit. During follow-up
visits, the audiologist fine-tunes the parameters based on verbal feedback from the
patient. The verbal feedback is descriptive and indicates how certain sounds are now
being perceived with the hearing aid; the patient is not asked to rate the sounds on any
scale. The fine-tuning process is repeated over multiple visits until the hearing aid user
is satisfied with a particular fitting. The follow-up visits are usually a couple of weeks
apart.
It is hypothesized that patients initially choose the amplification characteristics that
give them the greatest gain at frequencies where they have the least loss, because they
are used to hearing sounds at those frequencies. After a month, they then prefer high
frequency emphasis. It is also possible for patients to become acclimatized to a particular
hearing aid fitting; in such cases, the initial fitting parameters should be the optimum
ones. A compromise is to provide patients with a response that slowly varies over time
from the response they prefer to the response that is best for them. This enables them
to gradually get used to a new response without subjecting them to a sound quality that
they are not happy with.
4.1 Development of the Fitting Satisfaction Scale
In the analysis of multi-session fitting data, the final fitting parameters are considered
optimum since they provide the best sound experience for the patients. But this
definition of optimality is ambiguous, since it can mean either best sound quality or best
speech intelligibility; while manually adjusting the HA fitting parameters, audiologists
rely on verbal patient feedback, which is itself ambiguous. A better and more structured
approach is to use a fitting satisfaction scale at each session to evaluate the fitting.

The three main psychoacoustic phenomena associated with SNHL (elevated
threshold, loudness recruitment, and frequency blurring) lower speech intelligibility
for hearing-impaired listeners by degrading the speech cues. Our previous research in
this area [69] has shown that speech intelligibility (SI) based fitting methodologies
perform better in noise and other real-world scenarios. Hence it is better to use a rating
scale based on SI, as shown in Table 4-1, where 1 is low intelligibility and 5 is high
intelligibility. It is assumed that the listeners are not trained listeners, that the speech
stimuli are sentences and that the listeners have no prior cues about the speech stimuli.
4.2 Hearing Aid Fitting Data
4.2.1 Hearing Aid Fitting Data Collection
Multi-session fitting data was collected from patients using Phonak hearing aids (HAs).
The Phonak HAs targeted were the Savia (30 patients), Claro (20 patients), Extra (7
patients), Valeo (4 patients), Perseo (3 patients) and Eleva (2 patients). All the patients
were fit binaurally. The left and right ear HAs were treated as two separate inputs since
the audiograms of the two ears differed. The fitting software used with the Claro used
the desired sensation level (DSL) fitting procedure, while the rest used the national
acoustic lab nonlinear (NAL-NL1) procedure. Each HA user had 3-8 follow-up sessions,
separated by at most 3 months. The HA users used their perception of speech to judge
improvements between sessions and did not provide any rating. Table 4-2 shows some of
the fitting parameters provided by the listed Phonak HAs. All HAs allow frequency-based
fine-tuning of the gain parameters and the maximum power output (MPO) values at one
or several input signal levels. Some hearing aids also allow modification of the
compression parameters (CR, TK, TK knee) across frequency.
4.2.2 Multi-Session Hearing Aid Fitting Data Analysis
Since most of the Phonak HAs use either NAL-NL1 or DSL[i/o], the fitting data of
one HA of each type was compared. National acoustic lab nonlinear (NAL-NL1)
prescribes gain similar to the NAL-R procedure but, unlike NAL-R, it also includes
compression. NAL-NL1 prescribes less gain at the low frequencies than the other
suprathreshold methods, and this is more evident at low input levels. DSL[i/o] and RBC
prescribe gain at low frequencies in order to normalize loudness.
Figure 4-1 shows the variation of the 50 dB and 80 dB gains for the Claro HA, as
prescribed by DSL, between the first and last fitting. For the Claro, more people tend to
prefer less gain at the final fitting than at the initial fitting. This goes against our
hypothesis that patients will slowly increase the HF gain. It could be explained by
the fact that DSL[i/o] has been known to over-prescribe gain at 500 Hz, 2 kHz and 4 kHz
at both input levels [15].
Figure 4-2 shows the variation of the 40 dB and 80 dB gains for the Savia HA, as
prescribed by NAL-NL1, between the first and last fitting. For the Savia, more people
prefer higher gain at the final fitting than at the initial fitting. This could be explained
by the acclimatization process.
Figures 4-3 and 4-4 show the variation of the compression parameters for the Claro
and Savia HAs. It can be seen that most people do not change TK for either the Claro
or the Savia. For the Claro HAs, the change in compression ratio does not follow any
conclusive trend, while for the Savia the compression ratio tends to increase at the final
fitting. This could be explained by the fact that NAL-NL1 prescribes a lower CR at all
frequencies than the other suprathreshold methods [15].
In order to study the variation of all the parameters for each HA, the trend in
change for each parameter was first averaged across frequency. The maximum trend
across this average was then picked and plotted for each HA in Figures 4-5 to 4-10.

For the Claro, which was fit by DSL, Figure 4-6 suggests that the gain parameters
decrease at the final fitting, the TK and MPO parameters remain the same, and the CR
decreases at the final fitting.
For the NAL-NL1 based HAs it appears that the frequency averaged gain parameters
increase at the final fitting. For Savia (Figure 4-5), Extra (Figure 4-7), and Valeo
(Figure 4-8) the frequency averaged compression parameters remain the same across
the fitting stages. For Eleva (Figure 4-9) and Perseo (Figure 4-10) the frequency averaged
compression parameters decrease across the fitting stages.
4.3 Modeling the Acclimatization Effect
A neural network (NN) can be used to model the variation from the initial fitting
to the final fitting (the best response). The use of NNs in the area of hearing aids is not
new: NNs have been used with HAs for noise reduction [70] and to
help select the HA model based on the patient information [71]. J.M. Kates [72] studied
the feasibility of using NNs to derive HA prescriptive procedures and concluded that the
factors affecting the accuracy of a NN based fitting method are the size of the training
database, the variability of the patients' responses and the variability of the hearing
loss. Gao [73] proposed a new hearing prosthetic, similar to the one proposed by Kates,
based on NNs and fuzzy logic.
Rather than using NNs to replace existing fitting algorithms, we propose to use
NNs along with the existing fitting algorithms to model the acclimatization effect. The
NN was trained on the multi-session fitting data using supervised learning: the input was
the initial fitting data and the desired output was the final fitting data. In addition to
the fitting data, some HA user specific parameters such as the patient's age, degree of
hearing loss (HL) and number of years with HL were used as inputs during training.
Figure 4-11 shows the structure of the NN used in the acclimatization modeling
setup. The NN used to model acclimatization was the multi-layer perceptron [74]. It had 7
sigmoidal input nodes and 7 linear output nodes. The number of hidden nodes was varied
until the lowest error was obtained and was found to be 4. Training was carried out using
the Levenberg-Marquardt [75] method with an initial global learning rate (LR) of 0.01.
Cross validation was used to stop the training in order to prevent over-training. The NN
was trained on the Savia data, which had the largest number of data points.
4.4 Performance Analysis of Model
The NN was trained with early stopping on a validation set to prevent over-training.
There were 60 multi-session data vectors for the Savia, divided into 40 training sets, 10
validation sets and 10 test sets. Ten runs with random initial weights, each with its own
training, validation and testing, were performed. A separate NN was used to model each
parameter. The MSE between the predicted and target values was used as our figure of
merit and comparison.
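A toy version of this setup (7 inputs, a small sigmoidal hidden layer, linear outputs, early stopping on a validation set) might look like the following; plain gradient descent replaces the Levenberg-Marquardt training used in the thesis for brevity, and all names are illustrative:

```python
import numpy as np

def train_mlp(x, y, x_val, y_val, n_hidden=4, lr=0.01, epochs=2000, seed=0):
    """Tiny MLP (sigmoid hidden layer, linear output) trained by full-batch
    gradient descent, keeping the weights with the lowest validation MSE."""
    rng = np.random.default_rng(seed)
    n_in, n_out = x.shape[1], y.shape[1]
    w1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    best, best_val = None, np.inf
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))       # sigmoid hidden layer
        err = (h @ w2 + b2) - y                        # linear output error
        # Backpropagate the mean-squared-error gradient
        gw2 = h.T @ err / len(x); gb2 = err.mean(0)
        dh = (err @ w2.T) * h * (1.0 - h)
        gw1 = x.T @ dh / len(x); gb1 = dh.mean(0)
        w1 -= lr * gw1; b1 -= lr * gb1; w2 -= lr * gw2; b2 -= lr * gb2
        # Early stopping: remember the weights with the lowest validation MSE
        hv = 1.0 / (1.0 + np.exp(-(x_val @ w1 + b1)))
        val_mse = np.mean((hv @ w2 + b2 - y_val) ** 2)
        if val_mse < best_val:
            best_val = val_mse
            best = (w1.copy(), b1.copy(), w2.copy(), b2.copy())
    return best, best_val
```

In the thesis setup, x would hold the initial fitting values plus patient descriptors and y the final fitting values, with one such network per fitting parameter.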
Figures 4-12 to 4-17 show the training results for the Savia HA. In the figures, the
red curve shows the MSE between the initial fitting values and the desired (optimal)
values. It can be observed that the gains predicted by the NN are closer to the optimal
values than the initial values are. Some error remains between the predicted and optimal
values, especially at the low frequencies, and this might be resolved by enlarging the
training database. Overall, the neural network succeeds in modeling the trend with a
certain amount of error, and the MSE between the NN prediction and the optimal
setting is always less than the MSE between the initial setting and the optimal setting.
4.5 Summary
Acclimatization occurs due to the plasticity of the auditory cortex. Fitting the HA
with a close-to-optimal solution at the first visit will both reduce the number of
follow-up fitting sessions and result in a better fit, since the brain readjusts to hearing
the right frequencies from the first session. A neural network was trained to model the
variation of the fitting parameters across sessions for the Phonak Savia, and the results
show a low MSE on new test data for all the parameters. A fitting satisfaction scale
based on SI was also proposed, which will further help with the acclimatization modeling
by providing a satisfaction label for each session's data.
Table 4-1: Speech intelligibility based fitting satisfaction scale

Speech intelligibility             Score
Speech is never intelligible       1
Speech is rarely intelligible      2
Speech is sometimes intelligible   3
Speech is usually intelligible     4
Speech is always intelligible      5
Table 4-2: Phonak hearing aid fitting parameters

Hearing aid  Fitting     Input       Frequency (kHz)             No of        No of
             parameter   level (dB)                              parameters   patients
Savia        Gain        40          [0.3 0.88 1.7 2.4 3.5 7]    6            30
             Gain        60          [0.3 0.88 1.7 2.4 3.5 7]    6            30
             Gain        80          [0.3 0.88 1.7 2.4 3.5 7]    6            30
             CR          -           [0.3 0.88 1.7 2.4 3.5 7]    6            30
             TK          -           [0.3 0.88 1.7 2.4 3.5 7]    6            30
             MPO         -           [0.3 0.88 1.7 2.4 3.5 7]    6            30
Claro        Gain        50          [0.25 0.5 0.75 1.2 2 3 5]   7            20
             Gain        80          [0.25 0.5 0.75 1.2 2 3 5]   7            20
             CR          -           [0.25 0.75 1.2 2 5]         5            20
             TK Knee     -           [0.5 3]                     2            20
             TK          -           -                           1            20
             MPO         -           -                           1            20
Extra        Gain        40          [0.3 0.88 1.7 2.4 3.5 7]    6            7
             Gain        60          [0.3 0.88 1.7 2.4 3.5 7]    6            7
             Gain        80          [0.3 0.88 1.7 2.4 3.5 7]    6            7
             CR          -           [0.3 0.88 1.7 2.4 3.5 7]    6            7
             TK          -           [0.3 0.88 1.7 2.4 3.5 7]    6            7
             MPO         -           [0.3 0.88 1.7 2.4 3.5 7]    6            7
Valeo        Gain        50          [0.25 0.5 1.2 2 3 5]        6            4
             MPO         -           -                           1            4
Perseo       Gain        50          [0.4 0.75 1.2 2.2 3 5]      6            3
             Gain        80          [0.4 0.75 1.2 2.2 3 5]      6            3
             TK          -           [0.4 0.75 1.2 2.2 3 5]      6            3
             MPO         -           [0.4 0.75 1.2 2.2 3 5]      6            3
Figure 4-1. Comparison of change from initial to final stage for Claro parameter A) 50 dB Gain B) 80 dB Gain
Figure 4-2. Comparison of change from initial to final stage for Savia parameter A) 40 dB Gain B) 80 dB Gain
Figure 4-3. Comparison of change from initial to final stage for Claro parameter A) TK and B) CR
Figure 4-4. Comparison of change from initial to final stage for Savia parameter A) TK and B) CR
Figure 4-5. Phonak Savia maximum trend in fitting parameter variation averaged across frequencies
Figure 4-6. Phonak Claro maximum trend in fitting parameter variation averaged across frequencies
87
L40:6 L60:6 L80:6 R40:6 R60:6 R80:6 RTk:6 LTk:6 RMpo:6LMpo:6 RCr:6 LCr:60
1
2
3
4
5
6
Fitting Parameter:No of Frequencies
No
of
Pat
ien
ts
Final<InitFinal=InitFinal>Init
Figure 4-7. Phonak Extra maximum trend in fitting parameter variation averaged acrossfrequencies
88
L50:6 R50:6 RMpo:1 LMpo:10
0.5
1
1.5
2
2.5
3
3.5
4
Fitting Parameter:No of Frequencies
No
of
Pat
ien
ts
Final<InitFinal=InitFinal>Init
Figure 4-8. Phonak Valeo maximum trend in fitting parameter variation averaged acrossfrequencies
89
L40:6 L60:6 L80:6 R40:6 R60:6 R80:6 RTk:6 LTk:6 RMpo:6 LMpo:6 RCr:6 LCr:60
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Fitting Parameter:No of Frequencies
No
of
Pat
ien
ts
Final<InitFinal=InitFinal>Init
Figure 4-9. Phonak Eleva maximum trend in fitting parameter variation averaged acrossfrequencies
90
L50:6 L80:6 R50:6 R80:6 RTk:6 LTk:6 RMpo:1 LMpo:10
0.5
1
1.5
2
2.5
3
Fitting Parameter:No of Frequencies
No
of
Pat
ien
ts
Final<InitFinal=InitFinal>Init
Figure 4-10. Phonak Perseo maximum trend in fitting parameter variation averaged acrossfrequencies
91
a
b
Figure 4-12. Phonak Savia neural network modeling results for 40dB gain A) test on traindata and B)test on new data
93
a
b
Figure 4-13. Phonak Savia neural network modeling results for 60dB gain A) test on traindata and B)test on new data
94
a
b
Figure 4-14. Phonak Savia neural network modeling results for 80dB gain A) test on traindata and B)test on new data
95
a
b
Figure 4-15. Phonak Savia neural network modeling results for CR parameter A) test ontrain data and B)test on new data
96
a
b
Figure 4-16. Phonak Savia neural network modeling results for TK parameter A) test ontrain data and B)test on new data
97
a
b
Figure 4-17. Phonak Savia neural network modeling results for MPO parameter A) test ontrain data and B)test on new data
98
CHAPTER 5
CONCLUSIONS
The US Census Bureau states that 50 million people, nearly one-fifth of the US
population, are in some way disabled. Among them, 28 million are hearing-impaired.
With suitable technological assistance, these men and women (notably the aging baby
boomers) may prolong their independence, reduce their need for specialized care and
improve their quality of life. While no product can be designed so that every
single person in the world can use it, the intent is to maximize the potential of each
device. This dissertation proposes using the cell phone as an assistive listening device.
This would enable the 20 million hearing-impaired people who do not use hearing aids
to understand cell phone speech better, and would also help normal-hearing listeners,
especially in noisy situations.
Sensorineural hearing loss (SNHL) is mostly caused by damage to the outer hair
cells (OHC). A hearing loss compensation algorithm must therefore restore the function
of the damaged OHCs. A novel algorithm based on the frequency-independent
relationship between hearing loss and recruitment was developed. This recruitment based
compensation (RBC) algorithm prescribes both the gains and compression parameters.
RBC shows a 15 dB improvement in speech intelligibility when compared to the baseline
algorithm (linear gain) for the hearing-impaired. For normal-hearing listeners, the SNR
difference between RBC and linear gain is 6 dB. By providing frequency-dependent
gain and compression the decreased audibility and decreased dynamic range aspects of
SNHL are overcome. This is carried out by processing the speech signal in 14 filter
bands whose centers are equally spaced on the mel frequency scale, a design chosen to
mirror the auditory processing of the OHCs.
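To illustrate this mel-spaced design, the sketch below computes 14 band-center frequencies equally spaced on the mel scale. The dissertation's implementation was in Matlab; this Python sketch assumes a 300-3400 Hz telephone band, which is an illustrative assumption rather than a parameter taken from the text.

```python
import math

def hz_to_mel(f_hz):
    # O'Shaughnessy's formula for the mel scale
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_bands=14, f_lo=300.0, f_hi=3400.0):
    """Center frequencies (Hz) of n_bands filters equally spaced in mel
    between f_lo and f_hi (band edges are illustrative assumptions)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + (k + 1) * step) for k in range(n_bands)]

centers = mel_center_frequencies()
```

Because mel spacing is logarithmic-like in Hz, the resulting bands are narrow at low frequencies and wide at high frequencies, roughly matching cochlear frequency resolution.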
Damaged OHCs lose their ability to increase the sensitivity of the cochlea at the
frequencies to which the affected region of the cochlea is tuned. Psychoacoustically,
this shows up as flatter tuning curves, so noise has a greater masking effect on
hearing-impaired people. The noise robust recruitment based compensation (NR-RBC)
algorithm was developed to improve the performance of RBC in noise. NR-RBC uses a
noise estimate to calculate the noise masked threshold for speech which is used in the
gain prescription. NR-RBC outperforms RBC and DSL in terms of speech intelligibility.
The difference between NR-RBC and DSL for the hearing-impaired is 20 dB and for the
normal-hearing is 13 dB. Both RBC and NR-RBC have a MOS speech quality rating of
‘Good’.
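The role of the noise-masked threshold in the gain prescription can be sketched as follows. This is not the dissertation's actual prescription formula; `band_gain_db`, the masking offset, and the gain cap are hypothetical values chosen purely for illustration.

```python
def band_gain_db(speech_level_db, hearing_threshold_db, noise_level_db,
                 masking_offset_db=5.0, max_gain_db=40.0):
    """Illustrative per-band gain rule: raise speech above whichever floor
    is higher, the impaired hearing threshold or the noise-masked threshold."""
    noise_masked_threshold = noise_level_db + masking_offset_db
    floor_db = max(hearing_threshold_db, noise_masked_threshold)
    gain = max(0.0, floor_db - speech_level_db)
    return min(gain, max_gain_db)   # cap gain for output safety
```

In quiet the hearing threshold dominates and the rule reduces to threshold-based compensation; in loud noise the noise-masked threshold takes over, which is the distinction between NR-RBC and RBC.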
The auditory cortex undergoes physiological and anatomical changes in the
presence of altered auditory input. Because of this brain plasticity, it takes some time
for the hearing-impaired to learn to fully use the high-frequency information that
they were previously not used to hearing. This is known as the acclimatization effect.
Multiple-session data for a number of Phonak hearing aids was collected and analyzed.
Neural networks were used to model the acclimatization effect in hearing aid fitting,
and a low mean square error (MSE) was obtained when testing on new data.
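The acclimatization model can be sketched as a small regression network. The actual work used Matlab neural network tooling and real multi-session fitting data; the sketch below is a minimal one-hidden-layer network trained on synthetic data, and the assumed mapping (final gain settling at 80% of the initial gain) is purely illustrative.

```python
import math
import random

def train_net(samples, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Minimal one-hidden-layer tanh network trained by stochastic gradient
    descent to map an initial fitting parameter to its final (acclimatized)
    value. Returns a predict(x) closure."""
    rnd = random.Random(seed)
    w1 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in samples:
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            y = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            err = y - target
            for j in range(n_hidden):
                dh = err * w2[j] * (1.0 - h[j] ** 2)   # backprop through tanh
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * dh * x
                b1[j] -= lr * dh
            b2 -= lr * err
    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return sum(w2[j] * h[j] for j in range(n_hidden)) + b2
    return predict

# Hypothetical acclimatization data: final gain settles at 80% of initial.
data = [(g / 10.0, 0.8 * g / 10.0) for g in range(1, 10)]
predict = train_net(data)
```

A real model would take the audiogram and initial fit as a multi-dimensional input vector; the one-dimensional case here only shows the training loop and the low-MSE criterion reported above.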
Contribution summary. Three novel hearing enhancement algorithms were
developed as part of this multi-disciplinary research, which originated from a proposal
written by the author. A cell phone hearing evaluation questionnaire was created to understand
the needs of the hearing-impaired. Focus group meetings were conducted and video
testimonials were obtained to further narrow down the main problems faced by the
hearing-impaired. Higher speech intelligibility was found to be a main requirement.
Three novel algorithms all based on the rationale of maximizing speech intelligibility
were created. RBC is aimed towards helping both normal-hearing and unaided hearing-impaired
listeners understand cell phone speech better. NR-RBC is a noise robust technique which
enhances intelligibility in noisy conditions. Acclimatization modeling is proposed to improve the
quality of the initial fit for the aided hearing-impaired listener. The hearing enhancement
algorithms were tested extensively in terms of objective and subjective measures of speech
quality (SQ) and intelligibility (SI) on both normal-hearing and hearing-impaired subjects
and were shown to have good performance in terms of both quality and intelligibility. An
FFT-based filter bank approach to implementing both RBC and NR-RBC was also proposed,
which leads to an easy real-time implementation. Matlab code was written for the hearing
loss simulation, the audiogram GUI, the three new algorithms, all the popular
hearing aid fitting methods mentioned in this dissertation, and the automated subjective SI
and SQ tests.
APPENDIX A
SURVEY OF HEARING-IMPAIRED CELL PHONE USERS
This appendix discusses the results of the cell phone hearing evaluation survey.
Eighty-four hearing-impaired participants answered 20 hearing aid and cell phone related questions
and provided other relevant demographic information. The questionnaire used in the
survey is available in Appendix B.
A.1 Participants
Eighty-four subjects with hearing loss were recruited from four different University of Florida
clinics during patient visits and voluntarily completed the surveys in the fall of 2004
and spring of 2005. All participants were previous or current cell phone users. The
participants’ hearing impairment ranged from mild to profound and the distribution is
shown in Figure A-1. Hearing aid usage experience varied among the subjects. Subjects
ranged from 23 to 89 years of age with a mean age of 66.32 years old. Fifty-two males and
32 females participated in the survey.
A.2 Results
Review of the completed surveys showed some questions were left unanswered by
some participants. Therefore, the survey data posted below indicates the number of
participants who responded to each question.
A.2.1 Cell Phone Usage
Of the 84 participants, 62 were experienced hearing aid users and the remaining 22
had not tried hearing aids. Five of the 62 hearing aid users were fit unilaterally and the
other 57 were fit binaurally. Only 4 of the 62 hearing aid users wore completely in canals
(CICs) and the remaining hearing aid styles were almost evenly split between behind the
ears (BTEs) and in the ears (ITEs). Sixty-one cell phone users indicated their estimated
time spent on their cell phone and the results showed that 8 individuals use their cell
phone just for emergencies, 7 reported over 400 minutes of use each month, while the
remaining 45 reported less than 400 minutes on the cell phone each month. The average
age was 61.6 years for frequent cell phone users, 70.0 years for less frequent users, and
75.6 years for those who reported rarely using their cell phone. Sixty respondents identified the style
of cell phone they use, and 55% indicated that their cell phone was a flip phone while the
other 45% used a candy bar style phone. Two cell phone users used neckloops.
A.2.2 Electromagnetic Interference
Fourteen of 35 hearing aid users indicated that their cell phone created electromagnetic (EM)
interference in their hearing aid, and 8 of these 14 reported that the noise was so severe that it had
prevented them from using their cell phone. Two participants thought that the noise
might have occurred and were not sure about it, while two others experienced interference
from the cell phone backlight. Analysis of variance indicated that 78% of respondents who
reported noise in their hearing aid used candy bar style cell phones and the other 22%
were flip phone users. The two individuals who used neckloops expressed concern that the
neckloop would not be compatible with other phones. Other individuals commented that
they were not interested in buying any additional equipment to make their hearing aids
compatible with their cell phone.
A.2.3 Cell Phone Speech and Ringer Level
One in three of the hearing-impaired cell phone users reported trouble hearing their
cell phone ring, and 31.8% of the cell phone users who did not use hearing aids had trouble
hearing the ringer. Thirty-one of 54 cell phone users desired louder cell phone speech levels, and
one-third of those desiring more volume were non-hearing-aid users. The need for higher cell
phone output level was also shared by persons with hearing loss ranging from mild to
profound. Figure A-2 indicates the self-reported understanding of cell phone speech in
quiet and in noise for both aided and unaided hearing-impaired participants.
A.2.4 Summary and Conclusions
The results of the survey indicate that a large percentage of the hearing-impaired
have difficulty using the cell phone effectively because of obstacles such as electromagnetic
(EM) interference and insufficient cell phone signal and ringer volume. ANSI standard
C63.19 [6] indicates how hearing aid and cell phone manufacturers should measure the
EM interference. The measurements are translated into M-ratings in which the higher
ratings indicate a lower likelihood of interference. Handsets that receive a hearing aid
compatibility rating of M3 or higher have met or surpassed FCC requirements. The FCC
has required that cell phone companies have 50% of their handsets meet a minimum ANSI
rating of M3 or better by February 18, 2008.
Figure A-1. Degree of hearing impairment among survey participants
APPENDIX B
CELL PHONE HEARING EVALUATION QUESTIONNAIRE
1. Is your hearing loss:
(a) Mild- A little difficulty hearing speech
(b) Moderate- More difficulty hearing speech
(c) Severe- A lot of difficulty hearing speech
(d) Profound- So bad that hearing aids may not help
2. Do you use Hearing aids?
(a) Yes
(b) No
3. Your Left Ear Hearing aid is:
(a) None
(b) ITE- In The Ear
(c) BTE-Behind the Ear
(d) CIC-Completely In Canal
(e) Other
4. Your Right Ear Hearing aid is:
(a) None
(b) ITE- In The Ear
(c) BTE-Behind the Ear
(d) CIC-Completely In Canal
(e) Other
5. What is the make and model of your Hearing aid?
6. Check whichever is true:
(a) I can understand speech over the telephone with my hearing aid
(b) I can understand speech over the telephone in noisy environments with my
hearing aid
(c) I can understand speech over the telephone without my hearing aid
7. Does your Hearing aid have a telecoil?
(A feature available on many hearing aids is the telecoil or t-switch or t-coil which
aids in hearing telephone conversations.)
(a) Yes
(b) No
8. If your Hearing aid has a telecoil, do you use your telecoil with your cell phone?
(a) Yes
(b) No
(c) I don’t use cell phones
9. Do you frequently use cell phones?
(a) Yes
(b) No
10. Check whichever is true: What has been your general experience with cell phones?
(a) I can understand speech on the cell phone
(b) I can understand speech on the cell phones in noisy environments
(c) I have trouble hearing my cell phone ring
(d) I cannot understand speech on the cell phones because:
If you don’t use cell phones regularly jump to question 17 else continue
11. What is the make and model of the cell phone you use?
(Example Make= Motorola Model=v300)
12. Is it a Flip phone?
(a) Yes
(b) No
13. Do you use a neckloop for cell phone conversations?
(A neckloop is a necklace-size loop of wire worn around the neck of someone who has
a hearing aid with a telecoil.)
(a) Yes
(b) No
14. Which Cell phone Network provider do you use?
(a) Cingular
(b) Tmobile
(c) Verizon
(d) Sprint
(e) Other
15. How many minutes per month do you talk on the cell phone?
(a) Just for emergencies
(b) <200
(c) 200-400
(d) 400+
(e) Don’t know
16. Check whichever is true:
(a) I wish my cell phone would be louder
(b) My cell phone creates noise in my Hearing aid
(c) This noise prevents me from using my cell phone
(d) A person talking near me on a cell phone produces noise in my Hearing aid
17. Does your cell phone backlight cause noise in your Hearing aid?
(a) Yes
(b) No
(c) Not noticed
18. Would you be interested in a combination Hearing aid and cell phone?
(a) Yes
(b) No
19. Is there enough information available online to help you choose the cell phone right
for you?
(a) Yes
(b) No
20. What information regarding cell phones and hearing aids would you like to see
available?
APPENDIX C
ANALYSIS OF THE FOCUS GROUP DISCUSSIONS
In order to better understand the needs of the hearing-impaired population, two
focus groups with hearing-impaired participants were conducted. This appendix provides a
synopsis of the main themes observed at the two focus groups.
C.1 Participants
The two focus groups were held on 06/18/2004 and 09/24/2004 at the University
of Florida speech and hearing clinic at Shands hospital in Gainesville. Focus group one
lasted approximately two hours and was attended by 3 hearing-impaired subjects, all of
whom were hearing aid users and had used cell phones. Focus group two was attended
by 10 hearing-impaired subjects, all of whom owned hearing aids and had used cell
phones. The hearing-impaired participants were informed that they were at the meeting to
give their opinions, answer questions, ask questions, nominate topics and generate ideas.
The sessions were audio taped and later transcribed.
C.2 Focus Group Main Themes
C.2.1 Aided Cell Phone Listening Problems
The placement of the microphones in the behind the ear (BTE) hearing aids makes
it tricky to couple the cell phone loudspeaker output to the hearing aid. There was the
worry that, in trying to find the ‘sweet spot’ for the BTE hearing aid (HA), the phone
would be placed at such an odd angle that the person on the other end would not be able
to hear the user speak. Some people find it easier to remove their HAs to use the cell phone
but do not like doing so. The HA volume had to be at maximum to hear the cell phone
conversation and this caused feedback for the in the ear (ITE) hearing aids. Louder cell
phone output was requested by all. Some participants reported electromagnetic (EM)
interference between the cell phone and the HA. There was also trouble hearing the phone
ring.
C.2.2 Ideal Hearing Aid Compatible Cell Phone
The ideal cell phone would be one which would not have any feedback or placement
problems. It would have volume control which would allow for louder levels and which
would offer frequency based adjustments so as to match their unique losses. The phone
would be a flip-phone style which was found to have less EM interference with the HA.
It would have a hands-free option where it could be directly coupled to the hearing-aid.
Control over the ring tone volume level and frequency would also be provided.
C.2.3 Comments on a Cell Phone Assistive Listening Device
There was the thought that one could wear one normal hearing aid on one ear and a
hands-free cell phone assistive listening device (ALD) on the other ear. The idea of taking
a hearing test on the cell phone was favored by all. There was some worry about whether
the phone would be too loud and the option of fitting the phone at the audiology clinic
was mentioned. There was the thought that fitting a cell phone to meet a hearing loss and
also using the HA might provide the increase in volume required to hear speech especially
for those with high losses. A hands-free solution, similar to a bluetooth headset, was
appreciated. In the absence of a hands free solution, there was worry over taking out the
HA in order to use the phone.
APPENDIX D
PHYSIOLOGY OF HEARING
The human ear, the organ of hearing and balance, is a masterpiece of biological
engineering. It enables us to hear sounds ranging from 20 Hz to 20 kHz with
a dynamic range of 0-130 dB. Anatomically, the human ear can be divided into three
parts: the outer ear (pinna, auditory canal), the middle ear (ossicles, eardrum, oval
window) and the inner ear (cochlea, semicircular canals). Figure D-1 shows the internal
structure of the ear.
The human pinna is symmetric, points forward and has a curved structure. It focuses
sound pressure waves into the auditory canal. The structure of the pinna aids in sound
localization. Horizontal localization is made possible because of inter-aural time and
intensity differences while vertical localization is made possible because of the frequency
shaping of the sound by the curves of the pinna. The auditory canal, which is around
2.7 cm in length, acts as a 1/4-wave closed-tube resonator and boosts the 2-5 kHz region by
15 dB. The resonance peak is broad because the closed end of the auditory canal is the
pliant ear drum, or tympanic membrane.
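The quoted numbers are consistent with the standard closed-tube formula f0 = c / (4L): with a speed of sound of roughly 343 m/s (an assumed textbook value) and the 2.7 cm canal length given above, the fundamental falls near 3.2 kHz, inside the 2-5 kHz boost region.

```python
def quarter_wave_resonance_hz(length_m, speed_of_sound_m_s=343.0):
    """Fundamental resonance of a tube closed at one end: f0 = c / (4 * L)."""
    return speed_of_sound_m_s / (4.0 * length_m)

f0 = quarter_wave_resonance_hz(0.027)   # auditory canal, roughly 3.2 kHz
```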
The middle ear consists of the eardrum, the ossicles (malleus, incus and stapes) and
the oval window. The ossicles translate the sound pressure wave to vibrations in the
cochlea. They provide impedance matching since the acoustic impedance of the fluid
in the cochlea is about 4000 times that of air. The ossicles provide amplification through
lever action (3x) and through the eardrum-to-oval-window area ratio (15x). The ossicles also help block
very loud low frequency sounds by means of the stapedius reflex. The stapes transmits
vibrations to the oval window on the outside of the cochlea. This moves the fluid in the
cochlea which forms a traveling wave, with a peak at one location along the length of the
cochlea. Conductive hearing loss occurs when sound is not conducted efficiently to the
cochlea through the ossicles. The cochlea is intact for conductive hearing loss.
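Taken together, the lever action (3x) and the area ratio (15x) quoted above multiply to a pressure gain of about 45x, which the sketch below converts to decibels. The combined figure of roughly 33 dB is a computed consequence of the numbers in the text, not a value quoted from it.

```python
import math

lever_ratio = 3.0    # ossicular lever action (from the text)
area_ratio = 15.0    # eardrum-to-oval-window area amplification (from the text)

pressure_gain = lever_ratio * area_ratio      # combined pressure gain, 45x
gain_db = 20.0 * math.log10(pressure_gain)    # roughly 33 dB
```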
The cochlea is the body’s microphone. It converts the mechanical movement into
electrical action potentials which are then carried to the brain through the auditory nerve.
The cochlea is a snail-shell-like structure and contains three fluid-filled canals. One of
these canals houses the organ of Corti, which rests on the basilar membrane (BM). Hair cells are
arranged in four rows along the entire length of the cochlear coil (Figure D-2). Three rows
consist of outer hair cells (OHCs) and one row consists of inner hair cells (IHCs). Each
hair cell has hundreds of tiny stereocilia. The stereocilia of the OHCs are embedded in
the tectorial membrane. The traveling wave bends the IHC’s stereocilia and this produces
action potentials.
The afferent IHCs transmit signals to the brain via the auditory nerve, while the efferent
OHCs receive neural input from the brain, which influences their motility as part of the
cochlea's mechanical pre-amplifier. Based on feedback from the brain, the OHCs
mechanically contract, pulling down the tectorial membrane; once the IHCs' cilia brush
against the tectorial membrane, action potentials are generated. By sharpening the peak
of the traveling wave, the OHCs both help the IHCs sense soft sounds and provide high
frequency resolution. Sensorineural hearing loss occurs because of damage to the hair cells. Hearing
loss due to aging or presbycusis is a type of sensorineural hearing loss and occurs due to
wear and tear of the hair cells. Hearing losses of up to 60 dB HL can be attributed
to OHC damage, while losses above 80 dB HL involve both IHC and OHC damage.
Figure D-3 shows the hair cells of a person with normal hearing and of a person
with severe hearing loss. Damage to the OHCs results in lower frequency resolution
and an inability to hear soft sounds, while damage to the IHCs means that the sound
information is not being sent to the brain.
There is a tonotopic mapping along the length of the BM (Figure D-4). Each part of
the BM has a characteristic frequency of maximum vibration which depends on its relative
position. At the base of the cochlea (near the oval window), the BM is stiff and thin and
hence more responsive to high frequencies; the apex of the cochlea is wide and floppy and
more responsive to low frequencies. Each IHC is innervated by about 10 auditory nerve (AN) fibers.
The AN fibers also have a tonotopic mapping and encode steady state sounds and onsets.
At the stimulus onset, the AN firing rate increases rapidly; for a constant stimulus, the
firing rate decays exponentially. The AN pathway passes from the cochlea to the brainstem
and then upwards to the auditory processing centers of the temporal lobes of the brain,
which decode the neural signal and provide us with the sensation of sound.
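The base-to-apex tonotopic gradient described above is commonly quantified by Greenwood's place-frequency function. The sketch below uses the standard human-cochlea constants from Greenwood (1990); these constants are from the published literature, not from this dissertation.

```python
def greenwood_frequency_hz(x):
    """Greenwood place-frequency map for the human cochlea.
    x is the fractional distance from the apex (0.0) to the base (1.0)."""
    return 165.4 * (10.0 ** (2.1 * x) - 1.0)
```

The base (x = 1.0) maps to roughly 20 kHz and the apex to the lowest audible frequencies, matching the stiff-base, floppy-apex description above.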
Figure D-1. Structure of the human ear [76]
Figure D-2. Hair cells of the organ of Corti [77]

Figure D-3. Electron micrograph of the organ of Corti for A) Normal hearing B) Severe hearing loss [78]
REFERENCES
[1] S. Arlinger, "Negative consequences of uncorrected hearing loss – a review," International Journal of Audiology, vol. 42, pp. S17–20, 2003.

[2] "CTIA semi-annual wireless industry survey," (updated 2007; cited December 2007). [Online]. Available: http://files.ctia.org/pdf/CTIA Survey Mid Year 2007.pdf, CTIA, Washington, DC.

[3] Statistics about hearing disorders, ear infections, and deafness. (updated January 2007; cited December 2007). [Online]. Available: http://www.nidcd.nih.gov/health/statistics/hearing.asp. NIDCD. Bethesda, MD.

[4] M. C. Killion, "The SIN report: Circuits haven't solved the hearing-in-noise problem," The Hearing Journal, vol. 50, p. 10, 1997.

[5] M. Skopec, "Hearing aid electromagnetic interference from digital wireless telephones," IEEE Transactions on Rehabilitation Engineering, vol. 6, pp. 235–239, 1998.

[6] American National Standard for Methods of Measurement of Compatibility between Wireless Communications Devices and Hearing Aids, ANSI Std. C63.19-2005, 2005.

[7] J. Rodman, "The effect of bandwidth on speech intelligibility," White Paper, Polycom, 2003.

[8] B. C. J. Moore, "Perceptual consequences of cochlear hearing loss and their implications for the design of hearing aids," Ear and Hearing, vol. 17, pp. 133–161, 1996.

[9] T. Venema, Compression for Clinicians. San Diego, CA: Singular Publishing Group, 1999.

[10] Noise induced hearing loss facts. (updated 2007; cited December 2007). [Online]. Available: http://www.hei.org/news/facts/nihlfact.htm. HEI. Los Angeles, CA.

[11] C. Portnuff. Safe listening levels for Apple iPod. (updated October 2006; cited December 2007). [Online]. Available: http://www.physorg.com/news80304823.html. Boulder, CO.

[12] Occupational Safety and Health Standards, Occupational Safety and Health Administration Std. 1910.95, 1983.

[13] Criteria for a Recommended Standard: Occupational Noise Exposure Publication, National Institute for Occupational Safety and Health Std. 98-126, 1998.

[14] J. W. Hall and H. G. Mueller, Audiologists' Desk Reference Volume I: Diagnostic Audiology Principles, Procedures and Protocols. San Diego, CA: Singular Publishing Group, 1996.
[15] H. Dillon, Hearing Aids. New York: Thieme Medical Publishers, 1st edition, 2001.

[16] M. C. Killion, "SNR loss: I can hear what people say, but I can't understand them," The Hearing Review, vol. 4, pp. 8–14, 1997.

[17] J. H. Macrae and H. Dillon, "Gain, frequency response, and maximum output requirements for hearing aids," Journal of Rehabilitation Research and Development, vol. 33, no. 4, pp. 363–76, 1996.

[18] J. C. Steinberg and M. B. Gardner, "The dependence of hearing impairment on sound intensity," Journal of the Acoustical Society of America, vol. 9, no. 1, pp. 11–23, 1937.

[19] B. A. Henry, C. W. Turner, and A. Behrens, "Spectral peak resolution and speech recognition in quiet: Normal hearing, hearing impaired, and cochlear implant listeners," Journal of the Acoustical Society of America, vol. 118, pp. 1111–1121, 2005.

[20] B. R. Glasberg and B. C. J. Moore, "Psychoacoustic abilities of subjects with unilateral and bilateral cochlear hearing impairments and their relationship to the ability to understand speech," Scandinavian Audiology, vol. 32, pp. 1–25, 1989.

[21] E. Zwicker and K. Schorn, "Psychoacoustical tuning curves in audiology," Audiology, vol. 17, pp. 120–40, 1978.

[22] E. M. Danaher and J. M. Pickett, "Some masking effects produced by low-frequency vowel formants in persons with sensorineural hearing loss," Journal of Speech and Hearing Research, vol. 18, pp. 261–71, 1975.

[23] J. R. Dubno and A. B. Schaefer, "Frequency selectivity for hearing-impaired and broadband-noise-masked normal listeners," Quarterly Journal of Experimental Psychology, vol. 43, pp. 543–64, 1991.

[24] E. M. Danaher, M. P. Wilson, and J. M. Pickett, "Backward and forward masking in listeners with severe sensorineural hearing loss," Audiology, vol. 17, pp. 324–38, 1978.

[25] S. Hygge, J. Ronnberg, and B. Larsby, "Normal-hearing and hearing-impaired subjects' ability to just follow conversation in competing speech, reversed speech, and noise backgrounds," Journal of Speech and Hearing Research, vol. 35, pp. 208–15, 1992.

[26] B. C. J. Moore and B. R. Glasberg, "Simulation of the effects of loudness recruitment and threshold elevation on the intelligibility of speech in quiet and in a background of speech," Journal of the Acoustical Society of America, vol. 94, pp. 2050–2062, Oct. 1993.

[27] Y. Nejime and B. C. J. Moore, "Simulation of the effect of threshold elevation and loudness recruitment combined with reduced frequency selectivity on the intelligibility of speech in noise," Journal of the Acoustical Society of America, vol. 102, no. 1, pp. 603–615, 1997.
[28] P. Duchnowski and P. M. Zurek, "Villchur revisited: Another look at automatic gain control simulation of recruiting hearing loss," Journal of the Acoustical Society of America, vol. 98, no. 6, pp. 3170–3181, 1995.

[29] P. W. Barnett, "Overview of speech intelligibility," in Proceedings of the Institute of Acoustics, vol. 21, 1999.

[30] K. Kasturi, P. Loizou, M. Dorman, and T. Spahr, "The intelligibility of speech with holes in the spectrum," Journal of the Acoustical Society of America, vol. 112, 2002.

[31] N. R. French and J. C. Steinberg, "Factors governing the intelligibility of speech sounds," Journal of the Acoustical Society of America, vol. 19, p. 90, 1947.

[32] J. D. Miller, "Effects of noise on people," Journal of the Acoustical Society of America, vol. 56, pp. 724–764, 1974.

[33] S. Fidell, R. Horonjeff, and S. Teffeteller, "Effective masking bandwidths at low frequencies," Journal of the Acoustical Society of America, vol. 73, pp. 628–38, 1983.

[34] G. Parikh and P. Loizou, "The influence of noise on vowel and consonant cues," Journal of the Acoustical Society of America, vol. 118, pp. 3874–3888, 2005.

[35] L. M. Jenstad and P. E. Souza, "Quantifying the effect of compression hearing aid release time on speech acoustics and intelligibility," Journal of Speech, Language and Hearing Research, vol. 48, pp. 651–67, 2005.

[36] R. V. Shannon, F. Zeng, and V. Kamath, "Speech recognition with primarily temporal cues," Science, vol. 270, pp. 303–304, 1995.

[37] Hearing in noise test. (updated June 2005; cited December 2007). [Online]. Available: http://www.californiaearinstitute.com/audiology/hint.php. California Ear Institute. Palo Alto, CA.

[38] American National Standard Methods for Calculation of the Speech Intelligibility Index, American National Standards Institute Std. S3.5-1997, 1997.

[39] Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs, ITU-T Std. P.862, 2001.

[40] MOS results on DVSI AMBE Vocoders. (updated 2006; cited December 2007). [Online]. Available: http://www.dvsinc.com/papers/toll.htm. DVSI. Westford, MA.

[41] Baby boomer hearing loss study. (updated 2006; cited December 2007). [Online]. Available: http://www.clarityproducts.com/boomer/. Clarity and the EAR Foundation. Chattanooga, TN.

[42] E. Zwicker and H. Fastl, Psychoacoustics, Facts and Models. New York: Springer Verlag, 1990.
[43] K. W. Berger, E. N. Hagberg, and D. M. Varavvas, "Comparison of hearing threshold level and most comfortable loudness level in hearing aid prescription," Ear and Hearing, vol. 3, pp. 30–3, 1982.

[44] G. Keidser and F. Grant, "Comparing loudness normalization (IHAFF) with speech intelligibility maximization (NAL-NL1) when implemented in a two-channel device," Ear and Hearing, vol. 22, pp. 501–515, 2002.

[45] C. V. Pavlovic, "Derivation of primary parameters and procedures for use in speech intelligibility predictions," Journal of the Acoustical Society of America, vol. 82, pp. 413–422, 1987.

[46] L. E. Humes, "Evolution of prescriptive fitting approaches," American Journal of Audiology, vol. 5, pp. 19–23, 1995.

[47] "Prescription of gain/output (POGO) for hearing aids," Hearing Instruments, vol. 34, p. 16, 1983.

[48] D. Byrne and H. Dillon, "The National Acoustic Laboratories' (NAL) new procedure for selecting the gain and frequency response of a hearing aid," Ear and Hearing, vol. 7, pp. 257–265, 1986.

[49] M. C. Killion, "Loudness-data basis for FIG6 hearing-aid fitting targets," Journal of the Acoustical Society of America, vol. 98, p. 2927, 1995.

[50] R. M. Cox, "A hands-on discussion of the IHAFF approach," The Hearing Journal, 1995.

[51] R. C. Seewald, M. Ross, and M. K. Spiro, "Selecting amplification characteristics for young hearing-impaired children," Ear and Hearing, vol. 6, pp. 48–53, 1985.

[52] R. Seewald and L. Cornelisse, "A software implementation of the desired sensation level DSL[i/o] method for fitting linear gain and wide dynamic range compression," Users manual, Healthcare Research Unit, University of Western Ontario, London, Ontario, Canada, 1997.

[53] E. Villchur, "Signal processing to improve speech intelligibility in perceptive deafness," Journal of the Acoustical Society of America, vol. 53, no. 6, pp. 1646–1657, 1973.

[54] C. S. Hallpike and J. D. Hood, "Observations upon the neurological mechanism of the loudness recruitment phenomenon," Acta Otolaryngol, vol. 50, pp. 472–86, 1959.

[55] F. Miskolczy-Fodor, "Relation between loudness and duration of tonal pulses. III. Response in cases of abnormal loudness function," Journal of the Acoustical Society of America, vol. 32, pp. 486–492, 1960.
[56] A. J. Oxenham and S. P. Bacon, “Cochlear compression: perceptual measures andimplications for normal and impaired hearing,” Ear and Hearing, vol. 24, no. 5, pp.352–366, 2003.
[57] C. Grasso, D. Quaglia, and L. Farinetti, “Wide-band compensation of presbycusis,”in IASTED International Conference on Signal Processing, Pattern Recognition, andApplications, Greece, 2003, pp. 104–108.
[58] B. C. J. Moore. A new CD for diagnosis of dead regions in the cochlea: the TEN(HL) CD. (updated 2004; cited December 2007). [Online]. Available: http://hearing.psychol.cam.ac.uk/dead/TENCD.html. Department of Experimental Psychology, Cambridge, England.
[59] Q. Yang, “Hearing enhancement algorithm implementation,” University of Florida,Motorola Internal Document, December 2006.
[60] R. J. Niederjohn and D. G. Mliner, “The effects of high-pass and of low-pass filtering upon the intelligibility of speech in white noise,” Journal of Auditory Research, vol. 22, pp. 189–199, 1982.
[61] J. Cooper and B. Cutts, “Speech discrimination in noise,” Journal of Speech andHearing Research, vol. 14, pp. 332–337, 1971.
[62] M. A. Picheny and N. I. Durlach, “Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech,” Journal of Speech and Hearing Research, vol. 29, pp. 434–445, 1986.
[63] I. Cohen, “Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging,” IEEE Transactions on Speech and Audio Processing, vol. 11, pp. 466–475, 2002.
[64] K. El-Maleh, A. Samouelian, and P. Kabal, “Frame-level noise classification in mobile environments,” in Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing, Phoenix, AZ, 1999, pp. 237–240.
[65] N. Virag, “Single channel speech enhancement based on masking properties of the human auditory system,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 2, pp. 126–137, 1999.
[66] J. D. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE Journal on Selected Areas in Communications, vol. 6, pp. 314–323, 1988.
[67] M. R. Schroeder and B. S. Atal, “Optimizing digital speech coders by exploiting masking properties of the human ear,” Journal of the Acoustical Society of America, vol. 66, pp. 1647–1652, 1979.
[68] C. I. Moore, M. B. Browning, and G. M. Rose, “Hippocampal plasticity induced by primed burst, but not long-term potentiation, stimulation is impaired in area CA1 of aged Fischer-344 rats,” Hippocampus, vol. 3, pp. 57–66, 1993.
[69] M. Ramani, J. G. Harris, and A. Holmes, “Comparison of hearing loss compensation algorithms using speech intelligibility measures,” in Journal of the Acoustical Society of America, vol. 117, Vancouver, Canada, April 2005, p. 2604.
[70] F. Feldbusch, “Identification of noises by neural nets for application in hearing aids,” in Proceedings of the Second International ICSC Symposium on Neural Computation, Berlin, May 2000, pp. 505–510.
[71] O. Arnsten, H. Koren, and T. Strom, “Hearing-aid pre-selection through a neuralnetwork,” Scandinavian Audiology, vol. 25, pp. 259–262, 1996.
[72] J. M. Kates, “On the feasibility of using neural nets to derive hearing-aid prescriptive procedures,” Journal of the Acoustical Society of America, vol. 98, no. 1, pp. 172–180, 1995.
[73] R. Gao, Y. Liu, and S. Basseas, “Next generation hearing aid devices,” in Proceedings of the 11th IEEE International Conference on Tools With Artificial Intelligence, vol. 117, Chicago, IL, April 1999, pp. 327–331.
[74] D. W. Ruck, “The multilayer perceptron as an approximation to a Bayes optimal discriminant function,” IEEE Transactions on Neural Networks, vol. 1, no. 4, pp. 296–298, 1990.
[75] M. I. A. Lourakis, “A brief description of the Levenberg-Marquardt algorithm,” (updated February 2005; cited December 2007). [Online]. Available: http://www.ics.forth.gr/∼lourakis/levmar/levmar.pdf, Institute of Computer Science, Crete, Greece.
[76] D. Packard. Human ear. (updated 2006; cited December 2007). [Online]. Available: http://commons.wikimedia.org/wiki/Image:HumanEar.jpg. Stanford, CA.
[77] R. Fettiplace and C. Hackney, “The sensory and motor roles of auditory hair cells,”Nature Reviews Neuroscience, vol. 7, pp. 19–29, 2006.
[78] Auditory Science Laboratory, “Electron micrographs of normal and damaged cochlea,” (updated 2005; cited December 2007). [Online]. Available: http://www.sickkids.ca/AuditoryScienceLab/section.asp?s=Hearing+Loss&sID=3243, University of Toronto, Toronto, Canada.
BIOGRAPHICAL SKETCH
Meena Ramani was born in the small town of Palakkad, Kerala, India, on June 1, 1981. She received her bachelor’s degree in electronics and communication engineering from Kumaraguru College of Technology in 2002 and her master’s degree in electrical and computer engineering from the University of Florida in 2004. She joined the Computational NeuroEngineering Laboratory (CNEL) at the University of Florida in 2003 and has since been working under the guidance of Dr. John G. Harris. Her PhD research has been funded by Motorola. Her research interests include speech and hearing enhancement, noise reduction, spike-based computation and cochlear implants. During the course of her studies, Meena worked as a software development and testing intern at Microsoft in Washington and as a research scientist with SoundID in California. After graduation, Meena will join SoundID full-time as a research scientist.