

Development of the Cantonese speech intelligibility index a)

Lena L. N. Wong,b) Amy H. S. Ho,c) and Elizabeth W. W. Chua d)
Division of Speech & Hearing Sciences, University of Hong Kong, Hong Kong, China

Sigfrid D. Soli
House Ear Institute, Los Angeles, California 90057

(Received 4 April 2006; revised 8 December 2006; accepted 11 December 2006)

A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427–438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to other English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085–1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of test material, tonal nature of the Cantonese language, or a combination of these factors. © 2007 Acoustical Society of America. [DOI: 10.1121/1.2431338]

PACS number(s): 43.71.Gv [ARB] Pages: 2350–2361

I. INTRODUCTION

A. Background

The Articulation Index (AI) or its revised appellation, Speech Intelligibility Index (SII), is a quantitative measure that accounts for the contribution of audible speech cues in given frequency bands to speech intelligibility (Amlani et al., 2002). It is a useful tool for estimating speech understanding ability under specified listening situations. The AI has been suggested for clinical applications such as prediction of speech recognition performance with various configurations of hearing loss (Macrae and Brigden, 1973; Pavlovic, 1984; Kamm et al., 1985; Killion and Christensen, 1998), estimation of unaided and aided speech intelligibility to determine the potential benefits of hearing aids (Mueller and Killion, 1990; Killion and Christensen, 1998; Stelmachowicz et al., 2002), and prescription of hearing aid gain (Rankovic, 1991). Amendments to the original calculations of AI were made for over a decade before the ANSI-S3.5 (1969) standard was adopted. Since then, efforts were made to simplify the calculation of AI for clinical applications (e.g., Pavlovic, 1984; Eisenberg et al., 1998). The term Speech Intelligibility Index (SII) was later adopted in the ANSI-S3.5 (1997) standard to account for spread of masking and level distortion effects (Amlani et al., 2002).

To establish the SII, it is necessary to gain a thorough understanding of the frequency-importance function (FIF) of a specific test material because the relative importance of various frequency bands to speech intelligibility is a key component of the basic SII equation:

$$\mathrm{SII} = \sum_{i=1}^{n} I_i A_i, \tag{1}$$

where $I_i$ is the importance of frequency band $i$, expressed as a weighting factor from 0.0 to 1.0, and $A_i$ is the audibility function, representing the amount of speech energy available in the $i$th frequency band that contributes to the overall intelligibility (French and Steinberg, 1947; Amlani et al., 2002). It is assumed that the speech signals in the adjoining frequency bands that comprise the audible spectrum contribute independently to the articulation score, and that speech intelligibility is an additive measure of weighted importance contributed by different frequency regions (Rankovic, 1995).
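As a minimal sketch of Eq. (1), the weighted sum can be written in a few lines of Python. The function name and the four-band numbers are hypothetical and are not data from this study; the check that audibility lies between 0 and 1 is an assumption of the sketch.

```python
def speech_intelligibility_index(importance, audibility):
    """Eq. (1): sum over bands of importance I_i times audibility A_i."""
    if len(importance) != len(audibility):
        raise ValueError("importance and audibility need one value per band")
    # Assumption of this sketch: audibility is expressed as a proportion.
    if any(not 0.0 <= a <= 1.0 for a in audibility):
        raise ValueError("audibility values are expected to lie in [0, 1]")
    return sum(i * a for i, a in zip(importance, audibility))


# Hypothetical four-band example; the weights sum to 1.0.
importance = [0.40, 0.30, 0.20, 0.10]
audibility = [1.00, 0.80, 0.50, 0.00]
sii = speech_intelligibility_index(importance, audibility)
```

Because the bands are assumed to contribute independently, a band that is fully inaudible simply drops out of the sum, however large its importance weight.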

The dynamic range (DR) of the long-term average speech spectrum (LTASS), which Byrne et al. (1994) found is similar across languages, affects the calculation of the audibility function. Conventionally, the effective DR is assumed to be 30 dB at all frequency bands for English materials (e.g., ANSI-S3.5, 1969; Studebaker and Sherbecoe, 1991; Eisenberg et al., 1998). Studebaker et al. (1999) argued that there is “credible evidence” for a larger value, and they showed experimentally that a DR of 40 dB yielded better prediction of speech recognition using NU-6 (Tillman and Carhart, 1966) for normal-hearing and hearing-impaired participants under different listening conditions.

a) Portions of this work were presented in a paper “Cantonese Speech Intelligibility Index,” Proceedings of International Congress of Audiology, Phoenix, Arizona, September 2004.
b) Electronic mail: [email protected]
c) Currently associated with St. Teresa’s Hospital Hearing and Speech Centre, Hong Kong.
d) Currently associated with Starkey (HK) Hearing and Speech Centre Ltd.

2350 J. Acoust. Soc. Am. 121 (4), April 2007 © 2007 Acoustical Society of America 0001-4966/2007/121(4)/2350/12/$23.00

The SII can be used to predict speech intelligibility via a transfer function (S) such as the one derived by Fletcher and Galt (1950):

$$S = \left(1 - 10^{-AP/Q}\right)^{N}, \tag{2}$$

where S is the percent correct intelligibility score, A is the SII value, P is a proficiency factor that accounts for the talker’s and listener’s competence and practice effects, and Q and N are fitting constants that depend on the characteristics of the speech stimulus (Fletcher and Galt, 1950). More specifically, Q is a correction factor “to compensate for changes in proficiency” to the test stimuli in an experiment; N represents “the number of independent sounds in a test item” or a constant “that controls the shape of the line (S)” (Studebaker and Sherbecoe, 1991, pp. 431 and 433).
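The transfer function in Eq. (2) can be sketched as a small Python function. The values chosen for Q and N below are hypothetical and serve only to show the shape of the curve; the score is returned as a proportion, so it would be multiplied by 100 to give percent correct.

```python
def predicted_score(sii, q, n, p=1.0):
    """Eq. (2): S = (1 - 10 ** (-A * P / Q)) ** N, with S as a proportion."""
    if q <= 0 or n <= 0:
        raise ValueError("Q and N must be positive")
    return (1.0 - 10.0 ** (-sii * p / q)) ** n


# Hypothetical fitting constants (not the values fitted in this study).
Q_EXAMPLE, N_EXAMPLE = 0.5, 3.0
scores = [predicted_score(a / 10, Q_EXAMPLE, N_EXAMPLE) for a in range(11)]
```

In this form a smaller Q makes the predicted score rise faster with SII, and a larger N delays the rise at low SII values, which is what gives the curve its S shape.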

B. SII for specific speech materials

Studebaker and Sherbecoe (1993) reported that FIFs vary with speech stimuli, so that for the same SII, predicted speech intelligibility varies with speech materials. The original AI calculation was based on CVC nonsense syllables (French and Steinberg, 1947). Other types of speech test materials have been used in subsequent research. These include the Central Institute for the Deaf (CID) W-22 word lists (Studebaker and Sherbecoe, 1991), NU-6 word test (Studebaker et al., 1993), Hearing In Noise Test (HINT) sentence materials (Eisenberg et al., 1998), Consonant-vowel Nucleus-Consonant (CNC) monosyllabic word test (Henry et al., 1998), and Connected Speech Test (CST) passages (Sherbecoe and Studebaker, 2002). DePaolis et al. (1996) found statistically different one-third octave band FIFs for PB-50 monosyllabic words, the SPIN test, and continuous discourse. Crossover frequencies, i.e., the frequency that divides a speech spectrum into two equally important parts, varied from 1189 to 1900 Hz for various materials (see Table I). With the exception of the W-22 word lists, crossover frequencies shift to lower values as the redundancy of the speech materials increases (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991); continuous discourse has the lowest values and nonsense syllables have the highest values. The crossover frequency may differ across languages. For example, while French and English did not show much difference in crossover frequencies (about 1500 Hz), Finnish disyllabic words had a significantly lower crossover frequency at about 1000 Hz (Studebaker and Sherbecoe, 1993). The FIF or crossover frequency has never been established for tonal languages such as Cantonese.

C. Cantonese

Cantonese is a tonal language spoken by more than 16 million people in the world. Cantonese is a regional dialect in South-Eastern China (Ramsey, 1987) and one of the main dialects in China (Li, 1989). It is commonly spoken among Chinese immigrants in North America, South Asia, Australia, and Great Britain (Lau and So, 1988; Matthews and Yip, 1994). Among Chinese dialects, its influence is second to that of Mandarin (Matthews and Yip, 1994).

Cantonese morphemes are monosyllabic, and monosyllables are combined to form polysyllabic words. Cantonese syllables take the form of an optional initial consonant, a mandatory vowel, and an optional final consonant, or (C)V(C). Cantonese has the same long-term average speech spectrum (LTASS) as many other languages including English (Byrne et al., 1994), but Cantonese phonology is very different from English phonology (So and Dodd, 1995). For example, Cantonese speakers need to discriminate aspirated from unaspirated consonants rather than voiced from voiceless consonants. Cantonese has fewer consonants and more vowels than English, and tones carry lexical meaning (Dodd and So, 1994). There are nine lexical tones (Browning, 1974; Fok Chan, 1974; Dodd and So, 1994), as listed in Table II. Browning (1974) suggested that the three entering tones (high, mid, and low) of Cantonese are not contrastive as their registers are comparable to tones 1 (high level), 3 (mid level), and 6 (low level). Pitch variations due to changes in fundamental frequency (F0) provide the main cues for tone perception (Fok Chan, 1974; Gandour, 1981; Cheung, 1992). Cheung (1992) found that tones are more resistant to the masking effect of noise than consonants. Thus, it is possible that low-frequency information carries more weight for Cantonese speech understanding than for English. In fact, compared to English speakers with the same amount of hearing loss, Cantonese speakers with good low-frequency hearing experience less self-reported difficulty in speech understanding, despite a significant loss at higher frequencies (Doyle and Wong, 1996; Doyle et al., 2002; Wong et al., 2004).

TABLE I. Crossover frequencies of various speech materials.

| Study | Speech stimulus | Crossover frequency |
| --- | --- | --- |
| Studebaker et al. (1987) | Continuous discourse | 1189 Hz |
| Studebaker and Sherbecoe (1991) | W-22 | 1314 Hz |
| Eisenberg et al. (1998) | HINT sentences | 1550 Hz |
| Sherbecoe and Studebaker (2002) | Connected Speech Test | 1599 Hz |
| ANSI (S3.5-1969) | Nonsense syllables | 1660 Hz |
| French and Steinberg (1947) | Nonsense syllables | About 1900 Hz |

TABLE II. Description and examples of each Cantonese tone.

| Number | Classification | Example | Transcription |
| --- | --- | --- | --- |
| 1 | High level | Poem | si1 |
| 2 | High rising | History | si2 |
| 3 | Mid level | Examination | si3 |
| 4 | Low falling | Time | si4 |
| 5 | Low rising | Market | si5 |
| 6 | Low level | Matter | si6 |
| 7 | High entering | Color | sIk7 |
| 8 | Mid entering | Kiss | sIk8 |
| 9 | Low entering | Eat | sIk9 |


D. Aim of the study

This study aimed to derive a Speech Intelligibility Index for Cantonese (SIIC) using the materials from the Cantonese version of the Hearing in Noise Test (CHINT) (Wong and Soli, 2005). The CHINT is the only standardized Cantonese sentence speech reception test. Deriving an SII based on the CHINT would result in a better understanding of Cantonese speech perception. In particular, cochlear implant coding strategies are based on work to optimize speech understanding in native English speakers, but Cantonese users fail to recognize tones (Ciocca et al., 2002; Wong and Wong, 2004). It is hoped that knowledge of the Cantonese SII may result in a better understanding of cochlear implant strategies to help preserve tonal information. How hearing aids should best be prescribed for Cantonese speakers also requires a thorough understanding of how audibility at various frequencies contributes to intelligibility.

Procedures described by Studebaker and Sherbecoe (1991) were used as a basis for Cantonese SII derivation. With Cantonese being a tonal language, it was expected that the crossover frequency for a given type of Cantonese material would be lower than the English equivalent and the FIF would be different from English or French materials. As the effective DR for CHINT has not been determined, results based on the work by Byrne and colleagues (1994) were used. That is, the DR of CHINT was assumed to be 30 dB, but a 40-dB DR was also evaluated.

II. METHOD

A. Participants

Six normal-hearing native Cantonese speakers participated in the pilot study. Seventy-eight (34 male, 44 female) other young normal-hearing native Cantonese speakers participated in the actual study. As participants were recruited in Hong Kong, where some individuals are exposed to two dialects (e.g., Cantonese and Mandarin) since birth, first language was difficult to determine. Therefore, participants speaking Cantonese as their primary language were recruited. None of the participants spoke Cantonese with a dialectal accent. Mean age of participants in the actual experiment was 23 years for males (s.d. 4.5) and 22 years for females (s.d. 2.5), with a range from 18 to 34 years. All participants had bilateral hearing thresholds of 20 dB HL or better at the octave frequencies from 250 to 8000 Hz. In the actual experiment, the mean pure-tone hearing threshold averaged across 500, 1000, and 2000 Hz was 9.9 dB HL (s.d. 3.8) in the right ear and 6.9 dB HL (s.d. 4.3) in the left ear. None of the participants reported histories of noise exposure or middle ear pathology. All of them had normal middle ear function confirmed by tympanometry prior to the experiment. All participants were paid to take part in the study.

B. Materials

Sentences from the CHINT (Wong and Soli, 2005) were used in the present study because it is the only well-standardized material for assessing Cantonese speech intelligibility. The CHINT comprises 24 sets of 10 sentences each, with sentences in each set balanced for the level of difficulty and phonemic characteristics. The Cantonese HINT sentences have 10 syllables represented by 10 Chinese characters; this contrasts with the English HINT sentences that contain four to seven syllables (Nilsson et al., 1994). The CHINT can be used to assess speech intelligibility in quiet and in noise, with noise simulated to originate from 0°, 90°, and 270° azimuths. In this study, speech and noise were both presented from 0° azimuth only. The noise used was matched to the long-term average speech spectrum of the talker.

C. Equipment

The CHINT sentences and the speech-spectrum shaped noise were presented via the Hearing In Noise Test (HINT) program (version 5.0.3) using a SoundBlaster soundcard. The speech signal and the speech-spectrum shaped noise were mixed before they were delivered to a Tucker-Davis Technologies (TDT) System 3 digital filter. The filter was controlled by a computer program, Realtime Processor Visual Design Studio (RPvds) (version 4.0), and provided a rejection slope of 96 dB/octave at the desired cutoff frequencies. The filtered signals were routed to a GSI 16 audiometer and presented diotically to the participants using TDH-50P headphones. The output of the headphones was calibrated to 65 dB A in a 6-cc coupler using the speech-spectrum shaped noise low-pass filtered at 12 000 Hz.

D. Procedures

For the wide-band condition, the noise was fixed at 65 dB A, and the level of the speech signal was varied according to the desired signal-to-noise ratio (SNR) in each test condition. Prior to testing, participants listened to two practice lists, one presented in quiet and another in noise, to familiarize them with the stimuli and test procedures. Reception thresholds of sentences (RTSs) were obtained adaptively (Nilsson et al., 1994) in quiet with test stimuli low-pass (LP) filtered at 12 000 Hz. Individual RTSs served as reference levels for obtaining speech intelligibility scores in the filtering conditions.

1. Pilot study for selecting filtering conditions

A pilot study was conducted to determine the signal-to-noise ratios (SNRs) and the cutoff frequencies that should be used in the actual experiment. Six participants took part in the pilot study. RTSs were obtained in noise and served as the reference level for determining the speech level at which percent correct intelligibility was measured in various filtering/SNR conditions. Participants were instructed to repeat as much of each sentence as possible. According to the HINT protocol, only small variations in response that did not change the meaning of the sentences were allowed (e.g., mommy instead of mama).

Percent correct intelligibility was then obtained in various filtering/SNR conditions. As there were 10 sentences in each list, every sentence repeated correctly contributed 10% to the score. To determine the filtering conditions to be used in the actual study, presentations at +7 dB SNR with LP and high-pass (HP) filters set to 500, 800, 1100, 1400, and 1700 Hz were arbitrarily chosen. To select the SNR conditions for the actual experiment, a total of eight SNR conditions, i.e., −5, −4, −3, −1, 0, +1, +3, and +5 dB SNR (with reference to individual RTSs), were used with the LP filter set to a 12 000 Hz cutoff. A total of 13 conditions were evaluated. Each participant took about one hour to complete the testing in the pilot study. After the pilot study, SNR and filtering conditions that would contribute important information to the study were selected. In addition, a few other SNR and filtering conditions were selected in order to yield more detailed information. The criteria used to select these conditions will be discussed in the Results section.

2. Study

A practice list was administered to obtain RTS in noise so as to familiarize participants with test stimuli and procedures. RTS was measured in noise to determine the speech level at which percent correct intelligibility was to be measured. As only 23 sentence lists were available after individual RTSs were obtained, each participant was evaluated using 22 to 23 randomly assigned conditions. This process took about one hour to complete. Mean intelligibility in each test condition was based on 16 sets of data. Based on results from the pilot study, percent correct intelligibility was measured in a total of 115 filtering/SNR conditions (see Table III).

E. Data analysis

1. Pilot study to select filtering/SNR conditions

Percent intelligibility across the filtering/SNR conditions was compared. Conditions were selected or added to ensure as wide a range of scores as possible (from 0% to 100%) could be obtained. Among the conditions that yielded very similar results, only one was selected.

2. Determination of performance-intensity function

The performance-intensity (PI) function is defined as the change in intelligibility per dB change in SNR. The PI function was used to confirm whether intelligibility grows linearly as a function of SNR, with a slope of about 10% per dB change in SNR (Wong and Soli, 2005). For the present study, the PI function was estimated by using the data from the 12 000 Hz LP filtering condition at various SNRs. Intelligibility scores from 20% to 80% were used to estimate the PI function; beyond this range, the plateau in scores did not allow accurate measurement.
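A slope estimate of this kind can be sketched with an ordinary least-squares fit. The example below uses the mean scores in the 12 000 Hz low-pass row of Table III and keeps only scores between 20% and 80%; fitting a line to the mean scores, rather than to individual participants' data, is an assumption of this sketch and may not match the exact procedure used in the study.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# 12 000 Hz low-pass row of Table III: SNR (dB re individual RTS), mean % correct.
snr_db = [-4, -2, -1, 0, 2, 4, 6, 8]
percent_correct = [15.6, 29.4, 43.8, 56.3, 74.4, 85.6, 96.3, 96.3]

# Keep only the scores between 20% and 80%, as described in the text.
linear = [(x, y) for x, y in zip(snr_db, percent_correct) if 20.0 <= y <= 80.0]
pi_slope = least_squares_slope([x for x, _ in linear], [y for _, y in linear])
```

Four points fall in the 20% to 80% range, and the fitted slope for them is about 11% per dB.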

3. Determination of crossover frequencies

The crossover frequency is defined as the frequency which divides the frequency range into two regions, each accounting for 50% of the information. To obtain crossover frequencies at each SNR, intelligibility at each LP and HP cutoff frequency was plotted. These data were examined and only the scores that contributed to the linear portion of the growth function were used to obtain two regression equations, one for the LP and another for the HP conditions; beyond this range, ceiling and floor effects might have affected the results. The intersection between the LP and HP curves at each SNR represents the crossover frequency for the CHINT materials. The crossover frequency was obtained by solving these equations. The same procedures were applied at various SNRs.
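The intersection of the two regression lines can be sketched as follows. The data points are hypothetical, and regressing score on log10 of the cutoff frequency is an assumption of this sketch, since the text does not state which frequency scale was used.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def crossover_frequency(lp_points, hp_points):
    """Frequency (Hz) at which the fitted LP and HP lines intersect.

    Each argument is a list of (cutoff_hz, percent_correct) pairs from the
    linear portion of the growth function.
    """
    lp_slope, lp_icpt = fit_line([math.log10(f) for f, _ in lp_points],
                                 [s for _, s in lp_points])
    hp_slope, hp_icpt = fit_line([math.log10(f) for f, _ in hp_points],
                                 [s for _, s in hp_points])
    if lp_slope == hp_slope:
        raise ValueError("parallel lines have no single intersection")
    # Solve lp_slope * x + lp_icpt = hp_slope * x + hp_icpt for x = log10(f).
    log_f = (hp_icpt - lp_icpt) / (lp_slope - hp_slope)
    return 10.0 ** log_f


# Hypothetical, symmetric data chosen so that the lines cross at 1000 Hz.
lp_example = [(500, 20.0), (1000, 50.0), (2000, 80.0)]
hp_example = [(500, 80.0), (1000, 50.0), (2000, 20.0)]
crossover_hz = crossover_frequency(lp_example, hp_example)
```

The same function would be called once per SNR, with the points affected by ceiling or floor effects removed beforehand.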

TABLE III. Mean percent speech recognition scores in various filtering/SNR conditions used in the actual experiment. The blank cells represent conditions that were not evaluated in the study.

SNR in dB (refer to individual RTSs): −4 −2 −1 0 2 4 6 8

Cut-off frequency (Hz): scores

Low-pass filtered
500: 0.6 1.3 1.9
650: 10.0 18.0 37.5
800: 7.3 23.1 28.7 50.0
1100: 0.6 9.3 18.8 46.9 64.7
1400: 0.6 4.4 5.0 15.0 38.1 57.5 67.5
1700: 1.3 6.7 7.3 13.3 43.1 56.9 81.3 82.0
3500: 3.8 12.7 23.1 42.5 50.6 84.4 91.3 95.0
5000: 11.1 28.9 32.2 60.0 74.4 88.9 93.3 98.3
6500: 2.2 31.1 44.4 56.7 74.4 91.1 94.4 96.7
8000: 11.1 30.0 48.9 58.9 74.4 77.8 97.8 100.0
12000: 15.6 29.4 43.8 56.3 74.4 85.6 96.3 96.3

High-pass filtered
200: 7.5 26.9 41.9 54.4 88.7 84.4 92.5 96.9
500: 8.7 21.9 34.4 39.4 68.1 75.6 89.4 90.0
800: 0.6 1.3 7.5 12.7 24.4 41.9 56.9 65.0
1100: 0.6 3.1 6.3 8.7 19.4 29.4 55.6 60.0
1400: 2.5 3.1 4.0 10.0 18.8 35.6 40.6
1700: 0.6 3.8 2.5 3.1 6.7 12.5


4. Derivation of the relative transfer function

The relative transfer function (RTF) assumes that the maximum SII is equal to one. That is, the unfiltered condition with the highest score is assigned an SII value of 1.00 and the other conditions have SIIs relative to that value (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991). The curve bisection procedure described by Studebaker and Sherbecoe (1991, pp. 431 to 432) was used to derive the RTF. Briefly, percent correct scores for the LP and HP filtering conditions at the highest SNR (i.e., +8 dB SNR, with reference to individual RTSs) were first plotted as a function of filter cutoff frequency (see Fig. 1). The percent correct intelligibility corresponding to 0.5 SII was obtained using these two curves. That is, the intersection of these two curves represents 0.5 SII, because half of the total auditory area is available to the listener above this point and another half is below this point. The total area for this SNR is assumed to have an SII of 1.00 (Studebaker and Sherbecoe, 1991, p. 430). The procedures are shown in panel A of Fig. 1. The score at 0.50 SII was then used to determine the next point (i.e., the score corresponding to 0.25 SII) on the transfer function. Because there were no HP or LP curves that terminated at the score corresponding to 0.50 SII, data from the curves corresponding to 0 and 2 dB SNR were used to interpolate the data. The intersection of these two curves yielded the scores for 0.25 SII.

Scores for SII values above 0.50 were obtained by identifying points on the curves that complemented those below 0.5 SII (Studebaker and Sherbecoe, 1991, p. 431). The procedure is illustrated in panel B of Fig. 1. The 0.75 SII point was produced by extending a horizontal line for the score for 0.25 SII until it intersected the HP and LP curves for the +8 dB SNR (with reference to individual RTSs) condition. These HP and LP curves as well as the horizontal line are shown in panel C of Fig. 1. Two vertical lines were then drawn, starting from these two intersection points, to connect to the upper ends of the LP and HP curves for the +8 dB SNR (with reference to individual RTSs) condition. The values corresponding to the top intersections of these lines and the HP and LP curves, as indicated by the two circles in panel C of Fig. 1, were then averaged to yield a final score for 0.75 SII.

FIG. 1. The curve bisection procedure used to derive the RTF. Panel A denotes the value for 0.50 SII, panel B denotes the value for 0.25 SII, and panel C denotes the values used to derive 0.75 SII. Results from high-pass filtering conditions are represented by lines with upper ends that start from the left side of the graph; those from low-pass filtering conditions start from the right side.

The above bisection procedures were followed until a number of SII values with corresponding percent correct intelligibility were obtained. The SPSS 11.0 program was used to fit the percent correct intelligibility scores and the corresponding SIIs derived from the above procedures using several equations including Eq. (2). The best-fit SII relative transfer function (RTF), together with its fitting constants, was estimated.

5. Derivation of the frequency-importance function

The RTF was then used to derive the frequency-importance function (FIF), i.e., the relative importance of speech information contained in each frequency region defined by the area between filter cutoff frequencies (Henry et al., 1998; Studebaker and Sherbecoe, 1991). The procedures described in Studebaker and Sherbecoe (1991, pp. 430 to 433) and Henry et al. (1998, p. 83) were followed. First, all HP and LP mean scores at each SNR condition were converted to SIIs using Eq. (3), which is a transformation of Eq. (2):

$$A = -\frac{Q}{P}\,\log_{10}\!\left(1 - S^{1/N}\right). \tag{3}$$

The mean scores and their corresponding SIIs were substituted into Eq. (3) to obtain the fitting constants Q and N using the SPSS 11.0 program. The P value was assumed to be 1.000. The HP and LP SII data were combined and averaged using the procedures set out in Studebaker and Sherbecoe (1991, pp. 430 to 432) to generate an average cumulative SII curve against the filter cutoff frequencies. Briefly, the mean SII across all SNRs for each filtering cutoff frequency was calculated. These SII values were then plotted against the filtering frequencies. The SPSS 11.0 program was used to identify the best fit curve for relating these parameters. As this graph represented the cumulative band-importance for the full range of frequencies (200 to 12 000 Hz), the contribution of each one-third octave band FIF was obtained by dividing the full range into appropriate bands, and subtracting the cumulative SII at the center frequency of the lower band from that of the higher band. Then, the relative FIF was expanded to an SII scale of 0 to 1. This was achieved by dividing every SII value by the sum of individual SIIs.
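Two of the steps above lend themselves to a short sketch: converting a score to an SII by inverting Eq. (2), and turning a cumulative SII curve into normalised band weights. The minus sign in the first function follows from solving Eq. (2) for A. The function names and the cumulative values are hypothetical, and scores are handled as proportions rather than percentages.

```python
import math


def sii_from_score(score, q, n, p=1.0):
    """Invert Eq. (2): A = -(Q / P) * log10(1 - S ** (1 / N))."""
    if not 0.0 <= score < 1.0:
        raise ValueError("score must satisfy 0 <= S < 1")
    return -(q / p) * math.log10(1.0 - score ** (1.0 / n))


def band_importance(cumulative_sii):
    """Difference a cumulative SII curve at successive frequencies,
    then divide by the sum so that the band weights add up to 1."""
    steps = [hi - lo for lo, hi in zip(cumulative_sii, cumulative_sii[1:])]
    total = sum(steps)
    return [step / total for step in steps]


# Hypothetical cumulative SII values at five successive frequencies.
cumulative = [0.00, 0.30, 0.55, 0.75, 0.80]
weights = band_importance(cumulative)
```

A score of exactly 1 is rejected because the logarithm in Eq. (3) is undefined there, which is one reason scores at ceiling cannot be mapped to a finite SII.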

6. Derivation of the absolute transfer function

Once the FIF is determined, the slope of the RTF can be adjusted so that the best SII predicted by that function is now equal to its true absolute value (Studebaker and Sherbecoe, 1991). As the curve bisection procedure in RTF derivation assumes the best score obtained is equivalent to a perfect SII (or 1.0), adjustment to the slope of the RTF is required to obtain an absolute transfer function (ATF), which reflects the true relationship between SII and the test scores. The procedures described in Studebaker and Sherbecoe (1991, p. 433) were followed below to derive the ATF.

Equation (1) was used to identify the SIIs for the mean percent correct score in each test condition. As the SNRs were based on individual RTSs, a correction factor equivalent to the mean RTS (or 3.5 dB) was added to each condition before calculations. Using an iterative method of audibility index determination (Studebaker and Sherbecoe, 1991), the SII for each listening condition was calculated using Eq. (4):

$$\mathrm{SII} = \sum_{i=1}^{n} \left[ \frac{\mathrm{SNR}_{\mathrm{adjusted}} + K}{\mathrm{DR}} \right] \times \mathrm{FIF}_i , \tag{4}$$

where SNR_adjusted is the SNR for each test condition adjusted by the mean RTS (or 3.5 dB), K is the assumed speech maxima above LTASS, DR is the assumed dynamic range for speech, FIF_i is the FIF of frequency band i, and n is the total number of bands used in the calculation. First, mean scores between 5 and 95% were plotted against their SII values, using Eq. (2) as the fitting model, which was also the best fit curve among others (e.g., linear regression analysis). As the value of K was unknown for the CHINT material, it was varied in 1 dB steps from 10 to 21 dB, and the DR was set at 30 or 40 dB to identify a combination of K and DR values that would yield the smallest mean square error. These K and DR values were based on the ANSI-S3.5 (1969) standard, where a 30 dB DR represents the range from +12 dB to −18 dB relative to the LTASS. This range was then modified to ±15 dB in ANSI-S3.5 (1997). In this study, the exploration of the K value was extended to 21 to include these values. While typical Cantonese speech DR was assumed to be 30 dB (Byrne et al., 1994), a DR of 40 dB as suggested by Studebaker et al. (1999) was also evaluated for any improvement to the accuracy of intelligibility prediction.
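The search over K and DR can be illustrated with a short sketch. Several simplifications are assumed here and are not taken from the study: the transfer function is written in the form S = (1 − 10^(−SII·P/Q))^N, which is assumed to correspond to Eq. (2); Q and N are held fixed rather than refitted at each grid point; band audibility is clipped to the range 0 to 1; and the scores are synthetic, generated from a known K and DR so that the search has a known answer.

```python
def sii_eq4(snr_adjusted, k, dr, fif):
    """Eq. (4): band audibility times band importance, summed over bands."""
    # Assumption: audibility is limited to the range 0..1.
    audibility = min(1.0, max(0.0, (snr_adjusted + k) / dr))
    return sum(audibility * weight for weight in fif)


def transfer(sii, q, n, p=1.0):
    """Assumed form of Eq. (2): proportion correct as a function of SII."""
    return (1.0 - 10.0 ** (-sii * p / q)) ** n


def grid_search(snrs, scores, fif, q, n):
    """Return (mse, K, DR) with the smallest mean square error."""
    best = None
    for dr in (30, 40):
        for k in range(10, 22):  # K from 10 to 21 dB in 1 dB steps
            preds = [transfer(sii_eq4(s, k, dr, fif), q, n) for s in snrs]
            mse = sum((pred - obs) ** 2
                      for pred, obs in zip(preds, scores)) / len(scores)
            if best is None or mse < best[0]:
                best = (mse, k, dr)
    return best


# Synthetic example (hypothetical values, not data from the study).
fif = [0.25, 0.25, 0.25, 0.25]
snrs = [-8, -6, -4, -2, 0, 2, 4]
q_fixed, n_fixed = 0.19, 12.0
scores = [transfer(sii_eq4(s, 12, 30, fif), q_fixed, n_fixed) for s in snrs]

best = grid_search(snrs, scores, fif, q_fixed, n_fixed)
print(best[1], best[2])
```

With real data the error would not reach zero, and Q and N would be refitted for each candidate pair, as described in the text.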

III. RESULTS

A. Pilot study to select filtering/SNR test conditions

As speech intelligibility dropped dramatically when the cutoff frequency of the LP filter was reduced from 800 to 500 Hz, the 650 Hz LP filtering condition was added in the actual experiment. Because the 1700 Hz LP filtering condition failed to yield a high score (i.e., scores were lower than 90% correct), LP filtering conditions with cutoff frequencies at 3500, 5000, 6300, and 8000 Hz were added. A 200 Hz HP filtering condition was also added because the 500 Hz HP filtering condition failed to yield a high score. Thus, a total of 17 filtering conditions were used in the actual experiment. There were 11 LP filtering conditions with cutoff frequencies set at 500, 650, 800, 1100, 1400, 1700, 3500, 5000, 6300, 8000, and 12 000 Hz; and six HP conditions with cutoffs at 200, 500, 800, 1100, 1400, and 1700 Hz.


As there was no substantial difference in scores between the −5 and −4 dB SNR conditions (i.e., 5% versus 8.3%), the −5 dB condition was not used for the actual experiment. Because the +7 dB SNR with LP filter cutoff at 1700 Hz condition yielded a score of only 80%, the +8 dB SNR condition was added in the actual experiment in an attempt to yield better scores. In addition, a preliminary SII was estimated using the curve bisection procedure described above. This suggested that testing using 2 dB SNR steps was adequate in generating results for SII calculations, except that the −1 dB SNR condition should be retained because it yielded approximately 0.50 SII in the pilot study and would facilitate derivation of SII. Thus, in the actual study, eight SNR conditions at −4, −2, −1, 0, +2, +4, +6, and +8 dB were adopted.

As speech stimuli in some of the filtering/SNR conditions (e.g., LP filtering cutoff at 1400 Hz or below at −4 dB SNR) were consistently unintelligible, these conditions were excluded from further testing. Together, 115 filtering/SNR conditions (see Table III), instead of the 136 conditions (8 SNR × 17 filtering conditions) used in the pilot study, were used in the actual study.

B. Results in various filtering/SNR conditions

The mean percent correct score in each filtering/SNR condition is reported in Table III and Fig. 2. These results suggest an improvement in intelligibility as the cutoff frequency of LP filtering was increased to about 3500 Hz and as the cutoff frequency of HP filtering was reduced to about 800 Hz. The scores also covered a wide range of performance.

C. Reception threshold of sentences and performance-intensity function

The mean RTS was −3.5 dB (s.d. 1.16). The PI function is shown in Fig. 3. Sentence intelligibility ranged between 29.4% and 74.4%, corresponding to −2 dB and +2 dB SNR (with reference to individual RTSs), respectively, in the full band condition, and grew at a rate of 11.1% per dB SNR.

FIG. 2. Mean percent speech intelligibility, plotted as a function of cutoff frequency at various SNRs. Results from high-pass filtering conditions are represented by lines with upper ends that start from the left side of the graph; those from low-pass filtering conditions start from the right side.

FIG. 3. PI function plotted as mean percent intelligibility at various SNRs (refer to individual RTSs). The bars represent ±1 standard error from the mean.


Using this PI function, the SNR for 50% correct performance is estimated at −0.3 dB, which is 3.2 dB above the mean RTS.

D. Crossover frequency

Linear regressions used to fit the data yielded crossover frequencies of 1069 Hz at −2 dB SNR, 1097 Hz at −1 dB SNR, 1110 Hz at 0 dB SNR, 1045 Hz at 2 dB SNR, 1130 Hz at 4 dB SNR, 1025 Hz at 6 dB SNR, and 1050 Hz at 8 dB SNR. Crossover frequency at −4 dB SNR (with reference to individual RTS) was not calculated because the intelligibility was very low across all filtering conditions and performance was probably affected by floor effects. The geometric average of these crossover frequencies is 1075 Hz. Mean percent performance at +4 dB SNR (refer to individual RTSs) for LP and HP filtering conditions is presented in Fig. 4.
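As a quick check, the geometric average can be recomputed from the seven crossover frequencies listed above. The helper name `geometric_mean` is illustrative only.

```python
import math

# Crossover frequencies (Hz) reported at -2, -1, 0, +2, +4, +6, and +8 dB SNR.
crossovers_hz = [1069, 1097, 1110, 1045, 1130, 1025, 1050]


def geometric_mean(values):
    """Geometric mean computed as the exponential of the mean log value."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(crossovers_hz)
print(round(gm))
```

The result agrees with the reported 1075 Hz to within rounding. A geometric rather than arithmetic mean is the natural choice here because frequency is treated on a logarithmic scale.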

E. Relative transfer function

In panel A of Fig. 1, the two LP and HP curves for the +8 dB SNR (refer to individual RTSs) condition are plotted. The intersection point (marked by a circle) between the two curves corresponded to 0.5 AI. The corresponding percent correct intelligibility (58%) served as the starting point at which the next two LP and HP curves were plotted. As none of the SNRs produced a 58% correct score, the next pair of LP and HP curves was estimated by interpolating data between the two curves (0 and +2 dB SNR, refer to individual RTSs) that yielded scores closest to 58% in the unfiltered condition, as shown in panel B. The point where these two curves intersected was 0.25 SII. The value corresponding to 0.25 SII was about 7%. The 0.75 SII point was estimated by drawing a horizontal line through the 0.25 SII point until it intersected the LP and HP filtered curves in the best SNR condition. Two vertical lines were then drawn across the intersections until one met the upper end of the HP filtered curve, and the other met the upper end of the LP filtered curve. The circles in panel C indicate the values used to derive 0.75 SII, and these values were averaged. Ten other points were estimated in a similar manner, yielding a total of 13 SII values with corresponding percent correct intelligibility, as plotted in Fig. 5.

Equation (2) yielded the best fit SII relative transfer function (RTF), as compared to that of the other fit functions evaluated, when the proficiency factor P was assumed to be 1.000. The fitting constants Q and N were found at 0.3638 and 12.2491, respectively. An R² value of 0.9894 indicated that the model provided a good fit to the data. The RTF, plotted as a function of sentence recognition score against SII using the CHINT in the wideband condition, is also shown in Fig. 5.

F. Derivation of the frequency-importance function (FIF)

To derive the FIF, Eq. (2) was transformed to Eq. (3). The adjusted Q value was 0.3647, the value of N was 12.1488, and the R² value was 0.9996. Again, P was assumed to be 1.000. Values for the FIF, in one-third octave bands, are summarized in Table IV and Fig. 6. The FIF is characterized by a peak at 1600 Hz, which is the frequency range of greatest importance for CHINT sentence recognition. Cumulative values of the CHINT FIF are plotted in Fig. 7, together with those of similar materials in English. As the FIFs for ANSI S3.5-1997 and Pavlovic (1984) were derived from the same data, the ANSI S3.5-1997 cumulative FIF is not plotted in Fig. 7. Frequency regions below 557 Hz and above 2331 Hz each accounted for 25% of importance weight. The midpoint of the FIF is at 1183 Hz.
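The transformation from a score-predicting function to an SII-predicting function can be sketched as an algebraic inversion. The forms below are assumptions, since Eqs. (2) and (3) are defined earlier in the paper: the forward function is taken as S = (1 − 10^(−SII·P/Q))^N and its inverse as SII = −(Q/P)·log10(1 − S^(1/N)). The constants are the adjusted Q and N values reported above.

```python
import math

Q, N, P = 0.3647, 12.1488, 1.0  # adjusted constants reported in the text


def score_from_sii(sii):
    """Assumed form of Eq. (2): proportion correct predicted from SII."""
    return (1.0 - 10.0 ** (-sii * P / Q)) ** N


def sii_from_score(score):
    """Assumed form of Eq. (3): SII recovered from a proportion correct.

    Valid for 0 < score < 1; a score of exactly 1 has no finite SII.
    """
    return -(Q / P) * math.log10(1.0 - score ** (1.0 / N))


# Round trip: converting SII to a score and back should return the input.
for sii in (0.1, 0.5, 0.9):
    recovered = sii_from_score(score_from_sii(sii))
    print(sii, round(recovered, 6))
```

The inverse form is what allows each observed mean score to be mapped to an SII value before the cumulative importance curve is built.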

FIG. 4. Crossover frequency as the intersection between regression lines as a function of mean percent intelligibility at cutoff frequencies from 500 to 1700 Hz. Results from +4 dB SNR (refer to individual RTSs) conditions are used.

FIG. 5. Best-fit relative transfer function (RTF) and the 13 intelligibility scores (%) plotted as a function of SII values.

TABLE IV. Frequency-importance function in one-third octave bands. The weights are expressed as percentages (%).

| 1/3-Octave band (Hz) | Center frequency (Hz) | Weight (%) |
|---|---|---|
| 0–180 | 160 | 5.1 |
| 180–224 | 200 | 2.2 |
| 224–280 | 250 | 2.7 |
| 280–355 | 315 | 3.6 |
| 355–450 | 400 | 4.3 |
| 450–560 | 500 | 4.8 |
| 560–710 | 630 | 6.1 |
| 710–900 | 800 | 7.0 |
| 900–1120 | 1000 | 7.3 |
| 1120–1400 | 1250 | 8.1 |
| 1400–1800 | 1600 | 9.6 |
| 1800–2240 | 2000 | 8.4 |
| 2240–2800 | 2500 | 8.2 |
| 2800–3550 | 3150 | 7.8 |
| 3550–4500 | 4000 | 6.2 |
| 4500–5600 | 5000 | 4.2 |
| 5600–7100 | 6300 | 2.9 |
| 7100–9000 | 8000 | 1.5 |
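The band weights in Table IV can be checked with a few lines of code. The sketch below only restates the tabulated values; the variable names are illustrative.

```python
# Table IV: center frequency (Hz) -> importance weight (%).
fif_percent = {
    160: 5.1, 200: 2.2, 250: 2.7, 315: 3.6, 400: 4.3, 500: 4.8,
    630: 6.1, 800: 7.0, 1000: 7.3, 1250: 8.1, 1600: 9.6, 2000: 8.4,
    2500: 8.2, 3150: 7.8, 4000: 6.2, 5000: 4.2, 6300: 2.9, 8000: 1.5,
}

# The weights should account for all of the importance.
total = sum(fif_percent.values())

# Band with the largest weight.
peak_hz = max(fif_percent, key=fif_percent.get)

# Cumulative weight up to and including the band centered at 1000 Hz.
low_share = sum(w for f, w in fif_percent.items() if f <= 1000)

print(round(total, 1), peak_hz, round(low_share, 1))
```

The weights sum to 100% and the largest single weight falls in the band centered at 1600 Hz, in line with the peak described in the text.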


G. Derivation of the ATF

The slope of the RTF was adjusted to reflect its absolute value. The ATF is shown in Fig. 8. The iterative process of varying K and DR suggested that the smallest rms error was obtained with K set at 11.8 dB and DR set at 30 dB. These values provided the best fit of the data to the ATF. The corresponding Q and N values were 0.1894 and 12.1771, and the R² value was 0.8926 for predicting SII from intelligibility scores. The Q value was 0.1844 and the N value was 12.5769, with the R² at 0.9499, for predicting intelligibility scores using SII values. These R² values indicate that the model still provided a good fit to the data. Applying Eq. (3) to the mean scores, the SIIs for all filtering/SNR test conditions were obtained. These values are plotted in Fig. 8.

IV. DISCUSSION

A. Reception threshold of sentences and performance-intensity function

The mean RTS in noise found in this study is within the 95% confidence interval for normal hearing listeners found by Wong and Soli (2005). The slope of the PI function found in this study is also in agreement with the slope of 9.7% per dB found previously (Wong and Soli, 2005). These findings suggested that the CHINT is a consistent measure of speech intelligibility in noise. Although Studebaker et al. (1987) and Sherbecoe and Studebaker (2002) suggested that a steeper PI function is expected when speech and noise spectra are matched, the slope of the PI function obtained in this study is not as steep as might have been expected based on some earlier work that used talker spectrum matched maskers. In fact, the PI function is consistent with those reported for the CST by Sherbecoe and Studebaker and the English HINT by Eisenberg et al. (1998), and more gentle than those found by Plomp and Mimpen (1979), Hagerman (1982), and Studebaker et al. (1987). The CHINT materials were designed to yield a PI function slope of about 10% per dB so that they are more suitable for the HINT adaptive procedure (Wong and Soli, 2005; Nilsson et al., 1994). Any influence due to clarity of speech or spectral matching between speech and noise would have been accounted for by this predetermined criterion of test development. Because test stimulus levels are specified in the same way, we are able to compare the PI function of the CHINT and the English HINT and conclude that they yielded a similar PI function (Sherbecoe and Studebaker, 2002).

B. The CHINT transfer function

As suggested by Sherbecoe and Studebaker (2002), comparing transfer functions (TFs) across studies is difficult because absolute TFs often are not reported. When absolute TFs have been reported, testing might not have been conducted using noise matched to the speech spectrum to control for filtering effects of hearing thresholds. Furthermore, a priori assumptions about the size of speech peaks have been made when relative TFs were converted to absolute TFs. Nonetheless, like the TFs of many English materials, the CHINT TF shows a monotonic relationship between SII values and speech recognition scores. The slope of the CHINT transfer function and the Q and N values are similar to those reported by Eisenberg et al. (1998) for the English HINT and Sherbecoe and Studebaker (2002) for the CST (see Table V). However, the N value for Cantonese HINT is much smaller than the mean number of phonemes (26.3) in CHINT sentences, in contrast with the English HINT sentences, with a mean number of phonemes (16.8) per sentence matching the N value. It seems, therefore, that Cantonese phonemes are not perceived as separate units but as “chunks.” This speculation requires further research to verify. However, an example may help illustrate this phenomenon.

FIG. 6. Frequency-importance function (FIF) of the CHINT.

FIG. 7. Comparison of cumulative FIFs derived from the CHINT and other similar materials.

FIG. 8. Best-fit absolute transfer function (ATF) and actual mean scores (%) plotted as a function of SII.

The CHINT sentence, /tai6 kɔ1 siŋ4 jɐt6 hɐi2 kʊŋ1 si1 kɔŋ2 tin6 wa2/, means “my big brother is on the phone all day long at work.” The Chinese word /kɔ1/ means “brother” and would limit the word before it to those related to order of birth. The word /jɐt6/ means “day” and would limit the word before it to mean the day before or after, or all day. The word /hɐi2/ means “in” and refers specifically to a physical location. When followed by the character /kʊŋ1/ (work), the next word must be /si1/, which together with /hɐi2/ and /kʊŋ1/ means at one’s workplace. The words /kɔŋ2/ and /wa2/ both mean speaking and, when spoken in a sequence, the only words that can fit between are /tin6/ (electric), /tai6/ (big), or /siu3/ (laugh). The three monosyllables together mean talking on the phone, lying, or joking. Therefore, it seems that individual Chinese speech sounds are not independent of each other, and this is perhaps related to the fact that Chinese polysyllabic words are made up of semantically meaningful monosyllabic parts. Chinese polysyllabic words seemed to have greater semantic and syntactic constraints than their English counterparts. This redundancy has made shorter Chinese sentences inappropriate for adaptive testing. Adverbial phrases were added to shorter sentences to derive the CHINT sentences to make them less redundant (Wong and Soli, 2005).

An SII value of 0.5 or higher would yield close to maximum intelligibility using CHINT sentences (97.6%). This would be consistent with findings of other materials (e.g., the CST) that are more redundant in content than single words (e.g., NU-6). At the same SII, 89.3% intelligibility is expected with the English HINT. As greater constraint on speech material (e.g., grammatical structure and context) and greater redundancy would yield higher percent intelligibility for a given AI (ANSI-S3.5 1969, p. 21; Studebaker et al., 1987), we can conclude that the Cantonese materials are more redundant than the English HINT and materials that employ single-word stimuli such as the NU-6 (Studebaker et al., 1993).
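The two percentages quoted above can be reproduced from the fitting constants, assuming the transfer function has the form S = (1 − 10^(−SII·P/Q))^N with P = 1. That form is an assumption here, since Eq. (2) is defined earlier in the paper. The CHINT constants are those reported for predicting intelligibility from SII, and the English HINT constants are those listed for Eisenberg et al. (1998) in Table V.

```python
def percent_correct(sii, q, n, p=1.0):
    """Assumed transfer function: percent correct predicted from SII."""
    return 100.0 * (1.0 - 10.0 ** (-sii * p / q)) ** n


# Constants reported in the text (CHINT) and in Table V (English HINT).
chint = percent_correct(0.5, q=0.1844, n=12.5769)
hint = percent_correct(0.5, q=0.235, n=15.13)

print(round(chint, 1), round(hint, 1))
```

Under this assumed form, the smaller Q for the CHINT is what pushes the predicted score closer to ceiling at the same SII.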

In summary, the CHINT sentences have fewer independent sounds than would be suggested by the number of phonemes in the sentences. The CHINT material is more redundant in context than similar materials such as the English HINT or single-word materials.

C. Crossover frequency and frequency-importance function

As the crossover frequency decreases, the relative importance of low-frequency information increases. Since the crossover frequency is lower for Cantonese than for all English speech materials (see Table I), we conclude that low frequencies in Cantonese contain more speech information than in English. Results from the Cantonese HINT FIF (Fig. 7) also show that when compared to similar English materials, the 1/3 octave band centered at 180 Hz carries more weight for speech understanding. As a result, the whole FIF is shifted down in frequency; 75% of CHINT information is located below 2331 Hz. Figure 7 also shows that the shape of the CHINT cumulative FIF resembles those of average speech derived by Pavlovic (1987) and the ANSI S3.5-1997, with the exception that frequencies below 400 Hz are slightly more heavily weighted and frequencies above 4000 Hz exhibit reduced importance when compared to equivalent English materials.

Several reasons might contribute to differences in crossover frequency (CF) and FIF across materials. First, redundancy of materials may be a factor (Studebaker et al., 1987; Studebaker and Sherbecoe, 1991). As discussed, CHINT appears to carry much redundant information. In fact, the crossover frequency of the CHINT material resembles that reported for continuous discourse by Studebaker et al. (1987) (at 1189 Hz). This contrasts with those reported for the W-22, with a crossover frequency at 1314 Hz (Studebaker and Sherbecoe, 1991), the English HINT, with crossover frequency at 1550 Hz (Eisenberg et al., 1998), and nonsense syllables, with crossover frequency at 1980 Hz (French and Steinberg, 1947). Second, the shape of the FIF may differ depending on the bandwidth of the filter used to derive the function (DePaolis, 1996).

Third, the rate, clarity, and peak spectrum of the speech materials may have an effect on the FIF, so that for a given material, different talkers may yield different FIFs (Sherbecoe and Studebaker, 2002). This, however, is unlikely to have affected the shape of the CHINT FIF because spectrally matched noise was used (Studebaker et al., 1994). The crossover frequency obtained in this study was slightly lower than the midpoint of the FIF (1183 Hz) and the crossover frequencies did not vary systematically with SNR (Studebaker et al., 1993; Sherbecoe and Studebaker, 2002). Thus, the contribution of talker characteristics was small. The shift in importance weight toward lower frequency is probably due to a fourth factor: the tonal nature of Cantonese. Findings from research on tone recognition support this phenomenon (e.g., Fok Chan, 1974).

TABLE V. Comparison of transfer functions (TFs) and frequency-importance functions (FIFs) for various speech materials.

| Authors | Material | Q | N | TF slope (a) | FIF (shape, peaks) |
|---|---|---|---|---|---|
| Current study | CHINT sentences | 0.1844 | 12.58 | 11.0 | bimodal, below 200 Hz and around 800–1600 Hz |
| DePaolis et al. (1996) | PB-50 words | 0.641 | 2.436 | ≈4.0 | unimodal, around 2000 Hz |
| | SPIN sentences | 0.329 | 4.481 | ≈8.0 | unimodal, around 2000 Hz |
| | Continuous discourse | 0.353 | 8.943 | ≈7.0 | unimodal, around 2000 Hz |
| Eisenberg et al. (1998) | HINT sentences | 0.235 | 15.13 | ≈10.0 | unimodal, 2000 Hz using average |
| | ANSI S3.5 standard | 0.247 | 16.90 | ≈10.0 | |
| Henry et al. (1998) | CNC monosyllables | 0.474 | 2.518 | - | unimodal, 2000 Hz |
| Sherbecoe and Studebaker (2002) | CST passages | 0.227 | 10.26 | 10.6 | bimodal, 500 and 1600 Hz |
| Studebaker and Sherbecoe (1991) | CID W-22 words | 0.283 | 4.057 | 10.2 | bimodal, 400 and 2000 Hz |
| Studebaker et al. (1993) | NU-6 words | 0.404 | 3.334 | 6.4 | bimodal, 500 and 2000 Hz |
| Studebaker et al. (1987) | Continuous discourse | - | - | 18.7 | bimodal, 500 and 2500 Hz |

(a) TF slopes are in percent per 0.0333 SII and are based on observed or estimated scores between 20 and 80%. Numbers are either reported in the relevant studies or estimated by the authors according to reported TFs (≈ denotes approximation). TF slope data are not available in Henry et al. (1998).

1. The role of fundamental frequency on Cantonese speech perception

Fundamental frequency (F0) contains information on pitch level and contour. F0 ranges from 80 to 210 Hz for males and 190 to 305 Hz for females (Baken, 1987; Evans et al., 2006). F0 plays a crucial role in identifying the meaning of Cantonese words with identical phonemes (Fok Chan, 1974; Gandour, 1981, 1983; Lee et al., 2002). While some studies found pitch contour and direction are more important than height (Fok Chan, 1974; Gandour, 1981; Cheung, 1992; Whalen and Xu, 1992), others found height a more important factor (Vance, 1976; Tse, 1977; Gandour, 1983; Lui, 2000). The CHINT FIF showed that, while the 1/3 octave band between 180 and 224 Hz contributed only minimally to intelligibility, frequencies below 180 Hz, where the fundamental frequency of male speakers lies (the CHINT was recorded using a male voice), seemed more heavily weighted. Ng (1981) also found that good Cantonese word discrimination can be achieved even when the signals have been LP filtered at 250 Hz. The contribution of tonal information is exemplified in the ability of children with moderate to profound hearing loss to acquire correct tone production; Dodd and So (1994) attributed this phenomenon to better hearing at low frequency.

2. Findings from other tonal language literature

Research on Mandarin, another Chinese dialect, also suggested that low frequencies play an important role in speech and tone recognition. Tone recognition could be preserved at a high level (94.6% correct), even with speech LP filtered at 300 Hz (Liang, 1963). Similarly, Fu et al. (1998) found that tone recognition of LP filtered Mandarin (at 500 Hz) was preserved. In another study, about 80% of Mandarin tones were correctly identified when speech was LP filtered at 750 Hz (Zhang et al., 1981). The cues for tone recognition in Mandarin and Cantonese, however, are slightly different. The primary cues for Cantonese tones are pitch contour and level (Fok Chan, 1974). While fundamental frequency is the most important cue for tone recognition in both dialects, temporal (e.g., duration) and amplitude envelopes cue Mandarin sentence recognition when spectral information is absent; these cues are less crucial when more spectral information is available (Lin, 1988; Fu et al., 1998; Whalen and Xu, 1992; Wei et al., 2004). Similarly, Fu and Zeng (2000) found that tone duration and amplitude contours help in the identification of tone 3 in Mandarin, amplitude cues contribute to the discrimination of tone 4, and periodicity cues aid recognition of all five tones. When fundamental frequencies are absent, resolved and unresolved harmonics contribute to tone recognition (Stagray et al., 1992). These results suggest that low-frequency information is important for tone recognition which, in turn, aids sentence recognition.

Overall, findings from this study suggested that low-frequency information is more important for speech understanding for Cantonese than for English. These results are consistent with findings in tone recognition experiments (e.g., Fok Chan, 1974).

V. SUMMARY AND CONCLUSION

To summarize, an SII for the CHINT material was established in this study. While the Q and N values were similar to those of English sentence materials, the N value was smaller than the average number of phonemes in each sentence. The slope of the ATF, the N value, the crossover frequency, and the FIF of the CHINT suggest that low frequencies are more important for Cantonese speech recognition than for English. Whether the redundancy of the CHINT material and/or the tonal nature of the language has affected this result remains uncertain. One way to separate these effects is to repeat the experiment using female recordings (with higher fundamental frequency). If similar results are obtained, the shift in importance weight at low frequency is probably related to the redundancy in the speech materials. These results also suggest that it is important to establish separate FIF and SII for various languages. The FIF obtained in this study may have important implications on how hearing aids and/or cochlear implants should be fitted to Cantonese speakers. The roles of low- or high-frequency information on speech intelligibility assessed using other Cantonese speech materials, and using materials in other tonal languages, need to be established.

ACKNOWLEDGMENT

The authors are grateful to Carol Cheung, Kammy Yeung, and Benny Zee for their assistance in data collection and analysis. Our gratitude also goes to all participants in the study, as well as to Phonak Hearing Center Hong Kong Ltd. and the University of Hong Kong Standard Chartered Community Foundation Hearing Center for their assistance in participant recruitment. This study was supported by a Research Grants Council CERG grant (HKU 7165/01H), Hong Kong, China.

Amlani, A. M., Punch, J. L., and Ching, T. Y. C. (2002). “Methods and applications of the audibility index in hearing aid selection and fitting,” Trends Amplif. 6, 81–129.

ANSI (1969). S3.5, American National Standard Methods for the Calculation of the Articulation Index (Acoustical Society of America, New York).

ANSI (1997). S3.5, American National Standard Methods for Calculation of the Speech Intelligibility Index (Acoustical Society of America, New York).

Baken, R. J. (1987). Clinical measurement of speech and voice (Taylor and Francis, London).

Browning, L. K. (1974). “The Cantonese dialect with special reference to contrasts with Mandarin as an approach to determining dialect relatedness,” Ph.D. dissertation, Georgetown University.


Byrne, D., Dillon, H., Tran, K., Arlinger, S., Wibraham, K., Cox, R., Hagerman, B., Hetu, R., Kei, J., Lui, C., Kiessling, J., Kotby, M. N., Nasser, N. H. A., El Kholy, W. A. H., Nakanishi, Y., Oyer, H., Powell, R., Stephens, D., Meredith, R., Sirimanna, T., Tavartkiladze, G., Fronlenkov, G. I., Westerman, S., and Ludvigsen, C. (1994). “An international comparison of long-term average speech spectra,” J. Acoust. Soc. Am. 96, 2108–2120.

Cheung, P. P. (1992). “Tonal confusions in Cantonese at different signal-to-noise ratios,” B.Sc. dissertation, University of Hong Kong.

Ciocca, V., Francis, A. L., Aisha, R., and Wong, L. (2002). “The perception of Cantonese lexical tones by early-deafened cochlear implantees,” J. Acoust. Soc. Am. 111, 2250–2256.

DePaolis, R. A., Janota, C. P., and Frank, T. (1996). “Frequency importance functions for words, sentences, and continuous discourse,” J. Speech Hear. Res. 39, 714–723.

Dodd, B. J., and So, L. K. H. (1994). “The phonological abilities of Cantonese-speaking children with hearing loss,” J. Speech Hear. Res. 37, 671–779.

Doyle, J., and Wong, L. L. (1996). “Mismatch between aspects of hearing impairment and hearing disability/handicap in adult/elderly Cantonese speakers: some hypotheses concerning cultural and linguistic influences,” J. Am. Acad. Audiol. 7, 442–446.

Doyle, J., Schaefer, C., Dacakis, G., and Wong, L. L. N. (2002). “Hearing levels and hearing handicap in Cantonese speaking Australian,” Asia-Pac. J. Speech Lang. Hear. 7, 92–100.

Eisenberg, L. S., Dirks, D. D., Takayanagi, S., and Martinez, A. S. (1998). “Subjective judgments of clarity and intelligibility for filtered stimuli with equivalent speech intelligibility index predictions,” J. Speech Lang. Hear. Res. 41, 327–339.

Evans, S., Neave, N., and Wakelin, D. (2006). “Relationships between vocal characteristics and body size and shape in human males: An evolutionary explanation for a deep male voice,” Biol. Psychol. 72(2), 160–163.

Fletcher, H., and Galt, R. H. (1950). “The perception of speech and its relation to telephony,” J. Acoust. Soc. Am. 22, 89–151.

Fok Chan, Y. Y. (1974). A Perceptual Study of Tones in Cantonese (University of Hong Kong, Hong Kong).

French, N. R., and Steinberg, J. C. (1947). “Factors governing the intelligibility of speech sounds,” J. Acoust. Soc. Am. 19, 90–119.

Fu, Q. J., and Zeng, F. G. (2000). “Identification of temporal envelope cues in Chinese tone recognition,” Asia Pacific J. Speech Lang. Hear. 5, 45–57.

Fu, Q. J., Zeng, F. G., Shannon, R. V., and Soli, S. D. (1998). “Importance of tonal envelope cues in Chinese speech recognition,” J. Acoust. Soc. Am. 104, 505–510.

Gandour, J. (1981). “Perceptual dimensions of tones: evidence in Cantonese,” J. Chin. Linguist. 9, 20–36.

Gandour, J. (1983). “Tone perception in Far Eastern languages,” J. Phonetics 11, 149–175.

Hagerman, B. (1982). “Sentences for testing speech intelligibility in noise,” Scand. Audiol. 11, 79–87.

Henry, B. A., McDermott, H. J., McKay, C. M., James, C. J., and Clark, G. M. (1998). “A frequency importance function for a new monosyllabic word test,” Aust. J. Audiol. 20, 79–86.

Kamm, C. A., Dirks, D. D., and Bell, T. S. (1985). “Speech recognition and the articulation index for normal and hearing-impaired listeners,” J. Acoust. Soc. Am. 77, 281–288.

Killion, M. C., and Christensen, L. A. (1998). “The case of the missing dots: AI and SNR loss,” Hear. J. 51, 32–47.

Lau, C. C., and So, K. W. (1988). “Material for Cantonese speech audiometry constructed by appropriate phonetic principles,” Br. J. Audiol. 22, 297–304.

Lee, K. Y. S., Chiu, S. N., and van Hasselt, C. A. (2002). “Tone perception ability of Cantonese-speaking children,” Lang. Speech 45, 387–406.

Li, R. (1989). “The classification of the Chinese dialects,” FangYan 4, 241–259.

Liang, Z. A. (1963). “The auditory perception of Mandarin tones,” Acta Physiol. Sinica 26, 85–91.

Lin, M. C. (1988). “The acoustic characteristics and perceptual cues of tones in standard Chinese,” Chin. Ling. 204, 182–193.

Lui, J. (2000). “Cantonese tones perception in children,” Unpublished B.Sc. dissertation, University of Hong Kong.

Macrae, J. H., and Brigden, D. N. �1973�. “Auditory threshold impairmentand everyday speech reception,” Audiology 12, 272–290.

Matthews, S., and Yip, V. �1994�. Cantonese: A Comprehensive Grammar�Routledge, London�.

Mueller, H. G., and Killion, M. C. �1990�. “An easy method for calculatingthe articulation index,” Hear. J. 43, 14–17.

Ng, Y. H. �1981�. “The effects of filtering on the intelligibility of Can-tonese,” M.Ed. dissertation, University of Manchester.

Nilsson, M., Soli, S. D., and Sullivan, J. A. �1994�. “Development of theHearing In Noise Test for the measurement of speech reception thresholdsin quiet and in noise,” J. Acoust. Soc. Am. 95, 1085–1099.

Pavlovic, C. V. �1984�. “Use of the articulation index for assessing residualauditory function in listeners with sensorineural hearing impairment,” J.Acoust. Soc. Am. 75, 1253–1258.

Pavlovic, C. V. �1987�. “Derivation of primary parameters and proceduresfor use in speech intelligibility predictions,” J. Acoust. Soc. Am. 82, 413–422.

Plomp, R., and Mimpen, A. M. �1979�. “Improving the reliability of testingthe speech reception threshold for sentences,” Audiology 18, 43–52.

Ramsey, S. R. �1987�. The Languages of China �Princeton University Press,Princeton�.

Rankovic, C. M. �1991�. “An application of the articulation index to hearingaid fitting,” J. Speech Hear. Res. 34, 391–402.

Rankovic, C. M. �1995�. “Prediction of articulation scores,” J. Acoust. Soc.Am. 97, 3358.

Sherbecoe, R. L., and Studebaker, G. A. �2002�. “Audibility-index functionsfor the Connected Speech Test,” Ear Hear. 23, 385–398.

So, L. K. H., and Dodd, B. J. �1995�. “The acquisition of phonology byCantonese-speaking children,” J. Child Lang 22, 473–493.

Stagray, J. R., Downs, D., and Sommers, R. K. �1992�. “Contributions of thefundamental, resolved harmonics, and unresolved harmonics in tone-phoneme identification,” J. Speech Hear. Res. 35, 1406–1409.

Stelmachowicz, P., Lewis, D., and Creutz, T. �2002�. Situational Hearing-Aid Response Profile (SHARP, version 6.0) User’s Manual �Boys TownNational Research Hospital, Omaha�.

Studebaker, G. A., and Sherbecoe, R. L. �1991�. “Frequency-importance andtransfer functions for recorded CID W-22 word lists,” J. Speech Hear. Res.34, 427–438.

Studebaker, G. A., and Sherbecoe, R. L. (1993). “Frequency-importance functions for speech recognition,” in Acoustical factors affecting hearing aid performance, edited by G. A. Studebaker and I. Hochberg (Allyn and Bacon, Boston), pp. 185–204.

Studebaker, G. A., Pavlovic, C. V., and Sherbecoe, R. L. (1987). “A frequency importance function for continuous discourse,” J. Acoust. Soc. Am. 81, 1130–1138.

Studebaker, G. A., Sherbecoe, R. L., and Gilmore, C. (1993). “Frequency-importance and transfer functions for the Auditec of St. Louis recordings of the NU-6 word test,” J. Speech Hear. Res. 36, 799–807.

Studebaker, G. A., Sherbecoe, R. L., McDaniel, D. M., and Gwaltney, C. A. (1999). “Monosyllabic word recognition at higher-than-normal speech and noise levels,” J. Acoust. Soc. Am. 105, 2431–2444.

Studebaker, G. A., Taylor, R., and Sherbecoe, R. L. (1994). “The effect of noise spectrum on speech recognition performance-intensity functions,” J. Speech Hear. Res. 37, 439–448.

Tillman, T. W., and Carhart, R. (1966). An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University auditory test no. 6. Technical report no. SAM-TR-66-55. San Antonio, TX: USAF School of Aerospace Medicine, Brooks Air Force Base.

Tse, J. K. P. (1977). “Tone acquisition in Cantonese: a longitudinal case study,” J. Child Lang. 5, 191–204.

Vance, T. J. (1976). “An experimental investigation of tone and intonation in Cantonese,” Phonetica 33, 368–392.

Wei, C. G., Cao, K., and Zeng, F. G. (2004). “Mandarin tone recognition in cochlear-implant subjects,” Hear. Res. 197, 87–95.

Whalen, D. H., and Xu, Y. (1992). “Information for Mandarin tones in the amplitude contour and in brief segments,” Phonetica 49, 25–47.

Wong, L., Hickson, L., and McPherson, B. (2004). “Hearing aid expectations among Chinese first-time users: Relationships to post-fitting satisfaction,” Aust. New Zeal. J. Audiol. 26, 53–69.

Wong, L. L. N., and Soli, S. D. (2005). “Development of the Cantonese Hearing in Noise Test (CHINT),” Ear Hear. 26(3), 276–289.

Wong, A. O., and Wong, L. L. (2004). “Tone perception of Cantonese-speaking prelingually hearing-impaired children with cochlear implants,” Otolaryngol.-Head Neck Surg. 130, 751–758.

Zhang, J. L., Qi, S. Q., Song, M. Z., and Liu, Q. X. (1981). “On the important role of Chinese tones in speech intelligibility,” Acta Acust. (Beijing) 4, 237–24.

J. Acoust. Soc. Am., Vol. 121, No. 4, April 2007 Wong et al.: Cantonese speech intelligibility index 2361
