
Automatic and controlled processing of acoustic and phonetic contrasts

Elyse Sussman a,*, Teija Kujala b,c, Jaana Halmetoja c, Heikki Lyytinen d, Paavo Alku e, Risto Näätänen c,f

a Department of Neuroscience and Department of Otolaryngology, Albert Einstein College of Medicine, 1410 Pelham Parkway S., Bronx, NY, USA
b Helsinki Collegium for Advanced Studies, University of Helsinki, Helsinki, Finland
c Cognitive Brain Research Unit, P.O. Box 9, 00014 University of Helsinki, Finland

d Department of Psychology, University of Jyväskylä, AGORA building, PL 35, 40351 Jyväskylä, Finland
e Acoustics Laboratory, Helsinki University of Technology, P.O. Box 3000, FIN-02015 Helsinki, Finland

f Helsinki Brain Research Centre, P.O. Box 9, 00014 University of Helsinki, Finland

Received 4 October 2003; accepted 16 December 2003

Abstract

Changes in the temporal properties of the speech signal provide important cues for phoneme identification. An impairment or inability to detect such changes may adversely affect one's ability to understand spoken speech. The difference in meaning between the Finnish words tuli (fire) and tuuli (wind), for example, lies in the difference between the duration of the vowel /u/. Detecting changes in the temporal properties of the speech signal, therefore, is critical for distinguishing between phonemes and identifying words. In the current study, we tested whether detection of changes in speech sounds, in native Finnish speakers, would vary as a function of the position within the word that the informational changes occurred (beginning, middle, or end) by evaluating how length contrasts in segments of three-syllable Finnish pseudo-words and their acoustic correlates were discriminated. We recorded a combination of cortical components of event-related brain potentials (MMN, N2b, P3b) along with behavioral measures of the perception of the same sounds. It was found that speech sounds were not processed differently than non-speech sounds in the early stages of auditory processing indexed by MMN. Differences occurred only in later stages associated with controlled processes. The effects of position and attention on speech and non-speech stimuli are discussed. © 2004 Elsevier B.V. All rights reserved.

Key words: Speech processing; Event-related potentials; Mismatch negativity; N2b; P3b

1. Introduction

Changes in the temporal properties of the speech signal provide important cues for phoneme recognition (Lyytinen et al., 2003, 2004; Shannon et al., 1995; Tallal et al., 1993). In English, temporal cues signal differences in stop consonants. Contrasts are distinguished by differences in the voice onset time (e.g., /g/ vs. /k/, /d/ vs. /t/, /b/ vs. /p/), which signal changes in word meaning. Thus, accurate detection of temporal cues would be needed to distinguish between den and ten or between bag and back.

By and large, this distinction between voiced and unvoiced stop consonants does not specify semantic differences in the Finnish language; temporal oppositions that cue semantic changes in Finnish exist in the length of phonemes. Many Finnish vowels and consonants have length contrasts that can signal meaning changes occurring in the beginning¹, middle, or endings of words. For example, kuka means 'who' and kukka means 'flower' (only the length of the consonant /k/ differs); kurainen means 'muddy' and kuurainen means 'frosty' (only the length of the vowel /u/ differs).

0378-5955/04/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/S0378-5955(04)00016-4

* Corresponding author. Tel.: +1 (718) 430-3313; Fax: +1 (718) 430-8821. E-mail address: [email protected] (E. Sussman).

¹ Note that variations of vowel duration (short and long) can occur at any position within a Finnish word (beginning, middle, or end of the word). Consonants, however, cannot be long (written with a repeated letter) in the beginning or end position of Finnish words. Vowel contrasts were used in the current study.


Hearing Research 190 (2004) 128–140


Detecting changes in the temporal properties of the speech signal, therefore, is a crucial ability for distinguishing between phonemes and identifying words. Accordingly, an impairment or inability to detect temporal changes in the speech signal may adversely affect one's ability to understand or spell words.

In many studies, the detection of phonemic contrasts has been assessed in isolation (e.g., /ba/ vs. /da/ or /s/ vs. /sh/), whereas in natural listening environments it would be infrequent for these minimal difference pairs to be presented in isolation. The cues used to process minimal contrasts can change depending on the context of the surrounding phonemes: in the natural speech environment, the phonemic information contained within a word includes cues associated with coarticulation that would be missing if the phonemes were presented in isolation.

Additionally, the position of the phonetic cues within the word has been hypothesized to be critical to learning to read (Laing and Hulme, 1999; Rack and Hulme, 1993). Rack and Hulme (1993) taught children to associate a string of letters with a spoken word. They found that subjects learned the letter string association much more easily when the middle letter cue was altered compared to when the beginning letter cue was altered. For example, it was easier to associate 'bzkt' with /biscuit/ than 'pskt' with /biscuit/. The position of the inaccurate phoneme information had a more profound effect on word identification when it occurred in the beginning position of the word than in the middle position. This suggests that the onset of words, the beginning phonemes, carry stronger cues to word identification than later positions within words.

In the present study, we tested the hypothesis that the brain's response, for detection (automatic processing) and identification (controlled processing) of phonetic changes in pseudo-words², would vary as a function of the position within the word (beginning, middle, or end) the change occurred, possibly providing a link between neural processes and behavioral performance. Furthermore, since there remains considerable debate over whether speech perception is handled by different neural mechanisms than non-speech perception (Ahissar et al., 2000; Lubert, 1981; McAnally and Stein, 1996; Mody et al., 1997; Näätänen et al., 1997; Nittrouer, 1999; Reed, 1989; Schulte-Körne et al., 1999; Tallal, 1980), we additionally compared processing of pseudo-words with their acoustic correlates, using complex sounds that contained the same spectral information as the speech sounds (e.g., frequency and intensity components) and were non-speech. The goal was to assess, via electrophysiological and behavioral measures, whether or not any attentional or position effects occurred specifically as a function of speech perception.

Thus, in the current study, we assessed how length contrasts in segments of three-syllable pseudo-words and their acoustic correlates were pre-attentively and attentively discriminated, using a combination of cortical components of event-related brain potentials (ERPs; such as the mismatch negativity [MMN], N2b, P3b) and behavioral measures to index both automatic and controlled processing associated with change detection within words. ERPs, which provide high temporal resolution (on the order of milliseconds), have been used extensively to study the progression of neural responses associated with sound change detection.

An infrequent auditory stimulus (called the 'deviant') occurring within a repetitive sequence of a frequent sound (called the 'standard') elicits the MMN component of ERPs (for recent reviews, see Näätänen and Winkler, 1999; Picton et al., 2000). Simple deviations in stimulus features, such as frequency, intensity, duration or spatial location, as well as deviations in complex sounds (for review see Alho, 1995) and speech (for review see Kraus and Cheour, 2000) elicit the MMN. The MMN can be elicited whether or not attention is focused on the sound stimulation, allowing the comparison of automatic and controlled processing of the same stimuli (e.g., Sussman et al., 1998, 2002).

The MMN response, with the main generators in auditory cortex (e.g., Giard et al., 1990; Scherg et al., 1989), may be followed by frontal activation (Näätänen and Michie, 1979; Opitz et al., 2002; Rinne et al., 2000), possibly providing a link to the generators of the subsequent sequence of brain responses associated with controlled cognitive processes. The location of the generators in auditory cortex accounts for the observed scalp topography of the waveform, which is maximally negative over the fronto-central scalp locations and inverts in polarity below the Sylvian fissure (when the nose is used as the reference). The component typically peaks between 100 and 200 ms from the onset of stimulus deviance. The latency and amplitude of the MMN reflect the amount of change detected by the brain, being shorter and larger (respectively) the easier the detection of the infrequent auditory deviance (Tiitinen et al., 1994).

The ERP components associated with focused attention (N2b and P3b) are often elicited together in response to target detection. The N2b response indexes attentional deviance detection and is evoked when an infrequent stimulus is attentively detected as being different from the frequently repeating stimuli (Näätänen et al., 1982; Novak et al., 1992a; Renault and Lesèvre, 1979).

² Pseudo-words rather than meaningful words were used in the current study to investigate the acoustic-phonetic aspects of speech perception without effects that may additionally occur when processing the meaning of words. The goal was to compare responses between tonal patterns and speech patterns without added semantic processes.


The N2b ERP component is a negative-going waveform that peaks at about 200–300 ms from stimulus onset and follows the MMN in time. It can be further distinguished from MMN by its scalp topography (showing its maximum in centro-parietal sites with no polarity reversal at the mastoid sites; Alho et al., 1986). The P3b component (Sutton et al., 1965) is also elicited by infrequent stimuli. However, unlike MMN, which can be elicited whether or not the sounds are in the focus of the subject's attention, the sounds must be in the focus of attention for deviants to elicit P3b. The P3b has been suggested to be associated with context updating (Donchin and Coles, 1988) and usually has a more posterior scalp distribution than N2b.

The purposes of the present study were to (1) compare automatic and controlled processes associated with processing length contrasts of vowels in isolated pseudo-words; (2) assess whether discrimination of length contrasts differs as a function of their position within the pseudo-word (beginning, middle, or end); and (3) test whether detection of the contrasts differs for speech vs. non-speech stimuli.

2. Methods

2.1. Participants

Ten adults with no known hearing or neurological disorders were paid for their participation in the experiment. Participants were students, aged 18–32 years (mean age 24 years; four males), at the University of Helsinki, where the study was conducted. Informed consent was obtained after the procedures were explained to them. Human subjects treatment was conducted in accordance with the human subjects review board of the University of Helsinki.

2.2. Stimuli

The stimuli were pseudo-words and their non-phonetic counterparts, represented by a composite of sinusoids that matched the main spectral features of the pseudo-words. The pseudo-words were synthesized by concatenating a waveform of the plosive /t/, cut from a word produced by a real speaker, to the waveform of the vowel /a/ generated by the SSG (semisynthetic speech generation) method (Alku et al., 1999). SSG was used because it yields natural-sounding vowels whose formants (i.e., the resonances of the vocal tract that determine the phoneme) can be adjusted as desired. In addition, modification of vowel duration is straightforward in SSG. With SSG, two different vowel /a/ segment durations (100 and 200 ms) were produced, which were then used together with the plosive /t/ (30 ms) to synthesize the pseudo-words.

The frequencies of the four lowest formants of the vowel /a/ were as follows: 540 Hz, 1165 Hz, 2195 Hz, and 3680 Hz. The non-phonetic stimuli were composed of four sinusoids, the frequencies and intensity levels of which were adjusted to match the strongest frequency components, the harmonics, in the spectrum of the vowel /a/ in the vicinity of the lowest four formants (see Fig. 1). With this procedure, the tone frequencies of the non-phonetic stimuli were set to the following values: 490 Hz, 1195 Hz, 2175 Hz, and 3725 Hz. There were 30 ms stop gaps (silence, in place of the plosive /t/) occurring at the beginning of each segment. Finally, the energies of the pseudo-words and their non-phonetic counterparts were equalized.
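For concreteness, the construction of a non-phonetic counterpart can be sketched as follows. This is a minimal Python/NumPy illustration, not the SSG pipeline of Alku et al. (1999); the sampling rate and the relative amplitudes of the four components are assumptions (in the study, the component levels were matched to the vowel harmonics and the overall energies were equalized).

import numpy as np

FS = 44100  # assumed sampling rate (Hz); not specified for this sketch

def make_segment(vowel_ms, gap_ms=30,
                 freqs=(490.0, 1195.0, 2175.0, 3725.0),
                 amps=(1.0, 0.5, 0.25, 0.125)):  # amps are assumed, not from the paper
    """One segment: a 30 ms silent stop gap (in place of /t/) followed by a
    'vowel' built from four sinusoids at the reported frequencies."""
    gap = np.zeros(int(FS * gap_ms / 1000))
    t = np.arange(int(FS * vowel_ms / 1000)) / FS
    vowel = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps))
    vowel /= np.max(np.abs(vowel))  # normalize; energy equalization would follow
    return np.concatenate([gap, vowel])

# Standard: three identical 130 ms segments (30 ms gap + 100 ms vowel) = 390 ms.
standard = np.concatenate([make_segment(100)] * 3)

# Deviant: the vowel of one segment lengthened to 200 ms (here, position 2).
deviant_pos2 = np.concatenate(
    [make_segment(100), make_segment(200), make_segment(100)])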

The pseudo-words and their non-phonetic counterparts were presented in separate blocks (speech and non-speech, respectively). Four stimuli (one standard and three deviants) were presented in each block. The standard stimulus in the speech condition was a Finnish pseudo-word ('tatata') consisting of three plosive-vowel syllables. Standard stimuli were 390 ms in length (each syllable was 130 ms), binaurally presented (50 dB above threshold for each subject) with an interstimulus interval of 860 ms (onset to onset). Deviants were a lengthening of one of the segments to 200 ms, occurring in place of one of the three syllables (first, second, or third).

Fig. 1. Spectrum of the stimuli showing the frequency domain of the segments contained in the pseudo-word during the vowel /a/ (upper panel) and contained in the non-speech stimulus (lower panel).


Deviants occurred randomly and were equiprobably distributed, on 21% of the stimuli (7% per deviant type).

The speech and non-speech stimuli were presented in two conditions (Ignore and Attend). A total of 2250 stimuli were presented in the Ignore condition and 1050 stimuli were presented in the Attend condition, in three separately randomized runs for each token type (speech and non-speech) in each condition.
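As an illustration of this design, the trial labels for one run could be drawn as follows. This is a hedged sketch: the run length shown and the absence of sequencing constraints (e.g., a minimum number of standards between deviants) are assumptions, since these details are not specified above.

import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def make_run(n_stimuli=750):
    """Trial labels for one run: 0 = standard ('tatata'),
    1/2/3 = deviant with the lengthened segment in that position."""
    p = [0.79, 0.07, 0.07, 0.07]  # 21% deviants overall, 7% per position
    return rng.choice([0, 1, 2, 3], size=n_stimuli, p=p)

run = make_run()
print(np.bincount(run, minlength=4) / len(run))  # empirical proportions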

2.3. Procedures

The Ignore condition was presented first to all subjects to avoid effects that could result from identifying the stimuli. In the Ignore condition, subjects were instructed to watch a silent captioned video and ignore the sounds. Before proceeding with the Attend condition, subjects were given a 15-min rest break, followed by a practice session. In the practice session, subjects were first presented with an example of each of the stimulus types for the Speech condition, followed by six trials (two samples of each deviant type), in which one of the three deviant types randomly occurred within a train of 15 standard stimuli. Subjects were instructed to press one of three response keys corresponding to the position of the detected deviant. They pressed button #1 when the deviant was heard as occurring in the first position of the pseudo-word, button #2 when it was heard in the second position of the pseudo-word, and button #3 when it was heard in the third position of the word. Recording proceeded when subjects responded with at least 85% accuracy in the practice session. At the end of the speech block of runs, subjects took a short break and then proceeded with the practice for the non-speech block, in the same manner as just described. The order of the blocks was counterbalanced across subjects, half starting with the non-speech stimuli and the other half starting with the speech stimuli.

2.4. Electroencephalogram (EEG) recording and data analysis

Eleven EEG channels were recorded with Ag/AgCl electrodes attached to the scalp at the Fz, F3, F4, Cz, C3, C4, Pz, P3, P4 (10-20 system) locations and at both mastoids (left and right, LM and RM, respectively), all referenced to the tip of the nose, using DC-coupled amplifiers with a low-pass filter setting of 40 Hz. Horizontal eye movements were measured by recording the electro-oculogram (EOG) between the outer canthi of the two eyes, and vertical eye movements by recording the vertical EOG between Fp1 and an external electrode placed below the left eye. The EOG was monitored to ensure that subjects were reading the captions on the video during the Ignore conditions and maintaining a fixed gaze during the Attend conditions. The EEG was digitized at a sampling rate of 500 Hz and then off-line filtered between 1 and 15 Hz using a 24 dB/octave rolloff. Epochs were 1100 ms in duration, starting 100 ms before and ending 1000 ms after the onset of the standard and deviant tones. Epochs with activity exceeding 100 μV at any recording channel were rejected from subsequent processing.

For each participant, the epochs were then averaged separately for each condition (Attend and Ignore), each token type (speech and non-speech), and each stimulus type (standard and deviant). Amplitudes of the ERP responses were measured with reference to the mean amplitude in the 100 ms pre-stimulus period (subtracted from each point of the averaged ERP responses).
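The epoching, artifact rejection, baseline correction, and averaging just described can be summarized in a minimal NumPy sketch, not the authors' analysis code. The array names (eeg, onsets) are assumptions, and the recording is presumed already band-pass filtered between 1 and 15 Hz.

import numpy as np

FS = 500                 # sampling rate (Hz)
PRE, POST = 0.1, 1.0     # epoch window: 100 ms pre- to 1000 ms post-stimulus
REJECT_UV = 100.0        # reject epochs exceeding 100 uV at any channel

def average_erp(eeg, onsets):
    """eeg: (n_channels, n_samples) in uV; onsets: stimulus-onset sample indices.
    Returns the artifact-free, baseline-corrected average ERP."""
    pre, post = int(PRE * FS), int(POST * FS)
    kept = []
    for s in onsets:
        epoch = eeg[:, s - pre:s + post]
        if np.max(np.abs(epoch)) > REJECT_UV:   # artifact rejection
            continue
        baseline = epoch[:, :pre].mean(axis=1, keepdims=True)
        kept.append(epoch - baseline)           # 100 ms pre-stimulus baseline
    return np.mean(kept, axis=0)                # shape: (n_channels, pre + post)

In practice, one such average would be computed per condition, token type, and stimulus type, as described above.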

The MMN is most clearly delineated by subtracting the ERP elicited by the standard tone from the ERP elicited by the infrequent deviant tone. The MMN amplitude was measured, using the deviant-minus-standard difference curves, by obtaining the mean frontal (Fz) amplitude in a 40 ms window centered on the grand mean peak, separately for each attentional state and stimulus token. MMN generally peaks between 100 and 200 ms from the onset of deviance. In the present paradigm, the deviant began at the offset of the 130 ms segment at the first, second, or third position of the stimulus. Thus, the length contrast could not be distinguished until it exceeded the 130 ms length of the standard. The peak MMN latency was measured separately at Fz and the mastoids (LM and RM). The mastoids were used to assess the statistical presence of the MMN because that is where the clearest peak could be observed with no overlap with attention-related (N2b-P3b) components elicited when stimuli were attended.

Table 1
Summary of behavioral data

                      Non-speech                             Speech
                      Position 1   Position 2   Position 3   Position 1   Position 2   Position 3
Accuracy (%)          94 (0.00)    86 (0.14)    92 (0.00)    95 (0.00)    86 (0.14)    93 (0.11)
Reaction time (ms)    902 (139)    966 (125)    840 (101)    903 (164)    944 (116)    832 (104)
Hit rate (%)          79 (15)      71 (25)      75 (19)      82 (16)      70 (22)      70 (22)
False alarm rate (%)  0.41 (0.5)   0.82 (0.6)   0.57 (0.7)   0.43 (0.6)   0.92 (0.7)   0.45 (0.8)

Standard deviations are given in parentheses.


See Tables 2 and 3 for a summary of the mean amplitudes and measurement intervals used to assess the statistical presence of the MMN.
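For illustration, the window measurement on the deviant-minus-standard difference curve reduces to a few lines, under the same assumptions as the averaging sketch above; the example peak latency in the comment is a placeholder taken from a Table 2 measurement interval.

import numpy as np

FS, PRE = 500, 0.1  # sampling rate (Hz), pre-stimulus interval (s)

def window_mean(deviant_erp, standard_erp, peak_ms, win_ms=40):
    """Mean deviant-minus-standard amplitude in a win_ms window centered on
    peak_ms (ms from stimulus onset), for a single-channel average ERP."""
    diff = deviant_erp - standard_erp
    center = int(round((PRE + peak_ms / 1000) * FS))
    half = int(round(win_ms / 1000 * FS / 2))
    return diff[center - half:center + half].mean()

# e.g., a first-position Ignore-condition MMN at LM, using the grand-mean
# peak latency (placeholder arrays, not defined here):
# amp = window_mean(dev_lm, std_lm, peak_ms=237)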

The presence of the N2b component was measured from the difference curves at the Cz electrode, to obtain the largest signal-to-noise ratio, using a 40 ms window centered around the peak in the grand mean ERP waveforms (see Table 4 for measurement intervals and amplitude information).

The presence of the P3b component was measured from the difference curves at the Pz electrode, to obtain the largest signal-to-noise ratio, using a 40 ms window centered around the peak in the grand mean ERP waveforms (see also Table 4 for measurement intervals and amplitude information). One-sample t-tests for dependent measures were used to assess the presence of the MMN, N2b, and P3b components. Repeated measures ANOVAs and paired t-tests for dependent measures were used to compare responses. Post-hoc Tukey HSD tests were calculated. Greenhouse-Geisser corrections were reported.
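The presence test for a component then amounts to a one-sample t-test of the per-subject window amplitudes against zero. A hedged sketch with simulated placeholder amplitudes (n = 10 subjects, as in the study):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder per-subject MMN window amplitudes (uV); real values would come
# from the window_mean() measurement above, one value per subject.
amps = rng.normal(loc=-2.5, scale=1.5, size=10)

t_stat, p_val = stats.ttest_1samp(amps, popmean=0.0)
print(f"t(9) = {t_stat:.2f}, p = {p_val:.4f}")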

In the Attend conditions, measures of reaction time and identification were obtained simultaneously with the ERP measures by requiring participants to press one of three response keys to the deviant stimuli, each key representing the different positions of the deviants within the stimuli.

Fig. 2. Grand averaged difference waveforms (obtained by subtracting the ERPs elicited by the standard stimuli from the ERPs elicited by the deviant stimuli) are displayed from the midline and the LM sites for the speech and non-speech stimuli of the Ignore condition. Note that the y-scale is the same as that used in Fig. 3. The dashed line shows the latency of the MMN at the mastoid site for each position, separately.


A response was considered correct when the response button corresponding to the position of deviance was recorded 150–1500 ms after the onset of the target stimulus. Hits, misses, and false alarms were calculated for each token type (speech and non-speech) and position for the Attend condition. Then, the probability that a response was correct ('accuracy rate') was calculated by dividing the number of hits by the number of hits plus false alarms for each subject, token type, and position, individually. Reaction time was measured from the onset of the stimulus deviance. Latencies of the ERP components were measured for each individual as the largest peak in the intervals delineated above, for each component separately.
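This scoring rule can be made explicit in a short sketch. The trial structure used here is an assumption, and counting a wrong-key or out-of-window press as a false alarm is one reasonable reading of the text above.

def score_block(trials):
    """trials: list of dicts with 'position' (1-3, the deviant's position) and
    'response' as a (key, latency_ms) tuple, or None if no press was made."""
    hits = misses = false_alarms = 0
    for trial in trials:
        resp = trial["response"]
        if resp is None:
            misses += 1
        elif resp[0] == trial["position"] and 150 <= resp[1] <= 1500:
            hits += 1  # correct position key within 150-1500 ms of target onset
        else:
            false_alarms += 1  # wrong key or out-of-window press
    accuracy = hits / (hits + false_alarms) if (hits + false_alarms) else 0.0
    return {"hits": hits, "misses": misses,
            "false_alarms": false_alarms, "accuracy": accuracy}

# Example: one correct press, one wrong-key press, one miss.
print(score_block([
    {"position": 1, "response": (1, 900)},
    {"position": 2, "response": (3, 970)},
    {"position": 3, "response": None},
]))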

3. Results

3.1. Behavioral results

There was a high degree of accuracy for detecting length contrasts in all positions (see Table 1), although the responses were significantly slower and less accurate for the middle-position contrasts, regardless of the token type (i.e., speech vs. non-speech). This was shown by two-way repeated measures ANOVAs with factors of token type (speech vs. non-speech) and position (1, 2, 3), run separately for accuracy and reaction time. A main effect of position on accuracy (F(2,18) = 8.15, P < 0.015) and a main effect of position on reaction time (RT; F(2,18) = 12.6, P < 0.0001) were found, with no main effect of token type (speech vs. non-speech) and no interactions.

Fig. 3. Grand averaged difference waveforms (obtained by subtracting the ERPs elicited by the standard stimuli from the ERPs elicited by the deviant stimuli) along the midline and the LM site for the speech and non-speech stimuli of the Attend condition. The dashed line shows the latency of the MMN at the mastoid site for each position, separately. Arrows point to the N2b and P3b components.


Post-hoc Tukey HSD tests revealed that responses were significantly slower and less accurate when the length contrasts occurred in the middle position. RT was shortest for the last-position length contrasts. The mean accuracy rates and reaction times for each position and stimulus type are shown in Table 1. These results indicate that the deviant in the middle position may have been the hardest to identify.
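As an illustration of this analysis, the two-way repeated measures ANOVA (token type x position) can be run on a long-format table with statsmodels' AnovaRM; the accuracy values below are simulated stand-ins, not the study's data.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
rows = [
    {"subject": s, "token": tok, "position": pos,
     "accuracy": rng.normal(90, 5)}   # simulated placeholder values
    for s in range(10)                # ten subjects, as in the study
    for tok in ("speech", "non-speech")
    for pos in (1, 2, 3)
]
df = pd.DataFrame(rows)

# Two-way repeated measures ANOVA: token type x position on accuracy.
result = AnovaRM(df, depvar="accuracy", subject="subject",
                 within=["token", "position"]).fit()
print(result)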

3.2. ERP results

Fig. 2 (Ignore condition) and Fig. 3 (Attend condition) display the grand averaged difference waveforms for the speech and non-speech stimuli. Fig. 4 displays the mean latencies of the ERP components.

3.3. MMN

3.3.1. MMN amplitude
MMNs were elicited by length contrasts in each condition (Attend and Ignore), each position (first, second, and third), and each token type (speech and non-speech; see Figs. 2 and 3 and Tables 2 and 3).

Three-way repeated measures ANOVA comparing the peak MMN amplitude at the mastoid site (LM), with factors of attentional state (attend vs. ignore), token type (speech vs. non-speech), and position (1, 2, 3), revealed a main effect of attention (F(1,9) = 37.1, P < 0.001) and a main effect of position (F(1,9) = 3.7, P < 0.05), with no main effect of token type and no interactions. The attention effect shows that the MMN amplitude was larger when attended. The position effect shows that MMN amplitudes elicited by length contrasts occurring in the first position were larger than MMNs elicited by contrasts occurring in either the second or third position.

Three-way repeated measures ANOVA comparing the peak MMN amplitude measured at the frontal site (Fz), with factors of attentional state (attend vs. ignore), token type (speech vs. non-speech), and position (1, 2, 3), revealed a main effect of position (F(1,9) = 8.7, P < 0.005), with no main effect of attentional state or token type and no interactions. The position effect shows that the amplitude of the MMN elicited by the third-position length contrasts (speech and tones) was smaller than that elicited by the deviant in positions 1 or 2.

In sum, there were different effects of attention and position at the frontal and mastoid sites. There was an attention effect on the MMN: the MMN amplitude was larger when the stimuli were attended than when ignored (regardless of token type), seen at the mastoid site but not at Fz. The first-position MMN amplitude was largest at the mastoid site (regardless of token type). At the frontal site, the position effect was on the third position, in which the amplitude was smallest.

3.3.2. MMN latency
Position 1: Three-way repeated measures ANOVA with factors of attentional state (attend vs. ignore), token type (speech vs. non-speech), and electrode (Fz vs. LM) revealed a three-way interaction in the latency of the MMN elicited by position 1 deviants (F(1,9) = 9.7, P < 0.02). This is explained by the shorter latency of the MMN elicited by the speech token at Fz compared to the non-speech token when attended, but not when ignored (there was no difference between the latencies of the two tokens when ignored).

Fig. 4. Latencies of the ERP components. Mean latencies of the MMN components (measured at Fz) elicited in the Attend and Ignore conditions are displayed by position (top row) for the speech (black bars) and tones (gray bars). Mean latencies of the N2b (middle row) and P3b (bottom row) components (measured at Cz and Pz, respectively) elicited in the Attend condition are displayed by position for speech (black bars) and tones (gray bars). Error bars represent standard deviations.


Moreover, this three-way interaction showed that there was no difference in the latency of the MMN elicited by position 1 deviants at the mastoid site, attended or ignored.

Position 2: Three-way repeated measures ANOVA with factors of attentional state (attend vs. ignore), token type (speech vs. non-speech), and electrode (Fz vs. LM) revealed two-way interactions on the latency of the MMN elicited by position 2 deviants between attentional state and token type; between token type and electrode; and between attentional state and electrode (F(1,9) = 12.1, P = 0.01). The three-way interaction did not quite reach significance (P = 0.06). These interactions suggest that MMNs elicited by tones at Fz (no difference at LM)

were shorter in latency than those elicited by speech sounds when attended. However, when ignored, the MMN latency was shorter at LM than at Fz.

Position 3: Three-way repeated measures ANOVA with factors of attentional state (attend vs. ignore), token type (speech vs. non-speech), and electrode (Fz vs. LM) revealed a two-way interaction on the latency of position 3 MMNs between attentional state and electrode (F(1,9) = 12.1, P < 0.01). This shows that the MMN elicited at Fz was shorter in latency when the stimuli were attended than when they were ignored (with no such difference at LM), regardless of whether the token type was speech or non-speech.

In sum, there was a difference in peak MMN latency between the frontal and mastoid sites. The mastoid-site MMN latency was stable across attentional state and token type, whereas the latency of the frontal-site MMN varied by attentional state and token type.

Table 2
MMN component in the Ignore condition

                    Non-speech                                      Speech
                    Position 1      Position 2      Position 3      Position 1      Position 2      Position 3
Latency range (ms)  (224–264)       (368–408)       (508–548)       (220–260)       (372–412)       (512–552)
Fz                  −2.89** (1.48)  −2.52** (1.31)  −2.29** (1.78)  −2.87** (1.75)  −3.33** (2.30)  −1.88** (1.59)
F3                  −2.71** (1.19)  −2.11** (1.17)  −2.04** (1.35)  −2.70** (1.56)  −3.06** (2.01)  −1.67** (1.58)
F4                  −2.69** (1.49)  −2.55** (1.54)  −2.45** (1.88)  −2.74** (1.75)  −3.16** (2.39)  −2.10** (1.68)
Latency range (ms)  (218–256)       (363–404)       (508–548)       (216–256)       (364–404)       (496–546)
LM                  1.29** (0.70)   1.31** (0.39)   1.22** (0.32)   1.59** (0.77)   1.44** (0.94)   1.26** (0.73)
RM                  1.08** (0.67)   1.17** (0.64)   1.22** (0.29)   1.50** (0.74)   1.41** (1.05)   1.26** (0.74)

Mean amplitudes (standard deviation given in parentheses) are provided for the frontal and mastoid electrode sites (and the latency range measured) for each position of the Ignore condition. Statistical results of the one-sample t-tests measuring the presence of the MMN are given for each electrode: *P < 0.05; **P < 0.01.

Table 3
MMN component in the Attend condition

                    Non-speech                                      Speech
                    Position 1      Position 2      Position 3      Position 1      Position 2      Position 3
Latency range (ms)  (260–300)       (352–392)       (492–532)       (196–236)       (364–404)       (504–544)
Fz                  −2.21** (2.24)  −2.19** (1.52)  −1.04 (2.59)    −3.00** (1.59)  −2.57** (2.06)  −0.31 (2.69)
F3                  −2.91** (3.18)  −1.71** (1.37)  −0.63 (2.57)    −2.53** (1.62)  −2.23** (2.02)  −0.55 (2.18)
F4                  −2.65** (1.73)  −2.27** (1.25)  −0.90 (2.53)    −2.87** (1.37)  −2.63** (2.41)  −0.81 (2.27)
Latency range (ms)  (216–256)       (364–404)       (500–540)       (216–256)       (364–404)       (500–540)
LM                  2.57** (0.96)   1.84** (0.72)   1.85** (0.75)   2.52** (1.03)   2.17** (0.87)   2.40** (0.80)
RM                  2.31** (0.85)   1.68** (0.82)   1.84** (0.78)   2.13** (1.05)   1.93** (1.16)   2.27** (1.02)

Mean amplitudes (standard deviation given in parentheses) are provided (and the latency range measured) for each position of the Attend condition. Statistical results of the one-sample t-tests measuring the presence of the MMN component are given for each electrode: *P < 0.05; **P < 0.01.


Shorter-latency MMNs occurred for speech compared to tones at Fz for the beginning segment, and for attended stimuli at the ending segments (regardless of token type). This indicates there were some effects of attention on processing the beginning speech segments and an effect of attention on processing ending segments in general. Interestingly, for the middle segment, tones had a shorter-latency MMN at Fz compared to speech sounds. This may indicate that the duration contrasts occurring in the middle position of speech sounds were harder to identify than those in the tone stimuli, and may further suggest that tone patterns have different salient aspects than speech patterns.

3.4. Attention-related components N2b-P3b

3.4.1. N2b-P3b amplitude
In the Attend condition, attention-related components N2b-P3b were elicited by the target speech and non-speech stimuli following the MMN in all positions (see Fig. 3 and Table 4). A two-way repeated measures ANOVA with factors of token type (speech vs. non-speech) and position (1, 2, 3) showed an interaction between position and token type on N2b amplitude (F(2,18) = 4.1, P < 0.04). There was a position effect on N2b amplitude for speech but not non-speech. The N2b amplitude elicited by third-position speech targets was larger than that elicited by positions 1 and 2, whereas there was no difference in N2b amplitude across non-speech positions.

A two-way repeated measures ANOVA with factors of token type (speech vs. non-speech) and position (1, 2, 3) showed no main effects on P3b amplitude and no interaction.

3.4.2. N2b-P3b latency
The N2b latency was shorter for speech than for tones elicited by targets in position 1 (t(9) = 4.2, P < 0.003), with no significant differences in N2b latency between speech and tones elicited by targets in the second or third positions.

The P3b latency was shorter for speech than for tones elicited by targets in position 3 (t(9) = 2.5, P < 0.04), with no significant differences in P3b latency between speech and tones elicited by targets in the first or second positions.

These results, finding differences in amplitude and latency for the N2b and P3b components, suggest that these two ERP components may reflect slightly different aspects of target/novel detection that are not usually distinguished in the paradigms used to elicit them.

4. Discussion

We tested how length contrasts in segments of three-syllable pseudo-words and their acoustic (non-speech) correlates were pre-attentively and attentively discriminated. There were four main findings. The ERPs and concurrent behavioral responses of the present study showed that (1) the amplitude of the MMN (an automatic measure of sound change detection) did not differentiate between speech and non-speech input, regardless of whether the stimuli were attended or ignored; (2) the amplitude of the later N2b component, associated with controlled processing, showed differences according to whether it was elicited by speech or non-speech tokens; (3) the latency of the ERP components was shorter for speech than non-speech stimuli, when attended, for the contrasts occurring in the beginning or ending segments of the pseudo-words; and (4) attention enhanced the amplitude of the MMN at the mastoid sites irrespective of whether the stimuli were speech or non-speech.

Table 4
N2b and P3b components in the Attend condition

                    Non-speech                                      Speech
                    Position 1      Position 2      Position 3      Position 1      Position 2      Position 3
N2b
Latency range (ms)  (332–372)       (414–454)       (554–594)       (246–306)       (426–466)       (564–604)
Cz                  −3.92** (2.17)  −4.16* (5.59)   −5.50** (5.24)  −4.21** (3.59)  −4.54** (4.68)  −7.85** (5.86)
C3                  −3.97** (2.21)  −3.91* (4.69)   −5.16** (4.48)  −4.14** (3.12)  −4.25** (3.63)  −6.69** (4.94)
C4                  −4.00** (1.84)  −4.02** (3.47)  −5.46** (4.13)  −4.37** (3.43)  −4.67** (3.57)  −6.80** (4.94)
P3b
Latency range (ms)  (436–476)       (582–622)       (735–775)       (432–472)       (576–616)       (708–748)
Pz                  4.80** (2.60)   4.00** (2.16)   3.50* (5.62)    3.67** (2.82)   3.61** (2.86)   4.99** (2.60)
P3                  4.00** (2.38)   3.13** (2.20)   2.58 (5.68)     2.80** (2.51)   3.01** (3.05)   4.20** (2.72)
P4                  3.95** (1.87)   3.70** (1.98)   3.13* (5.36)    3.01** (2.21)   3.19** (2.31)   4.47** (2.16)

Mean amplitudes (standard deviation given in parentheses) are provided (and the latency range measured) for each position of the Attend condition. Statistical results of the one-sample t-tests measuring the presence of the N2b and P3b components are given for each electrode: *P < 0.05; **P < 0.01.



One of the goals of the study was to determine whether speech would be processed differently than similarly complex tones when attended or ignored. We found no effects of token type (speech vs. non-speech) on MMN amplitude, whether attended or ignored, using the matched acoustic correlates of the speech sounds. MMNs elicited by the length contrasts of both token types in all three positions had similar amplitudes. Many of the previous studies that tested pre-attentive processing of speech tokens with MMN paradigms used pure tone stimuli for the comparison stimuli and found that processing pure tone stimuli differed from processing speech stimuli (e.g., Aaltonen et al., 1994; Schulte-Körne et al., 2001; Uwer et al., 2002). However, pure tones do not truly control for the acoustic complexity of the speech sounds, so finding a difference between them should not be surprising; using acoustically matched complex tones, we found no difference in MMN amplitude between speech and tones. Thus, the current results suggest that the differences in MMN amplitude found in previous studies (between pure tone and speech sounds) may have resulted from differences in the acoustic complexity of the stimuli and not per se in the processing of speech vs. non-speech³.

Accordingly, the current results also show that, when the complexity of the speech stimuli was controlled for, differences between the speech and the complex tones were not reflected in the amplitude of the MMN (i.e., in the pre-attentive stages of auditory processing). Moreover, with one exception, there was no attention effect on speech processing at the level of MMN⁴ in the current data set (although there was an attention effect on the MMN amplitude independent of token type; see below for further discussion), and there was no interaction between token type and position in the behavioral responses.

These results support the view that MMN reflects processing of the acoustic level of speech, regardless of whether the input is attended or ignored. This is consistent with previous studies that found that, whereas the MMN reflected acoustic changes in speech stimuli, it did not reflect the categorical perception of speech (Dalebout and Stack, 1999; Maiste et al., 1995; Sharma et al., 1993). Together, these results suggest that MMN elicited by vowel or consonant contrasts may in fact result from changes in the acoustic parameters of the speech sounds (i.e., changes in frequency, intensity, or duration) and not directly from the phonetic categorization of the sounds.

However, there are many studies suggesting that MMN does reflect categorical perception. This is demonstrated in cross-linguistic studies of vowel contrasts differentiated by spectral composition (Dehaene-Lambertz et al., 2000; Näätänen et al., 1997; Winkler et al., 1999a,b), voice onset time (Sharma and Dorman, 2000), and length (Nenonen et al., 2003), in which larger-amplitude MMNs were elicited by the contrasts in native vs. non-native speakers. This would suggest that MMN could reflect phonetic properties of the speech sounds. It should be noted, however, that no comparison was made in these cross-linguistic studies with acoustically matched non-speech tokens. Thus, it is not known whether, at least in part, the acoustic aspects of the speech tokens in these studies could have accounted for some of the MMN amplitude difference. That is, linguistic experience may affect acoustic sensitivity to the aspects of speech sounds that are relevant to the language environment of the listener.

In contrast to this finding regarding the early, pre-attentive auditory processes reflected in the MMN, there were attention effects on speech processing occurring after the MMN. These were reflected by changes in the latency and amplitude of the ERP responses associated with controlled processing (the N2b and P3b components). These attention-related ERP responses were shorter in latency when the length contrasts were identified as occurring in either the beginning or end positions for speech, whereas this difference was not found for non-speech.

³ One study found larger-amplitude MMNs elicited by speech sounds compared to complex tones in a passive listening condition (Jaramillo et al., 2001). However, the paradigm used by Jaramillo et al. differed considerably from the present one, especially in that the range of the 10 harmonics of the complex tones in the Jaramillo et al. study was lower in frequency (210–1155 Hz) than the range of the 10 formants of the vowel stimuli (470–9500 Hz), which may account for the larger MMN amplitude for the speech tokens. Thus, it is difficult to assess whether the larger-amplitude MMN found in this case reflects something unique about speech processing.

⁴ There was also a statistically significant difference in the peak MMN latency (at Fz) in position 1 of the speech (but not non-speech) sounds when attended, but this result is not clean because of the overlap with N2b. It was difficult to determine the true peak latency of the MMN at the frontal electrode sites, even using Cz as a guide to distinguish the peak latency of N2b from the peak latency of MMN, because there was not always a distinct MMN peak at Fz (where MMN is typically measured). It should also be noted that, due to the overlap, the presence of the MMN was assessed at the mastoid recordings (see Section 2.4). In addition, there was no difference in MMN latency measured at the mastoid sites. It is likely that the MMN observed at the frontal site includes overlap with other processes, even when ignored (see Sussman and Winkler, 2001), whereas the mastoids do not. However, the peak N2b could be clearly identified at the Cz electrode. Therefore, for the present purposes, the shorter peak of the MMN latency will be discussed with prudence.


The N2b elicited by first-position speech segments was shorter in peak latency compared to the N2b elicited by the other two positions. The shorter N2b latency may suggest that, when attended, some aspects of the speech are processed faster than the non-speech when the contrast occurs in the initial segment. This, along with the shorter-latency MMN that was also elicited by the initial speech segment, is consistent with studies showing the chronometry of these responses, in which an increase in the MMN latency produces a proportionate increase in the N2b latency (i.e., the timing of the MMN determines the timing of the later N2b response; Novak et al., 1992b). There may be an initial advantage for processing speech compared to other complex sounds. This result further suggests that phonemes in beginning segments carry stronger cues to word identification than later positions within the word.

The other significant attention effect on speech processing occurred with respect to the final segment of the stimuli. When elicited by the third segment, the N2b had larger amplitudes and the P3b had shorter latencies for speech than for non-speech contrasts. These results, taken together, suggest that controlled processes may provide an 'advantage' for speech processing compared to non-speech information. It is also possible that the cues for word differentiation are stronger when they occur in either the beginning or ending syllables of words.

Additionally, subjects were slower to respond to the second-position targets and were less accurate at identifying these segments, though this effect was independent of token type. This result may reflect a type of 'primacy-recency' effect, or suggest that the middle segment had less saliency than either the beginning or ending segments of the three-segment stimuli.

Finally, the MMN amplitude was larger when attended than ignored, regardless of token type. This result shows an overall attention effect on the amplitude of the MMN. This effect was found only at the mastoid sites (LM and RM) and not at the frontal sites (Fz, F3, F4). This is the first evidence of an attention effect on the MMN at the mastoid sites. Interestingly, the larger MMN amplitude elicited at the mastoid sites cannot be attributed to overlap with N2b, because of the topographic differences between the components. The positive waveform of the MMN at the mastoid sites (when the nose is used as the common reference) results from the orientation of the neuronal dipoles in the auditory cortex. The N2b response, however, does not invert in polarity at the mastoids; thus, the overlap of the N2b with MMN components, when it occurs, is only observed at the fronto-central electrode sites. Differentiation between MMN amplitudes and latencies at frontal and mastoid sites has been observed in a few MMN studies (e.g., Sussman and Winkler, 2001) and may be consistent with topographic studies showing that the MMN arises from different frontal and temporal generators (Opitz et al., 2002; Rinne et al., 2000).

It should be noted that there are no other studies that show attention effects of this sort on MMN. Attention effects on MMN amplitude have been shown in selective listening paradigms (Alain and Woods, 1997; Näätänen et al., 1993; Szymanski et al., 1999; Trejo et al., 1995; Woldorff et al., 1991, 1998). In these selective listening studies, the listener selects one subset of the sound input and ignores another subset of the sounds. The amplitude of the MMN elicited in the unattended channel has been shown to be smaller than that elicited in the attended channel, under certain circumstances. However, many of these amplitude effects can be explained by overlap of N2b, in which case it is not clear whether the unattended MMN is truly attenuated by attention. Moreover, Sussman et al. (2003) demonstrated that selective attention, in and of itself, does not modify the amplitude of the MMN⁵. Nevertheless, in the current study a selective listening paradigm was not used. In both conditions, subjects either ignored all or attended all of the deviants in the auditory input. Therefore, previous selective listening studies showing effects of attention on MMN cannot explain the current results.

There is only one study we are aware of that compared stimuli that were fully attended and fully ignored in separate conditions (Gomes et al., 2000). However, Gomes et al. used pure tone stimuli. Although Gomes et al. did not statistically analyze the MMN amplitude elicited at the mastoid sites, visual inspection of the data does not suggest that such an attention effect was present. It is, however, possible that pure tone stimuli and complex stimuli are processed differently under different circumstances (Alho et al., 1996).

A possible interpretation of the current attention effect is that the mastoid site reflects detection of the purely acoustic aspects of the sound input (Sussman and Winkler, 2001), whereas the amplitude of the MMN at frontal sites includes overlap with other cognitive processes. However, because no other studies using pure tone stimuli have found this attention effect on the MMN amplitude, at temporal or frontal sites, it is additionally possible that the current result is specific to complex stimuli.

⁵ The studies that have shown that focusing attention on sounds presented to one ear and detecting infrequent deviants attenuates the MMN amplitude to similar deviants presented to the other ear (Näätänen et al., 1993; Woldorff et al., 1991, 1998) can be explained by a 'competition' effect occurring when the same deviants are in both attended and unattended channels. The deviants compete for MMN generation and the attended channel 'wins out', with the amplitude of the MMN elicited by different deviants in the unattended ear unaffected. Thus, Sussman et al. (2003) demonstrated that, in the absence of such competition, MMN is not affected by attentional manipulation.


An effect of attention on complex stimuli could indicate that the integration of spectral and temporal components of the complex stimuli makes them easier to distinguish when attended than when ignored.

In summary, the main results of the behavioral and electrophysiological data of the current study demonstrated position effects on sound change detection processes, affecting how the sounds were both detected and identified depending on where within the complex stimuli they occurred. There were no effects of token type at the level of the MMN when either detecting or identifying the length contrasts (i.e., when either ignoring or attending the stimuli). The position effects were found to occur largely independent of token type. In contrast, interactions between token type and position were found with respect to the ERP responses associated with controlled processes. These results suggest that attention may facilitate identification of speech input at the beginning and ending segments of multi-syllabic word-length stimuli. Because no effects of token type were found on the MMN amplitude, these data further suggest that speech may be processed differently than non-speech only in the later stages involved with controlled auditory processes. Long-term memory templates of the speech sounds may be engaged by controlled processes, thus providing an advantage for speech processing by facilitating the identification of speech information.

Acknowledgements

This research was supported by the National Institutes of Health (Grant R01 DC004263) and the Academy of Finland. We thank Judith Kreuzer for her assistance with data processing and Dr. Mitchell Steinschneider for his helpful comments on a previous version of the manuscript.

References

Aaltonen, O., Eerola, O., Lang, A.H., Uusipaikka, E., Tuomainen, J., 1994. Automatic discrimination of phonetically relevant and irrelevant vowel parameters as reflected by mismatch negativity. J. Acoust. Soc. Am. 96, 1489–1493.

Ahissar, M., Protopapas, A., Reid, M., Merzenich, M.M., 2000. Auditory processing parallels reading abilities in adults. Proc. Natl. Acad. Sci. USA 97, 6832–6837.

Alain, C., Woods, D.L., 1997. Attention modulates auditory pattern memory as indexed by event-related potentials. Psychophysiology 34, 534–546.

Alho, K., 1995. Cerebral generators of mismatch negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear Hear. 16, 38–51.

Alho, K., Paavilainen, P., Reinikainen, K., Sams, M., Näätänen, R., 1986. Separability of different negative components of the event-related potential associated with auditory stimulus processing. Psychophysiology 23, 613–623.

Alho, K., Tervaniemi, M., Huotilainen, M., Lavikainen, J., Tiitinen, H., Ilmoniemi, R.J., Knuutila, J., Näätänen, R., 1996. Processing of complex sounds in the human auditory cortex as revealed by magnetic brain responses. Psychophysiology 33, 369–375.

Alku, P., Tiitinen, H., Näätänen, R., 1999. A method for generating natural-sounding speech stimuli for cognitive brain research. Clin. Neurophysiol. 110, 1329–1333.

Dalebout, S.D., Stack, J.W., 1999. Mismatch negativity to acoustic differences not differentiated behaviorally. J. Am. Acad. Audiol. 10, 388–399.

Dehaene-Lambertz, G., Dupoux, E., Gout, A., 2000. Electrophysiological correlates of phonological processing: a cross-linguistic study. J. Cogn. Neurosci. 12, 635–647.

Donchin, E., Coles, M.G.H., 1988. Is the P300 component a manifestation of context updating? Behav. Brain Sci. 11, 357–374.

Giard, M.H., Perrin, F., Pernier, J., Bouchet, P., 1990. Brain generators implicated in processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology 27, 627–640.

Gomes, H., Molholm, S., Ritter, W., Kurtzberg, D., Cowan, N., Vaughan, H.G., Jr., 2000. Mismatch negativity in children and adults, and effects of an attended task. Psychophysiology 37, 807–816.

Jaramillo, M., Ilvonen, T., Kujala, T., Alku, P., Tervaniemi, M., Alho, K., 2001. Are different kinds of acoustic features processed differently for speech and non-speech sounds? Cogn. Brain Res. 12, 459–466.

Kraus, N., Cheour, M., 2000. Speech sound representation in the brain. Audiol. Neuro-otol. 5, 140–150.

Laing, E., Hulme, C., 1999. Phonological and semantic processes influence beginning readers' ability to learn to read words. J. Exp. Child Psychol. 73, 183–207.

Lubert, N., 1981. Auditory perceptual impairments in children with specific language disorders: a review of the literature. J. Speech Hear. Disord. 46, 1–9.

Lyytinen, H., Leppänen, P.H.T., Richardson, U., Guttorm, T., 2003. Brain Functions and Speech Perception in Infants at Risk for Dyslexia. Kluwer, Dordrecht.

Lyytinen, H., Aro, M., Holopainen, L., Leiwo, M., Lyytinen, P., Tolvanen, A., 2004. Children's Language Development and Reading Acquisition in a Highly Transparent Language. Lawrence Erlbaum Associates, New York, in press.

Maiste, A.C., Wiens, A.S., Hunt, M.J., Scherg, M., Picton, T.W., 1995. Event-related potentials and the categorical perception of speech sounds. Ear Hear. 16, 68–90.

McAnally, K.I., Stein, J.F., 1996. Auditory temporal coding in dyslexia. Proc. R. Soc. Lond. B Biol. Sci. 263, 961–965.

Mody, M., Studdert-Kennedy, M., Brady, S., 1997. Speech perception deficits in poor readers: auditory processing or phonological coding? J. Exp. Child Psychol. 64, 199–231.

Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R.J., Luuk, A., Allik, J., Sinkkonen, J., Alho, K., 1997. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385, 432–434.

Näätänen, R., Michie, P.T., 1979. Early selective-attention effects on the evoked potential: a critical review and reinterpretation. Biol. Psychol. 8, 81–136.

Näätänen, R., Winkler, I., 1999. The concept of auditory stimulus representation in cognitive neuroscience. Psychol. Bull. 125, 826–859.


Näätänen, R., Simpson, M., Loveless, N.E., 1982. Stimulus deviance and evoked potentials. Biol. Psychol. 14, 53–98.

Näätänen, R., Paavilainen, P., Tiitinen, H., Jiang, D., Alho, K., 1993. Attention and mismatch negativity. Psychophysiology 30, 436–450.

Nenonen, S., Shestakova, A., Huotilainen, M., Näätänen, R., 2003. Linguistic relevance of duration within the native language determines the accuracy of speech-sound duration processing. Cogn. Brain Res. 16, 492–495.

Nittrouer, S., 1999. Do temporal processing deficits cause phonological processing problems? J. Speech Hear. Res. 42, 925–942.

Novak, G., Ritter, W., Vaughan, H.G., Jr., 1992a. Mismatch detection and the latency of temporal judgements. Psychophysiology 29, 398–411.

Novak, G., Ritter, W., Vaughan, H.G., Jr., 1992b. The chronometry of attention-modulated processing and automatic mismatch detection. Psychophysiology 29, 412–430.

Opitz, B., Rinne, T., Mecklinger, A., von Cramon, D.Y., Schröger, E., 2002. Differential contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP results. Neuroimage 15, 167–174.

Picton, T.W., Alain, C., Otten, L., Ritter, W., Achim, A., 2000. Mismatch negativity: different water in the same river. Audiol. Neuro-otol. 5, 111–139.

Rack, J., Hulme, C., 1993. The role of phonology in young children learning to read words: the direct-mapping hypothesis. J. Exp. Child Psychol. 57, 42–71.

Reed, M.A., 1989. Speech perception and the discrimination of brief auditory cues in reading disabled children. J. Exp. Child Psychol. 48, 270–292.

Renault, B., Lesèvre, N., 1979. A trial-by-trial study of the visual omission response in reaction time situations. In: Lehmann, D., Callaway, E. (Eds.), Human Evoked Potentials. Plenum Press, New York.

Rinne, T., Alho, K., Ilmoniemi, R.J., Virtanen, R., Näätänen, R., 2000. Separate time behaviors of the temporal and frontal mismatch negativity sources. Neuroimage 12, 14–19.

Scherg, M., Vajsar, J., Picton, T.W., 1989. A source analysis of the late human auditory evoked potentials. J. Cogn. Neurosci. 1, 336–355.

Schulte-Körne, G., Deimel, W., Bartling, J., Remschmidt, H., 1999. The role of phonological awareness, speech perception, and auditory-temporal processing for dyslexics. Eur. Child Adolesc. Psychiatry 8, 28–34.

Schulte-Körne, G., Deimel, W., Bartling, J., Remschmidt, H., 2001. Speech perception deficit in dyslexic adults as measured by mismatch negativity (MMN). Int. J. Psychophysiol. 40, 77–87.

Shannon, R.V., Zeng, F.-G., Kamath, V., Wygonski, J., Ekelid, M., 1995. Speech recognition with primarily temporal cues. Science 270, 303–304.

Sharma, A., Dorman, M.F., 2000. Neurophysiologic correlates of cross-language phonetic perception. J. Acoust. Soc. Am. 107, 2697–2703.

Sharma, A., Kraus, N., McGee, T., Carrell, T., Nicol, T., 1993. Acoustic versus phonetic representation of speech as reflected by the mismatch negativity event-related potential. Electroencephalogr. Clin. Neurophysiol. 88, 64–71.

Sussman, E., Winkler, I., 2001. Dynamic process of sensory updating in the auditory system. Cogn. Brain Res. 12, 431–439.

Sussman, E., Ritter, W., Vaughan, H.G., Jr., 1998. Attention affects the organization of auditory input associated with the mismatch negativity system. Brain Res. 789, 130–138.

Sussman, E., Winkler, I., Kreuzer, J., Saher, M., Näätänen, R., Ritter, W., 2002. Temporal integration: Intentional sound discrimination does not modify stimulus-driven processes in auditory event synthesis. Clin. Neurophysiol. 113, 909–920.

Sussman, E., Winkler, I., Wang, W.J., 2003. MMN and attention: Competition for deviance detection. Psychophysiology 40, 430–435.

Sutton, S., Braren, M., Zubin, J., John, E.R., 1965. Evoked potential correlates of stimulus uncertainty. Science 150, 1187–1188.

Szymanski, M.D., Yund, E.W., Woods, D.L., 1999. Phonemes, intensity and attention: Differential effects on the mismatch negativity (MMN). J. Acoust. Soc. Am. 106, 3492–3505.

Tallal, P., 1980. Language disabilities in children: a perceptual or linguistic deficit? J. Pediatr. Psychol. 5, 127–140.

Tallal, P., Miller, S., Fitch, R.H., 1993. Neurobiological basis of speech: a case for the preeminence of temporal processing. Ann. NY Acad. Sci. 682, 27–47.

Tiitinen, H., May, P., Reinikainen, K., Näätänen, R., 1994. Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature 372, 90–92.

Trejo, L.J., Ryan-Jones, D.L., Kramer, A.F., 1995. Attentional modulation of the mismatch negativity elicited by frequency differences between binaurally presented tone bursts. Psychophysiology 32, 319–328.

Uwer, R., Albrecht, R., von Suchodoletz, W., 2002. Automatic processing of tones and speech stimuli in children with specific language impairment. Dev. Med. Child Neurol. 44, 527–532.

Winkler, I., Kujala, T., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., Czigler, I., Csépe, V., Ilmoniemi, R.J., Näätänen, R., 1999a. Brain responses reveal the learning of foreign language phonemes. Psychophysiology 36, 638–642.

Winkler, I., Lehtokoski, A., Alku, P., Vainio, M., Czigler, I., Csépe, V., Aaltonen, O., Raimo, I., Alho, K., Lang, A.H., Iivonen, A., Näätänen, R., 1999b. Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations. Cogn. Brain Res. 7, 357–369.

Woldorff, M.G., Hackley, S.A., Hillyard, S.A., 1991. The effects of channel-selective attention on the mismatch negativity wave elicited by deviant tones. Psychophysiology 28, 30–42.

Woldorff, M.G., Hillyard, S.A., Gallen, C.C., Hampson, S.R., Bloom, F.E., 1998. Magnetoencephalographic recordings demonstrate attentional modulation of mismatch-related neural activity in human auditory cortex. Psychophysiology 35, 283–292.
