Neural correlates of pitch and roughness: Toward

the neural code for melody and harmony

perception

by

Martin Franciscus McKinney

Submitted to the Harvard-MIT Division of Health Sciences and Technology

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

July 2001

© Martin Franciscus McKinney, MMI. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document

in whole or in part.

Author: Harvard-MIT Division of Health Sciences and Technology, July 30, 2001

Certified by: Bertrand Delgutte, Associate Professor of Otology and Laryngology, Harvard Medical School, Thesis Supervisor

Accepted by: Martha L. Gray, PhD, Edward Hood Taplin Professor of Medical and Electrical Engineering, Co-director, Harvard-M.I.T. Division of Health Sciences and Technology

Neural correlates of pitch and roughness: Toward the neural

code for melody and harmony perception

by

Martin Franciscus McKinney

Submitted to the Harvard-MIT Division of Health Sciences and Technology on July 30, 2001, in partial fulfillment of the

requirements for the degree of Doctor of Philosophy

Abstract

The universality of many aspects of music, such as octave-based tuning systems and the use of dissonance and consonance to create harmonic tension and resolution, suggests that their perception may have fundamental neurophysiological bases. Thus, music provides a natural set of stimuli and associated percepts with which the auditory system can be studied. Here, we seek correlates of pitch, the essential element of melody, and roughness, a primary component of dissonance, in responses of single auditory neurons in anesthetized cats.

Pitch, the perceived highness or lowness of sound, is generally thought to be based on a neurophysiological representation of frequency. Because neural responses (spikes) phase-lock to low stimulus frequencies, interspike intervals (ISIs) reflect the stimulus period and can be used to estimate frequency. To rigorously test this potential code for pitch, we look for correlates of pitch under conditions where the percept deviates from a simple function of frequency. One such condition is the octave enlargement effect, listeners’ preference for pure-tone octave ratios slightly greater than 2:1. Another is the pitch of a complex tone missing the fundamental frequency: the pitch matches that of the missing fundamental even when different harmonics are presented to opposite ears. We show that a correlate of the octave enlargement effect exists in ISIs of auditory nerve (AN) fibers and a correlate of the missing-fundamental pitch exists in ISIs of neurons in the inferior colliculus, the principal auditory nucleus of the midbrain. Results also reveal greater degradation of pitch representation at the midbrain compared to the periphery.

Roughness, the sensation of temporal envelope fluctuations in the range of ∼20-200 Hz, is often equated with sensory dissonance. Here we examine IC neural responses for correlates of sensory dissonance. We show that sensory dissonance correlates with discharge rate fluctuations of all IC neurons and with average rates of a subset of IC neurons that respond only at the onset of pure tones. Results indicate that IC neurons are specifically important for the coding of the temporal envelope.

Our findings illustrate the complexity and specificity of auditory neural processing in the brainstem and midbrain and show that percepts generally considered to be high order, such as the dissonance of musical intervals, have direct correlates in neural responses in the midbrain. More generally they show that the auditory system performs processing important for music at multiple time scales.

Thesis Supervisor: Bertrand Delgutte
Title: Associate Professor of Otology and Laryngology, Harvard Medical School

Contents

1 Introduction
1.1 Pitch
1.2 Consonance and dissonance
1.3 Overview
1.3.1 Chapter 2: Octave enlargement
1.3.2 Chapter 3: Monaural/diotic dissonance
1.3.3 Chapter 4: Dichotic dissonance and pitch salience

2 A possible neurophysiological basis of the octave enlargement effect
2.1 Introduction
2.2 Method
2.2.1 Experiment
2.2.2 Analysis
2.3 Results
2.3.1 All-Order Interspike Intervals
2.3.2 First-Order Interspike Intervals
2.4 Model
2.4.1 Model for estimating pure-tone frequency
2.4.2 Model for octave matching
2.5 Discussion
2.5.1 Auditory Nerve Physiology
2.5.2 Temporal Models for Octave Matching and Pitch Perception
2.6 Conclusion
2.7 Appendix: The EM Algorithm
2.7.1 Gaussians with independent means and variances
2.7.2 Gaussians with harmonically related means and a common variance

3 Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation
3.1 Introduction
3.2 Method
3.2.1 Experiment
3.2.2 Analysis
3.3 Results
3.3.1 Responses to pure- and complex-tone pairs
3.3.2 Effect of level and PSTH type
3.3.3 Dependence on CF
3.3.4 Responses to a musical excerpt
3.3.5 Additional observations
3.4 Discussion
3.4.1 Neurophysiology
3.4.2 Psychophysics and perception
3.5 Conclusion

4 Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience
4.1 Introduction
4.2 Method
4.2.1 Experiment
4.2.2 Analysis
4.2.3 Model
4.3 Results
4.3.1 Dichotic tone pairs
4.3.2 Pitch analysis
4.4 Model Results
4.5 Discussion
4.5.1 Neurophysiology
4.5.2 Psychophysics and perception
4.6 Conclusion

5 Discussion
5.1 Summary of findings
5.1.1 AN ISIs and the octave enlargement effect
5.1.2 Correlates of dissonance in IC neural responses
5.2 Limitations of the neurophysiological data
5.2.1 Effect of anesthesia
5.2.2 Small sample sizes
5.2.3 Limited frequency range
5.3 Pitch
5.4 Consonance and dissonance
5.5 Conclusions
5.6 Future Work

Chapter 1

Introduction

The process of listening to music involves a complex set of cognitive functions, some

of which appear to be universal. Commonalities of musical systems across cultures,

including the use of octave-based scales and melodic and rhythmic contour (Dowling

and Harwood, 1986) as well as consonance and dissonance (Sethares, 1999), suggest

that there may be specific neurophysiological responses to various aspects of musical

stimuli. Thus, music provides us with a convenient standard set of stimulus rules

and associated percepts which we can use to study brain function. The general

strategy here is to use musical stimuli and percepts to gain an understanding of how

information is processed and coded in the auditory central nervous system (CNS).

Along with rhythm, melody and harmony constitute the primary components of

music in most cultures. The perception of melody is formed from the pitches of

a set of successive single tones, while the perception of harmony is based on the

changing quality of sequential sets of simultaneously sounding tones (chords). The

quality of chords ranges from dissonant (harsh) to consonant (pleasant). Harmonic

progressions typically cycle between these two extremes, providing a sense of building

tension followed by a resolution.

Two fundamental elements on which melody and dissonance are based are pitch

and roughness, respectively. Pitch is defined as “the perceived highness or lowness

of sound” (Randel, 1978) and although it is primarily dependent on frequency, pitch

also depends somewhat on intensity (Fletcher, 1934; Stevens, 1935; Terhardt, 1974c;

Verschuure and van Meeteren, 1975) and the presence of other sounds (Stoll, 1985).

Pitch is also important in speech communication where it is used to convey meaning

through inflection, especially in tone languages such as Mandarin Chinese where pitch

contours provide lexical information. Roughness is the sensation produced by a sound

whose amplitude envelope fluctuates periodically at a rate of ∼20-300 Hz. It gives

the sound a harsh and unpleasant quality. Sensory dissonance, a quality of isolated

chords in the absence of contextual influences, is thought to be caused by roughness

generated by the beating of neighboring partials in a complex chord (von Helmholtz,

1863; Plomp and Levelt, 1965; Terhardt, 1974a; Terhardt, 1977).

My general thesis is that certain aspects of music perception can be attributed

to specific processing at particular stages in the auditory brainstem and midbrain.

More specifically, pitch effects, such as the octave enlargement effect, are hypothesized

to be encoded in interspike intervals of neurons in the auditory-nerve and cochlear

nucleus, while roughness, a fundamental component of dissonance, is hypothesized to

be encoded directly in the average discharge rates and temporal patterns of

inferior-colliculus neurons. The general goal of this research is to show quantitative correlates

of these perceptual phenomena in the activity of auditory neurons. Such quantitative

characterization of auditory processing may lead to more accurate computational

models for music and speech perception.

1.1 Pitch

A basic assumption made here, in relating neurophysiology to pitch, is that pitch is an

estimate, based on some neural representation, of the stimulus fundamental frequency.

Under certain conditions, i.e., pitch effects, the perceived pitch does not behave as

a simple function of frequency. Following from the above assumption, the neural

representation of frequency must also deviate from a simple function of frequency

under certain conditions. Thus, pitch effects may be useful in identifying the neural

code for pitch: by correlating neural responses with psychoacoustic behavior, one can

assess the relative importance of specific neural representations of frequency to the

overall pitch percept.

For low frequency tones, i.e., in the frequency range of musical pitch, stimulus

frequency is encoded in the interspike intervals of auditory-nerve fibers (Rose et al.,

1967). Cariani and Delgutte (1996a,b) showed that the pitch of a wide variety of

stimuli correlates with the most prominent interval in the distribution of

auditory-nerve all-order¹ ISIs. If the ISI is indeed the code for frequency on which pitch is

based, then ISI distributions should reflect the deviations in pitch which occur in the

presence of pitch effects.
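The difference between first-order and all-order intervals can be made concrete with a minimal sketch. It assumes spike times are available as a plain list of seconds; the function names, the toy spike train, and the `max_interval` cutoff are illustrative choices, not part of the analysis software used in this work.

```python
import numpy as np

def first_order_isis(spike_times):
    """Intervals between consecutive spikes only."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    return np.diff(t)

def all_order_isis(spike_times, max_interval):
    """Intervals between every pair of spikes, with any number of
    intervening spikes, keeping those no longer than max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    intervals = []
    for i in range(len(t) - 1):
        d = t[i + 1:] - t[i]          # intervals from spike i to all later spikes
        intervals.append(d[d <= max_interval])
    return np.concatenate(intervals) if intervals else np.array([])

# Toy spike train (seconds): spikes locked to a 100 Hz tone, some cycles skipped.
spikes = [0.000, 0.010, 0.030, 0.040, 0.070]
print(first_order_isis(spikes))
print(all_order_isis(spikes, max_interval=0.05))
```

In this toy train the all-order set contains every first-order interval plus the longer intervals that span intervening spikes, which is why all-order distributions are also described as autocorrelation histograms.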

Stimulus frequency can also be obtained from aspects of neural responses other

than the ISI. The cochlear frequency map of the basilar membrane is reflected in

the tonotopic array of AN fiber activity so that stimulus frequency can be estimated

from the discharge rate profile across the whole nerve. In addition, the stimulus

frequency can be estimated from the phase pattern or phase difference of the AN

response. Neural codes for pitch based on these stimulus frequency representations

will be discussed below but the focus of the research is on ISI codes, partly because

of the practical difficulty in obtaining precise measurements of rate or phase across

the full distribution of AN fibers.

1.2 Consonance and dissonance

The consonance of two complex sounds has long been known to correlate with simple

ratios of fundamental frequency (Pythagoras, 540-410 B.C.). Noting that sounds with

fundamental frequencies related by simple ratios contain a large number of coincident

harmonics, von Helmholtz (1863) hypothesized that dissonance is caused by beating

between neighboring harmonics (when they are close but not coincident). Plomp and

Levelt (1965) showed that the dissonance of two complex tones could be predicted by

summing the roughness of its constituent neighboring (pure-tone) partials.
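A crude way to see why simple frequency ratios produce fewer beating partials is to count pairs of harmonics whose frequency difference falls within the roughness range quoted in Chapter 1 (∼20-300 Hz). The sketch below is only that count. It is not the Plomp and Levelt model, which weights each pair by a roughness curve; the function name, the choice of six harmonics, and the 440 Hz lower tone are assumptions made for illustration.

```python
def rough_partial_pairs(f0_a, f0_b, n_harmonics=6, lo_hz=20.0, hi_hz=300.0):
    """Return (i, j, difference) for harmonic i of tone A and harmonic j of
    tone B whose frequency difference lies in the assumed roughness range."""
    pairs = []
    for i in range(1, n_harmonics + 1):
        for j in range(1, n_harmonics + 1):
            diff = abs(i * f0_a - j * f0_b)
            if lo_hz <= diff <= hi_hz:
                pairs.append((i, j, diff))
    return pairs

# Perfect fifth (3:2) versus minor second (16:15) above a 440 Hz lower tone.
fifth = rough_partial_pairs(440.0, 440.0 * 3 / 2)
minor_second = rough_partial_pairs(440.0, 440.0 * 16 / 15)
print(len(fifth), len(minor_second))
```

Under these assumptions the minor second yields more such pairs than the fifth, in line with the roughness account of dissonance.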

¹ All-order interval distributions consist of intervals bounded by two spikes which contain any number (i.e., all numbers) of intervening spikes. All-order distributions are also called autocorrelation or auto-coincidence distributions (Perkel, Gerstein, and Moore, 1967; Rodieck, 1967; Ruggero, 1973; Evans, 1983).

In addition to this “roughness” theory of dissonance there have been other

explanations for the basis of consonance and dissonance. Stumpf’s (1890) fusion theory

states that sounds are consonant because their individual components fuse together

to form a single perceptual entity, more so than do dissonant sounds. Another theory

is the long wave hypothesis from Boomsliter and Creel (1961) which states that

consonance is based on the length of the overall period of a stimulus. They show that

consonant intervals (based on simple integer ratios of fundamental frequencies) have

shorter overall periods than do dissonant intervals. Finally, there is a pitch theory

which suggests that consonant harmonic intervals have a more perceptually salient

common fundamental bass frequency than do dissonant intervals. This idea stems

from Rameau’s (1722) theory of “basse fondamentale” and complements the notion

that sensory consonance is just a lack of roughness (sensory dissonance) (Tramo et al.,

2000). The plausibility of each of these theories is addressed in discussions below, but

the working assumption here is that sensory dissonance is based on roughness and

that consonance may be based on pitch salience. Chapters 3 and 4 focus on these

two theories.

1.3 Overview

The chapters that follow are written as self-contained scientific papers, each describing

a particular neural correlate of pitch or dissonance. Together, these findings illustrate

the complexity and specificity of neural processing in the auditory periphery,

brainstem and midbrain as it pertains to music perception; they show that musical percepts

generally considered to be “high order”, such as the dissonance of musical intervals,

have direct neural correlates in low- and mid-level nuclei of the auditory CNS.

The work described below begins in the AN in Chapter 2 and moves directly to

the inferior colliculus (IC) in Chapters 3 and 4. One reason for making such a jump is

that the IC is known to be an obligatory synapse in the auditory pathway and thus the

processing that occurs at nuclei below the level of the IC should be evident in IC neural

responses. In addition, neurons at lower level nuclei, such as in the cochlear nucleus,

tend to encode the fine time structure of stimuli, similar to AN fibers, while neurons

in the IC respond more to the temporal envelope of stimuli (Delgutte, Hammond,

and Cariani, 1998; Joris and Yin, 1998). For our investigation of dissonance we are

more interested in the neural coding of the temporal envelope.

1.3.1 Chapter 2: A possible neurophysiological basis of the

octave enlargement effect

The octave, a frequency ratio of 2:1, is the basis for most known music tuning

systems. The pitches of two tones separated by an octave are deemed equivalent in the

context of a musical scale. While the physical octave is defined as a frequency ratio of

2:1, perceptually, listeners prefer slightly greater ratios. This preference, the octave

enlargement effect, occurs for a wide variety of stimulus conditions and in subjects

with various musical backgrounds (Ward, 1954; Walliser, 1969; Dobbins and Cuddy,

1982). In Chapter 2 we show that a neural correlate for the octave enlargement effect

exists in ISIs of AN fibers. This finding provides support for the idea that musical

pitch is encoded in AN ISIs.

1.3.2 Chapter 3: Neural correlates of the dissonance of musical

intervals in the inferior colliculus. I. Monaural and

diotic tone presentation

Tramo et al. (1992; 2000) found a correlate of roughness in temporal discharge

patterns of auditory-nerve fibers. Their model for roughness operates on fibers grouped

by CF and uses bandpass filters to extract the temporal fluctuations in each CF band.

Their filter characteristics were based on the psychophysical dependence of roughness

on modulation frequency. It was noted that these filters resemble modulation transfer

functions (MTFs) of inferior colliculus (IC) neurons (Rees and Møller, 1983; Langner

and Schreiner, 1988; Fastl, 1990; Delgutte, Hammond, and Cariani, 1998). Based on

this observation, we examined responses of IC neurons to musical intervals for

correlates of roughness (sensory dissonance). We show, in Chapter 3, that correlates of

dissonance exist in the rate fluctuations of all IC neurons and in the average discharge

rates of a subpopulation of neurons.

1.3.3 Chapter 4: Neural correlates of the dissonance of musical

intervals in the inferior colliculus. II. Dichotic tone

presentation and pitch salience

Because many IC neurons respond to interaural phase differences (IPDs) (Yin and

Kuwada, 1983), it is likely that they would respond similarly to diotically- and

dichotically-presented musical intervals. This is interesting because dichotically presented

tones are thought not to produce a roughness sensation (e.g., Roederer, 1979) and

therefore would not be perceived as dissonant according to our working assumption

on the basis of dissonance. Similar neural responses to these stimuli that purportedly

differ in their perception would force us to rethink our conclusions on the neural code

for dissonance. In Chapter 4, we examine responses of IC neurons to both dichotic

and diotic presentation of musical intervals and show that some neurons do indeed

respond similarly to both types of stimuli. We offer several possible resolutions to this

“dichotic quandary” and also examine the representation of pitch in ISI histograms

of neural responses in the IC and look for neural correlates of consonance.

Chapter 2

A possible neurophysiological basis

of the octave enlargement effect¹

Abstract

Although the physical octave is defined as a simple ratio of 2:1, listeners prefer slightly greater octave ratios. Ohgushi (J. Acoust. Soc. Am., 73, 1694-1700) suggested that a temporal model for octave matching would predict this octave enlargement effect because, in response to pure tones, auditory-nerve interspike intervals are slightly larger than the stimulus period. In an effort to test Ohgushi’s hypothesis, we collected auditory-nerve single-unit responses to pure-tone stimuli from Dial-anesthetized cats. We found that although interspike interval distributions show clear phase-locking to the stimulus, intervals systematically deviate from integer multiples of the stimulus period. Due to refractory effects, intervals smaller than 5 msec are slightly larger than the stimulus period and deviate most for small intervals. On the other hand, first-order intervals are smaller than the stimulus period for stimulus frequencies less than 500 Hz. We show that this deviation is the combined effect of phase-locking and multiple spikes within one stimulus period. A model for octave matching was implemented which compares frequency estimates of two tones based on their interspike interval distributions. The model quantitatively predicts the octave enlargement effect. These results are consistent with the idea that musical pitch is derived from auditory-nerve interspike interval distributions.

¹ Reprinted with permission from McKinney & Delgutte, “A possible physiological basis of the octave enlargement effect”, Journal of the Acoustical Society of America 106(5), 1999, pp. 2679-2692. 1999, Acoustical Society of America.

2.1 Introduction

The octave is the basis of most known tonal systems throughout the world (Dowling

and Harwood, 1986)². Pitches that are an octave apart are deemed equivalent to some

degree and can serve the same musical function within certain tonal contexts. The

prevalence of the octave as the fundamental building block of tonal systems suggests

that there may be a physiological basis for octave equivalence.

A physical octave is defined as a frequency ratio of 2:1. It is known, however,

that listeners prefer octave ratios slightly greater than 2:1 (Ward, 1954; Walliser,

1969; Terhardt, 1971; Sundberg and Lindqvist, 1973). In a typical procedure to

measure this octave enlargement effect, a subject listens to a lower standard tone

alternating with an adjustable higher tone and is instructed to adjust the frequency

of the higher tone until it sounds one octave above the lower tone. Results of three

such experiments are shown in Fig. 2-1. The size of the preferred or subjective octave

is close to 2:1 at low frequencies but increases with frequency and exceeds the physical

octave by almost 3% at 2 kHz. It is difficult for listeners to make octave judgements

for tones above about 2 kHz. This corresponds to an upper limit in musical pitch

of about 4-5 kHz (Ward, 1954; Attneave and Olson, 1971). There is considerable

variability in the octave enlargement effect across listeners but it is nonetheless a

statistically significant effect in all the reported studies. The effect is also seen in a

wide variety of stimulus conditions and in subjects with various musical backgrounds.

It is seen when the two tones are presented simultaneously (Ward, 1954; Demany

and Semal, 1990) and under the method of constant stimuli (Dobbins and Cuddy,

1982). The studies shown in Fig. 2-1 were all performed using pure-tone stimuli but

Sundberg and Lindqvist (1973) reported the effect with complex tones as well as pure

tones. Ward (1954) reported the presence of the effect in listeners without musical

training and in listeners with musical training as well as in possessors of absolute

pitch. Dowling and Harwood (1986, p.103) reported the effect in a number of musical

cultures.
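To give a sense of scale, the arithmetic behind a 3% enlargement at 2 kHz can be sketched as follows. The sketch assumes that the percentage deviation is defined relative to the physical 2:1 ratio, so that the subjective ratio is 2(1 + deviation/100); the function name is illustrative, and 3% is used as a round figure where the text says "almost 3%".

```python
import math

def subjective_octave(f_low_hz, deviation_percent):
    """Return (ratio, upper-tone frequency in Hz, enlargement in cents),
    assuming the deviation is expressed relative to the 2:1 ratio."""
    ratio = 2.0 * (1.0 + deviation_percent / 100.0)
    f_high_hz = f_low_hz * ratio
    cents = 1200.0 * math.log2(ratio / 2.0)
    return ratio, f_high_hz, cents

ratio, f_high_hz, cents = subjective_octave(2000.0, 3.0)
print(ratio, f_high_hz, round(cents, 2))
```

Under this assumption a 3% enlargement at 2 kHz places the preferred upper tone near 4120 Hz rather than 4000 Hz, roughly half a semitone (about 51 cents) sharp of the physical octave.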

² Dowling and Harwood report (on p. 93) only one known tonal system, from an aboriginal culture in Australia, that is not based on the octave.

[Figure 2-1: plot of the deviation of the subjective octave from 2:1 (%) against the frequency of the lower tone (Hz). Data series: Terhardt (1971), Walliser (1969), Ward (1954), Ohgushi (1983).]

Figure 2-1. Psychoacoustic measures of the octave enlargement. Adapted from Fig. 4 in Sundberg and Lindqvist (1973) and Fig. 9 in Ohgushi (1983). The subjective octave, obtained from octave matching experiments, is plotted as a deviation from the physical octave versus the frequency of the lower tone in the octave pair. The subjective octave is larger than the physical octave and the deviation grows with frequency.

The presence of the octave enlargement effect under a wide range of subject and

stimulus conditions suggests that the effect may have a general physiological basis.

Ohgushi (1983) proposed an octave matching scheme based on a temporal model for

pitch that predicts the octave enlargement effect. In an earlier study, he noticed that,

in response to pure tones, auditory-nerve interspike intervals are slightly longer than

integer multiples of the stimulus period (Ohgushi, 1978). He then showed, using a

temporal model for octave matching, that these deviations lead to a prediction of the

octave enlargement effect (Ohgushi, 1983).
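The reasoning can be illustrated with a deliberately simplified sketch. Suppose the shortest interval mode exceeds the stimulus period by a fixed excess d, so that the frequency estimate is f_est = 1/(1/f + d), and suppose two tones are judged an octave apart when their estimates stand in a 2:1 ratio. Solving for the upper tone gives a matched ratio of 2/(1 - d·f_low), which exceeds 2 and grows with frequency. Both the constant excess and the 10 µs value below are hypothetical; they are not measured values, and this is not Ohgushi's model, only the algebra of the argument.

```python
def estimated_frequency(f_hz, excess_s):
    """Frequency estimate from an interval that is excess_s longer than the period."""
    return 1.0 / (1.0 / f_hz + excess_s)

def matched_octave_ratio(f_low_hz, excess_s):
    """Ratio f_high / f_low at which the two estimates are exactly 2:1,
    assuming the same constant excess applies to both tones."""
    return 2.0 / (1.0 - excess_s * f_low_hz)

EXCESS_S = 10e-6  # hypothetical 10 microsecond excess, chosen only for illustration
for f_low in (200.0, 1000.0, 2000.0):
    ratio = matched_octave_ratio(f_low, EXCESS_S)
    print(f_low, ratio, 100.0 * (ratio - 2.0) / 2.0)
```

Even this constant-excess simplification reproduces the qualitative trend of the psychoacoustic data: the predicted octave is close to 2:1 at low frequencies and increasingly enlarged at higher ones.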

Upon review of Ohgushi’s (1983) model for octave matching, Hartmann (1993)

pointed out an arbitrary factor of two. This scaling factor, which is not based on

any physiological process, allows a model listener to theoretically set it, and thus the

octave interval, to any value. Hartmann suggested a variation of the model that would

not rely on such a scaling factor. He also suggested that if the model operated on

all-order interspike intervals instead of first-order interspike intervals, it may better

predict the psychoacoustic data.

The work presented here was motivated by the hypotheses presented by Ohgushi

and Hartmann. Neither one of them could reliably test their predictions because

the existing physiological data consisted of only a small number of coarse-resolution

interspike-interval distributions. It was therefore difficult to measure the modes of

the distributions, i.e., characterize the intervals, with high precision. Special methods

were used in this study to ensure high precision interval analyses so that predictions of

temporal models for octave matching could be reliably evaluated. We combined spike

data across fibers to form pooled interspike interval histograms which have been shown

to reflect a wide variety of pitch phenomena (Cariani and Delgutte, 1996a; 1996b).

In addition to characterizing interspike intervals, we have developed and evaluated

models for octave matching, based on Ohgushi’s and Hartmann’s ideas, which operate

on pooled interspike interval histograms.

2.2 Method

The methods used in this study differ from typical auditory-nerve (AN) studies in

that specific efforts were taken to ensure accurate estimation of interspike intervals

(ISIs): Unusually long recordings were made to ensure the inclusion of a high number

of spikes in each record; very fine binwidths (1 µsec) were used when generating ISI

histograms in order to accurately estimate the modes.

2.2.1 Experiment

Data were recorded from auditory-nerve fibers in six healthy, adult cats. Cat

preparation and recording techniques were standard for our laboratory (Kiang et al., 1965;

Cariani and Delgutte, 1996a).

In each experiment, the cat was Dial-anesthetized with an initial dose of 75 mg

per kg of body weight and subsequent doses of 7.5 mg per kg of body weight. A

craniectomy was performed and the middle-ear and bulla cavities were opened to

access the round window. The cerebellum was retracted to expose the AN.

Injections of dexamethasone (0.26 mg/kg of body weight/day), to reduce brain swelling,

and Ringer’s saline (50 ml/day), to prevent dehydration, were given throughout the

experiment.

The cat was placed on a vibration isolation table in an electrically-shielded,

temperature-controlled (38 °C) chamber. The AN compound action potential (CAP)

in response to click stimuli was monitored with a metal electrode placed near the

round window. The cat’s hearing was assessed by monitoring the CAP threshold and

single-unit thresholds.

Sound was delivered to the cat’s ear through a closed acoustic assembly driven by

a Beyerdynamic DT 48A headphone. The acoustic assembly was calibrated with

respect to the voltage delivered to the headphone, allowing for accurate control over

the sound pressure level at the tympanic membrane. Stimuli were generated by a

16-bit, Concurrent (DA04H) digital-to-analog converter using a sampling rate of 100

kHz. The total harmonic distortion for pure tones between 110 and 3000 Hz was less

than -55 dB re fundamental when measured at a stimulus level of 95 dB SPL.

AN action potentials (spikes) were recorded with glass micropipette electrodes

filled with 2 M KCl. The electrodes were visually placed on the nerve and then

mechanically stepped through the nerve using a micropositioner (Kopf 650). The

electrode signal was band-pass filtered and fed into a spike-detector. The times of

spike peaks were recorded with 1 µs precision.

Nerve fibers were sought using a click (near 55 dB SPL) as a search stimulus. Upon

contact with a fiber, a threshold tuning curve was generated using the Moxon (Kiang,

Moxon, and Levine, 1970) algorithm with a criterion of 0. The spontaneous rate of

the fiber was then measured by counting the number of spikes over a 20 second

period. Units with a characteristic-frequency (CF) threshold more than two standard

deviations away from the mean threshold for normal AN fibers (as found by Liberman

and Kiang, 1978) were not included in the analysis.

An estimate of the number of false triggers in the spike record was derived from

examination of the ISIs. Because the absolute refractory period of AN fibers prohibits

ISIs smaller than about 0.5 msec (Gaumond, Molnar, and Kim, 1982; Gaumond,

Kim, and Molnar, 1983), intervals shorter than 0.5 msec were assumed to be false

triggers. Spike records containing more than 0.1 % of these short intervals were not

included in the analysis.
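The false-trigger criterion can be sketched as follows, assuming spike times in seconds and first-order intervals. The function names are illustrative; the 0.5 msec and 0.1 % thresholds are those stated above.

```python
import numpy as np

def false_trigger_fraction(spike_times_s, min_isi_s=0.5e-3):
    """Fraction of first-order intervals shorter than the assumed
    absolute refractory period."""
    isis = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < min_isi_s))

def record_is_usable(spike_times_s, max_fraction=0.001):
    """A record is kept unless more than 0.1 % of its intervals are too short."""
    return false_trigger_fraction(spike_times_s) <= max_fraction

# Toy record: 2001 spikes at 2 ms spacing, plus three spurious triggers.
clean = np.arange(2001) * 0.002
noisy = np.concatenate([clean, clean[[10, 500, 1500]] + 0.0002])
print(record_is_usable(clean), record_is_usable(noisy))
```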

The experimental data were recorded using pure-tone stimuli at frequencies of

110, 220, 440, 880, 1500, 1760, 3000 Hz and at levels of 5, 10, 15, 20, 25, 40, 60

dB re threshold. The stimulus was presented once per second (400 msec on, 600

msec off, 2.5 msec rise and fall times) for 180 seconds or until 20,000 spikes had been

recorded, whichever came first. In order to avoid the possible complex effects of onset

transients and adaptation, spikes that occurred during the first 20 msec following the

onset of each stimulus and during the stimulus off-time were excluded. Recordings

containing fewer than 5000 spikes were not included in the analysis. This unusually

high requirement on the minimum number of spikes in the record ensures a reliable

estimate of the ISI distribution.
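The spike selection described above (one presentation per second, 400 msec on, first 20 msec discarded) might be sketched as below. The function name and default arguments are illustrative assumptions. Whether intervals spanning two presentations were ever formed is not stated, so this sketch simply keeps each presentation separate.

```python
def spikes_in_analysis_window(spike_times, period=1.0, t_start=0.020, t_stop=0.400):
    """Group spike times (s) by stimulus presentation, keeping only spikes
    that fall at least t_start after onset and before the stimulus ends."""
    trials = {}
    for t in spike_times:
        trial = int(t // period)          # index of the presentation
        t_rel = t - trial * period        # time since that stimulus onset
        if t_start <= t_rel < t_stop:
            trials.setdefault(trial, []).append(t_rel)
    return [trials[k] for k in sorted(trials)]
```

Interspike intervals would then be computed within each returned list.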


2.2.2 Analysis

Auditory-nerve responses to low-frequency stimuli tend to occur at a specific phase

with respect to the stimulus (Rose et al., 1967; Kiang et al., 1965). Thus, ISI distri-

butions display modes at intervals corresponding, roughly, to integer multiples of the

stimulus period. The main goal of the analysis in this study was to accurately esti-

mate modes of AN ISI distributions in order to quantitatively verify Ohgushi’s (1978)

observation that the intervals deviate from the stimulus period.

There were three main steps to the analysis of the ISI distributions. First, a his-

togram of the intervals was produced. Second, the mean interval of each mode in

the histogram was estimated by fitting, in the maximum likelihood sense, a Gaussian

mixture density to the histogram. Third, deviations of the interval modes from

stimulus periods were characterized.

Histogram generation

The first step in the analysis was to generate histograms of the ISIs. The histogram

binwidths were 2 µsec for frequencies less than 300 Hz and 1 µsec for frequencies

above 300 Hz. Both first-order and all-order ISI histograms were computed.
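The two kinds of histogram can be sketched as follows, assuming a spike record is a sorted list of spike times. The function names and the `max_interval` cutoff are illustrative choices, not taken from the original analysis.

```python
def first_order_isis(spikes):
    """Intervals between consecutive spikes (spike times sorted ascending)."""
    return [b - a for a, b in zip(spikes, spikes[1:])]


def all_order_isis(spikes, max_interval):
    """Intervals between all spike pairs, not only consecutive ones."""
    intervals = []
    for i in range(len(spikes)):
        for j in range(i + 1, len(spikes)):
            d = spikes[j] - spikes[i]
            if d > max_interval:
                break  # spikes are sorted, so later pairs are longer still
            intervals.append(d)
    return intervals


def isi_histogram(intervals, binwidth, max_interval):
    """Count intervals in bins [k*binwidth, (k+1)*binwidth) below max_interval."""
    n_bins = int(round(max_interval / binwidth))
    counts = [0] * n_bins
    for d in intervals:
        k = int(d // binwidth)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts
```

With microsecond binwidths, storing spike times as integer microseconds would avoid floating-point rounding at bin edges.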

Mode estimation

The second step in the analysis was to estimate the modes of the interspike interval

distribution. A maximum likelihood (ML) estimation approach was implemented in

which the interval distributions were modeled as a mixture of Gaussian densities with

each mode in the distribution corresponding to a single density. This mixture density

was fit (in the ML sense) to the interval histograms and the means of the individual

Gaussian densities were taken as the estimated modes in the histogram. Two forms

of mixture density were used, one for estimating individual modes in the interval

distributions and another for estimating the fundamental mode (i.e. stimulus period)

in the distributions (and subsequently the stimulus frequency). In the first case, the

individual Gaussian densities in the mixture had mutually independent means and


variances. In the second case, they were assumed to have harmonically related means

and a common variance.

Because obtaining the ML estimates of the parameters is not analytically straight-

forward, we used the expectation-maximization (EM) algorithm, an iterative tech-

nique which converges to the ML estimate (Redner and Walker, 1984; Moon, 1996).

Mathematical details of our implementation are included in the appendix.
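As a rough illustration of the first form of mixture density (independent means and variances), a textbook EM iteration is sketched below. It is not the implementation from the appendix: it operates on raw intervals rather than on histogram counts, it assumes reasonable starting values for the means and variances, and the function names are made up for this example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(intervals, means, variances, n_iter=100):
    """EM for a 1-D Gaussian mixture with independent means and variances."""
    means, variances = list(means), list(variances)
    n_modes = len(means)
    weights = [1.0 / n_modes] * n_modes
    for _ in range(n_iter):
        # E-step: posterior probability that each interval belongs to each mode.
        resp = []
        for x in intervals:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: responsibility-weighted updates of weight, mean, and variance.
        for k in range(n_modes):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, intervals)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, intervals)) / nk
            variances[k] = max(var_k, 1e-12)  # guard against a collapsed component
            weights[k] = nk / len(intervals)
    return means, variances, weights
```

The fitted means play the role of the mode estimates; a natural starting point for mode n is n times the stimulus period.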

Mode offset

The third step in the analysis was to calculate the mode offset (MO), the difference

between the mode estimate (ME) and the corresponding multiple of the stimulus

period:

$$MO_n = ME_n - \frac{n}{f} \qquad (2.1)$$

where f is the frequency of the stimulus and n is the mode number (e.g., mode 1

contains intervals that are roughly one stimulus period in length and mode 2 contains

intervals that are roughly 2 stimulus periods in length). Figure 2-3(e) illustrates the

above calculation for MO1.

In an effort to represent the total AN population response to the stimuli, pooled

histograms were generated by summing all of the individual ISI histograms for a

specific stimulus frequency. Mode estimates of the pooled histograms were calculated

as well.
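The mode-offset calculation of Eq. (2.1) and the pooling of histograms can be sketched in a few lines. The function names are illustrative, and the pooling assumes that all histograms share the same binwidth and number of bins.

```python
def mode_offsets(mode_estimates, frequency):
    """MO_n = ME_n - n/f for n = 1, 2, ... (estimates in seconds, f in Hz)."""
    return [me - n / frequency for n, me in enumerate(mode_estimates, start=1)]


def pool_histograms(histograms):
    """Sum equally binned histograms bin by bin."""
    return [sum(bin_counts) for bin_counts in zip(*histograms)]
```

For example, hypothetical mode estimates of 1.10, 2.05 and 3.00 msec for a 1000 Hz tone would give offsets of 0.10, 0.05 and 0.00 msec.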

2.3 Results

From six experiments, a total of 399 spike records from 164 fibers were obtained that

met our requirements in terms of the minimum number of spikes, normal thresholds,

and small number of false triggers. The majority (79%) of the records were from high

spontaneous rate fibers. CFs ranged from 150 to 17,000 Hz.

Figure 2-2(a) shows a schematic representation of a stimulus waveform and a

hypothetical spike record. ISIs are roughly integer multiples of the stimulus period.


First-order intervals are those between consecutive spikes, second-order intervals are

those between every other spike, etc.

[Figure 2-2 image. Panels: (a) pure-tone stimulus and AN spikes versus time, with first-, second-, and third-order intervals marked; (b) first-order, (c) 1st-, 2nd-, and 3rd-order, and (d) all-order ISI histograms. Axes in (b)-(d): Interspike Interval (msec) versus Number of Intervals.]

Figure 2-2. Histogram generation. (a) is a schematized representation of a pure-tone stimulus and corresponding spike record from the auditory nerve. The order of the interspike interval is based on the number of spikes included in the interval: first-order intervals are those between consecutive spikes; second-order intervals are those between every other spike; third-order intervals are those between every third spike. (b) and (c) are histograms of the various types of interspike intervals. (d) is an interval histogram containing intervals of all orders, thus termed an all-order histogram. All of the histograms were generated from the same spike record. The stimulus was an 880 Hz pure-tone at 84 dB SPL. The auditory-nerve fiber from which the recording was made had the following properties: CF: 2609 Hz; SR: 29 spikes/sec. The histograms have a binwidth of 40 µsec and the following number of total intervals: first-order: 10305; second-order: 5696; third-order: 1643; all-order: 17874.

Figure 2-2(b) shows a histogram of first-order ISIs from a single-unit recording

generated with an 880 Hz tone stimulus. The modal distribution of intervals clearly

reflects the synchronization of the spike train to the stimulus, and the positions of the

modes provide information about the stimulus frequency (Rose et al., 1967).


Figure 2-2(c) displays first-, second-, and third-order histograms based on the

same spike record as in (b). As one would expect, first-order intervals are, on average,

shorter than second- and third-order intervals and thus fall into earlier modes. There

is, however, a great deal of overlap in the distributions of the intervals of different

orders and the intervals of a particular order are not confined to a single mode in the

histogram.

The histogram shown in Fig. 2-2(d) contains ISIs of all orders, and is thus termed

the all-order ISI histogram. This histogram is sometimes referred to as the autocor-

relation or auto-coincidence histogram (Perkel, Gerstein, and Moore, 1967; Rodieck,

1967; Ruggero, 1973; Evans, 1983).

An important difference between first-order and all-order ISI histograms is their

general shape: the size of the modes in the first-order ISI histogram tends to decrease

as the mode number increases; the size of the modes in the all-order ISI histogram

is relatively constant. In other words, when one examines very long ISIs, few are

first-order intervals. The all-order ISI histogram does not reflect the decaying trend

because higher-order intervals are included and “fill in” the modes at long intervals.

In addition to the 399 spike records included in the analysis, 28 spike records that

met our data requirements were excluded from the analysis because their histograms

displayed peak-splitting. At moderate to high levels of pure-tone low-frequency stim-

ulation, AN ISI histograms can exhibit two or sometimes three peaks per stimulus

cycle instead of the normal one (Kiang and Moxon, 1972; Kiang, 1980; Liberman

and Kiang, 1984; Kiang, 1990; Ruggero et al., 1996). Most of the fibers from which

we recorded did not exhibit this behavior within our stimulus-level range, but those

records that did were excluded to simplify the analysis. In our data, peak-splitting

occurred primarily at stimulus frequencies below 440 Hz.

2.3.1 All-Order Interspike Intervals

Figure 2-3(a)-(d) are all-order ISI histograms from one AN fiber for four different

stimulus frequencies. Figure 2-3(e) is a magnification of the histogram in (a) with

the mode estimates indicated by X’s above each mode. As previously reported by


Ohgushi (1978; 1983), the short intervals (early modes) are slightly longer than stim-

ulus periods. This deviation is presumed to be at least partially due to the refractory

period of the auditory-nerve fiber (Ohgushi, 1978). The mode offset for the first mode

is labeled in the figure.

Mode offsets from the histograms in Fig. 2-3(a)-(d) are plotted in (f) as a function

of ISI length. The mode offset decreases monotonically as the ISI increases (Fig. 2-

3(f)) and for intervals greater than about 5 msec, mode offsets are insignificant. To

a first approximation, the mode offsets depend primarily on ISI and not stimulus

frequency. However, at any particular ISI <∼5 msec, lower frequency stimuli generally

yield slightly larger mode offsets.

Figure 2-4(a)-(c) show how mode offsets vary with fiber CF, spontaneous rate

(SR) and discharge rate (DR) for all-order histograms of 220 and 1760 Hz. The DR

is typically a compressed function of stimulus level ranging from SR to saturation

rate. The mode offsets in all-order ISI histograms do not obviously depend on the

fiber CF, SR or DR. Because of this, we decided to pool the ISI data (across fibers

and stimulus levels) and use pooled histograms for testing the model presented in the

next section. Because pooled histograms contain many more intervals than single-

fiber histograms they more accurately represent the underlying interval probability

distributions. Figure 2-4(d) shows mode offsets grouped by the cat from which they

were measured. There is a small but statistically significant variation across cats

for the 1760 Hz data (see the caption of Fig. 2-4). Despite this variation, we decided to also pool data

across cats. Conclusions based on the analysis of data from individual cats were not

different from those based on pooled data.

Figure 2-5 shows pooled histograms for six stimulus frequencies. The pooled his-

tograms are much smoother than the single-fiber histograms due to the large number

of intervals they contain. Mode offsets are clearly visible at intervals less than about

5 msec. Modes in the 110 and 220 Hz histograms show no offset because even the

earliest modes occur at intervals greater than or near 5 msec.

Figure 2-6(a)-(e) show mode offsets as a function of interval length for pooled

histograms as well as for single-fiber histograms. Figure 2-6(f) shows just the pooled

[Figure 2-3 image. Panels (a)-(d): all-order ISI histograms (Number of Intervals versus interval) for records labeled 1760 Hz ejt2-29-7, 880 Hz ejt2-29-6, 440 Hz ejt2-29-5, and 220 Hz ejt2-29-4; (e): magnified first four modes of the 1760 Hz histogram with the Mode Offset marked; (f): Mode Offset (msec) versus interval for 1760, 880, 440, and 220 Hz.]

Figure 2-3. Histogram mode offset. (a)-(d) are all-order ISI histograms of specified frequency with 40 µsec binwidths. Vertical dashed lines mark integer multiples of the stimulus period. (e) is a magnification of the first four modes of (a). The gray curve outlining the histogram is the ML estimate of the Gaussian mixture density corresponding to the histogram. The ×’s above the modes in (e) mark the ML estimate of the mode (the ML means of the individual Gaussian pdfs in the mixture density), obtained from Eq. (2.12) operating on a histogram with 1 µsec binwidths. The mode offset is the deviation of the mode estimate from the corresponding integer multiple of the stimulus period. Each histogram was generated from a separate spike record but each spike record was obtained from the same auditory-nerve fiber. Fiber characteristics: CF = 2602 Hz; SR = 66 spikes/sec. The stimulus levels were all 10 dB re threshold, corresponding to the following levels for each spike record (in dB SPL): (a) 27; (b) 45; (c) 62; (d) 70. (f) displays the mode offsets from the histograms in (a)-(d). Mode offsets are primarily a decreasing function of interval, although, at corresponding intervals, lower frequency stimuli yield slightly larger mode offsets.

[Figure 2-4 image. Panels (a)-(d): All-Order Mode Offset versus Characteristic Frequency (Hz), Spontaneous Rate (sp/sec), Discharge Rate (sp/sec), and Cat Number; legend: 220 Hz, 1760 Hz.]

Figure 2-4. Variation of mode offset across spontaneous rate, characteristic frequency, discharge rate and cat. +’s mark the first mode offset of every individual 220 Hz data record and a second symbol marks the second mode offset of every individual 1760 Hz data record. These frequencies and mode numbers were chosen as typical representatives of our low- and high-frequency data. In (c), lines connect mode offsets (plotted against DR) that were derived from the same fiber. (a), (b), and (c) show that there is no obvious dependence of mode offset on CF, SR, or DR. (d) shows how mode offset depends on the cat from which it was measured. One-way ANOVA on the data in (d), using the cat number as the individual factor, yielded the following p-values: p = 0.003 for 1760 Hz and p = 0.139 for 220 Hz. Although there were significant differences in mode offsets across cats, our decision to pool data across cats did not affect our general conclusions.

[Figure 2-5 image. Six panels of pooled all-order ISI histograms (Number of Intervals, ×10^3, versus interval) titled 110, 220, 440, 880, 1760, and 3000 Hz.]

Figure 2-5. Pooled histograms. (a)-(f) are pooled all-order ISI histograms of specified frequency. The histograms have the same format as Fig. 2-3(a)-(d). The intervals are pooled from the following number of fibers: (a) 26; (b) 75; (c) 58; (d) 47; (e) 33; (f) 10. Positive mode offsets are visible for intervals <∼5 msec (i.e., modes at intervals smaller than 5 msec are shifted slightly to the right of their corresponding stimulus-period multiple). Note that the scale of the abscissa in (f) differs from that of the other panels.


histogram mode offsets for five stimulus frequencies. Although there is some variation

across fibers in the size of mode offsets, the characteristics seen in the single fiber data

are evident in the pooled data: the mode offset is a monotonically decreasing function

of ISI; mode offsets for intervals greater than about 5 msec are insignificant; and for

a given ISI, lower frequency stimuli yield slightly larger mode offsets. Thus, these

characteristics seem to be general phenomena and not just particular to one type of

auditory-nerve fiber or stimulus intensity.

2.3.2 First-Order Interspike Intervals

The general shape of first-order histograms changes with fiber discharge rate while

the shape of all-order histograms remains relatively constant (Cariani and Delgutte,

1996a). Figure 2-7 shows interval histograms from one AN fiber for a 220 Hz tone

at three stimulus levels. As the SPL, and therefore discharge rate, increases, the

average first-order interval gets shorter and the relative sizes of the histogram modes

reflect this change: the later modes get smaller and the early modes get larger. In

contrast, as the discharge rate increases, higher-order intervals fill in the modes that

get depleted of first-order intervals so that the general shape of all-order histograms

remains unchanged.

The main difference between mode offsets of first-order and all-order intervals is

the presence of negative mode offsets for low stimulus frequencies in the first-order

data. A negative mode offset means that a particular ISI mode is shorter than

the corresponding stimulus-period multiple. This is illustrated in the first-order low-

frequency histograms in Fig. 2-8(a) and (b): the modes occur slightly to the left of the

stimulus period lines. Mode offset data for low-frequency first-order ISI histograms

are shown in panels (c)-(f). These mode offsets show a greater variability at low

stimulus frequencies than those from all-order ISIs.

The negative mode offsets in low-frequency first-order ISI histograms are due to

the presence of intervals in Mode zero (0). As Fig. 2-9(a) and (b) illustrate, an interval

falls into Mode 0 if two spikes occur within the same half-period of the stimulus. Mode

0 is bounded on the left by the absolute refractory period and on the right by half

[Figure 2-6 image. Panels (a)-(e): Mode Offset (msec) versus interval at 110, 220, 440, 880, and 1760 Hz; (f): pooled-histogram mode offsets for 1760, 880, 440, 220, and 110 Hz.]

Figure 2-6. Mode offsets of all-order ISI histograms. (a)-(e) display the mode offsets of pooled and individual histograms of specified frequency. Lines connect the mode offsets of pooled histograms and dots mark the mode offsets of individual histograms. (f) shows the pooled-histogram mode offsets for most of the experimental stimulus frequencies. Mode offsets in pooled histograms show the same trend with interval as those in individual histograms: mode offset is primarily a monotonically decreasing function of interval, although lower frequency stimuli yield slightly larger mode offsets at corresponding intervals. Note that the scale of the ordinate in (f) differs from that of the other panels.

[Figure 2-7 image. Columns: 1st-Order (a, c, e) and All-Order (b, d, f) histograms, Number of Intervals versus interval; rows labeled 52 dB SPL, 68 sp/sec; 57 dB SPL, 125 sp/sec; 67 dB SPL, 197 sp/sec.]

Figure 2-7. A series of 220 Hz ISI histograms (from the same auditory-nerve fiber) over a range of discharge rates. Stimulus level and fiber discharge rate are indicated to the left of the plots. First-order ISI histograms are plotted in (a), (c) and (e). All-order ISI histograms are plotted in (b), (d) and (f). Fiber characteristics: CF: 409 Hz; SR: 0.7 spikes/sec; Threshold at 220 Hz: 47 dB SPL. The histogram binwidths are 80 µsec.

[Figure 2-8 image. Panels (a) and (b): first-order ISI histograms labeled 220 Hz bd172-51-10 and 110 Hz ejt1-48-3; (c)-(e): mode offsets at 110, 220, and 440 Hz; (f): pooled-histogram mode offsets for 1760, 880, 440, 220, and 110 Hz.]

Figure 2-8. Low-frequency first-order ISI histograms display negative mode offsets. (a) and (b) show first-order ISI histograms in the same format as those in Fig. 2-3. The ×’s above the modes mark the ML estimate of the mode. (c), (d) and (e) show mode offsets from first-order ISI histograms in the same format as Fig. 2-6. (f) shows the pooled-histogram mode offsets for most of the stimulus frequencies. Below ∼500 Hz, first-order ISI histograms display large negative mode offsets (i.e., modes are shifted slightly to the left of their corresponding stimulus-period multiple), in contrast to the insignificant mode offsets present in low-frequency all-order ISI histograms. Note that the scales of the axes vary across panels.


the stimulus period. Due to the refractory period of the AN fiber, only low frequency

stimuli (<∼500 Hz) produce ISI histograms that contain a Mode 0. When an interval

occurs in Mode 0, the preceding and following first-order intervals tend to be smaller

than if just a single spike had occurred in that half-period. The relationship between

consecutive intervals can be seen by examining a joint ISI histogram (Rodieck, Kiang,

and Gerstein, 1962), as shown in Fig. 2-9(c). The joint ISI histogram is a two-

dimensional histogram which plots the ISI size against that of the previous ISI. It

is displayed here as a grayscale image in which gray-level indicates the number of

interval pairs in a small square bin. As is the case for one-dimensional ISI histograms,

the modal distribution of intervals is clearly evident in this plot: the intervals tend

to cluster near integer multiples of the stimulus period. We will use the notation

Mode(n,m) to refer to the mode in which the previous interval is in Mode n and

the current interval is in Mode m. Examination of Mode(0, 1), Mode(0, 2), and

Mode(0, 3) shows that if the previous interval lies in Mode 0, the current interval tends

to be shorter than the corresponding stimulus-period multiple. Also, examination of

Mode(1, 0), Mode(2, 0), and Mode(3, 0) shows a similar dependency on Mode 0 for

the current interval. Thus, in a first-order ISI histogram, the presence of intervals in

Mode 0 effectively biases the other modes towards smaller values.
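A joint ISI histogram and the Mode(n, m) labeling can be sketched as follows. This is an illustration only: the function names are assumptions, and the mode of an interval is taken here to be the nearest integer multiple of the stimulus period, so that Mode 0 ends at half a period as described above.

```python
from collections import Counter


def joint_isi_histogram(isis, binwidth):
    """Count (previous ISI, current ISI) pairs in square bins of side binwidth."""
    counts = Counter()
    for prev, cur in zip(isis, isis[1:]):
        counts[(int(prev // binwidth), int(cur // binwidth))] += 1
    return counts


def mode_number(isi, period):
    """Nearest integer multiple of the period; ISIs below period/2 give 0."""
    return int(isi / period + 0.5)


def joint_modes(isis, period):
    """Mode(n, m) label for each pair of consecutive first-order ISIs."""
    modes = [mode_number(isi, period) for isi in isis]
    return list(zip(modes, modes[1:]))
```

Averaging the current interval within each Mode(0, m) label and comparing it with m stimulus periods would expose the shortening described in the text.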

Offsets of higher modes in all-order ISI histograms are not affected by the presence

of intervals in Mode 0 because these histograms include higher order intervals. For

every first-order interval that is shortened by the presence of an interval in Mode 0,

there is a second-order interval (which includes the one in Mode 0) that is lengthened.

This can be seen schematically in Fig. 2-9(a). The lengthened second-order interval

falls into the same mode as the shortened first-order interval and counteracts its effect

on the mode offset.

The effect of Mode 0 on first-order ISI histograms can be quantified by selecting

only those intervals that do not precede or follow intervals in Mode 0. This condition-

ing was performed on all of the 220 Hz stimulated spike records and then histograms

of the conditioned intervals were generated. The distribution of the mode estimates

for Mode 1 in these conditioned interval distributions is plotted in Fig. 2-9(d) along

[Figure 2-9 image. Panels: (a) pure-tone stimulus and AN spikes versus time, with the mode of each interval labeled; (b) first-order ISI histogram (Number of Intervals versus Interspike Interval) with Modes 0 to 3 labeled; (c) joint first-order ISI histogram; (d) Number of Modes versus Mode 1 Estimate for conditioned 1st-order, 1st-order, and all-order ISIs, with a 20-mode scale bar.]

Figure 2-9. Intervals in Mode 0. (a) is a schematic representation of a stimulus and a corresponding spike record. The number between the spikes in (a) indicates the mode of the histogram, shown in (b), to which the first-order interval belongs. (b) is a first-order ISI histogram and (c) is a joint first-order ISI histogram of the same auditory-nerve spike record. The joint ISI histogram is a two-dimensional histogram which plots ISI size against that of the previous ISI. It is displayed as a gray-scale image in which gray-level indicates the number of interval pairs in a small square bin. The dashed lines in (b) and (c) mark integer multiples of the stimulus period. The stimulus was a 220 Hz pure-tone at 61 dB SPL. The fiber characteristics are: CF: 379 Hz; SR: 76 spikes/sec; threshold at 220 Hz: 36 dB SPL. The histogram binwidth is 80 µsec in (b) and 64 µsec for both dimensions in (c). (d) shows the distribution of the estimates of Mode 1 in 50 individual 220 Hz histograms for all-order, first-order and conditioned first-order ISIs. The condition in the third case is that the intervals do not follow or precede an interval in Mode 0. The traces are vertically offset for clarity and the vertical bar in the lower left denotes 20 modes. The binwidth of the mode estimate distribution is 50 µsec. The vertical dashed line marks the stimulus period. Negative mode offsets in low-frequency first-order ISI histograms are due to shortened intervals caused by intervals in Mode 0 (i.e., two spikes within the same half-period of the stimulus).


with similar unconditioned distributions from first-order and all-order ISI histograms.

The alignment of the mode-estimate distributions for all-order and conditioned first-

order ISIs indicates that the presence of intervals in Mode 0 accounts for nearly all

of the difference between mode estimates in all-order and first-order ISI histograms.
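The conditioning step can be sketched as below. The function name is illustrative, and one detail is an assumption: Mode 0 intervals themselves are also dropped here, which does not matter when only Mode 1 is being estimated.

```python
def conditioned_intervals(isis, period):
    """First-order ISIs that neither follow nor precede a Mode 0 interval.

    An interval is treated as Mode 0 when it is shorter than half the
    stimulus period.
    """
    in_mode0 = [isi < period / 2.0 for isi in isis]
    kept = []
    for i, isi in enumerate(isis):
        if in_mode0[i]:
            continue  # assumption: Mode 0 intervals are not kept either
        prev_is_0 = i > 0 and in_mode0[i - 1]
        next_is_0 = i + 1 < len(isis) and in_mode0[i + 1]
        if not (prev_is_0 or next_is_0):
            kept.append(isi)
    return kept
```

A histogram of the returned intervals would then be fit in the same way as the unconditioned first-order histogram.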

The negative correlation between consecutive intervals is a characteristic of our

data that is not well documented in the literature. The joint ISI histogram in Fig. 2-

9(c) shows a clear dependence between the previous and current first-order ISI. All

of the modes are oval with the long axis going diagonally from the top left to the

bottom right of the figure. This means that if the previous interval was shorter than

average, the current interval will tend to be longer than average and vice versa. This

is a consequence of phase-locking: every interval longer than the stimulus period must

be compensated for by a shorter interval if the spikes are to remain phase-locked.
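One idealized way to see why phase-locking implies a negative correlation is worked through below. It is a sketch under assumptions that are not stated in the text: spike k occurs at a stimulus-locked time (an integer number of periods, c_k T) plus a jitter term ε_k, the jitter terms are independent with common variance σ² and independent of the cycle counts, and refractory effects are ignored. The symbols T, c_k, n_k, ε_k and σ² are introduced here for illustration only.

```latex
t_k = c_k T + \varepsilon_k, \qquad
I_k = t_{k+1} - t_k = n_k T + \varepsilon_{k+1} - \varepsilon_k,
\qquad n_k = c_{k+1} - c_k .

\text{Within a given } \mathrm{Mode}(n, m) \text{ (so } n_k = n,\ n_{k+1} = m \text{ are fixed):}

\operatorname{Cov}(I_k, I_{k+1})
  = \operatorname{Cov}(\varepsilon_{k+1} - \varepsilon_k,\;
                       \varepsilon_{k+2} - \varepsilon_{k+1})
  = -\sigma^2,
\qquad
\operatorname{Var}(I_k) = \operatorname{Var}(I_{k+1}) = 2\sigma^2,

\rho = \frac{-\sigma^2}{2\sigma^2} = -\tfrac{1}{2}.
```

Under these assumptions the shared spike time enters consecutive intervals with opposite signs, which corresponds to the diagonal elongation of the modes in the joint histogram. Real AN fibers need not follow this idealization.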

2.4 Model

Our primary objective in formulating a (central) model for octave matching is to

evaluate how physiological constraints in the auditory periphery, i.e., deviations in

AN ISIs, affect the central processor. This is best accomplished with simple models

that have few, if any, free parameters so that the effect of the peripheral physiological

behavior is not clouded. With this in mind, we developed a temporal model for

pure-tone octave matching based on Ohgushi’s (1983) model.

2.4.1 Model for estimating pure-tone frequency

The basic assumption of the model is that perceived pitch is equal to a biased estimate

of the stimulus frequency derived from AN ISIs. The bias in the frequency estimate

comes from the mode offsets in the ISI histograms. Frequency estimates were derived

from interval histograms using the EM algorithm (Eqs. (2.6) and (2.7)), assuming a

mixture density of Gaussians with harmonically related means (Eq. (2.14)):

$$\hat{f} = \frac{1}{\mu_{ML}(N_{max})}, \qquad (2.2)$$

where $\hat{f}$ is the estimate of stimulus frequency f, µML is the ML estimate of the

fundamental mean in the mixture density (µ+ in Eq. (2.15)), and Nmax is the number

of modes included in the calculation (M in Eqs. (2.15) and (2.16)). If the modes

occur exactly at integer multiples of the stimulus period, µML will equal the stimulus

period and the frequency estimate will be equal to the stimulus frequency.
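The frequency estimate can be illustrated with a small EM loop for a mixture whose means are constrained to k·µ (k = 1, ..., Nmax) with one shared variance. This is a sketch, not the appendix implementation: the closed-form update for µ below follows from maximizing the expected log-likelihood under that constraint, it may differ in detail from Eqs. (2.14)-(2.16), and it works on raw intervals rather than on histogram counts. The function name and arguments are hypothetical.

```python
import math


def estimate_frequency(intervals, period0, n_max, var0, n_iter=50):
    """Fit modes at k*mu (k = 1..n_max) with one shared variance; return 1/mu."""
    mu, var = period0, var0
    modes = list(range(1, n_max + 1))
    weights = [1.0 / n_max] * n_max
    n = len(intervals)
    for _ in range(n_iter):
        # E-step: responsibilities. The shared variance lets the Gaussian
        # normalizing constant cancel; subtracting the largest exponent
        # keeps at least one term from underflowing.
        resp = []
        for x in intervals:
            expo = [-(x - k * mu) ** 2 / (2.0 * var) for k in modes]
            top = max(expo)
            p = [w * math.exp(e - top) for w, e in zip(weights, expo)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: closed-form update of the fundamental period mu.
        num = sum(r[j] * k * x
                  for r, x in zip(resp, intervals) for j, k in enumerate(modes))
        den = sum(r[j] * k * k for r in resp for j, k in enumerate(modes))
        mu = num / den
        var = sum(r[j] * (x - k * mu) ** 2
                  for r, x in zip(resp, intervals) for j, k in enumerate(modes)) / n
        var = max(var, 1e-12)  # guard against a collapsed variance
        weights = [sum(r[j] for r in resp) / n for j in range(n_max)]
    return 1.0 / mu
```

With hypothetical intervals of 1.10, 2.05 and 3.00 msec for a 1 msec stimulus period, the fitted fundamental is 14.2/14 ≈ 1.014 msec, so the frequency estimate falls about 1.4% below the stimulus frequency, which is the kind of bias that positive mode offsets in early modes produce.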

Estimates for each stimulus frequency were calculated using pooled ISI histograms

and their deviations from the stimulus frequency were derived as follows:

$$f_{DEV} = 100 \cdot \frac{\hat{f} - f}{f} \qquad (2.3)$$

where fDEV is the percent deviation of the frequency estimate and $\hat{f}$ is the frequency

estimate.

fDEV is plotted versus stimulus frequency in Fig. 2-10 for three values of Nmax.

For both all-order intervals and first-order intervals, fDEV is a decreasing function of

stimulus frequency (footnote 3). This trend is a direct result of the dependence of mode offset on

interval size. As the stimulus frequency increases, the stimulus period decreases and

the offset for any given mode number increases. This results in a larger estimate of

the fundamental period, µML, relative to the stimulus period and, hence, a lower frequency estimate relative to the stimulus frequency. For

all-order intervals, fDEV is always negative because mode offsets are always positive.

On the other hand, first-order ISIs yield positive fDEV values for low stimulus

frequencies because the histograms contain negative mode offsets.

Figure 2-10 also shows that the free parameter Nmax greatly influences the fre-

quency estimate at high frequencies. For low values of Nmax, the frequency estimate

has a relatively large bias from the mode offsets of the lower modes. Since the mode

offset is minimal in the higher modes, the frequency estimate becomes less biased

as Nmax increases. On the other hand, Nmax has little effect on the estimates at

low frequencies because either the mode offsets are consistently small for all modes

(all-order ISIs) or the higher modes contain few intervals and thus carry little weight in

Footnote 3: The slight deviation from monotonicity near 1500 Hz is due to differences in mode offsets across cats and uneven sampling across cats. The data at 1500 and 3000 Hz are primarily from two cats that showed relatively large mode offsets in their AN responses (cats 5 and 6 in Fig. 2-4(d)).

[Figure 2-10 image. Frequency Estimate Deviation (%) versus Stimulus Frequency (Hz, logarithmic axis); panels (a) All-Order and (b) First-Order, each with traces for Nmax = 4, 6, and 10.]

Figure 2-10. Frequency estimate deviation (fDEV) vs. frequency. (a) displays fDEV calculated from pooled all-order ISI histograms for the values of Nmax shown next to each trace. (b) displays fDEV calculated from pooled first-order ISI histograms. For frequencies >∼500 Hz, fDEV is a decreasing function of both frequency and Nmax. Error bars show an estimate of the standard error of fDEV. The estimate was calculated using the bootstrap technique (Efron and Tibshirani, 1993): 50 simulations of the frequency estimate were calculated (Eq. (2.2)) in which pooled histograms were generated by randomly choosing (with replacement) spike records of individual stimulus presentations. The standard deviation of the frequency estimates from these simulations is an estimate of the standard error of the mean.


the calculation of $\hat{f}$ (first-order ISIs).
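The bootstrap standard error described in the caption of Fig. 2-10 can be sketched as follows. The function and argument names are assumptions: `records` stands for a list of per-presentation interval lists, and `estimator` is any function that maps pooled intervals to a number, such as a frequency estimate.

```python
import random
import statistics


def bootstrap_standard_error(records, estimator, n_boot=50, seed=0):
    """Standard deviation of an estimator over resampled sets of records."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw as many records as in the original set, with replacement.
        resample = [rng.choice(records) for _ in records]
        pooled = [x for rec in resample for x in rec]
        estimates.append(estimator(pooled))
    return statistics.stdev(estimates)
```

The default of 50 resamples matches the number of simulations quoted in the caption; the fixed seed is only there to make the sketch repeatable.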

2.4.2 Model for octave matching

The model operates on two sets of pooled ISI histograms to predict the size of the

pitch interval separating their respective stimuli. The pitch interval prediction is

obtained by comparing the frequency estimate $\hat{f}_1$ (Eq. (2.2)) of a low-frequency tone, f1,

with the frequency estimate $\hat{f}_2$ of a high-frequency tone, f2. The model predicts that

f1 and f2 are separated by a subjective octave when:

$$\hat{f}_2 = 2 \cdot \hat{f}_1. \qquad (2.4)$$

The model algorithm can be interpreted graphically as attempting to align the

modes of the scaled (by two) f1 histogram with the modes of the f2 histogram. An

octave is predicted when the modes are best aligned.

The deviation of the model prediction (i.e. “subjective octave”) from the physical

octave, ∆SO, is:

$$\Delta SO = 100 \cdot \frac{2 \cdot \hat{f}_1 - \hat{f}_2}{f_1}, \qquad (2.5)$$

for f1 and f2 separated by a physical octave.
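The two deviation measures can be written as short functions. The sketch reads Eq. (2.5) with frequency estimates in the numerator and the physical lower-tone frequency in the denominator; that reading is an assumption, as are the function names and the numbers in the usage note.

```python
def percent_deviation(f_hat, f):
    """Eq. (2.3): percent deviation of a frequency estimate from the stimulus."""
    return 100.0 * (f_hat - f) / f


def octave_deviation(f1_hat, f2_hat, f1):
    """Eq. (2.5) as read here: positive values mean the estimate of the upper
    tone falls short of twice the estimate of the lower tone."""
    return 100.0 * (2.0 * f1_hat - f2_hat) / f1
```

For instance, hypothetical estimates of 499 Hz for a 500 Hz tone and 990 Hz for a 1000 Hz tone give deviations of -0.2% and -1.0%, and `octave_deviation(499.0, 990.0, 500.0)` evaluates to 1.6, a positive value in the direction of an octave enlargement.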

Model predictions are shown in Fig. 2-11 for several values of Nmax. As in the

frequency estimate (Fig. 2-10), variation in Nmax results in large changes in model

predictions at high frequencies. As Nmax increases, more modes with little or no offset

are included in the frequency estimates and the resulting deviation of the subjective

octave decreases.

When all-order ISIs are used as model input, the model predicts an octave en-

largement in general agreement with the psychoacoustic data (Fig. 2-11(a)) at most

frequencies for Nmax ≈ 4–6. At low frequencies, the model underestimates the

psychoacoustic octave enlargement for all values of Nmax, but its predictions are still

within the range of the psychoacoustic data. At 1500 Hz, the model predicts the

range of psychoacoustic data simply by varying Nmax from 4 to 6.

[Figure 2-11 image. Deviation from Physical Octave (%) versus Frequency of the Lower Tone (Hz, logarithmic axis); panels (a) All-Order and (b) First-Order, each with traces for Nmax = 4, 6, and 10.]

Figure 2-11. Model predictions of the octave enlargement effect. The model predictions are based on pooled histograms for each stimulus frequency. Error bars show the estimated standard error of the subjective octave prediction and were calculated in a similar manner to those in Fig. 2-10. (a) shows the model predictions for all-order ISIs and several values of Nmax. (b) shows the same for first-order ISIs. Although the low-frequency data are not well predicted by the model, the predictions based on all-order intervals are within the range of the psychoacoustic data with Nmax ≈ 4–6.

41

When operating on first-order ISIs, the model, with Nmax = 4, predicts an octave

enlargement in general agreement with the psychoacoustic data except at 100 Hz

where the model predicts a much larger deviation (Fig. 2-11(b)). In addition, the

model predicts a decrease in deviation as frequency increases (at low frequencies)

but the psychoacoustic data show the opposite trend. The model’s predicted octave

enlargement at low frequencies is due to the negative mode offsets in the first-order

ISI histograms. The frequency estimates of these low frequency tones are higher than

the true frequency (see Fig. 2-10(b)) and when they are matched to estimates of

(upper) tone frequencies that produce little or no negative mode offsets, an octave

enlargement is predicted.

In summary, the model, operating on first- or all-order ISIs with Nmax ≈ 4 − 6,

predicts the octave enlargement effect at mid- to high-frequencies. At low frequencies,

the model underestimates the effect when operating on all-order ISIs and overesti-

mates it when operating on first-order ISIs.

2.5 Discussion

2.5.1 Auditory Nerve Physiology

We have shown that, in response to low-frequency pure-tones, AN ISIs deviate system-

atically from integer multiples of the stimulus period. When quantitatively expressed

as mode offsets in ISI histograms (Eq.(2.1)), the deviations are positive for ISIs less

than 5 msec and decrease with increasing ISI until they become insignificant for ISIs

greater than 5 msec. In addition, first-order intervals show negative mode offsets for

stimulus frequencies less than 500 Hz. These robust phenomena exist for all CFs and

SRs and over a wide range of stimulus levels. Our quantitative characterization of

these physiological properties provides a solid basis to study how they can affect any

temporally-based estimate of the stimulus frequency.

Our data and analyses suggest that positive and negative mode offsets in ISI

histograms arise from fundamentally different mechanisms. We showed in Fig. 2-9

that negative mode offsets, seen in first-order ISI distributions for low-frequency

stimuli, are due to the occurrence of multiple spikes within the same half-period. In

order to maintain phase-locking between the stimulus and AN response, the intervals

before and after these multiple spikes tend to be slightly shorter, on average, than

multiples of the stimulus period. Positive mode offsets, on the other hand, have been

attributed to the refractory properties of the neurons (Ohgushi, 1983; Ohgushi, 1978)

and, specifically, to a reduction in conduction velocity during the relative refractory

period (de Cheveigne, 1985). While these ideas are reasonable, it is important to note

that the delays causing the offsets could arise at any point from the basilar membrane

to the AN fiber.

A physiological characteristic that we saw in our data but ignored in the analysis is

peak-splitting. This phenomenon causes two or more modes of intervals to be present

within a single stimulus period of an ISI histogram instead of the usual one mode per

stimulus period. The multiple modes are the result of the AN response going through

a change in phase (as much as 180°) relative to the stimulus as the stimulus level is

increased (Kiang and Moxon, 1972; Johnson, 1980; Kiang, 1980; Kiang, 1990).

At first sight, peak-splitting would seem to wreak havoc on temporal models for

pitch. At stimulus intensities where peak-splitting occurs, a model operating on the

intervals would estimate multiple frequencies, depending on the degree of phase-shift.

However, because the stimulus intensity at which peak-splitting occurs depends on

both fiber CF and stimulus frequency, only a small fraction of AN fibers will exhibit

peak splitting at the same stimulus intensity. So, in a temporal model for pitch that

operates on intervals pooled from fibers across many CFs, peak-splitting most likely

has a small effect on the pooled interval distribution leaving the overall frequency

estimate relatively unchanged.

2.5.2 Temporal Models for Octave Matching and Pitch Per-

ception

Our model for octave matching makes pitch-interval judgements based on frequency

estimates of two tones. Each frequency estimate is computed from a pooled AN ISI

histogram by fitting it with a Gaussian mixture density with harmonically related

means. An octave is predicted when the frequency estimate of one tone is twice that

of another tone. The model predicts the octave enlargement effect except at very low

frequencies, where it slightly underestimates the effect when operating on all-order

ISIs and overestimates the effect when operating on first-order ISIs.

Comparison with Ohgushi’s model

Our model is similar to Ohgushi’s (1983) model for octave matching. The basic

elements of the models are the same although there are three primary differences

in his implementation: he uses first-order ISIs only; his frequency estimates were

based on just the first two modes of the histogram while we used a variable number

(Nmax) of modes; and he calculates frequency estimates from the modes with weights

obtained by fitting the model predictions to the psychoacoustic data and adjusting

two free variables. These differences lead to different predictions at low frequencies

when operating on first-order ISIs. Ohgushi’s model predictions are consistent with

the psychoacoustic data on the octave enlargement for all frequencies while our model

has difficulties at very low frequencies (< 200 Hz). It should be pointed out that with

two free parameters, Ohgushi had more flexibility with which to fit the data.

In addition, Ohgushi operated on rather coarse (100 µsec binwidth) single-fiber ISI

histograms from only four AN fibers, published by Rose, et al. (1967; 1968), while our

model predictions were based on fine-resolution pooled histograms which represent

a large number of fibers and spikes. Analysis of our data using Ohgushi’s method

yields results similar to his.

Interpretation of Nmax

The one free parameter in our model is Nmax, the number of modes over which the

frequency estimate is calculated. Nmax can greatly affect the frequency estimate

and resulting octave interval prediction. Rather than treating it as an arbitrary free

parameter, it would be nice to give Nmax a physiological or psychoacoustic inter-

pretation. If one assumes that pure-tone pitch is based on the interspike interval

distribution of AN spikes, Nmax could be related to the minimum tone duration re-

quired to elicit a pitch.

A number of psychoacoustic studies have investigated the effect of tone duration

(for very short tones) on pitch (Doughty and Garner, 1947; Doughty and Garner,

1948; Pollack, 1967) and the ability to recognize musical melodies (Patterson, Peters,

and Milroy, 1983). A general result from these studies is that, for tones below about

1000 Hz, a minimum number of cycles (6 ± 3) is required to elicit a stable pitch

or to achieve maximum performance in melody recognition. On the other hand,

above 1000 Hz, a minimum tone duration (∼10 msec) is required to elicit a stable

pitch (Gulick, Gescheider, and Frisina, 1989). If Nmax is taken as the number of

cycles required to elicit a pitch for low frequencies, our empirically derived range for

Nmax (∼4− 6) is consistent with this result. It should be noted that our analyses do

not include the first 20 msec of the AN response. Verification of such a relationship

between Nmax and minimum duration for pitch would require a study which carefully

addresses the effects of adaptation and ringing of the cochlear filter for short-duration

tones. Nevertheless, our results suggest that there may be a link between the two

“integration times”.

Another consideration related to Nmax is that the overall neural delay required to

perform octave matches for low frequency tones may be physiologically implausible.

For example, with Nmax = 5, the total delay required to obtain a frequency estimate

for a 60 Hz tone is 83 msec. There is, however, evidence for the existence of a lower

limit to musical pitch around 90 Hz (Biasutti, 1997), which reduces the maximum

required neural delay in our model to about 55 msec.
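The delay figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that the required delay equals Nmax stimulus periods, an assumption that is consistent with the 83 msec and 55 msec values in the text; the function name is illustrative only.

```python
def required_delay_ms(n_max: int, freq_hz: float) -> float:
    """Delay spanning n_max stimulus periods, in msec (assumed relation)."""
    return 1000.0 * n_max / freq_hz


for freq_hz in (60.0, 90.0):
    print(freq_hz, round(required_delay_ms(5, freq_hz), 1))
```

Under this assumption the delay grows in proportion to Nmax and in inverse proportion to frequency, which is why a lower limit of musical pitch bounds the longest delay the model needs.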

An alternative model for octave matching

We developed and implemented a second model for octave matching following a sug-

gestion by Hartmann (1993). Noting that the scaling factor of two in Ohgushi’s (1983)

model for octave matching is arbitrary, Hartmann suggested that a more physiologi-

cally grounded model is one that attempts to correlate the ISIs without first scaling

those from the low frequency tone. The comparison is then made between two tones

using only the intervals from the even modes in the ISI histogram for the high fre-

quency tone. This model can be interpreted graphically as attempting to align the

modes of the f1 histogram with the even modes of the f2 histogram.

Despite the appeal of Hartmann’s suggestion, we found that this model fails to

predict the octave enlargement phenomenon and instead predicts a slight octave con-

traction. The cause of this prediction can be seen by examining the mode offsets

at the same interval size in Fig. 2-6(f). In an octave comparison between two tones

separated by a physical octave, the second mode in the ISI histogram for the high

frequency tone has a smaller mode offset than the first mode of the lower frequency

tone. This causes the sub-octave estimate of the higher tone to be slightly higher

than the frequency estimate of the lower tone. In order to achieve a subjective octave

match, the higher frequency tone needs to be slightly lower in frequency than the

physical octave above the lower tone. This results in a predicted octave contraction

rather than an octave enlargement.

Temporal models for pitch

Our model for octave matching is similar to existing models for frequency discrimi-

nation (Siebert, 1970; Goldstein and Srulovicz, 1977) in that they are based on the

idea that pitch is a frequency estimate of a pure-tone stimulus based on temporal

discharge patterns. Both Siebert (1970) and Goldstein and Srulovicz (1977) repre-

sent AN activity with non-homogeneous Poisson processes. Siebert’s main objective

was to investigate the limitations in frequency discrimination of an optimal processor

operating on spike times of modeled AN activity. He discovered that there is enough

temporal information in the all-order intervals from a small number of auditory-nerve

fibers to account for the psychoacoustic data on frequency discrimination. However,

the slope of the predicted frequency discrimination limen versus stimulus duration

far exceeded psychoacoustic performance. Goldstein and Srulovicz showed that a

similar model operating on only first-order ISIs better predicts the dependence of the

psychoacoustic frequency difference limen on stimulus duration.

The essential difference between these models and ours is that the optimal pro-

cessor models give unbiased (ML) estimates of the stimulus frequency. The octave

matching model relies on biased frequency estimates which result from the assumption

that modes of the ISI distribution are harmonically related to the stimulus period.

These biases were lacking in the Siebert and Goldstein models because refractory

effects were not included in the Poisson processes.

An important distinction within the class of temporal models for pitch is between

those that operate on first-order ISIs and those that operate on all-order ISIs. All-

order intervals can be obtained from a spike train using delay lines and coincidence

detectors as proposed by Licklider (1951). Analysis of first-order intervals, on the

other hand, requires an extra stage of processing to eliminate the higher order inter-

vals. This makes a model based on first-order ISIs less appealing, physiologically, than

one that operates on all-order intervals. A further advantage for a model operating

on all-order intervals may be the fact that all-order interval distributions tend to be

more stable across stimulus level than first-order interval distributions, as shown in

Fig. 2-7.
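The distinction between the two interval types can be made concrete with a small Python sketch. It computes the intervals directly from a list of spike times rather than with delay lines and coincidence detectors, so it illustrates the definitions only and is not a model of the neural mechanism. The function names and the `max_interval` parameter are hypothetical.

```python
from itertools import combinations


def first_order_isis(spike_times):
    """Intervals between consecutive spikes only."""
    t = sorted(spike_times)
    return [b - a for a, b in zip(t, t[1:])]


def all_order_isis(spike_times, max_interval=float("inf")):
    """Intervals between every pair of spikes, optionally limited in length."""
    t = sorted(spike_times)
    return [b - a for a, b in combinations(t, 2) if b - a <= max_interval]


spikes_ms = [0.0, 1.0, 3.0]  # made-up spike times
print(first_order_isis(spikes_ms))
print(sorted(all_order_isis(spikes_ms)))
```

Every first-order interval is also an all-order interval, so the first-order set can be viewed as the all-order set with the intervals that span intervening spikes removed.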

We have seen in this study, as have Goldstein and Srulovicz (1977), that model

predictions based on one or the other type of ISI can yield different results. Goldstein

and Srulovicz show that in the context of frequency discrimination, operating on first-

order ISIs results in a better fit to the psychoacoustic data than operating on all-order

ISIs. Also, psychophysical experiments attempting to distinguish between the two

kinds of ISI-based pitch models have favored first-order ISIs (Kaernbach and Demany,

1998). Kaernbach and Demany used random click train stimuli with specified first

and higher order interclick distributions and found that discrimination between those

stimuli and randomly distributed clicks was better for regular first-order interclick

intervals. Results of our study do not strongly favor either first- or all-order intervals.

Model predictions based on first-order ISIs overestimate the subjective octave at low

frequencies and those based on all-order ISIs slightly underestimate the subjective

octave. The trend with frequency, however, of those predictions based on all-order

ISIs is more consistent with the psychoacoustic data. Nevertheless, we cannot rule

out models based on intermediate combinations of the two types of intervals or other

more complex models, such as Ohgushi’s, which predict the octave enlargement based

on first-order ISIs. Also, it is conceivable that different physiological cues may be

responsible for discriminating frequency than for matching octaves or for performing

other tasks involving musical pitch.

Our model for octave matching is also analogous to the optimum processor intro-

duced by Goldstein (1973). He uses a template of Gaussian density functions spaced

harmonically along the spectral axis to fit, in the ML sense, the excitation pattern

produced by a complex tone. Although his implementation operates on spectral exci-

tation, there is nothing inherent to the model that precludes its operation on interval

distributions. Our model is similar to his in that it fits harmonic templates to noisy

and possibly inharmonic data. In Goldstein’s case, the inharmonicity only arises if

the stimulus contains inharmonically related partials. In our case, the inharmonicity

is always present and comes from mode offsets in ISI distributions.

Although we have concentrated solely on temporal models, we should not forget

that there exist alternative schemes for octave matching, namely rate/place models.

Terhardt’s (1971; 1974) model for virtual pitch theoretically predicts the octave en-

largement effect and is discussed in that light by Hartmann (1993). Terhardt suggests

that through pervasive listening to natural tone complexes we develop memory tem-

plates of tonotopic excitation patterns and that we make octave judgements based on

the places of maximum excitation in these memory templates. He further postulates

that these templates are stretched, i.e. the places of maximum excitation correspond-

ing to the harmonics in the tone complex are shifted (upwards in frequency) due to

masking effects caused by the presence of the lower harmonics. Thus, the subjective

octave, based on these stretched templates, is slightly larger than the physical octave.

There is some evidence that lower-frequency masking stimuli can lower the CF of an

AN fiber (Kiang and Moxon, 1974; Delgutte, 1990) but the effect of masking depends

on the overall stimulus level and on the relative levels of the signal and masker. It

is not known whether these effects are quantitatively adequate to validate Terhardt’s

theory.

2.6 Conclusion

We have shown that, in response to low-frequency pure tones, AN ISIs less than 5

msec are systematically larger than integer multiples of the stimulus period and, for

frequencies less than 500 Hz, first-order ISIs are smaller than integer multiples of the

stimulus period. These deviations result in biased estimates of frequency and can lead

directly to a prediction of the octave enlargement effect by temporal-based models.

Thus, computational models for pitch may have to incorporate detailed physiological

properties of the auditory periphery, such as refractoriness, in order to predict effects

such as octave enlargement.

Correlating psychoacoustic behavior in the context of pitch effects with physio-

logical responses to the same set of stimulus conditions can lead to valuable insights

into the neurophysiological basis of pitch. Here, we have examined models for octave

matching operating on two forms of ISIs and, although no model is completely sat-

isfactory, one of them, operating on all-order intervals, comes close to predicting the

octave enlargement effect over its entire frequency range. This result is consistent

with the notion that musical pitch is based on a temporal code.

2.7 Appendix: The EM Algorithm

In order to find the ML estimates of parameters in the Gaussian mixture densities

described in Eqs. (2.8), (2.9), and (2.14), we used the iterative EM algorithm (Redner

and Walker, 1984; Moon, 1996). This appendix briefly describes the EM algorithm

and shows the mathematical details of our implementation.

The general idea of the EM algorithm is as follows: Ideally, one would like to obtain

ML estimates for parameters, Φ, of a probability density function (PDF), f(y|Φ),

over the complete sample space, Y. At hand, however, is an incomplete data sample,

x, which is insufficient to compute and maximize the log-likelihood function over Y.

In our case, the vector x = {xk : k = 1, . . . , N} is the interspike interval distribution

where xk is a single interval and N is the number of intervals. The data sample is

incomplete because the component density in the mixture from which a particular

interval arises is not known. A complete data sample, yk = (xk, ik), would consist of

the interspike interval, xk, and an indicator, ik, of the component density from which

xk originated. So, instead of maximizing the log-likelihood over Y, the EM algorithm

maximizes the expectation of log(f(y)) given the data, x, and the current parameter

estimates, Φ′. The two-step EM algorithm is:

$$\text{E-step: Determine } Q(\Phi|\Phi') = E\big(\log f(y|\Phi) \,\big|\, x, \Phi'\big). \tag{2.6}$$

$$\text{M-step: Choose } \Phi^{+} \in \arg\max_{\Phi} Q(\Phi|\Phi'). \tag{2.7}$$

With each iteration, the next parameter estimates, Φ+, replace the current parameter

estimates, Φ′, until convergence or until the difference between sequential sets of

parameters is less than some designated ε. Our implementation of the EM algorithm

follows directly from equations developed in (Redner and Walker, 1984) so we refer

the reader to their paper for details on the preliminary derivations and focus here on

details pertinent to our implementation.

2.7.1 Gaussians with independent means and variances.

To characterize the individual modes of ISI histograms, we modeled each interval

distribution as a mixture of M weighted, univariate Gaussian PDFs with independent

means and variances:

$$p(x|\Phi) = \sum_{i=1}^{M} \alpha_i \, p_i(x|\phi_i), \tag{2.8}$$

where x is a single interval in the distribution, Φ = (α1, . . . , αM, φ1, . . . , φM), αi

is a nonnegative weighting with $\sum_{i=1}^{M} \alpha_i = 1$, and pi is a univariate Gaussian pdf with

parameters φi = (µi, σi):

$$p_i(x|\phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right). \tag{2.9}$$

For a mixture of Gaussian densities in the form of Eqs. (2.8) and (2.9), Redner

and Walker (1984) derive Q(Φ|Φ′) in their Equation (4.1):

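$$Q(\Phi|\Phi') = \sum_{i=1}^{M}\left[\sum_{k=1}^{N}\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}\right]\log\alpha_i + \sum_{i=1}^{M}\sum_{k=1}^{N}\log p_i(x_k|\phi_i)\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \tag{2.10}$$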

where N is the number of data samples (number of intervals in the histogram), and the

other variables are as defined in Eq. (2.8). Note that maximization of Q(Φ|Φ′) with

respect to the weights, αi, is independent of the parameters, φi, of the individual

densities. Maximizing Q(Φ|Φ′) with respect to the individual parameters leads to

the following relations, which are special cases of Equations (4.5), (4.8), and (4.9)

in (Redner and Walker, 1984):

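$$\alpha_i^{+} = \frac{\alpha'_i}{N}\sum_{k=1}^{N}\frac{p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \tag{2.11}$$

$$\mu_i^{+} = \left.\sum_{k=1}^{N} x_k\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \right/ \sum_{k=1}^{N}\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \tag{2.12}$$

$$\sigma_i^{+2} = \left.\sum_{k=1}^{N} (x_k-\mu_i^{+})^2\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \right/ \sum_{k=1}^{N}\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \tag{2.13}$$

where $\alpha_i^{+}$, $\mu_i^{+}$, and $\sigma_i^{+2}$ are the parameter values used in the subsequent iteration of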

the algorithm. In this form of mixture density, the weights, means and variances of

the individual densities in the mixture are mutually independent. This form was used

to characterize the individual modes in the interval histograms. The ML estimate of

µi was used as an estimate of the ith mode.
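The update rules of Eqs. (2.11) to (2.13) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the interval data and starting values are made up, the function names are hypothetical, and a fixed number of iterations stands in for the ε-based stopping rule.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density, as in Eq. (2.9)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def em_step(x, alpha, mu, sigma):
    """One EM iteration for a mixture with independent means and variances."""
    n_comp, n = len(alpha), len(x)
    # resp[i][k] = alpha'_i p_i(x_k | phi'_i) / p(x_k | Phi')
    resp = [[0.0] * n for _ in range(n_comp)]
    for k, xk in enumerate(x):
        terms = [alpha[i] * gauss_pdf(xk, mu[i], sigma[i]) for i in range(n_comp)]
        total = sum(terms)  # mixture density p(x_k | Phi')
        for i in range(n_comp):
            resp[i][k] = terms[i] / total
    new_alpha, new_mu, new_sigma = [], [], []
    for i in range(n_comp):
        weight = sum(resp[i])
        mean = sum(r * xk for r, xk in zip(resp[i], x)) / weight               # Eq. (2.12)
        var = sum(r * (xk - mean) ** 2 for r, xk in zip(resp[i], x)) / weight  # Eq. (2.13)
        new_alpha.append(weight / n)                                           # Eq. (2.11)
        new_mu.append(mean)
        new_sigma.append(math.sqrt(var))
    return new_alpha, new_mu, new_sigma


def fit_mixture(x, alpha, mu, sigma, n_iter=50):
    """Repeat the EM step a fixed number of times (assumed stopping rule)."""
    for _ in range(n_iter):
        alpha, mu, sigma = em_step(x, alpha, mu, sigma)
    return alpha, mu, sigma


# Made-up intervals (msec) clustered near 2 and 4 msec.
intervals = [1.9, 2.0, 2.1, 2.0, 1.95, 2.05, 3.9, 4.0, 4.1, 4.0, 3.95, 4.05]
alpha, mu, sigma = fit_mixture(intervals, [0.5, 0.5], [1.5, 4.5], [0.5, 0.5])
print(mu)
```

The sketch does not guard against a component collapsing to zero variance or against all component densities underflowing for an outlying interval; a practical implementation would need to handle both cases.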

2.7.2 Gaussians with harmonically related means and a com-

mon variance.

To estimate the fundamental mode, i.e., stimulus period, of ISI histograms, we mod-

eled their distribution as a Gaussian mixture density with harmonically related means

and a common variance:

$$p_i(x|\phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - i\cdot\mu)^2}{2\sigma^2}\right). \tag{2.14}$$

The ML estimate of µ was used as an estimate of the stimulus period.

A different set of iteration equations results when considering the mixture density

described by Eqs. (2.8) and (2.14). In this case, maximizing Q(Φ|Φ′) with respect

to the individual parameters leads to the following iteration equations, similar to

Eqs. (2.12) and (2.13):

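$$\mu^{+} = \left.\sum_{i=1}^{M}\sum_{k=1}^{N} x_k \cdot i\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \right/ \sum_{i=1}^{M}\sum_{k=1}^{N} i^2\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \tag{2.15}$$

$$\sigma^{+2} = \left.\sum_{i=1}^{M}\sum_{k=1}^{N} (x_k - i\cdot\mu^{+})^2\,\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \right/ \sum_{i=1}^{M}\sum_{k=1}^{N}\frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}. \tag{2.16}$$

The weights of the individual densities, $\alpha_i^{+}$, are the same as in Eq. (2.11).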
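The harmonically constrained updates of Eqs. (2.15) and (2.16) can be sketched in the same way. Again, this is an illustrative pure-Python version under stated assumptions rather than the thesis implementation: the data, starting values, fixed iteration count, and function names are all hypothetical.

```python
import math


def harmonic_em_step(x, alpha, mu, sigma):
    """One EM iteration with component means at i*mu (i = 1..M) and a common sigma."""
    n_comp, n = len(alpha), len(x)
    num_mu = den_mu = 0.0
    resp = []
    for xk in x:
        # The common factor 1/(sqrt(2*pi)*sigma) cancels in the ratio below.
        terms = [alpha[i] * math.exp(-((xk - (i + 1) * mu) ** 2) / (2.0 * sigma ** 2))
                 for i in range(n_comp)]
        total = sum(terms)
        r = [t / total for t in terms]
        resp.append(r)
        for i in range(n_comp):
            num_mu += r[i] * (i + 1) * xk
            den_mu += r[i] * (i + 1) ** 2
    new_mu = num_mu / den_mu                                           # Eq. (2.15)
    sq = sum(r[i] * (xk - (i + 1) * new_mu) ** 2
             for r, xk in zip(resp, x) for i in range(n_comp))
    new_sigma = math.sqrt(sq / n)           # Eq. (2.16); the weights sum to N
    new_alpha = [sum(r[i] for r in resp) / n for i in range(n_comp)]   # Eq. (2.11)
    return new_alpha, new_mu, new_sigma


def estimate_period(x, n_modes, mu0, sigma0, n_iter=100):
    """Iterate from a starting period mu0 and return the estimated period."""
    alpha, mu, sigma = [1.0 / n_modes] * n_modes, mu0, sigma0
    for _ in range(n_iter):
        alpha, mu, sigma = harmonic_em_step(x, alpha, mu, sigma)
    return mu


# Made-up intervals (msec) near multiples of a 2-msec period.
intervals = [1.9, 2.0, 2.1, 3.9, 4.0, 4.1, 5.9, 6.0, 6.1]
period_ms = estimate_period(intervals, n_modes=3, mu0=2.1, sigma0=0.3)
print(period_ms, 1000.0 / period_ms)
```

Because the denominator of Eq. (2.15) weights each mode by i², intervals in the higher modes pull the period estimate more strongly than those in the first mode, which is why the choice of the number of modes matters for the estimate.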

Chapter 3

Neural correlates of the dissonance

of musical intervals in the inferior

colliculus. I. Monaural and diotic

tone presentation

3.1 Introduction

It has been known since the time of Pythagoras (c. 540-510 B.C.) that complex-tone

pairs whose fundamental frequencies are related by ratios of small integers produce a

consonant and euphonious sensation, while those not so related produce a dissonant

and rough sensation. Figure 3-1 shows line spectra of six different musical intervals

consisting of pure- and complex-tone pairs while Figure 3-2 shows judgements of

dissonance for each interval. The intervals are named after the Western diatonic

scale and, in this case, are all based on A4 (440 Hz). For complex tones, dissonance is

highest for the two intervals (Minor 2nd and Tritone) whose fundamental frequencies

are not related by simple ratios. For pure-tone intervals, dissonance is also maximum

for the Minor 2nd but, in contrast to complex tones, the Tritone is not more dissonant

than the Perfect 4th or 5th. It is important to point out that the vertical scales in

Fig. 3-2A and B were normalized independently; the relative dissonance of pure- and

complex-tone intervals is not represented in the data.

The early explanations of consonant and dissonant frequency ratios developed

into acoustic and psychophysical models (Partch, 1974; Stumpf, 1890; von Helmholtz,

1863) and, more recently, neural models (Boomsliter and Creel, 1961; Tramo, Cari-

ani, and Delgutte, 1992; Tramo et al., 2000). A prevailing theory is Helmholtz’ idea

that sensory dissonance is caused by beating between neighboring partials in a tone

complex (Plomp and Levelt, 1965). Beating occurs at the frequency difference be-

tween two partials and produces a roughness sensation when it occurs at frequencies

in the range of 20-200 Hz (von Bekesy, 1960; Plomp and Steeneken, 1968; Terhardt,

1968b; Terhardt, 1968a; Terhardt, 1974a; Vogel, 1974). The theory essentially equates

sensory dissonance and roughness. For complex-tone pairs whose fundamental fre-

quencies are related by simple ratios, less beating occurs overall because many partials

coincide (shaded bars in Fig. 3-1).
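The effect of coinciding partials can be illustrated with a rough count of beat rates. The Python sketch below lists the frequency differences between adjacent partials of two six-harmonic complex tones and keeps those that fall in the 20-200 Hz roughness range. It is a deliberately crude illustration: it ignores critical bandwidth and partial levels, it is not the model of Plomp and Levelt or of Tramo et al., and the function names are hypothetical.

```python
def partials(f0_hz, n_harmonics=6):
    """Frequencies of the first n_harmonics harmonics of f0_hz."""
    return [f0_hz * h for h in range(1, n_harmonics + 1)]


def rough_beat_rates(f0_lower_hz, ratio, n_harmonics=6, lo_hz=20.0, hi_hz=200.0):
    """Differences between adjacent partials of the two tones that lie in [lo_hz, hi_hz]."""
    merged = sorted(partials(f0_lower_hz, n_harmonics)
                    + partials(f0_lower_hz * ratio, n_harmonics))
    diffs = [b - a for a, b in zip(merged, merged[1:])]
    return [d for d in diffs if lo_hz <= d <= hi_hz]


for name, ratio in [("Minor Second", 16 / 15), ("Perfect Fifth", 3 / 2)]:
    print(name, len(rough_beat_rates(440.0, ratio)))
```

For the Perfect Fifth the adjacent partials either coincide or lie at least 220 Hz apart, so none of the differences falls in the roughness range, whereas every adjacent pair from the two tones of the Minor Second does.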

Neural correlates of roughness have been found in temporal population discharge

patterns of auditory-nerve (AN) fibers in cat (Tramo, Cariani, and Delgutte, 1992;

Tramo et al., 2000) as well as in multi-unit responses from primary auditory cortex in

monkey (Fishman et al., 2000). In order to show the correlate in AN responses, Tramo

et al. developed a model for roughness that operates on AN fibers grouped by charac-

teristic frequency (CF) and employs bandpass filters to extract temporal fluctuations

in each CF band. Their filter characteristics were based on the dependence of rough-

ness on modulation frequency. We noted that this dependence is similar in shape to

modulation transfer functions (MTFs) of inferior colliculus (IC) neurons (Rees and

Møller, 1983; Langner and Schreiner, 1988; Fastl, 1990; Delgutte, Hammond, and

Cariani, 1998), as shown in Figure 3-3. Although slightly more lowpass in shape

than the roughness data, the MTFs show that IC neurons strongly respond to stimuli

with modulation frequencies in the roughness range. Fishman et al. (2000) found

that neural response correlates of roughness in the cortex were strongest in the tha-

lamorecipient zone, suggesting that the found correlates may exist at lower levels of

the auditory system. Based on these observations, we hypothesize that there may be

direct correlates of roughness, and therefore sensory dissonance, in responses of IC

neurons without the need for additional filtering.

[Figure 3-1: line spectra in two columns, Two Pure Tones and Two Complex Tones, plotted against Frequency (Hz), with one row per interval: Unison (1/1), Minor Second (16/15), Perfect Fourth (4/3), Tritone (45/32), Perfect Fifth (3/2), and Octave (2/1).]

Figure 3-1. Line spectra of pure- and complex-tone intervals based at 440 Hz. The ratios of the fundamental frequencies are given under the interval name along with the musical notation. Each tone in a complex-tone pair contains six harmonics, each at the same level as the pure tones. Gray bars indicate overlapping harmonics from the lower (black bars) and upper (white bars) tones.

[Figure 3-2: two panels, (A) Two Pure Tones and (B) Two Complex Tones, each plotting Dissonance Rating (Arbitrary Units, 0 to 1) for the intervals Uni, m2, 4th, Tri, 5th, and Oct.]

Figure 3-2. Dissonance judgements for two-tone intervals comprised of pure and complex tones. Although both are normalized to 1.0, the plots are not on the same vertical scale. Adapted from Fig. 1 in Terhardt (1984), which is a compilation of data from Plomp and Levelt (1965), Kameoka and Kuriyagawa (1969b), and Terhardt (1977).

For this study, we have recorded from single neurons in the IC of anesthetized

cats in response to the stimuli shown in Fig. 3-1 and examined the discharge rates

and temporal properties of the responses for direct correlates of dissonance.

3.2 Method

3.2.1 Experiment

Data were recorded from single neurons in the central nucleus of the IC (ICC) in 10

adult cats using methods standard for our laboratory (Delgutte et al., 1999).

Each cat was Dial-anesthetized with an initial dose of 75 mg per kg of body weight

and subsequent doses of 7.5 mg per kg of body weight. We made a caudal approach

to the IC by performing a posterior fossa craniectomy and partially aspirating the

cerebellum. The bullae were vented to maintain ambient pressure in the middle

ear. An intravenous drip of Ringer’s saline was provided to prevent dehydration and

injections of dexamethasone (0.26 mg/kg of body weight/day) were given to reduce

brain swelling. The cat was placed on a vibration-isolated table in an electrically

shielded, temperature-controlled (38 °C), sound-attenuated chamber.

[Figure 3-3: Gain (dB) plotted against Modulation Frequency (Hz).]

Figure 3-3. Synchronized rate MTFs of IC neurons (thin lines) and the psychoacoustic roughness function (thick line). The MTFs were measured with a CF-tone carrier (see Sec. 3.2). The psychoacoustic roughness function is for AM tones with a carrier of 1000 Hz (adapted from Fig. 11 in Terhardt, 1968b).

Sound was delivered to the cat’s ears through closed acoustic assemblies driven by

a headphone (Realistic 40-1377). The assemblies were calibrated with reference to the

voltage applied to the headphones, allowing accurate control over the sound pressure

level at the tympanic membranes. Stimuli were generated by a 16-bit digital-to-analog

converter (Concurrent DA04H) using a sampling rate of 20 kHz.

Neural action potentials (spikes) were recorded with parylene-insulated tungsten

electrodes (12 MΩ). Electrodes were advanced through the IC using a micropositioner

(Kopf 650) while playing the search stimulus, a sinusoidally amplitude-modulated

(40 Hz) pure tone swept in frequency from 200 Hz to 10 kHz. The electrode signal

was bandpass filtered and fed into a spike detector/timer which measured the time

at the peak of the action potential with 1 µsec accuracy. Three methods were used

to guide the placement of the electrode in the ICC: initial placement of the electrode

after visually identifying the IC; attending to background activity in response to the

search stimulus, which is more prominent in the ICC than in surrounding areas; and

histological reconstruction and examination of the electrode tracts in three of the

cats.

Once a neuron was isolated, four measurements were taken to characterize it: 1)

Binaural interactions were examined by switching the search stimulus on and off

in each ear. All subsequent measurements for a particular neuron were made with

the most responsive of these binaural stimulus settings (monaural or diotic). 2) A

threshold tuning curve and the characteristic frequency (CF) were obtained using the

Moxon (Kiang, Moxon, and Levine, 1970) algorithm with a criterion of 0 spikes. 3)

The responses to a set of 300 msec long tones at CF, 0 to 60 dB re threshold, were

measured in order to classify the neuron by the peri-stimulus time histogram (PSTH)

of these responses (see Sec. 3.2.2). 4) The responses to an amplitude modulated tone

at CF were measured for modulation frequencies (Fm) ranging from 1 to 512 Hz in

octave steps in order to generate an MTF (see Sec. 3.2.2).

The consonant/dissonant tone pairs shown in Fig. 3-1 were each presented 30

times to a neuron. The tone pairs were 500 msec in duration (except for the first

experiment where durations were 300 and 200 msec for the pure-tone and complex-tone

pairs respectively) and were typically presented at 60 dB SPL. For some neurons,

we also used other levels ranging from 20 to 80 dB SPL. The harmonics for the

six-component complex tones were in cosine phase and equal amplitude. While the

stimuli shown in the figure are based on Just intonation, the stimuli in this study

were based on equal temperament tuning.¹
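The size of the detuning between the two systems can be checked with a short calculation. The Python sketch below compares the upper-tone frequency of each interval above 440 Hz under Just intonation, using the ratios of Fig. 3-1, and under equal temperament. The semitone counts (1, 5, 6, 7, and 12) are standard music-theory values rather than values stated in the text, and the names are illustrative.

```python
JUST_RATIOS = {
    "Minor Second": 16 / 15,
    "Perfect Fourth": 4 / 3,
    "Tritone": 45 / 32,
    "Perfect Fifth": 3 / 2,
    "Octave": 2 / 1,
}
# Number of equal-tempered semitones in each interval (assumed, standard values).
SEMITONES = {
    "Minor Second": 1,
    "Perfect Fourth": 5,
    "Tritone": 6,
    "Perfect Fifth": 7,
    "Octave": 12,
}


def tuning_difference_hz(base_hz=440.0):
    """Equal-tempered minus Just upper-tone frequency for each interval, in Hz."""
    return {
        name: base_hz * 2.0 ** (SEMITONES[name] / 12.0) - base_hz * JUST_RATIOS[name]
        for name in JUST_RATIOS
    }


for name, diff in tuning_difference_hz().items():
    print(f"{name}: {diff:+.2f} Hz")
```

With a 440 Hz base, the largest absolute difference produced by this calculation is about 3.5 Hz and occurs at the Tritone.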

¹ Tuning based on Just intonation creates successive scale step intervals of slightly unequal size. In addition, the interval steps between any two scale steps depend on which note of the scale is used as the starting point in tuning. This causes practical difficulties for combining multiple instruments as well as for playing music in different keys. Equal temperament tuning, the standard used today, was developed to overcome these difficulties (see, e.g., Sethares, 1999). It divides the octave into 12 equal frequency ratios, making all musical intervals in all keys the same size. This slight detuning from Just intonation is perceivable but it does not affect the relative dissonances of the intervals used in this study. The largest frequency difference in interval sizes between the two tuning systems is 3.5 Hz at the Tritone.

The responses to each set of tone pair presentations were closely monitored during

recording and were used in the analyses only if spikes were clearly identifiable above

the background activity. In addition, estimates of the false triggers in each spike

record were obtained by calculating the number of interspike intervals (ISIs) smaller

than 0.5 msec. These ISIs are assumed to be smaller than the absolute refractory

time of the neuron and therefore are most likely due to an improperly set threshold on

the spike counter. If the quantity of these short intervals exceeded 3% of the total

number of intervals the spike record was excluded from the analyses.
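
The false-trigger screen described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the per-presentation data layout (one list of spike times in msec per stimulus presentation), and the choice to form intervals only within a presentation are assumptions made for the example.

```python
def short_isi_fraction(trials_ms, min_isi_ms=0.5):
    """Return the fraction of interspike intervals shorter than min_isi_ms.

    trials_ms is a list of spike-time lists (msec), one per stimulus
    presentation; intervals are only formed within a presentation.
    """
    n_short = 0
    n_total = 0
    for spikes in trials_ms:
        times = sorted(spikes)
        for earlier, later in zip(times, times[1:]):
            n_total += 1
            if later - earlier < min_isi_ms:
                n_short += 1
    return n_short / n_total if n_total else 0.0


def keep_spike_record(trials_ms, max_fraction=0.03):
    """Keep the record unless short ISIs exceed 3% of all intervals."""
    return short_isi_fraction(trials_ms) <= max_fraction
```

A record with one 0.3 msec interval among four intervals would be rejected under this rule, since 25% of its intervals fall below the 0.5 msec limit.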

In addition to the isolated tone pairs, we also presented to a few neurons a two-

voice sequence, excerpted from Bartok’s Mikrokosmos #32 (Bartok, 1940) to inves-

tigate the ability of IC neurons to follow the variations in dissonance of sequential

musical intervals. The sequence was synthesized using a piano timbre on a Korg

05R/W general MIDI synthesizer and presented at an overall level of 60 dB SPL. A

spectrogram of the stimulus is shown in Fig. 3-13.

Histology

In order to closely examine the placement of our electrodes within the IC, we histolog-

ically reconstructed the electrode tracts in three of the cats. The midbrain was fixed

and sliced into 80 µm slices. Every third slice was stained with calretinin while the

remaining slices were Nissl stained. The calretinin stained slices were used to identify

putative projections of the MSO (Adams, 1979; Adams, 1995). Results showed that,

while a majority of the electrode tracts ran through the ICC, a few of them were

close to or ran through dorsal cortex of the IC, the border of which is somewhat

indistinct. Examination of the data and neuron classifications that came from the

different electrode penetrations revealed no systematic differences, so we combined

data across all electrode penetrations.

3.2.2 Analysis

Neurons were classified as one of four types based on the shape of the PSTH of their

responses to tones at CF: Onset, Sustained, Pauser, and Other. This classification

is similar to, although slightly broader than, that of Nuding, Chen and Sinex (1998).

Figure 3-4 shows example PST histograms for the first three types. For most neu-

rons, responses were measured at three levels and the PSTHs were summed across

levels before making the classification. For neurons from which we had not measured

responses to tones at CF we used the PSTH of the response to the Unison complex-

tone pair. PSTHs were generated with 1 msec binwidths and were classified after

the following preparation: The beginning of the neuron’s response was defined as the

first point in the PSTH where the discharge rate in a 3 msec window exceeded the

rate of the previous 3 msec window by 3 standard deviations of the rate in the bins

of the previous window; the PSTH was divided into an onset section (the first 20

msec of the response), a sustained section (the remaining 280 msec of the response),

and a pause section (the first 20 msec of the sustained section). Neurons were clas-

sified as Onset if the average discharge rate in the onset section was greater than 10

times that in the sustained section. In addition, we excluded neurons from the Onset

category that contained large phasic responses but low overall rates in the sustained

section by requiring the onset rate to be greater than 5 times the largest value of a

non-overlapping moving average of the discharge rate in the sustained section using

an averaging window of 20 msec. Neurons were classified as Pauser if their response

did not meet the requirements for Onset and the rate in the pause section was less

than 0.2 times that in the onset section and in the remainder of the sustained section.

Neurons were classified as Sustained if they did not meet the criteria for Onset or

Pauser and the rate in the sustained section was greater than 10 spikes/sec. Finally,

neurons were classified as Other if they did not fall into one of the other categories or

if the latency of the response was greater than 30 msec. Neurons were not classified

if their responses contained fewer than 30 spikes.
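
The classification rules above can be sketched in code as follows. This is one possible reading, not the procedure actually used: the function names, the "Unclassified" label, the use of the population standard deviation across the three bins of the previous window, and the treatment of the response start index as the latency are all assumptions. The PSTH is assumed to hold spike counts in 1 msec bins, with bin 0 at stimulus onset.

```python
from statistics import mean, pstdev


def response_start(rate, win=3, n_sd=3.0):
    """Return the first bin index whose window exceeds the previous window.

    The criterion is mean(current) > mean(previous) + n_sd * SD(previous),
    with both windows win bins long. Returns None if it is never met.
    """
    for i in range(win, len(rate) - win + 1):
        previous = rate[i - win:i]
        current = rate[i:i + win]
        if mean(current) > mean(previous) + n_sd * pstdev(previous):
            return i
    return None


def classify_psth(counts, n_presentations):
    """Classify a 1-msec-bin PSTH (spike counts, bin 0 = stimulus onset)."""
    if sum(counts) < 30:
        return "Unclassified"
    rate = [c * 1000.0 / n_presentations for c in counts]  # spikes/sec
    start = response_start(rate)
    if start is None or start > 30:  # latency criterion for Other
        return "Other"
    onset = rate[start:start + 20]
    sustained = rate[start + 20:start + 300]
    pause, remainder = sustained[:20], sustained[20:]
    if not remainder:
        return "Other"  # response too short to section (assumption)
    onset_rate, sustained_rate = mean(onset), mean(sustained)
    # Largest of the non-overlapping 20-msec averages of the sustained section.
    block_max = max(mean(sustained[i:i + 20])
                    for i in range(0, len(sustained), 20))
    if onset_rate > 10 * sustained_rate and onset_rate > 5 * block_max:
        return "Onset"
    if mean(pause) < 0.2 * onset_rate and mean(pause) < 0.2 * mean(remainder):
        return "Pauser"
    if sustained_rate > 10.0:
        return "Sustained"
    return "Other"
```

The tests are applied in the order given in the text (Onset, then Pauser, then Sustained), so each later category is only reached when the earlier criteria fail.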

Synchronized rate MTFs were generated from responses to sinusoidally amplitude-

modulated (SAM) tones at CF. For each modulation frequency, the modulation of

the response was calculated as 2 times the product of the discharge rate and the

synchronization index. This method was used because the synchronization index alone

is a misleading measure of synchrony for low discharge rates (a single spike has a

Figure 3-4. PST histograms of responses to 300 msec tones at CF for three neurons (panels A-C; panel titles: Onset Neuron, Pauser Neuron, Sustained Neuron; abscissa: peri-stimulus time in msec; ordinate: number of spikes). Neuron CFs were, left to right: 5680, 3843 and 2703 Hz. Stimuli were presented 30 times. Histogram binwidth is 1 msec.

synchronization of 1.0). The gain of the responses was measured with respect to

the average discharge rate. The best modulation frequency (BMF) was calculated if

the MTF magnitude fell at least 10 dB on both sides of its maximum. The MTF

magnitude function was interpolated to 100 log-spaced points between 1 and 512 Hz

with a cubic spline. The bandwidth was measured at 3 dB down from the peak and

the BMF was taken as the mean of modulation frequencies spanning this bandwidth.
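
The synchronized-rate measure described above (2 times the product of the discharge rate and the synchronization index) can be sketched as follows. The function names are assumptions, as is the data layout: spike times in seconds pooled across presentations, with `duration_s` the total stimulus time over which they were collected. The synchronization index is computed here as the standard vector strength.

```python
import math


def synchronization_index(spike_times_s, fm_hz):
    """Vector strength of spike times relative to modulation frequency fm_hz."""
    n = len(spike_times_s)
    if n == 0:
        return 0.0
    c = sum(math.cos(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    s = sum(math.sin(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    return math.hypot(c, s) / n


def synchronized_rate(spike_times_s, fm_hz, duration_s):
    """Return 2 * (average discharge rate) * (synchronization index)."""
    rate = len(spike_times_s) / duration_s  # spikes/sec
    return 2.0 * rate * synchronization_index(spike_times_s, fm_hz)
```

With a single spike, `synchronization_index` is 1.0 regardless of when the spike occurs, whereas `synchronized_rate` stays small because the discharge rate is low; this is the motivation given in the text for weighting the index by the rate.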

Responses to the consonant/dissonant tone pair stimuli were grouped by musical

interval type and summed across stimulus presentations to generate PSTHs with 1

msec binwidths. The histograms were divided into sections in the same manner as

the CF-toneburst responses described above. For each musical interval type, the

average discharge rate was calculated for the sustained section of the response and,

as a measure of rate fluctuation, the temporal standard deviation was calculated after

smoothing the PSTH with a 3 msec rectangular window.
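
As a rough sketch of the rate-fluctuation measure just described, the following computes the temporal standard deviation of a PSTH after smoothing with a 3 msec rectangular window. The function names, the handling of the histogram edges (only full windows are kept), and the use of the population standard deviation are assumptions; the text does not specify them.

```python
from statistics import pstdev


def smooth_rect(values, width=3):
    """Moving average with a rectangular window; only full windows are kept."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def rate_fluctuation(counts, n_presentations, bin_ms=1.0, width=3):
    """Temporal standard deviation (spikes/sec) of the smoothed PSTH.

    counts holds spike counts per bin, summed over n_presentations.
    """
    rate = [c * 1000.0 / (bin_ms * n_presentations) for c in counts]
    return pstdev(smooth_rect(rate, width))
```

A perfectly flat PSTH gives a fluctuation of zero, while a PSTH that alternates between bursts and silence (as in the beating responses) gives a large value even when the average rate is the same.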

We assess the variability of most of our data and calculations by bootstrapping:

We randomly resample (with replacement) the data and recompute the statistic of

interest and then calculate the standard deviation of the statistic across resampled tri-

als. This technique has an advantage over traditional parametric statistical techniques

in that one does not have to assume the form of the underlying variability (Efron and

Tibshirani, 1993). Unless noted otherwise, resampling was performed across neurons.
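
The bootstrap procedure described above can be sketched as follows. This is a generic illustration rather than the analysis code used here: the function name, the fixed random seed, and the default of 2000 resamples (borrowed from the caption of Fig. 3-9) are assumptions. Each element of `data` would correspond to one neuron when resampling across neurons.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of statistic(data) by bootstrapping.

    The data are resampled with replacement n_resamples times, the
    statistic is recomputed each time, and the standard deviation of
    those values is returned.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return stdev(estimates)


# Example: standard error of the mean of per-neuron values (made-up numbers).
example_values = [4.0, 7.5, 3.2, 9.1, 5.5, 6.3]
example_se = bootstrap_se(example_values, mean)
```

Any statistic that maps a list to a number (median, interquartile range, and so on) can be passed in place of `mean`, which is the practical advantage of the method noted in the text.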

3.3 Results

From ten experiments, a total of 157 spike records (39 pure-tone pairs, 118 complex-

tone pairs) from 88 neurons were obtained that met our grading criteria for analysis.

37 neurons were classified as Onset, 11 as Sustained, 24 as Pauser, and 15 as Other. 30

of these neurons were classified from CF-toneburst responses, 57 from responses to the

Unison complex-tone stimulus, and 1 neuron was not classified because its response to

Unison had too few spikes. For most neurons (23 of 30), both methods of classification

gave the same result and our population analyses and conclusions based on PSTH

type remain the same regardless of the method of classification. CFs ranged from 255

to 21,700 Hz. This distribution of CFs does not reflect uniform sampling in the IC

as we targeted low-CF neurons that would respond to our relatively low-frequency

stimuli.

3.3.1 Responses to pure- and complex-tone pairs

Figure 3-5 shows responses of a Sustained IC neuron to the Minor 2nd, Tritone and

Perfect 5th pure- and complex-tone pairs. Low frequency beating can be seen in

the temporal envelope of the dissonant stimuli: pure-tone Minor 2nd (A), complex-

tone Minor 2nd (C), and complex-tone Tritone (G). In contrast, the envelope of the

more consonant stimuli is smoother and flatter: pure-tone Tritone (E), pure-tone

Perfect 5th (I), and complex-tone Perfect 5th (K). The beating of the dissonant

stimuli is clearly reflected in their neural responses whereas the consonant stimuli

evoke flatter and smaller responses. The beat rates of the fluctuating responses to

the pure-tone Minor 2nd (B) and complex-tone Tritone (H) match the beat rates of

their corresponding stimulus envelopes. However, the beat rate of the response to

the complex-tone Minor 2nd (D) is roughly three times the beat rate of its stimulus

envelope. The neural response is most likely dominated by the beat rate of the third

harmonics in the tones comprising the Minor 2nd due to the proximity of the neuron’s CF

Figure 3-5. Stimulus waveforms and corresponding responses from a single Sustained neuron for pure- (left) and complex-tone (right) Minor 2nd (top), Tritone (middle) and Perfect 5th (bottom) stimuli (panels A-L; stimulus panels plot stimulus amplitude and response panels plot number of spikes, both against peri-stimulus time in msec). Response panels are PST histograms with a 1 msec binwidth based on 30 stimulus presentations. CF = 1170 Hz.

(1170 Hz) to these harmonics. This effect of CF is demonstrated more thoroughly in

Section 3.3.3.

Figure 3-6 shows responses of three different neurons to the pure- and complex-

tone pairs. The Onset neuron (A-B) responded at the onset of all the tone pairs,

as well as during the sustained portion for the pure-tone Minor 2nd and many of

the complex-tone pairs. In addition, this neuron shows regular fluctuations in its

response to the dissonant Minor 2nds and complex-tone Tritone intervals. This can

be seen in the beating pattern of the responses to these stimuli. The Pauser neuron

(C-D) responded vigorously throughout the duration of all the stimuli except for a

brief pause after the onset. This neuron also shows fluctuations in its response to the

dissonant Minor 2nd intervals but not to the complex-tone Tritone. This is likely due

to the fact that its CF (440 Hz) is not near a pair of low-frequency beating partials

in the Tritone stimuli (see Figs. 3-10, 3-11, and 3-12). The Sustained neuron (E-F)

responded throughout the duration of the stimuli and shows a beating pattern in

response to both Minor 2nd and Tritone intervals.

Figure 3-7 shows the mean rate fluctuations across all neurons as well as the av-

erage discharge rate for all Onset neurons from which we recorded. Rate fluctuations

of responses were quantified by calculating the temporal standard deviation of their

PST histograms (see Section 3.2.2). For pure-tone pairs, the peak occurs at the

Minor 2nd for both measures. For complex-tone pairs, the Minor 2nd elicits the

greatest response as well, but the Tritone also shows a response that is significantly

greater than that from any of the remaining tone pairs. In comparing these mea-

sures to the psychoacoustic data on dissonance in Fig. 3-2 we see that the relative

ranking of musical intervals in both measures is the same as the rank order of disso-

nance ratings. There are some differences, however, in that the complex-tone Tritone

stands out more in the physiological data (B,D) than in the psychoacoustic data and

the psychoacoustic dissonance rating for pure-tone intervals decays with interval size

whereas the measure of neural rate fluctuations (A) is flatter. The latter difference

may be explained by a small CF effect in the pure-tone data: most neurons from

which we recorded had CFs higher than the pure-tone pair frequencies and tended

Figure 3-6. A-B: Responses of an Onset neuron to pure- and complex-tone pairs at specified intervals (Uni, m2, 4th, Tri, 5th, Oct). Neural activity is shown over the duration of every stimulus presentation (30 presentations per stimulus). The horizontal black line below each panel indicates stimulus on-time. CF = 1160 Hz. C-D: same for a Pauser neuron with CF = 440 Hz. E-F: same for a Sustained neuron with CF = 1170 Hz. Note the different time scales in the top panels.

Figure 3-7. Mean rate-fluctuations across all neurons for the pure- (A) and complex-tone (B) pairs. Rate fluctuations were calculated as the temporal standard deviation of the rate. Average discharge rate across all Onset neurons for the same stimuli (C and D). Error bars are estimated standard errors of the mean and include intra- and across-neuron variances (assumed to be orthogonal). Sample sizes shown in the panels: n = 36 and 118 for the pure- and complex-tone rate-fluctuation panels; n = 9 and 45 for the average-rate panels.

to respond more when the upper tone in the pair approached the CF. Nevertheless,

the qualitative correlate of dissonance rank order is consistent with the idea that the

dissonance of musical intervals is encoded in the rate fluctuations of IC neurons and

in the average discharge rates of IC Onset neurons. The data shown in Fig. 3-7 are

pooled across stimulus level, PSTH type, and CF. In the next few sections we examine

the effects of these attributes on the neural response.

Figure 3-8. Mean rate-fluctuations (sp/sec) in response to the tone-pair stimuli as a function of stimulus level. Responses are averaged across neurons for each group of stimulus levels (panels A-F; columns: pure-tone pairs and complex-tone pairs; rows: 15-40, 40-60, and 60-80 dB SPL). Sample sizes shown in the panels: n = 2, 15, 28, 79, 6, and 24.

3.3.2 Effect of level and PSTH type

Figure 3-8 shows the mean rate fluctuations across all neurons grouped by stimulus

level. In general, the overall rate fluctuations (in spikes/sec) decrease slightly with

increased level, but the relative differences in fluctuations across musical interval

remain similar for all levels. Therefore, as a code for the dissonance of musical

intervals, IC neural rate fluctuations appear to be robust across stimulus level. The

relative average discharge rates of our population of Onset neurons also appeared

relatively stable over level (not shown), although we had fewer neurons and stimulus

levels to evaluate.

Figure 3-9. Bootstrap estimates of the median and interquartile range (25th to 75th percentile) of neural responses for each PSTH type (Onset, Sustained, Pauser) and tone-pair stimulus. The format is similar to Fig. 3-7 (A-B: rate fluctuations; C-D: discharge rate; both in sp/sec on logarithmic axes). For each group of neurons the inter- and intra-neuron variability of the statistic are accounted for by randomly sampling the population as well as randomly sampling the responses of each neuron to individual stimulus presentations. Estimates are based on 2000 randomly sampled trials. The numbers of neurons in each category are: N_On,Pr = 11, N_On,Cx = 45, N_Sus,Pr = 8, N_Sus,Cx = 17, N_Psr,Pr = 11, N_Psr,Cx = 36, where “Pr” is for pure-tone pairs, “Cx” is for complex-tone pairs, “On” is for Onset, “Sus” is for Sustained, and “Psr” is for Pauser.

The range of responses to the tone-pair stimuli for neurons of different PSTH types

is shown in Fig. 3-9. The figure is similar in format to Fig. 3-7 but shows bootstrap

estimates of the median and interquartile range (25th to 75th percentile) of population

neural responses for each PSTH type and stimulus. The response data were resampled

within and across neurons so that the range accounts for both inter- and intra-neuron

variability. All panels show that Onset neurons tend to be less responsive overall but

they are more sensitive than Sustained or Pauser neurons to the relative dissonance

of the stimuli. This is especially true for the complex-tone pairs (B, D). It can

also be seen from the figure that Pauser and Sustained neurons respond similarly to

complex-tone pairs whereas Sustained neurons show greater overall average rates and

rate fluctuations in response to pure-tone pairs.

It is also clear, from Fig. 3-9C-D, that only Onset neurons could code for dis-

sonance based on average discharge rate alone; Sustained and Pauser neurons show

only small variation in discharge rate across stimulus type. In addition, it can be

seen from B that a particularly sensitive code for dissonance could be generated by

taking into account the relative rate fluctuations of Onset and Sustained or Pauser

neurons: for highly consonant intervals (unison and octave), rate fluctuations from

Onset neurons are much smaller than those from Sustained or Pauser neurons; for

highly dissonant intervals (Minor 2nd), Onset rate fluctuations are nearly equal to

those of Sustained or Pauser neurons; and for mildly dissonant intervals (Tritone,

Perfect 4th), the differences in rate fluctuations are in between those extremes. This

code would have the advantage of not being dependent on absolute measures of the

responses.

3.3.3 Dependence on CF

The beat strength and rate of a neuron’s response to dissonant complex-tone pairs

depends on its CF. Figure 3-10 shows line spectra for the complex-tone Minor 2nd and

Tritone stimuli and indicates, for each, which pairs of partials interact to give beat

frequencies in the roughness range. For neurons whose CFs are close to these pairs

of partials, the response is expected to reflect the partials’ beat frequency. This is

illustrated in Figs. 3-11 and 3-12. Figure 3-11A and C show the response of a neuron

Figure 3-10. Line spectra of the complex-tone Minor 2nd (A) and Tritone (B) stimuli (abscissa: frequency in Hz). Arrows mark the partials that give rise to the low beat frequencies for each stimulus. Marked ∆f (Hz): 26, 52, 78 (Minor 2nd); 182, 76, 106 (Tritone).

Figure 3-11. Peri-stimulus time histograms of the responses of two neurons to the Minor 2nd (A-B) and Tritone (C-D) complex-tone pairs (ordinate: discharge rate in spikes/sec). The CF of each neuron is indicated above the panels (440 Hz, left; 1335 Hz, right). The beat frequency and period (indicated by a horizontal black bar) of the stimulus partial pair closest to the neuron’s CF are shown in each plot (A: 26 Hz; B: 78 Hz; C: 182 Hz; D: 76 Hz).

with a CF of 440 Hz to the Minor 2nd and Tritone stimuli. Horizontal lines in the

upper left corners indicate the beat period of the partial-pair closest to CF. In A, the

beat frequency is 26 Hz and the neural response is phase locked to this frequency.

In C, the beat frequency of the Tritone’s first two partials is 182 Hz but there is

no clear representation of this frequency in the response. An envelope fluctuation

frequency of 182 Hz is too high to be well represented in most IC neural responses

(see Fig. 3-3). The responses of a neuron with a higher CF (1335 Hz) are shown in B

and D. In B, the response to the Minor 2nd shows a clear representation of the beat

frequency of the 3rd harmonics from each tone (78 Hz) but also shows the beat rate

of the fundamental frequencies (26 Hz, see A). In D, the response to the Tritone is

dominated by the 75 Hz beat frequency of the partials marked in Fig. 3-10B.
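
The beat frequencies discussed above follow directly from the stimulus spectra, as the following sketch illustrates. It assumes a lower tone at 440 Hz (inferred from the 26 Hz and 182 Hz fundamental beat frequencies quoted in the text), six equal-amplitude harmonics per tone, and equal-temperament tuning; the function names and the 200 Hz upper limit are arbitrary choices for illustration and are not a definition of the roughness range.

```python
def partials(f0_hz, n_harmonics=6):
    """Frequencies of the first n_harmonics harmonics of f0_hz."""
    return [f0_hz * k for k in range(1, n_harmonics + 1)]


def equal_tempered(root_hz, semitones):
    """Frequency a given number of equal-tempered semitones above root_hz."""
    return root_hz * 2.0 ** (semitones / 12.0)


def beating_pairs(lower_f0, upper_f0, max_beat_hz=200.0, n_harmonics=6):
    """List (lower partial, upper partial, beat frequency) below max_beat_hz."""
    pairs = []
    for a in partials(lower_f0, n_harmonics):
        for b in partials(upper_f0, n_harmonics):
            beat = abs(a - b)
            if 0.0 < beat <= max_beat_hz:
                pairs.append((a, b, beat))
    return sorted(pairs)


minor_second = beating_pairs(440.0, equal_tempered(440.0, 1))
tritone = beating_pairs(440.0, equal_tempered(440.0, 6))
```

Under these assumptions the lowest three partial pairs of the Minor 2nd beat at about 26, 52 and 78 Hz, and the Tritone pairs near 440, 1320 and 1760 Hz beat at about 182, 75 and 107 Hz, close to the values marked in Fig. 3-10. A neuron with CF near 1320 Hz would therefore be expected to see the 78 Hz (Minor 2nd) and roughly 75 Hz (Tritone) beats.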

The effect of CF on the response strength of the population of IC neurons from

which we recorded is shown in Fig. 3-12. It shows normalized rate fluctuations in

response to both the complex-tone Minor 2nd (A) and Tritone (B) as a function of

CF. In response to the Minor 2nd, neurons with low CFs (near the proximal partials

of the stimulus) show greater fluctuations than those with high CFs. In response

to the Tritone, neurons show little rate fluctuation at low CFs, but more at

mid-frequency CFs where there are two pairs of partials that beat at frequencies in the

roughness range.

3.3.4 Responses to a musical excerpt

In order to examine the ability of IC neurons to follow temporal changes in (sen-

sory) dissonance within a musical passage, we generated a stimulus from an excerpt

of Bartok’s Mikrokosmos #32 (1940). The music notation, a spectrogram of the

stimulus and the responses of two IC neurons are all shown in Fig. 3-13. The most

consonant intervals in the excerpt are the Unison and Major 3rd, while the most dis-

sonant are the Major 2nd, Major 7th, and Tritone intervals. A piano tone was used

for the stimulus and the spectrogram shows the decaying energy over the duration

of each note, especially in the higher frequency harmonics. The neural recordings

show beating in response to some dissonant musical intervals, especially those which

Figure 3-12. Normalized rate fluctuations of Sustained and Pauser neurons in response to the Minor 2nd (A) and Tritone (B) complex-tone pairs plotted as a function of CF (abscissa: characteristic frequency, 0 to 4000 Hz; ordinate: normalized rate fluctuations, logarithmic scale). Rate fluctuations of each neuron are normalized by its mean rate fluctuations in response to the more consonant tone pairs (Unison, Perfect 4th and 5th, Octave). The dark line through the data is a moving average.

have partial pairs near their CF (see histogram): The top response, from a neuron

with CF = 2700 Hz, shows beating in response to the Tritone (2nd tone-pair) and

both Minor 6ths (3rd and 7th tone-pairs) while the bottom response, from a neuron

with CF = 350 Hz, shows beating in response to both Major 2nds (1st and 5th tone-

pairs) and the Major 7th (8th tone-pair). Both neurons show little response to the

consonant Unison interval. This result is consistent with our previous findings that

IC neurons show beating in response to dissonant tone-pairs and that responses of

individual neurons are dependent on their CF. It also shows that an IC neuron can

follow temporal changes in dissonance in a realistic musical setting. Lastly, this re-

sult shows that our general results seem to apply to more musically realistic timbres.

Several other neurons showed similar responses; however, we did not record across a

broad enough range of CFs to pool our data for this stimulus.

3.3.5 Additional observations

Our motivation for looking at IC neural responses for correlates of roughness came

from the observation that IC neural MTFs resemble the psychoacoustic roughness

function (Fastl, 1990). Because we saw differences in responses to dissonant stimuli for

neurons with different PSTH types (Fig. 3-9), we decided to also look for differences

in their MTFs.

We measured MTFs in 60 of the 88 neurons from which we obtained tone-pair

data. Figure 3-14 shows the MTF magnitude and phase for an Onset, a Sustained,

and a Pauser neuron. The data are representative and show distinct differences in

the magnitude of the MTFs from these different types of neurons: The MTF from

the Onset neuron is more sharply tuned, centered at a slightly lower frequency, and

provides more gain at the BMF than the MTFs from the Sustained and Pauser

neurons; the MTF of the Pauser neuron shows a dip in magnitude between the

lowest measured frequency and the BMF. For completeness, the MTF phase is also

plotted in Fig. 3-14. We did not find significant correlations between PSTH type

and MTF phase characteristics in this study. However, this result was most likely

affected by the fact that we measured responses at relatively widely spaced modulation

Figure 3-13. Top shows the musical notation for measures 12-13 of Bartok’s Mikrokosmos #32 (In Dorian Mode). Middle shows a spectrogram (frequency in Hz vs. time in sec) of a recording of the excerpt using a piano sound from a Korg 05R/W synthesizer. Bottom panels show responses (discharge rate in sp/sec) to the excerpt from two IC neurons. The neurons’ CFs (350 Hz, bottom and 2700 Hz, top) are marked on the right side of the spectrogram. The successive intervals are labeled M2, Tri, m6, Uni, M2, M6, m6, M7, M3.

frequencies and consequently had difficulty unwrapping potentially ambiguous phase

measurements.

Figure 3-14. MTF magnitude (A; gain in dB) and phase (B; in cycles) for an Onset, a Sustained and a Pauser neuron, plotted against modulation frequency (Hz, logarithmic scale). Data points have been straight-line connected for clarity. Neuron CFs were: Onset: 5059 Hz; Sustained: 935 Hz; Pauser: 1526 Hz.

In Fig. 3-15A, the MTF magnitude characteristics can be seen in median data

across all neurons for each PSTH type. Also shown are standard error ranges of the

magnitude. In addition to the characteristics described above, it can be seen that

Pauser neurons tend to have a higher gain at low modulation frequencies than the

other neuron types. The magnitudes of MTFs from neurons classified as Other (not

shown) varied in shape from neuron to neuron and were not as easily characterized.

Figure 3-15B shows MTF bandwidth plotted against BMF for individual neurons.

Data are shown only for those MTFs that met our “bandpass” criteria. The

characteristics of the median data (A) are evident in the population distribution for each

neuron type. An analysis of variance followed by a Tukey HSD multiple comparison

test (Hochberg and Tamhane, 1987) showed that the bandwidths of Onset MTFs

are significantly smaller than those from Sustained neurons (α < 0.05). All other

differences in bandwidth or BMF across groups were not significant, although there

is a tendency for Onset neurons to have lower BMFs than other neuron types.

Figure 3-15B also shows that MTF bandwidth and BMF are highly correlated,

indicating that measured MTFs have an approximately constant “Q”.
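
To make the constant-“Q” statement concrete: if Q is taken as the ratio of BMF to bandwidth, then a constant Q means that bandwidth is proportional to BMF, which appears as a straight line in log-log coordinates. The sketch below illustrates this with hypothetical values only (they are not data from this study); the function names and the definition Q = BMF / bandwidth are assumptions made for the example.

```python
import math


def q_factor(bmf_hz, bandwidth_hz):
    """Quality factor, taken here as BMF divided by bandwidth."""
    return bmf_hz / bandwidth_hz


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(bmfs_hz, bandwidths_hz):
    """Correlation between log10(BMF) and log10(bandwidth)."""
    return pearson_r([math.log10(f) for f in bmfs_hz],
                     [math.log10(b) for b in bandwidths_hz])


# Hypothetical filters with exactly constant Q = 1.5 (illustration only).
hypothetical_bmfs = [8.0, 16.0, 32.0, 64.0, 128.0]
hypothetical_bandwidths = [f / 1.5 for f in hypothetical_bmfs]
r_constant_q = log_log_correlation(hypothetical_bmfs, hypothetical_bandwidths)
```

For exactly constant Q the log-log correlation is 1; scatter in Q across neurons lowers it, so a high but imperfect correlation is consistent with an approximately constant Q.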

3.4 Discussion

We have shown that IC neurons beat in response to dissonant tone-pairs and that

the frequency of beating is dominated by the partial-pair closest to the neuron’s

CF. Averaged across all CFs, the rate fluctuations of IC neurons reflect perceptual

dissonance ratings of pure- and complex-tone pairs. This code for dissonance is robust

across stimulus level and is more sensitive in the responses of Onset neurons than in

Sustained or Pauser neurons. In addition, Onset neurons reflect the dissonance of

tone-pair stimuli in their average discharge rate. We have also shown that IC neurons

are capable of following changes in (sensory) dissonance within a musical excerpt.

Finally, we have shown that MTFs from Onset neurons tend to be more sharply

tuned, centered at slightly lower frequencies and provide a higher gain at the BMF

than those from Sustained or Pauser neurons.

3.4.1 Neurophysiology

This study follows the work of Tramo et al. (1992, 2000) who found correlates of

dissonance in AN discharges. The main difference between the correlates described

here and those in the AN is that, while correlates are seen directly in IC responses,

AN responses require additional processing. This is due to the fact that AN fibers

code both the temporal envelope and fine time structure in their discharge patterns,

while IC neurons largely respond to the envelope only. Because dissonance appears

to be related to properties of the temporal envelope, a correlate can be seen in AN

responses only after the fine time structure has been removed by either bandpass

filtering or some other means such as summing over CF. Such a process must occur

at some point above the AN and at or below the level of the IC. MTFs have been

measured in cochlear nucleus (CN) neurons and, on average, are broader and centered

at higher modulation frequencies than those from the IC (Frisina, Smith, and

Chamberlain, 1990; Rhode and Greenberg, 1994; Delgutte, Hammond, and Cariani,

1998). MTFs have also been measured in the lateral superior olive (LSO) and are also

relatively broad and centered higher than IC MTFs (Joris and Yin, 1998). Therefore,

because MTFs of most IC inputs are broader than those in the IC, it is likely that the

additional filtering occurs within the IC itself. Intracellular recordings from the IC

show multiple phases of excitation and inhibition that could implement a bandpass

filter (Covey, Kauer, and Casseday, 1996; Kuwada et al., 1997).

Figure 3-15. A: Median MTF magnitude (gain in dB vs. modulation frequency in Hz) for Onset (n = 22), Sustained (n = 8), and Pauser (n = 24) neurons. Symbols denote median data, dashed lines connect median data ± estimated standard errors. Standard errors were obtained through bootstrapping across each group of neurons. B: MTF bandwidth vs. BMF for individual neurons. Results of an ANOVA followed by a Tukey HSD multiple comparison test show Onset and Sustained neurons to have significantly different MTF bandwidths (α < 0.05). Other groups across bandwidth or BMF were not significantly different. The two factors were correlated in the log10 of the combined population data with r = 0.86.

Another difference between the studies in the AN and this one is our finding

of a rate code for dissonance. Although not all IC neurons showed the effect, the

population average discharge rate of IC Onset neurons reflected the dissonance of our

tone-pair stimuli. This type of rate code has not been extensively seen in neurons

below the level of the IC, although it is possible that some CN neurons could show

similar properties. In addition, LSO neurons have been reported to show changes in

spike rate with modulation frequency (Krishna and Semple, 2000) and not much is

known about the envelope sensitivity of MSO neurons, another major input to the

IC. Nevertheless, it is possible that there exists a transformation from a temporal to

a rate code within the IC.

We chose to classify neurons using a PSTH classification method similar to those

used previously to classify neurons in the CN (Bourk, 1976) and in the IC (Rees and

Møller, 1983; Rees et al., 1997; Nuding, Chen, and Sinex, 1999; Krishna and Semple,

2000). However, the PSTH types have not been linked to any neural morphology.

Our finding that Onset neurons tend to have smaller MTF bandwidths than other

neuron types has not been reported before and it may be linked to their high overall

sensitivity in response to our tone-pair stimuli. Generally, in a linear system, a

sharply tuned filter will show greater sensitivity to changes along its parameter of

tuning (e.g., modulation frequency). Previous studies, however, have shown that while

a linear systems approach to modeling neural responses in the periphery (AN and

CN) can be quite successful, it does not always work in the IC (Delgutte, Hammond,

and Cariani, 1998; Delgutte, Hammond, and Cariani, 2000). In addition, the relationship

we found between PSTH type and MTF may not hold for all neurons in the IC, as

we have admittedly sampled from low-CF neurons in the dorsolateral portion of the IC,

where the inputs from lower nuclei are different from those to the ventromedial portion

of the IC (Fullerton, 1993).

Krishna and Semple (2000) have also found a correlation between the

PSTH type and the MTF of IC neurons in the Mongolian gerbil. They found that

rate MTFs of non-Onset neurons tended to contain regions of suppression in which

rate would diminish as the stimulus level increased. Onset neurons tended not to

contain such regions. In addition, they found that, generally for all neuron types,

as stimulus level went up temporal MTFs changed from lowpass in shape to more

bandpass. It is difficult to say whether or not their finding is related to the differences

we found between Onset and non-Onset neurons, but it does suggest that the two

types of neurons process the temporal envelope of sounds differently.

Correlates of roughness have also been seen in multi-unit and current source den-

sity recordings from cortex (A1) in awake monkeys (Fishman et al., 2000). This

finding is consistent with the idea that temporal envelope fluctuations encoded in

IC neural discharges are preserved and passed on to cortical structures. However,

multi-unit recordings are somewhat ambiguous as to which neural structures gener-

ate the measured responses, which leaves open the possibility that the measurements

by Fishman et al. were made from inputs to the cortex. This is especially plausi-

ble considering that their strongest responses came from the thalamorecipient zone

(lower lamina III) and that studies of single unit MTFs in cortex show responses to

be restricted to lower frequencies.

3.4.2 Psychophysics and perception

We have shown that dissonant sounds produce larger rate fluctuations in IC neurons

and higher average discharge rates in IC Onset neurons than do consonant sounds.

However, we have not attempted to quantitatively correlate our physiological results

with psychoacoustic data because of the lack of stimulus specificity in the current

psychoacoustic data. Early psychophysical experiments on consonance and dissonance

did not have calibrated control over stimulus spectra or level (Malmberg, 1917;

Guernsey, 1928). Since then, however, a few studies have examined the roughness

and dissonance of pure-tone pairs using well controlled stimuli (Plomp and Levelt,

1965; Plomp and Steeneken, 1968; Terhardt, 1968b; Terhardt, 1974b) but few data

exist on the dissonance of complex-tone pairs. Kameoka and Kuriyagawa (1969a,b)

have demonstrated that both stimulus level and spectral shape affect perceived dissonance

of tone pairs, but even they did not have precise control over level (or spectrum)

for all listeners as they presented their stimuli over speakers in an auditorium. It is

clear that a thorough psychophysical measure of consonance and dissonance based on

calibrated stimuli is required for a more precise comparison of perceptual and neuro-

physiological responses. Such a study might include a complete pairwise comparison

of all intervals including the relative dissonance of pure- and complex-tone pairs.

Despite the incompleteness of psychophysical data, our results and qualitative cor-

relations still provide the basis to suggest that dissonance is encoded in the temporal

patterns of IC neural responses. For our stimuli, we chose the extreme consonant and

dissonant musical intervals. All psychoacoustic studies using complex tones, regard-

less of stimulus spectra, showed the Minor 2nd to be judged most dissonant and the

Tritone to be more dissonant than the Perfect 4th and 5th. Unison, Octave, Perfect 5th

and 4th intervals were always deemed consonant. This rank order was never violated

in psychoacoustic studies and appears in our neural response measures.

The perception of sensory dissonance appears to be innate in humans and it is

not exclusive to humans. Schellenberg and Trainor (1996) found that infants, similar

to adults, show better discrimination for dissonant harmonic intervals than for conso-

nant harmonic intervals. Hulse, Bernard and Braaten (1995) showed that European

starlings could learn to discriminate musical chords and then transfer the discrim-

ination of chords to those with different fundamental frequencies. Here, although

we have found correlates of human perception in cat neural responses, we are not

suggesting that cats hear dissonance in the same way as humans. However, ecolog-

ically, it may be beneficial for cats to be able to discriminate and attend to rough

or dissonant sounds, e.g., for the purpose of catching prey or detecting predators.

Consequently, it is possible that there are general preadaptations for certain aspects

of music processing in the mammalian auditory system.

Historically, the terms consonance and dissonance have been used, with reference

to music, in a variety of senses (Tenney, 1988; Sethares, 1999). The terms have been

used to describe musical sounds based on function, context, as well as sensory attributes

and it is important to distinguish between these senses. Functional definitions are

those used by composers and theorists and come from rules that are either implicitly

defined by a music listening culture or explicitly defined by music theorists. Functional

definitions of dissonance have changed over time and are influenced by culture and

listening experience. Contextual senses of dissonance depend on the surrounding

sounds. In context, the dissonance of a particular musical sound is affected by the

general level of dissonance within a piece of music, by the implied harmony of a

particular passage or piece, as well as by the effects of auditory streaming (Wright

and Bregman, 1987). Sensory dissonance, however, refers to the quality of sounds

in isolation and is often equated with the roughness (von Helmholtz, 1863) or (lack

of) fusion (Stumpf, 1890) of the sound. These various meanings of consonance and

dissonance are not mutually exclusive but should not be confused with each other

when one refers to the “dissonance” of a particular sound.

While sensory dissonance is thought to be based on (and often equated with)

roughness, there are alternative theories on its basis. Stumpf’s (1890) fusion theory

states that sounds are consonant because their individual components fuse together

to form a single perceptual entity, more so than dissonant sounds. One problem with

this idea, as suggested by the work in auditory scene analysis, is that it seems fusion

is actually required for the perception of dissonance as well (Wright and Bregman,

1987; Bregman, 1990). The perceptual grouping and perceived timbre of a simulta-

neous set of tones can be affected by the (a)synchronization of their onsets as well

as sequential streaming cues, that is, how well a particular component fits with a

stream of preceding tones (Bregman and Pinker, 1978). Historically, dissonance was

introduced gradually into Western music through the use of simple compositional devices

to soften the perceived dissonance of two simultaneously sounding tones (Jeppesen,

1927). Dissonant notes could not begin together and could only be approached and

left by half steps (semitones). These devices tend to draw listeners’ attention away

from the fusion of two simultaneous dissonant tones and to their respective horizontal

sequential streams. Thus, it seems that although fusion may be related to perceived

dissonance, it is likely a separate percept and not its basis.

Pitch fusion or the perception of a fundamental bass frequency are also explana-

tions for the basis of consonance (Tramo et al., 2001; Rameau, 1722). For consonant,

simple frequency-ratio intervals, many partials from both tones in the interval fall on

harmonics of a common fundamental bass frequency. The hypothesis is that consonance

is based on the salience of this fundamental pitch: the more salient it is, the more

consonant the interval. In the companion paper, we look for correlates of the fundamental

bass frequency in the temporal discharge patterns of IC neurons (McKinney,

Tramo, and Delgutte, 2001b).

Another theory on the basis of dissonance is the long wave hypothesis from Boom-

sliter and Creel (1961). They postulate that consonance is based on the length of

the overall period of a stimulus and model neural responses using an autocorrelation

network similar to Licklider’s (1956). Consonant tone pairs, whose fundamental frequencies

are related by simple ratios (e.g., Perfect 5th, 3:2), have shorter periods than

dissonant tone pairs with their more complex ratios (e.g., Minor 2nd, 16:15). In response

to consonant stimuli, such an autocorrelation mechanism would produce larger

responses at shorter lags than it would for dissonant stimuli. While this mechanism

is plausible for harmonic stimuli, it fails to explain the ability of listeners to perceive

consonance and dissonance of pairs of inharmonic tones, whose periods can both be

long. A strong variation in consonance, dissonance, and progressing harmony can be

heard for a sequence of tone pairs that have their harmonics stretched to non-integer

ratios if the scale on which they are based is stretched an equal amount (Houtsma,

Rossing, and Wagenaars, 1987; Sethares, 1999).

Roughness, on the other hand, is closely linked to musical dissonance in a variety

of contexts. It can account for the variation of sensory dissonance in inharmonic

stimuli in the same way as it does for harmonic stimuli (Sethares, 1999) and it has

been correlated with musical tension in non-tonal music (Pressnitzer et al., 2000).

We have shown correlates of roughness in responses of IC neurons to monaural and

diotic stimuli, but it is important to consider the responses of dichotically presented

musical intervals. While many IC neurons preferentially respond to specific interau-

ral phase differences (Yin and Kuwada, 1983) and may well respond to dichotically

presented intervals, it is generally thought that the percept of dichotic roughness does

not exist (Zwicker and Fastl, 1999). An investigation into the response of IC neurons

to dichotically presented intervals is presented in the companion paper (McKinney,

Tramo, and Delgutte, 2001b).

3.5 Conclusion

Our results have revealed neural correlates of sensory dissonance in the discharge rate

fluctuations of all IC neurons and in the average discharge rates of Onset neurons.

More generally, our results illustrate the complexity and specificity of neural pro-

cessing in the auditory periphery and brainstem; percepts generally considered to be

“high order”, such as the dissonance of musical intervals, have direct neural correlates

in midbrain nuclei. Our results also suggest that neurons in the IC are specifically

important for encoding the temporal envelope of sounds.

Chapter 4

Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience

4.1 Introduction

We have shown that discharge rate fluctuations of inferior colliculus (IC) neurons

correlate with the dissonance of musical intervals (McKinney, Tramo, and Delgutte,

2001a). Our investigation was motivated by 1) the idea that sensory dissonance is

based largely on the psychoacoustic roughness of a sound (Plomp and Levelt, 1965;

von Helmholtz, 1863), and 2) the fact that modulation transfer functions (MTFs) of

many IC neurons resemble the psychoacoustic roughness function (Delgutte, Ham-

mond, and Cariani, 1998; Fastl, 1990; Rees and Møller, 1983). A possible difficulty

with our finding is that, while many IC neurons exhibit preferential interaural phase

differences (IPDs) (Yin and Kuwada, 1983) and thus may beat in response to dichotically-presented

tone pairs, it is generally thought that dichotically-presented tone

pairs do not elicit a roughness sensation (e.g., Roederer, 1979). Here, we look at IC

neural responses to dichotically-presented tone pairs and compare them with responses

to diotic stimuli.

An alternative theory for the basis of consonance suggests that consonant har-

monic intervals have a more perceptually salient common fundamental bass frequency

(FFB) than do dissonant intervals. This idea stems from Rameau’s (1722) theory of

“basse fondamentale” and complements the notion that sensory consonance is just

a lack of roughness (sensory dissonance) (Tramo et al., 2001). Figure 4-1 illustrates

the concept of the fundamental bass for three consonant intervals and the dissonant

Tritone interval, all based at 440 Hz. Line spectra of the tone-pair intervals are

shown and each tone in a pair contains six iso-level harmonics. Gray bars indicate

overlapping harmonics from the lower (black bars) and upper (white bars) tones. For

consonant intervals, the fundamental frequencies of the tones tend to be related by

simple ratios and thus all harmonics from both tones are harmonically related to a

common (missing) fundamental bass. In the figure, vertical dashed lines mark har-

monics of the fundamental bass (leftmost dashed line). For the Tritone, the exact

fundamental bass frequency is 13.75 Hz (greatest common divisor of 440 and

618.75 Hz) but there is a near miss at 88 Hz, which is shown in the figure.
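The fundamental bass values quoted above can be checked with a short calculation. The following is a minimal sketch, assuming Just frequency ratios on a 440-Hz lower tone; the function name `fundamental_bass` and the dictionary layout are illustrative choices, not part of the original methods. For a ratio p/q in lowest terms, the largest frequency of which both fundamentals are integer multiples is the lower fundamental divided by q.

```python
from fractions import Fraction

def fundamental_bass(base_hz, ratio):
    """Largest frequency of which both tone fundamentals are integer multiples.

    Lower tone at base_hz, upper tone at base_hz * p/q (p/q in lowest terms):
    the common fundamental is base_hz / q.
    """
    ratio = Fraction(ratio)  # Fraction reduces p/q to lowest terms
    return Fraction(base_hz) / ratio.denominator

# Just-intonation ratios as given in Fig. 4-1
intervals = {
    "Unison": Fraction(1, 1),
    "Perfect 4th": Fraction(4, 3),
    "Perfect 5th": Fraction(3, 2),
    "Tritone": Fraction(45, 32),
}

bass = {name: float(fundamental_bass(440, r)) for name, r in intervals.items()}
for name, f in bass.items():
    print(f"{name}: {f:.2f} Hz")
```

Under these assumptions the Tritone gives 13.75 Hz, and the near miss arises because 440/5 = 88 Hz while 618.75/7 is approximately 88.4 Hz.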

In this study, we also investigate IC neural responses to consonant complex-tone

pairs for a representation of the fundamental bass. We examine autocorrelation (AC)

histograms of the responses because they have been shown, for auditory-nerve (AN)

responses, to exhibit correlates of pitch over a variety of stimulus paradigms (McKin-

ney and Delgutte, 1999; Cariani and Delgutte, 1996a; Cariani and Delgutte, 1996b;

Rhode, 1995). We also examine responses to dichotically-presented intervals for

a representation of the fundamental bass since, perceptually, dichotic presentation

of harmonic complexes can also elicit a pitch sensation at the (missing) fundamen-

tal (Houtsma and Goldstein, 1972).

[Figure 4-1: line spectra, one panel per interval, with the fundamental bass frequency indicated above each plot: Unison (1/1), 440 Hz; Perfect 4th (4/3), 147 Hz; Perfect 5th (3/2), 220 Hz; Tritone (45/32), 88 Hz (near miss). Horizontal axis: Frequency (Hz), 0 to 4400.]

Figure 4-1. Line-spectra of three consonant and one dissonant (Tritone) complex-tone harmonic intervals based at 440 Hz. The ratios of the fundamental frequencies of each tone in the interval are given under the interval name. Each tone in a complex tone pair contains six iso-level harmonics. Gray bars indicate overlapping harmonics from the lower (black bars) and upper (white bars) tones. For consonant Unison, Perfect 4th and 5th intervals, all harmonics from both tones fall on integer multiples (vertical dashed lines) of the interval’s fundamental bass frequency (indicated above each plot). For the dissonant Tritone interval, the exact fundamental bass frequency is 13.75 Hz, but there is a near miss at 88 Hz.

4.2 Method

We recorded from single IC neurons in Dial-anesthetized cats in response to diotic

and dichotic presentation of complex tone pair stimuli. The methods were identical

to those in the companion paper (McKinney, Tramo, and Delgutte, 2001a) except

where noted below.

4.2.1 Experiment

For each neuron isolated, we measured the threshold tuning curve, response to tones

at characteristic frequency (CF), and the modulation transfer function (MTF) for a

pure-tone carrier at CF as described previously. We used the responses to tones at

CF to classify units as Onset, Sustained or Pauser, based on the shape of their peri-stimulus

time histograms (PSTHs). In addition, in order to assess the neuron’s sensitivity to

interaural phase differences (IPD), we measured the response to a 2 Hz binaural beat

stimulus centered at the CF (i.e., a tone 1 Hz below CF in the left ear and another

1 Hz above CF in the right ear). Next, responses to diotic and dichotic complex tone

pairs were measured at the following musical intervals: Unison, Minor 2nd, Perfect

4th, Tritone, Perfect 5th, Octave (see Fig. 1 in McKinney et al., 2001a). Each tone

in the pair was composed of 6 iso-level harmonics in cosine phase with a duration

of 500 msec. The tones were windowed (raised cosine) to give 5 msec rise and fall

times. Harmonic levels were typically 60 dB SPL, but ranged from 40 to 80 dB SPL.

Each tone pair was presented 30 times diotically or monaurally (whichever gave the

strongest response) as well as dichotically (base tone in ipsilateral ear, upper tone in

contralateral ear). The base tone fundamental frequency for each interval was 440

Hz.
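The stimulus description above can be illustrated with a small synthesis routine. This is a sketch only: the 20-kHz sampling rate, the function names, and the unit harmonic amplitude are assumptions, and calibration to dB SPL is not modeled. The interval ratio is passed in as a parameter, so either Just or equal-tempered ratios can be used.

```python
import math

def complex_tone(f0, n_harmonics=6, dur=0.5, ramp=0.005, fs=20000):
    """Sum of equal-amplitude, cosine-phase harmonics with raised-cosine ramps."""
    n = int(round(dur * fs))
    n_ramp = int(round(ramp * fs))
    samples = []
    for i in range(n):
        t = i / fs
        x = sum(math.cos(2 * math.pi * k * f0 * t)
                for k in range(1, n_harmonics + 1))
        # Raised-cosine onset and offset ramps (5 msec each by default)
        if i < n_ramp:
            x *= 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            x *= 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        samples.append(x)
    return samples

def tone_pair(base_f0, ratio, dichotic, **kw):
    """Return (ipsilateral, contralateral) waveforms for one musical interval."""
    lower = complex_tone(base_f0, **kw)
    upper = complex_tone(base_f0 * ratio, **kw)
    if dichotic:
        return lower, upper  # base tone ipsilateral, upper tone contralateral
    mix = [a + b for a, b in zip(lower, upper)]
    return mix, mix          # diotic: the same mixture in both ears

ipsi, contra = tone_pair(440.0, 3 / 2, dichotic=True)
```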

4.2.2 Analysis

To determine sensitivity to IPD, synchronization indices were calculated from period

histograms locked to the beat frequency for the responses to binaural beat stimuli.

In order to avoid false classification due to artificially high synchronization indices of

weakly responding neurons we used the product of the average discharge rate and the

synchronization index, with a threshold of 6.5 spikes/sec, to characterize neurons as

either IPD-sensitive or IPD-insensitive.¹
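A possible form of this classification rule is sketched below. It computes the synchronization index as a vector strength directly from spike times rather than from a binned period histogram, which is an assumption made for brevity; the function names and the use of a greater-than-or-equal comparison at the 6.5 spikes/sec threshold are also assumptions.

```python
import math

def sync_index(spike_times, beat_hz):
    """Vector strength of spike times (in seconds) relative to the beat period."""
    if not spike_times:
        return 0.0
    c = sum(math.cos(2 * math.pi * beat_hz * t) for t in spike_times)
    s = sum(math.sin(2 * math.pi * beat_hz * t) for t in spike_times)
    return math.hypot(c, s) / len(spike_times)

def is_ipd_sensitive(spike_times, total_duration, beat_hz=2.0, threshold=6.5):
    """Classify using (average rate) x (synchronization index), in spikes/sec."""
    rate = len(spike_times) / total_duration
    return rate * sync_index(spike_times, beat_hz) >= threshold

# Hypothetical spike trains over 1 second of a 2-Hz binaural beat
locked = [0.5 * k + 0.001 * j for k in range(2) for j in range(10)]
flat = [i / 20 for i in range(20)]
print(is_ipd_sensitive(locked, 1.0), is_ipd_sensitive(flat, 1.0))
```

Weighting the index by rate means that a weakly responding neuron with a few chance-aligned spikes falls below threshold, which is the false classification the text aims to avoid.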

Responses to both monaural/diotic and dichotic presentation of the consonant/dissonant

tone pair stimuli were summed across stimulus presentations to generate PSTHs with

1 msec binwidths. The histograms were smoothed with a 3 msec window and then,

for each musical interval type, the temporal standard deviation was calculated over

the sustained portion (20-480 msec after the onset) as a measure of rate fluctuation.
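The rate-fluctuation measure described above might be computed along the following lines. This sketch assumes a rectangular 3-bin moving average for the 3 msec smoothing window, normalization of the PSTH to spikes/sec, and the sample (n - 1) form of the standard deviation; none of these details is specified in the text, and the function names are illustrative.

```python
import math

def psth(trials, dur=0.5, bin_s=0.001):
    """Rate (spikes/sec) per bin, summed over presentations and normalized."""
    n_bins = int(round(dur / bin_s))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            b = int(t / bin_s)
            if 0 <= b < n_bins:
                counts[b] += 1
    return [c / (len(trials) * bin_s) for c in counts]

def smooth(x, width=3):
    """Centered moving average; edge bins use whatever neighbors exist."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def rate_fluctuation(trials, start=0.020, stop=0.480, bin_s=0.001):
    """Temporal standard deviation of the smoothed PSTH over the sustained portion."""
    rate = smooth(psth(trials, bin_s=bin_s))
    seg = rate[int(round(start / bin_s)): int(round(stop / bin_s))]
    mean = sum(seg) / len(seg)
    return math.sqrt(sum((r - mean) ** 2 for r in seg) / (len(seg) - 1))

# Hypothetical responses: a steady discharge and one that beats at 25 Hz
steady = [[(b + 0.5) * 0.001 for b in range(500)]]
beating = [[(b + 0.5) * 0.001 for b in range(500) if (b // 20) % 2 == 0]]
print(rate_fluctuation(steady), rate_fluctuation(beating))
```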

Pitch analyses were performed on responses to diotic/monaural and dichotic stim-

uli. Autocorrelation, or all-order interspike interval (ISI) histograms were generated

from the sustained portion of the responses (15-485 msec after the onset). The repre-

sentation of a particular stimulus frequency in a histogram was quantified by calculat-

ing the Peak-to-Background ratio (P/B) where the “Peak” is the mean number of ISIs

at integer multiples of the period (1/frequency) of interest and the “Background” is

the mean number of ISIs/bin overall. Neurons’ P/B ratios were compared with their

frequency following abilities as measured by their MTF. The MTF statistic used was

the upper cutoff frequency (Fco), measured as the highest modulation frequency for

which the response showed significant (Rayleigh test, α < 0.01) synchrony (provided

that synchrony was also significant at the next lower modulation frequency in the

function).
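The all-order ISI histogram and the P/B ratio can be sketched as follows. The 0.5 msec bin width, the 40 msec maximum lag, and the choice of a single bin per multiple of the period as the "Peak" are assumptions for illustration; the function names are hypothetical.

```python
def all_order_isi_hist(trials, start=0.015, stop=0.485,
                       max_lag=0.040, bin_s=0.0005):
    """Histogram of intervals between each spike and every later spike in a trial."""
    n_bins = int(round(max_lag / bin_s))
    hist = [0] * n_bins
    for spikes in trials:
        s = sorted(t for t in spikes if start <= t <= stop)
        for i, t0 in enumerate(s):
            for t1 in s[i + 1:]:
                lag = t1 - t0
                if lag >= max_lag:
                    break  # spikes are sorted, so later lags are longer still
                b = int(lag / bin_s)
                if b < n_bins:
                    hist[b] += 1
    return hist

def peak_to_background(hist, freq_hz, bin_s=0.0005):
    """Mean count at integer multiples of the period over the mean count per bin."""
    period = 1.0 / freq_hz
    peak_bins = []
    k = 1
    while int(k * period / bin_s) < len(hist):
        peak_bins.append(int(k * period / bin_s))
        k += 1
    if not peak_bins:
        raise ValueError("period is longer than the histogram's maximum lag")
    peak = sum(hist[b] for b in peak_bins) / len(peak_bins)
    background = sum(hist) / len(hist)
    return peak / background

# Hypothetical spike train locked perfectly to a 220-Hz periodicity
locked_220 = [[k / 220 for k in range(4, 106)]]
pb_220 = peak_to_background(all_order_isi_hist(locked_220), 220.0)
print(pb_220)
```

With these settings a frequency below 25 Hz has no multiple of its period inside the 40 msec histogram, so its P/B ratio cannot be evaluated without a longer maximum lag.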

As in the companion paper (McKinney, Tramo, and Delgutte, 2001a), we assess

the variability of most of our data and calculations by bootstrapping: we randomly

resample (with replacement) the data, recompute the statistic of interest, and then

calculate the standard deviation of the statistic across resampled trials (Efron and

Tibshirani, 1993). Unless noted otherwise, resampling was performed across neurons.
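The bootstrap procedure can be summarized in a few lines. The number of resamples (1000), the fixed seed, and the example values below are assumptions for illustration and are not taken from the data.

```python
import random
import statistics

def bootstrap_sd(values, statistic, n_resamples=1000, seed=0):
    """Standard deviation of a statistic across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(resample))
    return statistics.stdev(stats)

# Hypothetical per-neuron rate fluctuations (spikes/sec), resampled across neurons
per_neuron = [8.0, 12.0, 10.0, 15.0, 9.0, 11.0, 14.0, 7.0]
print(bootstrap_sd(per_neuron, statistics.mean))
```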

¹ If the response to a binaural beat at CF was not available, we classified the IPD sensitivity based on the response to the dichotic (equal temperament) Perfect 5th stimulus, which produces a 1.5 Hz binaural beat at 1319 Hz and a 3 Hz binaural beat at 2638 Hz. In equal temperament tuning, the Perfect 5th has a frequency ratio of 2.9966/2 rather than the Just 3/2. This causes the (nearly) coincident harmonics at 1320 Hz (see Fig. 4-1) to be separated by 1.5 Hz. Classification of IPD sensitivity was based on binaural beats at CF for 14 neurons and the dichotic Perfect 5th for 2 neurons. The measures gave similar results for those neurons for which we had both.

4.2.3 Model

[Figure 4-2: block diagram. The Left Ear and Right Ear paths each pass through Cochlear Tuning, Rectification/Compression, and Synchrony Roll-off stages; the two paths converge on the IC MTF stage.]

Figure 4-2. A simple model, used to predict temporal patterns of IC neural responses, consists of a peripheral component for each ear followed by a binaural central component. Each peripheral component incorporates cochlear tuning via a GammaTone filter (Darling, 1991), instantaneous compression and rectification, and a 4th-order low-pass filter with a cutoff frequency of 700 Hz to mimic synchrony roll-off. The binaural component includes a binaural crosscorrelator (instantaneous product) followed by a bandpass filter whose parameters are fit to match the neural MTF of individual neurons.

We used a simple binaural coincidence model to predict the ability of IPD sensitive

IC neurons to follow the temporal envelope of our tone-pair stimuli. The model, shown

in Fig. 4-2, includes a peripheral processor for each ear and a central, binaural model.

The peripheral model incorporates: 1) cochlear tuning (GammaTone filter, Darling

1991); 2) half-wave rectification and instantaneous compression ($y = x^2/(x^2 + \mathrm{thrsh}^2)$,

where thrsh was the compression threshold) to simulate cochlear nonlinearities; and

3) a 4th-order low-pass filter with 700 Hz cutoff frequency to mimic the roll-off of

phase locking. The central model includes a binaural crosscorrelator (implemented

by a multiplication of peripheral model outputs) followed by a bandpass IC MTF

filter (4th-order Butterworth) whose parameters are fit to match the neural MTF in

individual neurons. The central processor might also include a delay on one of the

binaural inputs to simulate the best interaural delay (ITD) of an IC neuron. However,

with regard to our stimuli, it was deemed that this delay would only slightly shift the

phase of the beating response and not affect it in other ways and so it was left out.
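The core of the central stage, multiplication of the two rectified and compressed monaural signals, can be illustrated in isolation. The sketch below is deliberately stripped down: it omits the GammaTone filter, the 700-Hz low-pass filter, and the IC MTF bandpass filter, and the threshold value (0.5), sampling rate, and function names are assumptions. It shows only that the product of a 439-Hz input in one ear and a 441-Hz input in the other contains a component at the 2-Hz difference frequency.

```python
import math

def compress(x, thrsh=0.5):
    """Half-wave rectification followed by y = x^2 / (x^2 + thrsh^2)."""
    x = max(x, 0.0)
    return x * x / (x * x + thrsh * thrsh)

def binaural_product(f_left, f_right, dur=1.0, fs=8000):
    """Instantaneous product of the two compressed monaural signals."""
    n = int(dur * fs)
    out = []
    for i in range(n):
        t = i / fs
        left = compress(math.sin(2 * math.pi * f_left * t))
        right = compress(math.sin(2 * math.pi * f_right * t))
        out.append(left * right)
    return out

def magnitude_at(x, freq, fs=8000):
    """Normalized magnitude of a single Fourier component of x."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return math.hypot(re, im) / len(x)

y = binaural_product(439.0, 441.0)  # 2-Hz binaural beat centered at 440 Hz
print(magnitude_at(y, 2.0), magnitude_at(y, 3.0))
```

In the full model the bandpass stage would then determine how strongly this difference-frequency component appears in the predicted response.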

4.3 Results

Dichotic analyses were performed on data recorded from 16 neurons in 7 cats using

both diotic/monaural and dichotic complex-tone stimuli. Eight neurons were classified as

IPD-sensitive using binaural beats. CFs ranged from 440 to 5090 Hz but do not reflect

uniform sampling in the IC as we targeted low-frequency neurons that would respond

to our stimuli. Monaural/diotic pitch analyses were performed on data from 88

neurons reported in the companion paper (McKinney, Tramo, and Delgutte, 2001a).

4.3.1 Dichotic tone pairs

Figure 4-3 shows responses of two neurons to diotically (B and E) and dichotically (C

and F) presented complex-tone pairs. Also shown are responses to the binaural beat

stimulus centered at CF (A and D). The binaural beat stimulus is 1 second in duration

with a 2-Hz beat, so sensitivity to interaural phase differences (IPD) is indicated by

the presence of two peaks in the PST histograms. The bimodal histogram shown in

panel A indicates that the neuron is sensitive to IPD. In contrast, the response in

panel D shows adaptation over the 1-second binaural beat stimulus but reveals no

sensitivity to IPD. The neuron was binaural, however, in that its response to diotic

stimulation was greater than the response to monaural stimulation of either ear (not

shown). The responses of both neurons to diotic tone pairs (B and E) are similar to

those described in the companion paper (McKinney, Tramo, and Delgutte, 2001a):

dissonant intervals elicit beating responses while consonant intervals do not. One

exception is the response of the IPD-sensitive neuron to the dissonant diotic Tritone

(B), which shows little beating. This is most likely due to the neuron having a low

CF so that its receptive field does not include the relatively higher beating partials

of the Tritone stimulus (see Sec. 4.3.1). Panels C and F in Fig. 4-3 show responses to

dichotically presented intervals. The responses of the IPD-sensitive neuron to dichotic

stimuli (C) are similar to its responses to diotic stimuli (B), which show beating for

the Minor 2nd stimulus but not the Tritone. The IPD-insensitive neuron, on the

other hand, does not show beating in response to any dichotic interval (F), although

its responses to diotic stimuli (E) show beating for the Minor 2nd and Tritone stimuli.

[Figure 4-3: six panels. Columns: response to 2-Hz binaural beat at CF (A, D; time axis 0 to 1), Diotic Complex Tone Pairs (B, E), and Dichotic Complex Tone Pairs (C, F). Rows: neuron sensitive to IPD (A-C) and neuron not sensitive to IPD (D-F). Tone-pair panels are labeled by interval (Uni, m2, 4th, Tri, 5th, Oct). Horizontal axis: Peri-stimulus time (msec), 0 to 400; vertical axis: Discharge Rate.]

Figure 4-3. Top panels: Responses of an IPD-sensitive neuron to a 2-Hz binaural beat centered at CF (A) and diotic (B) and dichotic (C) presentation of complex-tone pairs at specified intervals. Neural activity is shown over the duration of every stimulus presentation (30 presentations per stimulus). The horizontal black line below each panel indicates stimulus on-time. CF = 440 Hz. Bottom panels: same for an IPD-insensitive neuron with CF = 3840 Hz.

Figure 4-4 shows summary mean rate fluctuations for all neurons from which we

recorded responses to both dichotic and diotic/monaural stimuli. Neurons sensitive

to IPD respond with greater rate fluctuations to the Minor 2nd than to consonant

stimuli for both diotic and dichotic presentation (A and B). Those neurons also show

greater fluctuations in response to the diotic Tritone but their response to the dichotic

Tritone is not significantly different from that to the consonant Perfect 4th and 5th

stimuli. In contrast, IPD-insensitive neurons show little increase in rate fluctuations

in response to any dichotic interval (D) even though they do show large increases in

rate fluctuations in response to the diotic Minor 2nd and Tritone (C).

If rate fluctuations of IC neural responses do code musical dissonance, these results

raise the possibility that there may exist a form of dichotic dissonance for some musical

intervals. Alternatively, it is possible that only neurons not sensitive to IPD code for

roughness and dissonance.

CF dependence of responses to dichotic stimuli

We showed previously that the beat rate of the neural response to dissonant complex-

tone pairs depends on the neuron’s CF (McKinney, Tramo, and Delgutte, 2001a): the

response is dominated by the beat frequency of the partial pair closest to CF. Figure 4-

5 shows line spectra for the complex Minor-2nd and Tritone intervals and indicates,

for each stimulus, which pairs of partials interact to give the low beat frequencies

(in the roughness range). Figure 4-6 shows the response to these stimuli from two

neurons with CFs near the beating partials. The beat frequency and period (black

bars) of the partial pair closest to the CF are indicated in the upper left corner of

the panels. For the diotic stimuli (A, C, E, and G), if the CF is close to a pair of

proximal partials, the neural response reflects the beat frequency of those partials.

An exception is the low-CF neuron responding to the Tritone (E), where the beat

frequency may be too large (182 Hz) for the IC neuron to follow.
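The beat frequencies of neighboring partials can be estimated with a short calculation. This sketch assumes equal-tempered intervals on a 440-Hz base (frequency ratio 2^(n/12) for n semitones), an assumption that reproduces the beat frequencies marked in Fig. 4-5 (26, 52, 78 Hz and 182, 76, 106 Hz) to within about 1 Hz; the function names are illustrative.

```python
def partials(f0, n_harmonics=6):
    """Frequencies of the first n_harmonics partials of a harmonic tone."""
    return [k * f0 for k in range(1, n_harmonics + 1)]

def nearest_partial_beats(lower_f0, upper_f0, n_harmonics=6):
    """For each lower-tone partial: (partial, nearest upper-tone partial, difference)."""
    upper = partials(upper_f0, n_harmonics)
    out = []
    for p in partials(lower_f0, n_harmonics):
        q = min(upper, key=lambda u: abs(u - p))
        out.append((p, q, abs(q - p)))
    return out

base = 440.0
minor_2nd = nearest_partial_beats(base, base * 2 ** (1 / 12))  # 1 semitone
tritone = nearest_partial_beats(base, base * 2 ** (6 / 12))    # 6 semitones

for name, pairs in (("Minor 2nd", minor_2nd), ("Tritone", tritone)):
    for p, q, d in pairs:
        print(f"{name}: {p:.0f} Hz vs {q:.1f} Hz -> beat {d:.1f} Hz")
```

A neuron's response would then be expected to follow the beat of whichever pair lies closest to its CF, provided that beat rate is within the range the neuron can follow.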

In response to dichotic stimuli, the responses of the two neurons in Fig. 4-6 (B, D,

F, and H) show beating only to pairs of low-frequency partials. In this case, beating

is only seen in response to the lowest (B and D) and possibly second lowest (D) pair of

partials in the Minor 2nd. A possible explanation for this is the roll off of synchrony

at high frequencies in the auditory periphery and central nervous system (CNS) prior

[Figure 4-4: four bar-plot panels of Rate Fluctuations (sp/sec). Columns: Diotic/Monaural Complex Tone Pairs (A, C) and Dichotic Complex Tone Pairs (B, D). Rows: neurons sensitive to IPD (A, B; n = 8) and neurons not sensitive to IPD (C, D; n = 8).]

Figure 4-4. Top panels: Mean rate fluctuations across all IPD-sensitive neurons in response to diotic (A) and dichotic (B) presentation of complex-tone pairs at specified intervals. Rate fluctuations were calculated as the temporal standard deviation of the rate. Bottom panels: Same for IPD-insensitive neurons. Error bars are estimated standard errors of the mean and include intra- and across-neuron variances (assumed to be orthogonal).

[Figure 4-5: line spectra on a Frequency (Hz) axis, 440 to 3960. A: Minor Second, ∆ƒ (Hz): 26, 52, 78. B: Tritone, ∆ƒ (Hz): 182, 76, 106.]

Figure 4-5. Line spectra of the complex-tone Minor 2nd and Tritone stimuli. Arrows mark the partials that give rise to the low beat frequencies (in the roughness range).

to the site of binaural interaction. For a central neuron to beat in response to a

dichotic pair of partials, the monaural inputs from each ear must phase-lock to the

partial in that ear at the stage where binaural interaction takes place. This can

only happen at low frequencies, where the response of the auditory nerve (AN) and

cochlear nucleus (CN) is strongly phase-locked. Thus, the lack of beating in response

to the dichotic Tritone (H) may be due to poor monaural phase-locking to the relevant

partials (1245 Hz and 1320 Hz, see Fig. 4-5) at the input to the binaural processor.

The CF is also related to the IPD sensitivity for IC neurons. We found a significant

difference (α < 0.05), in our sample of neurons, between the CFs of IPD-sensitive

(µ = 840, σ = 505 Hz) and IPD-insensitive (µ = 2240, σ = 1555 Hz) neurons. This is

consistent with Kuwada and Yin (1983), who found that IPD-sensitive

neurons in the IC of anesthetized cat all had CFs less than ∼ 3000 Hz. The highest

CF in our population of IPD-sensitive neurons was 1520 Hz.

4.3.2 Pitch analysis

In this section, we look for a representation of the fundamental bass frequency (FFB,

see Fig. 4-1) in all-order ISI histograms of neural responses to three consonant (Uni-

son, Perfect 4th and 5th) and one dissonant (Tritone) complex-tone pairs.

Diotic pitch

Figure 4-7 shows all-order ISI histograms of responses from two neurons with different

PSTH types (Sustained and Pauser, see Section 4.2) to the tone pairs. In each plot,

vertical dashed lines indicate integer multiples of the fundamental bass period (except

for the Tritone, see below), and the peak-to-background ratio (P/B$_{\mathrm{freq}}$) is given to

quantify how well the fundamental bass periodicity is represented in the histogram.

The P/B ratio is the mean number of ISIs at integer multiples of the fundamental

period divided by the mean of all ISIs/bin. A P/B ratio greater than 1.0 indicates

that the fundamental bass periodicity is represented in the interval distribution. The

histograms show that FFB, as defined in Fig. 4-1, is well represented for the consonant

[Figure 4-6: eight PSTH panels. Top row, Minor 2nd: A (diotic) and B (dichotic), 26 Hz; C (diotic) and D (dichotic), 78 Hz. Bottom row, Tritone: E (diotic) and F (dichotic), 182 Hz; G (diotic) and H (dichotic), 76 Hz. Column-group labels as extracted: 450 Hz CF and 1335 Hz CF. Horizontal axis: Peri-stimulus time (msec), 0 to 200; vertical axis: Discharge rate (spikes/sec), 0 to 300.]

Figure 4-6. Top panels: Responses to diotic (A,C) and dichotic (B,D) presentation of the complex-tone Minor second stimulus for a neuron with a 440 Hz CF (A,B) and another with a 1350 Hz CF (C,D). Bottom panels: Same but for the Tritone stimulus. The beat frequency and period (black bars) of the partial pair closest to CF are indicated in each panel.

tone-pairs in at least one of the neurons’ responses: Unison (A, 440 Hz), Perfect 4th

(C, 147 Hz), Perfect 5th (H, 220 Hz). The responses to the Tritone, on the other

hand, do not show a representation of its FFB (13.75 Hz or the near miss at 88 Hz)

but are clearly dominated by periodicities of beating harmonics: (E) 76 Hz is the

beat frequency of the partial pair closest to the 1170-Hz CF of the Sustained

neuron; (F) 106 Hz is the beat frequency of the partial pair closest to the

1515-Hz CF of the Pauser neuron (see Fig. 4-5B). In these cases, the P/B ratios were

calculated for the most dominant periodicity in the histogram. A general trend for

both neurons is that low frequencies are well represented (high P/B ratios) and higher

frequencies are less well represented in their all-order ISI distributions. In addition,

the relative ranking of consonance (Unison most consonant, Tritone least consonant)

is not correlated with the P/B ratios of histograms from single neurons.
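The P/B computation described above can be sketched as follows. This is a minimal Python illustration, not the analysis code used for the figures: the function names, the 40-msec histogram span, and the choice of reading each peak from the single bin that contains a multiple of the period are assumptions, while the 100-µsec binwidth and 300-µsec rectangular smoothing follow the caption of Fig. 4-7.

```python
import numpy as np

def all_order_isi_histogram(spike_times_ms, max_interval_ms=40.0, bin_ms=0.1):
    """Count intervals between all pairs of spikes, not only adjacent ones."""
    t = np.sort(np.asarray(spike_times_ms, dtype=float))
    diffs = t[None, :] - t[:, None]                  # every pairwise difference
    intervals = diffs[(diffs > 0) & (diffs < max_interval_ms)]
    n_bins = int(round(max_interval_ms / bin_ms))
    edges = np.linspace(0.0, max_interval_ms, n_bins + 1)
    counts, _ = np.histogram(intervals, bins=edges)
    return counts.astype(float), edges

def smooth(counts, width_bins=3):
    """Rectangular smoothing window (3 bins = 300 usec at 100-usec bins)."""
    return np.convolve(counts, np.ones(width_bins) / width_bins, mode="same")

def peak_to_background(counts, edges, freq_hz):
    """Mean count at integer multiples of the period over the mean count per bin."""
    period_ms = 1000.0 / freq_hz
    n_multiples = int(edges[-1] / period_ms)
    multiples = period_ms * np.arange(1, n_multiples + 1)
    multiples = multiples[multiples < edges[-1]]
    # Index of the bin that contains each multiple of the period.
    peak_bins = np.searchsorted(edges, multiples, side="right") - 1
    return counts[peak_bins].mean() / counts.mean()

# Hypothetical spike train locked to a 147-Hz fundamental bass (one spike per period).
spikes = np.arange(0.0, 500.0, 1000.0 / 147.0)
counts, edges = all_order_isi_histogram(spikes)
pb_147 = peak_to_background(smooth(counts), edges, 147.0)
```

For this idealized, perfectly periodic train the ratio is far above 1.0; for real responses the ratio approaches 1.0 as phase locking to the fundamental bass weakens.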

To examine the representation of FFB in a population of IC neurons we pooled

all-order ISI histograms of responses to the Unison (A-C), Perfect 4th (D-F), Tritone

(G-I) and Perfect 5th (J-L) complex-tone pairs. Histograms from neurons with differ-

ent PSTH types were pooled separately and are shown in three columns in Fig. 4-8:

Onset neurons (left), Sustained (middle) and Pauser (right). In the histograms of

panels D-F, the prominent peaks at integer multiples of fundamental period indicate

that the fundamental bass frequency of the Perfect 4th (147 Hz) is well represented

in interspike intervals, especially in the responses of Sustained (E) and Onset (D)

neurons. The histograms of responses to the Perfect 5th (J-L) show a weaker repre-

sentation of its fundamental bass frequency (220 Hz) but there is no obvious represen-

tation of the 440 Hz fundamental in the responses to Unison (A-C). For the Tritone

(G-I), the dominating periodicity is the 76-Hz beat frequency of the partial pair near

1300 Hz (see Fig. 4-5B). However, modes 2 and 3 of the histograms appear slightly

skewed towards each other, most likely influenced by the 106-Hz periodicity of the

partial pair near 1760 Hz. See the single-neuron data in Fig. 4-7E-F for a comparison

of two histograms dominated by these different beat frequencies.
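The beat frequencies quoted above follow from the harmonic structure of the dyads. The sketch below is an illustration under stated assumptions: equal-tempered dyads based at 440 Hz, six harmonics per tone, a 200-Hz upper limit on the beat rate, and a rule that picks the slowly beating pair whose mean frequency is nearest CF on a log scale. The harmonic count, the limit, the selection rule, and the function names are all hypothetical choices, not details taken from the text.

```python
import numpy as np

def partial_pairs(f_low, semitones, n_harmonics=6):
    """Pairs (partial of lower tone, partial of upper tone) for an equal-tempered dyad."""
    f_high = f_low * 2.0 ** (semitones / 12.0)
    harmonics = np.arange(1, n_harmonics + 1)
    return [(a, b) for a in f_low * harmonics for b in f_high * harmonics]

def beating_pair_near_cf(f_low, semitones, cf, max_beat_hz=200.0, n_harmonics=6):
    """Pair beating slower than max_beat_hz whose mean frequency is nearest CF (log scale)."""
    slow = [(a, b) for a, b in partial_pairs(f_low, semitones, n_harmonics)
            if abs(a - b) <= max_beat_hz]
    a, b = min(slow, key=lambda p: abs(np.log(0.5 * (p[0] + p[1]) / cf)))
    return a, b, abs(a - b)

for name, semitones in (("Minor 2nd", 1), ("Tritone", 6)):
    for cf in (450.0, 1335.0):
        a, b, beat = beating_pair_near_cf(440.0, semitones, cf)
        print(f"{name}, CF {cf:.0f} Hz: partials {a:.0f} and {b:.0f} Hz beat at {beat:.1f} Hz")
```

Under these assumptions the sketch gives beat rates of roughly 26, 78, 182 and 75 Hz, close to the values marked in Fig. 4-6. The Tritone partial pair near 1760 Hz (1760 and about 1867 Hz) beats at about 107 Hz, close to the 106-Hz periodicity noted above.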

In order to assess the significance and variability of the P/B ratios from pooled his-

tograms we calculated bootstrap estimates of their medians and interquartile ranges

Figure 4-7. All-order ISI histograms from a Sustained (left) and a Pauser neuron in response to Unison, Perfect 4th, Tritone and Perfect 5th complex-tone pairs. Vertical dotted lines in each plot denote integer multiples of the fundamental base period for responses to the Unison (440 Hz), Perfect 4th (146 Hz) and Perfect 5th (220 Hz) tone pairs. For the Tritone (E-F), vertical dashed lines mark the beat frequency of the partial pair closest to the neuron’s CF (1170 Hz for Sustained, 1515 Hz for Pauser) which dominates the response. Representation of the corresponding periodicities in each histogram are quantified and displayed as the ratio P/Bfreq (see text). Binwidths are 100 µsec and histograms were smoothed with a 300 µsec rectangular window. Note the different vertical scales on the plots. Stimuli levels were all 40 dB SPL.

[Panel values, y-axis number of intervals. Sustained neuron: Unison P/B440 = 1.4 (A), Perfect 4th P/B147 = 1.8 (C), Tritone P/B76 = 2.5 (E), Perfect 5th P/B220 = 1.2 (G). Pauser neuron: Unison P/B440 = 1 (B), Perfect 4th P/B147 = 1.2 (D), Tritone P/B106 = 1.6 (F), Perfect 5th P/B220 = 1.2 (H).]

Figure 4-8. Pooled all-order ISI histograms from Onset (N = 45), Sustained (N = 17) and Pauser (N = 44) neurons in response to Unison, Perfect 4th, Tritone and Perfect 5th complex-tone intervals. All panels are similar in format to those in Fig. 4-7 except that, here, P/Bfreq ratios for responses to the Tritone were all calculated based on a periodicity of 76 Hz. ∗ denotes that a P/B ratio is significantly greater than 1.0 (α < 0.05) based on bootstrap estimates of the mean. Binwidths are 100 µsec and histograms were smoothed with a 300 µsec rectangular window. Note the different vertical scales on the plots.

[Panel values in the order Onset, Sustained, Pauser; y-axis number of intervals. Unison (A-C): P/B440 = 1.7, 1.1*, 1. Perfect 4th (D-F): P/B147 = 5.6*, 2.1*, 1.2*. Tritone (G-I): P/B76 = 2.5*, 1.6*, 1.5*. Perfect 5th (J-L): P/B220 = 3.9*, 1.3*, 1.1*.]

(25th to 75th percentile). The results show that all P/B ratios, with the exception

of those from Onset (Fig. 4-8A) and Pauser (Fig. 4-8C) responses to Unison, are

significantly greater than 1.0 (α < 0.05). Figure 4-9 shows the estimated median

and interquartile ranges for P/Bfreq as a function of frequency for all the histograms

from Fig. 4-8. The general trend for all neuron types is that P/B ratios decrease

with increasing frequency, consistent with the fall off of synchrony in the auditory

system. The relatively low P/B76 ratios from the Tritone responses deviate from this

trend. Their low values are most likely due to the fact that different neurons

respond to different beat frequencies (see Fig. 4-7). Figure 4-9 also shows that Onset

neurons provide the largest P/B ratios of all neuron types, but it is important to note

that these neurons respond weakly during the sustained portion of the stimulus and

consequently provide only a small number of intervals on which to base calculations.
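The bootstrap summary of pooled P/B ratios can be sketched as follows. This is a hedged illustration, not the procedure used for Fig. 4-9: it assumes that neurons are resampled with replacement, that their histograms are summed to form each pooled histogram, and that each peak is read from the single bin containing a multiple of the period. The function names, the input layout (one row of bin counts per neuron), and the reported fraction of replicates at or below 1.0 are assumptions.

```python
import numpy as np

def pb_ratio(counts, bin_ms, freq_hz):
    """P/B: mean count in bins at multiples of the period over the mean count in all bins."""
    period_ms = 1000.0 / freq_hz
    n_multiples = int(counts.size * bin_ms / period_ms)
    multiples = period_ms * np.arange(1, n_multiples + 1)
    idx = np.minimum((multiples / bin_ms).astype(int), counts.size - 1)
    return counts[idx].mean() / counts.mean()

def bootstrap_pb(neuron_hists, bin_ms, freq_hz, n_boot=1000, seed=0):
    """Resample neurons with replacement, pool, and summarize the P/B distribution."""
    rng = np.random.default_rng(seed)
    hists = np.asarray(neuron_hists, dtype=float)
    n = hists.shape[0]
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        pick = rng.integers(0, n, size=n)            # one bootstrap sample of neurons
        ratios[b] = pb_ratio(hists[pick].sum(axis=0), bin_ms, freq_hz)
    q25, med, q75 = np.percentile(ratios, [25, 50, 75])
    return {"median": med, "iqr": (q25, q75), "frac_le_1": float(np.mean(ratios <= 1.0))}

# Synthetic example: 17 hypothetical neurons, 400 bins of 0.1 msec, with extra
# counts placed at multiples of the 147-Hz period (bins 68, 136, 204, 272, 340).
rng = np.random.default_rng(1)
hists = rng.poisson(2.0, size=(17, 400)).astype(float)
hists[:, [68, 136, 204, 272, 340]] += 5.0
res = bootstrap_pb(hists, bin_ms=0.1, freq_hz=147.0)
```

With few contributing intervals, as for Onset neurons, the interquartile range returned by such a procedure would be expected to widen, which is one way to read the caution stated above.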

Figure 4-9. Bootstrap estimates of the median and interquartile range (25th to 75th percentile) of P/Bfreq ratios from the histograms in Fig. 4-8. Values for Onset and Pauser neurons are offset on the frequency axis for clarity. Estimates are based on 1000 randomly sampled trials.

[Plot labels: x-axis, FFB (Hz), with points at 77 (Tri), 147 (P4), 220 (P5) and 440 (Uni); y-axis, P/BFFB, 1 to 10; legend: Onset, Sustained, Pauser.]

Overall, our measure of pitch salience, P/B ratio, does not seem to correlate

well with the consonance of intervals shown in Fig. 4-8, of which Unison is the most

consonant, followed by the Perfect 5th, Perfect 4th and finally the Tritone. A stronger

effect here appears to be the overall decay of synchrony with increasing fundamental

bass frequency. So at least as far as all-order ISI histograms represent the pitch, they

do not seem to provide a good correlate of pitch salience or consonance for intervals

based at 440 Hz (and most likely higher frequencies). There may, however, exist

other neural representations of pitch that better correspond to pitch salience and

consonance.

Relation of pitch representation to the MTF

In Fig. 4-10 we compare two different measures of periodicity representation in neural

responses: 1) the highest modulation frequency for which significant synchrony to

sinusoidal amplitude modulation (SAM) is seen (Fco), and 2) the P/B ratio from all-

order ISI histograms of the responses. Panels A and D show, for individual neurons,

the P/B ratio as a function of Fco for the Perfect 4th (FFB = 147 Hz) and Perfect

5th (FFB = 220 Hz) complex-tone stimuli. In each plot, the vertical dotted line

indicates the fundamental bass frequency of that stimulus and the horizontal dashed

line marks a P/B ratio of 1.0. The data show that the P/B ratios are centered around

1.0 for Fco < FFB, while all but one P/B ratio is greater than 1.0 for Fco > FFB.

To examine the significance of this trend, we divided the neurons into two groups,

Low-Fco (Fco < FFB) and High-Fco (Fco > FFB), and generated pooled all-order

ISI histograms for each group, shown in plots B-C and E-F. For both the Perfect

4th and 5th, the histogram for High-Fco neurons exhibits more pronounced peaks

at integer multiples of the fundamental bass period. Bootstrapping across neurons

revealed a significant difference (α < 0.01) between the P/B ratios from the Low-Fco

and High-Fco pooled histograms for both the Perfect 4th and 5th. These results show

that the upper cutoff frequency Fco of a neuron’s MTF may be a reliable predictor of

its ability to phase lock to the fundamental bass frequency of complex-tone stimuli.
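The cutoff Fco defined above can be illustrated with a standard synchrony measure. The sketch below assumes that synchrony is quantified by vector strength and tested with the Rayleigh statistic 2nR² > 13.8 (p < 0.001). That criterion, the function names, and the input format (a dictionary mapping each modulation frequency in Hz to spike times in seconds) are assumptions, since the test used to define Fco is not restated in this section.

```python
import numpy as np

def vector_strength(spike_times_s, mod_freq_hz):
    """Return (vector strength R, Rayleigh statistic 2nR^2) for one modulation frequency."""
    phases = 2.0 * np.pi * mod_freq_hz * np.asarray(spike_times_s, dtype=float)
    n = phases.size
    r = np.hypot(np.cos(phases).sum(), np.sin(phases).sum()) / n
    return r, 2.0 * n * r ** 2

def upper_cutoff(spikes_by_fm, criterion=13.8):
    """Highest modulation frequency with significant synchrony, or None if there is none."""
    significant = [fm for fm, spikes in spikes_by_fm.items()
                   if len(spikes) > 0 and vector_strength(spikes, fm)[1] > criterion]
    return max(significant) if significant else None

# Synthetic example: perfect locking at 50 and 100 Hz, and spikes spread evenly
# over four phases of the modulation cycle (no net synchrony) at 300 Hz.
locked_50 = np.arange(50) / 50.0
locked_100 = np.arange(100) / 100.0
k = np.arange(200)
unlocked_300 = (k + (k % 4) / 4.0) / 300.0
fco = upper_cutoff({50.0: locked_50, 100.0: locked_100, 300.0: unlocked_300})
```

In this synthetic case the cutoff is 100 Hz. A neuron classified this way would fall in the Low-Fco group for both the Perfect 4th (FFB = 147 Hz) and the Perfect 5th (FFB = 220 Hz).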

Dichotic pitch

While perceptually weaker than the diotic case, dichotically presented harmonics also

elicit the perception of a fundamental bass frequency (Houtsma and Goldstein, 1972;

van den Brink, 1974; Houtsma, 1984), so we examined the responses to dichotically

Figure 4-10. A,D: P/Bfreq ratios for the Perfect 4th and Perfect 5th complex-tone stimuli are plotted for individual neurons vs. the MTF corner frequency (Fc, see text). Vertical dashed lines mark the fundamental base frequency for each stimulus (146 and 220 Hz, respectively). Neurons whose Fc is greater than the fundamental base frequency are denoted as “High-Fc” neurons, others as “Low-Fc” neurons. B-C, E-F: Pooled all-order ISI histograms of Low- and High-Fc neurons. Histograms are in the same form as those in Fig. 4-8. Bootstrapping results show that for both the Perfect 4th and 5th, P/Bfreq ratios from histograms based on the two populations (Low- and High-Fc neurons) are significantly different at the 0.01 level.

[Panel values: Perfect 4th (FFB = 147 Hz): Low-Fco P/B147 = 1.2 (B), High-Fco P/B147 = 2.1 (C). Perfect 5th (FFB = 220 Hz): Low-Fco P/B220 = 1.1 (E), High-Fco P/B220 = 1.3 (F). In A and D the x-axis is Fc (Hz), 10 to 1000, and the y-axis is the P/B ratio, 0.1 to 10.]

presented stimuli for evidence of FFB. Figure 4-11 shows all-order ISI histograms of

responses from an IPD-sensitive neuron to diotic (left) and dichotic (right) presentation of

complex-tone pairs. Responses to both diotic and dichotic presentation of the Perfect

4th and 5th tone pairs (C-D, G-H) show distinct representation of FFB and produce large

P/B ratios. Responses to Unison were weak in both cases and responses to the Tritone

show some phaselocking but in an inharmonic pattern. This particular neuron showed

the strongest representation of dichotic pitch of all neurons in our sample.

Figure 4-12 shows pooled all-order ISI histograms for IPD-sensitive (A-D) and

IPD-insensitive (E-H) neurons for diotic (left) and dichotic (right) presentation of

the Perfect 4th and 5th complex-tone pairs. The ISI histograms from IPD-sensitive

neurons show peaks at integer multiples of the fundamental bass period for both diotic

and dichotic stimuli, although the peaks are smaller in the dichotic case, especially

for the Perfect 5th stimulus. The P/B ratios based on pooled histograms from

IPD-sensitive neurons (A-D) are all significantly (α < 0.05) greater than 1.0 except for

the dichotic Perfect 5th stimulus (D). For IPD-insensitive neurons, only responses to

diotic stimuli show evidence of FFB.

Thus, the fundamental pitch of dichotically presented tones is represented in all-

order ISI histograms of IPD-sensitive IC neural responses. In addition, the relative

pitch salience (Houtsma, 1984) of diotically- (more salient) and dichotically-presented

intervals correlates with P/B ratios from the histograms.

4.4 Model Results

A simple binaural coincidence model was used to predict temporal discharge pat-

terns of IC neurons to diotic and dichotic stimuli (see Section 4.2.3). The general

form of this model has been used to predict the response phase of IC neurons for

binaural stimuli based on the responses to monaural stimuli for each ear (Kuwada

et al., 1984) and also to predict IC neural responses to binaural beats of mistuned

consonances (Yin, Chan, and Carney, 1987).

Figure 4-13 shows responses of the model to the Minor 2nd and Tritone stimuli,

Figure 4-11. All-order ISI histograms from a single IPD-sensitive neuron in response to diotic (left) and dichotic (right) presentation of the indicated complex-tone pairs. Histograms are in the same format as in Fig. 4-8. Neuron CF = 300 Hz, stimulus level = 60 dB SPL.

[Panel values, y-axis number of intervals. Diotic: Unison P/B440 = 2.3, Perfect 4th P/B147 = 6.8, Tritone P/B76 = 0.4, Perfect 5th P/B220 = 4.9. Dichotic: Unison P/B440 = 1, Perfect 4th P/B147 = 5.3, Tritone P/B76 = 0.6, Perfect 5th P/B220 = 6.4.]

Figure 4-12. Pooled all-order ISI histograms in response to diotic/monaural (left) and dichotic (right) presentation of the Perfect 4th and 5th stimuli. A-D: Responses from IPD sensitive neurons. E-H: Responses from IPD-insensitive neurons. Histograms are in the same form as those in Fig. 4-8. ∗ denotes that a P/B ratio is significantly greater than 1.0 (α < 0.05) based on bootstrap estimates of the mean. Note the different vertical scales on the plots.

[Panel values; x-axis, interspike interval (msec); y-axis, number of intervals. IPD-sensitive neurons (N = 8): Perfect 4th P/B147 = 2.4* diotic/monaural, 1.5* dichotic; Perfect 5th P/B220 = 1.2* diotic/monaural, 1.1 dichotic. IPD-insensitive neurons (N = 8): Perfect 4th P/B147 = 1.2 diotic/monaural, 1.1 dichotic; Perfect 5th P/B220 = 1.1* diotic/monaural, 1 dichotic.]

for model neurons whose CF and MTF match those of the neurons from Fig. 4-6. The

model qualitatively predicts the main beat frequencies in the physiological responses.

For the diotic stimuli, the model response fluctuates at the beat frequency (when low

enough) of the partial-pair closest to CF. For the dichotic stimuli, the model responses

show beats for the Minor 2nd but not the Tritone, consistent with the data. The lack

of beats in the model response to the Tritone is due to the poor phase locking of the

peripheral model outputs for the 1245 Hz and 1320 Hz partials. The model does not

predict all aspects of the responses well, e.g., the onset portion of the response in

panels D and H, but does a good job at predicting the primary beat frequencies in

the responses.
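The dependence of dichotic beating on phase locking can be illustrated with a toy coincidence computation. The sketch below is not the model of Section 4.2.3: it assumes that each monaural input is a constant rate plus a sinusoid scaled by a hypothetical synchrony function (`synchrony`, with an arbitrary 600-Hz corner), and that coincidence detection can be approximated by multiplying the two inputs. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def synchrony(freq_hz, corner_hz=600.0):
    """Hypothetical phase-locking strength: near 1 at low frequencies, falling above corner_hz."""
    return 1.0 / np.sqrt(1.0 + (freq_hz / corner_hz) ** 4)

def monaural_drive(freq_hz, t):
    """Input from one ear: a constant rate plus a phase-locked sinusoidal component."""
    return 1.0 + synchrony(freq_hz) * np.sin(2.0 * np.pi * freq_hz * t)

def coincidence_output(f_left, f_right, fs=20000, dur=1.0):
    """Approximate coincidence detection as the product of the two monaural drives."""
    t = np.arange(int(fs * dur)) / fs
    return t, monaural_drive(f_left, t) * monaural_drive(f_right, t)

def beat_component(f_left, f_right, fs=20000, dur=1.0, max_hz=200.0):
    """Frequency and amplitude of the largest output component below max_hz (DC excluded)."""
    _, out = coincidence_output(f_left, f_right, fs, dur)
    spec = np.abs(np.fft.rfft(out - out.mean())) * 2.0 / out.size
    freqs = np.fft.rfftfreq(out.size, d=1.0 / fs)
    band = (freqs > 0) & (freqs <= max_hz)
    k = np.argmax(spec[band])
    return freqs[band][k], spec[band][k]

# One partial per ear: a low-frequency pair and the Tritone pair named in the text.
f_m2, a_m2 = beat_component(440.0, 466.0)
f_tt, a_tt = beat_component(1245.0, 1320.0)
```

Under these assumptions the product contains a component at the difference frequency whose amplitude is half the product of the two synchrony values, so the 440/466-Hz pair beats strongly at 26 Hz while the 1245/1320-Hz pair beats only weakly at 75 Hz.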

Figure 4-13. Model responses to diotic and dichotic presentation of the Minor 2nd and Tritone stimuli. Same format as Fig. 4-6.

[Plot labels: y-axis, normalized model response, 0 to 1; column headings 450 Hz CF and 1335 Hz CF, each with Diotic and Dichotic panels; rows Minor 2nd (A-D) and Tritone (E-H). Beat frequencies marked: 26 Hz (A,B), 78 Hz (C,D), 182 Hz (E,F), 76 Hz (G,H).]

4.5 Discussion

We have shown that neurons sensitive to IPD beat in response to dichotically-presented

dissonant tone pairs that have low-frequency beating harmonics and neurons not sen-

sitive to IPD do not beat in response to these stimuli. Individual IPD-sensitive

neurons’ responses are dominated by the beat rate of (dichotic) partials closest to

CF provided that the partials are at low frequencies. In the dichotic case, beating is

a central, neural interaction which requires the fine structure of the stimulus to be

represented in the neural signal from each ear at the input to the stage of binaural

interaction and thus can only occur in response to beating partials at frequencies

below the limit of phase-locking. In contrast, for monaural/diotic stimuli, beating

from neighboring partials is an acoustic interaction which can be seen in the stimulus

and occurs in neural responses regardless of the absolute frequency of the individual

partials. For our stimulus set, population responses of IPD-sensitive IC neurons to

dichotic stimuli were similar to the responses of all IC neurons to diotic presentation

with one exception: the diotic Tritone elicits beating whereas the dichotic Tritone

does not. This is likely due to the fact that the beating in the Tritone stimulus

comes from relatively high-frequency partials (> 1200 Hz) for which phaselocking is

relatively weak.

Because dichotically-presented tones are thought not to produce a roughness sensation

(Roederer, 1979), our current findings call into question our hypothesis that

sensory dissonance (roughness) is coded in responses of IC neurons. Two possible

resolutions of this “dichotic quandary” are 1) roughness is mediated only through

the responses of neurons insensitive to IPD and 2) there exists a form of dichotic

roughness. The plausibility of each of these ideas is discussed below.

We have also shown that a simple binaural coincidence model qualitatively predicts

responses of IC neurons to both diotic and dichotic musical intervals.

Finally, we showed that population all-order ISI histograms from IC neurons reflect

the fundamental bass frequency of some consonant diotic and dichotic tone pairs.

However, for diotic stimuli, there appears to be a sharp cutoff frequency between 220

and 440 Hz, above which little representation can be seen in the histograms. For

dichotic tone pairs, the cutoff appears even lower (< 220 Hz). Our proposed neural

correlate of pitch salience for the fundamental bass frequency, P/Bfreq, is correlated

with Fco, the maximum modulation frequency at which a neuron shows significant

synchrony. This shows a consistency between the two measures of synchrony. In

general, for the frequency range of our stimulus set, the degradation of phaselocking

at high frequencies dominates the responses such that the consonance of a particular

interval does not correlate with the salience of its fundamental bass pitch, at least as

far as pitch is represented in the form of population all-order ISI distributions.

Instead, the measure of salience correlates better with the absolute frequency of the

fundamental.

4.5.1 Neurophysiology

Roughness

Our results are broadly consistent with previous studies on the sensitivity of IC neu-

rons to interaural phase. Kuwada and Yin (1983) found that about 80% of low-

frequency neurons (CF<∼3000 Hz) are IPD sensitive, while neurons with high CFs are

not IPD sensitive for pure tones. For our small sample of neurons, there was an even

split (8 IPD-sensitive, 8 IPD-insensitive), but some of the CFs of the IPD-insensitive

neurons were higher than 3000 Hz. Sensitivity to IPD has been shown to occur for

pure-tone frequencies up to 3000 Hz in anesthetized cat (Kuwada and Yin, 1983)

and 2150 Hz in unanesthetized rabbit (Kuwada, Stanford, and Batra, 1987) but most

neurons only show IPD sensitivity for frequencies less than 1500-2000 Hz. Yin and

Kuwada (1983) showed that some IC neurons phaselock to binaural beat frequencies

of up to 80 Hz. Our dichotic Tritone stimulus, which did not elicit much phaselock-

ing, has frequency characteristics close to both of these limits: the partials that would

beat are at frequencies greater than 1200 Hz and the binaural beat frequencies would

be greater than 75 Hz (see Fig. 4-5).

Kuwada, Batra, and Stanford (1989) showed that the anesthetic sodium pentobarbital

can affect IC neurons’ response rates, response latencies, response patterns, and

spontaneous activity. Their findings are consistent with the idea that anesthesia elicits

greater overall inhibition. While this may affect the forms of PSTHs, other aspects

of the response, including the best (∼87 Hz) and highest (∼250 Hz) temporal envelope modulation

frequencies that elicit significant synchrony, are similar in anesthetized (Rees

and Møller, 1983; Rees and Palmer, 1989) and unanesthetized (Kuwada, Batra, and

Stanford, 1989) preparations. This provides support for the idea that our findings,

regarding discharge rate fluctuations in response to temporal envelope modulations,

may be generalizable to the unanesthetized case.

One possible resolution to our “dichotic quandary” is that IPD-insensitive neurons

alone code for roughness. While this is possible, Kuwada and Yin (1983) report

that IPD-insensitive neurons constitute only about 20% of low-frequency IC neurons.

Other reports (Semple and Aitkin, 1979) and our small sample population, which

include some higher-frequency neurons, contain a larger percentage (∼50%) of

IPD-insensitive neurons, making this resolution more plausible. However, in humans, there

would be more low-frequency neurons and therefore fewer IPD-insensitive neurons.

Pitch

Dichotic pitch percepts have historically been interpreted as evidence for spectral

models of pitch (Houtsma and Goldstein, 1972; Bilsen and Goldstein, 1974). Never-

theless, the representation of dichotic pitches in temporal discharge patterns of central

neurons has been hypothesized (Greenberg, 1986) but not demonstrated so far. Here,

we have shown that correlates of dichotic pitch do indeed exist in temporal discharge

patterns of central auditory neurons. Greenberg’s hypothesis was based on the idea

that binaural interaction occurs in the form of a coincidence detector that receives

an input from each ear, much like the model we have used in this study.

A longstanding predicament for temporal models of pitch perception is the degra-

dation of phaselocking to the stimulus in neural responses at sequentially higher

levels of the auditory system. Phaselocking is seen in responses of single AN fibers

and some cochlear nucleus (CN) neurons for stimulus frequencies up to ∼ 5000 Hz

although synchrony begins to fall for frequencies greater than ∼ 1000 Hz (Johnson,

1980; Bourk, 1976). However, in single IC neurons, phaselocking is rarely seen for fre-

quencies greater than ∼ 600 Hz (Kuwada et al., 1984) and we saw very little even at

440 Hz. One possible resolution is that the temporal code may be converted to some

other neural code (e.g., a place code) prior to the IC, or that fine timing information

is preserved in a pathway that has yet to be rigorously studied, such as in the nu-

cleus of the lateral lemniscus. Another possibility is that neural responses of humans

may show better phaselocking at higher frequencies than in cat. This idea has been

examined through measurements of the scalp-recorded frequency following response

(FFR) in humans and cats, which is thought to represent synchronous responses of

neurons in the higher auditory brainstem, although its precise origin has not been

determined (Smith, Marsh, and Brown, 1975). Greenberg et al. (1987) measured

human scalp-recorded FFRs and showed a high correlation between the FFR and

the perceived pitch of a variety of complex-tone stimuli. They found that the FFR

is strong for fundamental frequencies below 500 Hz, but degrades at higher frequen-

cies to almost nothing at 1000 Hz. In cats, however, it has been shown that FFRs

are measurable up to frequencies of nearly 2500 Hz (Merzenich, Gardi, and Vivion,

1983). While this difference may be due to a better signal-to-noise ratio in cats than

in humans, the comparison of human and cat FFRs suggests that neural phaselocking

in humans is not substantially better at high frequencies than it is in cats.

Another difference between the auditory system of humans and cats is in the

distribution of neural best frequencies: humans have a slightly lower frequency range

of hearing and have greater neural representation of lower frequencies (Fay, 1988).

This fact may help the cause of temporal models for pitch despite the dearth of mid-

frequency phaselocking neurons found in the IC of the cat: overall there may be a

better representation of pitch in human neural ISIs simply because there is a larger

proportion of neurons tuned to the relevant frequency range.

While we did not find a good correlate of pitch salience in the population ISI

histograms of IC neural responses, consonance may still correlate with the pitch

salience of the fundamental bass frequency of musical intervals, albeit through a

different and as yet unknown neural code.

It may appear that our investigation of pitch salience as a possible correlate of

consonance is clouded somewhat by our use of equal temperament tuning rather

than Just intonation. With equal temperament tuning, the ratios of fundamental

frequencies for the Perfect 4th and 5th are not exactly the simple ratios 4/3 and 3/2,

but instead are 4.0045/3 and 2.9966/2 respectively. These deviations may cause slight

temporal smearing of sharp modes in the autocorrelation functions of these stimuli.

However, the deviations are of similar size for both intervals and hence should affect

both histograms similarly. Over the 500 msec duration of the stimuli, the difference in

phase change for the fundamental of the upper tones in the dyads is only about 12%

for the 5th and 8% for the 4th. Therefore relative measures based on the histograms

should not be greatly affected.
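The equal-temperament ratios quoted above can be checked directly. The short sketch below assumes the standard definition of equal temperament (each semitone is a factor of 2^(1/12)); the function names are illustrative.

```python
import math

def equal_tempered_ratio(semitones):
    """Frequency ratio of an interval spanning the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)

def cents(ratio, reference):
    """Deviation of ratio from reference in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio / reference)

fourth = equal_tempered_ratio(5)   # Just intonation value: 4/3
fifth = equal_tempered_ratio(7)    # Just intonation value: 3/2

print(f"Perfect 4th: {3 * fourth:.4f}/3, {cents(fourth, 4 / 3):+.2f} cents from 4/3")
print(f"Perfect 5th: {2 * fifth:.4f}/2, {cents(fifth, 3 / 2):+.2f} cents from 3/2")
```

The two deviations are equal in size (about 2 cents) and opposite in sign, which is consistent with the statement that they are of similar size for both intervals.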

4.5.2 Psychophysics and perception

Another possible resolution to our “dichotic quandary” is that roughness of dichot-

ically presented tones does, in fact, exist. Burns and Ward (1976) found that mu-

sicians could identify dichotic as well as diotic musical intervals consisting of two

low-frequency pure tones. They found that subjects’ performance of dichotic interval

identification, for base frequencies of 100 and 262 Hz and intervals near or less than

dichotic fusion thresholds, was equal to or better than the diotic case. At higher fre-

quencies (2000 and 3000 Hz) subjects performed worse for the dichotic intervals than

for the diotic. According to post-experiment discussions, subjects were able to use

“roughness” cues to distinguish between different dichotic intervals (Burns, 2001).

The perception of dichotic roughness may be related to the perception of binaural

beats and dichotic beats of mistuned consonances. Low-frequency binaural beats

are typically perceived as low-frequency periodic changes in laterality when two

tones, separated by a small frequency difference, are presented dichotically (Licklider,

Webster, and Hedlun, 1950; Perrott and Nelson, 1969). Perrott and Nelson (1969)

found that listeners can detect binaural beats for frequencies up to about 1500 Hz

and for frequency differences up to about 80 Hz. A weaker percept, dichotic beats

of mistuned consonances, is obtained from two tones presented dichotically whose

frequencies deviate slightly from a simple integer ratio (Feeney, 1997; Tobias, 1963;

Thurlow and Bernstein, 1957). Feeney (1997) showed that listeners could detect such

dichotic beats reliably for component frequencies less than 1000 Hz. Yin, Chan and

Carney (1987) have shown a neural correlate of these dichotic beats in responses of

single IC neurons that were shown to be sensitive to IPD.

For all of these dichotic percepts there is a similar maximum stimulus frequency

(∼ 1000−1500 Hz) under which they exist. This fact is consistent with the idea that

the percepts are based on neural temporal information from each ear and they are

limited at higher frequencies by the fall off of synchrony.

Certainly further psychoacoustic investigations need to be performed in order to

fully understand the notion of dichotic roughness and its relation to dissonance. As

noted above, there is currently some evidence to suggest that the percept may exist

although it is likely to be a weaker percept than its diotic counterpart and limited

to a smaller frequency region. This limitation has implications for the hypothetical

dichotic dissonance of musical intervals from different octave regions and of particular

intervals within the same octave region. Because the percept is likely to arise only

from the beating of low-frequency partials, dichotic intervals based at low frequencies

should generally sound more dissonant than those based at higher frequencies. Also,

dichotic dissonant intervals whose beating comes from the higher order partials (e.g.,

Tritone) should sound relatively less dissonant than those whose beating comes from

low order partials (e.g., Minor 2nd). A thorough study would examine these differences

over a broad frequency range.

4.6 Conclusion

Previously, we showed that the degree of rate fluctuations in IC neural responses is

correlated with the sensory dissonance of diotic/monaural stimuli. Here, we have

shown that binaural IC neurons sensitive to interaural phase can show beats in re-

sponse to dichotically-presented intervals, even though these stimuli are presumed not

to produce a roughness sensation (Roederer, 1979). Two possible resolutions of this

“dichotic quandary” are 1) only the phase-insensitive subset of IC neurons mediates

roughness, or 2) there exists an undiscovered form of dichotic dissonance. It is also

possible that roughness is coded in an entirely different manner.

Our general results for diotic and dichotic musical interval stimuli can be qualita-

tively predicted by a simple binaural coincidence model.

The results presented here illustrate the need for a more complete set of psychoa-

coustic data on the sensory dissonance of musical intervals, both diotic and dichotic.

Particular attention should be directed towards the effects of the spectra and

fundamental frequencies of the stimuli.

Chapter 5

Discussion

5.1 Summary of findings

5.1.1 Deviations in auditory-nerve interspike intervals lead

to a prediction of the octave enlargement effect

In Chapter 2 we showed that, in response to pure-tone stimuli, ISIs of AN fibers

deviate from integer multiples of the stimulus period. For low frequencies, (first-order)

ISIs tend to be shorter than the stimulus period and its multiples; for mid frequencies,

ISIs tend to be longer than the stimulus period and its multiples. Our analyses

showed that these two different types of ISI deviations stem from fundamentally

different mechanisms. The shortened intervals in response to low frequency tones are

due to multiple spikes occurring within a single stimulus period while the lengthened

intervals in response to mid frequency tones are likely due to refractory properties of

the nerve fibers.

We also showed that these ISI deviations lead to biases in temporally based esti-

mates of the stimulus frequency which, in turn, lead to an accurate prediction of the

octave enlargement effect if we are allowed to introduce a scaling factor of 2 when

making octave judgements. These findings are consistent with the idea that musical

pitch is encoded in ISI distributions of AN fibers.

Special efforts were made during this study to ensure accurate measurement of

AN ISI distributions. Our data will provide a precise testbed for rigorous testing of

AN models as well as for models of pitch and other perceptual phenomena based on

AN temporal activity.

5.1.2 Neural correlates of dissonance in responses of IC neu-

rons

We showed, in Chapter 3, that IC neurons respond with greater rate fluctuations

to dissonant musical intervals than to consonant intervals and that the frequency of

their fluctuations matches the beat rate of the stimulus partial-pair closest to the

neuron’s CF. Across all CFs, the average rate fluctuations increased as a function of

perceptual dissonance of the stimuli. This effect was robust across level and was more

pronounced in the responses of Onset neurons than in Sustained or Pauser neurons.

Onset neurons also reflect the dissonance of the stimuli in their average discharge

rate. We also showed that IC neurons respond similarly to changes in dissonance in

the context of a musical passage.

The differences in responses of Onset, Sustained and Pauser neurons to the tone-

pair stimuli are paralleled by differences in the MTFs of the different unit types. We

found that MTFs from Onset neurons tend to be more sharply tuned, centered at

lower frequencies, and provide more gain at the BMF than those from Sustained or

Pauser neurons.

In Chapter 4 we examined responses of IC neurons to dichotically-presented mu-

sical intervals which are thought not to elicit a sensation of roughness. We found that

neurons sensitive to IPD show a beating response to some dissonant dichotic intervals

similar to that from diotic intervals, while neurons insensitive to IPD do not beat in

response to dichotic stimuli. Beating in response to dichotic stimuli differs from diotic

stimuli in that it requires the temporal fine structure of the stimulus to be present in

the neural response from each ear at the stage where binaural interaction takes place.

Consequently, due to the fall off of synchrony with frequency, neural beating is only

seen for dichotic stimuli that have pairs of low-frequency beating harmonics. As a re-

sult, dissonant tone pairs whose diotic roughness comes from high-frequency partials,

such as the Tritone, do not produce a beating response when presented dichotically.

Because IPD-sensitive neurons beat in response to some dichotic as well as diotic

dissonant tone pairs, and dichotic roughness is thought not to exist, it is clear that we

must reinterpret our conclusion, from Chapter 3, that sensory dissonance is encoded

in rate fluctuations of all IC neurons. Two possible resolutions are: 1) only those

neurons insensitive to IPDs mediate the perception of roughness (sensory dissonance);

or 2) there exists a form of dichotic roughness. It is also possible that roughness is

encoded in some form other than rate fluctuations of IC neurons.

We also showed that population all-order ISI histograms from IC neurons reflect

the fundamental bass frequency of some consonant diotic and dichotic tone pairs. For

diotic stimuli, there appears to be a sharp cutoff frequency between 220 and 440 Hz,

above which little representation of fundamental bass can be seen in the histograms.

For dichotic tone pairs, the cutoff appears even lower (< 200 Hz). For the frequency

range of our stimulus set, the effect of synchrony roll-off dominates the responses

such that the consonance of a particular interval does not correlate with the relative

strength of the representation of the fundamental bass in the population all-order ISI

distributions. This does not preclude the notion that consonance is based on

the pitch salience of the fundamental bass, only that pitch salience may not be coded

in all-order ISI distributions at the level of the IC.
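
As an illustration of the measure discussed above, the following minimal Python sketch shows one way an all-order ISI histogram could be computed for a single spike train and pooled across neurons. It is not the analysis code used in this thesis: the function names, the bin width, and the toy spike times (a unit firing on some cycles of a hypothetical 100-Hz fundamental bass) are assumptions chosen only for illustration.

```python
def all_order_isi_histogram(spike_times_ms, max_interval_ms, bin_width_ms):
    """Histogram of intervals between every pair of spikes, not just adjacent ones."""
    n_bins = int(round(max_interval_ms / bin_width_ms))
    counts = [0] * n_bins
    spikes = sorted(spike_times_ms)
    for i, t_i in enumerate(spikes):
        for t_j in spikes[i + 1:]:
            interval = t_j - t_i
            if interval >= max_interval_ms:
                break  # spikes are sorted, so later intervals are longer still
            index = int(interval / bin_width_ms)
            if index < n_bins:
                counts[index] += 1
    return counts


def population_histogram(spike_trains_ms, max_interval_ms, bin_width_ms):
    """Sum the all-order ISI histograms of several neurons, bin by bin."""
    n_bins = int(round(max_interval_ms / bin_width_ms))
    pooled = [0] * n_bins
    for train in spike_trains_ms:
        single = all_order_isi_histogram(train, max_interval_ms, bin_width_ms)
        pooled = [a + b for a, b in zip(pooled, single)]
    return pooled


# Hypothetical unit locked to a 10-ms period (100 Hz) that skips some cycles.
toy_train_ms = [0, 10, 30, 40, 60]
print(all_order_isi_histogram(toy_train_ms, max_interval_ms=50, bin_width_ms=5))
# -> [0, 0, 2, 0, 2, 0, 3, 0, 1, 0]
```

In this toy case every occupied bin lies at a multiple of the 10-ms period, which is the kind of structure one would look for as a representation of the fundamental bass. Real spike trains would of course show jitter and require finer bins.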

In addition, we showed in Chapter 4 that a simple binaural coincidence model

can predict the general temporal properties of IC neurons pertinent to the neural

coding of musical dissonance.

5.2 Limitations of the neurophysiological data

5.2.1 Effect of anesthesia

It is important to recognize the fact that neural activity in unanesthetized prepara-

tions differs from that in anesthetized preparations and, consequently, comparisons

of psychophysical data from unanesthetized subjects to physiological data from anes-

thetized preparations should be performed with caution. Anesthesia has been shown

to affect responses of IC neurons in a manner that is consistent with increased

inhibition (Kuwada, Batra, and Stanford, 1989; Astl et al., 1996). While this may

affect the overall response rates and forms of PSTHs, other aspects of responses,

such as the best and highest modulation frequencies that elicit significant synchrony, are

similar in anesthetized (Rees and Møller, 1983; Rees and Palmer, 1989) and unanes-

thetized (Kuwada, Batra, and Stanford, 1989) preparations. Thus, our findings on

correlates of dissonance based on discharge rate fluctuations are likely to exist in the

unanesthetized case as well.

5.2.2 Small sample sizes

For a few aspects of our findings, we have made claims based on a relatively small

data sample size: we have only 9 measured responses to pure-tone pairs in Fig. 3-7;

and we have only 8 complete sets of measurements from IPD-sensitive and IPD-

insensitive neurons for our studies of responses to dichotic stimuli (Figs. 4-4 and

4-12). In the case of the pure-tone pairs, our data sample is limited by the fact that

our stimulus frequencies fell below the response areas of most neurons and, in the case

of the dichotic stimuli, more measurements were required than could be

made in the time that we could hold most neurons. Despite these limitations

in sample size, our conclusions are supported by the fact that our basic results are

consistent with previous findings on interaural phase sensitivity of IC neurons (Yin,

Chan, and Carney, 1987; Kuwada et al., 1984; Yin and Kuwada, 1983) and on their

sensitivity to envelope modulation (Krishna and Semple, 2000; Rees and Møller,

1987; Rees and Møller, 1983). In addition, we have shown that IC neural responses

to our stimuli can be predicted by a simple binaural coincidence model (Section 4.4),

whose basic form has been shown to predict IC neural responses to other dichotic

stimuli (Yin, Chan, and Carney, 1987). Nevertheless, our findings could be further

bolstered by the collection of more data.

5.2.3 Limited frequency range

In Chapters 3 and 4 we have measured responses to only a limited set of stimuli

within a single-octave range of fundamental frequencies. Although we have not shown

physiological correlates of dissonance to exist in other octaves, there is reason to

believe that our findings are extendible. Previous studies of IC responses to amplitude

modulated tones have covered broader frequency ranges and have not reported any

significant decreases in responses for carrier frequencies outside the 440 to 880 Hz

range (Krishna and Semple, 2000; Rees and Møller, 1987; Rees and Møller, 1983).

There may, however, be differences in responses to like intervals in different octave

ranges because the corresponding beat frequencies are different (halved for the octave

below, doubled for the octave above). This will likely cause more intervals at lower

frequencies to elicit neural rate fluctuations because more partial pairs will have beat

frequencies in the IC neural response range. The reverse will be true for intervals in

higher octave ranges. More data should be collected to confirm this speculation, but

there is a parallel perceptual phenomenon that is illustrated in musical practice: in

bass octave ranges most musical intervals sound dissonant, so only the

most consonant intervals are used.
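
The octave dependence of beat frequencies described above can be sketched numerically. The following Python example lists the beat frequencies between partials of the two complex tones of an interval and counts how many fall below a cutoff. The just-intonation ratio of 3:2 for the Perfect 5th, the use of six harmonics per tone, and the 150-Hz cutoff are all assumptions made for illustration; the cutoff in particular is a hypothetical stand-in for the upper limit of the IC neural response range, not a value measured in this work.

```python
from fractions import Fraction


def partial_pair_beats(f_root_hz, ratio, n_harmonics=6):
    """Nonzero beat frequencies (Hz) between each partial of the lower tone
    and each partial of the upper tone of a complex-tone pair."""
    beats = []
    for k in range(1, n_harmonics + 1):        # partials of the lower tone
        for m in range(1, n_harmonics + 1):    # partials of the upper tone
            beat = abs(k * f_root_hz - m * f_root_hz * ratio)
            if beat != 0:                      # coinciding partials do not beat
                beats.append(beat)
    return sorted(beats)


def count_within(beats, max_beat_hz):
    """Number of partial pairs whose beat frequency is at or below the cutoff."""
    return sum(1 for b in beats if b <= max_beat_hz)


FIFTH = Fraction(3, 2)   # assumed just-intonation ratio
CUTOFF_HZ = 150          # hypothetical limit, for illustration only

for root_hz in (220, 440, 880):
    beats = partial_pair_beats(root_hz, FIFTH)
    print(root_hz, float(min(beats)), count_within(beats, CUTOFF_HZ))
```

Under these assumptions every beat frequency scales in proportion to the root frequency, so partial pairs that beat too rapidly at a 440-Hz root can fall within the cutoff at a 220-Hz root.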

5.3 Pitch

An assumption throughout this work has been that pitch is based on a neural repre-

sentation of the stimulus fundamental frequency. We have focused on how neural ISI

distributions represent the stimulus frequency and how they may be related to the

pitch percept but we should also discuss other neural representations of the stimulus,

namely rate/place and phase/place representations.

The tonotopic frequency response of the basilar membrane is reflected in the

array of AN fiber activity so that stimulus frequency can be estimated from the

discharge rate profile across the whole nerve. This form of representation occurs

for all stimulus frequencies to which the basilar membrane responds but it is highly

susceptible to saturation, especially in the presence of noise. Kim et al. (1990) showed

that saturation becomes less of a problem if one looks only at low-spontaneous-rate

fibers and operates on each fiber's driven rate normalized by its standard deviation

rather than on its raw discharge rate. Rice et al. (1995) examined the rate difference

(between stimulus and no-stimulus conditions) and showed that this representation

also performs better than raw discharge rate. Saturation is not a problem for pure

tones in quiet but it is when they are presented in noise (Siebert, 1970).

Another representation of stimulus frequency lies in the phase pattern or the

phase difference of the AN response. For low frequencies (<∼5 kHz) the AN response

is phase-locked to the stimulus and fibers that innervate the basilar membrane at

points separated by one spatial wavelength (or any integer multiple) fire at the same

phase. A coincidence detector with inputs from two specific points on the basilar

membrane would be tuned to the spatial wavelength defined by the two points (and

its corresponding frequency). A network of such coincidence detectors could use

the phase of the response across the nerve to estimate frequency (Loeb, White, and

Merzenich, 1983; Shamma and Klein, 2000). One weakness of such models is that they

require interaction across the full CF range of the auditory system, a requirement for

which there is no physiological evidence. A similar coincidence detection mechanism,

operating on fibers close in CF, has been postulated as a basis for level and frequency

discrimination (Carney, 1994; Heinz, Carney, and Colburn, 1999). These models use

the fact that responses of fibers innervating closely neighboring portions of the basilar

membrane become more coincident as stimulus level increases.

A complete study of neural correlates of pitch effects would examine all three of

these neural representations of frequency. However, accurate characterization of the

rate/place or phase/place representations requires precise measurement of the spatial

(across characteristic frequency (CF)) distribution of AN activity. One could attempt

to perform a population study of single-unit recordings from a single animal but this

would almost certainly not give a clear-cut result for the prediction of pitch effects

as subtle as the octave enlargement effect. Another method to precisely examine

the spatial variation of AN activity is to locally sweep the stimulus frequency and

examine the changes in a single fiber (May and Huang, 1997; Cariani and Delgutte,

1996b). A transformation can then be made to estimate the response of nearby fibers,

assuming closely spaced fibers respond in a similar fashion. However, attempts by this

author using this method to show a rate/place correlate to the octave enlargement

effect have been inconclusive. This is not to say that these neural representations

of frequency would not demonstrate correlates of subtle pitch effects, only that such correlates

are exceedingly difficult to measure. Nor does it detract from the

positive correlate of the octave enlargement effect that we see in ISI distributions.

An enduring issue for temporal models of pitch is the degradation of phase-locking

at progressively higher centers of the auditory system. If pitch is truly based on

ISI distributions from the auditory nerve, what happens to this interval code by

the time it reaches the IC? One possibility is that it is converted, prior to the IC,

into an alternative code, such as a rate/place representation. If this were the case,

however, one would expect to see much sharper tuning in individual IC neurons

than is generally seen. Alternatively, the interval code may still exist, albeit in a

much smaller subset of neurons than at lower-level nuclei. Phase-locking in single

IC neurons has been seen for frequencies as high as 1,200 Hz but in relatively few

neurons (Kuwada et al., 1984). This may still be enough to encode pitch information

because, as Siebert (1970) demonstrated, very few neurons are required to reliably

encode stimulus frequency in a temporal manner. This is clearly an area for further

investigation.

5.4 Consonance and dissonance

The percepts of consonance and dissonance are less obvious than that of pitch, which

makes it difficult to interpret the psychoacoustic data on dissonance and to

correlate them quantitatively with neurophysiological responses. Investigators have

used a variety of terms to convey to subjects the meaning of consonance (pleasant-

ness, smoothness, purity, fusion, clearness) and dissonance (unpleasantness, rough-

ness, turbidity) (Kameoka and Kuriyagawa, 1969a; Kameoka and Kuriyagawa, 1969b;

Plomp and Levelt, 1965; Malmberg, 1917; Kaestner, 1909), while some studies have

attempted to correlate subjects’ ratings across the different criteria (van de Geer,

Levelt, and Plomp, 1962; Guernsey, 1928). Other investigators have avoided the use

of the terms consonance and dissonance altogether and instead asked subjects to rate

the tension of a particular chord, which is considered to be a functional effect of

dissonance in music (Pressnitzer et al., 2000; Bigand, Parncutt, and Lerdahl, 1996).

Associated with this relatively ambiguous description of the percept is an incom-

plete agreement on the exact rank order of dissonances for all 12 intervals within the

Western diatonic scale. An additional confounding factor is the use (across studies)

of stimuli with differing spectra, as relative component strength has been shown to

affect judgements of dissonance (Kameoka and Kuriyagawa, 1969b). As a result, we

have not tried to demonstrate, in this set of studies, a quantitative correlation be-

tween the psychoacoustic data and the physiological measures. However, despite the

differing methods and results across studies, there is a clear general agreement on the

most consonant (Unison, Octave, Perfect 5th, Perfect 4th) and dissonant (Minor 2nd,

Major 2nd) intervals, as well as on the complex-tone Tritone sounding more dissonant

than the Perfect 4th and Perfect 5th. The broad consensus of these psychophysical

relations, on which our studies are based, suggests that meaningful comparisons to

neurophysiological responses can be made.

5.5 Conclusions

Our findings illustrate the complexity and specificity of temporal neural processing at

multiple resolutions in the auditory periphery, brainstem and midbrain. In addition,

they show that musical percepts generally considered to be “high order”, such as the

dissonance of musical intervals, have direct neural correlates in low- and mid-level

nuclei of the auditory system.

5.6 Future Work

The general principle involved in finding a neural correlate of the octave enlarge-

ment effect in Chapter 2 could be applied to search for neural correlates of different

pitch effects using temporal as well as rate- and phase-place models. A reliable

model for pitch, based on its underlying neural code, should predict the pitch of all

stimuli under all conditions, including the pitch-intensity effect (Verschuure and van

Meeteren, 1975; Stevens, 1935; Fletcher, 1934), changes of pitch due to the presence

of noise (Stoll, 1985), post-stimulus pitch effects (Hall 3rd and Soderquist, 1982) and

dichotic pitches (Houtsma and Goldstein, 1972; Bilsen, 1977; Hartmann and McMil-

lon, 2001). A battery of tests such as these could help shed light on the neural code

for pitch in the brainstem. Many of the pitch effects listed above have been inves-

tigated psychoacoustically, but neural responses to the same stimuli have not been

examined. This could be done with multiple coding schemes in mind to examine the

relative capability of each to mediate the pitch effects and, thus, pitch overall.

Further study of the psychophysics of consonance and dissonance is required in

order to advance our knowledge of the underlying neural code. A thorough investi-

gation using a full pair-wise comparison method should compare responses to both

diotic and dichotic stimuli, use both pure and complex tones, and look at the effects

of spectra and fundamental frequency (octave range). Also, the relative contributions

to dissonance of pitch (of the fundamental bass) and roughness should be examined.

One possible method would be to separate the temporal envelope and fine structure

of tone pairs using the “auditory chimera” method of Smith, Delgutte and Oxen-

ham (2001) and examine which is more pertinent to the perceived dissonance of the

tone pair. Roughness cues would be associated with the temporal envelope while

pitch would be based on the fine structure.
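
The distinction between envelope and fine structure can be made concrete for the simplest case. The following Python sketch checks numerically that the sum of two equal-amplitude pure tones equals the product of a slowly varying envelope term, whose magnitude repeats at the beat rate, and a rapid fine-structure term at the mean frequency. This is only the standard trigonometric identity for a pure-tone pair and does not reproduce the chimera procedure itself; the tone frequencies and function names are hypothetical.

```python
import math


def tone_pair(f1_hz, f2_hz, t_s):
    """Sum of two equal-amplitude cosines at time t_s (seconds)."""
    return math.cos(2 * math.pi * f1_hz * t_s) + math.cos(2 * math.pi * f2_hz * t_s)


def envelope(f1_hz, f2_hz, t_s):
    """Slowly varying factor; its magnitude repeats at the beat rate |f2 - f1|."""
    return 2 * math.cos(math.pi * (f2_hz - f1_hz) * t_s)


def fine_structure(f1_hz, f2_hz, t_s):
    """Rapid carrier at the mean of the two tone frequencies."""
    return math.cos(math.pi * (f1_hz + f2_hz) * t_s)


F1_HZ, F2_HZ = 440.0, 466.0                  # hypothetical tone pair, 26-Hz beat rate
times_s = [n / 20000 for n in range(2000)]   # 100 ms sampled at 20 kHz

max_error = max(
    abs(tone_pair(F1_HZ, F2_HZ, t)
        - envelope(F1_HZ, F2_HZ, t) * fine_structure(F1_HZ, F2_HZ, t))
    for t in times_s
)
print(max_error < 1e-9)
```

For complex tones or unequal amplitudes no such closed form applies, which is one reason a signal-processing method would be needed to separate the two cues in practice.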

Bibliography

Adams, J. C. (1979). “Ascending projections to the inferior colliculus,” J. Comp. Neurol. 183, 519–538.

Adams, J. C. (1995). “Cytochemical mapping of the inferior colliculus,” Abstracts of the 18th Midwinter Meeting of the Association for Research in Otolaryngology 160.

Astl, J., Popelar, J., Kvasnak, E., and Syka, J. (1996). “Comparison of response properties of neurons in the inferior colliculus of guinea pigs under different anesthetics,” Audiology 35, 335–345.

Attneave, F. and Olson, R. K. (1971). “Pitch as a medium: A new approach to psychophysical scaling,” Am. J. Psychol. 84, 147–166.

Bartok, B. (1940). Mikrokosmos (Boosey and Hawkes, London), Vol. 1. 1987 Edition.

Biasutti, M. (1997). “Sharp low- and high-frequency limits on musical chord recognition,” Hear. Res. 105, 77–84.

Bigand, E., Parncutt, R., and Lerdahl, F. (1996). “Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training,” Percept. Psychophys. 58, 125–141.

Bilsen, F. A. (1977). “Pitch of noise signals: Evidence for a ‘central spectrum’,” J. Acoust. Soc. Am. 61, 150–161.

Bilsen, F. A. and Goldstein, J. L. (1974). “Pitch of dichotically delayed noise and its possible spectral basis,” J. Acoust. Soc. Am. 55, 292–296.

Boomsliter, P. and Creel, W. (1961). “The long pattern hypothesis in harmony and hearing,” J. Music Theory 5, 2–30.

Bourk, T. R. (1976). “Electrical responses of neural units in the anteroventral cochlear nucleus of the cat,” Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

Bregman, A. S. (1990). Auditory scene analysis: the perceptual organization of sound (The MIT Press, Cambridge, Massachusetts).

Bregman, A. S. and Pinker, S. (1978). “Auditory streaming and the building of timbre,” Canad. J. Psychol. 32, 19–31.

Burns, E. M. (2001). “Personal communication.”

Burns, E. M. and Ward, W. D. (1976). “Perception of monotic and dichotic harmonic musical intervals,” J. Acoust. Soc. Am. (abst) 59, S52.

Cariani, P. A. and Delgutte, B. (1996a). “Neural correlates of the pitch of complex tones. I. Pitch and pitch salience,” J. Neurophysiol. 76, 1698–1716.

Cariani, P. A. and Delgutte, B. (1996b). “Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch,” J. Neurophysiol. 76, 1717–1734.

Carney, L. H. (1994). “Spatiotemporal encoding of sound level: models for normal encoding and recruitment of loudness,” Hear. Res. 76, 31–44.

Covey, E., Kauer, J. A., and Casseday, J. H. (1996). “Whole-cell patch-clamp recording reveals subthreshold sound-evoked postsynaptic currents in the inferior colliculus of awake bats,” J. Neurosci. 16, 3009–3018.

Darling, A. M. (1991). “Properties and implementation of the GammaTone filter: a tutorial,” in Speech, Hearing, and Language Work in Progress (University College London, Department of Phonetics and Linguistics, London), Vol. 5.

de Cheveigné, A. (1985). “A nerve fiber discharge model for the study of pitch,” in Transactions of the Committee on Speech Research/Hearing Research (The Acoustical Society of Japan, Tokyo), pp. 279–286. S85-37 (September 19, 1985).

Delgutte, B. (1990). “Physiological mechanisms of psychophysical masking: observations from auditory-nerve fibers,” J. Acoust. Soc. Am. 87, 791–809.

Delgutte, B., Hammond, B. M., and Cariani, P. A. (1998). “Neural coding of the temporal envelope of speech: Relation to modulation transfer functions,” in Psychophysical and Physiological Advances in Hearing, edited by A. R. Palmer, A. Rees, A. Q. Summerfield, and R. Meddis (Whurr, London), pp. 595–603. Proceedings of the 11th International Symposium on Hearing, Grantham, U.K., 1-6th August, 1997.

Delgutte, B., Hammond, B. M., and Cariani, P. A. (2000). “Neural coding of the temporal envelope of speech,” in Listening to Speech, edited by S. Greenberg and W. Ainsworth (Oxford University Press, New York), in press.

Delgutte, B., Joris, P. X., Litovsky, R. Y., and Yin, T. C. T. (1999). “Receptive fields and binaural interactions for virtual-space stimuli in the cat inferior colliculus,” J. Neurophysiol. 81, 2833–51.

Delgutte, B. and Oxenham, A. J. (2001). “Auditory chimeras,” Abstracts of the 24th Midwinter Meeting of the Association for Research in Otolaryngology 623.

Demany, L. and Semal, C. (1990). “Harmonic and melodic octave templates,” J. Acoust. Soc. Am. 88, 2126–2135.

Dobbins, P. A. and Cuddy, L. L. (1982). “Octave discrimination: An experimental confirmation of the ‘stretched’ subjective octave,” J. Acoust. Soc. Am. 72, 411–415.

Doughty, J. M. and Garner, W. R. (1947). “Pitch characteristics of short tones. I. Two kinds of pitch threshold,” J. Exp. Psychol. 37, 351–365.

Doughty, J. M. and Garner, W. R. (1948). “Pitch characteristics of short tones. II. Pitch as a function of tonal duration,” J. Exp. Psychol. 38, 478–494.

Dowling, W. J. and Harwood, D. L. (1986). Music Cognition (Academic, San Diego), Series in Cognition and Perception.

Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap (Chapman & Hall, New York), Monographs on Statistics and Applied Probability.

Evans, E. F. (1983). “Pitch and cochlear nerve fibre temporal discharge patterns,” in Hearing: Physiological Bases and Psychophysics, edited by R. Klinke and R. Hartmann (Springer-Verlag, Berlin), pp. 140–146.

Fastl, H. (1990). “The hearing sensation roughness and neuronal responses to am-tones,” Hear. Res. 46, 293–296.

Fay, R. R. (1988). Hearing in Vertebrates: A Psychophysics Databook (Hill-Fay Associates, Winnetka, Illinois).

Feeney, M. P. (1997). “Dichotic beats of mistuned consonances,” J. Acoust. Soc. Am. 102, 2333–2342.

Fishman, Y. I., Reser, D. H., Arezzo, J. C., and Steinschneider, M. (2000). “Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness,” J. Acoust. Soc. Am. 108, 235–246.

Fletcher, H. (1934). “Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency and the overtone structure,” J. Acoust. Soc. Am. 6, 59–69.

Frisina, R. D., Smith, R. L., and Chamberlain, S. C. (1990). “Encoding of amplitude modulation in the gerbil cochlear nucleus. I. A hierarchy of enhancement,” Hear. Res. 44, 99–122.

Fullerton, B. (1993). “Brainstem nuclei project preferentially to different parts of the IC central nucleus,” Unpublished figure.

Gaumond, R. P., Kim, D. O., and Molnar, C. E. (1983). “Response of cochlear nerve fibers to brief acoustic stimuli: Role of discharge-history effects,” J. Acoust. Soc. Am. 74, 1392–1398.

Gaumond, R. P., Molnar, C. E., and Kim, D. O. (1982). “Stimulus and recovery dependence of cat cochlear nerve fiber spike discharge probability,” J. Neurophysiol. 48, 856–873.

Goldstein, J. L. (1973). “An optimum processor theory for the central formation of the pitch of complex tones,” J. Acoust. Soc. Am. 54, 1496–1516.

Goldstein, J. L. and Srulovicz, P. (1977). “Auditory-nerve spike intervals as an adequate basis for aural frequency measurement,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London), pp. 337–346.

Greenberg, S. (1986). “Comment after paper by E. F. Evans on page 253,” in Auditory Frequency Selectivity, edited by B. J. C. Moore and R. D. Patterson (Plenum Press, New York), Vol. 119 of NATO ASI Series A: Life Sciences, pp. 263–264.

Greenberg, S., Marsh, J. T., Brown, W. S., and Smith, J. C. (1987). “Neural temporal coding of low pitch. I. Human frequency-following responses to complex tones,” Hear. Res. 25, 91–114.

Guernsey, M. (1928). “The role of consonance and dissonance in music,” Am. J. Psychol. 40, 173–204.

Gulick, W. L., Gescheider, G. A., and Frisina, R. D. (1989). Hearing: physiological acoustics, neural coding, and psychoacoustics (Oxford University Press, New York).

Hall 3rd, J. W. and Soderquist, D. R. (1982). “Transient complex and pure tone pitch changes by adaptation,” J. Acoust. Soc. Am. 71, 665–670.

Hartmann, W. M. (1993). “On the origin of the enlarged melodic octave,” J. Acoust. Soc. Am. 93, 3400–3409.

Hartmann, W. M. and McMillon, C. D. (2001). “Binaural coherence edge pitch,” J. Acoust. Soc. Am. 109, 294–305.

Heinz, M. G., Carney, L. H., and Colburn, H. S. (1999). “Monaural, cross-frequency coincidence detection as a mechanism for decoding perceptual cues provided by the cochlear amplifier,” J. Acoust. Soc. Am. (abst) 105, 1023.

Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures (Wiley, New York).

Houtsma, A. J. M. (1984). “Pitch salience of various complex sounds,” Music Perception 1, 296–307.

Houtsma, A. J. M. and Goldstein, J. L. (1972). “The central origin of the pitch of complex tones: Evidence from musical interval recognition,” J. Acoust. Soc. Am. 51, 520–529.

Houtsma, A. J. M., Rossing, T. D., and Wagenaars, W. M. (1987). “Auditory demonstrations,” Compact Disc. Acoustical Society of America, Eindhoven, Netherlands.

Hulse, S. H., Bernard, D. J., and Braaten, R. F. (1995). “Auditory discrimination of chord-based spectral structures by European Starlings (Sturnus vulgaris),” J. Exp. Psych. Gen. 124, 409–423.

Jeppesen, K. (1927). The Style of Palestrina and the Dissonance (Oxford University Press, Oxford).

Johnson, D. H. (1980). “The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones,” J. Acoust. Soc. Am. 68, 1115–1122.

Joris, P. X. and Yin, T. C. T. (1998). “Envelope coding in the lateral superior olive. III. Comparison with afferent pathways,” J. Neurophysiol. 79, 253–269.

Kaernbach, C. and Demany, L. (1998). “Psychophysical evidence against the autocorrelation theory of auditory temporal processing,” J. Acoust. Soc. Am. 104, 2298–2306.

Kaestner, G. (1909). “Untersuchungen über den Gefühlseindruck unanalysierter Zweiklänge,” Psychol. Studien 4, 473–504.

Kameoka, A. and Kuriyagawa, M. (1969a). “Consonance theory part I: Consonance of dyads,” J. Acoust. Soc. Am. 45, 1451–1459.

Kameoka, A. and Kuriyagawa, M. (1969b). “Consonance theory part II: Consonance of complex tones and its calculation method,” J. Acoust. Soc. Am. 45, 1460–1469.

Kiang, N. Y. S. (1980). “Peripheral neural processing of auditory information,” in Handbook of Physiology, edited by I. Darian-Smith (American Physiological Society, Bethesda, MD).

Kiang, N. Y. S. (1990). “Curious oddments of auditory-nerve studies,” Hear. Res. 49, 1–16.

Kiang, N. Y. S. and Moxon, E. C. (1972). “Physiological considerations in artificial stimulation of the inner ear,” Ann. Otol. Rhinol. Laryngol. 81, 714–730.

Kiang, N. Y. S. and Moxon, E. C. (1974). “Tails of tuning curves of auditory-nerve fibers,” J. Acoust. Soc. Am. 55, 620–630.

Kiang, N. Y. S., Moxon, E. C., and Levine, R. A. (1970). “Auditory-nerve activity in cats with normal and abnormal cochleas,” in Sensorineural Hearing Loss, edited by G. E. W. Wolstenholme and J. Knight (J. & A. Churchill, London), pp. 241–273.

Kiang, N. Y. S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965). Discharge Patterns of Single Fibers in the Cat’s Auditory Nerve (The MIT Press, Cambridge, MA).

Kim, D. O., Chang, S. O., and Sirianni, J. G. (1990). “A population study of auditory-nerve fibers in unanesthetized decerebrate cats: response to pure tones,” J. Acoust. Soc. Am. 87, 1648–55.

Krishna, B. S. and Semple, M. N. (2000). “Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus,” J. Neurophysiol. 84, 255–273.

Kuwada, S., Batra, R., and Stanford, T. R. (1989). “Monaural and binaural response properties of neurons in the inferior colliculus of the rabbit: Effects of sodium pentobarbital,” J. Neurophysiol. 61, 269–282.

Kuwada, S., Batra, R., Yin, T. C. T., Oliver, D. L., Haberly, L. B., and Stanford, T. R. (1997). “Intracellular recordings in response to monaural and binaural stimulation of neurons in the inferior colliculus of the cat,” J. Neurosci. 17, 7565–7581.

Kuwada, S., Stanford, T. R., and Batra, R. (1987). “Interaural phase-sensitive units in the inferior colliculus of the unanesthetized rabbit: Effects of changing frequency,” J. Neurophysiol. 57, 1338–1360.

Kuwada, S. and Yin, T. C. T. (1983). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. I. Effects of long interaural delays, intensity, and repetition rate on interaural delay function,” J. Neurophysiol. 50, 981–999.

Kuwada, S., Yin, T. C. T., Syka, J., Buunen, T. J. F., and Wickesberg, R. E. (1984). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. IV. Comparison of monaural and binaural response properties,” J. Neurophysiol. 51, 1306–1325.

Langner, G. and Schreiner, C. E. (1988). “Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms,” J. Neurophysiol. 60, 1799–1822.

Liberman, M. C. and Kiang, N. Y. S. (1978). “Acoustic trauma in cats; cochlear pathology and auditory-nerve activity,” Acta Oto-Laryngologica Suppl. 358, 1–63.

Liberman, M. C. and Kiang, N. Y. S. (1984). “Single-neuron labeling and chronic cochlear pathology. IV. Stereocilia damage and alterations in rate- and phase-level functions,” Hear. Res. 16, 75–90.

Licklider, J. C. R. (1951). “A duplex theory of pitch perception,” Experientia 7, 128–134.

Licklider, J. C. R. (1956). “Auditory frequency analysis,” in Information Theory, edited by C. Cherry (Butterworths, London), pp. 253–268.

Licklider, J. C. R., Webster, J. C., and Hedlun, J. M. (1950). “On the frequency limits of binaural beats,” J. Acoust. Soc. Am. 22, 468–473.

Loeb, G. E., White, M. W., and Merzenich, M. M. (1983). “Spatial cross-correlation. A proposed mechanism for acoustic pitch perception,” Biol. Cybern. 47, 149–63.

Malmberg, C. F. (1917). “The perception of consonance and dissonance,” Psychol. Monogr. 25, 93–133.

May, B. J. and Huang, A. Y. (1997). “Spectral cues for sound localization in cats: a model for discharge rate representations in the auditory nerve,” J. Acoust. Soc. Am. 101, 2705–19.

McKinney, M. F. and Delgutte, B. (1999). “A possible neurophysiological basis of the octave enlargement effect,” J. Acoust. Soc. Am. 106, 2679–2692.

McKinney, M. F., Tramo, M. J., and Delgutte, B. (2001a). “Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation,” Ph.D. Thesis Chapter 3 (Unpublished).

McKinney, M. F., Tramo, M. J., and Delgutte, B. (2001b). “Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience,” Ph.D. Thesis Chapter 4 (Unpublished).

Merzenich, M. M., Gardi, J. N., and Vivion, M. C. (1983). “Animals,” in Bases of auditory brain-stem evoked responses, edited by E. J. Moore (Grune & Stratton, New York), pp. 391–412.

Moon, T. K. (1996). “The expectation-maximization algorithm,” IEEE Signal Processing Magazine Nov., 47–60.

Nuding, S. C., Chen, G.-D., and Sinex, D. G. (1999). “Monaural response properties of single neurons in the chinchilla inferior colliculus,” Hear. Res. 131, 89–106.

Ohgushi, K. (1978). “On the role of spatial and temporal cues in the perception of the pitch of complex tones,” J. Acoust. Soc. Am. 64, 764–771.

Ohgushi, K. (1983). “The origin of tonality and a possible explanation of the octave enlargement phenomenon,” J. Acoust. Soc. Am. 73, 1694–1700.

Partch, H. (1974). Genesis of a Music (Da Capo Press, New York).

Patterson, R. D., Peters, R. W., and Milroy, R. (1983). “Threshold duration formelodic pitch,” in Hearing: Physiological Bases and Psychophysics, edited byR. Klinke and R. Hartmann (Springer-Verlag, Berlin), pp. 321–325.

Perkel, D. H., Gerstein, G. L., and Moore, G. P. (1967). “Neuronal spike trains andstochastic point processes. I. The single spike train,” Biophys. J. 7, 391–418.

Perrott, D. R. and Nelson, M. A. (1969). “Limits for the detection of binaural beats,”J. Acoust. Soc. Am. 46, 1477–1481.

Plomp, R. and Levelt, W. J. M. (1965). “Tonal consonance and critical bandwidth,”J. Acoust. Soc. Am. 38, 548–560.

Plomp, R. and Steeneken, H. J. M. (1968). “Interference between two simple tones,”J. Acoust. Soc. Amer. 43, 883.

Pollack, I. (1967). “Number of pulses required for minimal pitch,” J. Acoust. Soc.Am. 42, 895.

Pressnitzer, D., McAdams, S., Winsberg, S., and Fineberg, J. (2000). “Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness,” Perception & Psychophysics 62, 66–80.

Pythagoras (c. 540–510 B.C.). Cited by von Helmholtz (1863).

Rameau, J.-P. (1722). Treatise on Harmony (Dover Publications, Inc., New York). Translated by P. Gossett (1971).

Randel, D. M. (1978). Harvard Concise Dictionary of Music (The Belknap Press of Harvard University Press, Cambridge, Massachusetts).

Redner, R. A. and Walker, H. F. (1984). “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review 26, 195–239.

Rees, A. and Møller, A. R. (1983). “Responses of neurons in the inferior colliculus of the rat to AM and FM tones,” Hear. Res. 10, 301–330.

Rees, A. and Møller, A. R. (1987). “Stimulus properties influencing the responses of inferior colliculus neurons to amplitude-modulated sounds,” Hear. Res. 27, 129–143.

Rees, A. and Palmer, A. R. (1989). “Neuronal responses to amplitude-modulated and pure-tone stimuli in the guinea pig inferior colliculus, and their modification by broadband noise,” J. Acoust. Soc. Am. 85, 1978–1994.

Rees, A., Sarbaz, A., Malmierca, M. S., and Beau, F. E. N. L. (1997). “Regularity of firing of neurons in the inferior colliculus,” J. Neurophys. 77, 2945–2965.

Rhode, W. S. (1995). “Interspike intervals as a correlate of periodicity pitch in cat cochlear nucleus,” J. Acoust. Soc. Am. 97, 2414–2429.

Rhode, W. S. and Greenberg, S. (1994). “Encoding of amplitude modulation in the cochlear nucleus of the cat,” J. Neurophys. 71, 1797–1825.

Rice, J. J., Young, E. D., and Spirou, G. A. (1995). “Auditory-nerve encoding of pinna-based spectral cues: rate representation of high-frequency stimuli,” J. Acoust. Soc. Am. 97, 1764–1776.

Rodieck, R. W. (1967). “Maintained activity of cat retinal ganglion cells,” J. Neurophys. 30, 1043–1071.

Rodieck, R. W., Kiang, N. Y. S., and Gerstein, G. L. (1962). “Some quantitative methods for the study of spontaneous activity of single neurons,” Biophys. J. 2, 351–368.

Roederer, J. G. (1979). Introduction to the physics and psychophysics of music (Springer-Verlag, New York).

Rose, J. E., Brugge, J. F., Anderson, D. J., and Hind, J. E. (1967). “Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey,” J. Neurophysiol. 30, 769–793.

Rose, J. E., Brugge, J. F., Anderson, D. J., and Hind, J. E. (1968). “Patterns of activity in single auditory nerve fibers of the squirrel monkey,” in Hearing Mechanisms in Vertebrates, edited by A. V. S. de Reuck and J. Knight (Churchill, London), pp. 144–168.

Ruggero, M. A. (1973). “Response to noise of auditory nerve fibers in the squirrel monkey,” J. Neurophysiol. 36, 569–587.

Ruggero, M. A., Rich, N. C., Shivapuja, B. G., and Temchin, A. N. (1996). “Auditory-nerve responses to low-frequency tones: Intensity dependence,” Aud. Neurosci. 2, 159–185.

Schellenberg, E. G. and Trainor, L. J. (1996). “Sensory consonance and the perceptual similarity of complex-tone harmonic intervals: Tests of adult and infant listeners,” J. Acoust. Soc. Am. 100, 3321–3328.

Semple, M. N. and Aitkin, L. M. (1979). “Representation of sound frequency and laterality by units in central nucleus of cat inferior colliculus,” J. Neurophysiol. 42, 1626–1639.

Sethares, W. A. (1999). Tuning, Timbre, Spectrum, Scale (Springer-Verlag, London).

Shamma, S. and Klein, D. (2000). “The case of the missing pitch templates: How harmonic templates emerge in the early auditory system,” J. Acoust. Soc. Am. 107, 2631–2644.

Siebert, W. M. (1970). “Frequency discrimination in the auditory system: Place or periodicity mechanisms?,” Proc. IEEE 58, 723–730.

Smith, J. C., Marsh, J. T., and Brown, W. S. (1975). “Far-field recorded frequency-following responses: Evidence for the locus of brainstem sources,” Electroencephalogr. Clin. Neurophysiol. 39, 465–472.

Stevens, S. S. (1935). “The relation of pitch to intensity,” J. Acoust. Soc. Am. 6, 150–154.

Stoll, G. (1985). “Pitch shift of pure and complex tones induced by masking noise,” J. Acoust. Soc. Am. 77, 188–192.

Stumpf, C. (1890). Tonpsychologie (S. Hirzel, Leipzig).

Sundberg, J. E. F. and Lindqvist, J. (1973). “Musical octaves and pitch,” J. Acoust. Soc. Am. 54, 922–929.

Tenney, J. (1988). A History of ‘Consonance’ and ‘Dissonance’ (Excelsior Music Publishing Company, New York).

Terhardt, E. (1968a). “Über akustische Rauhigkeit und Schwankungsstärke,” Acustica 20, 215–224.

Terhardt, E. (1968b). “Über die durch amplitudenmodulierte Sinustöne hervorgerufene Hörempfindung,” Acustica 20, 210–214.

Terhardt, E. (1971). “Die Tonhöhe harmonischer Klänge und das Oktavintervall,” Acustica 24, 126–136.

Terhardt, E. (1974a). “On the perception of periodic sound fluctuations (roughness),” Acustica 30, 201–213.

Terhardt, E. (1974b). “Pitch, consonance, and harmony,” J. Acoust. Soc. Am. 55, 1061–1069. Virtual pitch, place.

Terhardt, E. (1974c). “Pitch of pure tones: its relation to intensity,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 353–360. Proceedings of the Symposium on Psychophysical Models and Physiological Facts in Hearing.

Terhardt, E. (1977). “The two-component theory of musical consonance,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic Press, London), pp. 381–390. An International Symposium, University of Keele, 12–16 April 1977.

Terhardt, E. (1984). “The concept of musical consonance: A link between music and psychoacoustics,” Music Perception 1, 276–295.

Thurlow, W. R. and Bernstein, S. (1957). “Simultaneous two-tone pitch discrimination,” J. Acoust. Soc. Am. 29, 515–519.

Tobias, J. V. (1963). “Application of a ‘relative’ procedure to a problem in binaural-beat perception,” J. Acoust. Soc. Am. 35, 1442–1447.

Tramo, M. J., Cariani, P. A., and Delgutte, B. (1992). “Representation of tonal consonance and dissonance in the temporal firing patterns of auditory-nerve fibers,” Soc. Neurosci. Abstr. 18, 382.

Tramo, M. J., Cariani, P. A., Delgutte, B., and Braida, L. D. (2001). “Neurobiological foundations for the theory of harmony in Western tonal music,” Annals of the New York Academy of Sciences 930, 92–116.

Tramo, M. J., Cariani, P. A., McKinney, M. F., and Delgutte, B. (2000). “Neural coding of tonal consonance and dissonance,” Abstracts of the 23rd Midwinter Meeting of the Association for Research in Otolaryngology 5641.

van de Geer, J. P., Levelt, W. J. M., and Plomp, R. (1962). “The connotation of musical consonance,” Acta Psychol. 20, 308–319.

van den Brink, G. (1974). “Monotic and dichotic pitch matchings with complex sounds,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 178–188.

Verschuure, J. and van Meeteren, A. A. (1975). “The effect of intensity on pitch,” Acustica 32, 33–44.

Vogel, A. (1974). “Roughness and its relation to the time-pattern of psychoacoustical excitation,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 241–250.

von Békésy, G. (1960). Experiments in Hearing (McGraw-Hill, New York).

von Helmholtz, H. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (F. Vieweg und Sohn, Braunschweig).

Walliser, V. (1969). “Über die Spreizung von empfundenen Intervallen gegenüber mathematisch harmonischen Intervallen bei Sinustönen,” Frequenz 23, 139–143.

Ward, W. D. (1954). “Subjective musical pitch,” J. Acoust. Soc. Am. 26, 369–380.

Wright, J. K. and Bregman, A. S. (1987). “Auditory stream segregation and the control of dissonance in polyphonic music,” in Music and psychology: a mutual regard, edited by S. McAdams (Harwood Academic Publishers, London), Vol. 2 of Contemporary Music Review, pp. 63–92.

Yin, T. C. T., Chan, J. C. K., and Carney, L. H. (1987). “Effects of interaural time delays of noise stimuli on low-frequency cells in the cat’s inferior colliculus. II. Evidence for cross-correlation,” J. Neurophysiol. 58, 562–583.

Yin, T. C. T. and Kuwada, S. (1983). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. II. Effects of changing rate and direction of interaural phase,” J. Neurophysiol. 50, 1000–1019.

Zwicker, E. and Fastl, H. (1999). Psychoacoustics: Facts and models (Springer-Verlag, Berlin), 2nd ed., Vol. 22 of Springer series on information sciences, Chap. Roughness, pp. 257–264.
