Neural correlates of pitch and roughness: Toward
the neural code for melody and harmony
perception
by
Martin Franciscus McKinney
Submitted to the Harvard-MIT Division of Health Sciences and Technology
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
July 2001
© Martin Franciscus McKinney, MMI. All rights reserved.
The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document
in whole or in part.
Author: Harvard-MIT Division of Health Sciences and Technology
July 30, 2001
Certified by: Bertrand Delgutte
Associate Professor of Otology and Laryngology, Harvard Medical School
Thesis Supervisor
Accepted by: Martha L. Gray, PhD
Edward Hood Taplin Professor of Medical and Electrical Engineering
Co-director, Harvard-M.I.T. Division of Health Sciences and Technology
Neural correlates of pitch and roughness: Toward the neural
code for melody and harmony perception
by
Martin Franciscus McKinney
Submitted to the Harvard-MIT Division of Health Sciences and Technology on July 30, 2001, in partial fulfillment of the
requirements for the degree of Doctor of Philosophy
Abstract
The universality of many aspects of music, such as octave-based tuning systems and the use of dissonance and consonance to create harmonic tension and resolution, suggests that their perception may have fundamental neurophysiological bases. Thus, music provides a natural set of stimuli and associated percepts with which the auditory system can be studied. Here, we seek correlates of pitch, the essential element of melody, and roughness, a primary component of dissonance, in responses of single auditory neurons in anesthetized cats.
Pitch, the perceived highness or lowness of sound, is generally thought to be based on a neurophysiological representation of frequency. Because neural responses (spikes) phase-lock to low stimulus frequencies, interspike intervals (ISIs) reflect the stimulus period and can be used to estimate frequency. To rigorously test this potential code for pitch, we look for correlates of pitch under conditions where the percept deviates from a simple function of frequency. One such condition is the octave enlargement effect, listeners’ preference for pure-tone octave ratios slightly greater than 2:1. Another is the pitch of a complex tone missing the fundamental frequency: the pitch matches that of the missing fundamental even when different harmonics are presented to opposite ears. We show that a correlate of the octave enlargement effect exists in ISIs of auditory nerve (AN) fibers and a correlate of the missing-fundamental pitch exists in ISIs of neurons in the inferior colliculus (IC), the principal auditory nucleus of the midbrain. Results also reveal greater degradation of pitch representation at the midbrain compared to the periphery.
Roughness, the sensation of temporal envelope fluctuations in the range of ∼20-200 Hz, is often equated with sensory dissonance. Here we examine IC neural responses for correlates of sensory dissonance. We show that sensory dissonance correlates with discharge rate fluctuations of all IC neurons and with average rates of a subset of IC neurons which only respond at the onset of pure tones. Results indicate that IC neurons are specifically important for the coding of the temporal envelope.
Our findings illustrate the complexity and specificity of auditory neural processing in the brainstem and midbrain and show that percepts generally considered to be high order, such as the dissonance of musical intervals, have direct correlates in neural responses in the midbrain. More generally they show that the auditory system performs processing important for music at multiple time scales.
Thesis Supervisor: Bertrand Delgutte
Title: Associate Professor of Otology and Laryngology, Harvard Medical School
Contents
1 Introduction
1.1 Pitch
1.2 Consonance and dissonance
1.3 Overview
1.3.1 Chapter 2: Octave enlargement
1.3.2 Chapter 3: Monaural/diotic dissonance
1.3.3 Chapter 4: Dichotic dissonance and pitch salience
2 A possible neurophysiological basis of the octave enlargement effect
2.1 Introduction
2.2 Method
2.2.1 Experiment
2.2.2 Analysis
2.3 Results
2.3.1 All-Order Interspike Intervals
2.3.2 First-Order Interspike Intervals
2.4 Model
2.4.1 Model for estimating pure-tone frequency
2.4.2 Model for octave matching
2.5 Discussion
2.5.1 Auditory Nerve Physiology
2.5.2 Temporal Models for Octave Matching and Pitch Perception
2.6 Conclusion
2.7 Appendix: The EM Algorithm
2.7.1 Gaussians with independent means and variances
2.7.2 Gaussians with harmonically related means and a common variance
3 Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation
3.1 Introduction
3.2 Method
3.2.1 Experiment
3.2.2 Analysis
3.3 Results
3.3.1 Responses to pure- and complex-tone pairs
3.3.2 Effect of level and PSTH type
3.3.3 Dependence on CF
3.3.4 Responses to a musical excerpt
3.3.5 Additional observations
3.4 Discussion
3.4.1 Neurophysiology
3.4.2 Psychophysics and perception
3.5 Conclusion
4 Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience
4.1 Introduction
4.2 Method
4.2.1 Experiment
4.2.2 Analysis
4.2.3 Model
4.3 Results
4.3.1 Dichotic tone pairs
4.3.2 Pitch analysis
4.4 Model Results
4.5 Discussion
4.5.1 Neurophysiology
4.5.2 Psychophysics and perception
4.6 Conclusion
5 Discussion
5.1 Summary of findings
5.1.1 AN ISIs and the octave enlargement effect
5.1.2 Correlates of dissonance in IC neural responses
5.2 Limitations of the neurophysiological data
5.2.1 Effect of anesthesia
5.2.2 Small sample sizes
5.2.3 Limited frequency range
5.3 Pitch
5.4 Consonance and dissonance
5.5 Conclusions
5.6 Future Work
Chapter 1
Introduction
The process of listening to music involves a complex set of cognitive functions, some
of which appear to be universal. Commonalities of musical systems across cultures,
including the use of octave-based scales and melodic and rhythmic contour (Dowling
and Harwood, 1986) as well as consonance and dissonance (Sethares, 1999), suggest
that there may be specific neurophysiological responses to various aspects of musical
stimuli. Thus, music provides us with a convenient standard set of stimulus rules
and associated percepts which we can use to study brain function. The general
strategy here is to use musical stimuli and percepts to gain an understanding of how
information is processed and coded in the auditory central nervous system (CNS).
Along with rhythm, melody and harmony constitute the primary components of
music in most cultures. The perception of melody is formed from the pitches of
a set of successive single tones, while the perception of harmony is based on the
changing quality of sequential sets of simultaneously sounding tones (chords). The
quality of chords ranges from dissonant (harsh) to consonant (pleasant). Harmonic
progressions typically cycle between these two extremes, providing a sense of building
tension followed by a resolution.
Melody and dissonance are based on two fundamental elements, pitch
and roughness, respectively. Pitch is defined as “the perceived highness or lowness
of sound” (Randel, 1978) and although it is primarily dependent on frequency, pitch
also depends somewhat on intensity (Fletcher, 1934; Stevens, 1935; Terhardt, 1974c;
Verschuure and van Meeteren, 1975) and the presence of other sounds (Stoll, 1985).
Pitch is also important in speech communication where it is used to convey meaning
through inflection, especially in tone languages such as Mandarin Chinese where pitch
contours provide lexical information. Roughness is the sensation produced by a sound
whose amplitude envelope fluctuates periodically at a rate of ∼20-300 Hz. It gives
the sound a harsh and unpleasant quality. Sensory dissonance, a quality of isolated
chords in the absence of contextual influences, is thought to be caused by roughness
generated by the beating of neighboring partials in a complex chord (von Helmholtz,
1863; Plomp and Levelt, 1965; Terhardt, 1974a; Terhardt, 1977).
My general thesis is that certain aspects of music perception can be attributed
to specific processing at particular stages in the auditory brainstem and midbrain.
More specifically, pitch effects, such as the octave enlargement effect, are hypothesized
to be encoded in interspike intervals of neurons in the auditory-nerve and cochlear
nucleus, while roughness, a fundamental component of dissonance, is hypothesized to
be encoded directly in the average discharge rates and temporal patterns of inferior-
colliculus neurons. The general goal of this research is to show quantitative correlates
of these perceptual phenomena in the activity of auditory neurons. Such quantitative
characterization of auditory processing may lead to more accurate computational
models for music and speech perception.
1.1 Pitch
A basic assumption made here, in relating neurophysiology to pitch, is that pitch is an
estimate, based on some neural representation, of the stimulus fundamental frequency.
Under certain conditions, i.e., pitch effects, the perceived pitch does not behave as
a simple function of frequency. Following from the above assumption, the neural
representation of frequency must also deviate from a simple function of frequency
under certain conditions. Thus, pitch effects may be useful in identifying the neural
code for pitch: by correlating neural responses with psychoacoustic behavior, one can
assess the relative importance of specific neural representations of frequency to the
overall pitch percept.
For low frequency tones, i.e., in the frequency range of musical pitch, stimulus
frequency is encoded in the interspike intervals of auditory-nerve fibers (Rose et al.,
1967). Cariani and Delgutte (1996a,b) showed that the pitch of a wide variety of
stimuli correlates with the most prominent interval in the distribution of auditory-
nerve all-order¹ ISIs. If the ISI is indeed the code for frequency on which pitch is
based, then ISI distributions should reflect the deviations in pitch which occur in the
presence of pitch effects.
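
The distinction between first-order and all-order intervals can be made concrete with a small sketch. The function names and the spike times below are hypothetical and chosen only for illustration; the pairwise-difference approach is one simple way to enumerate all-order intervals and is not claimed to be the procedure used in this work.

```python
import numpy as np

def first_order_intervals(spike_times):
    """Intervals between consecutive spikes only."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    return np.diff(t)

def all_order_intervals(spike_times, max_interval=np.inf):
    """Intervals between every pair of spikes, whatever the number of
    intervening spikes, keeping those no longer than max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    # Pairwise differences t[j] - t[i]; the upper triangle holds j > i.
    diffs = t[None, :] - t[:, None]
    intervals = diffs[np.triu_indices(len(t), k=1)]
    return np.sort(intervals[intervals <= max_interval])

# Made-up spike times (ms) locked to a 5 ms period, with one skipped cycle.
spikes_ms = [0.0, 5.0, 15.0, 20.0]
print(first_order_intervals(spikes_ms).tolist())  # [5.0, 10.0, 5.0]
print(all_order_intervals(spikes_ms).tolist())    # [5.0, 5.0, 10.0, 15.0, 15.0, 20.0]
```

The pairwise matrix needs memory proportional to the square of the spike count, so for long records a loop with a maximum interval would be a more practical choice.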
Stimulus frequency can also be obtained from aspects of neural responses other
than the ISI. The cochlear frequency map of the basilar membrane is reflected in
the tonotopic array of AN fiber activity so that stimulus frequency can be estimated
from the discharge rate profile across the whole nerve. In addition, the stimulus
frequency can be estimated from the phase pattern or phase difference of the AN
response. Neural codes for pitch based on these stimulus frequency representations
will be discussed below but the focus of the research is on ISI codes, partly because
of the practical difficulty in obtaining precise measurements of rate or phase across
the full distribution of AN fibers.
1.2 Consonance and dissonance
The consonance of two complex sounds has long been known to correlate with simple
ratios of fundamental frequency (Pythagoras, 540-410 B.C.). Noting that sounds with
fundamental frequencies related by simple ratios contain a large number of coincident
harmonics, von Helmholtz (1863) hypothesized that dissonance is caused by beating
between neighboring harmonics (when they are close but not coincident). Plomp and
Levelt (1965) showed that the dissonance of two complex tones could be predicted by
summing the roughness of their constituent neighboring (pure-tone) partials.
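
The summation idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Plomp and Levelt: the pairwise roughness curve uses a parametric fit of the kind popularized by Sethares, with constants (3.5, 5.75, 0.24, 0.021, 19) that are assumed here; all partials have equal amplitude; and roughness is summed over all pairs of partials rather than neighbors only (distant pairs contribute almost nothing). The function names are hypothetical.

```python
import math

# Assumed constants of a Sethares-style fit to the Plomp-Levelt curve.
A, B = 3.5, 5.75
D_STAR, S1, S2 = 0.24, 0.021, 19.0

def pair_roughness(f1, f2, a1=1.0, a2=1.0):
    """Roughness contribution of two pure-tone partials (arbitrary units)."""
    f_low, f_high = min(f1, f2), max(f1, f2)
    s = D_STAR / (S1 * f_low + S2)  # rescales the curve with frequency
    x = s * (f_high - f_low)
    return a1 * a2 * (math.exp(-A * x) - math.exp(-B * x))

def complex_tone(f0, n_harmonics=6):
    """Frequencies of the first n_harmonics partials of a harmonic tone."""
    return [f0 * k for k in range(1, n_harmonics + 1)]

def interval_dissonance(f0, ratio, n_harmonics=6):
    """Sum pairwise roughness over all partials of two complex tones."""
    partials = complex_tone(f0, n_harmonics) + complex_tone(f0 * ratio, n_harmonics)
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_roughness(partials[i], partials[j])
    return total

fifth = interval_dissonance(220.0, 3 / 2)
minor_second = interval_dissonance(220.0, 16 / 15)
print(fifth < minor_second)  # True
```

Under these assumptions the perfect fifth, whose harmonics partly coincide, scores lower than the minor second, whose harmonics fall close together without coinciding.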
¹ All-order interval distributions consist of intervals bounded by two spikes which contain any number (i.e., all numbers) of intervening spikes. All-order distributions are also called autocorrelation or auto-coincidence distributions (Perkel, Gerstein, and Moore, 1967; Rodieck, 1967; Ruggero, 1973; Evans, 1983).
In addition to this “roughness” theory of dissonance there have been other expla-
nations for the basis of consonance and dissonance. Stumpf’s (1890) fusion theory
states that sounds are consonant because their individual components fuse together
to form a single perceptual entity, more so than do dissonant sounds. Another theory
is the long wave hypothesis from Boomsliter and Creel (1961) which states that con-
sonance is based on the length of the overall period of a stimulus. They show that
consonant intervals (based on simple integer ratios of fundamental frequencies) have
shorter overall periods than do dissonant intervals. Finally, there is a pitch theory
which suggests that consonant harmonic intervals have a more perceptually salient
common fundamental bass frequency than do dissonant intervals. This idea stems
from Rameau’s (1722) theory of “basse fondamentale” and complements the notion
that sensory consonance is just a lack of roughness (sensory dissonance) (Tramo et al.,
2000). The plausibility of each of these theories is addressed in discussions below, but
the working assumption here is that sensory dissonance is based on roughness and
that consonance may be based on pitch salience. Chapters 3 and 4 focus on these
two theories.
1.3 Overview
The chapters that follow are written as self-contained scientific papers, each describing
a particular neural correlate of pitch or dissonance. Together, these findings illustrate
the complexity and specificity of neural processing in the auditory periphery, brain-
stem and midbrain as it pertains to music perception; they show that musical percepts
generally considered to be “high order”, such as the dissonance of musical intervals,
have direct neural correlates in low- and mid-level nuclei of the auditory CNS.
The work described below begins in the AN in Chapter 2 and moves directly to
the inferior colliculus (IC) in Chapters 3 and 4. One reason for making such a jump is
that the IC is known to be an obligatory synapse in the auditory pathway and thus the
processing that occurs at nuclei below the level of the IC should be evident in IC neural
responses. In addition, neurons at lower level nuclei, such as in the cochlear nucleus,
tend to encode the fine time structure of stimuli, similar to AN fibers, while neurons
in the IC respond more to the temporal envelope of stimuli (Delgutte, Hammond,
and Cariani, 1998; Joris and Yin, 1998). For our investigation of dissonance we are
more interested in the neural coding of the temporal envelope.
1.3.1 Chapter 2: A possible neurophysiological basis of the octave enlargement effect
The octave, a frequency ratio of 2:1, is the basis for most known music tuning sys-
tems. The pitches of two tones separated by an octave are deemed equivalent in the
context of a musical scale. While the physical octave is defined as a frequency ratio of
2:1, perceptually, listeners prefer slightly greater ratios. This preference, the octave
enlargement effect, occurs for a wide variety of stimulus conditions and in subjects
with various musical backgrounds (Ward, 1954; Walliser, 1969; Dobbins and Cuddy,
1982). In Chapter 2 we show that a neural correlate for the octave enlargement effect
exists in ISIs of AN fibers. This finding provides support for the idea that musical
pitch is encoded in AN ISIs.
1.3.2 Chapter 3: Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation.
Tramo et al. (1992; 2000) found a correlate of roughness in temporal discharge pat-
terns of auditory-nerve fibers. Their model for roughness operates on fibers grouped
by CF and uses bandpass filters to extract the temporal fluctuations in each CF band.
Their filter characteristics were based on the psychophysical dependence of roughness
on modulation frequency. It was noted that these filters resemble modulation transfer
functions (MTFs) of inferior colliculus (IC) neurons (Rees and Møller, 1983; Langner
and Schreiner, 1988; Fastl, 1990; Delgutte, Hammond, and Cariani, 1998). Based on
this observation, we examined responses of IC neurons to musical intervals for
correlates of roughness (sensory dissonance). We show, in Chapter 3, that correlates of
dissonance exist in the rate fluctuations of all IC neurons and in the average discharge
rates of a subpopulation of neurons.
1.3.3 Chapter 4: Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience.
Because many IC neurons respond to interaural phase differences (IPDs) (Yin and
Kuwada, 1983), it is likely that they would respond similarly to diotically- and dichot-
ically-presented musical intervals. This is interesting because dichotically presented
tones are thought not to produce a roughness sensation (e.g., Roederer, 1979) and
therefore would not be perceived as dissonant according to our working assumption
on the basis of dissonance. Similar neural responses to these stimuli that purportedly
differ in their perception would force us to rethink our conclusions on the neural code
for dissonance. In Chapter 4, we examine responses of IC neurons to both dichotic
and diotic presentation of musical intervals and show that some neurons do indeed
respond similarly to both types of stimuli. We offer several possible resolutions to this
“dichotic quandary” and also examine the representation of pitch in ISI histograms
of neural responses in the IC and look for neural correlates of consonance.
Chapter 2
A possible neurophysiological basis of the octave enlargement effect¹
Abstract
Although the physical octave is defined as a simple ratio of 2:1, listeners prefer slightly greater octave ratios. Ohgushi (J. Acoust. Soc. Am., 73, 1694-1700) suggested that a temporal model for octave matching would predict this octave enlargement effect because, in response to pure tones, auditory-nerve interspike intervals are slightly larger than the stimulus period. In an effort to test Ohgushi’s hypothesis, we collected auditory-nerve single-unit responses to pure-tone stimuli from Dial-anesthetized cats. We found that although interspike interval distributions show clear phase-locking to the stimulus, intervals systematically deviate from integer multiples of the stimulus period. Due to refractory effects, intervals smaller than 5 msec are slightly larger than the stimulus period and deviate most for small intervals. On the other hand, first-order intervals are smaller than the stimulus period for stimulus frequencies less than 500 Hz. We show that this deviation is the combined effect of phase-locking and multiple spikes within one stimulus period. A model for octave matching was implemented which compares frequency estimates of two tones based on their interspike interval distributions. The model quantitatively predicts the octave enlargement effect. These results are consistent with the idea that musical pitch is derived from auditory-nerve interspike interval distributions.
¹ Reprinted with permission from McKinney & Delgutte, “A possible physiological basis of the octave enlargement effect”, Journal of the Acoustical Society of America 106(5), 1999, pp. 2679-2692. © 1999, Acoustical Society of America.
2.1 Introduction
The octave is the basis of most known tonal systems throughout the world (Dowling
and Harwood, 1986)². Pitches that are an octave apart are deemed equivalent to some
degree and can serve the same musical function within certain tonal contexts. The
prevalence of the octave as the fundamental building block of tonal systems suggests
that there may be a physiological basis for octave equivalence.
A physical octave is defined as a frequency ratio of 2:1. It is known, however,
that listeners prefer octave ratios slightly greater than 2:1 (Ward, 1954; Walliser,
1969; Terhardt, 1971; Sundberg and Lindqvist, 1973). In a typical procedure to
measure this octave enlargement effect, a subject listens to a lower standard tone
alternating with an adjustable higher tone and is instructed to adjust the frequency
of the higher tone until it sounds one octave above the lower tone. Results of three
such experiments are shown in Fig. 2-1. The size of the preferred or subjective octave
is close to 2:1 at low frequencies but increases with frequency and exceeds the physical
octave by almost 3% at 2 kHz. It is difficult for listeners to make octave judgements
when the lower tone is above about 2 kHz, because the upper tone then exceeds 4 kHz.
This corresponds to an upper limit in musical pitch
of about 4-5 kHz (Ward, 1954; Attneave and Olson, 1971). There is considerable
variability in the octave enlargement effect across listeners but it is nonetheless a
statistically significant effect in all the reported studies. The effect is also seen in a
wide variety of stimulus conditions and in subjects with various musical backgrounds.
It is seen when the two tones are presented simultaneously (Ward, 1954; Demany
and Semal, 1990) and under the method of constant stimuli (Dobbins and Cuddy,
1982). The studies shown in Fig. 2-1 were all performed using pure-tone stimuli but
Sundberg and Lindqvist (1973) reported the effect with complex tones as well as pure
tones. Ward (1954) reported the presence of the effect in listeners without musical
training and in listeners with musical training as well as in possessors of absolute
pitch. Dowling and Harwood (1986, p.103) reported the effect in a number of musical
cultures.
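
For readers who think in musical units, the percentage deviations quoted above can be converted to cents. The sketch below assumes that the deviation is expressed as a percentage of the 2:1 frequency ratio; the function name is hypothetical.

```python
import math

def octave_deviation_cents(percent_deviation):
    """Convert a percent deviation from the 2:1 ratio into cents."""
    subjective_ratio = 2.0 * (1.0 + percent_deviation / 100.0)
    # 1200 cents per octave; only the excess over the physical octave is kept.
    return 1200.0 * math.log2(subjective_ratio / 2.0)

print(round(octave_deviation_cents(3.0), 2))  # 51.17
```

Under that assumption, a 3% enlargement corresponds to roughly half an equal-tempered semitone.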
² Dowling and Harwood report (on p. 93) only one known tonal system, from an aboriginal culture in Australia, that is not based on the octave.
[Figure 2-1: plot of the deviation of the subjective octave from 2:1 (%), axis from -0.5 to 3.0, versus the frequency of the lower tone (Hz), axis from 500 to 2500. Data sets: Terhardt (1971), Walliser (1969), Ward (1954), Ohgushi (1983).]
Figure 2-1. Psychoacoustic measures of the octave enlargement. Adapted from Fig. 4 in Sundberg and Lindqvist (1973) and Fig. 9 in Ohgushi (1983). The subjective octave, obtained from octave matching experiments, is plotted as a deviation from the physical octave versus the frequency of the lower tone in the octave pair. The subjective octave is larger than the physical octave and the deviation grows with frequency.
The presence of the octave enlargement effect under a wide range of subject and
stimulus conditions suggests that the effect may have a general physiological basis.
Ohgushi (1983) proposed an octave matching scheme based on a temporal model for
pitch that predicts the octave enlargement effect. In an earlier study, he noticed that,
in response to pure-tones, auditory-nerve interspike intervals are slightly longer than
integer multiples of the stimulus period (Ohgushi, 1978). He then showed, using a
temporal model for octave matching, that these deviations lead to a prediction of the
octave enlargement effect (Ohgushi, 1983).
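
The logic of this argument can be illustrated with a toy calculation. This is not Ohgushi's model: it assumes, purely for illustration, that every interval exceeds the stimulus period by the same fixed amount and that an octave is heard when the upper tone's lengthened interval is half that of the lower tone. The function name and the 10 µs excess are hypothetical.

```python
def matched_octave_ratio(f_low_hz, interval_excess_s):
    """Frequency ratio at which a toy temporal matcher hears an octave.

    Assumes each tone's interval equals its true period plus a fixed
    excess, and that an octave is heard when the upper tone's interval
    is half the lower tone's interval.
    """
    period_low = 1.0 / f_low_hz
    # Solve (T_high + e) = (T_low + e) / 2 for T_high.
    period_high = (period_low - interval_excess_s) / 2.0
    if period_high <= 0:
        raise ValueError("interval excess too large for this frequency")
    return (1.0 / period_high) / f_low_hz

for f_low in (250.0, 500.0, 1000.0, 2000.0):
    ratio = matched_octave_ratio(f_low, 10e-6)  # assumed 10 µs excess
    print(f_low, round(100.0 * (ratio / 2.0 - 1.0), 2), "% enlargement")
```

In closed form the toy ratio is 2 / (1 - e f), where e is the excess and f the lower frequency, so the matched octave always exceeds 2:1 and the enlargement grows with frequency, which is the qualitative pattern of the psychoacoustic data.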
Upon review of Ohgushi’s (1983) model for octave matching, Hartmann (1993)
pointed out an arbitrary factor of two. This scaling factor, which is not based on
any physiological process, allows a model listener to theoretically set it, and thus the
octave interval, to any value. Hartmann suggested a variation of the model that would
not rely on such a scaling factor. He also suggested that if the model operated on
all-order interspike intervals instead of first-order interspike intervals, it may better
predict the psychoacoustic data.
The work presented here was motivated by the hypotheses presented by Ohgushi
and Hartmann. Neither one of them could reliably test their predictions because
the existing physiological data consisted of only a small number of coarse-resolution
interspike-interval distributions. It was therefore difficult to measure the modes of
the distributions, i.e., characterize the intervals, with high precision. Special methods
were used in this study to ensure high precision interval analyses so that predictions of
temporal models for octave matching could be reliably evaluated. We combined spike
data across fibers to form pooled interspike interval histograms which have been shown
to reflect a wide variety of pitch phenomena (Cariani and Delgutte, 1996a; 1996b).
In addition to characterizing interspike intervals, we have developed and evaluated
models for octave matching, based on Ohgushi’s and Hartmann’s ideas, which operate
on pooled interspike interval histograms.
2.2 Method
The methods used in this study differ from typical auditory-nerve (AN) studies in
that specific efforts were taken to ensure accurate estimation of interspike intervals
(ISIs): Unusually long recordings were made to ensure the inclusion of a high number
of spikes in each record; very fine binwidths (1 µsec) were used when generating ISI
histograms in order to accurately estimate the modes.
2.2.1 Experiment
Data were recorded from auditory-nerve fibers in six healthy, adult cats. Cat prepa-
ration and recording techniques were standard for our laboratory (Kiang et al., 1965;
Cariani and Delgutte, 1996a).
In each experiment, the cat was Dial-anesthetized with an initial dose of 75 mg
per kg of body weight and subsequent doses of 7.5 mg per kg of body weight. A
craniectomy was performed and the middle-ear and bulla cavities were opened to
access the round window. The cerebellum was retracted to expose the AN. Injec-
tions of dexamethasone (0.26 mg/kg of body weight/day), to reduce brain swelling,
and Ringer’s saline (50 ml/day), to prevent dehydration, were given throughout the
experiment.
The cat was placed on a vibration isolation table in an electrically-shielded,
temperature-controlled (38 °C) chamber. The AN compound action potential (CAP)
in response to click stimuli was monitored with a metal electrode placed near the
round window. The cat’s hearing was assessed by monitoring the CAP threshold and
single-unit thresholds.
Sound was delivered to the cat’s ear through a closed acoustic assembly driven by
a (Beyerdynamic DT 48A) headphone. The acoustic assembly was calibrated with
respect to the voltage delivered to the headphone, allowing for accurate control over
the sound pressure level at the tympanic membrane. Stimuli were generated by a
16-bit, Concurrent (DA04H) digital-to-analog converter using a sampling rate of 100
kHz. The total harmonic distortion for pure-tones between 110 and 3000 Hz was less
than -55 dB re fundamental when measured at a stimulus level of 95 dB SPL.
AN action potentials (spikes) were recorded with glass micropipette electrodes
filled with 2 M KCl. The electrodes were visually placed on the nerve and then
mechanically stepped through the nerve using a micropositioner (Kopf 650). The
electrode signal was band-pass filtered and fed into a spike-detector. The times of
spike peaks were recorded with 1 µs precision.
Nerve fibers were sought using a click (near 55 dB SPL) as a search stimulus. Upon
contact with a fiber, a threshold tuning curve was generated using the Moxon (Kiang,
Moxon, and Levine, 1970) algorithm with a criterion of 0. The spontaneous rate of
the fiber was then measured by counting the number of spikes over a 20 second
period. Units with a characteristic-frequency (CF) threshold more than two standard
deviations away from the mean threshold for normal AN fibers (as found by Liberman
and Kiang, 1978) were not included in the analysis.
An estimate of the number of false triggers in the spike record was derived from
examination of the ISIs. Because the absolute refractory period of AN fibers prohibits
ISIs smaller than about 0.5 msec (Gaumond, Molnar, and Kim, 1982; Gaumond,
Kim, and Molnar, 1983), intervals shorter than 0.5 msec were assumed to be false
triggers. Spike records containing more than 0.1 % of these short intervals were not
included in the analysis.
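
The false-trigger criterion can be sketched as below. The 0.5 msec limit and the 0.1% threshold come from the text; the function names are hypothetical, and applying the test to first-order intervals is an assumption.

```python
import numpy as np

def false_trigger_fraction(spike_times_s, min_isi_s=0.5e-3):
    """Fraction of first-order ISIs shorter than the refractory limit."""
    isis = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < min_isi_s))

def accept_record(spike_times_s, max_fraction=0.001):
    """Keep a record unless more than 0.1% of its intervals are too short."""
    return false_trigger_fraction(spike_times_s) <= max_fraction

# Made-up records: a clean 2 ms train, and the same train with
# five spurious triggers 0.1 ms after a genuine spike.
clean = np.arange(2000) * 0.002
noisy = np.concatenate([clean, clean[:5] + 0.0001])
print(accept_record(clean), accept_record(noisy))  # True False
```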
The experimental data were recorded using pure-tone stimuli at frequencies of
110, 220, 440, 880, 1500, 1760, and 3000 Hz and at levels of 5, 10, 15, 20, 25, 40, and 60
dB re threshold. The stimulus was presented once per second (400 msec on, 600
msec off, 2.5 msec rise and fall times) for 180 seconds or until 20,000 spikes had been
recorded, whichever came first. In order to avoid the possible complex effects of onset
transients and adaptation, spikes that occurred during the first 20 msec following the
onset of each stimulus and during the stimulus off-time were excluded. Recordings
containing fewer than 5000 spikes were not included in the analysis. This unusually
high requirement on the minimum number of spikes in the record ensures a reliable
estimate of the ISI distribution.
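The spike-selection window described above can be illustrated as follows. The sketch assumes that stimulus onsets fall at integer multiples of the 1 second repetition period starting at time zero, and it returns the retained spikes grouped by presentation; both choices, and all names, are assumptions made for the example rather than details given in the text.

```python
import numpy as np

def analysis_window_spikes(spike_times_s, period_s=1.0, on_s=0.400, skip_s=0.020):
    """Keep spikes from skip_s to on_s after each (assumed) stimulus onset."""
    t = np.sort(np.asarray(spike_times_s, dtype=float))
    trial = np.floor(t / period_s).astype(int)      # presentation index
    latency = t - trial * period_s                  # time since onset
    keep = (latency >= skip_s) & (latency < on_s)   # drop onset and off-time
    return [t[keep & (trial == k)] for k in np.unique(trial[keep])]

def enough_spikes(trials, min_spikes=5000):
    """Apply the minimum-spike-count criterion to the retained spikes."""
    return sum(len(tr) for tr in trials) >= min_spikes

# Hypothetical spike times (seconds) spanning two presentations.
spikes = np.array([0.005, 0.030, 0.250, 0.450, 1.010, 1.100, 1.399, 1.700])
trials = analysis_window_spikes(spikes)
print([tr.tolist() for tr in trials], enough_spikes(trials))
```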
2.2.2 Analysis
Auditory-nerve responses to low-frequency stimuli tend to occur at a specific phase
with respect to the stimulus (Rose et al., 1967; Kiang et al., 1965). Thus, ISI distri-
butions display modes at intervals corresponding, roughly, to integer multiples of the
stimulus period. The main goal of the analysis in this study was to accurately esti-
mate modes of AN ISI distributions in order to quantitatively verify Ohgushi’s (1978)
observation that the intervals deviate from the stimulus period.
There were three main steps to the analysis of the ISI distributions. First, a his-
togram of the intervals was produced. Second, the mean interval of each mode in
the histogram was estimated by fitting, in the maximum likelihood sense, a Gaussian
mixture density to the histogram. Third, deviations of the interval modes from
stimulus periods were characterized.
Histogram generation
The first step in the analysis was to generate histograms of the ISIs. The histogram
binwidths were 2 µsec for frequencies less than 300 Hz and 1 µsec for frequencies
above 300 Hz. Both first-order and all-order ISI histograms were computed.
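As a rough illustration of the two histogram types, the following Python sketch computes first-order and all-order intervals from a sorted spike train and bins them. It is not the original analysis code; the function names, the toy spike times, and the coarse 1 msec bins are assumptions chosen so the result can be checked by hand.

```python
import numpy as np

def first_order_isis(spikes):
    """Intervals between consecutive spikes."""
    return np.diff(np.sort(np.asarray(spikes, dtype=float)))

def all_order_isis(spikes, max_interval):
    """Intervals between every pair of spikes, up to max_interval."""
    s = np.sort(np.asarray(spikes, dtype=float))
    out = []
    for i in range(len(s) - 1):
        hi = np.searchsorted(s, s[i] + max_interval, side="right")
        out.append(s[i + 1:hi] - s[i])
    return np.concatenate(out) if out else np.array([])

def isi_histogram(intervals, binwidth, max_interval):
    """Histogram of intervals with uniform bins starting at zero."""
    edges = np.arange(0.0, max_interval + binwidth, binwidth)
    counts, edges = np.histogram(intervals, bins=edges)
    return counts, edges

# Toy spike times in msec.
spikes = [0.0, 1.0, 3.0, 4.0]
fo = first_order_isis(spikes)
ao = all_order_isis(spikes, max_interval=10.0)
counts, edges = isi_histogram(ao, binwidth=1.0, max_interval=5.0)
print(fo, np.sort(ao), counts)
```

The all-order set contains every first-order interval plus the sums of adjacent intervals, which is why it "fills in" the later modes.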
Mode estimation
The second step in the analysis was to estimate the modes of the interspike interval
distribution. A maximum likelihood (ML) estimation approach was implemented in
which the interval distributions were modeled as a mixture of Gaussian densities with
each mode in the distribution corresponding to a single density. This mixture density
was fit (in the ML sense) to the interval histograms and the means of the individual
Gaussian densities were taken as the estimated modes in the histogram. Two forms
of mixture density were used, one for estimating individual modes in the interval
distributions and another for estimating the fundamental mode (i.e. stimulus period)
in the distributions (and subsequently the stimulus frequency). In the first case, the
individual Gaussian densities in the mixture had mutually independent means and
variances. In the second case, they were assumed to have harmonically related means
and a common variance.
Because obtaining the ML estimates of the parameters is not analytically straight-
forward, we used the expectation-maximization (EM) algorithm, an iterative tech-
nique which converges to the ML estimate (Redner and Walker, 1984; Moon, 1996).
Mathematical details of our implementation are included in the appendix.
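For the first form of mixture density (independent means and variances), a generic EM iteration can be sketched as below. This is a textbook EM for a one-dimensional Gaussian mixture fitted to histogram bin centers weighted by their counts; it is offered only as an illustration under that assumption and is not the implementation detailed in the appendix. The synthetic data and all names are hypothetical.

```python
import numpy as np

def em_gaussian_mixture(x, w, means, sigmas, n_iter=200):
    """EM for a 1-D Gaussian mixture on bin centers x with counts w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = np.array(means, dtype=float)
    sd = np.array(sigmas, dtype=float)
    pi = np.full(mu.size, 1.0 / mu.size)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each bin.
        z = (x[:, None] - mu[None, :]) / sd[None, :]
        dens = pi * np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: count-weighted updates of weights, means, and variances.
        wr = w[:, None] * resp
        wk = wr.sum(axis=0)
        mu = (wr * x[:, None]).sum(axis=0) / wk
        var = (wr * (x[:, None] - mu) ** 2).sum(axis=0) / wk
        sd = np.sqrt(np.maximum(var, 1e-12))
        pi = wk / w.sum()
    return pi, mu, sd

# Synthetic two-mode "interval" data (msec), initialized at multiples of 1/880 s.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(1.2, 0.1, 5000), rng.normal(2.3, 0.1, 5000)])
counts, edges = np.histogram(samples, bins=np.arange(0.0, 4.0, 0.01))
centers = 0.5 * (edges[:-1] + edges[1:])
weights, mode_means, mode_sds = em_gaussian_mixture(
    centers, counts, means=[1.136, 2.273], sigmas=[0.2, 0.2])
print(mode_means)
```

Initializing each mean at an integer multiple of the stimulus period is a natural starting point, since the modes are expected to lie close to those values.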
Mode offset
The third step in the analysis was to calculate the mode offset (MO), the difference
between the mode estimate (ME) and the corresponding multiple of the stimulus
period:
$$MO_n = ME_n - \frac{n}{f} \qquad (2.1)$$
where f is the frequency of the stimulus and n is the mode number (e.g., mode 1
contains intervals that are roughly one stimulus period in length and mode 2 contains
intervals that are roughly 2 stimulus periods in length). Figure 2-3(e) illustrates the
above calculation for MO1.
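The mode-offset calculation is simple enough to state directly in code. The numerical values below are hypothetical and serve only to show the arithmetic and the units.

```python
def mode_offset(mode_estimate_s, n, f_hz):
    """Mode offset: mode estimate minus n stimulus periods (both in seconds)."""
    return mode_estimate_s - n / f_hz

# Hypothetical example: a first mode estimated at 0.650 msec for a 1760 Hz tone.
mo1 = mode_offset(0.650e-3, n=1, f_hz=1760.0)
print(round(mo1 * 1e3, 4), "msec")
```

A positive value means the mode lies to the right of the corresponding stimulus-period multiple.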
In an effort to represent the total AN population response to the stimuli, pooled
histograms were generated by summing all of the individual ISI histograms for a
specific stimulus frequency. Mode estimates of the pooled histograms were calculated
as well.
2.3 Results
From six experiments, a total of 399 spike records from 164 fibers were obtained that
met our requirements in terms of the minimum number of spikes, normal thresholds,
and small number of false triggers. The majority (79%) of the records were from high
spontaneous rate fibers. CFs ranged from 150 to 17,000 Hz.
Figure 2-2(a) shows a schematic representation of a stimulus waveform and a
hypothetical spike record. ISIs are roughly integer multiples of the stimulus period.
First-order intervals are those between consecutive spikes, second-order intervals are
those between every other spike, etc.
[Figure 2-2 graphic not reproduced. Panels: (a) schematic of the pure-tone stimulus and AN spikes versus time, with first-, second-, and third-order intervals marked; (b) First-Order; (c) 1st-Order, 2nd-Order, 3rd-Order; (d) All-Order. Histogram axes: Interspike Interval (msec) versus Number of Intervals.]
Figure 2-2. Histogram generation. (a) is a schematized representation of a pure-tone stimulus and corresponding spike record from the auditory-nerve. The order of the interspike interval is based on the number of spikes included in the interval: first-order intervals are those between consecutive spikes; second-order intervals are those between every other spike; third-order intervals are those between every third spike. (b) and (c) are histograms of the various types of interspike intervals. (d) is an interval histogram containing intervals of all orders, thus termed an all-order histogram. All of the histograms were generated from the same spike record. The stimulus was an 880 Hz pure-tone at 84 dB SPL. The auditory-nerve fiber from which the recording was made had the following properties: CF: 2609 Hz; SR: 29 spikes/sec. The histograms have a binwidth of 40 µsec and the following number of total intervals: first-order: 10305; second-order: 5696; third-order: 1643; all-order: 17874.
Figure 2-2(b) shows a histogram of first-order ISIs from a single-unit recording
generated with an 880 Hz tone stimulus. The modal distribution of intervals clearly
reflects the synchronization of the spike train to the stimulus and the positions of the
modes provide information about the stimulus frequency (Rose et al., 1967).
Figure 2-2(c) displays first-, second-, and third-order histograms based on the
same spike record as in (b). As one would expect, first-order intervals are, on average,
shorter than second- and third-order intervals and thus fall into earlier modes. There
is, however, a great deal of overlap in the distributions of the intervals of different
orders and the intervals of a particular order are not confined to a single mode in the
histogram.
The histogram shown in Fig. 2-2(d) contains ISIs of all orders, and is thus termed
the all-order ISI histogram. This histogram is sometimes referred to as the autocor-
relation or auto-coincidence histogram (Perkel, Gerstein, and Moore, 1967; Rodieck,
1967; Ruggero, 1973; Evans, 1983).
An important difference between first-order and all-order ISI histograms is their
general shape: the size of the modes in the first-order ISI histogram tends to decrease
as the mode number increases; the size of the modes in the all-order ISI histogram
is relatively constant. In other words, when one examines very long ISIs, few are
first-order intervals. The all-order ISI histogram does not reflect the decaying trend
because higher-order intervals are included and “fill in” the modes at long intervals.
In addition to the 399 spike records included in the analysis, 28 spike records that
met our data requirements were excluded from the analysis because their histograms
displayed peak-splitting. At moderate to high levels of pure-tone low-frequency stim-
ulation, AN ISI histograms can exhibit two or sometimes three peaks per stimulus
cycle instead of the normal one (Kiang and Moxon, 1972; Kiang, 1980; Liberman
and Kiang, 1984; Kiang, 1990; Ruggero et al., 1996). Most of the fibers from which
we recorded did not exhibit this behavior within our stimulus-level range, but those
records that did were excluded to simplify the analysis. In our data, peak-splitting
occurred primarily at stimulus frequencies below 440 Hz.
2.3.1 All-Order Interspike Intervals
Figure 2-3(a)-(d) are all-order ISI histograms from one AN fiber for four different
stimulus frequencies. Figure 2-3(e) is a magnification of the histogram in (a) with
the mode estimates indicated by X’s above each mode. As previously reported by
Ohgushi (1978; 1983), the short intervals (early modes) are slightly longer than stim-
ulus periods. This deviation is presumed to be at least partially due to the refractory
period of the auditory-nerve fiber (Ohgushi, 1978). The mode offset for the first mode
is labeled in the figure.
Mode offsets from the histograms in Fig. 2-3(a)-(d) are plotted in (f) as a function
of ISI length. The mode offset decreases monotonically as the ISI increases (Fig. 2-
3(f)) and for intervals greater than about 5 msec, mode offsets are insignificant. To
a first approximation, the mode offsets depend primarily on ISI and not stimulus
frequency. However, at any particular ISI <∼5 msec, lower frequency stimuli generally
yield slightly larger mode offsets.
Figure 2-4(a)-(c) show how mode offsets vary with fiber CF, spontaneous rate
(SR) and discharge rate (DR) for all-order histograms of 220 and 1760 Hz. The DR
is typically a compressed function of stimulus level ranging from SR to saturation
rate. The mode offsets in all-order ISI histograms do not obviously depend on the
fiber CF, SR or DR. Because of this, we decided to pool the ISI data (across fibers
and stimulus levels) and use pooled histograms for testing the model presented in the
next section. Because pooled histograms contain many more intervals than single-
fiber histograms they more accurately represent the underlying interval probability
distributions. Figure 2-4(d) shows mode offsets grouped by the cat from which they
were measured. There is a small but statistically significant (see caption) variation
across cats for the 1760 Hz data. Despite this trend, we decided to also pool data
across cats. Conclusions based on the analysis of data from individual cats were not
different from those based on pooled data.
Figure 2-5 shows pooled histograms for six stimulus frequencies. The pooled his-
tograms are much smoother than the single-fiber histograms due to the large number
of intervals they contain. Mode offsets are clearly visible at intervals less than about
5 msec. Modes in the 110 and 220 Hz histograms show no offset because even the
earliest modes occur at intervals greater than or near 5 msec.
Figure 2-6(a)-(e) show mode offsets as a function of interval length for pooled
histograms as well as for single-fiber histograms. Figure 2-6(f) shows just the pooled
[Figure 2-3 graphic not reproduced. Panel labels: 1760 Hz (ejt2-29-7), 880 Hz (ejt2-29-6), 440 Hz (ejt2-29-5), 220 Hz (ejt2-29-4); a magnified view of the 1760 Hz histogram with modes 1 to 4 and the mode offset marked; and a plot of Mode Offset (msec) versus Interspike Interval (msec) for 1760, 880, 440, and 220 Hz. Histogram axes: Interspike Interval (msec) versus Number of Intervals.]
Figure 2-3. Histogram mode offset. (a)-(d) are all-order ISI histograms of specified frequency with 40 µsec binwidths. Vertical dashed lines mark integer multiples of the stimulus period. (e) is a magnification of the first four modes of (a). The gray curve outlining the histogram is the ML estimate of the Gaussian mixture density corresponding to the histogram. The ×’s above the modes in (e) mark the ML estimate of the mode (the ML means of the individual Gaussian pdfs in the mixture density), obtained from Eq. (2.12) operating on a histogram with 1 µsec binwidths. The mode offset is the deviation of the mode estimate from the corresponding integer multiple of the stimulus period. Each histogram was generated from a separate spike record but each spike record was obtained from the same auditory-nerve fiber. Fiber characteristics: CF = 2602 Hz; SR = 66 spikes/sec. The stimulus levels were all 10 dB re threshold, corresponding to the following levels for each spike record (in dB SPL): (a) 27; (b) 45; (c) 62; (d) 70. (f) displays the mode offsets from the histograms in (a)-(d). Mode offsets are primarily a decreasing function of interval, although, at corresponding intervals, lower frequency stimuli yield slightly larger mode offsets.
[Figure 2-4 graphic not reproduced. Four panels, (a)-(d), plot All-Order Mode Offset against Characteristic Frequency (Hz), Spontaneous Rate (sp/sec), Discharge Rate (sp/sec), and Cat Number. Legend: 220 Hz, 1760 Hz.]
Figure 2-4. Variation of mode offset across spontaneous rate, characteristic frequency, discharge rate and cat. +’s mark the first mode offset of every individual 220 Hz data record and ’s mark the second mode offset of every individual 1760 Hz data record. These frequencies and mode numbers were chosen as typical representatives of our low- and high-frequency data. In (c), lines connect mode offsets (plotted against DR) that were derived from the same fiber. (a), (b), and (c) show that there is no obvious dependence of mode offset on CF, SR, or DR. (d) shows how mode offset depends on the cat from which it was measured. One-way ANOVA on the data in (d), using the cat number as the individual factor, yielded the following p-values: p = 0.003 for 1760 Hz and p = 0.139 for 220 Hz. Although there were significant differences in mode offsets across cats our decision to pool data across cats did not affect our general conclusions.
[Figure 2-5 graphic not reproduced. Six panels, (a)-(f), show pooled histograms labeled 3000 Hz, 880 Hz, 110 Hz, 440 Hz, 220 Hz, and 1760 Hz. Axes: Interspike Interval (msec) versus Number of Intervals (x10^3).]
Figure 2-5. Pooled histograms. (a)-(f) are pooled all-order ISI histograms of specified frequency. The histograms have the same format as Fig. 2-3(a)-(d). The intervals are pooled from the following number of fibers: (a) 26; (b) 75; (c) 58; (d) 47; (e) 33; (f) 10. Positive mode offsets are visible for intervals <∼5 msec (i.e., modes at intervals smaller than 5 msec are shifted slightly to the right of their corresponding stimulus-period multiple). Note that the scale of the abscissa in (f) is different than the other panels.
histogram mode offsets for five stimulus frequencies. Although there is some variation
across fibers in the size of mode offsets, the characteristics seen in the single fiber data
are evident in the pooled data: the mode offset is a monotonically decreasing function
of ISI; mode offsets for intervals greater than about 5 msec are insignificant; and for
a given ISI, lower frequency stimuli yield slightly larger mode offsets. Thus, these
characteristics seem to be general phenomena and not just particular to one type of
auditory-nerve fiber or stimulus intensity.
2.3.2 First-Order Interspike Intervals
The general shape of first-order histograms changes with fiber discharge rate while
the shape of all-order histograms remains relatively constant (Cariani and Delgutte,
1996a). Figure 2-7 shows interval histograms from one AN fiber for a 220 Hz tone
at three stimulus levels. As the SPL, and therefore discharge rate, increases, the
average first-order interval gets shorter and the relative sizes of the histogram modes
reflect this change: the later modes get smaller and the early modes get larger. In
contrast, as the discharge rate increases, higher-order intervals fill in the modes that
get depleted of first-order intervals so that the general shape of all-order histograms
remains unchanged.
The main difference between mode offsets of first-order and all-order intervals is
the presence of negative mode offsets for low stimulus frequencies in the first-order
data. A negative mode offset means that a particular ISI mode is shorter than
the corresponding stimulus-period multiple. This is illustrated in the first-order low-
frequency histograms in Fig. 2-8(a) and (b): the modes occur slightly to the left of the
stimulus period lines. Mode offset data for low-frequency first-order ISI histograms
are shown in panels (c)-(f). These mode offsets show a greater variability at low
stimulus frequencies than those from all-order ISIs.
The negative mode offsets in low-frequency first-order ISI histograms are due to
the presence of intervals in Mode zero (0). As Fig. 2-9(a) and (b) illustrate, an interval
falls into Mode 0 if two spikes occur within the same half-period of the stimulus. Mode
0 is bounded on the left by the absolute refractory period and on the right by half
[Figure 2-6 graphic not reproduced. Panels (a)-(f): individual panels labeled 110 Hz, 220 Hz, 440 Hz, 880 Hz, and 1760 Hz, plus a summary panel with traces for 1760, 880, 440, 220, and 110 Hz. Axes: Interspike Interval (msec) versus Mode Offset (msec).]
Figure 2-6. Mode offsets of all-order ISI histograms. (a)-(e) display the mode offsets of pooled and individual histograms of specified frequency. Lines connect the mode offsets of pooled histograms and dots mark the mode offsets of individual histograms. (f) shows the pooled-histogram mode offsets for most of the experimental stimulus frequencies. Mode offsets in pooled histograms show the same trend with interval as those in individual histograms: mode offset is primarily a monotonically decreasing function of interval, although lower frequency stimuli yield slightly larger mode offsets at corresponding intervals. Note that the scale of the ordinate in (f) is different than the other panels.
[Figure 2-7 graphic not reproduced. Two columns of histograms, 1st-Order and All-Order, in three rows labeled 52 dB SPL, 68 sp/sec; 57 dB SPL, 125 sp/sec; and 67 dB SPL, 197 sp/sec. Axes: Interspike Interval (msec) versus Number of Intervals.]
Figure 2-7. A series of 220 Hz ISI histograms (from the same auditory-nerve fiber) over a range of discharge rates. Stimulus level and fiber discharge rate are indicated to the left of the plots. First-order ISI histograms are plotted in (a), (c) and (e). All-order ISI histograms are plotted in (b), (d) and (f). Fiber characteristics: CF: 409 Hz; SR: 0.7 spikes/sec; Threshold at 220 Hz: 47 dB SPL. The histogram binwidths are 80 µsec.
[Figure 2-8 graphic not reproduced. Panels (a)-(f): first-order histograms labeled 110 Hz (ejt1-48-3) and 220 Hz (bd172-51-10); mode-offset panels labeled 110 Hz, 220 Hz, and 440 Hz; and a summary panel with traces for 1760, 880, 440, 220, and 110 Hz. Abscissa: Interspike Interval (msec).]
Figure 2-8. Low-frequency first-order ISI histograms display negative mode offsets. (a) and (b) show first-order ISI histograms in the same format as those in Fig. 2-3. The ×’s above the modes mark the ML estimate of the mode. (c), (d) and (e) show mode offsets from first-order ISI histograms in the same format as Fig. 2-6. (f) shows the pooled-histogram mode offsets for most of the stimulus frequencies. Below ∼500 Hz, first-order ISI histograms display large negative mode offsets (i.e., modes are shifted slightly to the left of their corresponding stimulus-period multiple), in contrast to the insignificant mode offsets present in low-frequency all-order ISI histograms. Note that the scales of the axes vary across panels.
the stimulus period. Due to the refractory period of the AN fiber, only low frequency
stimuli (<∼500 Hz) produce ISI histograms that contain a Mode 0. When an interval
occurs in Mode 0, the preceding and following first-order intervals tend to be smaller
than if just a single spike had occurred in that half-period. The relationship between
consecutive intervals can be seen by examining a joint ISI histogram (Rodieck, Kiang,
and Gerstein, 1962), as shown in Fig. 2-9(c). The joint ISI histogram is a two-
dimensional histogram which plots the ISI size against that of the previous ISI. It
is displayed here as a grayscale image in which gray-level indicates the number of
interval pairs in a small square bin. As is the case for one-dimensional ISI histograms,
the modal distribution of intervals is clearly evident in this plot: the intervals tend
to cluster near integer multiples of the stimulus period. We will use the notation
Mode(n,m) to refer to the mode in which the previous interval is in Mode n and
the current interval is in Mode m. Examination of Mode(0, 1), Mode(0, 2), and
Mode(0, 3) shows that if the previous interval lies in Mode 0, the current interval tends
to be shorter than the corresponding stimulus-period multiple. Also, examination of
Mode(1, 0), Mode(2, 0), and Mode(3, 0) shows a similar dependency on Mode 0 for
the current interval. Thus, in a first-order ISI histogram, the presence of intervals in
Mode 0 effectively biases the other modes towards smaller values.
Offsets of higher modes in all-order ISI histograms are not affected by the presence
of intervals in Mode 0 because these histograms include higher order intervals. For
every first-order interval that is shortened by the presence of an interval in Mode 0,
there is a second-order interval (which includes the one in Mode 0) that is lengthened.
This can be seen schematically in Fig. 2-9(a). The lengthened second-order interval
falls into the same mode as the shortened first-order interval and counteracts its effect
on the mode offset.
The effect of Mode 0 on first-order ISI histograms can be quantified by selecting
only those intervals that do not precede or follow intervals in Mode 0. This condition-
ing was performed on all of the 220 Hz stimulated spike records and then histograms
of the conditioned intervals were generated. The distribution of the mode estimates
for Mode 1 in these conditioned interval distributions is plotted in Fig. 2-9(d) along
[Figure 2-9 graphic not reproduced. Panels: (a) pure-tone stimulus and AN spikes versus time, with mode numbers marked between spikes; (b) First-Order histogram, Number of Intervals versus Interspike Interval, with modes 0 to 3 labeled; (c) joint first-order ISI histogram; (d) Number of Modes versus Mode 1 Estimate for conditioned 1st-order, 1st-order, and all-order intervals, with a scale bar of 20 modes.]
Figure 2-9. Intervals in Mode 0. (a) is a schematic representation of a stimulus and a corresponding spike record. The number between the spikes in (a) indicates the mode of the histogram, shown in (b), to which the first-order interval belongs. (b) is a first-order ISI histogram and (c) is a joint first-order ISI histogram of the same auditory-nerve spike record. The joint ISI histogram is a two-dimensional histogram which plots ISI size against that of the previous ISI. It is displayed as a gray-scale image in which gray-level indicates the number of interval pairs in a small square bin. The dashed lines in (b) and (c) mark integer multiples of the stimulus period. The stimulus was a 220 Hz pure-tone at 61 dB SPL. The fiber characteristics are: CF: 379 Hz; SR: 76 spikes/sec; threshold at 220 Hz: 36 dB SPL. The histogram binwidth is 80 µsec in (b) and 64 µsec for both dimensions in (c). (d) shows the distribution of the estimates of Mode 1 in 50 individual 220 Hz histograms for all-order, first-order and conditioned first-order ISIs. The condition in the third case is that the intervals do not follow or precede an interval in Mode 0. The traces are vertically offset for clarity and the vertical bar in the lower left denotes 20 modes. The binwidth of the mode estimate distribution is 50 µsec. The vertical dashed line marks the stimulus period. Negative mode offsets in low-frequency first-order ISI histograms are due to shortened intervals caused by intervals in Mode 0 (i.e., two spikes within the same half-period of the stimulus).
with similar unconditioned distributions from first-order and all-order ISI histograms.
The alignment of the mode-estimate distributions for all-order and conditioned first-
order ISIs indicates that the presence of intervals in Mode 0 accounts for nearly all
of the difference between mode estimates in all-order and first-order ISI histograms.
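The conditioning step can be sketched as a simple mask over the sequence of first-order intervals. The sketch assumes, following the description of Mode 0 above, that an interval belongs to Mode 0 when it is shorter than half the stimulus period; the function name and the example intervals are hypothetical.

```python
import numpy as np

def condition_on_mode0(isis, stim_period):
    """Keep first-order intervals that neither follow nor precede a Mode 0 interval."""
    isis = np.asarray(isis, dtype=float)
    in_mode0 = isis < 0.5 * stim_period                   # assumed Mode 0 criterion
    prev0 = np.concatenate(([False], in_mode0[:-1]))     # previous interval in Mode 0
    next0 = np.concatenate((in_mode0[1:], [False]))      # next interval in Mode 0
    return isis[~(prev0 | next0)]

# Hypothetical sequence of intervals (msec) for a 220 Hz tone (period about 4.545 msec).
period = 1000.0 / 220.0
isis = np.array([4.6, 1.0, 4.1, 4.5, 9.1])
conditioned = condition_on_mode0(isis, period)
print(conditioned)
```

Here the 4.6 and 4.1 msec intervals are dropped because they border the 1.0 msec Mode 0 interval; the mode estimates would then be recomputed from the retained intervals.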
The negative correlation between consecutive intervals is a characteristic of our
data that is not well documented in the literature. The joint ISI histogram in Fig. 2-
9(c) shows a clear dependence between the previous and current first-order ISI. All
of the modes are oval with the long axis going diagonally from the top left to the
bottom right of the figure. This means that if the previous interval was shorter than
average, the current interval will tend to be longer than average and vice versa. This
is a consequence of phase-locking: every interval longer than the stimulus period must
be compensated for by a shorter interval if the spikes are to remain phase-locked.
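The dependence between consecutive intervals can be examined with a joint histogram and a lag-one correlation coefficient, as in the following sketch. The simulated spike train is a deliberately simplified assumption (one spike per stimulus cycle with independent timing jitter), not a model of AN firing; it is included only to show that phase-locking by itself produces a negative serial correlation. All names are hypothetical.

```python
import numpy as np

def joint_isi_histogram(isis, binwidth, max_interval):
    """2-D histogram of (previous ISI, current ISI) pairs."""
    isis = np.asarray(isis, dtype=float)
    edges = np.arange(0.0, max_interval + binwidth, binwidth)
    counts, _, _ = np.histogram2d(isis[:-1], isis[1:], bins=[edges, edges])
    return counts, edges

def serial_correlation(isis):
    """Correlation coefficient between consecutive first-order ISIs."""
    isis = np.asarray(isis, dtype=float)
    return float(np.corrcoef(isis[:-1], isis[1:])[0, 1])

# Simplified phase-locked train: one spike per 220 Hz cycle with 0.1 msec jitter.
rng = np.random.default_rng(1)
period = 1000.0 / 220.0                                   # msec
spikes = period * np.arange(5001) + rng.normal(0.0, 0.1, 5001)
isis = np.diff(spikes)
counts, edges = joint_isi_histogram(isis, binwidth=0.064, max_interval=10.0)
r = serial_correlation(isis)
print(round(r, 2))
```

For this idealized train each spike's jitter lengthens one interval and shortens the next, so the expected correlation is about -0.5.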
2.4 Model
Our primary objective in formulating a (central) model for octave matching is to
evaluate how physiological constraints in the auditory periphery, i.e., deviations in
AN ISIs, affect the central processor. This is best accomplished with simple models
that have few, if any, free parameters so that the effect of the peripheral physiological
behavior is not clouded. With this in mind, we developed a temporal model for
pure-tone octave matching based on Ohgushi’s (1983) model.
2.4.1 Model for estimating pure-tone frequency
The basic assumption of the model is that perceived pitch is equal to a biased estimate
of the stimulus frequency derived from AN ISIs. The bias in the frequency estimate
comes from the mode offsets in the ISI histograms. Frequency estimates were derived
from interval histograms using the EM algorithm (Eqs. (2.6) and (2.7)), assuming a
mixture density of Gaussians with harmonically related means (Eq. (2.14)):
$$\hat{f} = \frac{1}{\mu_{ML}(N_{max})}, \qquad (2.2)$$
where $\hat{f}$ is the estimate of stimulus frequency f, µML is the ML estimate of the
fundamental mean in the mixture density (µ+ in Eq. (2.15)), and Nmax is the number
of modes included in the calculation (M in Eqs. (2.15) and (2.16)). If the modes
occur exactly at integer multiples of the stimulus period, µML will equal the stimulus
period and the frequency estimate will be equal to the stimulus frequency.
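To make the role of the harmonic constraint concrete, the sketch below runs an EM iteration for a Gaussian mixture whose means are constrained to integer multiples of a single fundamental and whose components share one variance. The update for the fundamental is derived here from those two stated assumptions and is not copied from the appendix equations; the synthetic data, initial values, and names are hypothetical.

```python
import numpy as np

def em_harmonic_fundamental(x, w, mu0, sigma0, n_modes, n_iter=200):
    """EM for a mixture with means n*mu (n = 1..n_modes) and a common variance."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = np.arange(1, n_modes + 1, dtype=float)
    mu, sd = float(mu0), float(sigma0)
    pi = np.full(n_modes, 1.0 / n_modes)
    for _ in range(n_iter):
        # E-step (the common normalizing constant cancels).
        z = (x[:, None] - n[None, :] * mu) / sd
        dens = pi * np.exp(-0.5 * z**2)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        wr = w[:, None] * resp
        # M-step: least-squares fundamental, pooled variance, mixing weights.
        mu = (wr * n * x[:, None]).sum() / (wr * n**2).sum()
        var = (wr * (x[:, None] - n * mu) ** 2).sum() / wr.sum()
        sd = np.sqrt(max(var, 1e-12))
        pi = wr.sum(axis=0) / wr.sum()
    return mu, sd, pi

# Synthetic intervals (msec) at multiples of the 880 Hz period, with and without
# a positive offset of 0.05 msec applied to the first mode only.
rng = np.random.default_rng(3)
period = 1000.0 / 880.0
modes = [rng.normal(k * period, 0.05, 2000) for k in range(1, 5)]
aligned = np.concatenate(modes)
shifted = np.concatenate([modes[0] + 0.05] + modes[1:])
ones = np.ones(aligned.size)
mu_a, _, _ = em_harmonic_fundamental(aligned, ones, mu0=1.1, sigma0=0.1, n_modes=4)
mu_s, _, _ = em_harmonic_fundamental(shifted, ones, mu0=1.1, sigma0=0.1, n_modes=4)
f_aligned, f_shifted = 1000.0 / mu_a, 1000.0 / mu_s
print(round(f_aligned, 1), round(f_shifted, 1))
```

In this toy case a positive offset in the earliest mode lengthens the estimated fundamental period and therefore lowers the frequency estimate, which is the kind of bias the model relies on.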
Estimates for each stimulus frequency were calculated using pooled ISI histograms
and their deviations from the stimulus frequency were derived as follows:
$$f_{DEV} = 100 \cdot \frac{\hat{f} - f}{f} \qquad (2.3)$$
where fDEV is the percent deviation of the frequency estimate and $\hat{f}$ is the frequency
estimate.
fDEV is plotted versus stimulus frequency in Fig. 2-10 for three values of Nmax.
For both all-order intervals and 1st-order intervals, fDEV is a decreasing function of
stimulus frequency (see footnote 3). This trend is a direct result of the dependence of mode offset on
interval size. As the stimulus frequency increases, the stimulus period decreases and
the offset for any given mode number increases. This results in a larger estimate of
the fundamental period, µML, and hence, a decrease in the frequency estimate. For
all-order intervals, fDEV is always negative because mode offsets are always positive.
On the other hand, first-order ISIs yield positive fDEV values for low stimulus
frequencies because the histograms contain negative mode offsets.
Figure 2-10 also shows that the free parameter Nmax greatly influences the fre-
quency estimate at high frequencies. For low values of Nmax, the frequency estimate
has a relatively large bias from the mode offsets of the lower modes. Since the mode
offset is minimal in the higher modes, the frequency estimate becomes less biased
as Nmax increases. On the other hand, Nmax has little effect on the estimates at
low frequencies because either the mode offsets are consistently small for all modes
(all-order ISIs), or the higher modes contain few intervals and thus little weight in
[Footnote 3: The slight deviation from monotonicity near 1500 Hz is due to differences in mode offsets across cats and uneven sampling across cats. The data at 1500 and 3000 Hz is primarily from two cats which showed relatively large mode offsets in their AN responses (cats 5 and 6 in Fig. 2-4(d)).]
[Figure 2-10 graphic not reproduced. Two panels, labeled First-Order and All-Order, each with traces for Nmax = 4, Nmax = 6, and Nmax = 10. Axes: Stimulus Frequency (Hz) versus Frequency Estimate Deviation (%).]
Figure 2-10. Frequency estimate deviation (fDEV ) vs. frequency. (a) displays fDEV calculated from pooled all-order ISI histograms for the values of Nmax shown next to each trace. (b) displays fDEV calculated from pooled first-order ISI histograms. For frequencies >∼500 Hz, fDEV is a decreasing function of both frequency and Nmax. Error bars show an estimate of the standard error of fDEV . The estimate was calculated using the bootstrap technique (Efron and Tibshirani, 1993): 50 simulations of the frequency estimate were calculated (Eq. (2.2)) in which pooled histograms were generated by randomly choosing (with replacement) spike records of individual stimulus presentations. The standard deviation of the frequency estimates from these simulations is an estimate of the standard error of the mean.
the calculation of $\hat{f}$ (first-order ISIs).
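The bootstrap procedure used for the error bars in Fig. 2-10 can be outlined as follows. This is a generic sketch: records are resampled with replacement, pooled, and passed to an arbitrary statistic. In the analysis proper the statistic would be the frequency estimate of Eq. (2.2); here the sample mean of synthetic intervals stands in for it, and all names and data are assumptions made for illustration.

```python
import numpy as np

def bootstrap_standard_error(records, statistic, n_boot=50, seed=0):
    """Standard deviation of a statistic over pooled, resampled records."""
    rng = np.random.default_rng(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # sample records with replacement
        pooled = np.concatenate([records[i] for i in idx])
        estimates.append(statistic(pooled))
    return float(np.std(estimates, ddof=1))

# Synthetic stand-in: 40 records of 100 intervals each (msec).
rng = np.random.default_rng(2)
records = [rng.normal(1.136, 0.05, 100) for _ in range(40)]
se = bootstrap_standard_error(records, np.mean)
print(se)
```

Resampling whole records, rather than individual intervals, preserves any dependence among intervals within a record.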
2.4.2 Model for octave matching
The model operates on two sets of pooled ISI histograms to predict the size of the
pitch interval separating their respective stimuli. The pitch interval prediction is
obtained by comparing the frequency estimate (Eq. (2.2)) of a low-frequency tone,
$\hat{f}_1$, with the frequency estimate of a high-frequency tone, $\hat{f}_2$. The model predicts that
tones of frequency f1 and f2 are separated by a subjective octave when:
$$\hat{f}_2 = 2 \cdot \hat{f}_1. \qquad (2.4)$$
The model algorithm can be interpreted graphically as attempting to align the
modes of the scaled (by two) f1 histogram with the modes of the f2 histogram. An
octave is predicted when the modes are best aligned.
The deviation of the model prediction (i.e. “subjective octave”) from the physical
octave, ∆SO, is:
$$\Delta SO = 100 \cdot \frac{2 \cdot \hat{f}_1 - \hat{f}_2}{f_1}, \qquad (2.5)$$
for f1 and f2 separated by a physical octave.
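The two deviation measures used by the model reduce to short arithmetic, sketched below. The sketch assumes that the numerator of the octave deviation uses the two frequency estimates and that the denominator is the lower stimulus frequency; the numerical estimates are hypothetical and are not taken from the data.

```python
def percent_frequency_deviation(f_hat, f):
    """Percent deviation of a frequency estimate from the stimulus frequency."""
    return 100.0 * (f_hat - f) / f

def octave_deviation(f1_hat, f2_hat, f1):
    """Percent deviation of the predicted subjective octave (assumed form)."""
    return 100.0 * (2.0 * f1_hat - f2_hat) / f1

# Hypothetical estimates for tones one physical octave apart (440 and 880 Hz).
f1, f2 = 440.0, 880.0
f1_hat, f2_hat = 438.0, 870.0
dev1 = percent_frequency_deviation(f1_hat, f1)
dev2 = percent_frequency_deviation(f2_hat, f2)
dso = octave_deviation(f1_hat, f2_hat, f1)
print(round(dev1, 3), round(dev2, 3), round(dso, 3))
```

Because the upper tone is underestimated proportionally more than the lower tone in this example, the octave deviation is positive, corresponding to a predicted octave enlargement.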
Model predictions are shown in Fig. 2-11 for several values of Nmax. As in the
frequency estimate (Fig. 2-10), variation in Nmax results in large changes in model
predictions at high frequencies. As Nmax increases, more modes with little or no offset
are included in the frequency estimates and the resulting deviation of the subjective
octave decreases.
When all-order ISIs are used as model input, the model predicts an octave en-
largement in general agreement with the psychoacoustic data (Fig. 2-11(a)) at most
frequencies for Nmax ≈ 4 − 6. At low frequencies, the model underestimates the
psychoacoustic octave enlargement for all values of Nmax, but its predictions are still
within the range of the psychoacoustic data. At 1500 Hz, the model predicts the
range of psychoacoustic data simply by varying Nmax from 4 to 6.
[Figure 2-11 graphic not reproduced. Two panels, labeled First-Order and All-Order, each with traces for Nmax = 4, Nmax = 6, and Nmax = 10. Axes: Frequency of the Lower Tone (Hz) versus Deviation from Physical Octave (%).]
Figure 2-11. Model predictions of the octave enlargement effect. The model predictions are based on pooled histograms for each stimulus frequency. Error bars show the estimated standard error of the subjective octave prediction and were calculated in a similar manner to those in Fig. 2-10. (a) shows the model predictions for all-order ISIs and several values of Nmax. (b) shows the same for first-order ISIs. Although, low-frequency data is not well predicted by the model, the predictions based on all-order intervals are within the range of the psychoacoustic data with Nmax ≈ 4− 6.
When operating on first-order ISIs, the model, with Nmax = 4, predicts an octave
enlargement in general agreement with the psychoacoustic data except at 100 Hz
where the model predicts a much larger deviation (Fig. 2-11(b)). In addition, the
model predicts a decrease in deviation as frequency increases (at low frequencies)
but the psychoacoustic data show the opposite trend. The model’s predicted octave
enlargement at low frequencies is due to the negative mode offsets in the first-order
ISI histograms. The frequency estimates of these low frequency tones are higher than
the true frequency (see Fig. 2-10(b)) and when they are matched to estimates of
(upper) tone frequencies that produce little or no negative mode offsets, an octave
enlargement is predicted.
In summary, the model, operating on first- or all-order ISIs with Nmax ≈ 4 − 6,
predicts the octave enlargement effect at mid- to high-frequencies. At low frequencies,
the model underestimates the effect when operating on all-order ISIs and overesti-
mates it when operating on first-order ISIs.
2.5 Discussion
2.5.1 Auditory Nerve Physiology
We have shown that, in response to low-frequency pure tones, AN ISIs deviate systematically
from integer multiples of the stimulus period. When quantitatively expressed
as mode offsets in ISI histograms (Eq.(2.1)), the deviations are positive for ISIs less
than 5 msec and decrease with increasing ISI until they become insignificant for ISIs
greater than 5 msec. In addition, first-order intervals show negative mode offsets for
stimulus frequencies less than 500 Hz. These robust phenomena exist for all CFs and
SRs and over a wide range of stimulus levels. Our quantitative characterization of
these physiological properties provides a solid basis to study how they can affect any
temporally-based estimate of the stimulus frequency.
Our data and analyses suggest that positive and negative mode offsets in ISI
histograms arise from fundamentally different mechanisms. We showed in Fig. 2-9
that negative mode offsets, seen in first-order ISI distributions for low-frequency
stimuli, are due to the occurrence of multiple spikes within the same half-period. In
order to maintain phase-locking between the stimulus and AN response, the intervals
before and after these multiple spikes tend to be slightly shorter, on average, than
multiples of the stimulus period. Positive mode offsets, on the other hand, have been
attributed to the refractory properties of the neurons (Ohgushi, 1983; Ohgushi, 1978)
and, specifically, to a reduction in conduction velocity during the relative refractory
period (de Cheveigne, 1985). While these ideas are reasonable, it is important to note
that the delays causing the offsets could arise at any point from the basilar membrane
to the AN fiber.
A physiological characteristic that we saw in our data but ignored in the analysis is
peak-splitting. This phenomenon causes two or more modes of intervals to be present
within a single stimulus period of an ISI histogram instead of the usual one mode per
stimulus period. The multiple modes are the result of the AN response going through
a change in phase (as much as 180°) relative to the stimulus as the stimulus level is
increased (Kiang and Moxon, 1972; Johnson, 1980; Kiang, 1980; Kiang, 1990).
At first sight, peak-splitting would seem to wreak havoc on temporal models for
pitch. At stimulus intensities where peak-splitting occurs, a model operating on the
intervals would estimate multiple frequencies, depending on the degree of phase-shift.
However, because the stimulus intensity at which peak-splitting occurs depends on
both fiber CF and stimulus frequency, only a small fraction of AN fibers will exhibit
peak splitting at the same stimulus intensity. So, in a temporal model for pitch that
operates on intervals pooled from fibers across many CFs, peak-splitting most likely
has a small effect on the pooled interval distribution leaving the overall frequency
estimate relatively unchanged.
2.5.2 Temporal Models for Octave Matching and Pitch Perception
Our model for octave matching makes pitch-interval judgements based on frequency
estimates of two tones. Each frequency estimate is computed from a pooled AN ISI
histogram by fitting it with a Gaussian mixture density with harmonically related
means. An octave is predicted when the frequency estimate of one tone is twice that
of another tone. The model predicts the octave enlargement effect except at very low
frequencies, where it slightly underestimates the effect when operating on all-order
ISIs and overestimates the effect when operating on first-order ISIs.
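The matching rule described above can be illustrated with a toy sketch. It is not the Gaussian-mixture estimator used in this chapter: the function `estimated_frequency`, the exponential form of the mode offset, and the constants `d0_s` and `tau_s` are hypothetical and chosen only for illustration. The sketch shows how a positive offset that shrinks with interval length biases the estimates so that the matched upper tone lies above the physical octave.

```python
import math

def estimated_frequency(f_hz, d0_s=20e-6, tau_s=2e-3):
    """Toy frequency estimate: true period plus a positive offset
    that decays with interval length (hypothetical form and constants)."""
    period = 1.0 / f_hz
    return 1.0 / (period + d0_s * math.exp(-period / tau_s))

def matched_octave(f_lower_hz, estimate=estimated_frequency, tol_hz=1e-6):
    """Upper-tone frequency whose estimate is twice the lower-tone estimate."""
    target = 2.0 * estimate(f_lower_hz)
    lo, hi = f_lower_hz, 4.0 * f_lower_hz  # bracket for bisection
    while hi - lo > tol_hz:
        mid = 0.5 * (lo + hi)
        if estimate(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def octave_deviation_percent(f_lower_hz, f_upper_hz):
    """Deviation of the matched interval from the physical octave, in percent."""
    return 100.0 * (f_upper_hz / (2.0 * f_lower_hz) - 1.0)

f1 = 500.0
f2 = matched_octave(f1)
print(f"matched upper tone: {f2:.2f} Hz, "
      f"deviation: {octave_deviation_percent(f1, f2):.2f} %")
```

With these illustrative constants the deviation is positive, i.e. an octave enlargement; the size of the effect depends entirely on the assumed offset function.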
Comparison with Ohgushi’s model
Our model is similar to Ohgushi’s (1983) model for octave matching. The basic
elements of the models are the same although there are three primary differences
in his implementation: he uses first-order ISIs only; his frequency estimates were
based on just the first two modes of the histogram while we used a variable number
(Nmax) of modes; and he calculates frequency estimates from the modes with weights
obtained by fitting the model predictions to the psychoacoustic data and adjusting
two free variables. These differences lead to different predictions at low frequencies
when operating on first-order ISIs. Ohgushi’s model predictions are consistent with
the psychoacoustic data on the octave enlargement for all frequencies while our model
has difficulties at very low frequencies (< 200 Hz). It should be pointed out that with
two free parameters, Ohgushi had more flexibility with which to fit the data.
In addition, Ohgushi operated on rather coarse (100 µsec binwidth) single-fiber ISI
histograms from only four AN fibers, published by Rose, et al. (1967; 1968), while our
model predictions were based on fine-resolution pooled histograms which represent
a large number of fibers and spikes. Analysis of our data using Ohgushi’s method
yields results similar to his.
Interpretation of Nmax
The one free parameter in our model is Nmax, the number of modes over which the
frequency estimate is calculated. Nmax can greatly affect the frequency estimate
and resulting octave interval prediction. Rather than treating it as an arbitrary free
parameter, it would be nice to give Nmax a physiological or psychoacoustic inter-
pretation. If one assumes that pure-tone pitch is based on the interspike interval
distribution of AN spikes, Nmax could be related to the minimum tone duration re-
quired to elicit a pitch.
A number of psychoacoustic studies have investigated the effect of tone duration
(for very short tones) on pitch (Doughty and Garner, 1947; Doughty and Garner,
1948; Pollack, 1967) and the ability to recognize musical melodies (Patterson, Peters,
and Milroy, 1983). A general result from these studies is that, for tones below about
1000 Hz, a minimum number of cycles (6 ± 3) is required to elicit a stable pitch
or to achieve maximum performance in melody recognition. On the other hand,
above 1000 Hz, a minimum tone duration (∼10 msec) is required to elicit a stable
pitch (Gulick, Gescheider, and Frisina, 1989). If Nmax is taken as the number of
cycles required to elicit a pitch for low frequencies, our empirically derived range for
Nmax (∼4− 6) is consistent with this result. It should be noted that our analyses do
not include the first 20 msec of the AN response. Verification of such a relationship
between Nmax and minimum duration for pitch would require a study which carefully
addresses the effects of adaptation and ringing of the cochlear filter for short-duration
tones. Nevertheless, our results suggest that there may be a link between the two
“integration times”.
Another consideration related to Nmax is that the overall neural delay required to
perform octave matches for low frequency tones may be physiologically implausible.
For example, with Nmax = 5, the total delay required to obtain a frequency estimate
for a 60 Hz tone is 83 msec. There is, however, evidence for the existence of a lower
limit to musical pitch around 90 Hz (Biasutti, 1997), which reduces the maximum
required neural delay in our model to about 55 msec.
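The delay figures quoted above follow from multiplying Nmax by the stimulus period. A minimal check of that arithmetic (the function name `required_delay_ms` is ours, introduced only for this illustration):

```python
def required_delay_ms(n_max, tone_hz):
    """Longest interval spanned by n_max modes: n_max stimulus periods, in msec."""
    return 1000.0 * n_max / tone_hz

for tone_hz in (60.0, 90.0):
    print(tone_hz, round(required_delay_ms(5, tone_hz), 1))
# 60.0 83.3
# 90.0 55.6
```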
An alternative model for octave matching
We developed and implemented a second model for octave matching following a sug-
gestion by Hartmann (1993). Noting that the scaling factor of two in Ohgushi’s (1983)
model for octave matching is arbitrary, Hartmann suggested that a more physiologi-
cally grounded model is one that attempts to correlate the ISIs without first scaling
those from the low frequency tone. The comparison is then made between two tones
using only the intervals from the even modes in the ISI histogram for the high fre-
quency tone. This model can be interpreted graphically as attempting to align the
modes of the f1 histogram with the even modes of the f2 histogram.
Despite the appeal of Hartmann’s suggestion, we found that this model fails to
predict the octave enlargement phenomenon and instead predicts a slight octave con-
traction. The cause of this prediction can be seen by examining the mode offsets
at the same interval size in Fig. 2-6(f). In an octave comparison between two tones
separated by a physical octave, the second mode in the ISI histogram for the high
frequency tone has a smaller mode offset than the first mode of the lower frequency
tone. This causes the sub-octave estimate of the higher tone to be slightly higher
than the frequency estimate of the lower tone. In order to achieve a subjective octave
match, the higher frequency tone needs to be slightly lower in frequency than the
physical octave above the lower tone. This results in a predicted octave contraction
rather than an octave enlargement.
Temporal models for pitch
Our model for octave matching is similar to existing models for frequency discrimi-
nation (Siebert, 1970; Goldstein and Srulovicz, 1977) in that they are based on the
idea that pitch is a frequency estimate of a pure-tone stimulus based on temporal
discharge patterns. Both Siebert (1970) and Goldstein and Srulovicz (1977) repre-
sent AN activity with non-homogeneous Poisson processes. Siebert’s main objective
was to investigate the limitations in frequency discrimination of an optimal processor
operating on spike times of modeled AN activity. He discovered that there is enough
temporal information in the all-order intervals from a small number of auditory-nerve
fibers to account for the psychoacoustic data on frequency discrimination. However,
the slope of the predicted frequency discrimination limen versus stimulus duration
far exceeded psychoacoustic performance. Goldstein and Srulovicz showed that a
similar model operating on only first-order ISIs better predicts the dependence of the
psychoacoustic frequency difference limen on stimulus duration.
The essential difference between these models and ours is that the optimal pro-
cessor models give unbiased (ML) estimates of the stimulus frequency. The octave
matching model relies on biased frequency estimates which result from the assumption
that modes of the ISI distribution are harmonically related to the stimulus period.
These biases were lacking in the Siebert and Goldstein models because refractory
effects were not included in the Poisson processes.
An important distinction within the class of temporal models for pitch is between
those that operate on first-order ISIs and those that operate on all-order ISIs. All-
order intervals can be obtained from a spike train using delay lines and coincidence
detectors as proposed by Licklider (1951). Analysis of first-order intervals, on the
other hand, requires an extra stage of processing to eliminate the higher order inter-
vals. This makes a model based on first-order ISIs less appealing, physiologically, than
one that operates on all-order intervals. A further advantage for a model operating
on all-order intervals may be the fact that all-order interval distributions tend to be
more stable across stimulus level than first-order interval distributions, as shown in
Fig. 2-7.
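The distinction between the two interval types can be made concrete with a short sketch. It assumes spike times are given as a plain array in arbitrary time units; the helper names and the `max_interval` cutoff are illustrative and are not taken from the analysis software used in this study.

```python
import numpy as np

def first_order_isis(spike_times):
    """Intervals between consecutive spikes only."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    return np.diff(t)

def all_order_isis(spike_times, max_interval):
    """Intervals between every pair of spikes, up to max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    diffs = t[None, :] - t[:, None]          # diffs[i, j] = t[j] - t[i]
    keep = (diffs > 0) & (diffs <= max_interval)
    return np.sort(diffs[keep])

spikes = [0.0, 2.0, 4.1, 8.0]
print(first_order_isis(spikes))      # three intervals between neighbors
print(all_order_isis(spikes, 10.0))  # all six pairwise intervals
```

The all-order set contains the first-order intervals plus every sum of consecutive intervals, which is why extracting first-order intervals alone needs an extra step that discards pairs with an intervening spike.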
We have seen in this study, as have Goldstein and Srulovicz (1977), that model
predictions based on one or the other type of ISI can yield different results. Goldstein
and Srulovicz show that in the context of frequency discrimination, operating on first-
order ISIs results in a better fit to the psychoacoustic data than operating on all-order
ISIs. Also, psychophysical experiments attempting to distinguish between the two
kinds of ISI-based pitch models have favored first-order ISIs (Kaernbach and Demany,
1998). Kaernbach and Demany used random click train stimuli with specified first
and higher order interclick distributions and found that discrimination between those
stimuli and randomly distributed clicks was better for regular first-order interclick
intervals. Results of our study do not strongly favor either first- or all-order intervals.
Model predictions based on first-order ISIs overestimate the subjective octave at low
frequencies and those based on all-order ISIs slightly underestimate the subjective
octave. The trend with frequency, however, of those predictions based on all-order
ISIs is more consistent with the psychoacoustic data. Nevertheless, we cannot rule
out models based on intermediate combinations of the two types of intervals or other
more complex models, such as Ohgushi’s, which predict the octave enlargement based
on first-order ISIs. Also, it is conceivable that different physiological cues may be
responsible for discriminating frequency than for matching octaves or for performing
other tasks involving musical pitch.
Our model for octave matching is also analogous to the optimum processor intro-
duced by Goldstein (1973). He uses a template of Gaussian density functions spaced
harmonically along the spectral axis to fit, in the ML sense, the excitation pattern
produced by a complex tone. Although his implementation operates on spectral exci-
tation, there is nothing inherent to the model that precludes its operation on interval
distributions. Our model is similar to his in that it fits harmonic templates to noisy
and possibly inharmonic data. In Goldstein’s case, the inharmonicity only arises if
the stimulus contains inharmonically related partials. In our case, the inharmonicity
is always present and comes from mode offsets in ISI distributions.
Although we have concentrated solely on temporal models, we should not forget
that there exist alternative schemes for octave matching, namely rate/place models.
Terhardt’s (1971; 1974) model for virtual pitch theoretically predicts the octave en-
largement effect and is discussed in that light by Hartmann (1993). Terhardt suggests
that through pervasive listening to natural tone complexes we develop memory tem-
plates of tonotopic excitation patterns and that we make octave judgements based on
the places of maximum excitation in these memory templates. He further postulates
that these templates are stretched, i.e. the places of maximum excitation correspond-
ing to the harmonics in the tone complex are shifted (upwards in frequency) due to
masking effects caused by the presence of the lower harmonics. Thus, the subjective
octave, based on these stretched templates, is slightly larger than the physical octave.
There is some evidence that lower-frequency masking stimuli can lower the CF of an
AN fiber (Kiang and Moxon, 1974; Delgutte, 1990) but the effect of masking depends
on the overall stimulus level and on the relative levels of the signal and masker. It
is not known whether these effects are quantitatively adequate to validate Terhardt’s
theory.
2.6 Conclusion
We have shown that, in response to low-frequency pure tones, AN ISIs less than 5
msec are systematically larger than integer multiples of the stimulus period and, for
frequencies less than 500 Hz, first-order ISIs are smaller than integer multiples of the
stimulus period. These deviations result in biased estimates of frequency and can lead
directly to a prediction of the octave enlargement effect by temporal-based models.
Thus, computational models for pitch may have to incorporate detailed physiological
properties of the auditory periphery, such as refractoriness, in order to predict effects
such as octave enlargement.
Correlating psychoacoustic behavior in the context of pitch effects with physio-
logical responses to the same set of stimulus conditions can lead to valuable insights
into the neurophysiological basis of pitch. Here, we have examined models for octave
matching operating on two forms of ISIs and, although no model is completely sat-
isfactory, one of them, operating on all-order intervals, comes close to predicting the
octave enlargement effect over its entire frequency range. This result is consistent
with the notion that musical pitch is based on a temporal code.
2.7 Appendix: The EM Algorithm
In order to find the ML estimates of parameters in the Gaussian mixture densities
described in Eqs. (2.8), (2.9), and (2.14), we used the iterative EM algorithm (Redner
and Walker, 1984; Moon, 1996). This appendix briefly describes the EM algorithm
and shows the mathematical details of our implementation.
The general idea of the EM algorithm is as follows: Ideally, one would like to obtain
ML estimates for parameters, Φ, of a probability density function (PDF), f(y|Φ),
over the complete sample space, Y. At hand, however, is an incomplete data sample,
x, which is insufficient to compute and maximize the log-likelihood function over Y.
In our case, the vector x = {xk : k = 1, . . . , N} is the interspike interval distribution
where xk is a single interval and N is the number of intervals. The data sample is
incomplete because the component density in the mixture from which a particular
interval arises is not known. A complete data sample, yk = (xk, ik), would consist of
the interspike interval, xk, and an indicator, ik, of the component density from which
xk originated. So, instead of maximizing the log-likelihood over Y, the EM algorithm
maximizes the expectation of log(f(y)) given the data, x, and the current parameter
estimates, Φ′. The two-step EM algorithm is:
E-step: Determine: $Q(\Phi|\Phi') = E(\log(f(y|\Phi)) \mid x, \Phi')$. (2.6)
M-step: Choose: $\Phi^{+} \in \arg\max_{\Phi} Q(\Phi|\Phi')$. (2.7)
With each iteration, the next parameter estimates, Φ+, replace the current parameter
estimates, Φ′, until convergence or until the difference between sequential sets of
parameters is less than some designated ε. Our implementation of the EM algorithm
follows directly from equations developed in (Redner and Walker, 1984) so we refer
the reader to their paper for details on the preliminary derivations and focus here on
details pertinent to our implementation.
2.7.1 Gaussians with independent means and variances.
To characterize the individual modes of ISI histograms, we modeled each interval
distribution as a mixture of M weighted, univariate Gaussian PDFs with independent
means and variances:
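$$p(x|\Phi) = \sum_{i=1}^{M} \alpha_i \, p_i(x|\phi_i), \qquad (2.8)$$
where x is a single interval in the distribution, $\Phi = (\alpha_1, \ldots, \alpha_M, \phi_1, \ldots, \phi_M)$, $\alpha_i$
is a nonnegative weighting, $\sum_{i=1}^{M} \alpha_i = 1$, and $p_i$ is a univariate Gaussian PDF with
parameters $\phi_i = (\mu_i, \sigma_i)$:
$$p_i(x|\phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}}. \qquad (2.9)$$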
For a mixture of Gaussian densities in the form of Eqs. (2.8) and (2.9), Redner
and Walker (1984) derive Q(Φ|Φ′) in their Equation (4.1):
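$$Q(\Phi|\Phi') = \sum_{i=1}^{M} \left[ \sum_{k=1}^{N} \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \right] \log \alpha_i + \sum_{i=1}^{M} \sum_{k=1}^{N} \log p_i(x_k|\phi_i) \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \qquad (2.10)$$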
where N is the number of data samples (number of intervals in the histogram), and the
other variables are as defined in Eq. (2.8). Note that maximization of Q(Φ|Φ′) with
respect to the weights, αi, is independent of the parameters, φi, of the individual
densities. Maximizing Q(Φ|Φ′) with respect to the individual parameters leads to
the following relations, which are special cases of Equations (4.5), (4.8), and (4.9)
in (Redner and Walker, 1984):
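$$\alpha_i^{+} = \frac{\alpha'_i}{N} \sum_{k=1}^{N} \frac{p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \qquad (2.11)$$
$$\mu_i^{+} = \sum_{k=1}^{N} x_k \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \Bigg/ \sum_{k=1}^{N} \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \qquad (2.12)$$
$$\sigma_i^{+2} = \sum_{k=1}^{N} (x_k - \mu_i^{+})^2 \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \Bigg/ \sum_{k=1}^{N} \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \qquad (2.13)$$
where $\alpha_i^{+}$, $\mu_i^{+}$, and $\sigma_i^{+2}$ are the parameter values used in the subsequent iteration of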
the algorithm. In this form of mixture density, the weights, means and variances of
the individual densities in the mixture are mutually independent. This form was used
to characterize the individual modes in the interval histograms. The ML estimate of
µi was used as an estimate of the ith mode.
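As an illustration of Eqs. (2.11) to (2.13), the following is a minimal sketch of one EM iteration for a mixture with independent means and variances. It is not the implementation used for the analyses in this chapter: the function names, the synthetic intervals, the starting values, and the fixed count of 50 iterations (in place of a convergence test on ε) are all assumptions made for the example.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, as in Eq. (2.9)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def em_step_independent(x, alpha, mu, sigma):
    """One EM iteration for M Gaussians with independent means and variances."""
    x = np.asarray(x, dtype=float)
    # weighted[i, k] = alpha'_i * p_i(x_k | phi'_i)
    weighted = alpha[:, None] * gaussian_pdf(x[None, :], mu[:, None], sigma[:, None])
    resp = weighted / weighted.sum(axis=0)        # divide by p(x_k | Phi')
    total = resp.sum(axis=1)
    alpha_new = total / x.size                                          # Eq. (2.11)
    mu_new = (resp * x).sum(axis=1) / total                             # Eq. (2.12)
    var_new = (resp * (x - mu_new[:, None]) ** 2).sum(axis=1) / total   # Eq. (2.13)
    return alpha_new, mu_new, np.sqrt(var_new)

# Synthetic "intervals" (msec) with modes near 2 and 4 msec; illustrative only.
rng = np.random.default_rng(0)
intervals = np.concatenate([rng.normal(2.0, 0.1, 500), rng.normal(4.0, 0.1, 500)])
alpha = np.array([0.5, 0.5])
mu = np.array([1.8, 4.3])
sigma = np.array([0.3, 0.3])
for _ in range(50):
    alpha, mu, sigma = em_step_independent(intervals, alpha, mu, sigma)
print(mu)  # estimated mode locations
```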
2.7.2 Gaussians with harmonically related means and a common variance.
To estimate the fundamental mode, i.e., stimulus period, of ISI histograms, we mod-
eled their distribution as a Gaussian mixture density with harmonically related means
and a common variance:
$$p_i(x|\phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - i\cdot\mu)^2}{2\sigma^2}}. \qquad (2.14)$$
The ML estimate of µ was used as an estimate of the stimulus period.
A different set of iteration equations result when considering the mixture density
described by Eqs. (2.8) and (2.14). In this case, maximizing Q(Φ|Φ′) with respect
to the individual parameters leads to the following iteration equations, similar to
Eqs. (2.12), and (2.13):
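$$\mu^{+} = \sum_{i=1}^{M} \sum_{k=1}^{N} x_k \cdot i \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \Bigg/ \sum_{i=1}^{M} \sum_{k=1}^{N} i^2 \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}, \qquad (2.15)$$
$$\sigma^{+2} = \sum_{i=1}^{M} \sum_{k=1}^{N} (x_k - i\cdot\mu^{+})^2 \, \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')} \Bigg/ \sum_{i=1}^{M} \sum_{k=1}^{N} \frac{\alpha'_i \, p_i(x_k|\phi'_i)}{p(x_k|\Phi')}. \qquad (2.16)$$
The weights of the individual densities, $\alpha_i^{+}$, are the same as in Eq. (2.11).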
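The harmonic-template updates of Eqs. (2.15) and (2.16) can be sketched in the same way. Again this is only an illustration under stated assumptions: the function names, the synthetic intervals with modes at exact multiples of 2 msec, the starting values, and the fixed iteration count are ours and do not reproduce the analysis code used in this study.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, as in Eq. (2.14) with mean mu."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def em_step_harmonic(x, alpha, mu, sigma):
    """One EM iteration for M Gaussians with means i*mu and a common sigma."""
    x = np.asarray(x, dtype=float)
    orders = np.arange(1, alpha.size + 1)[:, None]     # mode index i = 1..M
    weighted = alpha[:, None] * gaussian_pdf(x[None, :], orders * mu, sigma)
    resp = weighted / weighted.sum(axis=0)             # divide by p(x_k | Phi')
    alpha_new = resp.sum(axis=1) / x.size                                 # Eq. (2.11)
    mu_new = (resp * orders * x).sum() / (resp * orders ** 2).sum()       # Eq. (2.15)
    var_new = (resp * (x - orders * mu_new) ** 2).sum() / resp.sum()      # Eq. (2.16)
    return alpha_new, mu_new, np.sqrt(var_new)

# Synthetic intervals (msec) with modes at 2, 4, and 6 msec; illustrative only.
rng = np.random.default_rng(1)
intervals = np.concatenate([rng.normal(2.0 * i, 0.1, 300) for i in (1, 2, 3)])
alpha, mu, sigma = np.full(3, 1.0 / 3.0), 1.9, 0.3
for _ in range(50):
    alpha, mu, sigma = em_step_harmonic(intervals, alpha, mu, sigma)
print(mu, 1000.0 / mu)  # estimated period (msec) and frequency (Hz)
```

Because every mode contributes to a single period estimate, a systematic offset in the early modes pulls the estimate of µ away from the true period, which is the bias the octave matching model relies on.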
Chapter 3
Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation
3.1 Introduction
It has been known since the time of Pythagoras (c. 540-510 B.C.) that complex-tone
pairs whose fundamental frequencies are related by ratios of small integers produce a
consonant and euphonious sensation, while those not so related produce a dissonant
and rough sensation. Figure 3-1 shows line spectra of six different musical intervals
consisting of pure- and complex-tone pairs while Figure 3-2 shows judgements of
dissonance for each interval. The intervals are named after the Western diatonic
scale and, in this case, are all based on A4 (440 Hz). For complex tones, dissonance is
highest for the two intervals (Minor 2nd and Tritone) whose fundamental frequencies
are not related by simple ratios. For pure-tone intervals, dissonance is also maximum
for the Minor 2nd but, in contrast to complex tones, the Tritone is not more dissonant
than the Perfect 4th or 5th. It is important to point out that the vertical scales in
Fig. 3-2A and B were normalized independently; the relative dissonance of pure- and
complex-tone intervals is not represented in the data.
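For reference, the fundamental frequencies of the upper tones follow directly from the interval ratios given in Fig. 3-1 and the 440 Hz base. A small sketch of that arithmetic (the names `JUST_RATIOS` and `upper_fundamental_hz` are introduced here only for illustration):

```python
from fractions import Fraction

BASE_HZ = 440  # A4, the lower tone of every interval

# Frequency ratios as listed with the interval names in Fig. 3-1.
JUST_RATIOS = {
    "Unison": Fraction(1, 1),
    "Minor 2nd": Fraction(16, 15),
    "Perfect 4th": Fraction(4, 3),
    "Tritone": Fraction(45, 32),
    "Perfect 5th": Fraction(3, 2),
    "Octave": Fraction(2, 1),
}

def upper_fundamental_hz(ratio, base_hz=BASE_HZ):
    """Fundamental of the upper tone for a given frequency ratio."""
    return float(base_hz * ratio)

for name, ratio in JUST_RATIOS.items():
    print(f"{name:12s} {ratio!s:>6s} {upper_fundamental_hz(ratio):8.2f} Hz")
```

Rounded to the nearest hertz, these values correspond to the upper-tone frequencies marked on the pure-tone spectra in Fig. 3-1 (469, 587, 619, 660, and 880 Hz).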
The early explanations of consonant and dissonant frequency ratios developed
into acoustic and psychophysical models (Partch, 1974; Stumpf, 1890; von Helmholtz,
1863) and, more recently, neural models (Boomsliter and Creel, 1961; Tramo, Cari-
ani, and Delgutte, 1992; Tramo et al., 2000). A prevailing theory is Helmholtz’ idea
that sensory dissonance is caused by beating between neighboring partials in a tone
complex (Plomp and Levelt, 1965). Beating occurs at the frequency difference be-
tween two partials and produces a roughness sensation when it occurs at frequencies
in the range of 20-200 Hz (von Bekesy, 1960; Plomp and Steeneken, 1968; Terhardt,
1968b; Terhardt, 1968a; Terhardt, 1974a; Vogel, 1974). The theory essentially equates
sensory dissonance and roughness. For complex-tone pairs whose fundamental fre-
quencies are related by simple ratios, less beating occurs overall because many partials
coincide (shaded bars in Fig. 3-1).
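The beating argument can be illustrated with a crude count of partial pairs whose difference frequency falls in the 20-200 Hz roughness range. This sketch assumes six harmonics per tone and the ratios of Fig. 3-1; it ignores partial amplitudes and critical bandwidth, so it is not a roughness model, and the function names are ours.

```python
from fractions import Fraction

def harmonics(f0_hz, n=6):
    """First n harmonics of a complex tone with fundamental f0_hz."""
    return [f0_hz * k for k in range(1, n + 1)]

def rough_partial_pairs(f0_lower, f0_upper, n=6, band_hz=(20.0, 200.0)):
    """Pairs (one partial from each tone) whose difference frequency
    lies inside the roughness band."""
    pairs = []
    for a in harmonics(f0_lower, n):
        for b in harmonics(f0_upper, n):
            if band_hz[0] <= abs(a - b) <= band_hz[1]:
                pairs.append((float(a), float(b)))
    return pairs

base = Fraction(440)
for name, ratio in [("Minor 2nd", Fraction(16, 15)), ("Perfect 5th", Fraction(3, 2))]:
    print(name, len(rough_partial_pairs(base, base * ratio)))
# Minor 2nd 6
# Perfect 5th 0
```

For the Perfect 5th every pair of partials either coincides or differs by a multiple of 220 Hz, so no pair falls in the band, whereas each harmonic of the Minor 2nd has a close neighbor in the other tone.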
Neural correlates of roughness have been found in temporal population discharge
patterns of auditory-nerve (AN) fibers in cat (Tramo, Cariani, and Delgutte, 1992;
Tramo et al., 2000) as well as in multi-unit responses from primary auditory cortex in
monkey (Fishman et al., 2000). In order to show the correlate in AN responses, Tramo
et al. developed a model for roughness that operates on AN fibers grouped by charac-
teristic frequency (CF) and employs bandpass filters to extract temporal fluctuations
in each CF band. Their filter characteristics were based on the dependence of rough-
ness on modulation frequency. We noted that this dependence is similar in shape to
modulation transfer functions (MTFs) of inferior colliculus (IC) neurons (Rees and
Møller, 1983; Langner and Schreiner, 1988; Fastl, 1990; Delgutte, Hammond, and
Cariani, 1998), as shown in Figure 3-3. Although slightly more lowpass in shape
than the roughness data, the MTFs show that IC neurons strongly respond to stimuli
with modulation frequencies in the roughness range. Fishman et al. (2000) found
that neural response correlates of roughness in the cortex were strongest in the tha-
lamorecipient zone, suggesting that the found correlates may exist at lower levels of
the auditory system. Based on these observations, we hypothesize that there may be
direct correlates of roughness, and therefore sensory dissonance, in responses of IC
neurons without the need for additional filtering.
[Figure 3-1: line spectra of six intervals (Unison 1/1, Minor Second 16/15, Perfect Fourth 4/3, Tritone 45/32, Perfect Fifth 3/2, Octave 2/1) in two columns, Two Pure Tones and Two Complex Tones, with musical notation for each interval. x-axis: Frequency (Hz).]
Figure 3-1. Line spectra of pure- and complex-tone intervals based at 440 Hz. The ratios of the fundamental frequencies are given under the interval name along with the musical notation. Each tone in a complex tone pair contains six harmonics, each at the same level as the pure tones. Gray bars indicate overlapping harmonics from the lower (black bars) and upper (white bars) tones.
[Figure 3-2: panels A (Two Pure Tones) and B (Two Complex Tones). x-axis: interval (Uni, m2, 4th, Tri, 5th, Oct); y-axis: Dissonance Rating (Arbitrary Units), 0 to 1.]
Figure 3-2. Dissonance judgements for two-tone intervals comprised of pure and complex tones. Although both are normalized to 1.0, the plots are not on the same vertical scale. Adapted from Fig. 1 in Terhardt (1984), which is a compilation of data from Plomp and Levelt (1965), Kameoka and Kuriyagawa (1969b), and Terhardt (1977).
For this study, we have recorded from single neurons in the IC of anesthetized
cats in response to the stimuli shown in Fig. 3-1 and examined the discharge rates
and temporal properties of the responses for direct correlates of dissonance.
3.2 Method
3.2.1 Experiment
Data were recorded from single neurons in the central nucleus of the IC (ICC) in 10
adult cats using methods standard for our laboratory (Delgutte et al., 1999).
Each cat was Dial-anesthetized with an initial dose of 75 mg per kg of body weight
and subsequent doses of 7.5 mg per kg of body weight. We made a caudal approach
to the IC by performing a posterior fossa craniectomy and partially aspirating the
cerebellum. The bullae were vented to maintain ambient pressure in the middle
ear. An intravenous drip of Ringer’s saline was provided to prevent dehydration and
injections of dexamethasone (0.26 mg/kg of body weight/day) were given to reduce
brain swelling. The cat was placed on a vibration-isolated table in an electrically
shielded, temperature-controlled (38 °C), sound-attenuated chamber.
[Figure 3-3: x-axis: Modulation Frequency (Hz); y-axis: Gain (dB).]
Figure 3-3. Synchronized rate MTFs of IC neurons (thin lines) and the psychoacoustic roughness function (thick line). The MTFs were measured with a CF-tone carrier (see Sec. 3.2). The psychoacoustic roughness function is for AM tones with a carrier of 1000 Hz (adapted from Fig. 11 in Terhardt, 1968b).
Sound was delivered to the cat’s ears through closed acoustic assemblies driven by
a headphone (Realistic 40-1377). The assemblies were calibrated with reference to the
voltage applied to the headphones, allowing accurate control over the sound pressure
level at the tympanic membranes. Stimuli were generated by a 16-bit digital-to-analog
converter (Concurrent DA04H) using a sampling rate of 20 kHz.
Neural action potentials (spikes) were recorded with parylene-insulated tungsten
electrodes (12 MΩ). Electrodes were advanced through the IC using a micropositioner
(Kopf 650) while playing the search stimulus, a sinusoidally amplitude-modulated
(40 Hz) pure tone swept in frequency from 200 Hz to 10 kHz. The electrode signal
was bandpass filtered and fed into a spike detector/timer which measured the time
at the peak of the action potential with 1 µsec accuracy. Three methods were used
to guide the placement of the electrode in the ICC: initial placement of the electrode
after visually identifying the IC; attending to background activity in response to the
search stimulus, which is more prominent in the ICC than in surrounding areas; and
histological reconstruction and examination of the electrode tracts in three of the
cats.
Once isolated, four measurements were taken to characterize each neuron: 1)
Binaural interactions were examined by switching the search stimulus on and off
in each ear. All subsequent measurements for a particular neuron were made with
the most responsive of these binaural stimulus settings (monaural or diotic). 2) A
threshold tuning curve and the characteristic frequency (CF) were obtained using the
Moxon (Kiang, Moxon, and Levine, 1970) algorithm with a criterion of 0 spikes. 3)
The responses to a set of 300 msec long tones at CF, 0 to 60 dB re threshold, were
measured in order to classify the neuron by the peri-stimulus time histogram (PSTH)
of these responses (see Sec. 3.2.2). 4) The responses to an amplitude modulated tone
at CF were measured for modulation frequencies (Fm) ranging from 1 to 512 Hz in
octave steps in order to generate an MTF (see Sec. 3.2.2).
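The modulation frequencies used for the MTF measurement (1 to 512 Hz in octave steps) can be enumerated as below. The sketch also generates an amplitude-modulated tone at the 20 kHz sampling rate stated above; the sinusoidal modulator, the modulation depth of 1.0, the 300 msec duration, and the function names are assumptions for illustration, since those details of the MTF stimulus are not specified here.

```python
import numpy as np

FS_HZ = 20_000  # sampling rate of the digital-to-analog converter

def modulation_frequencies(f_min=1.0, f_max=512.0):
    """Modulation frequencies from f_min to f_max in octave steps."""
    n_octaves = int(round(np.log2(f_max / f_min)))
    return f_min * 2.0 ** np.arange(n_octaves + 1)

def sam_tone(carrier_hz, fm_hz, duration_s, depth=1.0, fs_hz=FS_HZ):
    """Sinusoidally amplitude-modulated tone (depth and duration are assumed)."""
    t = np.arange(int(round(duration_s * fs_hz))) / fs_hz
    envelope = 1.0 + depth * np.sin(2.0 * np.pi * fm_hz * t)
    return envelope * np.sin(2.0 * np.pi * carrier_hz * t)

fms = modulation_frequencies()
print(fms)  # ten values, 1 Hz through 512 Hz
stimuli = {fm: sam_tone(1000.0, fm, 0.3) for fm in fms}  # 1000 Hz stands in for CF
```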
The consonant/dissonant tone pairs shown in Fig. 3-1 were each presented 30
times to a neuron. The tone pairs were 500 msec in duration (except for the first
experiment where durations were 300 and 200 msec for the pure-tone and complex-tone
pairs respectively) and were typically presented at 60 dB SPL. For some neurons,
we also used other levels ranging from 20 to 80 dB SPL. The harmonics for the
six-component complex tones were in cosine phase and equal amplitude. While the
stimuli shown in the figure are based on Just intonation, the stimuli in this study
were based on equal temperament tuning.¹
¹ Tuning based on Just intonation creates successive scale step intervals of slightly unequal size. In addition, the interval steps between any two scale steps depend on which note of the scale is used as the starting point in tuning. This causes practical difficulties for combining multiple instruments as well as for playing music in different keys. Equal temperament tuning, the standard used today, was developed to overcome these difficulties (see, e.g., Sethares, 1999). It divides the octave into 12 equal frequency ratios, making all musical intervals in all keys the same size. This slight detuning from Just intonation is perceivable but it does not affect the relative dissonances of the intervals used in this study. The largest frequency difference in interval sizes between the two tuning systems is 3.5 Hz at the Tritone.
The responses to each set of tone pair presentations were closely monitored during
recording and were used in the analyses only if spikes were clearly identifiable above
the background activity. In addition, estimates of the false triggers in each spike
record were obtained by calculating the number of interspike intervals (ISIs) smaller
than 0.5 msec. These ISIs are assumed to be smaller than the absolute refractory
time of the neuron and therefore are most likely due to an improperly set threshold on
the spike counter. If the quantity of these short intervals exceeded 3% of the total
number of intervals, the spike record was excluded from the analyses.
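The exclusion criterion described above can be sketched as follows. The sketch assumes a single spike train with times in msec and counts first-order intervals only; how intervals were pooled across the 30 presentations is not specified here, and the function names are ours.

```python
import numpy as np

def false_trigger_fraction(spike_times_ms, min_isi_ms=0.5):
    """Fraction of interspike intervals shorter than min_isi_ms."""
    isis = np.diff(np.sort(np.asarray(spike_times_ms, dtype=float)))
    if isis.size == 0:
        return 0.0
    return float(np.mean(isis < min_isi_ms))

def keep_record(spike_times_ms, max_fraction=0.03):
    """True if the record passes the 3% short-interval criterion."""
    return false_trigger_fraction(spike_times_ms) <= max_fraction

clean = np.arange(0.0, 500.0, 10.0)        # regular 10 msec intervals
noisy = [0.0, 10.0, 10.2, 20.0, 30.0]      # one 0.2 msec interval out of four
print(keep_record(clean), keep_record(noisy))
```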
In addition to the isolated tone pairs, we also presented to a few neurons a two-
voice sequence, excerpted from Bartok’s Mikrokosmos #32 (Bartok, 1940) to inves-
tigate the ability of IC neurons to follow the variations in dissonance of sequential
musical intervals. The sequence was synthesized using a piano timbre on a Korg
05R/W general MIDI synthesizer and presented at an overall level of 60 dB SPL. A
spectrogram of the stimulus is shown in Fig. 3-13.
Histology
In order to closely examine the placement of our electrodes within the IC, we histolog-
ically reconstructed the electrode tracts in three of the cats. The midbrain was fixed
and sliced into 80 µm slices. Every third slice was stained with calretinin while the
remaining slices were Nissl stained. The calretinin stained slices were used to identify
putative projections of the MSO (Adams, 1979; Adams, 1995). Results showed that,
while a majority of the electrode tracts ran through the ICC, a few of them were
close to or ran through dorsal cortex of the IC, the border of which is somewhat
indistinct. Examination of the data and neuron classifications that came from the
different electrode penetrations revealed no systematic differences, so we combined
data across all electrode penetrations.
3.2.2 Analysis
Neurons were classified as one of four types based on the shape of the PSTH of their
responses to tones at CF: Onset, Sustained, Pauser, and Other. This classification
is similar to, although slightly broader than, that of Nuding, Chen and Sinex (1998).
Figure 3-4 shows example PST histograms for the first three types. For most neu-
rons, responses were measured at three levels and the PSTHs were summed across
levels before making the classification. For neurons from which we had not measured
responses to tones at CF we used the PSTH of the response to the Unison complex-
tone pair. PSTHs were generated with 1 msec binwidths and were classified after
the following preparation: The beginning of the neuron’s response was defined as the
first point in the PSTH where the discharge rate in a 3 msec window exceeded the
rate of the previous 3 msec window by 3 standard deviations of the rate in the bins
of the previous window; the PSTH was divided into an onset section (the first 20
msec of the response), a sustained section (the remaining 280 msec of the response),
and a pause section (the first 20 msec of the sustained section). Neurons were clas-
sified as Onset if the average discharge rate in the onset section was greater than 10
times that in the sustained section. In addition, we excluded neurons from the Onset
category that contained large phasic responses but low overall rates in the sustained
section by requiring the onset rate to be greater than 5 times the largest value of a
non-overlapping moving average of the discharge rate in the sustained section using
an averaging window of 20 msec. Neurons were classified as Pauser if their response
did not meet the requirements for Onset and the rate in the pause section was less
than 0.2 times that in the onset section and in the remainder of the sustained section.
Neurons were classified as Sustained if they did not meet the criteria for Onset or
Pauser and the rate in the sustained section was greater than 10 spikes/sec. Finally,
neurons were classified as Other if they did not fall into one of the other categories or
if the latency of the response was greater than 30 msec. Neurons were not classified
if their responses contained fewer than 30 spikes.
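The classification rules above can be written out as a short Python sketch. It is one reading of the procedure under stated assumptions, not the code used in the study: the PSTH is assumed to be a list of spike counts in 1 msec bins starting at stimulus onset, latency is taken to be the detected response start, the standard deviation is the population form, the latency rule is checked before the other categories, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def response_start(rate, win=3, n_sd=3.0):
    """First bin where the mean rate of a 3-bin window exceeds the mean of
    the preceding 3-bin window by n_sd SDs of that preceding window."""
    for i in range(win, len(rate) - win + 1):
        prev, cur = rate[i - win:i], rate[i:i + win]
        if fmean(cur) > fmean(prev) + n_sd * pstdev(prev):
            return i
    return None


def classify_psth(counts, n_presentations):
    """Classify a PSTH (spike counts per 1 msec bin, summed over trials)."""
    if sum(counts) < 30:
        return "Unclassified"          # too few spikes to classify
    rate = [c / (n_presentations * 0.001) for c in counts]  # spikes/sec
    start = response_start(rate)
    if start is None or start > 30:    # latency > 30 msec (assumed order)
        return "Other"
    onset = rate[start:start + 20]             # first 20 msec of response
    sustained = rate[start + 20:start + 300]   # remaining 280 msec
    if len(sustained) < 40:            # assumption: PSTH too short to judge
        return "Other"
    pause, rest = sustained[:20], sustained[20:]
    # Non-overlapping 20 msec averages of the sustained section.
    blocks = [fmean(sustained[k:k + 20])
              for k in range(0, len(sustained) - 19, 20)]
    on, sus = fmean(onset), fmean(sustained)
    if on > 10 * sus and on > 5 * max(blocks):
        return "Onset"
    if fmean(pause) < 0.2 * on and fmean(pause) < 0.2 * fmean(rest):
        return "Pauser"
    if sus > 10:
        return "Sustained"
    return "Other"
```

With noiseless synthetic input the preceding window has zero variance, so the start criterion reduces to any increase in rate; real PSTHs with spontaneous activity exercise the 3 SD threshold.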
Figure 3-4. PST histograms (number of spikes vs. peri-stimulus time in msec) of responses to 300 msec tones at CF for three neurons: an Onset neuron (A), a Pauser neuron (B), and a Sustained neuron (C). Neuron CFs were, left to right: 5680, 3843 and 2703 Hz. Stimuli were presented 30 times. Histogram binwidth is 1 msec.
Synchronized rate MTFs were generated from responses to sinusoidally
amplitude-modulated (SAM) tones at CF. For each modulation frequency, the modulation of
the response was calculated as 2 times the product of the discharge rate and the
synchronization index. This method was used because the synchronization index alone
is a misleading measure of synchrony for low discharge rates (a single spike has a
synchronization of 1.0). The gain of the responses was measured with respect to
the average discharge rate. The best modulation frequency (BMF) was calculated if
the MTF magnitude fell at least 10 dB on both sides of its maximum. The MTF
magnitude function was interpolated to 100 log-spaced points between 1 and 512 Hz
with a cubic spline. The bandwidth was measured at 3 dB down from the peak and
the BMF was taken as the mean of modulation frequencies spanning this bandwidth.
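The synchronized-rate measure can be illustrated with a small Python sketch. The text does not define the synchronization index; the sketch assumes the usual vector-strength definition (the length of the mean phase vector of the spikes relative to the modulation cycle). Function names and the use of spike times in seconds are likewise assumptions.

```python
import cmath
import math


def synchronization_index(spike_times_s, mod_freq_hz):
    """Vector strength of spike times relative to the modulation cycle
    (assumed definition of the synchronization index)."""
    if not spike_times_s:
        return 0.0
    vec = sum(cmath.exp(2j * math.pi * mod_freq_hz * t) for t in spike_times_s)
    return abs(vec) / len(spike_times_s)


def synchronized_rate(spike_times_s, mod_freq_hz, duration_s, n_presentations=1):
    """2 x average discharge rate (spikes/sec) x synchronization index."""
    rate = len(spike_times_s) / (duration_s * n_presentations)
    return 2.0 * rate * synchronization_index(spike_times_s, mod_freq_hz)
```

A record containing a single spike has an index of 1.0 regardless of when the spike occurs, but its synchronized rate stays small because the discharge rate is small, which is the motivation given for weighting the index by rate.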
Responses to the consonant/dissonant tone pair stimuli were grouped by musical
interval type and summed across stimulus presentations to generate PSTHs with 1
msec binwidths. The histograms were divided into sections in the same manner as
the CF-toneburst responses described above. For each musical interval type, the
average discharge rate was calculated for the sustained section of the response and,
as a measure of rate fluctuation, the temporal standard deviation was calculated after
smoothing the PSTH with a 3 msec rectangular window.
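As a concrete reading of the rate-fluctuation measure, the following Python sketch smooths a PSTH with a 3-bin rectangular window and takes the standard deviation over time. It is a sketch under assumptions: the input is the section of interest in 1 msec bins, edge bins without a full window are dropped, the population form of the standard deviation is used, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def rate_and_fluctuation(counts, n_presentations, bin_ms=1.0, smooth_bins=3):
    """Return (average rate, temporal SD of the smoothed rate) in spikes/sec.

    counts: spike counts per bin, summed across stimulus presentations.
    """
    bin_s = bin_ms / 1000.0
    rate = [c / (n_presentations * bin_s) for c in counts]
    # Rectangular (moving-average) smoothing; only full windows are kept.
    smoothed = [fmean(rate[i:i + smooth_bins])
                for i in range(len(rate) - smooth_bins + 1)]
    return fmean(rate), pstdev(smoothed)
```

A flat PSTH gives a temporal standard deviation near zero, whereas a PSTH that beats at the envelope rate gives a large one even when the average rate is the same.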
We assess the variability of most of our data and calculations by bootstrapping:
We randomly resample (with replacement) the data and recompute the statistic of
interest and then calculate the standard deviation of the statistic across resampled tri-
als. This technique has an advantage over traditional parametric statistical techniques
in that one does not have to assume the form of the underlying variability (Efron and
Tibshirani, 1993). Unless noted otherwise, resampling was performed across neurons.
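A minimal Python sketch of this bootstrap follows. The function name, the default statistic and the fixed random seed are assumptions made for illustration; the default of 2000 resamples mirrors the number quoted for Fig. 3-9 and may not apply to every analysis.

```python
import random
from statistics import fmean, stdev


def bootstrap_sd(values, statistic=fmean, n_resamples=2000, seed=0):
    """Standard deviation of a statistic across bootstrap resamples.

    values: one observation per neuron (resampling across neurons).
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic each time.
    estimates = [statistic(rng.choices(values, k=n))
                 for _ in range(n_resamples)]
    return stdev(estimates)
```

Passing `statistic=statistics.median` gives the variability of a median in the same way, since no distributional form is assumed.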
3.3 Results
From ten experiments, a total of 157 spike records (39 pure-tone pairs, 118 complex-
tone pairs) from 88 neurons were obtained that met our grading criteria for analysis.
37 neurons were classified as Onset, 11 as Sustained, 24 as Pauser, and 15 as Other. 30
of these neurons were classified from CF-toneburst responses, 57 from responses to the
Unison complex-tone stimulus, and 1 neuron was not classified because its response to
Unison had too few spikes. For most neurons (23 of 30), both methods of classification
gave the same result and our population analyses and conclusions based on PSTH
type remain the same regardless of the method of classification. CFs ranged from 255
to 21,700 Hz. This distribution of CFs does not reflect uniform sampling in the IC
as we targeted low-CF neurons that would respond to our relatively low-frequency
stimuli.
3.3.1 Responses to pure- and complex-tone pairs
Figure 3-5 shows responses of a Sustained IC neuron to the Minor 2nd, Tritone and
Perfect 5th pure- and complex-tone pairs. Low frequency beating can be seen in
the temporal envelope of the dissonant stimuli: pure-tone Minor 2nd (A), complex-
tone Minor 2nd (C), and complex-tone Tritone (G). In contrast, the envelope of the
more consonant stimuli is smoother and flatter: pure-tone Tritone (E), pure-tone
Perfect 5th (I), and complex-tone Perfect 5th (K). The beating of the dissonant
stimuli is clearly reflected in their neural responses whereas the consonant stimuli
evoke flatter and smaller responses. The beat rates of the fluctuating responses to
the pure-tone Minor 2nd (B) and complex-tone Tritone (H) match the beat rates of
their corresponding stimulus envelopes. However, the beat rate of the response to
the complex-tone Minor 2nd (D) is roughly three times the beat rate of its stimulus
envelope. The neural response is most likely dominated by the beat rate of the third
harmonics in the tones comprising the Minor 2nd due to the proximity of its CF
(1170 Hz) to these harmonics. This effect of CF is demonstrated more thoroughly in
Section 3.3.3.
Figure 3-5. Stimulus waveforms (A, C, E, G, I, K) and corresponding neural responses (B, D, F, H, J, L) from a single Sustained neuron for pure- (left) and complex-tone (right) Minor 2nd (top), Tritone (middle) and Perfect 5th (bottom) stimuli. Response panels are PST histograms (number of spikes vs. peri-stimulus time in msec) with a 1 msec binwidth based on 30 stimulus presentations. CF = 1170 Hz.
Figure 3-6 shows responses of three different neurons to the pure- and complex-
tone pairs. The Onset neuron (A-B) responded at the onset of all the tone pairs,
as well as during the sustained portion for the pure-tone Minor 2nd and many of
the complex tone pairs. In addition, this neuron shows regular fluctuations in its
response to the dissonant Minor 2nds and complex-tone Tritone intervals. This can
be seen in the beating pattern of the responses to these stimuli. The Pauser neuron
(C-D) responded vigorously throughout the duration of all the stimuli except for a
brief pause after the onset. This neuron also shows fluctuations in its response to the
dissonant Minor 2nd intervals but not to the complex-tone Tritone. This is likely due
to the fact that its CF (440 Hz) is not near a pair of low-frequency beating partials
in the Tritone stimuli (see Figs. 3-10, 3-11, and 3-12). The Sustained neuron (E-F)
responded throughout the duration of the stimuli and shows a beating pattern in
response to both Minor 2nd and Tritone intervals.
Figure 3-7 shows the mean rate fluctuations across all neurons as well as the av-
erage discharge rate for all Onset neurons from which we recorded. Rate fluctuations
of responses were quantified by calculating the temporal standard deviation of their
PST histograms (see Section 3.2.2). For pure-tone pairs, the peak occurs at the
Minor 2nd for both measures. For complex-tone pairs, the Minor 2nd elicits the
greatest response as well, but the Tritone also shows a response that is significantly
greater than that from any of the remaining tone pairs. In comparing these mea-
sures to the psychoacoustic data on dissonance in Fig. 3-2 we see that the relative
ranking of musical intervals in both measures is the same as the rank order of disso-
nance ratings. There are some differences, however, in that the complex-tone Tritone
stands out more in the physiological data (B,D) than in the psychoacoustic data and
the psychoacoustic dissonance rating for pure-tone intervals decays with interval size
whereas the measure of neural rate fluctuations (A) is flatter. The latter difference
may be explained by a small CF effect in the pure-tone data: most neurons from
which we recorded had CFs higher than the pure-tone pair frequencies and tended
to respond more when the upper tone in the pair approached the CF. Nevertheless,
the qualitative correlate of dissonance rank order is consistent with the idea that the
dissonance of musical intervals is encoded in the rate fluctuations of IC neurons and
in the average discharge rates of IC Onset neurons. The data shown in Fig. 3-7 are
pooled across stimulus level, PSTH type, and CF. In the next few sections we examine
the effects of these attributes on the neural response.
Figure 3-6. A-B: Responses of an Onset neuron to pure- and complex-tone pairs at specified intervals (Unison, Minor 2nd, Perfect 4th, Tritone, Perfect 5th, Octave) as a function of peri-stimulus time (msec). Neural activity is shown over the duration of every stimulus presentation (30 presentations per stimulus). The horizontal black line below each panel indicates stimulus on-time. CF = 1160 Hz. C-D: same for a Pauser neuron with CF = 440 Hz. E-F: same for a Sustained neuron with CF = 1170 Hz. Note the different time scales in the top panels.
Figure 3-7. Mean rate fluctuations (sp/sec) across all neurons for the pure- (A, n = 36) and complex-tone (B, n = 118) pairs. Rate fluctuations were calculated as the temporal standard deviation of the rate. Average discharge rate (sp/sec) across all Onset neurons for the same stimuli (C, n = 9, and D, n = 45). Error bars are estimated standard errors of the mean and include intra- and across-neuron variances (assumed to be orthogonal).
Figure 3-8. Mean rate fluctuations (sp/sec) in response to the pure- (A, C, E) and complex-tone (B, D, F) pair stimuli as a function of stimulus level: 15-40 dB SPL (A, B), 40-60 dB SPL (C, D) and 60-80 dB SPL (E, F). Responses are averaged across neurons for each group of stimulus levels (n = 2, 15, 28, 79, 6 and 24 for panels A-F, respectively).
3.3.2 Effect of level and PSTH type
Figure 3-8 shows the mean rate fluctuations across all neurons grouped by stimulus
level. In general, the overall rate fluctuations (in spikes/sec) decrease slightly with
increased level, but the relative differences in fluctuations across musical interval
remain similar for all levels. Therefore, as a code for the dissonance of musical
intervals, IC neural rate fluctuations appear to be robust across stimulus level. The
relative average discharge rates of our population of Onset neurons also appeared
relatively stable over level (not shown), although we had fewer neurons and stimulus
levels to evaluate.
Figure 3-9. Bootstrap estimates of the median and interquartile range (25th to 75th percentile) of neural responses for each PSTH type (Onset, Sustained, Pauser) and tone-pair stimulus. The format is similar to Fig. 3-7: rate fluctuations (A, B) and discharge rate (C, D), in sp/sec, for pure- (A, C) and complex-tone (B, D) pairs. For each group of neurons the inter- and intra-neuron variability of the statistic are accounted for by randomly sampling the population as well as randomly sampling the responses of each neuron to individual stimulus presentations. Estimates are based on 2000 randomly sampled trials. The numbers of neurons in each category are: N(On,Pr) = 11, N(On,Cx) = 45, N(Sus,Pr) = 8, N(Sus,Cx) = 17, N(Psr,Pr) = 11, N(Psr,Cx) = 36, where “Pr” is for pure-tone pairs, “Cx” is for complex-tone pairs, “On” is for Onset, “Sus” is for Sustained, and “Psr” is for Pauser.
The range of responses to the tone-pair stimuli for neurons of different PSTH types
is shown in Fig. 3-9. The figure is similar in format to Fig. 3-7 but shows bootstrap
estimates of the median and interquartile range (25th to 75th percentile) of population
neural responses for each PSTH type and stimulus. The response data were resampled
within and across neurons so that the range accounts for both inter- and intra-neuron
variability. All panels show that Onset neurons tend to be less responsive overall but
they are more sensitive than Sustained or Pauser neurons to the relative dissonance
of the stimuli. This is especially true for the complex-tone pairs (B, D). It can
also be seen from the figure that Pauser and Sustained neurons respond similarly to
complex-tone pairs whereas Sustained neurons show greater overall average rates and
rate fluctuations in response to pure-tone pairs.
It is also clear, from Fig. 3-9C-D, that only Onset neurons could code for dis-
sonance based on average discharge rate alone; Sustained and Pauser neurons show
only small variation in discharge rate across stimulus type. In addition, it can be
seen from B that a particularly sensitive code for dissonance could be generated by
taking into account the relative rate fluctuations of Onset and Sustained or Pauser
neurons: for highly consonant intervals (unison and octave), rate fluctuations from
Onset neurons are much smaller than those from Sustained or Pauser neurons; for
highly dissonant intervals (Minor 2nd), Onset rate fluctuations are nearly equal to
those of Sustained or Pauser neurons; and for mildly dissonant intervals (Tritone,
Perfect 4th), the differences in rate fluctuations are in between those extremes. This
code would have the advantage of not being dependent on absolute measures of the
responses.
3.3.3 Dependence on CF
The beat strength and rate of a neuron’s response to dissonant complex-tone pairs
depend on its CF. Figure 3-10 shows line spectra for the complex-tone Minor 2nd and
Tritone stimuli and indicates, for each, which pairs of partials interact to give beat
frequencies in the roughness range. For neurons whose CFs are close to these pairs
of partials, the response is expected to reflect the partials’ beat frequency. This is
illustrated in Figs. 3-11 and 3-12. Figure 3-11A and C show the response of a neuron
with a CF of 440 Hz to the Minor 2nd and Tritone stimuli. Horizontal lines in the
upper left corners indicate the beat period of the partial-pair closest to CF. In A, the
beat frequency is 26 Hz and the neural response is phase locked to this frequency.
In C, the beat frequency of the Tritone’s first two partials is 182 Hz but there is
no clear representation of this frequency in the response. An envelope fluctuation
frequency of 182 Hz is too high to be well represented in most IC neural responses
(see Fig. 3-3). The responses of a neuron with a higher CF (1335 Hz) are shown in B
and D. In B, the response to the Minor 2nd shows a clear representation of the beat
frequency of the 3rd harmonics from each tone (78 Hz) but also shows the beat rate
of the fundamental frequencies (26 Hz, see A). In D, the response to the Tritone is
dominated by the 75 Hz beat frequency of the partials marked in Fig. 3-10B.
Figure 3-10. Line spectra of the complex-tone Minor 2nd (A) and Tritone (B) stimuli as a function of frequency (Hz). Arrows mark the partials that give rise to the low beat frequencies for each stimulus (∆f = 26, 52 and 78 Hz for the Minor 2nd; ∆f = 182, 76 and 106 Hz for the Tritone).
Figure 3-11. Peri-stimulus time histograms (discharge rate in spikes/sec vs. peri-stimulus time in msec) of the responses of two neurons to the Minor 2nd (A-B) and Tritone (C-D) complex-tone pairs. The CF of each neuron is indicated above the panels (440 Hz, left; 1335 Hz, right). The beat frequency and period (indicated by a horizontal black bar) of the stimulus partial pair closest to the neuron’s CF are shown in each plot (A: 26 Hz; B: 78 Hz; C: 182 Hz; D: 76 Hz).
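The beat frequencies discussed in this section follow directly from the stimulus partials, as the Python sketch below shows. The 440 Hz lower tone is inferred from the beat frequencies quoted above, and equal-tempered tuning follows the earlier footnote; the number of harmonics (6), the 200 Hz upper limit standing in for the "roughness range", and the function name are assumptions chosen only for illustration.

```python
def beating_partials(f_low_hz, semitones, n_harmonics=6, max_beat_hz=200.0):
    """Pairs of partials (one from each complex tone) whose frequency
    difference is at most max_beat_hz.

    Returns a sorted list of (lower-tone partial, upper-tone partial,
    beat frequency), all in Hz.
    """
    f_high_hz = f_low_hz * 2.0 ** (semitones / 12.0)  # equal temperament
    pairs = []
    for j in range(1, n_harmonics + 1):
        for k in range(1, n_harmonics + 1):
            beat = abs(j * f_low_hz - k * f_high_hz)
            if 0.0 < beat <= max_beat_hz:
                pairs.append((j * f_low_hz, k * f_high_hz, beat))
    return sorted(pairs)


if __name__ == "__main__":
    for name, steps in (("Minor 2nd", 1), ("Tritone", 6)):
        for low, high, beat in beating_partials(440.0, steps):
            print(f"{name}: {low:7.1f} Hz vs {high:7.1f} Hz, beat {beat:5.1f} Hz")
```

Under these assumptions the fundamentals beat at about 26 Hz for the Minor 2nd and about 182 Hz for the Tritone, and the Tritone partials near 1320 Hz beat at about 75.5 Hz, which is why a neuron with a CF near 1335 Hz would be expected to follow that rate.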
The effect of CF on the response strength of the population of IC neurons from
which we recorded is shown in Fig. 3-12. It shows normalized rate fluctuations in
response to both the complex-tone Minor 2nd (A) and Tritone (B) as a function of
CF. In response to the Minor 2nd, neurons with low CFs (near the proximal partials
of the stimulus) show greater fluctuations than those with high CFs. In response
to the Tritone, neurons show little rate fluctuation at low CFs, but more at
mid-frequency CFs where there are two pairs of partials that beat at frequencies in the
roughness range.
Figure 3-12. Normalized rate fluctuations of Sustained and Pauser neurons in response to the Minor 2nd (A) and Tritone (B) complex-tone pairs plotted as a function of CF (characteristic frequency, Hz). Rate fluctuations of each neuron are normalized by its mean rate fluctuations in response to the more consonant tone pairs (Unison, Perfect 4th and 5th, Octave). The dark line through the data is a moving average.
3.3.4 Responses to a musical excerpt
In order to examine the ability of IC neurons to follow temporal changes in (sensory)
dissonance within a musical passage, we generated a stimulus from an excerpt
of Bartok’s Mikrokosmos #32 (1940). The music notation, a spectrogram of the
stimulus and the responses of two IC neurons are all shown in Fig. 3-13. The most
consonant intervals in the excerpt are the Unison and Major 3rd, while the most dissonant
are the Major 2nd, Major 7th, and Tritone intervals. A piano tone was used
for the stimulus and the spectrogram shows the decaying energy over the duration
of each note, especially in the higher frequency harmonics. The neural recordings
show beating in response to some dissonant musical intervals, especially those which
have partial pairs near their CF (see histogram): The top response, from a neuron
with CF = 2700 Hz, shows beating in response to the Tritone (2nd tone-pair) and
both Minor 6ths (3rd and 7th tone-pairs) while the bottom response, from a neuron
with CF = 350 Hz, shows beating in response to both Major 2nds (1st and 5th tone-
pairs) and the Major 7th (8th tone-pair). Both neurons show little response to the
consonant Unison interval. This result is consistent with our previous findings that
IC neurons show beating in response to dissonant tone-pairs and that responses of
individual neurons are dependent on their CF. It also shows that an IC neuron can
follow temporal changes in dissonance in a realistic musical setting. Lastly, this re-
sult shows that our general results seem to apply to more musically realistic timbres.
Several other neurons showed similar responses; however, we did not record across a
broad enough range of CFs to pool our data for this stimulus.
3.3.5 Additional observations
Our motivation for looking at IC neural responses for correlates of roughness came
from the observation that IC neural MTFs resemble the psychoacoustic roughness
function (Fastl, 1990). Because we saw differences in responses to dissonant stimuli for
neurons with different PSTH types (Fig. 3-9), we decided to also look for differences
in their MTFs.
We measured MTFs in 60 of the 88 neurons from which we obtained tone-pair
data. Figure 3-14 shows the MTF magnitude and phase for an Onset, a Sustained,
and a Pauser neuron. The data are representative and show distinct differences in
the magnitude of the MTFs from these different types of neurons: The MTF from
the Onset neuron is more sharply tuned, centered at a slightly lower frequency, and
provides more gain at the BMF than the MTFs from the Sustained and Pauser
neurons; the MTF of the Pauser neuron shows a dip in magnitude between the
lowest measured frequency and the BMF. For completeness, the MTF phase is also
plotted in Fig. 3-14. We did not find significant correlations between PSTH type
and MTF phase characteristics in this study. However, this result was most likely
affected by the fact that we measured responses at relatively widely spaced modulation
frequencies and consequently had difficulty unwrapping potentially ambiguous phase
measurements.
Figure 3-13. Top shows the musical notation for measures 12-13 of Bartok’s Mikrokosmos #32 (In Dorian Mode); the successive intervals are labeled M2, Tri, m6, Uni, M2, M6, m6, M7 and M3. Middle shows a spectrogram (frequency in Hz vs. time in sec) of a recording of the excerpt using a piano sound from a Korg 05R/W synthesizer. Bottom panels show responses (discharge rate in sp/sec) to the excerpt from two IC neurons. The neurons’ CFs (350 Hz, bottom and 2700 Hz, top) are marked on the right side of the spectrogram.
Figure 3-14. MTF magnitude (A; gain in dB) and phase (B; cycles) as a function of modulation frequency (Hz) for an Onset, a Sustained and a Pauser neuron. Data points have been straight-line connected for clarity. Neuron CFs were: Onset: 5059 Hz; Sustained: 935 Hz; Pauser: 1526 Hz.
In Fig. 3-15A, the MTF magnitude characteristics can be seen in median data
across all neurons for each PSTH type. Also shown are standard error ranges of the
magnitude. In addition to the characteristics described above, it can be seen that
Pauser neurons tend to have a higher gain at low modulation frequencies than the
other neuron types. The magnitude of MTFs from neurons classified as Other (not
shown) varied in shape from neuron to neuron and were not as easily characterized.
Figure 3-15B shows MTF bandwidth plotted against BMF for individual neurons.
Data are shown only for those MTFs that met our “bandpass” criteria. The
characteristics of the median data (A) are evident in the population distribution for each
neuron type. An analysis of variance followed by a Tukey HSD multiple comparison
test (Hochberg and Tamhane, 1987) showed that the bandwidths of Onset MTFs
are significantly smaller than those from Sustained neurons (α < 0.05). All other
differences in bandwidth or BMF across groups were not significant, although there
is a tendency for Onset neurons to have lower BMFs than other neuron types.
Figure 3-15B also shows that MTF bandwidth and BMF are highly correlated,
indicating that measured MTFs have an approximately constant “Q”.
3.4 Discussion
We have shown that IC neurons beat in response to dissonant tone-pairs and that
the frequency of beating is dominated by the partial-pair closest to the neuron’s
CF. Averaged across all CFs, the rate fluctuations of IC neurons reflect perceptual
dissonance ratings of pure- and complex-tone pairs. This code for dissonance is robust
across stimulus level and is more sensitive in the responses of Onset neurons than in
Sustained or Pauser neurons. In addition, Onset neurons reflect the dissonance of
tone-pair stimuli in their average discharge rate. We have also shown that IC neurons
are capable of following changes in (sensory) dissonance within a musical excerpt.
Finally, we have shown that MTFs from Onset neurons tend to be more sharply
tuned, centered at slightly lower frequencies and provide a higher gain at the BMF
than those from Sustained or Pauser neurons.
3.4.1 Neurophysiology
This study follows the work of Tramo et al. (1992, 2000) who found correlates of
dissonance in AN discharges. The main difference between the correlates described
here and those in the AN is that, while correlates are seen directly in IC responses,
AN responses require additional processing. This is due to the fact that AN fibers
code both the temporal envelope and fine time structure in their discharge patterns,
while IC neurons largely respond to the envelope only. Because dissonance appears
to be related to properties of the temporal envelope, a correlate can be seen in AN
responses only after the fine time structure has been removed by either bandpass
filtering or some other means such as summing over CF. Such a process must occur
at some point above the AN and at or below the level of the IC. MTFs have been
measured in cochlear nucleus (CN) neurons and, on average, are broader and centered
at higher modulation frequencies than those from the IC (Frisina, Smith, and
Chamberlain, 1990; Rhode and Greenberg, 1994; Delgutte, Hammond, and Cariani,
1998). MTFs have also been measured in the lateral superior olive (LSO) and are also
relatively broad and centered higher than IC MTFs (Joris and Yin, 1998). Therefore,
because MTFs of most IC inputs are broader than those in the IC, it is likely that the
additional filtering occurs within the IC itself. Intracellular recordings from the IC
show multiple phases of excitation and inhibition that could implement a bandpass
filter (Covey, Kauer, and Casseday, 1996; Kuwada et al., 1997).
Figure 3-15. A: Median MTF magnitude (gain in dB) for Onset (n = 22), Sustained (n = 8), and Pauser (n = 24) neurons. Symbols denote median data, dashed lines connect median data ± estimated standard errors. Standard errors were obtained through bootstrapping across each group of neurons. B: MTF bandwidth (Hz) vs. BMF (Hz) for individual neurons. Results of an ANOVA followed by a Tukey HSD multiple comparison test show Onset and Sustained neurons to have significantly different MTF bandwidths (α < 0.05). Other groups across bandwidth or BMF were not significantly different. The two factors were correlated in the log10 of the combined population data with r = 0.86.
Another difference between the studies in the AN and this one is our finding
of a rate code for dissonance. Although not all IC neurons showed the effect, the
population average discharge rate of IC Onset neurons reflected the dissonance of our
tone-pair stimuli. This type of rate code has not been extensively seen in neurons
below the level of the IC, although it is possible that some CN neurons could show
similar properties. In addition, LSO neurons have been reported to show changes in
spike rate with modulation frequency (Krishna and Semple, 2000) and not much is
known about the envelope sensitivity of MSO neurons, another major input to the
IC. Nevertheless, it is possible that there exists a transformation from a temporal to
a rate code within the IC.
We chose to classify neurons using a PSTH classification method similar to those
used previously to classify neurons in the CN (Bourk, 1976) and in the IC (Rees and
Møller, 1983; Rees et al., 1997; Nuding, Chen, and Sinex, 1999; Krishna and Semple,
2000). However, the PSTH types have not been linked to any neural morphology.
Our finding that Onset neurons tend to have smaller MTF bandwidths than other
neuron types has not been reported before and it may be linked to their high overall
sensitivity in response to our tone-pair stimuli. Generally, in a linear system, a
sharply tuned filter will show greater sensitivity to changes along its parameter of
tuning (e.g., modulation frequency). Previous studies, however, have shown that while
a linear systems approach to modeling neural responses in the periphery (AN and
CN) can be quite successful, it does not always work in the IC (Delgutte, Hammond,
and Cariani, 1998; Delgutte, Hammond, and Cariani, 2000). In addition, the
relationship we found between PSTH type and MTF may not hold for all neurons in the
IC, as we sampled from low-CF neurons in the dorsolateral portion of the IC,
where the inputs from lower nuclei differ from those to the ventromedial portion
of the IC (Fullerton, 1993).
Krishna and Semple (2000) have also found a correlation between the
PSTH type and the MTF of IC neurons in the Mongolian gerbil. They found that
rate MTFs of non-Onset neurons tended to contain regions of suppression in which
rate would diminish as the stimulus level increased. Onset neurons tended not to
contain such regions. In addition, they found that, generally for all neuron types,
as stimulus level went up temporal MTFs changed from lowpass in shape to more
bandpass. It is difficult to say whether or not their finding is related to the differences
we found between Onset and non-Onset neurons, but it does suggest that the two
types of neurons process the temporal envelope of sounds differently.
Correlates of roughness have also been seen in multi-unit and current source den-
sity recordings from cortex (A1) in awake monkeys (Fishman et al., 2000). This
finding is consistent with the idea that temporal envelope fluctuations encoded in
IC neural discharges are preserved and passed on to cortical structures. However,
multi-unit recordings are somewhat ambiguous as to which neural structures gener-
ate the measured responses, which leaves open the possibility that the measurements
by Fishman et al. were made from inputs to the cortex. This is especially plausi-
ble considering that their strongest responses came from the thalamorecipient zone
(lower lamina III) and that studies of single unit MTFs in cortex show responses to
be restricted to lower frequencies.
3.4.2 Psychophysics and perception
We have shown that dissonant sounds produce larger rate fluctuations in IC neurons
and higher average discharge rates in IC Onset neurons than do consonant sounds.
However, we have not attempted to quantitatively correlate our physiological results
with psychoacoustic data because of the lack of stimulus specificity in the current
psychoacoustic data. Early psychophysical experiments on consonance and dissonance
did not have calibrated control over stimulus spectra or level (Malmberg, 1917;
Guernsey, 1928). Since then, however, a few studies have examined the roughness
and dissonance of pure-tone pairs using well controlled stimuli (Plomp and Levelt,
1965; Plomp and Steeneken, 1968; Terhardt, 1968b; Terhardt, 1974b) but few data
exist on the dissonance of complex-tone pairs. Kameoka and Kuriyagawa (1969a,b)
have demonstrated that both stimulus level and spectral shape affect perceived
dissonance of tone pairs, but even they did not have precise control over level (or spectra)
for all listeners as they presented their stimuli over speakers in an auditorium. It is
clear that a thorough psychophysical measure of consonance and dissonance based on
calibrated stimuli is required for a more precise comparison of perceptual and neuro-
physiological responses. Such a study might include a complete pairwise comparison
of all intervals including the relative dissonance of pure- and complex-tone pairs.
Despite the incompleteness of psychophysical data, our results and qualitative
correlations still provide a basis for suggesting that dissonance is encoded in the temporal
patterns of IC neural responses. For our stimuli, we chose the extreme consonant and
dissonant musical intervals. All psychoacoustic studies using complex tones, regard-
less of stimulus spectra, showed the minor 2nd to be judged most dissonant and the
tritone to be more dissonant than the Perfect 4th and 5th. Unison, octave, perfect 5th
and 4th intervals were always deemed consonant. This rank order was never violated
in psychoacoustic studies and appears in our neural response measures.
The perception of sensory dissonance appears to be innate in humans and it is
not exclusive to humans. Schellenberg and Trainor (1996) found that infants, similar
to adults, show better discrimination for dissonant harmonic intervals than for conso-
nant harmonic intervals. Hulse, Bernard and Braaten (1995) showed that European
starlings could learn to discriminate musical chords and then transfer the discrim-
ination of chords to those with different fundamental frequencies. Here, although
we have found correlates of human perception in cat neural responses, we are not
suggesting that cats hear dissonance in the same way as humans. However, ecolog-
ically, it may be beneficial for cats to be able to discriminate and attend to rough
or dissonant sounds, e.g., for the purpose of catching prey or detecting predators.
Consequently, it is possible that there are general preadaptations for certain aspects
of music processing in the mammalian auditory system.
Historically, the terms consonance and dissonance have been used, with reference
to music, in a variety of senses (Tenney, 1988; Sethares, 1999). The terms have been
used to describe musical sounds based on function, context, as well as sensory attributes
and it is important to distinguish between these senses. Functional definitions are
those used by composers and theorists and come from rules that are either implicitly
defined by a music listening culture or explicitly defined by music theorists. Functional
definitions of dissonance have changed over time and are influenced by culture and
listening experience. Contextual senses of dissonance depend on the surrounding
sounds. In context, the dissonance of a particular musical sound is affected by the
general level of dissonance within a piece of music, by the implied harmony of a
particular passage or piece, as well as by the effects of auditory streaming (Wright
and Bregman, 1987). Sensory dissonance, however, refers to the quality of sounds
in isolation and is often equated with the roughness (von Helmholtz, 1863) or (lack
of) fusion (Stumpf, 1890) of the sound. These various meanings of consonance and
dissonance are not mutually exclusive but should not be confused with each other
when one refers to the “dissonance” of a particular sound.
While sensory dissonance is thought to be based on (and often equated with)
roughness, there are alternative theories on its basis. Stumpf’s (1890) fusion theory
states that sounds are consonant because their individual components fuse together
to form a single perceptual entity, more so than dissonant sounds. One problem with
this idea, as suggested by the work in auditory scene analysis, is that it seems fusion
is actually required for the perception of dissonance as well (Wright and Bregman,
1987; Bregman, 1990). The perceptual grouping and perceived timbre of a simulta-
neous set of tones can be affected by the (a)synchronization of their onsets as well
as sequential streaming cues, that is, how well a particular component fits with a
stream of preceding tones (Bregman and Pinker, 1978). Historically, dissonance was
introduced gradually into Western music through the use of simple compositional
devices to soften the perceived dissonance of two simultaneously sounding tones (Jeppesen,
1927). Dissonant notes could not begin together and could only be approached and
left by half steps (semitones). These devices tend to draw listeners’ attention away
from the fusion of two simultaneous dissonant tones and to their respective horizontal
sequential streams. Thus, it seems that although fusion may be related to perceived
dissonance, it is likely a separate percept and not its basis.
Pitch fusion, or the perception of a fundamental bass frequency, has also been proposed
as the basis of consonance (Tramo et al., 2001; Rameau, 1722). For consonant,
simple frequency-ratio intervals, many partials from both tones in the interval fall on
harmonics of a common fundamental bass frequency. The hypothesis is that consonance
is based on the salience of this fundamental pitch: the more salient it is, the more
consonant the interval. In the companion paper, we look for correlates of the fundamental
bass frequency in the temporal discharge patterns of IC neurons (McKinney,
Tramo, and Delgutte, 2001b).
Another theory on the basis of dissonance is the long wave hypothesis from Boom-
sliter and Creel (1961). They postulate that consonance is based on the length of
the overall period of a stimulus and model neural responses using an autocorrelation
network similar to Licklider’s (1956). Consonant tone pairs, whose fundamental
frequencies are related by simple ratios (e.g., Perfect 5th, 3:2), have shorter periods than
dissonant tone pairs with their more complex ratios (e.g., Minor 2nd, 16:15). In
response to consonant stimuli, such an autocorrelation mechanism would produce larger
responses at shorter lags than it would for dissonant stimuli. While this mechanism
is plausible for harmonic stimuli, it fails to explain the ability of listeners to perceive
consonance and dissonance of pairs of inharmonic tones, whose periods can both be
long. A strong variation in consonance, dissonance, and progressing harmony can be
heard for a sequence of tone pairs that have their harmonics stretched to non-integer
ratios if the scale on which they are based is stretched an equal amount (Houtsma,
Rossing, and Wagenaars, 1987; Sethares, 1999).
Roughness, on the other hand, is closely linked to musical dissonance in a variety
of contexts. It can account for the variation of sensory dissonance in inharmonic
stimuli in the same way as it does for harmonic stimuli (Sethares, 1999) and it has
been correlated with musical tension in non-tonal music (Pressnitzer et al., 2000).
We have shown correlates of roughness in responses of IC neurons to monaural and
diotic stimuli, but it is important to consider the responses of dichotically presented
musical intervals. While many IC neurons preferentially respond to specific interau-
ral phase differences (Yin and Kuwada, 1983) and may well respond to dichotically
presented intervals, it is generally thought that the percept of dichotic roughness does
not exist (Zwicker and Fastl, 1999). An investigation into the response of IC neurons
to dichotically presented intervals is presented in the companion paper (McKinney,
Tramo, and Delgutte, 2001b).
3.5 Conclusion
Our results have revealed neural correlates of sensory dissonance in the discharge rate
fluctuations of all IC neurons and in the average discharge rates of Onset neurons.
More generally, our results illustrate the complexity and specificity of neural pro-
cessing in the auditory periphery and brainstem; percepts generally considered to be
“high order”, such as the dissonance of musical intervals, have direct neural correlates
in midbrain nuclei. Our results also suggest that neurons in the IC are specifically
important for encoding the temporal envelope of sounds.
Chapter 4
Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience
4.1 Introduction
We have shown that discharge rate fluctuations of inferior colliculus (IC) neurons
correlate with the dissonance of musical intervals (McKinney, Tramo, and Delgutte,
2001a). Our investigation was motivated by 1) the idea that sensory dissonance is
based largely on the psychoacoustic roughness of a sound (Plomp and Levelt, 1965;
von Helmholtz, 1863), and 2) the fact that modulation transfer functions (MTFs) of
many IC neurons resemble the psychoacoustic roughness function (Delgutte, Ham-
mond, and Cariani, 1998; Fastl, 1990; Rees and Møller, 1983). A possible difficulty
with our finding is that, while many IC neurons exhibit preferential interaural phase
differences (IPDs) (Yin and Kuwada, 1983) and thus may beat in response to
dichotically-presented tone pairs, it is generally thought that dichotically-presented tone
pairs do not elicit a roughness sensation (e.g., Roederer, 1979). Here, we look at IC
neural responses to dichotically-presented tone pairs and compare them with responses
to diotic stimuli.
An alternative theory for the basis of consonance suggests that consonant har-
monic intervals have a more perceptually salient common fundamental bass frequency
(FFB) than do dissonant intervals. This idea stems from Rameau’s (1722) theory of
“basse fondamentale” and complements the notion that sensory consonance is just
a lack of roughness (sensory dissonance) (Tramo et al., 2001). Figure 4-1 illustrates
the concept of the fundamental bass for three consonant intervals and the dissonant
Tritone interval, all based at 440 Hz. Line spectra of the tone-pair intervals are
shown and each tone in a pair contains six iso-level harmonics. Gray bars indicate
overlapping harmonics from the lower (black bars) and upper (white bars) tones. For
consonant intervals, the fundamental frequencies of the tones tend to be related by
simple ratios and thus all harmonics from both tones are harmonically related to a
common (missing) fundamental bass. In the figure, vertical dashed lines mark
harmonics of the fundamental bass (leftmost dashed line). For the Tritone, the exact
fundamental bass frequency is 13.75 Hz (greatest common divisor of 440 and
618.75 Hz) but there is a near miss at 88 Hz, which is shown in the figure.
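The fundamental bass frequencies quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the analysis code used in this study; the function name `fundamental_bass` is introduced here for illustration only. It assumes just-intonation ratios as in Fig. 4-1 and uses the fact that, for a ratio p/q in lowest terms, both fundamentals are integer multiples of the lower fundamental divided by q.

```python
from fractions import Fraction


def fundamental_bass(lower_f0_hz, ratio):
    """Greatest common divisor of lower_f0_hz and lower_f0_hz * ratio.

    With ratio = p/q in lowest terms, the lower tone is the q-th harmonic
    and the upper tone the p-th harmonic of lower_f0_hz / q.
    """
    ratio = Fraction(ratio)  # Fraction reduces p/q to lowest terms
    return lower_f0_hz / ratio.denominator


# Just-intonation ratios from Fig. 4-1, lower tone at 440 Hz
intervals = {
    "Unison": Fraction(1, 1),
    "Perfect 4th": Fraction(4, 3),
    "Perfect 5th": Fraction(3, 2),
    "Tritone": Fraction(45, 32),
}
bass = {name: fundamental_bass(440.0, r) for name, r in intervals.items()}

for name, f in bass.items():
    print(f"{name}: {f:.2f} Hz")
```

The 88-Hz near miss for the Tritone corresponds to 440/5: the lower tone is its 5th harmonic, and the upper fundamental (618.75 Hz) lies close to its 7th harmonic (616 Hz).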
In this study, we also investigate IC neural responses to consonant complex-tone
pairs for a representation of the fundamental bass. We examine autocorrelation (AC)
histograms of the responses because they have been shown, for auditory-nerve (AN)
responses, to exhibit correlates of pitch over a variety of stimulus paradigms (McKin-
ney and Delgutte, 1999; Cariani and Delgutte, 1996a; Cariani and Delgutte, 1996b;
Rhode, 1995). We also examine responses to dichotically-presented intervals for
a representation of the fundamental bass since, perceptually, dichotic presentation
of harmonic complexes can also elicit a pitch sensation at the (missing) fundamen-
tal (Houtsma and Goldstein, 1972).
[Figure 4-1 plot: four line-spectrum panels; horizontal axis: Frequency (Hz), 0 to 4400 in steps of 440. Panels and the fundamental bass frequency marked on each: Unison (1/1), 440 Hz; Perfect 4th (4/3), 147 Hz; Perfect 5th (3/2), 220 Hz; Tritone (45/32), 88 Hz (near miss).]
Figure 4-1. Line spectra of three consonant and one dissonant (Tritone) complex-tone harmonic intervals based at 440 Hz. The ratios of the fundamental frequencies of each tone in the interval are given under the interval name. Each tone in a complex-tone pair contains six iso-level harmonics. Gray bars indicate overlapping harmonics from the lower (black bars) and upper (white bars) tones. For consonant Unison, Perfect 4th and 5th intervals, all harmonics from both tones fall on integer multiples (vertical dashed lines) of the interval’s fundamental bass frequency (indicated above each plot). For the dissonant Tritone interval, the exact fundamental bass frequency is 13.75 Hz, but there is a near miss at 88 Hz.
4.2 Method
We recorded from single IC neurons in Dial-anesthetized cats in response to diotic
and dichotic presentation of complex tone pair stimuli. The methods were identical
to those in the companion paper (McKinney, Tramo, and Delgutte, 2001a) except
where noted below.
4.2.1 Experiment
For each neuron isolated, we measured the threshold tuning curve, response to tones
at characteristic frequency (CF), and the modulation transfer function (MTF) for a
pure-tone carrier at CF as described previously. We used the responses to tones at
CF to classify units as Onset, Sustained or Pauser, based on the shape of their peri-stimulus
time histograms (PSTH). In addition, in order to assess the neuron’s sensitivity to
interaural phase differences (IPD), we measured the response to a 2 Hz binaural beat
stimulus centered at the CF, (i.e., a tone 1 Hz below CF in the left ear and another
1 Hz above CF in the right ear). Next, responses to diotic and dichotic complex tone
pairs were measured at the following musical intervals: Unison, Minor 2nd, Perfect
4th, Tritone, Perfect 5th, Octave (see Fig. 1 in McKinney et al., 2001a). Each tone
in the pair was composed of 6 iso-level harmonics in cosine phase with a duration
of 500 msec. The tones were windowed (raised cosine) to give 5 msec rise and fall
times. Harmonic levels were typically 60 dB SPL, but ranged from 40 to 80 dB SPL.
Each tone pair was presented 30 times diotically or monaurally (whichever gave the
strongest response) as well as dichotically (base tone in ipsilateral ear, upper tone in
contralateral ear). The base tone fundamental frequency for each interval was 440
Hz.
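The stimulus description above can be sketched in code. This is an illustrative reconstruction under stated assumptions, not the stimulus-generation software used in the experiments: the sampling rate, the unit harmonic amplitude (no dB SPL calibration), and the function names `complex_tone` and `tone_pair` are all hypothetical.

```python
import math


def complex_tone(f0_hz, fs_hz=20000, dur_s=0.5, n_harm=6, ramp_s=0.005):
    """Sum of n_harm equal-amplitude harmonics in cosine phase,
    gated with raised-cosine onset and offset ramps."""
    n = int(round(dur_s * fs_hz))
    n_ramp = int(round(ramp_s * fs_hz))
    out = []
    for i in range(n):
        t = i / fs_hz
        s = sum(math.cos(2 * math.pi * k * f0_hz * t)
                for k in range(1, n_harm + 1))
        if i < n_ramp:                       # onset ramp
            g = 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:                # offset ramp
            g = 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            g = 1.0
        out.append(g * s)
    return out


def tone_pair(ratio, base_f0_hz=440.0, **kw):
    """Return (diotic, dichotic) versions of a complex-tone pair.

    diotic:   the summed pair, to be delivered identically to both ears
    dichotic: (ipsilateral, contralateral) = (base tone, upper tone)
    """
    lower = complex_tone(base_f0_hz, **kw)
    upper = complex_tone(base_f0_hz * ratio, **kw)
    diotic = [a + b for a, b in zip(lower, upper)]
    return diotic, (lower, upper)


diotic, (ipsi, contra) = tone_pair(3 / 2)  # Perfect 5th, just ratio (assumed)
```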
4.2.2 Analysis
To determine sensitivity to IPD, synchronization indices were calculated from period
histograms locked to the beat frequency for the responses to binaural beat stimuli.
In order to avoid false classification due to artificially high synchronization indices of
weakly responding neurons we used the product of the average discharge rate and the
synchronization index, with a threshold of 6.5 spikes/sec, to characterize neurons as
either IPD-sensitive or IPD-insensitive (see footnote 1).
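The classification rule can be illustrated as follows. This is a sketch only: it computes the synchronization index as a vector strength directly from spike times rather than from a binned period histogram, and it assumes the criterion is "greater than or equal to" the 6.5 spikes/sec threshold, which the text does not specify. The function names are hypothetical.

```python
import math


def sync_index(spike_times_s, beat_hz):
    """Vector strength of spike times relative to the beat period."""
    if not spike_times_s:
        return 0.0
    phases = [2 * math.pi * beat_hz * t for t in spike_times_s]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(spike_times_s)


def is_ipd_sensitive(spike_times_s, n_trials, dur_s,
                     beat_hz=2.0, threshold=6.5):
    """Classify using (average rate) x (synchronization index)."""
    rate = len(spike_times_s) / (n_trials * dur_s)  # spikes/sec
    return rate * sync_index(spike_times_s, beat_hz) >= threshold


# Spikes pooled over trials of a 1-second, 2-Hz binaural beat
locked = [0.0, 0.5] * 10                 # 20 spikes, all at the same beat phase
uniform = [i / 20 for i in range(20)]    # 20 spikes spread evenly in phase
weak = [0.0, 0.5]                        # perfectly locked, but only 2 spikes
```

The `weak` case shows why the product is used: its synchronization index is 1.0, yet its rate-weighted value (2 spikes/sec) falls below threshold.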
Responses to both monaural/diotic and dichotic presentation of the consonant/dissonant
tone pair stimuli were summed across stimulus presentations to generate PSTHs with
1 msec binwidths. The histograms were smoothed with a 3 msec window and then,
for each musical interval type, the temporal standard deviation was calculated over
the sustained portion (20-480 msec after the onset) as a measure of rate fluctuation.
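A minimal sketch of this rate-fluctuation measure is given below. It assumes a rectangular (moving-average) smoothing window and the population form of the standard deviation; neither choice is stated in the text, and the function name `rate_fluctuation` is introduced here for illustration.

```python
import statistics


def rate_fluctuation(spike_trains_ms, dur_ms=500, bin_ms=1, smooth_bins=3,
                     window_ms=(20, 480)):
    """Temporal standard deviation (spikes/sec) of a smoothed PSTH."""
    n_bins = dur_ms // bin_ms
    counts = [0] * n_bins
    for train in spike_trains_ms:            # sum across presentations
        for t in train:
            b = int(t // bin_ms)
            if 0 <= b < n_bins:
                counts[b] += 1
    # Convert counts per bin to spikes/sec per presentation
    scale = 1000.0 / (bin_ms * len(spike_trains_ms))
    rate = [c * scale for c in counts]
    # Centered moving average (rectangular window is an assumption)
    half = smooth_bins // 2
    smooth = []
    for i in range(n_bins):
        seg = rate[max(0, i - half): i + half + 1]
        smooth.append(sum(seg) / len(seg))
    lo, hi = (int(w // bin_ms) for w in window_ms)
    return statistics.pstdev(smooth[lo:hi])


# Synthetic examples: a steady response and one that beats at 25 Hz
steady = [t + 0.5 for t in range(500)]
beating = [t + 0.5 for t in range(500) if (t // 20) % 2 == 0]
```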
Pitch analyses were performed on responses to diotic/monaural and dichotic stimuli.
Autocorrelation, or all-order interspike interval (ISI), histograms were generated
from the sustained portion of the responses (15-485 msec after the onset). The repre-
sentation of a particular stimulus frequency in a histogram was quantified by calculat-
ing the Peak-to-Background ratio (P/B) where the “Peak” is the mean number of ISIs
at integer multiples of the period (1/frequency) of interest and the “Background” is
the mean number of ISIs/bin overall. Neurons’ P/B ratios were compared with their
frequency following abilities as measured by their MTF. The MTF statistic used was
the upper cutoff frequency (Fco), measured as the highest modulation frequency for
which the response showed significant (Rayleigh test, α < 0.01) synchrony (provided
that synchrony was also significant at the next lowest modulation frequency in the
function).
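The peak-to-background computation can be sketched as below. This is an illustration under assumptions: the "Peak" is taken from the single bin containing each integer multiple of the period, the histogram is not smoothed, and the 0.1-msec bin width and 40-msec maximum lag are borrowed from the figures that follow. The function names are hypothetical.

```python
def all_order_isi_histogram(spike_trains_ms, max_lag_ms=40.0, bin_ms=0.1):
    """Histogram of intervals between all pairs of spikes within each trial."""
    n_bins = int(round(max_lag_ms / bin_ms))
    hist = [0] * n_bins
    for train in spike_trains_ms:
        train = sorted(train)
        for i, t0 in enumerate(train):
            for t1 in train[i + 1:]:
                lag = t1 - t0
                if lag >= max_lag_ms:
                    break                     # later spikes are farther still
                hist[min(int(lag / bin_ms), n_bins - 1)] += 1
    return hist


def peak_to_background(hist, freq_hz, bin_ms=0.1):
    """Mean count at integer multiples of 1/freq over mean count per bin."""
    period_ms = 1000.0 / freq_hz
    max_lag_ms = len(hist) * bin_ms
    peak_bins = []
    k = 1
    while k * period_ms < max_lag_ms:
        peak_bins.append(min(int(k * period_ms / bin_ms), len(hist) - 1))
        k += 1
    peak = sum(hist[b] for b in peak_bins) / len(peak_bins)
    background = sum(hist) / len(hist)
    return peak / background


# Synthetic, perfectly periodic spike train (period 10.03 msec)
train = [k * 10.03 for k in range(40)]
hist = all_order_isi_histogram([train])
pb_matched = peak_to_background(hist, 1000.0 / 10.03)
pb_unrelated = peak_to_background(hist, 1000.0 / 7.0)
```

For real data the ratio hovers near 1.0 when the periodicity is absent, rather than dropping to zero as in this noise-free example.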
As in the companion paper (McKinney, Tramo, and Delgutte, 2001a), we assess
the variability of most of our data and calculations by bootstrapping: we randomly
resample (with replacement) the data, recompute the statistic of interest, and then
calculate the standard deviation of the statistic across resampled trials (Efron and
Tibshirani, 1993). Unless noted otherwise, resampling was performed across neurons.
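The bootstrap procedure can be sketched in a few lines. The number of resamples, the fixed seed, and the example values are arbitrary choices made for illustration and are not taken from the study.

```python
import random
import statistics


def bootstrap_sd(values, statistic=statistics.mean, n_resamples=1000, seed=0):
    """Standard deviation of a statistic across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_resamples):
        # Resample with replacement, e.g. across neurons
        resample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(resample))
    return statistics.stdev(stats)


# Hypothetical per-neuron rate fluctuations (spikes/sec)
per_neuron = [8.0, 12.0, 15.0, 9.0, 20.0, 11.0, 14.0, 10.0]
sd_of_mean = bootstrap_sd(per_neuron)
```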
Footnote 1. If the response to a binaural beat at CF was not available, we classified the IPD sensitivity based on the response to the dichotic (equal temperament) Perfect 5th stimulus, which produces a 1.5 Hz binaural beat at 1319 Hz and a 3 Hz binaural beat at 2638 Hz. In equal temperament tuning, the Perfect 5th has a frequency ratio of 2.9966/2 rather than the Just 3/2. This causes the (nearly) coincident harmonics at 1320 Hz (see Fig. 4-1) to be separated by 1.5 Hz. Classification of IPD sensitivity was based on binaural beats at CF for 14 neurons and the dichotic Perfect 5th for 2 neurons. The measures gave similar results for those neurons for which we had both.
4.2.3 Model
[Figure 4-2 block diagram: Left Ear and Right Ear each feed Cochlear Tuning, then Rectification/Compression, then Synchrony Roll-off; the two peripheral outputs converge on the IC MTF stage.]
Figure 4-2. A simple model, used to predict temporal patterns of IC neural responses, consists of a peripheral component for each ear followed by a binaural central component. Each peripheral component incorporates cochlear tuning via a GammaTone filter (Darling, 1991), instantaneous compression and rectification, and a 4th-order low-pass filter with a cutoff frequency of 700 Hz to mimic synchrony roll-off. The binaural component includes a binaural crosscorrelator (instantaneous product) followed by a bandpass filter whose parameters are fit to match the neural MTF of individual neurons.
We used a simple binaural coincidence model to predict the ability of IPD sensitive
IC neurons to follow the temporal envelope of our tone-pair stimuli. The model, shown
in Fig. 4-2, includes a peripheral processor for each ear and a central, binaural model.
The peripheral model incorporates: 1) cochlear tuning (GammaTone filter, Darling
1991); 2) half-wave rectification and instantaneous compression ($y = x^2/(x^2 + \mathrm{thrsh}^2)$,
where thrsh was the compression threshold) to simulate cochlear nonlinearities; and
3) a 4th-order low-pass filter with 700 Hz cutoff frequency to mimic the roll-off of
phase locking. The central model includes a binaural crosscorrelator (implemented
by a multiplication of peripheral model outputs) followed by a bandpass IC MTF
filter (4th-order Butterworth) whose parameters are fit to match the neural MTF in
individual neurons. The central processor might also include a delay on one of the
binaural inputs to simulate the best interaural delay (ITD) of an IC neuron. However,
with regard to our stimuli, we judged that this delay would only slightly shift the
phase of the beating response without otherwise affecting it, and so it was left out.
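The core of the model, rectification and compression in each ear followed by an instantaneous product, can be illustrated with a deliberately reduced sketch. It omits the GammaTone filter, the 700-Hz low-pass filter, and the IC MTF bandpass filter, so it is not the model used here; the compression threshold, sampling rate, and tone frequencies are arbitrary assumptions. It shows only that the binaural product waxes and wanes at the interaural frequency difference.

```python
import math


def compress(x, thrsh=0.5):
    """Half-wave rectification followed by y = x^2 / (x^2 + thrsh^2)."""
    x = max(x, 0.0)
    return x * x / (x * x + thrsh * thrsh)


def coincidence_output(f_left_hz, f_right_hz, fs_hz=20000, dur_s=0.5,
                       thrsh=0.5):
    """Instantaneous product of the compressed left- and right-ear signals."""
    n = int(round(dur_s * fs_hz))
    out = []
    for i in range(n):
        t = i / fs_hz
        left = compress(math.cos(2 * math.pi * f_left_hz * t), thrsh)
        right = compress(math.cos(2 * math.pi * f_right_hz * t), thrsh)
        out.append(left * right)
    return out


def window_mean(x, fs_hz, t0_s, t1_s):
    seg = x[int(t0_s * fs_hz): int(t1_s * fs_hz)]
    return sum(seg) / len(seg)


# 2-Hz binaural beat centered at 500 Hz: 499 Hz to one ear, 501 Hz to the other
fs = 20000
out = coincidence_output(499.0, 501.0, fs_hz=fs)
in_phase = window_mean(out, fs, 0.0, 0.01)         # ears in phase near t = 0
anti_phase = window_mean(out, fs, 0.245, 0.255)    # ears in antiphase near 0.25 s
```

In the full model, the bandpass MTF filter would then determine how strongly this slow fluctuation appears in the predicted discharge rate.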
4.3 Results
Dichotic analyses were performed on data recorded from 16 neurons in 7 cats using
both diotic/monaural and dichotic complex-tone stimuli. Eight neurons were classified as
IPD sensitive using binaural beats. CFs ranged from 440 to 5090 Hz but do not reflect
uniform sampling in the IC as we targeted low-frequency neurons that would respond
to our stimuli. Monaural/diotic pitch analyses were performed on data from 88
neurons reported in the companion paper (McKinney, Tramo, and Delgutte, 2001a).
4.3.1 Dichotic tone pairs
Figure 4-3 shows responses of two neurons to diotically (B and E) and dichotically (C
and F) presented complex-tone pairs. Also shown are responses to the binaural beat
stimulus centered at CF (A and D). The binaural beat stimulus is 1 second in duration
with a 2-Hz beat, so sensitivity to interaural phase differences (IPD) is indicated by
the presence of two peaks in the PST histograms. The bimodal histogram shown in
panel A indicates that the neuron is sensitive to IPD. In contrast, the response in
panel D shows adaptation over the 1-second binaural beat stimulus but reveals no
sensitivity to IPD. The neuron was binaural, however, in that its response to diotic
stimulation was greater than the response to monaural stimulation of either ear (not
shown). The responses of both neurons to diotic tone pairs (B and E) are similar to
those described in the companion paper (McKinney, Tramo, and Delgutte, 2001a):
dissonant intervals elicit beating responses while consonant intervals do not. One
exception is the response of the IPD-sensitive neuron to the dissonant diotic Tritone
(B), which shows little beating. This is most likely due to the neuron having a low
CF so that its receptive field does not include the relatively higher beating partials
of the Tritone stimulus (see Sec. 4.3.1). Panels C and F in Fig. 4-3 show responses to
dichotically presented intervals. The responses of the IPD-sensitive neuron to dichotic
stimuli (C) are similar to its responses to diotic stimuli (B), which show beating for
the Minor 2nd stimulus but not the Tritone. The IPD-insensitive neuron, on the
other hand, does not show beating in response to any dichotic interval (F), although
its responses to diotic stimuli (E) show beating for the Minor 2nd and Tritone stimuli.
[Figure 4-3 plot: panels A-C (top row, neuron sensitive to IPD) and D-F (bottom row, neuron not sensitive to IPD). Columns: response to 2-Hz binaural beat at CF (horizontal axis 0 to 1), Diotic Complex Tone Pairs, and Dichotic Complex Tone Pairs (horizontal axis: Peri-stimulus time (msec), 0 to 400). Rows within the tone-pair panels: Uni, m2, 4th, Tri, 5th, Oct. Vertical axis label: Discharge Rate.]
Figure 4-3. Top panels: Responses of an IPD-sensitive neuron to a 2-Hz binaural beat centered at CF (A) and diotic (B) and dichotic (C) presentation of complex-tone pairs at specified intervals. Neural activity is shown over the duration of every stimulus presentation (30 presentations per stimulus). The horizontal black line below each panel indicates stimulus on-time. CF = 440 Hz. Bottom panels: same for an IPD-insensitive neuron with CF = 3840 Hz.
Figure 4-4 shows summary mean rate fluctuations for all neurons from which we
recorded responses to both dichotic and diotic/monaural stimuli. Neurons sensitive
to IPD respond with greater rate fluctuations to the Minor 2nd than to consonant
stimuli for both diotic and dichotic presentation (A and B). Those neurons also show
greater fluctuations in response to the diotic Tritone but their response to the dichotic
Tritone is not significantly different from that to the consonant Perfect 4th and 5th
stimuli. In contrast, IPD-insensitive neurons show little increase in rate fluctuations
in response to any dichotic interval (D) even though they do show large increases in
rate fluctuations in response to the diotic Minor 2nd and Tritone (C).
If rate fluctuations of IC neural responses do code musical dissonance, these results
raise the possibility that there may exist a form of dichotic dissonance for some musical
intervals. Alternatively, it is possible that only neurons not sensitive to IPD code for
roughness and dissonance.
CF dependence of responses to dichotic stimuli
We showed previously that the beat rate of the neural response to dissonant complex-
tone pairs depends on the neuron’s CF (McKinney, Tramo, and Delgutte, 2001a): the
response is dominated by the beat frequency of the partial pair closest to CF. Figure 4-
5 shows line spectra for the complex Minor-2nd and Tritone intervals and indicates,
for each stimulus, which pair of partials interact to give the low beat frequencies
(in the roughness range). Figure 4-6 shows the response to these stimuli from two
neurons with CFs near the beating partials. The beat frequency and period (black
bars) of the partial pair closest to the CF are indicated in the upper left corner of
the panels. For the diotic stimuli (A, C, E, and G), if the CF is close to a pair of
proximal partials, the neural response reflects the beat frequency of those partials.
An exception is the low-CF neuron responding to the Tritone (E), where the beat
frequency may be too large (182 Hz) for the IC neuron to follow.
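The beat frequencies of neighboring partials can be reproduced with a short calculation. This sketch assumes equal-temperament tuning of the upper tone, which is suggested by the partial frequencies quoted in this chapter but is an assumption here, and the function name `nearest_partial_beats` is hypothetical.

```python
def nearest_partial_beats(semitones, base_f0_hz=440.0, n_harm=6,
                          max_beat_hz=200.0):
    """For each upper-tone partial, find the nearest lower-tone partial and
    return (lower_partial, upper_partial, beat_frequency) when the beat
    frequency does not exceed max_beat_hz."""
    upper_f0 = base_f0_hz * 2 ** (semitones / 12)  # equal temperament (assumed)
    lower = [k * base_f0_hz for k in range(1, n_harm + 1)]
    upper = [k * upper_f0 for k in range(1, n_harm + 1)]
    pairs = []
    for fu in upper:
        fl = min(lower, key=lambda f: abs(f - fu))
        beat = abs(fu - fl)
        if beat <= max_beat_hz:
            pairs.append((fl, fu, beat))
    return pairs


minor_2nd = nearest_partial_beats(1)   # 1 semitone
tritone = nearest_partial_beats(6)     # 6 semitones

for fl, fu, beat in tritone:
    print(f"{fl:.0f} Hz vs {fu:.1f} Hz: beat {beat:.1f} Hz")
```

Under these assumptions the three lowest pairs give approximately 26, 52 and 78 Hz for the Minor 2nd and 182, 75 and 107 Hz for the Tritone, within 1 Hz of the values discussed in the text.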
[Figure 4-4 plot: panels A (neurons sensitive to IPD, Diotic/Monaural Complex Tone Pairs), B (neurons sensitive to IPD, Dichotic Complex Tone Pairs), C (neurons not sensitive to IPD, Diotic/Monaural) and D (neurons not sensitive to IPD, Dichotic). Horizontal axis: interval (Uni, m2, 4th, Tri, 5th, Oct); vertical axis: Rate Fluctuations (sp/sec); n = 8 in each panel.]
Figure 4-4. Top panels: Mean rate fluctuations across all IPD-sensitive neurons in response to diotic (A) and dichotic (B) presentation of complex-tone pairs at specified intervals. Rate fluctuations were calculated as the temporal standard deviation of the rate. Bottom panels: Same for IPD-insensitive neurons. Error bars are estimated standard errors of the mean and include intra- and across-neuron variances (assumed to be orthogonal).
[Figure 4-5 plot: line spectra; horizontal axis: Frequency (Hz), ticks at 440, 1320, 2200, 3080 and 3960. Panel A, Minor Second, Δf (Hz): 26, 52, 78. Panel B, Tritone, Δf (Hz): 182, 76, 106.]
Figure 4-5. Line spectra of the complex-tone Minor 2nd and Tritone stimuli. Arrows mark the partials that give rise to the low beat frequencies (in the roughness range).
In response to dichotic stimuli, the responses of the two neurons in Fig. 4-6 (B, D,
F, and H) show beating only to pairs of low-frequency partials. In this case, beating
is only seen in response to the lowest (B and D) and possibly second lowest (D) pair of
partials in the Minor 2nd. A possible explanation for this is the roll off of synchrony
at high frequencies in the auditory periphery and central nervous system (CNS) prior
to the site of binaural interaction. For a central neuron to beat in response to a
dichotic pair of partials, the monaural inputs from each ear must phase-lock to the
partial in that ear at the stage where binaural interaction takes place. This can
only happen at low frequencies, where the response of the auditory nerve (AN) and
cochlear nucleus (CN) is strongly phase-locked. Thus, the lack of beating in response
to the dichotic Tritone (H) may be due to poor monaural phase-locking to the relevant
partials (1245 Hz and 1320 Hz, see Fig. 4-5) at the input to the binaural processor.
The CF is also related to the IPD sensitivity for IC neurons. We found a significant
difference (α < 0.05), in our sample of neurons, between the CFs of IPD-sensitive
(µ = 840, σ = 505 Hz) and IPD-insensitive (µ = 2240, σ = 1555 Hz) neurons. This is
consistent with the finding of Kuwada and Yin (1983) that IPD-sensitive
neurons in the IC of anesthetized cat all had CFs less than ∼ 3000 Hz. The highest
CF in our population of IPD-sensitive neurons was 1520 Hz.
4.3.2 Pitch analysis
In this section, we look for a representation of the fundamental bass frequency (FFB,
see Fig. 4-1) in all-order ISI histograms of neural responses to three consonant (Uni-
son, Perfect 4th and 5th) and one dissonant (Tritone) complex-tone pairs.
Diotic pitch
[Figure 4-6 plot: panels A-D (top row, Minor 2nd) and E-H (bottom row, Tritone), alternating Diotic and Dichotic presentation for each of two neurons (column labels: 450 Hz CF, 1335 Hz CF). Horizontal axis: Peri-stimulus time (msec), 0 to 200; vertical axis: Discharge rate (spikes/sec), 0 to 300. Beat frequencies marked in the panels: A and B, 26 Hz; C and D, 78 Hz; E and F, 182 Hz; G and H, 76 Hz.]
Figure 4-6. Top panels: Responses to diotic (A,C) and dichotic (B,D) presentation of the complex-tone Minor second stimulus for a neuron with a 440 Hz CF (A,B) and another with a 1350 Hz CF (C,D). Bottom panels: Same but for the Tritone stimulus. The beat frequency and period (black bars) of the partial pair closest to CF are indicated in each panel.
Figure 4-7 shows all-order ISI histograms of responses from two neurons with different
PSTH types (Sustained and Pauser, see Section 4.2) to the tone pairs. In each plot,
vertical dashed lines indicate integer multiples of the fundamental bass period (except
for the Tritone, see below), and the peak-to-background ratio (P/B_freq) is given to
quantify how well the fundamental bass periodicity is represented in the histogram.
The P/B ratio is the mean number of ISIs at integer multiples of the fundamental
period divided by the mean of all ISIs/bin. A P/B ratio greater than 1.0 indicates
that the fundamental bass periodicity is represented in the interval distribution. The
histograms show that FFB, as defined in Fig. 4-1, is well represented for the consonant
tone pairs in at least one of the neurons’ responses: Unison (A, 440 Hz), Perfect 4th
(C, 147 Hz), Perfect 5th (H, 220 Hz). The responses to the Tritone, on the other
hand, do not show a representation of its FFB (13.75 Hz or the near miss at 88 Hz)
but are clearly dominated by periodicities of beating harmonics: (E) 76 Hz is the
beat frequency of the partial pair closest to the 1170-Hz CF of the Sustained
neuron; (F) 106 Hz is the beat frequency of the partial pair closest to the
1515-Hz CF of the Pauser neuron (see Fig. 4-5B). In these cases, the P/B ratios were
calculated for the most dominant periodicity in the histogram. A general trend for
both neurons is that low frequencies are well represented (high P/B ratios) and higher
frequencies are less well represented in their all-order ISI distributions. In addition,
the relative ranking of consonance (Unison most consonant, Tritone least consonant)
is not correlated with the P/B ratios of histograms from single neurons.
To examine the representation of FFB in a population of IC neurons we pooled
all-order ISI histograms of responses to the Unison (A-C), Perfect 4th (D-F), Tritone
(G-I) and Perfect 5th (J-L) complex-tone pairs. Histograms from neurons with differ-
ent PSTH types were pooled separately and are shown in three columns in Fig. 4-8:
Onset neurons (left), Sustained (middle) and Pauser (right). In the histograms of
panels D-F, the prominent peaks at integer multiples of the fundamental period indicate
that the fundamental bass frequency of the Perfect 4th (147 Hz) is well represented
in interspike intervals, especially in the responses of Sustained (E) and Onset (D)
neurons. The histograms of responses to the Perfect 5th (J-L) show a weaker
representation of its fundamental bass frequency (220 Hz), and there is no obvious
representation of the 440 Hz fundamental in the responses to Unison (A-C). For the Tritone
(G-I), the dominating periodicity is the 76-Hz beat frequency of the partial pair near
1300 Hz (see Fig. 4-5B). However, modes 2 and 3 of the histograms appear slightly
skewed towards each other, most likely influenced by the 106-Hz periodicity of the
partial pair near 1760 Hz. See the single-neuron data in Fig. 4-7E-F for a comparison
of two histograms dominated by these different beat frequencies.
In order to assess the significance and variability of the P/B ratios from pooled his-
tograms we calculated bootstrap estimates of their medians and interquartile ranges
98
0 10 20 30 400
5
10
Sustained Neuron
P/B440
= 1.4
0 10 20 30 400
5
10 P/B147
= 1.8
0 10 20 30 400
10
20
30
Nu
mb
er o
f In
terv
als
P/B76
= 2.5
0 10 20 30 400
5
10 P/B220
= 1.2
0 10 20 30 400
5
10
15
20
Pauser Neuron
P/B440
= 1
0 10 20 30 400
5
10
15
20 P/B147
= 1.2
0 10 20 30 400
5
10
15
20 P/B106
= 1.6
0 10 20 30 400
5
10
15
20
Interspike Interval (msec)
P/B220
= 1.2
A B
C D
E F
G H
Unison
Perfect4th
Tritone
Perfect5th
Figure 4-7. All-order ISI histograms from a Sustained (left) and a Pauser neuron in response toUnison, Perfect 4th, Tritone and Perfect 5th complex-tone pairs. Vertical dotted lines in each plotdenote integer multiples of the fundamental base period for responses to the Unison (440 Hz), Perfect4th (146 Hz) and Perfect 5th (220 Hz) tone pairs. For the Tritone (E-F), vertical dashed lines markthe beat frequency of the partial pair closest to the neuron’s CF (1170 Hz for Sustained, 1515 Hz forPauser) which dominates the response. Representation of the corresponding periodicities in eachhistogram are quantified and displayed as the ratio P/Bfreq (see text). Binwidths are 100 µsec andhistograms were smoothed with a 300 µsec rectangular window. Note the different vertical scales onthe plots. Stimuli levels were all 40 dB SPL.
[Figure 4-8, panels A-L: Number of Intervals vs. Interspike Interval (msec), 0-40 msec. Rows: Unison (A-C), Perfect 4th (D-F), Tritone (G-I), Perfect 5th (J-L). Onset neurons (N = 45): P/B440 = 1.7, P/B147 = 5.6*, P/B76 = 2.5*, P/B220 = 3.9*. Sustained neurons (N = 17): P/B440 = 1.1*, P/B147 = 2.1*, P/B76 = 1.6*, P/B220 = 1.3*. Pauser neurons (N = 44): P/B440 = 1, P/B147 = 1.2*, P/B76 = 1.5*, P/B220 = 1.1*.]
Figure 4-8. Pooled all-order ISI histograms from Onset (N = 45), Sustained (N = 17) and Pauser (N = 44) neurons in response to Unison, Perfect 4th, Tritone and Perfect 5th complex-tone intervals. All panels are similar in format to those in Fig. 4-7 except that, here, P/Bfreq ratios for responses to the Tritone were all calculated based on a periodicity of 76 Hz. ∗ denotes that a P/B ratio is significantly greater than 1.0 (α < 0.05) based on bootstrap estimates of the mean. Binwidths are 100 µsec and histograms were smoothed with a 300 µsec rectangular window. Note the different vertical scales on the plots.
(25th to 75th percentile). The results show that all P/B ratios, with the exception
of those from Onset (Fig. 4-8A) and Pauser (Fig. 4-8C) responses to Unison, are
significantly greater than 1.0 (α < 0.05). Figure 4-9 shows the estimated median
and interquartile ranges for P/Bfreq as a function of frequency for all the histograms
from Fig. 4-8. The general trend for all neuron types is that P/B ratios decrease
with increasing frequency, consistent with the fall off of synchrony in the auditory
system. The relatively low P/B76 ratios from the Tritone responses deviate from
this trend. Their low values are most likely due to the fact that different neurons
respond to different beat frequencies (see Fig. 4-7). Figure 4-9 also shows that Onset
neurons provide the largest P/B ratios of all neuron types, but it is important to note
that these neurons respond weakly during the sustained portion of the stimulus and
consequently provide only a small number of intervals on which to base calculations.
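The bootstrap summary described above can be sketched as follows. This is a minimal sketch that assumes neurons are resampled with replacement and that a caller-supplied `statistic` stands in for the P/B ratio of the pooled histogram; the function and argument names are hypothetical, and the exact resampling scheme used for the thesis data is not specified in this section.

```python
import numpy as np


def bootstrap_summary(per_neuron_data, statistic, n_boot=1000, seed=0):
    """Median and interquartile range of a statistic over bootstrap resamples.

    per_neuron_data: list with one entry per neuron (e.g. its ISI histogram).
    statistic: callable mapping a list of resampled entries to a float;
               here it is a stand-in for the P/B ratio of a pooled histogram.
    """
    rng = np.random.default_rng(seed)
    n = len(per_neuron_data)
    values = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample neurons with replacement
        values[b] = statistic([per_neuron_data[i] for i in idx])
    q25, median, q75 = np.percentile(values, [25, 50, 75])
    return median, (q25, q75)


# Toy usage with made-up per-neuron values and the mean as the statistic.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_median, toy_iqr = bootstrap_summary(toy_values, lambda xs: float(np.mean(xs)))
```

The default of 1000 resamples matches the number of trials given in the caption of Fig. 4-9; everything else is an assumption made for illustration.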
[Figure 4-9: P/B ratio at FFB (vertical axis, 1-10) vs. FFB (Hz) at 77 (Tri), 147 (P4), 220 (P5) and 440 (Uni) Hz. Legend: Onset, Sustained, Pauser.]
Figure 4-9. Bootstrap estimates of the median and interquartile range (25th to 75th percentile) of P/Bfreq ratios from the histograms in Fig. 4-8. Values for Onset and Pauser neurons are offset on the frequency axis for clarity. Estimates are based on 1000 randomly sampled trials.
Overall, our measure of pitch salience, P/B ratio, does not seem to correlate
well with the consonance of intervals shown in Fig. 4-8, of which Unison is the most
consonant, followed by the Perfect 5th, Perfect 4th and finally the Tritone. A stronger
effect here appears to be the overall decay of synchrony with increasing fundamental
bass frequency. So at least as far as all-order ISI histograms represent the pitch, they
do not seem to provide a good correlate of pitch salience or consonance for intervals
based at 440 Hz (and most likely higher frequencies). There may, however, exist
other neural representations of pitch that better correspond to pitch salience and
consonance.
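For readers unfamiliar with the analysis, an all-order ISI histogram counts the intervals between every pair of spikes, not only between successive spikes. The following is a minimal sketch that uses the 100-µsec bin width and 300-µsec rectangular smoothing window given in the figure captions; the function name, the 40-msec interval limit (taken from the plot axes) and the pairwise-difference implementation are assumptions made for illustration.

```python
import numpy as np


def all_order_isi_histogram(spike_times_ms, max_interval_ms=40.0,
                            bin_ms=0.1, smooth_ms=0.3):
    """All-order interspike-interval histogram with rectangular smoothing."""
    t = np.sort(np.asarray(spike_times_ms, dtype=float))
    # Differences between all spike pairs (quadratic in the number of spikes,
    # which is acceptable for a short illustrative spike train).
    diffs = t[None, :] - t[:, None]
    intervals = diffs[(diffs > 0) & (diffs <= max_interval_ms)]
    n_bins = int(round(max_interval_ms / bin_ms))
    counts, edges = np.histogram(intervals, bins=n_bins,
                                 range=(0.0, max_interval_ms))
    width = max(1, int(round(smooth_ms / bin_ms)))
    kernel = np.ones(width) / width          # rectangular smoothing window
    smoothed = np.convolve(counts, kernel, mode="same")
    centers = edges[:-1] + bin_ms / 2.0
    return centers, smoothed


# Toy spike train perfectly locked to a 220-Hz period over 500 msec.
period_ms = 1000.0 / 220.0
centers, hist = all_order_isi_histogram(np.arange(0.0, 500.0, period_ms))
```

For this idealized spike train the histogram has modes at integer multiples of the 4.5-msec period, which is the pattern that the P/B ratio is meant to quantify.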
Relation of pitch representation to the MTF
In Fig. 4-10 we compare two different measures of periodicity representation in neural
responses: 1) the highest modulation frequency for which significant synchrony to
sinusoidal amplitude modulation (SAM) is seen (Fco), and 2) the P/B ratio from all-
order ISI histograms of the responses. Panels A and D show, for individual neurons,
the P/B ratio as a function of Fco for the Perfect 4th (FFB = 147 Hz) and Perfect
5th (FFB = 220 Hz) complex-tone stimuli. In each plot, the vertical dotted line
indicates the fundamental bass frequency of that stimulus and the horizontal dashed
line marks a P/B ratio of 1.0. The data show that the P/B ratios are centered around
1.0 for Fco < FFB, while all but one P/B ratio is greater than 1.0 for Fco > FFB.
To examine the significance of this trend, we divided the neurons into two groups,
Low-Fco (Fco < FFB) and High-Fco (Fco > FFB), and generated pooled all-order
ISI histograms for each group, shown in plots B-C and E-F. For both the Perfect
4th and 5th, the histogram for High-Fco neurons exhibits more pronounced peaks
at integer multiples of the fundamental bass period. Bootstrapping across neurons
revealed a significant difference (α < 0.01) between the P/B ratios from the Low-Fco
and High-Fco pooled histograms for both the Perfect 4th and 5th. These results show
that the upper cutoff frequency Fco of a neuron’s MTF may be a reliable predictor of
its ability to phase lock to the fundamental bass frequency of complex-tone stimuli.
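The grouping and comparison described above can be sketched as follows. This is a simplified illustration: it resamples per-neuron values within each group and compares a summary statistic, whereas the analysis in the text bootstraps across neurons and recomputes the P/B ratio of the pooled histogram. All names and all numerical values in the example are hypothetical.

```python
import numpy as np


def split_by_cutoff(fco_hz, ffb_hz):
    """Boolean masks for Low-Fco (Fco < FFB) and High-Fco (Fco > FFB) neurons."""
    fco = np.asarray(fco_hz, dtype=float)
    return fco < ffb_hz, fco > ffb_hz


def bootstrap_difference(low, high, statistic, n_boot=1000, seed=0):
    """Fraction of resamples in which statistic(high) <= statistic(low)."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low), np.asarray(high)
    count = 0
    for _ in range(n_boot):
        low_sample = low[rng.integers(0, len(low), size=len(low))]
        high_sample = high[rng.integers(0, len(high), size=len(high))]
        if statistic(high_sample) <= statistic(low_sample):
            count += 1
    return count / n_boot


# Made-up cutoff frequencies and P/B ratios, for illustration only.
fco_hz = np.array([50.0, 80.0, 120.0, 200.0, 300.0, 500.0])
pb = np.array([0.9, 1.1, 1.0, 1.6, 2.0, 1.8])
low_mask, high_mask = split_by_cutoff(fco_hz, ffb_hz=147.0)
fraction = bootstrap_difference(pb[low_mask], pb[high_mask], np.median)
```

A small fraction indicates that the High-Fco group reliably shows the larger statistic; how that fraction maps onto the significance level reported in the text is not specified here.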
Dichotic pitch
While perceptually weaker than the diotic case, dichotically presented harmonics also
elicit the perception of a fundamental bass frequency (Houtsma and Goldstein, 1972;
van den Brink, 1974; Houtsma, 1984), so we examined the responses to dichotically
[Figure 4-10, panels A-F. A, D: P/B147 and P/B220 (0.1-10) vs. Fc (Hz, 10-1000) for individual neurons. B-C, E-F: Number of Intervals vs. Interspike Interval (msec) for Low-Fco and High-Fco neurons. Perfect 4th (FFB = 147 Hz): P/B147 = 1.2 (Low-Fco), 2.1 (High-Fco). Perfect 5th (FFB = 220 Hz): P/B220 = 1.1 (Low-Fco), 1.3 (High-Fco).]
Figure 4-10. A,D: P/Bfreq ratios for the Perfect 4th and Perfect 5th complex-tone stimuli are plotted for individual neurons vs. the MTF corner frequency (Fc, see text). Vertical dashed lines mark the fundamental bass frequency for each stimulus (146 and 220 Hz, respectively). Neurons whose Fc is greater than the fundamental bass frequency are denoted as “High-Fc” neurons, others as “Low-Fc” neurons. B-C, E-F: Pooled all-order ISI histograms of Low- and High-Fc neurons. Histograms are in the same form as those in Fig. 4-8. Bootstrapping results show that for both the Perfect 4th and 5th, P/Bfreq ratios from histograms based on the two populations (Low- and High-Fc neurons) are significantly different at the 0.01 level.
presented stimuli for evidence of FFB. Figure 4-11 shows all-order ISI histograms of
responses from an IPD-sensitive neuron to diotic (left) and dichotic (right) presentation of
complex-tone pairs. Responses to both diotic and dichotic presentation of the Perfect
4th and 5th tone pairs (C-D, G-H) show distinct representation of FFB and produce large
P/B ratios. Responses to Unison were weak in both cases and responses to the Tritone
show some phaselocking but in an inharmonic pattern. This particular neuron showed
the strongest representation of dichotic pitch of all neurons in our sample.
Figure 4-12 shows pooled all-order ISI histograms for IPD-sensitive (A-D) and
IPD-insensitive (E-H) neurons for diotic (left) and dichotic (right) presentation of
the Perfect 4th and 5th complex-tone pairs. The ISI histograms from IPD-sensitive
neurons show peaks at integer multiples of the fundamental bass period for both diotic
and dichotic stimuli, although the peaks are smaller in the dichotic case, especially
for the Perfect 5th stimulus. The P/B ratios based on pooled histograms from IPD-sensitive
neurons (A-D) are all significantly (α < 0.05) greater than 1.0 except for
the dichotic Perfect 5th stimulus (D). For IPD-insensitive neurons, only responses to
diotic stimuli show evidence of FFB.
Thus, the fundamental pitch of dichotically presented tones is represented in all-
order ISI histograms of IPD-sensitive IC neural responses. In addition, the relative
pitch salience (Houtsma, 1984) of diotically- (more salient) and dichotically-presented
intervals correlates with P/B ratios from the histograms.
4.4 Model Results
A simple binaural coincidence model was used to predict temporal discharge pat-
terns of IC neurons to diotic and dichotic stimuli (see Section 4.2.3). The general
form of this model has been used to predict the response phase of IC neurons for
binaural stimuli based on the responses to monaural stimuli for each ear (Kuwada
et al., 1984) and also to predict IC neural responses to binaural beats of mistuned
consonances (Yin, Chan, and Carney, 1987).
Figure 4-13 shows responses of the model to the Minor 2nd and Tritone stimuli,
[Figure 4-11, panels A-H: Number of Intervals vs. Interspike Interval (msec), 0-40 msec. Rows: Unison (A-B), Perfect 4th (C-D), Tritone (E-F), Perfect 5th (G-H). Left column, Diotic: P/B440 = 2.3, P/B147 = 6.8, P/B76 = 0.4, P/B220 = 4.9. Right column, Dichotic: P/B440 = 1, P/B147 = 5.3, P/B76 = 0.6, P/B220 = 6.4.]
Figure 4-11. All-order ISI histograms from a single IPD-sensitive neuron in response to diotic (left) and dichotic (right) presentation of the indicated complex-tone pairs. Histograms are in the same format as in Fig. 4-8. Neuron CF = 300 Hz, stimulus level = 60 dB SPL.
[Figure 4-12, panels A-H: # of Intervals vs. Interspike interval (msec), 0-40 msec. IPD-sensitive neurons (N = 8), Diotic/Monaural: P/B147 = 2.4* (A), P/B220 = 1.2* (C); Dichotic: P/B147 = 1.5* (B), P/B220 = 1.1 (D). IPD-insensitive neurons (N = 8), Diotic/Monaural: P/B147 = 1.2 (E), P/B220 = 1.1* (G); Dichotic: P/B147 = 1.1 (F), P/B220 = 1 (H).]
Figure 4-12. Pooled all-order ISI histograms in response to diotic/monaural (left) and dichotic (right) presentation of the Perfect 4th and 5th stimuli. A-D: Responses from IPD-sensitive neurons. E-H: Responses from IPD-insensitive neurons. Histograms are in the same form as those in Fig. 4-8. ∗ denotes that a P/B ratio is significantly greater than 1.0 (α < 0.05) based on bootstrap estimates of the mean. Note the different vertical scales on the plots.
for model neurons whose CF and MTF match those of the neurons from Fig. 4-6. The
model qualitatively predicts the main beat frequencies in the physiological responses.
For the diotic stimuli, the model response fluctuates at the beat frequency (when low
enough) of the partial-pair closest to CF. For the dichotic stimuli, the model responses
show beats for the Minor 2nd but not the Tritone, consistent with the data. The lack
of beats in the model response to the Tritone is due to the poor phase locking of the
peripheral model outputs for the 1245 Hz and 1320 Hz partials. The model does not
predict all aspects of the responses well, e.g., the onset portion of the response in
panels D and H, but does a good job at predicting the primary beat frequencies in
the responses.
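The origin of beating in a coincidence-type model can be illustrated with a toy example. This sketch is not the model of Section 4.2.3: it assumes that each ear's input is a half-wave rectified sinusoid standing in for a phase-locked discharge probability, that coincidence detection can be approximated by multiplying the two inputs, and that the dominant fluctuation can be read from the spectrum below a hypothetical 100-Hz limit. It omits peripheral filtering, the fall off of synchrony with frequency, and any modulation tuning.

```python
import numpy as np


def coincidence_beat(f_left_hz, f_right_hz, fs_hz=20000.0, dur_s=1.0,
                     max_beat_hz=100.0):
    """Dominant low-frequency fluctuation of a toy binaural coincidence output."""
    t = np.arange(int(fs_hz * dur_s)) / fs_hz
    # Half-wave rectified sinusoids stand in for phase-locked input from each ear.
    left = np.maximum(np.sin(2 * np.pi * f_left_hz * t), 0.0)
    right = np.maximum(np.sin(2 * np.pi * f_right_hz * t), 0.0)
    out = left * right                       # coincidence modeled as a product
    spectrum = np.abs(np.fft.rfft(out - out.mean()))
    freqs = np.fft.rfftfreq(out.size, d=1.0 / fs_hz)
    band = (freqs > 0) & (freqs <= max_beat_hz)
    return freqs[band][np.argmax(spectrum[band])]


# Dichotic Minor 2nd fundamentals: 440 Hz in one ear, about 466 Hz in the other.
beat_hz = coincidence_beat(440.0, 440.0 * 2 ** (1 / 12))
```

With one fundamental in each ear, the product fluctuates at the difference frequency of about 26 Hz, even though neither ear's input contains that frequency on its own. This is the sense in which dichotic beating is a central interaction that depends on the fine structure reaching the binaural stage.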
[Figure 4-13, panels A-H: Normalized model response (0-1) vs. Peri-stimulus time (msec), 0-200 msec, for model neurons with CFs of 450 Hz and 1335 Hz. Rows: Minor 2nd, Tritone. Columns: Diotic, Dichotic for each CF. Beat frequencies labeled in the panels: 26, 78, 182 and 76 Hz.]
Figure 4-13. Model responses to diotic and dichotic presentation of the Minor 2nd and Tritone stimuli. Same format as Fig. 4-6.
4.5 Discussion
We have shown that neurons sensitive to IPD beat in response to dichotically-presented
dissonant tone pairs that have low-frequency beating harmonics and neurons not sen-
sitive to IPD do not beat in response to these stimuli. Individual IPD-sensitive
neurons’ responses are dominated by the beat rate of (dichotic) partials closest to
CF provided that the partials are at low frequencies. In the dichotic case, beating is
a central, neural interaction which requires the fine structure of the stimulus to be
represented in the neural signal from each ear at the input to the stage of binaural
interaction and thus can only occur in response to beating partials at frequencies
below the limit of phase-locking. In contrast, for monaural/diotic stimuli, beating
from neighboring partials is an acoustic interaction which can be seen in the stimulus
and occurs in neural responses regardless of the absolute frequency of the individual
partials. For our stimulus set, population responses of IPD-sensitive IC neurons to
dichotic stimuli were similar to the responses of all IC neurons to diotic presentation
with one exception: the diotic Tritone elicits beating whereas the dichotic Tritone
does not. This is likely due to the fact that the beating in the Tritone stimulus
comes from relatively high-frequency partials (> 1200 Hz) for which phaselocking is
relatively weak.
Because dichotically-presented tones are thought not to produce a roughness sen-
sation (Roederer, 1979), our current findings raise issue with our hypothesis that
sensory dissonance (roughness) is coded in responses of IC neurons. Two possible
resolutions of this “dichotic quandary” are 1) roughness is mediated only through
the responses of neurons insensitive to IPD and 2) there exists a form of dichotic
roughness. The plausibility of each of these ideas is discussed below.
We have also shown that a simple binaural coincidence model qualitatively predicts
responses of IC neurons to both diotic and dichotic musical intervals.
Finally, we showed that population all-order ISI histograms from IC neurons reflect
the fundamental bass frequency of some consonant diotic and dichotic tone pairs.
However, for diotic stimuli, there appears to be a sharp cutoff frequency between 220
and 440 Hz, above which little representation can be seen in the histograms. For
dichotic tone pairs, the cutoff appears even lower (< 220 Hz). Our proposed neural
correlate of pitch salience for the fundamental bass frequency, P/Bfreq, is correlated
with Fco, the maximum modulation frequency at which a neuron shows significant
synchrony. This shows a consistency between the two measures of synchrony. In
general, for the frequency range of our stimulus set, the degradation of phaselocking
at high frequencies dominates the responses such that the consonance of a particular
interval does not correlate with the salience of its fundamental bass pitch, at least as
far as pitch is represented in the form of population all-order ISI distributions.
Instead, the measure of salience correlates better with the absolute frequency of the
fundamental.
4.5.1 Neurophysiology
Roughness
Our results are broadly consistent with previous studies on the sensitivity of IC neu-
rons to interaural phase. Kuwada and Yin (1983) found that about 80% of low-
frequency neurons (CF<∼3000 Hz) are IPD sensitive, while neurons with high CFs are
not IPD sensitive for pure tones. For our small sample of neurons, there was an even
split (8 IPD-sensitive, 8 IPD-insensitive), but some of the CFs of the IPD-insensitive
neurons were higher than 3000 Hz. Sensitivity to IPD has been shown to occur for
pure-tone frequencies up to 3000 Hz in anesthetized cat (Kuwada and Yin, 1983)
and 2150 Hz in unanesthetized rabbit (Kuwada, Stanford, and Batra, 1987) but most
neurons only show IPD sensitivity for frequencies less than 1500-2000 Hz. Yin and
Kuwada (1983) showed that some IC neurons phaselock to binaural beat frequencies
of up to 80 Hz. Our dichotic Tritone stimulus, which did not elicit much phaselock-
ing, has frequency characteristics close to both of these limits: the partials that would
beat are at frequencies greater than 1200 Hz and the binaural beat frequencies would
be greater than 75 Hz (see Fig. 4-5).
Kuwada, Batra, and Stanford (1989) showed that the anesthetic sodium pentobarbital
can affect IC neurons’ response rates, response latencies, response patterns, and
spontaneous activity. Their findings are consistent with the idea that anesthesia elicits
greater overall inhibition. While this may affect the forms of PSTHs, other aspects
of the response, including the best (∼ 87 Hz) and highest (∼ 250 Hz) temporal envelope
modulation frequencies that elicit significant synchrony, are similar in anesthetized (Rees
and Møller, 1983; Rees and Palmer, 1989) and unanesthetized (Kuwada, Batra, and
Stanford, 1989) preparations. This provides support for the idea that our findings,
regarding discharge rate fluctuations in response to temporal envelope modulations,
may be generalizable to the unanesthetized case.
One possible resolution to our “dichotic quandary” is that IPD-insensitive neurons
alone code for roughness. While this is possible, Kuwada and Yin (1983) report
that IPD-insensitive neurons constitute only about 20% of low-frequency IC neurons.
Other reports (Semple and Aitkin, 1979) and our small sample population, which
include some higher-frequency neurons, contain a larger percentage (∼ 50%) of IPD-insensitive
neurons, making this resolution more plausible. However, in humans, there
would be more low-frequency neurons and therefore fewer IPD-insensitive neurons.
Pitch
Dichotic pitch percepts have historically been interpreted as evidence for spectral
models of pitch (Houtsma and Goldstein, 1972; Bilsen and Goldstein, 1974). Never-
theless, the representation of dichotic pitches in temporal discharge patterns of central
neurons has been hypothesized (Greenberg, 1986) but not demonstrated so far. Here,
we have shown that correlates of dichotic pitch do indeed exist in temporal discharge
patterns of central auditory neurons. Greenberg’s hypothesis was based on the idea
that binaural interaction occurs in the form of a coincidence detector that receives
an input from each ear, much like the model we have used in this study.
A longstanding predicament for temporal models of pitch perception is the degra-
dation of phaselocking to the stimulus in neural responses at sequentially higher
levels of the auditory system. Phaselocking is seen in responses of single AN fibers
and some cochlear nucleus (CN) neurons for stimulus frequencies up to ∼ 5000 Hz
although synchrony begins to fall for frequencies greater than ∼ 1000 Hz (Johnson,
1980; Bourk, 1976). However, in single IC neurons, phaselocking is rarely seen for fre-
quencies greater than ∼ 600 Hz (Kuwada et al., 1984) and we saw very little even at
440 Hz. One possible resolution is that the temporal code may be converted to some
other neural code (e.g., place code) prior to the IC, or that fine timing information
is preserved in a pathway that has yet to be rigorously studied, such as in the nu-
cleus of the lateral lemniscus. Another possibility is that neural responses of humans
may show better phaselocking at higher frequencies than in cat. This idea has been
examined through measurements of the scalp-recorded frequency following response
(FFR) in humans and cats, which is thought to represent synchronous responses of
neurons in the higher auditory brainstem, although its precise origin has not been
determined (Smith, Marsh, and Brown, 1975). Greenberg et al. (1987) measured
human scalp-recorded FFRs and showed a high correlation between the FFR and
the perceived pitch of a variety of complex-tone stimuli. They found that the FFR
is strong for fundamental frequencies below 500 Hz, but degrades at higher frequen-
cies to almost nothing at 1000 Hz. In cats, however, it has been shown that FFRs
are measurable up to frequencies of nearly 2500 Hz (Merzenich, Gardi, and Vivion,
1983). While this measure may be due to a better signal-to-noise ratio in cats than
in humans, the comparison of human and cat FFRs suggests that neural phaselocking
in humans is not substantially better at high frequencies than it is in cats.
Another difference between the auditory system of humans and cats is in the
distribution of neural best frequencies: humans have a slightly lower frequency range
of hearing and have greater neural representation of lower frequencies (Fay, 1988).
This fact may help the cause of temporal models for pitch despite the dearth of mid-
frequency phaselocking neurons found in the IC of the cat: overall there may be a
better representation of pitch in human neural ISIs simply because there is a larger
proportion of neurons tuned to the relevant frequency range.
While we did not find a good correlate of pitch salience in the population ISI
histograms of IC neural responses, consonance may still correlate with the pitch
salience of the fundamental bass frequency of musical intervals, albeit through a
different and yet unknown neural code.
It may appear that our investigation of pitch salience as a possible correlate of
consonance is clouded somewhat by our use of equal temperament tuning rather
than Just intonation. With equal temperament tuning, the ratios of fundamental
frequencies for the Perfect 4th and 5th are not exactly the simple ratios 4/3 and 3/2,
but instead are 4.0045/3 and 2.9966/2 respectively. These deviations may cause slight
temporal smearing of sharp modes in the autocorrelation functions of these stimuli.
However, the deviations are of similar size for both intervals and hence should affect
both histograms similarly. Over the 500 msec duration of the stimuli, the difference in
phase change for the fundamental of the upper tones in the dyads is only about 12%
for the 5th and 8% for the 4th. Therefore relative measures based on the histograms
should not be greatly affected.
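The equal-temperament ratios quoted above can be checked directly. The short sketch below assumes the standard definition of twelve-tone equal temperament, in which an interval of n semitones has the frequency ratio 2^(n/12); the function and variable names are illustrative.

```python
def equal_tempered_ratio(semitones):
    """Frequency ratio of an equal-tempered interval spanning `semitones`."""
    return 2.0 ** (semitones / 12.0)


fourth = equal_tempered_ratio(5)   # compare with the just ratio 4/3
fifth = equal_tempered_ratio(7)    # compare with the just ratio 3/2

# Express each ratio over the same denominator as its just counterpart.
fourth_numerator = 3 * fourth      # the "4.0045" in 4.0045/3
fifth_numerator = 2 * fifth        # the "2.9966" in 2.9966/2
```

The equal-tempered 4th is therefore slightly wider than 4/3 and the equal-tempered 5th slightly narrower than 3/2, by comparable amounts.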
4.5.2 Psychophysics and perception
Another possible resolution to our “dichotic quandary” is that roughness of dichot-
ically presented tones does, in fact, exist. Burns and Ward (1976) found that mu-
sicians could identify dichotic as well as diotic musical intervals consisting of two
low-frequency pure tones. They found that subjects’ performance of dichotic interval
identification, for base frequencies of 100 and 262 Hz and intervals near or less than
dichotic fusion thresholds, was equal to or better than the diotic case. At higher fre-
quencies (2000 and 3000 Hz) subjects performed worse for the dichotic intervals than
for the diotic. According to post-experiment discussions, subjects were able to use
“roughness” cues to distinguish between different dichotic intervals (Burns, 2001).
The perception of dichotic roughness may be related to the perception of binaural
beats and dichotic beats of mistuned consonances. Low-frequency binaural beats
are typically perceived as low-frequency periodic changes in laterality when two
tones separated by a small frequency difference are presented dichotically (Licklider,
Webster, and Hedlun, 1950; Perrott and Nelson, 1969). Perrott and Nelson (1969)
found that listeners can detect binaural beats for frequencies up to about 1500 Hz
and for frequency differences up to about 80 Hz. A weaker percept, dichotic beats
of mistuned consonances, is obtained from two tones presented dichotically whose
frequencies deviate slightly from a simple integer ratio (Feeney, 1997; Tobias, 1963;
Thurlow and Bernstein, 1957). Feeney (1997) showed that listeners could detect such
dichotic beats reliably for component frequencies less than 1000 Hz. Yin, Chan and
Carney (1987) have shown a neural correlate of these dichotic beats in responses of
single IC neurons that were shown to be sensitive to IPD.
For all of these dichotic percepts there is a similar maximum stimulus frequency
(∼ 1000−1500 Hz) under which they exist. This fact is consistent with the idea that
the percepts are based on neural temporal information from each ear and they are
limited at higher frequencies by the fall off of synchrony.
Certainly further psychoacoustic investigations need to be performed in order to
fully understand the notion of dichotic roughness and its relation to dissonance. As
noted above, there is currently some evidence to suggest that the percept may exist
although it is likely to be a weaker percept than its diotic counterpart and limited
to a smaller frequency region. This limitation has implications for the hypothetical
dichotic dissonance of musical intervals from different octave regions and of particular
intervals within the same octave region. Because the percept is likely to arise only
from the beating of low-frequency partials, dichotic intervals based at low frequencies
should generally sound more dissonant than those based at higher frequencies. Also,
dichotic dissonant intervals whose beating comes from the higher order partials (e.g.,
Tritone) should sound relatively less dissonant than those whose beating comes from
low order partials (e.g., Minor 2nd). A thorough study would examine these differences
over a broad frequency range.
4.6 Conclusion
Previously, we showed that the degree of rate fluctuations in IC neural responses is
correlated with the sensory dissonance of diotic/monaural stimuli. Here, we have
shown that binaural IC neurons sensitive to interaural phase can show beats in re-
sponse to dichotically-presented intervals, even though these stimuli are presumed not
to produce a roughness sensation (Roederer, 1979). Two possible resolutions of this
“dichotic quandary” are 1) only the phase-insensitive subset of IC neurons mediates
roughness, or 2) there exists an undiscovered form of dichotic dissonance. It is also
possible that roughness is coded in an entirely different manner.
Our general results for diotic and dichotic musical interval stimuli can be qualita-
tively predicted by a simple binaural coincidence model.
The results presented here illustrate the need for a more complete set of psychoa-
coustic data on the sensory dissonance of musical intervals, both diotic and dichotic.
Particular attention should be directed towards the effects of the spectra and fundamental
frequencies of the stimuli.
Chapter 5
Discussion
5.1 Summary of findings
5.1.1 Deviations in auditory-nerve interspike intervals lead
to a prediction of the octave enlargement effect
In Chapter 2 we showed that, in response to pure-tone stimuli, ISIs of AN fibers
deviate from integer multiples of the stimulus period. For low frequencies, (first-order)
ISIs tend to be shorter than the stimulus period and its multiples; for mid frequencies,
ISIs tend to be longer than the stimulus period and its multiples. Our analyses
showed that these two different types of ISI deviations stem from fundamentally
different mechanisms. The shortened intervals in response to low frequency tones are
due to multiple spikes occurring within a single stimulus period while the lengthened
intervals in response to mid frequency tones are likely due to refractory properties of
the nerve fibers.
We also showed that these ISI deviations lead to biases in temporally based esti-
mates of the stimulus frequency which, in turn, lead to an accurate prediction of the
octave enlargement effect if we are allowed to introduce a scaling factor of 2 when
making octave judgements. These findings are consistent with the idea that musical
pitch is encoded in ISI distributions of AN fibers.
Special efforts were made during this study to ensure accurate measurement of
AN ISI distributions. Our data will provide a precise testbed for rigorous testing of
AN models as well as for models of pitch and other perceptual phenomena based on
AN temporal activity.
5.1.2 Neural correlates of dissonance in responses of IC neurons
We showed, in Chapter 3, that IC neurons respond with greater rate fluctuations
to dissonant musical intervals than to consonant intervals and that the frequency of
their fluctuations matches the beat rate of the stimulus partial-pair closest to the
neuron’s CF. Across all CFs, the average rate fluctuations increased as a function of
perceptual dissonance of the stimuli. This effect was robust across level and was more
pronounced in the responses of Onset neurons than in Sustained or Pauser neurons.
Onset neurons also reflect the dissonance of the stimuli in their average discharge
rate. We also showed that IC neurons respond similarly to changes in dissonance in
the context of a musical passage.
The differences in responses of Onset, Sustained and Pauser neurons to the tone-
pair stimuli are paralleled by differences in the MTFs of the different unit types. We
found that MTFs from Onset neurons tend to be more sharply tuned, centered at
lower frequencies, and provide more gain at the BMF than those from Sustained or
Pauser neurons.
In Chapter 4 we examined responses of IC neurons to dichotically-presented mu-
sical intervals which are thought not to elicit a sensation of roughness. We found that
neurons sensitive to IPD show a beating response to some dissonant dichotic intervals
similar to that from diotic intervals, while neurons insensitive to IPD do not beat in
response to dichotic stimuli. Beating in response to dichotic stimuli differs from diotic
stimuli in that it requires the temporal fine structure of the stimulus to be present in
the neural response from each ear at the stage where binaural interaction takes place.
Consequently, due to the fall off of synchrony with frequency, neural beating is only
seen for dichotic stimuli that have pairs of low-frequency beating harmonics. As a result,
dissonant tone pairs whose diotic roughness comes from high-frequency partials,
such as the Tritone, do not produce a beating response when presented dichotically.
Because IPD-sensitive neurons beat in response to some dichotic as well as diotic
dissonant tone pairs, and dichotic roughness is thought not to exist, it is clear that we
must reinterpret our conclusion, from Chapter 3, that sensory dissonance is encoded
in rate fluctuations of all IC neurons. Two possible resolutions are: 1) only those
neurons insensitive to IPDs mediate the perception of roughness (sensory dissonance);
or 2) there exists a form of dichotic roughness. It is also possible that roughness is
encoded in some form other than rate fluctuations of IC neurons.
We also showed that population all-order ISI histograms from IC neurons reflect
the fundamental bass frequency of some consonant diotic and dichotic tone pairs. For
diotic stimuli, there appears to be a sharp cutoff frequency between 220 and 440 Hz,
above which little representation of fundamental bass can be seen in the histograms.
For dichotic tone pairs, the cutoff appears even lower (< 200 Hz). For the frequency
range of our stimulus set, the effect of synchrony roll-off dominates the responses
such that the consonance of a particular interval does not correlate with the relative
strength of the representation of the fundamental bass in the population all-order ISI
distributions. This does not preclude the notion that consonance is based on
the pitch salience of the fundamental bass, just that pitch salience may not be coded
in all-order ISI distributions at the level of the IC.
In addition, we showed in Chapter 4 that a simple binaural coincidence model
can predict the general temporal properties of IC neurons pertinent to the neural
coding of musical dissonance.
5.2 Limitations of the neurophysiological data
5.2.1 Effect of anesthesia
It is important to recognize the fact that neural activity in unanesthetized prepara-
tions differs from that in anesthetized preparations and, consequently, comparisons
of psychophysical data from unanesthetized subjects to physiological data from anesthetized
preparations should be performed with caution. Anesthesia has been shown
to affect responses of IC neurons in a manner that is consistent with increased inhibition
(Kuwada, Batra, and Stanford, 1989; Astl et al., 1996). While this may
affect the overall response rates and forms of PSTHs, other aspects of responses,
such as the best and highest modulation frequencies that elicit significant synchrony, are
similar in anesthetized (Rees and Møller, 1983; Rees and Palmer, 1989) and unanesthetized
(Kuwada, Batra, and Stanford, 1989) preparations. Thus, our findings on
correlates of dissonance based on discharge rate fluctuations are likely to hold in the
unanesthetized case as well.
5.2.2 Small sample sizes
For a few aspects of our findings, we have made claims based on a relatively small
data sample: we have only 9 measured responses to pure-tone pairs in Fig. 3-7,
and we have only 8 complete sets of measurements from IPD-sensitive and
IPD-insensitive neurons for our studies of responses to dichotic stimuli (Figs. 4-4 and
4-12). In the case of the pure-tone pairs, our data sample is limited by the fact that
our stimulus frequencies fell below the response areas of most neurons. In the case
of the dichotic stimuli, a complete set required more measurements than could be
made in the time that we could hold most neurons. Despite these limitations
in sample size, our conclusions are supported by the fact that our basic results are
consistent with previous findings on interaural phase sensitivity of IC neurons (Yin,
Chan, and Carney, 1987; Kuwada et al., 1984; Yin and Kuwada, 1983) and on their
sensitivity to envelope modulation (Krishna and Semple, 2000; Rees and Møller,
1987; Rees and Møller, 1983). In addition, we have shown that IC neural responses
to our stimuli can be predicted by a simple binaural coincidence model (Section 4.4),
whose basic form has been shown to predict IC neural responses to other dichotic
stimuli (Yin, Chan, and Carney, 1987). Nevertheless, our findings could be further
bolstered by the collection of more data.
5.2.3 Limited frequency range
In Chapters 3 and 4 we measured responses to only a limited set of stimuli
within a single-octave range of fundamental frequencies. Although we have not shown
that physiological correlates of dissonance exist in other octaves, there is reason to
believe that our findings extend to them. Previous studies of IC responses to
amplitude-modulated tones have covered broader frequency ranges and have not reported any
significant decreases in responses for carrier frequencies outside the 440 to 880 Hz
range (Krishna and Semple, 2000; Rees and Møller, 1987; Rees and Møller, 1983).
There may, however, be differences in responses to like intervals in different octave
ranges because the corresponding beat frequencies are different (halved for the octave
below, doubled for the octave above). This will likely cause more intervals at lower
frequencies to elicit neural rate fluctuations because more partial pairs will have beat
frequencies in the IC neural response range. The reverse will be true for intervals in
higher octave ranges. More data should be collected to confirm this speculation, but
there is a parallel perceptual phenomenon that is illustrated in musical practice: in
bass octave ranges most musical intervals sound dissonant and, consequently, only the
most consonant intervals are used.
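The octave dependence of beat frequencies described above can be sketched numerically. The Python fragment below lists the beat frequencies between partials of two harmonic complex tones and counts how many fall at or below an upper limit. The just-intonation ratio, the use of six partials per tone and the 150-Hz limit are illustrative assumptions; in particular, the limit is a hypothetical stand-in for the IC neural response range, not a measured value.

```python
from fractions import Fraction


def partial_beat_frequencies(lower_f0_hz, ratio, n_partials=6):
    """Beat frequencies (Hz) between each partial of the lower tone and each
    partial of the upper tone; exactly coinciding partials are skipped."""
    beats = []
    for n in range(1, n_partials + 1):          # partials of the lower tone
        for m in range(1, n_partials + 1):      # partials of the upper tone
            difference = abs(n - m * ratio)     # exact, in units of lower_f0_hz
            if difference != 0:
                beats.append(float(difference * lower_f0_hz))
    return sorted(beats)


def count_within_limit(beats_hz, upper_limit_hz):
    """Number of partial pairs whose beat frequency is at or below the limit."""
    return sum(1 for beat in beats_hz if beat <= upper_limit_hz)


UPPER_LIMIT_HZ = 150.0   # hypothetical cutoff, chosen only for illustration
PERFECT_FIFTH = Fraction(3, 2)

counts_by_octave = {
    f0: count_within_limit(partial_beat_frequencies(f0, PERFECT_FIFTH), UPPER_LIMIT_HZ)
    for f0 in (220, 440, 880)
}
print(counts_by_octave)
```

Because every beat frequency scales with the lower fundamental, the count can only stay the same or grow as the interval is moved to a lower octave, which is the trend proposed in the text. The actual numbers depend entirely on the assumed limit and number of partials.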
5.3 Pitch
An assumption throughout this work has been that pitch is based on a neural repre-
sentation of the stimulus fundamental frequency. We have focused on how neural ISI
distributions represent the stimulus frequency and how they may be related to the
pitch percept but we should also discuss other neural representations of the stimulus,
namely rate/place and phase/place representations.
The tonotopic frequency response of the basilar membrane is reflected in the
array of AN fiber activity so that stimulus frequency can be estimated from the
discharge rate profile across the whole nerve. This form of representation occurs
for all stimulus frequencies to which the basilar membrane responds but it is highly
susceptible to saturation, especially in the presence of noise. Kim et al. (1990) showed
that saturation becomes less of a problem if one looks only at low-spontaneous-rate
fibers and operates on each fiber’s driven rate normalized by its standard deviation
rather than on its raw discharge rate. Rice et al. (1995) examined the rate difference
(between stimulus and no-stimulus conditions) and showed that this representation
also performs better than raw discharge rate. Saturation is not a problem for pure
tones in quiet but it is when they are presented in noise (Siebert, 1970).
Another representation of stimulus frequency lies in the phase pattern or the
phase difference of the AN response. For low frequencies (<∼5 kHz) the AN response
is phase-locked to the stimulus and fibers that innervate the basilar membrane at
points separated by one spatial wavelength (or any integer multiple) fire at the same
phase. A coincidence detector with inputs from two specific points on the basilar
membrane would be tuned to the spatial wavelength defined by the two points (and
its corresponding frequency). A network of such coincidence detectors could use
the phase of the response across the nerve to estimate frequency (Loeb, White, and
Merzenich, 1983; Shamma and Klein, 2000). One weakness of such models is that they
require interaction across the full CF range of the auditory system, a requirement for
which there is no physiological evidence. A similar coincidence detection mechanism,
operating on fibers close in CF, has been postulated as a basis for level and frequency
discrimination (Carney, 1994; Heinz, Carney, and Colburn, 1999). These models use
the fact that responses of fibers innervating closely neighboring portions of the basilar
membrane become more coincident as stimulus level increases.
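The idea that a coincidence detector with two fixed inputs is tuned to a particular frequency can be illustrated with a toy Python sketch. It assumes, purely for illustration, that each input fires exactly once per stimulus cycle and that the second input lags the first by a fixed 2-ms delay that does not change with frequency; on the real basilar membrane the delay between two points does depend on frequency. All names and parameter values are hypothetical.

```python
def phase_locked_spikes(freq_hz, delay_ms, duration_ms=100.0):
    """One spike per stimulus cycle, shifted by a fixed delay (idealized)."""
    period_ms = 1000.0 / freq_hz
    n_cycles = int(duration_ms / period_ms)
    return [k * period_ms + delay_ms for k in range(n_cycles)]


def coincidence_count(spikes_a, spikes_b, window_ms=0.1):
    """Number of spikes in A that have a spike in B within +/- window_ms.

    Both spike lists must be sorted in increasing order.
    """
    count = 0
    j = 0
    for t in spikes_a:
        # Advance j past B spikes that are too early to coincide with t.
        while j < len(spikes_b) and spikes_b[j] < t - window_ms:
            j += 1
        if j < len(spikes_b) and abs(spikes_b[j] - t) <= window_ms:
            count += 1
    return count


def detector_tuning(inter_site_delay_ms, test_freqs_hz):
    """Coincidence count versus tone frequency for two fixed input sites."""
    tuning = {}
    for freq in test_freqs_hz:
        near_site = phase_locked_spikes(freq, delay_ms=0.0)
        far_site = phase_locked_spikes(freq, delay_ms=inter_site_delay_ms)
        tuning[freq] = coincidence_count(near_site, far_site)
    return tuning


tuning = detector_tuning(inter_site_delay_ms=2.0, test_freqs_hz=range(300, 701, 50))
best_freq_hz = max(tuning, key=tuning.get)
print(best_freq_hz, tuning[best_freq_hz])
```

In this toy case the detector responds only at 500 Hz, the frequency whose period equals the assumed delay. It would respond again at integer multiples of that frequency, which corresponds to the "any integer multiple" of one spatial wavelength noted in the text.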
A complete study of neural correlates of pitch effects would examine all three of
these neural representations of frequency. However, accurate characterization of the
rate/place or phase/place representations requires precise measurement of the spatial
(across characteristic frequency (CF)) distribution of AN activity. One could attempt
to perform a population study of single-unit recordings from a single animal but this
would almost certainly not give a clear-cut result for the prediction of pitch effects
as subtle as the octave enlargement effect. Another method to precisely examine
the spatial variation of AN activity is to locally sweep the stimulus frequency and
examine the changes in a single fiber (May and Huang, 1997; Cariani and Delgutte,
1996b). A transformation can then be made to estimate the response of nearby fibers,
assuming closely spaced fibers respond in a similar fashion. However, attempts by this
author to show a rate/place correlate of the octave enlargement effect using this
method have been inconclusive. This is not to say that these neural representations
of frequency would not demonstrate correlates of subtle pitch effects, only that
they are exceedingly difficult to measure. In any case, this does not detract from the
positive correlate of the octave enlargement effect that we see in ISI distributions.
An enduring issue for temporal models of pitch is the degradation of phase-locking
at progressively higher centers of the auditory system. If pitch is truly based on
ISI distributions from the auditory nerve, what happens to this interval code by
the time it reaches the IC? One possibility is that it is converted, prior to the IC,
into an alternative code, such as a rate/place representation. If this were the case,
however, one would expect to see much sharper tuning in individual IC neurons
than is generally seen. Alternatively, the interval code may still exist, albeit in a
much smaller subset of neurons than at lower-level nuclei. Phase-locking in single
IC neurons has been seen for frequencies as high as 1,200 Hz but in relatively few
neurons (Kuwada et al., 1984). This may still be enough to encode pitch information
because, as Siebert (1970) demonstrated, very few neurons are required to reliably
encode stimulus frequency in a temporal manner. This is clearly an area for further
investigation.
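The degree of phase-locking discussed here is commonly quantified with a synchronization index (vector strength). The Python sketch below is a minimal version of that standard measure; it is offered as background and is not claimed to be the exact synchrony criterion used in the studies cited above. The spike trains are hypothetical and constructed so that one is perfectly locked and the other is spread evenly over the stimulus cycle.

```python
import math


def vector_strength(spike_times_s, freq_hz):
    """Synchronization index: 1 = perfect phase-locking, near 0 = none."""
    if not spike_times_s:
        raise ValueError("need at least one spike")
    phases = [2.0 * math.pi * freq_hz * t for t in spike_times_s]
    sum_cos = sum(math.cos(p) for p in phases)
    sum_sin = sum(math.sin(p) for p in phases)
    return math.hypot(sum_cos, sum_sin) / len(phases)


freq_hz = 500.0
period_s = 1.0 / freq_hz

# Hypothetical spike trains: one fires at the same phase on every cycle,
# the other spreads its spikes evenly over all phases of the cycle.
locked = [k * period_s for k in range(100)]
unlocked = [(k + k / 100.0) * period_s for k in range(100)]

print(round(vector_strength(locked, freq_hz), 3))
print(round(vector_strength(unlocked, freq_hz), 3))
```

A neuron whose index stays well above zero at the stimulus frequency retains timing information that an interval code could use, even if only a small subset of neurons behaves this way.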
5.4 Consonance and dissonance
The percepts of consonance and dissonance are less obvious than that of pitch, which
complicates the task of interpreting the psychoacoustic data on dissonance and
quantitatively correlating them with neurophysiological responses. Investigators have
used a variety of terms to convey to subjects the meaning of consonance (pleasantness,
smoothness, purity, fusion, clearness) and dissonance (unpleasantness, roughness,
turbidity) (Kameoka and Kuriyagawa, 1969a; Kameoka and Kuriyagawa, 1969b;
Plomp and Levelt, 1965; Malmberg, 1917; Kaestner, 1909), while some studies have
attempted to correlate subjects’ ratings across the different criteria (van de Geer,
Levelt, and Plomp, 1962; Guernsey, 1928). Other investigators have avoided the use
of the terms consonance and dissonance altogether and instead asked subjects to rate
the tension of a particular chord, which is considered to be a functional effect of
dissonance in music (Pressnitzer et al., 2000; Bigand, Parncutt, and Lerdahl, 1996).
Associated with this relatively ambiguous description of the percept is incomplete
agreement on the exact rank order of dissonance for all 12 intervals within the
Western diatonic scale. An additional confounding factor is the use (across studies)
of stimuli with differing spectra, as relative component strength has been shown to
affect judgements of dissonance (Kameoka and Kuriyagawa, 1969b). As a result, we
have not tried to demonstrate, in this set of studies, a quantitative correlation
between the psychoacoustic data and the physiological measures. However, despite the
differing methods and results across studies, there is a clear general agreement on the
most consonant (Unison, Octave, Perfect 5th, Perfect 4th) and dissonant (Minor 2nd,
Major 2nd) intervals, as well as on the complex-tone Tritone sounding more dissonant
than the Perfect 4th and Perfect 5th. The broad consensus of these psychophysical
relations, on which our studies are based, suggests that meaningful comparisons to
neurophysiological responses can be made.
5.5 Conclusions
Our findings illustrate the complexity and specificity of temporal neural processing at
multiple resolutions in the auditory periphery, brainstem and midbrain. In addition,
they show that musical percepts generally considered to be “high order”, such as the
dissonance of musical intervals, have direct neural correlates in low- and mid-level
nuclei of the auditory system.
5.6 Future work
The general principle involved in finding a neural correlate of the octave enlargement
effect in Chapter 2 could be applied to search for neural correlates of different
pitch effects using temporal as well as rate- and phase-place models. A reliable
model for pitch, based on its underlying neural code, should predict the pitch of all
stimuli under all conditions, including the pitch-intensity effect (Verschuure and van
Meeteren, 1975; Stevens, 1935; Fletcher, 1934), changes of pitch due to the presence
of noise (Stoll, 1985), post-stimulus pitch effects (Hall 3rd and Soderquist, 1982) and
dichotic pitches (Houtsma and Goldstein, 1972; Bilsen, 1977; Hartmann and McMillon,
2001). A battery of tests such as these could help shed light on the neural code
for pitch in the brainstem. Many of the pitch effects listed above have been investigated
psychoacoustically, but neural responses to the same stimuli have not been
examined. This could be done with multiple coding schemes in mind to examine the
relative capability of each to mediate the pitch effects and, thus, pitch overall.
Further study of the psychophysics of consonance and dissonance is required in
order to advance our knowledge of the underlying neural code. A thorough investigation
using a full pair-wise comparison method should compare responses to both
diotic and dichotic stimuli, use both pure and complex tones, and look at the effects
of spectra and fundamental frequency (octave range). Also, the relative contributions
to dissonance of pitch (of the fundamental bass) and roughness should be examined.
One possible method would be to separate the temporal envelope and fine structure
of tone pairs using the “auditory chimera” method of Smith, Delgutte, and Oxenham
(2001) and examine which is more pertinent to the perceived dissonance of the
tone pair. Roughness cues would be associated with the temporal envelope while
pitch would be based on the fine structure.
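The envelope and fine-structure split proposed above can be sketched for the simplest case of a single pair of equal-amplitude pure tones. The Python fragment below builds the analytic signal of a tone pair directly, takes its magnitude as the envelope and the cosine of its phase as the fine structure, and then imposes the envelope of one pair on the fine structure of another. This is not the procedure of the cited work: it uses one band only, constructs the analytic signal in closed form, and all names, frequencies and the 16:15 ratio are illustrative assumptions.

```python
import cmath
import math


def analytic_tone_pair(f1_hz, f2_hz, t_s):
    """Analytic signal of two equal-amplitude tones at time t_s (seconds)."""
    return (cmath.exp(2j * math.pi * f1_hz * t_s)
            + cmath.exp(2j * math.pi * f2_hz * t_s))


def envelope_and_fine_structure(f1_hz, f2_hz, times_s):
    """Split a tone pair into its envelope and unit-amplitude fine structure."""
    envelope, fine = [], []
    for t in times_s:
        z = analytic_tone_pair(f1_hz, f2_hz, t)
        envelope.append(abs(z))
        fine.append(math.cos(cmath.phase(z)))
    return envelope, fine


def chimera(envelope_pair, fine_pair, times_s):
    """Envelope of one tone pair imposed on the fine structure of another."""
    envelope, _ = envelope_and_fine_structure(*envelope_pair, times_s)
    _, fine = envelope_and_fine_structure(*fine_pair, times_s)
    return [e * f for e, f in zip(envelope, fine)]


sample_rate_hz = 20000
times = [n / sample_rate_hz for n in range(2000)]   # 100 ms

minor_second = (440.0, 440.0 * 16 / 15)   # envelope beats at about 29 Hz
perfect_fifth = (440.0, 660.0)            # envelope beats at 220 Hz

# Slow (rough) envelope of the Minor 2nd on the fine structure of the Perfect 5th.
mixed = chimera(envelope_pair=minor_second, fine_pair=perfect_fifth, times_s=times)
```

For equal-amplitude tones the envelope is 2|cos(pi (f2 - f1) t)|, so it fluctuates at the difference frequency of the pair. A listening test with such mixed stimuli could, in principle, indicate whether the envelope or the fine structure carries more of the perceived dissonance.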
Bibliography
Adams, J. C. (1979). “Ascending projections to the inferior colliculus,” J. Comp. Neurol. 183, 519–538.
Adams, J. C. (1995). “Cytochemical mapping of the inferior colliculus,” Abstracts of the 18th Midwinter Meeting of the Association for Research in Otolaryngology 160.
Astl, J., Popelar, J., Kvasnak, E., and Syka, J. (1996). “Comparison of response properties of neurons in the inferior colliculus of guinea pigs under different anesthetics,” Audiology 35, 335–345.
Attneave, F. and Olson, R. K. (1971). “Pitch as a medium: A new approach to psychophysical scaling,” Am. J. Psychol. 84, 147–166.
Bartok, B. (1940). Mikrokosmos (Boosey and Hawkes, London), Vol. 1. 1987 Edition.
Biasutti, M. (1997). “Sharp low- and high-frequency limits on musical chord recognition,” Hear. Res. 105, 77–84.
Bigand, E., Parncutt, R., and Lerdahl, F. (1996). “Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training,” Percept. Psychophys. 58, 125–141.
Bilsen, F. A. (1977). “Pitch of noise signals: Evidence for a “central spectrum”,” J. Acoust. Soc. Am. 61, 150–161.
Bilsen, F. A. and Goldstein, J. L. (1974). “Pitch of dichotically delayed noise and its possible spectral basis,” J. Acoust. Soc. Am. 55, 292–296.
Boomsliter, P. and Creel, W. (1961). “The long pattern hypothesis in harmony and hearing,” J. Music Theory 5, 2–30.
Bourk, T. R. (1976). “Electrical responses of neural units in the anteroventral cochlear nucleus of the cat,” Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Bregman, A. S. (1990). Auditory scene analysis: the perceptual organization of sound (The MIT Press, Cambridge, Massachusetts).
Bregman, A. S. and Pinker, S. (1978). “Auditory streaming and the building of timbre,” Canad. J. Psychol. 32, 19–31.
Burns, E. M. (2001). “Personal communication.”
Burns, E. M. and Ward, W. D. (1976). “Perception of monotic and dichotic harmonic musical intervals,” J. Acoust. Soc. Am. (abst) 59, S52.
Cariani, P. A. and Delgutte, B. (1996a). “Neural correlates of the pitch of complex tones. I. Pitch and pitch salience,” J. Neurophysiol. 76, 1698–1716.
Cariani, P. A. and Delgutte, B. (1996b). “Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch,” J. Neurophysiol. 76, 1717–1734.
Carney, L. H. (1994). “Spatiotemporal encoding of sound level: models for normal encoding and recruitment of loudness,” Hear. Res. 76, 31–44.
Covey, E., Kauer, J. A., and Casseday, J. H. (1996). “Whole-cell patch-clamp recording reveals subthreshold sound-evoked postsynaptic currents in the inferior colliculus of awake bats,” J. Neurosci. 16, 3009–3018.
Darling, A. M. (1991). “Properties and implementation of the GammaTone filter: a tutorial,” in Speech, Hearing, and Language Work in Progress (University College London, Department of Phonetics and Linguistics, London), Vol. 5.
de Cheveigne, A. (1985). “A nerve fiber discharge model for the study of pitch,” in Transactions of the Committee on Speech Research/Hearing Research (The Acoustical Society of Japan, Tokyo), pp. 279–286. S85-37 (September 19, 1985).
Delgutte, B. (1990). “Physiological mechanisms of psychophysical masking: observations from auditory-nerve fibers,” J. Acoust. Soc. Am. 87, 791–809.
Delgutte, B., Hammond, B. M., and Cariani, P. A. (1998). “Neural coding of the temporal envelope of speech: Relation to modulation transfer functions,” in Psychophysical and Physiological Advances in Hearing, edited by A. R. Palmer, A. Rees, A. Q. Summerfield, and R. Meddis (Whurr, London), pp. 595–603. Proceedings of the 11th International Symposium on Hearing, Grantham, U.K., 1-6th August, 1997.
Delgutte, B., Hammond, B. M., and Cariani, P. A. (2000). “Neural coding of the temporal envelope of speech,” in Listening to Speech, edited by S. Greenberg and W. Ainsworth (Oxford University Press, New York), (in press).
Delgutte, B., Joris, P. X., Litovsky, R. Y., and Yin, T. C. T. (1999). “Receptive fields and binaural interactions for virtual-space stimuli in the cat inferior colliculus,” J. Neurophysiol. 81, 2833–51.
Delgutte, B. and Oxenham, A. J. (2001). “Auditory chimeras,” Abstracts of the 24th Midwinter Meeting of the Association for Research in Otolaryngology 623.
Demany, L. and Semal, C. (1990). “Harmonic and melodic octave templates,” J. Acoust. Soc. Am. 88, 2126–2135.
Dobbins, P. A. and Cuddy, L. L. (1982). “Octave discrimination: An experimental confirmation of the “stretched” subjective octave,” J. Acoust. Soc. Am. 72, 411–415.
Doughty, J. M. and Garner, W. R. (1947). “Pitch characteristics of short tones. I. Two kinds of pitch threshold,” J. Exp. Psychol. 37, 351–365.
Doughty, J. M. and Garner, W. R. (1948). “Pitch characteristics of short tones. II. Pitch as a function of tonal duration,” J. Exp. Psychol. 38, 478–494.
Dowling, W. J. and Harwood, D. L. (1986). Music Cognition (Academic, San Diego), Series in Cognition and Perception.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap (Chapman & Hall, New York), Monographs on Statistics and Applied Probability.
Evans, E. F. (1983). “Pitch and cochlear nerve fibre temporal discharge patterns,” in Hearing: Physiological Bases and Psychophysics, edited by R. Klinke and R. Hartmann (Springer-Verlag, Berlin), pp. 140–146.
Fastl, H. (1990). “The hearing sensation roughness and neuronal responses to am-tones,” Hear. Res. 46, 293–296.
Fay, R. R. (1988). Hearing in Vertebrates: A Psychophysics Databook (Hill-Fay Associates, Winnetka, Illinois).
Feeney, M. P. (1997). “Dichotic beats of mistuned consonances,” J. Acoust. Soc. Am. 102, 2333–2342.
Fishman, Y. I., Reser, D. H., Arezzo, J. C., and Steinschneider, M. (2000). “Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness,” J. Acoust. Soc. Am. 108, 235–246.
Fletcher, H. (1934). “Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency and the overtone structure,” J. Acoust. Soc. Am. 6, 59–69.
Frisina, R. D., Smith, R. L., and Chamberlain, S. C. (1990). “Encoding of amplitude modulation in the gerbil cochlear nucleus. I. A hierarchy of enhancement,” Hear. Res. 44, 99–122.
Fullerton, B. (1993). “Brainstem nuclei project preferentially to different parts of the IC central nucleus,” Unpublished figure (Unpublished).
Gaumond, R. P., Kim, D. O., and Molnar, C. E. (1983). “Response of cochlear nerve fibers to brief acoustic stimuli: Role of discharge-history effects,” J. Acoust. Soc. Am. 74, 1392–1398.
Gaumond, R. P., Molnar, C. E., and Kim, D. O. (1982). “Stimulus and recovery dependence of cat cochlear nerve fiber spike discharge probability,” J. Neurophysiol. 48, 856–873.
Goldstein, J. L. (1973). “An optimum processor theory for the central formation of the pitch of complex tones,” J. Acoust. Soc. Am. 54, 1496–1516.
Goldstein, J. L. and Srulovicz, P. (1977). “Auditory-nerve spike intervals as an adequate basis for aural frequency measurement,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic, London), pp. 337–346.
Greenberg, S. (1986). “Comment after paper by E. F. Evans on page 253,” in Auditory Frequency Selectivity, edited by B. J. C. Moore and R. D. Patterson (Plenum Press, New York), Vol. 119 of NATO ASI Series A: Life Sciences, pp. 263–264.
Greenberg, S., Marsh, J. T., Brown, W. S., and Smith, J. C. (1987). “Neural temporal coding of low pitch. I. Human frequency-following responses to complex tones,” Hear. Res. 25, 91–114.
Guernsey, M. (1928). “The role of consonance and dissonance in music,” Am. J. Psychol. 40, 173–204.
Gulick, W. L., Gescheider, G. A., and Frisina, R. D. (1989). Hearing: physiological acoustics, neural coding, and psychoacoustics (Oxford University Press, New York).
Hall 3rd, J. W. and Soderquist, D. R. (1982). “Transient complex and pure tone pitch changes by adaptation,” J. Acoust. Soc. Am. 71, 665–670.
Hartmann, W. M. (1993). “On the origin of the enlarged melodic octave,” J. Acoust. Soc. Am. 93, 3400–3409.
Hartmann, W. M. and McMillon, C. D. (2001). “Binaural coherence edge pitch,” J. Acoust. Soc. Am. 109, 294–305.
Heinz, M. G., Carney, L. H., and Colburn, H. S. (1999). “Monaural, cross-frequency coincidence detection as a mechanism for decoding perceptual cues provided by the cochlear amplifier,” J. Acoust. Soc. Am. (abst) 105, 1023.
Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures (Wiley, New York).
Houtsma, A. J. M. (1984). “Pitch salience of various complex sounds,” Music Perception 1, 296–307.
Houtsma, A. J. M. and Goldstein, J. L. (1972). “The central origin of the pitch of complex tones: Evidence from musical interval recognition,” J. Acoust. Soc. Am. 51, 520–529.
Houtsma, A. J. M., Rossing, T. D., and Wagenaars, W. M. (1987). “Auditory demonstrations,” Compact Disc. Acoustical Society of America, Eindhoven, Netherlands.
Hulse, S. H., Bernard, D. J., and Braaten, R. F. (1995). “Auditory discrimination of chord-based spectral structures by European Starlings (Sturnus vulgaris),” J. Exp. Psych. Gen. 124, 409–423.
Jeppesen, K. (1927). The Style of Palestrina and the Dissonance (Oxford University Press, Oxford).
Johnson, D. H. (1980). “The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones,” J. Acoust. Soc. Am. 68, 1115–1122.
Joris, P. X. and Yin, T. C. T. (1998). “Envelope coding in the lateral superior olive. III. Comparison with afferent pathways,” J. Neurophysiol. 79, 253–269.
Kaernbach, C. and Demany, L. (1998). “Psychophysical evidence against the autocorrelation theory of auditory temporal processing,” J. Acoust. Soc. Am. 104, 2298–2306.
Kaestner, G. (1909). “Untersuchungen über den Gefühlseindruck unanalysierter Zweiklänge,” Psychol. Studien 4, 473–504.
Kameoka, A. and Kuriyagawa, M. (1969a). “Consonance theory part I: Consonance of dyads,” J. Acoust. Soc. Am. 45, 1451–1459.
Kameoka, A. and Kuriyagawa, M. (1969b). “Consonance theory part II: Consonance of complex tones and its calculation method,” J. Acoust. Soc. Am. 45, 1460–1469.
Kiang, N. Y. S. (1980). “Peripheral neural processing of auditory information,” in Handbook of Physiology, edited by I. Darian-Smith (American Physiological Society, Bethesda, MD).
Kiang, N. Y. S. (1990). “Curious oddments of auditory-nerve studies,” Hear. Res. 49, 1–16.
Kiang, N. Y. S. and Moxon, E. C. (1972). “Physiological considerations in artificial stimulation of the inner ear,” Ann. Otol. Rhinol. Laryngol. 81, 714–730.
Kiang, N. Y. S. and Moxon, E. C. (1974). “Tails of tuning curves of auditory-nerve fibers,” J. Acoust. Soc. Am. 55, 620–630.
Kiang, N. Y. S., Moxon, E. C., and Levine, R. A. (1970). “Auditory-nerve activity in cats with normal and abnormal cochleas,” in Sensorineural Hearing Loss, edited by G. E. W. Wolstenholme and J. Knight (J. & A. Churchill, London), pp. 241–273.
Kiang, N. Y. S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965). Discharge Patterns of Single Fibers in the Cat’s Auditory Nerve (The MIT Press, Cambridge, MA).
Kim, D. O., Chang, S. O., and Sirianni, J. G. (1990). “A population study of auditory-nerve fibers in unanesthetized decerebrate cats: response to pure tones,” J. Acoust. Soc. Am. 87, 1648–55.
Krishna, B. S. and Semple, M. N. (2000). “Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus,” J. Neurophysiol. 84, 255–273.
Kuwada, S., Batra, R., and Stanford, T. R. (1989). “Monaural and binaural response properties of neurons in the inferior colliculus of the rabbit: Effects of sodium pentobarbital,” J. Neurophysiol. 61, 269–282.
Kuwada, S., Batra, R., Yin, T. C. T., Oliver, D. L., Haberly, L. B., and Stanford, T. R. (1997). “Intracellular recordings in response to monaural and binaural stimulation of neurons in the inferior colliculus of the cat,” J. Neurosci. 17, 7565–7581.
Kuwada, S., Stanford, T. R., and Batra, R. (1987). “Interaural phase-sensitive units in the inferior colliculus of the unanesthetized rabbit: Effects of changing frequency,” J. Neurophysiol. 57, 1338–1360.
Kuwada, S. and Yin, T. C. T. (1983). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. I. Effects of long interaural delays, intensity, and repetition rate on interaural delay function,” J. Neurophysiol. 50, 981–999.
Kuwada, S., Yin, T. C. T., Syka, J., Buunen, T. J. F., and Wickesberg, R. E. (1984). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. IV. Comparison of monaural and binaural response properties,” J. Neurophysiol. 51, 1306–1325.
Langner, G. and Schreiner, C. E. (1988). “Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms,” J. Neurophysiol. 60, 1799–1822.
Liberman, M. C. and Kiang, N. Y. S. (1978). “Acoustic trauma in cats; cochlear pathology and auditory-nerve activity,” Acta Oto-Laryngologica Suppl. 358, 1–63.
Liberman, M. C. and Kiang, N. Y. S. (1984). “Single-neuron labeling and chronic cochlear pathology. IV. Stereocilia damage and alterations in rate- and phase-level functions,” Hear. Res. 16, 75–90.
Licklider, J. C. R. (1951). “A duplex theory of pitch perception,” Experientia 7, 128–134.
Licklider, J. C. R. (1956). “Auditory frequency analysis,” in Information Theory, edited by C. Cherry (Butterworths, London), pp. 253–268.
Licklider, J. C. R., Webster, J. C., and Hedlun, J. M. (1950). “On the frequency limits of binaural beats,” J. Acoust. Soc. Am. 22, 468–473.
Loeb, G. E., White, M. W., and Merzenich, M. M. (1983). “Spatial cross-correlation. A proposed mechanism for acoustic pitch perception,” Biol. Cybern. 47, 149–63.
Malmberg, C. F. (1917). “The perception of consonance and dissonance,” Psychol. Monogr. 25, 93–133.
May, B. J. and Huang, A. Y. (1997). “Spectral cues for sound localization in cats: a model for discharge rate representations in the auditory nerve,” J. Acoust. Soc. Am. 101, 2705–19.
McKinney, M. F. and Delgutte, B. (1999). “A possible neurophysiological basis of the octave enlargement effect,” J. Acoust. Soc. Am. 106, 2679–2692.
McKinney, M. F., Tramo, M. J., and Delgutte, B. (2001a). “Neural correlates of the dissonance of musical intervals in the inferior colliculus. I. Monaural and diotic tone presentation,” Ph.D. Thesis Chapter 3 (Unpublished).
McKinney, M. F., Tramo, M. J., and Delgutte, B. (2001b). “Neural correlates of the dissonance of musical intervals in the inferior colliculus. II. Dichotic tone presentation and pitch salience,” Ph.D. Thesis Chapter 4 (Unpublished).
Merzenich, M. M., Gardi, J. N., and Vivion, M. C. (1983). “Animals,” in Bases of auditory brain-stem evoked responses, edited by E. J. Moore (Grune & Stratton, New York), pp. 391–412.
Moon, T. K. (1996). “The expectation-maximization algorithm,” IEEE Signal Processing Magazine Nov., 47–60.
Nuding, S. C., Chen, G.-D., and Sinex, D. G. (1999). “Monaural response properties of single neurons in the chinchilla inferior colliculus,” Hear. Res. 131, 89–106.
Ohgushi, K. (1978). “On the role of spatial and temporal cues in the perception of the pitch of complex tones,” J. Acoust. Soc. Am. 64, 764–771.
Ohgushi, K. (1983). “The origin of tonality and a possible explanation of the octave enlargement phenomenon,” J. Acoust. Soc. Am. 73, 1694–1700.
Partch, H. (1974). Genesis of a Music (Da Capo Press, New York).
Patterson, R. D., Peters, R. W., and Milroy, R. (1983). “Threshold duration for melodic pitch,” in Hearing: Physiological Bases and Psychophysics, edited by R. Klinke and R. Hartmann (Springer-Verlag, Berlin), pp. 321–325.
Perkel, D. H., Gerstein, G. L., and Moore, G. P. (1967). “Neuronal spike trains and stochastic point processes. I. The single spike train,” Biophys. J. 7, 391–418.
Perrott, D. R. and Nelson, M. A. (1969). “Limits for the detection of binaural beats,” J. Acoust. Soc. Am. 46, 1477–1481.
Plomp, R. and Levelt, W. J. M. (1965). “Tonal consonance and critical bandwidth,” J. Acoust. Soc. Am. 38, 548–560.
Plomp, R. and Steeneken, H. J. M. (1968). “Interference between two simple tones,” J. Acoust. Soc. Am. 43, 883.
Pollack, I. (1967). “Number of pulses required for minimal pitch,” J. Acoust. Soc. Am. 42, 895.
Pressnitzer, D., McAdams, S., Winsberg, S., and Fineberg, J. (2000). “Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness,” Percept. Psychophys. 62, 66–80.
Pythagoras (c. 540-510 B.C.). Cited by von Helmholtz (1863).
Rameau, J.-P. (1722). Treatise on Harmony (Dover Publications, Inc., New York). Translated by P. Gossett (1971).
Randel, D. M. (1978). Harvard Concise Dictionary of Music (The Belknap Press of Harvard University Press, Cambridge, Massachusetts).
Redner, R. A. and Walker, H. F. (1984). “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review 26, 195–239.
Rees, A. and Møller, A. R. (1983). “Responses of neurons in the inferior colliculus of the rat to AM and FM tones,” Hear. Res. 10, 301–330.
Rees, A. and Møller, A. R. (1987). “Stimulus properties influencing the responses of inferior colliculus neurons to amplitude-modulated sounds,” Hear. Res. 27, 129–143.
Rees, A. and Palmer, A. R. (1989). “Neuronal responses to amplitude-modulated and pure-tone stimuli in the guinea pig inferior colliculus, and their modification by broadband noise,” J. Acoust. Soc. Am. 85, 1978–1994.
Rees, A., Sarbaz, A., Malmierca, M. S., and Beau, F. E. N. L. (1997). “Regularity of firing of neurons in the inferior colliculus,” J. Neurophysiol. 77, 2945–2965.
Rhode, W. S. (1995). “Interspike intervals as a correlate of periodicity pitch in cat cochlear nucleus,” J. Acoust. Soc. Am. 97, 2414–2429.
Rhode, W. S. and Greenberg, S. (1994). “Encoding of amplitude modulation in the cochlear nucleus of the cat,” J. Neurophysiol. 71, 1797–1825.
Rice, J. J., Young, E. D., and Spirou, G. A. (1995). “Auditory-nerve encoding of pinna-based spectral cues: rate representation of high-frequency stimuli,” J. Acoust. Soc. Am. 97, 1764–1776.
Rodieck, R. W. (1967). “Maintained activity of cat retinal ganglion cells,” J. Neurophysiol. 30, 1043–1071.
Rodieck, R. W., Kiang, N. Y. S., and Gerstein, G. L. (1962). “Some quantitative methods for the study of spontaneous activity of single neurons,” Biophys. J. 2, 351–368.
Roederer, J. G. (1979). Introduction to the physics and psychophysics of music (Springer-Verlag, New York).
Rose, J. E., Brugge, J. F., Anderson, D. J., and Hind, J. E. (1967). “Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey,” J. Neurophysiol. 30, 769–793.
Rose, J. E., Brugge, J. F., Anderson, D. J., and Hind, J. E. (1968). “Patterns of activity in single auditory nerve fibers of the squirrel monkey,” in Hearing Mechanisms in Vertebrates, edited by A. V. S. de Reuck and J. Knight (Churchill, London), pp. 144–168.
Ruggero, M. A. (1973). “Response to noise of auditory nerve fibers in the squirrel monkey,” J. Neurophysiol. 36, 569–587.
Ruggero, M. A., Rich, N. C., Shivapuja, B. G., and Temchin, A. N. (1996). “Auditory-nerve responses to low-frequency tones: Intensity dependence,” Aud. Neurosci. 2, 159–185.
Schellenberg, E. G. and Trainor, L. J. (1996). “Sensory consonance and the perceptual similarity of complex-tone harmonic intervals: Tests of adult and infant listeners,” J. Acoust. Soc. Am. 100, 3321–3328.
Semple, M. N. and Aitkin, L. M. (1979). “Representation of sound frequency and laterality by units in central nucleus of cat inferior colliculus,” J. Neurophysiol. 42, 1626–1639.
Sethares, W. A. (1999). Tuning, Timbre, Spectrum, Scale (Springer-Verlag, London).
Shamma, S. and Klein, D. (2000). “The case of the missing pitch templates: How harmonic templates emerge in the early auditory system,” J. Acoust. Soc. Am. 107, 2631–2644.
Siebert, W. M. (1970). “Frequency discrimination in the auditory system: Place or periodicity mechanisms?,” Proc. IEEE 58, 723–730.
Smith, J. C., Marsh, J. T., and Brown, W. S. (1975). “Far-field recorded frequency-following responses: Evidence for the locus of brainstem sources,” Electroencephalogr. Clin. Neurophysiol. 39, 465–472.
Stevens, S. S. (1935). “The relation of pitch to intensity,” J. Acoust. Soc. Am. 6, 150–154.
Stoll, G. (1985). “Pitch shift of pure and complex tones induced by masking noise,” J. Acoust. Soc. Am. 77, 188–192.
Stumpf, C. (1890). Tonpsychologie (S. Hirzel, Leipzig).
Sundberg, J. E. F. and Lindqvist, J. (1973). “Musical octaves and pitch,” J. Acoust. Soc. Am. 54, 922–929.
Tenney, J. (1988). A History of ‘Consonance’ and ‘Dissonance’ (Excelsior Music Publishing Company, New York).
Terhardt, E. (1968a). “Über akustische Rauhigkeit und Schwankungsstärke,” Acustica 20, 215–224.
Terhardt, E. (1968b). “Über die durch amplitudenmodulierte Sinustöne hervorgerufene Hörempfindung,” Acustica 20, 210–214.
Terhardt, E. (1971). “Die Tonhöhe harmonischer Klänge und das Oktavintervall,” Acustica 24, 126–136.
Terhardt, E. (1974a). “On the perception of periodic sound fluctuations (roughness),” Acustica 30, 201–213.
Terhardt, E. (1974b). “Pitch, consonance, and harmony,” J. Acoust. Soc. Am. 55, 1061–1069.
Terhardt, E. (1974c). “Pitch of pure tones: its relation to intensity,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 353–360. Proceedings of the Symposium on Psychophysical Models and Physiological Facts in Hearing.
Terhardt, E. (1977). “The two-component theory of musical consonance,” in Psychophysics and Physiology of Hearing, edited by E. F. Evans and J. P. Wilson (Academic Press, London), pp. 381–390. An International Symposium, University of Keele, 12–16 April 1977.
Terhardt, E. (1984). “The concept of musical consonance: A link between music and psychoacoustics,” Music Perception 1, 276–295.
Thurlow, W. R. and Bernstein, S. (1957). “Simultaneous two-tone pitch discrimination,” J. Acoust. Soc. Am. 29, 515–519.
Tobias, J. V. (1963). “Application of a ‘relative’ procedure to a problem in binaural-beat perception,” J. Acoust. Soc. Am. 35, 1442–1447.
Tramo, M. J., Cariani, P. A., and Delgutte, B. (1992). “Representation of tonal consonance and dissonance in the temporal firing patterns of auditory-nerve fibers,” Soc. Neurosci. Abstr. 18, 382.
Tramo, M. J., Cariani, P. A., Delgutte, B., and Braida, L. D. (2001). “Neurobiological foundations for the theory of harmony in Western tonal music,” Annals of the New York Academy of Sciences 930, 92–116.
Tramo, M. J., Cariani, P. A., McKinney, M. F., and Delgutte, B. (2000). “Neural coding of tonal consonance and dissonance,” Abstracts of the 23rd Midwinter Meeting of the Association for Research in Otolaryngology 5641.
van de Geer, J. P., Levelt, W. J. M., and Plomp, R. (1962). “The connotation of musical consonance,” Acta Psychol. 20, 308–319.
van den Brink, G. (1974). “Monotic and dichotic pitch matchings with complex sounds,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 178–188.
Verschuure, J. and van Meeteren, A. A. (1975). “The effect of intensity on pitch,” Acustica 32, 33–44.
Vogel, A. (1974). “Roughness and its relation to the time-pattern of psychoacoustical excitation,” in Facts and Models in Hearing, edited by E. Zwicker and E. Terhardt (Springer-Verlag, New York), pp. 241–250.
von Békésy, G. (1960). Experiments in Hearing (McGraw-Hill, New York).
von Helmholtz, H. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (F. Vieweg und Sohn, Braunschweig).
Walliser, V. (1969). “Über die Spreizung von empfundenen Intervallen gegenüber mathematisch harmonischen Intervallen bei Sinustönen,” Frequenz 23, 139–143.
Ward, W. D. (1954). “Subjective musical pitch,” J. Acoust. Soc. Am. 26, 369–380.
Wright, J. K. and Bregman, A. S. (1987). “Auditory stream segregation and the control of dissonance in polyphonic music,” in Music and psychology: a mutual regard, edited by S. McAdams (Harwood Academic Publishers, London), Vol. 2 of Contemporary Music Review, pp. 63–92.
Yin, T. C. T., Chan, J. C. K., and Carney, L. H. (1987). “Effects of interaural time delays of noise stimuli on low-frequency cells in the cat’s inferior colliculus. II. Evidence for cross-correlation,” J. Neurophysiol. 58, 562–583.
Yin, T. C. T. and Kuwada, S. (1983). “Binaural interaction in low-frequency neurons in inferior colliculus of the cat. II. Effects of changing rate and direction of interaural phase,” J. Neurophysiol. 50, 1000–1019.
Zwicker, E. and Fastl, H. (1999). Psychoacoustics: Facts and models (Springer-Verlag, Berlin), 2nd ed., Vol. 22 of Springer series on information sciences, Chap. Roughness, pp. 257–264.