PAVOL JOZEF ŠAFÁRIK UNIVERSITY, KOŠICE, SLOVAKIA
FACULTY OF SCIENCE
PLASTICITY AND CROSS-MODAL INTERACTIONS IN AUDITORY
DISTANCE PERCEPTION
2012 Ing. Ľuboš HLÁDEK
DISSERTATION PROSPECTUS
Program: Informatics
Institute: Institute of Computer Science
Advisor: Doc. RNDr. Gabriela Andrejková CSc.
Consultant: Doc. Ing. Norbert Kopčo PhD.
Reviewer: Aaron Seitz PhD.
Košice 2012 Ing. Ľuboš HLÁDEK
Abstract:
Auditory distance perception is influenced by the acoustical environment. Our senses must
recalibrate each time we enter a new acoustical scene. They are shaped over different time
intervals, from seconds up to days or weeks, which suggests the presence of different learning
processes that remain unknown. The starting point of my project is research in the fields of
auditory distance perception, plasticity of the auditory neural system, and methods of artificial
intelligence. Two preliminary psychophysical experiments were conducted and two more are
planned; together they will form the basis of my dissertation. The main goal, however, is to propose
mathematical methods that could account for the adaptation of auditory distance perception in
acoustical, behavioral, and electrophysiological data.
Table of Contents
1. Introduction
2. Distance perception
2.1.1. Intensity and loudness
2.1.2. Reverberation
2.1.3. Frequency
2.1.4. Binaural cues and acoustic parallax
2.1.5. Vision
2.1.6. Familiarity
2.1.7. Neural mechanisms
3. Plasticity and cross-modal interactions in auditory distance perception
3.1. Introduction
3.2. Plasticity of auditory localization
3.3. Room learning
3.4. Model of plasticity of auditory spatial adaptation
4. Learning and decision in uncertainty
4.1. Representation and inference
4.1.1. Bayesian networks
4.1.2. Fuzzy sets and fuzzy inference
4.1.3. Detection theory
4.1.4. Stochastic processes and time series analysis
4.1.5. Artificial neural networks
4.2. Learning process
5. Preliminary and planned experiments
5.1. Learning of reverberation cues for auditory distance perception
5.2. Short-term adaptation of auditory distance in a reverberant room
5.3. Ventriloquism aftereffect in distance
5.4. EEG study on “room learning”
6. Dissertation project
6.1. Plasticity of auditory distance perception in reverberant room
6.2. Cross-modal interactions in auditory distance perception in reverberant room
6.3. Mathematical modeling
References
1. Introduction
In order to understand auditory spatial perception in a broader context than the mechanical extraction
of acoustical cues, research must focus on the plasticity of the auditory neural system. Subcortical and
early cortical mechanisms are classically understood as a “hardwired processor”, but in recent years this
perspective has been shifting toward the view that the auditory pathway is a site of substantial
plasticity, which is crucial for perceptual learning or plays a significant role as a mechanism
subserving classical conditioning or reinforcement learning. This view is supported by observations
from spatial hearing studies, in which very strong and quick recalibration of auditory space is
observed under many conditions. Temporary as well as permanent reorganization of auditory space has
been observed and is well described in many animal studies, e.g. with barn owls, ferrets, and cats.
In human studies, several perceptual paradigms related to spatial hearing provide
strong evidence of plasticity. The precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), observed
for onset time differences of 1-50 ms, reveals temporal constraints of the auditory system that affect
the perception of echoes, and it points to the importance of the leading sound. This is also underlined
by the Franssen effect (Hartmann, 1989), in which lateral percepts follow the leading click, but only
with a certain delay when the order of the leading and lagging clicks is reversed. The effect points to
a process that builds up an expectation which persists for a certain time after the leading click
starts coming from the contralateral side. When sounds are temporally close (within 50-400 ms), the
perceived position of a varying sound is repelled or attracted by the presence of a sound that keeps
coming from the same position. In experiments on contextual plasticity in horizontal localization in
our laboratory, we saw build-up and decay of the effect between pre- and post-training on a scale of
5-10 minutes, which points to very quick recalibration of spatial perception (Tomoriova, Andoga, &
Kopco, 2007).
When we enter a new acoustical scene, for instance when we move from an acoustically damped to an
acoustically live environment or from the corner to the center of a room, the acoustical
reverberant profile changes dramatically. Although horizontal and vertical localization are little
affected by the presence of reverberation, distance perception is subject to strong recalibration. This
recalibration operates on different time scales, and different perceptual cues might contribute to
and drive it.
Acoustical factors that contribute to distance perception include the direct-to-reverberant energy
ratio (D2R), frequency fluctuations, and binaural cues. These could provide absolute cues, whereas
loudness or central frequency could provide only relative cues.
There are also non-acoustical factors that modulate distance perception. An expectation about the
sound can be built from the vocal effort of the speaker: whispering is perceived as closer than
shouting. Also, visual stimuli can shift the perceived sound position away from its real place of
origin. Specifically, the interaction of vision and spatial sound perception is extensively studied as
the ventriloquism effect/aftereffect, but in distance perception there is only little work that would
provide evidence of visually induced recalibration. Some works found visual capture for the first
presentation or compared blindfolded and visual (non-blindfolded) conditions, and others found a
visually induced bias during interleaved audio-visual presentations from different locations, but none
of these studies evaluated the effect as a learning process by comparing pre-training and
post-training performance.
Perception is a dynamic process that evolves over time, and our senses must recalibrate very often.
Once information is processed, it must be stored. The role of consolidation or attention could also
be critical for the learning process. In the visual domain, attention is usually connected with saccadic
eye movements, but in the auditory domain it is difficult to determine which acoustic features a
subject is attending to.
2. Distance perception
In the auditory domain, the perception of distance has not been given as much attention as
horizontal or vertical localization. People are generally worse at judging distance than at judging
the angular direction of a source (Zahorik, Brungart, & Bronkhorst, 2005), and the underlying
mechanisms are not described and understood in as much detail. Natural sound sources usually occur
outside of the listener's head; therefore “outsideness” provides an important dimension to the percept.
Nonetheless, it could be argued that the perception of distance is ecologically important. It is often
useful to judge how far away a sound is, or whether it is approaching or receding, in order to avoid
danger, locate a partner, or adapt the senses to communication or interaction in noisy or reverberant
environments.
It is difficult to generalize subjects' performance in distance localization tasks. Despite that,
Zahorik et al. (2005) came up with the compressive power function
$r' = k\, r^{a}$
where r' is the perceived distance, r is the presented distance, and a and k are parameters usually fit
to a ≈ 0.15-0.7 and k slightly larger than 1. Although these numbers could be somewhat misleading due to
the large number of conditions and studies, this function provides a good approximation and a framework
for comparison. Near sources are usually overestimated and far sources underestimated, which is
explained as a “horizon effect” or as a “margin of safety” that serves as a danger-avoidance factor.
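As a minimal numerical sketch, the following Python snippet evaluates a compressive power function of the form r' = k·r^a for a few presented distances. The values k = 1.3 and a = 0.5 are illustrative assumptions picked from within the ranges quoted above, not fitted values from any particular study, and the function names are hypothetical.

```python
def perceived_distance(r, k=1.3, a=0.5):
    """Compressive power function r' = k * r**a (distances in metres).

    k and a are illustrative assumptions, not fitted parameters.
    """
    if r <= 0:
        raise ValueError("presented distance must be positive")
    return k * r ** a


def crossover_distance(k=1.3, a=0.5):
    """Distance at which r' equals r: k * r**a = r  =>  r = k**(1 / (1 - a))."""
    return k ** (1.0 / (1.0 - a))


for r in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"presented {r:4.1f} m -> perceived {perceived_distance(r):5.2f} m")
print(f"crossover at {crossover_distance():.2f} m")
```

With these assumed parameters, sources nearer than roughly 1.7 m are overestimated and farther sources are underestimated, which is the qualitative pattern described in the text.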
Precision, or localization “blur”, can be as high as 20%-60%, as reported in Zahorik's review
across a broad range of studies. This suggests that variability increases with distance, but the
relationship need not be linear or monotonic (N. Kopco & Shinn-Cunningham, 2010).
In recent studies there is no clear consensus on which cues contribute to the perception of distance
and to what extent, but the ongoing debate points to both acoustic and non-acoustic cues. Briefly, from
the laws of physics it is known that sound intensity is inversely proportional to the square of
distance in the free field. In a reverberant environment, reflections come into play, and they also vary
systematically with increasing distance. Higher frequencies are attenuated more than lower frequencies
along the path, and in the near field acoustic parallax creates a set of binaural cues, the interaural
time difference (ITD) and the interaural level difference (ILD), that could also be utilized for
distance perception. One could also think of the Doppler effect or motion cues, but the behavioral data
do not seem to support this type of cue. There are also a number of non-acoustical cues. As in
directional hearing, vision could potentially be involved, and familiarity with sounds seems to be an
important factor.
2.1.1. Intensity and loudness
The inverse square law predicts that sound intensity is inversely proportional to the square of
distance. Since intensity is proportional to the square of sound pressure, pressure is
inversely proportional to distance (the 1/R law). As noted by Zahorik (1996), there are three important
caveats. The law holds only in anechoic space, while in reverberant spaces it is severely degraded; it
holds only for point sources of sound; and it does not hold in the near field, where sound waves start
to interact with the body (distance less than a wavelength). The law can also be stated as
$\Delta L = 20 \log_{10}\left( R_0 / R \right)\ [\mathrm{dB}]$
where R_0 is the reference distance and R is the distance of interest. Hence, doubling the distance
results in a 6.02 dB decrease in intensity.
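The 6.02 dB figure can be checked numerically. The sketch below assumes an ideal point source in an anechoic far field, which are the conditions under which the law is said to hold; the function name is hypothetical.

```python
import math


def level_change_db(r, r0=1.0):
    """Level change in dB when a point source moves from distance r0 to r.

    Assumes the inverse square law (anechoic, far field).
    """
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r0 / r)


for factor in (2, 4, 10):
    print(f"distance x{factor:<2d}: {level_change_db(factor):+.2f} dB")
```

Each doubling of distance lowers the level by the same amount, so the cue is relative: it specifies a distance ratio, not an absolute distance, unless the source level is known.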
Throughout the previous century there were many attempts to capture the relationship between
loudness and distance. Two measures must be distinguished: discrimination and apparent
position. The first is usually expressed as a Weber ratio, i.e. the minimal detectable change as a
percentage of the reference distance; apparent distance is usually expressed as the increase in
intensity in dB that leads to a halving of the perceived distance.
Very early studies of discrimination thresholds reported detectable changes of 20-25%
(Zahorik, 1996). More recent studies (Simpson & Stanton, 1973; Strybel & Perrott, 1984) using the
method of limits found near-field thresholds of approximately 19% and 33% at 0.49 m and 0.61 m,
respectively, decreasing in inverse proportion to increasing distance and reaching values of 3-4% at
distances of 6-49 m (Strybel & Perrott, 1984). These results are not consistent with the intensity
discrimination performance reported by Miller (1947), which was about 0.5-1 dB for wideband noise at
intensities in the range 20-100 dB. Discrimination thresholds for pure tones are even lower and do not
exactly follow Weber's law. If the pressure-discrimination hypothesis is correct, people should be able
to detect changes as small as 5%-10% of the reference distance, which corresponds to the discriminable
change in intensity. This was shown in anechoic conditions with a 2-alternative forced-choice (2AFC)
2-up 1-down procedure averaging 6-20 reversals (Ashmead, LeRoy, & Odom, 1990), which favors the view of
Warren, who showed similar results by measuring the increase in vocal output needed to compensate for
changes in distance (Warren, 1968).
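The correspondence between a 5%-10% change in distance and an intensity difference of roughly 0.5-1 dB can be made explicit. The following sketch assumes the 1/R law for pressure in anechoic conditions; the helper names are hypothetical.

```python
import math


def distance_change_to_db(fraction):
    """Level difference (dB) caused by changing distance by `fraction` (0.05 = 5%)."""
    if fraction <= -1.0:
        raise ValueError("fraction must be greater than -1")
    return 20.0 * math.log10(1.0 + fraction)


def db_to_distance_change(delta_db):
    """Relative distance change that produces a level difference of delta_db."""
    return 10.0 ** (delta_db / 20.0) - 1.0


for pct in (5, 10, 20):
    print(f"{pct:2d}% distance change -> {distance_change_to_db(pct / 100):.2f} dB")
for db in (0.5, 1.0):
    print(f"{db:.1f} dB -> {100 * db_to_distance_change(db):.1f}% distance change")
```

Under this assumption an intensity threshold of 0.5-1 dB maps onto a distance threshold of about 6%-12%, which is close to the 5%-10% range predicted by the pressure-discrimination hypothesis.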
More insight into the relationship between loudness and distance was brought by the work of Zahorik
and Wightman (2001), who observed loudness constancy when the intensity change was produced by a change
in distance. Loudness judgments produced by a static source follow the sone scale:
$L = a\, I^{k}$
where L is the perceived loudness, a is a parameter, I is the intensity, and k = 0.3. Distance
judgments, in contrast to loudness judgments in the same setup, varied with the real displacements.
This work starts a departure from distance perception models based on loudness and spike count, which
are explainable at the peripheral or subcortical level. Thus a more central part of the neural system
must be involved in auditory distance perception. According to Zahorik, loudness constancy is
confounded with the reflected acoustical energy, which does not drop as much as the ratio of direct to
reverberant energy.
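To illustrate what a power law with exponent k = 0.3 implies, the sketch below computes the loudness ratio produced by a given level change, assuming the form L = a·I^k (the parameter a cancels in the ratio). The function name is hypothetical.

```python
def loudness_ratio(delta_db, k=0.3):
    """Ratio L2 / L1 for a level change of delta_db under L = a * I**k."""
    intensity_ratio = 10.0 ** (delta_db / 10.0)
    return intensity_ratio ** k


print(f"+10.00 dB -> loudness x{loudness_ratio(10.0):.2f}")
print(f" -6.02 dB -> loudness x{loudness_ratio(-6.02):.2f}")
```

A 10 dB increase roughly doubles loudness, while the 6.02 dB drop caused by doubling the distance in the free field would reduce loudness to about two thirds if loudness simply tracked intensity at the ear. Loudness constancy means that listeners' judgments of a source at varying distance do not show this drop.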
2.1.2. Reverberation
The acoustic properties of sound are strongly affected by reflections. It is not only the pinna, head,
and torso but also the floor, ceiling, walls, and furniture that interact with sound, and thus the
inverse square law does not always hold. Even if the direct portion of the sound decays according to the
1/R law, reflected sound comes from an uncountable number of different sources, and its behavior depends
on the acoustical properties of the reflecting surfaces. Despite that, reverberation can be treated as a
diffuse sound field with almost constant but slightly decreasing power with increasing distance.
Usually, the direct and reverberant portions of the sound are temporally interleaved, so most auditory
spatial cues are degraded (N. Kopco & Shinn-Cunningham, 2002), but a new acoustic cue for distance is
created: the ratio of direct to reverberant energy (D2R).
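As a rough illustration of how D2R varies with distance, the sketch below computes it from a toy impulse response. Everything about the toy response is an assumption made for illustration: a single direct-path impulse with amplitude 1/r, a reverberant tail whose level does not depend on source distance, and a 2.5 ms window separating direct from reverberant energy. It is not a room simulation and not the procedure of any cited study.

```python
import math


def toy_impulse_response(r, fs=16000, t60=0.5, tail_gain=0.02, tail_delay_ms=5.0):
    """Toy impulse response: direct impulse (amplitude 1/r) plus a decaying tail.

    The tail amplitude falls by 60 dB over t60 seconds and is independent of r
    (idealized diffuse field). All parameter values are illustrative assumptions.
    """
    n = int(fs * t60)
    start = int(round(tail_delay_ms * 1e-3 * fs))
    decay_per_sample = 3.0 * math.log(10.0) / (t60 * fs)
    ir = [0.0] * n
    for i in range(start, n):
        ir[i] = tail_gain * math.exp(-decay_per_sample * (i - start))
    ir[0] = 1.0 / r  # direct path
    return ir


def d2r_db(ir, fs, direct_window_ms=2.5):
    """D2R in dB: energy up to direct_window_ms after the peak vs. energy after it."""
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    split = peak + int(round(direct_window_ms * 1e-3 * fs))
    direct = sum(x * x for x in ir[:split])
    reverb = sum(x * x for x in ir[split:])
    if direct == 0.0 or reverb == 0.0:
        raise ValueError("impulse response has no direct or no reverberant energy")
    return 10.0 * math.log10(direct / reverb)


fs = 16000
for r in (0.5, 1.0, 2.0, 4.0):
    print(f"{r:3.1f} m: D2R = {d2r_db(toy_impulse_response(r, fs), fs):6.2f} dB")
```

Because the reverberant energy is held constant in this toy case, D2R falls by about 6 dB per doubling of distance. Unlike overall level, the ratio does not depend on how loud the source is, which is why it can serve as an absolute cue.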
Early experiments showed that people can judge distance more accurately in a reverberant than in an
anechoic space, even without any prior experience (Mershon & King, 1975). Thus
D2R provides an absolute cue for distance judgments (Mershon & Bowers, 1979). The importance of
reverberation was also shown by systematic manipulation of T60¹, one of the acoustic
parameters of any echoic room, and by manipulating background noise, which also influences D2R
(Mershon, Ballenger, Little, McMurtry, & Buchanan, 1989). The results confirmed the role of both the
room condition (dead vs. live) and the presence of background noise. Subjects were blindfolded and the
stimuli were pulse trains of white noise. Distances were underestimated in the dead condition
(T60≈0.35) and overestimated in the live condition (T60≈0.35); in addition, higher background noise
shifted the perceived distance towards the listener, as expected from the modification of D2R.
Headphone experiments that systematically varied the number of reflections led to the first
quantitative model of auditory distance perception (Bronkhorst & Houtgast, 1999). The model is based
on (1) a modified D2R, (2) prior knowledge of the acoustical properties of the room, and (3) the length
of a perceptual window. It reads:
$d_s = A\, r_h \left( \hat{E}_r / \hat{E}_d \right)^{j}$
where d_s is the perceived distance, j = 1/2, A is a parameter, the quotient contains the modified
reverberant and direct energies, and r_h is computed solely from the acoustical properties of the room.
The model is simple and has only a few parameters. It uses a windowing technique with an arbitrary
constant to compute the modified energies rather than computing the energies from the duration of the
reverberation. The authors argued that computing the reverberation directly would add too much
complexity to the model, and they pointed to the lack of neural correlates. The perceptual window in the
model is derived from the precedence effect, and it could also help to explain the “horizon effect”. The
model seems to be successful in predicting perceived distance under the tested conditions; however, it
would hardly account for an adaptation process and, more importantly, it addresses neither the accuracy
of responses nor the involvement of other possible cues such as loudness or ILD.
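The structure of such a model can be sketched in a few lines. The snippet below assumes the form d_s = A·r_h·(E_r/E_d)^j with j = 1/2 and interprets r_h as the distance at which direct and reverberant energies are equal; that interpretation, the idealized energies, and all names are assumptions made for illustration. The perceptual windowing that produces the modified energies is deliberately left out.

```python
def bh_perceived_distance(e_direct, e_reverb, r_h, A=1.0, j=0.5):
    """Perceived distance d_s = A * r_h * (e_reverb / e_direct) ** j.

    e_direct and e_reverb stand in for the model's modified energies; the
    windowing that would produce them is not implemented in this sketch.
    """
    if e_direct <= 0 or e_reverb < 0 or r_h <= 0:
        raise ValueError("energies and r_h must be positive")
    return A * r_h * (e_reverb / e_direct) ** j


def idealized_energies(r, r_h):
    """Assumption: direct energy ~ 1/r**2, constant reverberant energy equal to
    the direct energy at r = r_h."""
    return 1.0 / r ** 2, 1.0 / r_h ** 2


r_h = 1.5  # hypothetical room value, in metres
for r in (0.75, 1.5, 3.0, 6.0):
    e_d, e_r = idealized_energies(r, r_h)
    print(f"true {r:4.2f} m -> model {bh_perceived_distance(e_d, e_r, r_h):4.2f} m")
```

With unmodified energies and A = 1, the expression simply returns the true distance. Any compression of far distances in the model therefore has to come from the perceptual window, which reassigns part of the early reverberant energy to the direct sound.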
¹ T60: the time needed for the power of the impulse response to decrease by 60 dB.
An early study of the detection threshold for D2R reported a just noticeable difference (JND) of 2 dB,
but according to Zahorik (2002a) it suffered from many methodological issues. Therefore D2R thresholds
at reference values of 0 dB, 10 dB, and 20 dB were evaluated using a 2AFC 3-up-1-down
procedure in which the reverberant part of the incoming sound was manipulated. Four sources were used
and all gave consistent JNDs of 5-6 dB. The data were fit by a non-linear adaptive procedure to a
logistic distribution function to obtain the parameters of the psychometric function. The stimuli were
an impulse, a 50 ms white noise burst with brief onset/offset, a longer 300 ms signal with gradual
onset/offset, and a speech syllable. The choice of stimuli was motivated by precedence-effect studies
that found different thresholds for different signal types, suggesting that temporal cues could play a
role in D2R estimation; however, the results did not confirm this hypothesis, because D2R was equally
effective under all conditions. Zahorik proposed a psychophysical model of auditory distance
perception relating the D2R JND to the natural variation of D2R with distance:
$d' = \left( \mu_1 - \mu_2 \right) / \sigma$
where discriminability d' (“dee prime”) is expressed as the distance between the means of two Gaussian
distributions in units of standard deviation, assuming that the perceived distances have means µ_1 and
µ_2 and equal standard deviation σ. With this model he concluded that people using only D2R would be
able to detect changes in distance by a factor of 2.59 in the current experiment, or 2 based on
previous results (Zahorik, 2002b).
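The definition of d' and its standard link to performance in a two-alternative forced-choice task can be written down directly. The sketch below uses the textbook equal-variance Gaussian relation, proportion correct = Φ(d'/√2), for an unbiased observer; it is not taken from Zahorik's papers, and the function names are hypothetical.

```python
import math


def d_prime(mu1, mu2, sigma):
    """Separation of two equal-variance Gaussians in units of standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(mu1 - mu2) / sigma


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_correct_2afc(dp):
    """Proportion correct of an unbiased observer in 2AFC: Phi(d' / sqrt(2))."""
    return normal_cdf(dp / math.sqrt(2.0))


for dp in (0.0, 0.5, 1.0, 2.0):
    print(f"d' = {dp:.1f} -> {100 * proportion_correct_2afc(dp):.1f}% correct")
```

A JND is usually tied to a fixed performance level on the psychometric function, so converting a measured JND into a just discriminable distance ratio requires both this relation and the slope of D2R against distance in the room.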
Zahorik's results were re-examined with a very similar procedure (Larsen, Iyer, Lansing, & Feng,
2008), but manipulating the direct rather than the reverberant portion of the sound. The results showed
JNDs of 2-3 dB for reference values of 0 dB and 10 dB, which is inconsistent with the earlier results.
They use the idea that external changes in the sound field must correspond to changes in an internal
variable, so internal processes can be probed by manipulating an external variable (the D2R JND).
This relationship depends on the relationship between the internal variable and a physical property,
and on the physical relationship between the external variable and the manipulated quantity. The
quantitative model is as follows:
[equation missing in the source]
There are four manipulated quantities. Interaural coherence: reflected sounds from more distant
sources decorrelate the binaural inputs more than those from closer sources. Variations in spectral
fine structure: the variance of the spectral response depends on D2R. Spectral shape: the air and
surroundings act as a low-pass filter, and distance perception was shown to depend on the low-pass
cutoff frequency. Temporal integration: the buildup and decay time of the signal in the ear canal
depends on D2R. The results did not show a difference between binaural and monaural listening; signals
with 150 ms onsets/offsets led to a higher D2R JND, and removing various spectral cues led to a decrease
in the D2R JND.
Another recent and technically elaborate model was proposed by Lu and Cooke (2010). It is based on the
Equalization-Cancellation (EC) model (Durlach, 1963), previously used to explain binaural release from
masking. In the first step, the auditory system attempts to equalize the masking component (noise) in
one ear relative to the total signal in the other ear until the masking components are equal in both
ears. In the second step, the signals are subtracted from each other, which completely removes the
masking component. This procedure was adapted to extract the reverberant signal, so that the removed
component is the direct signal. The input of the EC D2R model is the signal from a pair of microphones,
which is parsed into successive frames and processed by a gammatone filterbank that models peripheral
filtering. Two main blocks, the EC block and a cross-correlation block that extracts directional
information, are combined across frequencies. Finally, a single direct-to-reverberant energy ratio value
is generated for each frame of input data. The relationship between the extracted D2R and log(distance)
is approximately linear. Therefore, after extraction, the directional D2R is used in a stochastic
framework to estimate distance from a joint distribution.
2.1.3. Frequency
Frequency provides important cues for horizontal and vertical localization. There are also
situations in which it can contribute to distance perception. Air can act as a low-pass filter,
attenuating higher frequencies more than low frequencies (Coleman, 1968; Little, Mershon, & Cox, 1992).
These changes are subtle, approximately 3-4 dB/100 m, but they could play a role for more distant
sources, beyond 15 m (Blauert, 1997). In everyday situations, moreover, reflections in rooms could also
be low-pass filtered by the reflecting surfaces (Larsen et al., 2008).
In near-field acoustics (Coleman, 1968), wave propagation cannot be approximated as planar
because the wavefront has a more spherical character. This affects the velocities of air molecules, but
the ear is sensitive to changes in pressure rather than velocity. However, there is some evidence that
acoustical interaction with the torso, head, and pinna could provide some cues. Approaching sounds are
low-pass filtered, but high-frequency modulations seem to be invariant with distance (Brungart &
Rabinowitz, 1999).
The total amount of spectral variation (Larsen et al., 2008), which is correlated with D2R over a
limited range, is also a possible source of spectral information.
Behavioral results show that people are sensitive to manipulation of high-frequency content
(Coleman, 1968): a decrease can lead to increased apparent distance, but this manipulation serves
only as a relative cue (Little et al., 1992). A recent study (N. Kopco & Shinn-Cunningham, 2010) showed
that performance in a near-field configuration, for both frontal and lateral sources, was inversely
related to the low-frequency cut-off, which was mainly caused by a skewed response range at higher
cut-offs. No effect of bandwidth was observed.
2.1.4. Binaural cues and acoustic parallax
Binaural cues can play a role in the near field, up to 1 m. ITD seems to be independent of distance,
but ILD could serve as a potential cue (Brungart & Rabinowitz, 1999). The acoustic parallax effect
arises from the difference between the path from the source to the center of the head and the path from
the source to the ear; it is usually expressed as the ratio of these distances, and it naturally varies
with distance. This leads to a shift in azimuth at the ipsilateral ear of some high-frequency features,
which could be estimated for sources up to 1 m (Brungart & Rabinowitz, 1999).
It is not clear to what extent ILD contributes to distance judgments, but Kopco's data are
mostly explainable by D2R (N. Kopco & Shinn-Cunningham, 2010). In a VEGA grant proposal² he
gives an example of a psychophysical model of ILD sensitivity
[equation missing in the source]
whose terms are the change in ILD, the noise of the internal representation, and the noise of the
external stimulus. This is a classical discrimination model that distinguishes internal and external
noise.
In the same document he gives an example of a weighted perceptual combination of ILD and D2R
derived from the power function model
[equation missing in the source]
whose terms are the perceived distance, a perceptual weight, and the parameters α and β. This model
assumes that the variability of the two characteristics, ILD and D2R, is known, that they are
independent, and that the resulting percept is their optimal combination.
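The notion of an optimal combination of two independent cues with known variability can be illustrated with the standard minimum-variance (maximum-likelihood) rule, in which each cue is weighted by its inverse variance. This is a generic textbook sketch, not the formula from the grant proposal; the cue estimates and variances in the example are invented for illustration.

```python
def combine_cues(est_a, var_a, est_b, var_b):
    """Minimum-variance combination of two independent Gaussian estimates.

    Returns (combined estimate, combined variance, weight given to cue a).
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    w_a = precision_a / (precision_a + precision_b)
    combined = w_a * est_a + (1.0 - w_a) * est_b
    combined_var = 1.0 / (precision_a + precision_b)
    return combined, combined_var, w_a


# Hypothetical distance estimates (m) from ILD (less reliable) and D2R (more reliable).
estimate, variance, w_ild = combine_cues(est_a=1.0, var_a=0.04, est_b=1.4, var_b=0.01)
print(f"combined estimate = {estimate:.2f} m, variance = {variance:.3f}, ILD weight = {w_ild:.2f}")
```

The combined variance is never larger than that of the better cue, and the weights follow directly from the reliabilities. A model of this kind therefore predicts how the relative contribution of ILD and D2R should change when one of them is degraded.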
2.1.5. Vision
In an anechoic room, people perceived all sounds as coming from the nearest visible target. This was
named the “proximity image effect” (Gardner, 1968). It was later extended to reverberant conditions,
but only for first-presentation data, and renamed “visual capture” (Mershon, Desaulniers, &
Amerson, 1980). Another experiment by the same group investigated the relationship between
apparent distance and loudness by manipulating the position of a “dummy” speaker (Mershon,
Desaulniers, Kiefer, Amerson, & Mills, 1981). More important was the manipulation of perceived
loudness based on perceived distance, and a perceptual invariance relationship was suggested. It means
that perceived loudness depends both on perceived distance and on the change in intensity.
More recent research, however, did not confirm the “proximity image effect” for localization
with multiple presentations in a semi-reverberant environment. In the first experiment (Zahorik, 2001),
which tried to replicate Gardner's original results but in a reverberant space, subjects localized
multiple sounds either with or without visual cues. In the vision condition, the exponent of the power
function fit to perceived distance was 0.9, and in the no-vision condition it was 0.66. The standard
deviation of responses was generally smaller in the vision condition, probably due to the response
range, which was greater in the vision condition; mean localization errors improved over time
in the no-vision condition, which suggests a learning effect.
² Unofficial document.
Another experiment (Calcagno, Abregú, Eguía, & Vergara, 2012), which tried to overcome
some of the methodological issues of Zahorik's study, also did not confirm the “proximity image effect”,
but it showed an interesting improvement in distance judgments in the blindfolded condition after an
initial visual condition. Initial overestimation was followed by a more precise, almost perfect fit
(Experiment 2B). If subjects were able to see the test room prior to the experiment, they also showed
almost perfect performance in the blindfolded condition, which again points to the role of a correctly
perceived response range.
The well-known illusion in which the perceived position of a sound is biased towards a visual stimulus
is called the “ventriloquism effect”. It is usually studied in horizontal localization (Alais & Burr,
2004; Norbert Kopco, Lin, Shinn-Cunningham, & Groh, 2009; Recanzone, 1998), and it provides a quick
adaptation paradigm (Wozny & Shams, 2011) that is used to test perceptual mechanisms of audio-visual
interactions. In auditory distance perception there are very few works on this topic, and they study
only visual capture in distance, not the temporal profile of the effect (Bowen, Ramachandran, Muday, &
Schirillo, 2011).
2.1.6. Familiarity
Familiarity has two possible meanings. First, familiarity could be prior knowledge at a higher
cognitive level. For example, people expect to hear whispering from a proximal region and shouting from
a more distal region, which could explain why people systematically underestimate the distance of
whispered speech and overestimate that of shouted speech (Blauert, 1997). Similarly, any long-term
experience could shape the listener's expectations before he actually experiences a specific place; for
example, some spaces have similar acoustical characteristics, so a listener is prepared in advance.
Second, when people enter a new acoustical environment, their senses must recalibrate, and they
obtain new knowledge during the exposure. This could be considered familiarity at the perceptual level
and could also be used to study the properties of short-term or long-term plasticity of the auditory
neural system.
2.1.7. Neural mechanisms
There are very few studies that have focused on the neural mechanisms of auditory distance perception.
However, four major groups can be defined.
1. Explanations based on peripheral processing – spike count models
2. Involvement of high-level areas – sensory or post-sensory processing (Zahorik et al.,
2005)
3. Effect of efferent structures – effect of recurrent attenuation (Andéol et al., 2011;
Ferry & Meddis, 2007)
4. Multimodal areas that combine different perceptual information (Zahorik et al., 2005)
3. Plasticity and cross-modal interactions in auditory distance perception
3.1. Introduction
Spatial hearing is not the most standard paradigm for studying plasticity in the auditory pathway.
Instead, the associative learning and perceptual learning streams of research address the most
central questions related to plasticity: To what extent is plasticity specific vs. general, and how
strongly is it tied to the behavior for which it was trained? How are memories created? Where do they
reside and how are they recalled? What is the trade-off between consolidation and deterioration?
Where is sensory information stored and where is it processed? Is preprocessing plastic?
Plasticity in the primary auditory cortex (A1) is usually studied using classical conditioning,
operant learning, and their combinations to induce a change in behavior by training. The amount of
plasticity is measured as the difference between two testing conditions, pre-training and
post-training, given that both testing conditions are identical and the measured difference can be
attributed solely to the effect of training. The difference is mostly evaluated as a change in
receptive fields (RF), which are measured as the neuronal response of single or multiple cells in
vitro. Training usually leads to an increase or decrease, sharpening or broadening of the RF, or to a
change in best frequency (BF) with respect to the conditioned stimulus (CS); however, these findings
must co-vary with behavioral changes in order to be valid evidence of plasticity.
The role of A1 in learning was long doubted, since ablations of A1 do not impair
classical conditioning, which is in line with the long-standing view of the cortex as a purely “sensory”
area rather than a critical site for learning and memory. Therefore A1 was for a long time not in the
focus of neurophysiological research on learning and memory; however, this view is being challenged by
multiple findings of cortical plasticity under different conditions (Weinberger, 2007).
Perceptual learning (PL) can be defined as practice-induced improvement in the ability to perform
specific perceptual tasks (Ahissar & Hochstein, 2004). It usually takes several days or weeks, and it is
more related to enhancement of sensory or low-level processing than to reinforced behavior,
which could be considered high-level. The ongoing debate in visual perceptual learning revolves around
two positions. A low-level origin of perceptual learning is supported by its high spatial specificity,
or lack of transfer, and by recordings from V1 that found recalibration after training. A high-level
origin is supported by V1 recordings that did not find a change in the tuning of V1 cells and by the
fact that attention was required in order to observe learning effects; however, recent studies found
perceptual learning even in task-irrelevant conditions.
One study of perceptual learning in the auditory modality examined a temporal discrimination
task (Karmarkar & Buonomano, 2003), since previous results showed generalization across untrained
frequencies but not across intervals or even across different modalities. This could have been caused
either by an improvement in timing per se, or by an enhanced ability to store and/or compare the
standard and comparison stimuli. The original task consisted of discriminating test pairs of tones that
were separated by a shorter or longer interval than the pair presented at the onset of the test block.
In the current study, however, subjects were trained on only one interval rather than on a comparison
of two intervals and were then tested for learning transfer; one group served as a control. Both groups
exhibited generalization across frequencies but not across intervals, which speaks for a dedicated,
interval-specific timing mechanism.
3.2. Plasticity of auditory localization
Auditory spatial plasticity has been observed in many animal studies, but less is known about
humans. Barn owls wearing prisms reorganized their neural map of auditory space (Brainard &
Knudsen, 1993). The topographic organization of visual input at the level of the optic tectum (OT)
was found to drive the recalibration of auditory space. In normal barn owls the best ITDs were
correlated with the visual receptive fields (VRF) in OT, but the best ITDs in prism-reared owls were
shifted from normal towards the ITD values produced by sounds at the locations of the shifted VRFs.
A distinction was also found between the processing of azimuth-related cues in two subcortical
nuclei of the inferior colliculus (IC): the central nucleus (ICc) and the external nucleus (ICx), which
is preceded by ICc. While ICc was tuned to the actual locations, the ICx response was shifted towards
the induced discrepancy. The onset of the shift in ICx was found as early as 5-7 ms, which suggests
that the change in ICx is driven by the ascending signal rather than by efferent connections from
higher processing stages. It is therefore thought that the plasticity observed in OT reflects plasticity
at the level of ICx, where the map is first synthesized.
Adult ferrets rapidly relearned to use auditory cues altered by inserted ear molds when trained
on a behaviorally relevant task (Kacelnik, Nodal, Parsons, & King, 2006).
In human studies, listeners who wore custom-made ear molds were able to relearn their initially
impaired vertical localization skill after a couple of weeks of wearing the molds, and when the molds
were removed their localization ability was retained (Hofman, Van Riswick, & Van Opstal, 1998).
Learning was observed not only as a result of adaptation to deprived or supernormal cues but also
after prolonged exposure to broadband noise (Carlile, Hyams, & Delaney, 2001). The perceived
position of a sound can also be altered by the presence of another, temporally close sound
(50-400 ms): when one sound is presented from a fixed location and the other from varying locations,
the latter can be attracted towards or repulsed from the fixed sound (Braasch & Hartung, 2002).
3.3. Room learning
When a subject is exposed to altered acoustical conditions, the cues for spatial hearing are
altered too. Distance perception is influenced by the amount of reverberation, expressed as T60
(Mershon et al., 1989). A learning effect was shown even after five presentations. The last (fifth)
judgment was more accurate in the “live” condition but did not change in the “dead” condition, which
suggests that subjects utilized reverberant cues but could not improve in the condition in which
reverberant cues were not available.
Another study also showed short-term improvement in distance perception after 10 presentations
in a blindfolded condition (Zahorik, 2001), but no improvement over time in a condition in which
subjects were allowed to become visually familiar with the test room. A similar study (Calcagno et
al., 2012) used two groups of subjects in its second experiment. Group A started with the blindfolded
condition and continued with the visual-cue condition; group B had the reversed order. Group A first
underestimated the more distant sources, then improved but slightly overestimated. Group B started by
overestimating and then improved in the blindfolded condition but still showed a small bias. Almost
perfect judgments were obtained by a third group of subjects, who were able to familiarize themselves
with the test room and therefore with the response range.
Reverberation degraded horizontal localization but helped distance perception in comparison
with anechoic conditions. Subjects showed a certain amount of learning in reverberant conditions, but
performance in horizontal localization was above that expected from anechoic conditions whereas the
horizontal condition was approximately the same as the anechoic condition. Distance perception
outperformed anechoic conditions (Brungart & Durlach, 1999; Shinn-Cunningham, 2000). The
experiment was conducted over multiple days (usually five), and the improvement is observable within
and across days, but there is no such trend in the anechoic data. A room learning experiment showed
that people who started with the change-after-trial condition learned to ignore the reverberation cue
and were outperformed by people who started with the change-after-session condition. Further
acoustical analysis showed that HRTFs are affected by reverberation. This is not very evident in the
long-term spectra, where the mean spectral shape is similar, but additional, mostly random frequency
fluctuations (10-20 dB around the spectrum level) are added to the signal. Also, some spectral notches
that are evident in anechoic HRTFs can be flattened.
Another set of room learning experiments was conducted in real and virtual environments
(Kopčo, Schoolmaster, & Shinn-Cunningham, 2004). The first experiment, in a real classroom, studied
the spatial transfer of learned reverberation cues over four days of training. Group A started training
in the center of the classroom and then moved to the corner; Group B trained in the reverse order. It
was hypothesized that subjects should improve with practice and that the learning effect should
transfer to other locations in the room, because each room was supposed to have specific
characteristics that can be learned. Performance in the center of the room should be better than in the
corner. Given that, Group B should improve more than Group A because it moves from the more
acoustically challenging condition (corner) to the center, where performance should be better
(Shinn-Cunningham, 2001). The relative importance of the acoustical properties of the environment
and of the effect of practice can be observed as the difference between the groups, which was shown
by measuring left-right variability (not distance).
The second set of experiments was conducted in virtual acoustic space (VAS). Here subjects
judged distance in two acoustical conditions and at two spatial positions (frontal vs. lateral). Stimuli
with roved amplitude were presented in either an acoustically consistent or an acoustically
inconsistent environment, meaning that the room acoustics changed either between sessions or from
trial to trial. Two groups of listeners differed in the order of multi-session training: FIXED-MIXED or
MIXED-FIXED. In experiment A (Schoolmaster, Kopčo, & Shinn-Cunningham, 2003) three acoustic
conditions were used: an anechoic room, the center of a classroom and a corner of the same classroom.
The follow-up study used two acoustic conditions: a large and a small classroom (Schoolmaster,
Kopčo, & Shinn-Cunningham, 2004). Results showed that subjects' performance degraded in the
MIXED condition, which suggests that people were unable to use trial-to-trial knowledge in all
conditions, but subjects who started with the FIXED condition could transfer their knowledge and
outperform subjects who started with the MIXED condition. This suggests that in the MIXED condition
subjects learned to ignore the reverberant cue. Analysis of response variance showed that
trial-to-trial change leads to an increase in response variability and that subjects tend to decrease
their response variability with time, showing learning without explicit feedback.
3.4. Model of plasticity of auditory spatial adaptation
Shinn-Cunningham based her preliminary model of perception (Shinn-Cunningham, 2000) on an
older model of intensity perception described by Durlach and Braida in 1969. The model works with
the concept of sensitivity, which is based on the difference in the sensations evoked by stimuli, scaled
by two sources of noise: perceptual and memory. In comparison with the original model, it deals with
plasticity by assuming time dependence.
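As a rough illustration of this concept, and assuming that the two noise sources are independent and Gaussian, the sensitivity between two stimuli could be written as the difference of their mean sensations divided by the combined noise. The symbols below are notation introduced here for illustration only and are not taken from the cited papers; which of the quantities actually depend on time is left open.

```latex
\[
d'(t) = \frac{\mu_2(t) - \mu_1(t)}{\sqrt{\sigma_p^2(t) + \sigma_m^2(t)}}
\]
% \mu_1, \mu_2 : mean sensations evoked by the two stimuli (assumed notation)
% \sigma_p^2   : variance of the perceptual noise (assumed notation)
% \sigma_m^2   : variance of the memory noise (assumed notation)
```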
4. Learning and decision in uncertainty
A biological neural system processes uncertain information. It is capable of storing and retrieving
memories. It can produce decisions from uncertain information and improve its performance through
learning. A precise mathematical description of the human auditory system is not possible with current
knowledge, but there are many system-level models that provide sufficient predictions of human
performance in various tasks. In the following text I provide a brief overview of computational
models that have properties of real neural systems and will be considered for my dissertation. Reviews
of methods of artificial intelligence (Russell & Norvig, 2005) and of time series analysis (Kedem &
Fokianos, 2002) can be found elsewhere.
4.1. Representation and inference
4.1.1. Bayesian networks
A joint probability distribution over continuous random variables, each described by a Gaussian
distribution, can be represented as a Bayesian network. The properties of the Gaussian distribution
and Bayes' formula for posterior probability provide a framework for inference. The outcome of the
system is:
\[ P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i)) \]
where $\mathrm{parents}(X_i)$ are the values of the variables that contribute to the conditional
probability of each variable $X_i$, i.e. the preceding nodes in the network. This is a robust
framework that can be extended to the time domain and is suitable for a wide range of problems, but
inference in such a system may be computationally difficult.
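The simplest case of Gaussian inference can be sketched in a few lines. The example below is only an illustration under stated assumptions: a single unknown quantity (for instance a source distance) with a Gaussian prior is combined with one noisy Gaussian observation through Bayes' formula. The function name and the numerical values are hypothetical and do not come from any experiment described here.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior of a Gaussian prior combined with one Gaussian observation."""
    # Precisions (inverse variances) add up.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    # The posterior mean is the precision-weighted average of prior and observation.
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var


# Hypothetical values: prior belief of 1 m, observation of 2 m, equal reliability.
mean, var = gaussian_posterior(prior_mean=1.0, prior_var=0.5, obs=2.0, obs_var=0.5)
print(mean, var)
```

With equal variances the posterior mean falls halfway between the prior and the observation, and the posterior variance is half of either input variance.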
4.1.2. Fuzzy sets and fuzzy inference
A fuzzy set A in a universe of discourse U is defined by its membership function
\[ \mu_A : U \rightarrow [0, 1] \]
which expresses the degree of membership of each element of U in the set A. This is a useful way of
characterizing a vague description of a state. For example, distance could be near, mid or far; we thus
have three sets, and a concrete value of 2 m could belong to near with degree 0.3, to mid with degree 1
and to far with degree 0.1. Fuzzy logic generalizes classical set theory, in which an element either
belongs to a set (1) or does not (0), to the concept in which an element can belong to a set to a
certain degree in the interval [0, 1].
Inference is based on binary operations on fuzzy sets. In classical two-valued logic, the operation
of conjunction produces true when both operands are true and false otherwise. In multi-valued logic
one must define what type of logic will be applied. The binary operation of conjunction in fuzzy
logic is evaluated using t-norms. Commonly used ones are the Gödel t-norm (minimum t-norm), the
product t-norm, the Łukasiewicz t-norm and the drastic t-norm.
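The four t-norms named above have standard definitions, which can be written down directly. The sketch below applies them to the membership degrees of the 2 m example (near 0.3, mid 1, far 0.1); the function names are chosen here for illustration.

```python
def t_godel(a, b):
    """Goedel (minimum) t-norm."""
    return min(a, b)


def t_product(a, b):
    """Product t-norm."""
    return a * b


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def t_drastic(a, b):
    """Drastic t-norm: non-zero only if one operand is fully true."""
    if a == 1.0:
        return b
    if b == 1.0:
        return a
    return 0.0


# Membership degrees of a 2 m source, taken from the example in the text.
near, mid, far = 0.3, 1.0, 0.1
for t_norm in (t_godel, t_product, t_lukasiewicz, t_drastic):
    print(t_norm.__name__, t_norm(near, mid), t_norm(near, far))
```

All four agree when one operand equals 1 and differ in how strongly they penalize partial membership, which is why the choice of t-norm has to be stated explicitly in a fuzzy model.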
4.1.3. Detection theory
A standard way to describe performance in psychophysical experiments is to define sensitivity
(Macmillan, 2005). A sensitivity measure should be 0 when subjects are completely insensitive, so that
responses are independent of the experimental treatment, and it should be high when subjects respond
optimally in the given conditions. A commonly used measure is d' (“dee prime”), which can be
obtained from a simple discrimination experiment and expresses how far apart two stimuli are on the
decision axis, in units of standard deviation. If one assumes that each of the two possible stimulus
values in a discrimination experiment is described by a Gaussian distribution and that both
distributions have equal variance, then the distance between their means in units of standard
deviation expresses how well the two stimuli can be discriminated.
Another common approach, equivalent to d', is to plot the iso-sensitivity curve of a discrimination
experiment, which shows the hit and false-alarm rates that correspond to the same sensitivity. This is
possible because sensitivity is independent of bias, which expresses whether the subject tended to
prefer one or the other possible response. A common measure of response bias is called the criterion c;
it is 0 when the false-alarm rate equals the miss rate (F = 1 - H).
Both d' and the criterion c can be expressed using the Gaussian z-transformation of the hit rate (H)
and false-alarm rate (F) of a discrimination experiment:
\[ d' = z(H) - z(F), \qquad c = -\tfrac{1}{2}\,[z(H) + z(F)] \]
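A short numerical sketch may make these measures concrete. It assumes the standard equal-variance Gaussian definitions d' = z(H) - z(F) and c = -[z(H) + z(F)]/2, where z is the inverse of the standard normal cumulative distribution function. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate, fa_rate):
    """Return (d', c) for hit and false-alarm rates strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical unbiased observer: the miss rate (0.2) equals the false-alarm rate.
print(dprime_and_criterion(0.8, 0.2))
# Hypothetical insensitive observer: hits and false alarms are equally likely.
print(dprime_and_criterion(0.6, 0.6))
```

Rates of exactly 0 or 1 have no finite z-score, so in practice they are usually adjusted slightly before the transformation is applied.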
4.1.4. Stochastic processes and time series analysis
The behavior of a biological neural system is stochastic in nature and time dependent. It can thus
be assumed that data in the form of time series are produced by a stochastic process. By analyzing the
time series we can test different classes of stochastic models.
A strongly stationary process is a process whose joint probability distribution does not change
over time; thus its mean and variance do not change either. Analysis of such a process deals with
linear time-invariant (LTI) operations, which can be performed by finite impulse response (FIR) and
infinite impulse response (IIR) filters. These are analogous to the MA and AR processes of the
Box-Jenkins methodology of time series analysis.
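The correspondence between filters and Box-Jenkins processes can be sketched as follows. This is a minimal illustration rather than part of any cited method, and the function names and coefficient values are hypothetical. Driving the FIR filter with white noise would yield an MA process and driving the IIR filter would yield an AR process; an impulse is used here so that the responses are easy to inspect.

```python
def fir_filter(x, b):
    """Moving-average (FIR) filter: y[n] = sum_k b[k] * x[n - k]."""
    return [
        sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        for n in range(len(x))
    ]


def iir_filter(x, a):
    """Autoregressive (IIR) filter: y[n] = x[n] + sum_k a[k] * y[n - 1 - k]."""
    y = []
    for n in range(len(x)):
        feedback = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(x[n] + feedback)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, b=[0.5, 0.5]))  # response ends after len(b) samples
print(iir_filter(impulse, a=[0.5]))       # response decays but never ends
```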
A Hidden Markov Model (HMM) is a temporal probabilistic model in which the state of the process
is described by a single random variable, either discrete or continuous. The possible values of the
variable are the states of the world. An HMM of a discrete variable with N states and M observation
symbols in the alphabet is
\[ \lambda = (A, B, \pi) \]
where $A = \{a_{ij}\}$ is the set of transition probabilities such that $a_{ij}$ expresses the
probability of transition from state $S_i$ to $S_j$; $B = \{b_j(k)\}$ is the observation probability
distribution in each of the states, where $v_k$ denotes the k-th observation symbol in the alphabet and
$o_t$ the current parameter vector; and $\pi = \{\pi_i\}$ is the initial state distribution.
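To show how such a model is used, the sketch below computes the probability of an observation sequence with the forward algorithm, assuming the usual discrete parameterization: a transition matrix A, an observation matrix B with one row per state, and an initial state distribution pi. The two-state model and all numbers are hypothetical.

```python
def forward_likelihood(A, B, pi, observations):
    """Probability of a sequence of observation indices under a discrete HMM."""
    n_states = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols.
A = [[0.9, 0.1], [0.2, 0.8]]   # A[i][j]: probability of moving from state i to j
B = [[0.7, 0.3], [0.1, 0.9]]   # B[j][k]: probability of symbol k in state j
pi = [0.5, 0.5]                # initial state distribution
print(forward_likelihood(A, B, pi, [0, 0, 1]))
```

The recursion costs on the order of N squared operations per observation, which avoids enumerating all possible state sequences.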
A martingale is a class of non-stationary processes. It is often known as a model of a fair game,
because prior knowledge in such a game cannot help to predict the future outcome. A discrete-time
martingale is a discrete stochastic process $X_1, X_2, \ldots$ that satisfies, for any time n,
\[ E(|X_n|) < \infty, \qquad E(X_{n+1} \mid X_1, \ldots, X_n) = X_n \]
An example of a martingale is the Wiener process, which has a broad range of applications in
econometrics, electrical engineering and physics (Brownian motion). Its most distinguishing property is
that the increments of the state come from a white noise process with zero mean and variance
proportional to the time increment.
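A discretized Wiener process can be simulated by summing independent Gaussian increments. The sketch below assumes a fixed time step dt, so that each increment has zero mean and standard deviation sqrt(dt); the function name, step size and seed are illustrative choices.

```python
import random


def simulate_wiener(n_steps, dt, seed=0):
    """Return a sample path of a discretized Wiener process starting at 0."""
    rng = random.Random(seed)  # seeded so that the path is reproducible
    path = [0.0]
    for _ in range(n_steps):
        # Zero-mean Gaussian increment with variance dt (standard deviation sqrt(dt)).
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path


path = simulate_wiener(n_steps=1000, dt=0.01)
print(len(path), path[-1])
```

Because every increment has zero mean, the expected next value given the whole history equals the current value, which is exactly the martingale property.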
4.1.5. Artificial neural networks
An ANN is a biologically inspired mathematical model. The model can contain many neurons,
which are the fundamental building blocks of the network. As in a biological system, neurons are
connected by synaptic weights, which carry the actual knowledge of the network. Neurons are organized
in layers and each neuron aggregates the outputs of other neurons. The activation of a neuron depends
on its inputs and its activation function:
\[ in_i = \sum_j w_{ij} x_j, \qquad x_i = f_i(in_i) \]
where $in_i$ is the input function of the i-th sigma neuron; j is the index of a neuron that sends its
connection to the i-th neuron; $w_{ij}$ is the weight of that connection; $x_i$ is the activation of
the i-th neuron; and $f_i$ is the activation function of the i-th neuron.
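The computation performed by a single sigma neuron, and by a layer of such neurons, can be sketched as a weighted sum followed by an activation function. The function names, the weights and the choice of tanh are illustrative assumptions.

```python
import math


def neuron_activation(weights, inputs, activation=math.tanh):
    """Weighted sum of the inputs followed by the activation function."""
    net_input = sum(w * x for w, x in zip(weights, inputs))
    return activation(net_input)


def layer_activations(weight_matrix, inputs, activation=math.tanh):
    """Activations of a layer; each row holds the weights of one neuron."""
    return [neuron_activation(row, inputs, activation) for row in weight_matrix]


# Hypothetical layer of two neurons receiving two inputs.
print(layer_activations([[0.5, -0.5], [1.0, 0.0]], [1.0, 1.0]))
```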
An ANN can act as a filter. In connection with time series analysis, forward connections can act
as an FIR filter and recurrent connections as an IIR filter, and with a non-linear activation function
we get a powerful model for stationary ARMA processes that can be trained with standard methods for
neural networks. Such networks are called NAR and Jordan (NARMA) networks. These models can be
further extended with “memory” lateral connections, as in the Elman network.
Besides the multilayer perceptron, a slightly different approach is the RBF network. It replaces
the standard increasing activation function (e.g. tanh) with a radial basis function (e.g. a
multivariate Gaussian), which makes it easier to find clusters in data. Both networks are universal
approximators of continuous non-linear functions.
A Hebbian type of learning is implemented by the Kohonen network, which deals easily with
clustering problems. It has only one layer of neurons with simple activation, but the procedure is
self-adaptive and, at the end, the training patterns are reflected in the weights of the network.
There are many other network architectures, such as the Hopfield network, ART, the Boltzmann
machine, probabilistic neural networks, “fuzzy” networks and their combinations, but it is beyond the
scope of this review to go into further detail.
4.2. Learning process
Finding the parameters of a model is a separate problem, and in the modeling of cognitive systems
the neural plausibility of the learning procedure is also in question. It is important to keep in mind
that the data in my experiments are not “just data”: they reflect a real physiological learning process
in the brain. We can expect that human cognition acts optimally in terms of information processing and
that its decisions are always based on all available evidence. Learning in any system can be driven by
an error signal, in which case we talk about supervised learning, or it can take the form of clustering
or blind signal separation, in which case we talk about unsupervised learning.
The most common method of supervised learning is minimizing the squared error. The extremum
of the error function, found where its derivative is zero, tells exactly how to set the parameters, but
finding it is not always trivial. Probabilistic models do not always fit optimally with least squares
due to non-linearity and complexity. Well-known algorithms for HMMs and Bayesian networks are the
forward-backward and Viterbi algorithms. Optimal parameter estimation for probabilistic models,
especially those based on normal distributions, can be done by the maximum-likelihood method, which
finds the parameter values under which the given observations are most probable. Technically it
coincides with the maximum a posteriori Bayesian estimate under a uniform prior distribution.
Sometimes more sophisticated approximate algorithms are needed, for instance Monte Carlo methods.
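For a normal distribution the maximum-likelihood estimates have a closed form, which the following sketch computes. The function name and the sample values are hypothetical.

```python
def gaussian_mle(samples):
    """Maximum-likelihood estimates of the mean and variance of a normal distribution."""
    n = len(samples)
    mean = sum(samples) / n
    # The ML estimate of the variance divides by n, not by n - 1.
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


# Hypothetical response distances (in meters) to a single target.
print(gaussian_mle([1.0, 2.0, 3.0]))
```

Dividing by n makes the variance estimate slightly biased for small samples, which is the price of maximizing the likelihood exactly.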
The support vector machine (SVM) is a concept and a set of supervised learning methods. The idea
is to find the training examples that contribute most to the classification boundary (the support
vectors). In contrast to ANNs, its training does not suffer from local minima, it is less prone to
over-fitting, and it regulates complexity through the choice of support vectors.
There are many clustering and classification methods: k-means, Expectation-Maximization,
ISODATA and k-nearest neighbor. A different approach is feature extraction for dimensionality
reduction: slow feature analysis, principal component analysis and independent component analysis.
Among neural networks, Kohonen and ART networks also use unsupervised learning for cluster creation.
5. Preliminary and planned experiments
5.1. Learning of reverberation cues for auditory distance perception
Listeners must calibrate to the room acoustics in order to judge source distance using
reverberation. A learning process underlies this calibration, resulting in improved performance when
auditory distance is examined repeatedly in the same room over the course of days. The processes of
calibration and learning are spontaneous, not requiring any feedback about the actual target location.
The current study examined whether the amount of spontaneous learning in rooms is dependent on the
relative strength with which the reverberation cue is used for the distance judgments. Listeners judged
distance of broadband noise bursts presented from distances ranging from 0.15 to 2 m directly ahead
of the listener in a small rectangular classroom. The stimulus presentation level was either roved from
trial to trial (R runs) or it was fixed within an experimental run (F runs). The subjects performed
several experimental sessions over multiple days. One subject group was trained on the F runs, one on
the R runs. Learning was observed in the R group but not in the F group, confirming that focusing on
the reverberation cue is required for the learning and room calibration to occur.
(Kopco N, Silvera P, Tskhay K, Tomoriova P, 2011)
5.2. Short-term adaptation of auditory distance in a reverberant room
In regular rooms, sounds are received at the ears along with their reflections. The amount of
reflections provides a cue for judging auditory distance, usually characterized by measuring the direct-
to-reverberant energy ratio, D/R. Since the amount of reflections varies from room to room, listeners
must adapt to the D/R cue whenever they enter a new room. Previous experiments showed that when
the reverberant environment simulated in a virtual space is inconsistent, listeners tend to ignore the
D/R cue [Kopčo et al. (2004) Learning to Judge Distance of Nearby Sounds in Reverberant and
Anechoic Environments. In: Proc. Joint congress CFA/DAGA '04]. Here, an experiment was
performed in which subjects judged distance of a broadband noise stimulus presented from speakers
placed directly in front of the listener at distances from 0.15 to 2 m in a small rectangular classroom. Two
stimulus conditions were used: either sound was presented at a fixed presentation level (F runs), or
sounds were presented at a level that was equalized at the listener's ears and roved by 12 dB (R runs).
Each subject participated in a session consisting of 4 F runs interleaved with 4 R runs. Subjects were
divided into two groups, differing only by the order of conditions (FRFRFRFR or RFRFRFRF).
Listeners were instructed to ignore the overall level cue in the R runs, and only listeners who followed
this instruction were included in the analysis. Results showed that the order in which the listeners were
exposed to the two conditions had a strong effect. Listeners who started in the F condition had
constant correlations between actual distance and response distance in both conditions. On the other
hand, listeners who started in the R condition immediately improved their F-run correlations and
gradually improved their R-run correlations during the whole one-hour-long experiment. The results
suggest that the process of adaptation to room reverberation is dramatically influenced by the
characteristics of the initial exposure to sounds in a given room, resulting in differences in the
listener's ability to correctly interpret and optimally use the overall level and reverberation cues.
(Hládek, Tomoriová, Kopčo, & Seitz, 2011)
5.3. Ventriloquism aftereffect in distance
This is a planned experiment on visual recalibration of auditory space in distance. The aim of the
study is to determine whether there is a ventriloquism effect and aftereffect in distance. It will be
conducted in a similar room and with the same apparatus as the previous studies. A procedure similar
to that of Kopco et al. (2009), with interleaved AV and A-only trials, will be adopted, and the pre-
and post-training phases will contain “aligning” runs to make sure that every subject starts and ends
with the same percept. It will be a two-session experiment with a within-subject design and two
counterbalanced conditions. In the first condition the induced bias will be positive and in the second
negative, in both cases 30% of the presented distance. We expect that people will fully adapt with a
certain delay and that there will be an aftereffect that fades out on a scale of several minutes.
5.4. EEG study on “room learning”
This study will test the specificity of “room learning”. Subjects will perform a distance
perception task in different acoustical environments. We hypothesize that knowledge from one room is
not transferred to another room but that there should be improvement when subjects are tested again
in the same room. The second hypothesis is that an inconsistent acoustical environment should prevent
subjects from learning anything specific and can lead to a decrease in subsequent performance, because
subjects will learn to ignore reverberant cues (Kopco & Shinn-Cunningham, 2004). EEG recordings will
be performed at the beginning and at the end of every training phase to evaluate the effect of
learning at the neural level.
6. Dissertation project
A model of auditory distance perception in localization experiments will be proposed. The aim of
the field of computational neuroscience is to develop mathematical tools suitable for describing
perceptual and cognitive processes at the behavioral and neural levels. In my research I focus on two
perceptual phenomena, which will be the subject of mathematical modeling:
1. Plasticity of auditory distance perception in reverberant room
2. Cross-modal interactions in auditory distance perception in reverberant room
6.1. Plasticity of auditory distance perception in reverberant room
Auditory cues are the basis of spatial hearing. In auditory distance perception, reverberation and
intensity are the two primary cues, but their relative contribution to the localization of sounds is
unknown. They also differ in nature because one provides absolute and the other relative information
about the position of the sound. In my project I will focus on the role of these cues through analysis
of behavioral performance and acoustical analysis in the proposed experiments.
The usual sign of plasticity in biological neural systems is a change in the physical properties of
neurons or their connections. However, this is not sufficient as proof if it lacks behavioral
correlates. A different approach, common among computational neuroscientists, is the “top-down”
approach: a controlled change in behavior has to come from a change at the neural level. There are
three stages of processing of auditory distance cues that could be responsible for plasticity:
1. Representation of auditory space.
2. Cue reweighting mechanism.
3. Context specific strategy.
My mathematical model will account for these three stages and will help to determine which of
them is likely to contribute most to the recalibration of auditory distance in a reverberant room.
A brain imaging technique (EEG) will be used to test the model at the neural level. An experiment on
perceptual learning will test the specificity of learning between different listening conditions in
different rooms, and we expect that adaptation will lead to habituation processes once a subject
'learns the room'.
6.2. Cross-modal interactions in auditory distance perception in reverberant room
Cross-modal interactions mean that different modalities, specifically auditory and visual,
contribute to perception. The neural system uses two distinct structures for the auditory and visual
modalities. Interactions of the two modalities in spatial perception are commonly studied using the
ventriloquism effect. There are many studies in horizontal localization (Alais & Burr, 2004; Frissen,
Vroomen, & de Gelder, 2012; Kopco et al., 2009; Recanzone, 1998; Wozny & Shams, 2011) but fewer in
distance, and these either concentrate on visual capture (Calcagno et al., 2012; Gardner, 1968;
Mershon et al., 1980, 1981; Zahorik, 2001) or look at the immediate effect of an induced bias in
horizontal and distance localization (Agganis, Muday, & Schirillo, 2010; Bowen et al., 2011). Even
though these studies look at multimodal interactions, they did not focus on the process of
recalibration of auditory distance space by the other modality. Therefore I plan a study on the
ventriloquism aftereffect in distance, which could answer whether visual calibration takes place and
how, and on what time scale, the induced bias affects post-training performance. This can reveal the
representation of auditory space in distance.
6.3. Mathematical modeling
In the acoustical analysis, the effect of spectral content on perception will be examined. Signal
processing methods and models of the auditory periphery will be used to examine activation in the very
first stage of auditory processing.
Data from the localization experiments will be modeled. Standard psychophysical approaches
measure the threshold of detectability, but with the current design of the experiments it would be
difficult to measure detectability due to methodological constraints of the response method, the
procedure and serial correlation.
There are three possible modeling approaches:
1. Transformation into a classification problem. This would assume that subjects create and store
a set of classes which are compared with the presented stimuli, for instance if subjects use near,
mid and far categories for distance judgments.
2. Analogy with a filtering problem. Current evidence is evaluated as a probability distribution or
fuzzy set and passed through a Bayesian network, artificial neural network or fuzzy controller,
which produces the result as a distribution function or as the activation of an output neuron.
Polynomial models could also model the filtering problem, even in the time domain.
3. Hypothesis testing. A rule-based system could evaluate noisy evidence, and logical inference
would produce the result as a probability distribution or fuzzy set on the decision axis.
The model must not only represent the localization data but also explain the adaptation
paradigm. It could be either incremental or iterative. In incremental models, the model is updated
after each single input is processed; in iterative models, the model is updated only after a certain
amount of time. If analysis of the data shows that a single presentation affects global performance,
an incremental model can be assumed; otherwise an iterative model will be more appropriate.
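The difference between the two update schemes can be sketched with a single model parameter that is pulled towards the observed values. This is only an illustration under stated assumptions: the update rule (a simple delta rule), the function names, the learning rate and the block size are hypothetical and are not the model proposed for the dissertation.

```python
def incremental_update(w, observations, rate):
    """Update the parameter after every single observation."""
    trace = []
    for x in observations:
        w += rate * (x - w)
        trace.append(w)
    return trace


def iterative_update(w, observations, rate, block_size):
    """Update the parameter only once per block of observations."""
    trace = []
    for start in range(0, len(observations), block_size):
        block = observations[start:start + block_size]
        trace.extend([w] * (len(block) - 1))        # unchanged within the block
        w += rate * (sum(block) / len(block) - w)   # one update at the end of the block
        trace.append(w)
    return trace


observations = [1.0, 1.0, 1.0, 1.0]
print(incremental_update(0.0, observations, rate=0.5))
print(iterative_update(0.0, observations, rate=0.5, block_size=2))
```

In the incremental trace every trial changes the parameter, whereas in the iterative trace the parameter changes only at block boundaries. A trial-by-trial analysis of the responses could in principle distinguish the two patterns.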
Electrophysiological data from the EEG experiment will be used to study the effect of learning at
the neural level. Statistical methods, methods of spectral analysis of the EEG signal and
classification will help to analyze the results of the proposed experiment.
References
Agganis, B. T., Muday, J. A., & Schirillo, J. A. (2010). Visual biasing of auditory localization in
azimuth and depth. Perceptual and Motor Skills, 111(3), 872-892.
doi:10.2466/22.24.27.PMS.111.6.872-892
Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning.
Trends in cognitive sciences, 8(10), 457-64. doi:10.1016/j.tics.2004.08.011
Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.
Curr Biol, 14(3), 257-62. Retrieved from
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=14761661
Andéol, G., Guillaume, A., Micheyl, C., Savel, S., Pellieux, L., & Moulin, A. (2011). Auditory
efferents facilitate sound localization in noise in humans. The Journal of neuroscience : the
official journal of the Society for Neuroscience, 31(18), 6759-63.
doi:10.1523/JNEUROSCI.0248-11.2011
Ashmead, D. H., LeRoy, D., & Odom, R. D. (1990). Perception of the relative distances of nearby
sound sources. Perception & psychophysics, 47(4), 326-31. Retrieved from
http://www.ncbi.nlm.nih.gov/pubmed/2345684
Blauert, J. (1997). Spatial Hearing (2nd ed.). Cambridge, MA: MIT Press.
Bowen, A. L., Ramachandran, R., Muday, J. A., & Schirillo, J. A. (2011). Visual signals bias auditory
targets in azimuth and depth. Experimental Brain Research, 214(3), 403-14.
doi:10.1007/s00221-011-2838-1
Braasch, J., & Hartung, K. (2002). Localization in the presence of a distracter and reverberation in the
frontal horizontal plane. I. Psychoacoustical data. Acta acustica united with Acustica, 88(6), 942–
955.
Brainard, M. S., & Knudsen, E. I. (1993). Experience-dependent plasticity in the inferior colliculus: A
site for visual calibration in the neural representation of auditory space in the barn owl. Journal
of Neuroscience, 13(11), 4589-4608.
Bronkhorst, A. W., & Houtgast, T. (1999). Auditory distance perception in rooms. Nature, 397(11
February), 517-520.
Brungart, D. S., & Durlach, N. I. (1999). Auditory localization of nearby sources II: Localization of a
broadband source in the near field. Journal of the Acoustical Society of America, 106(4), 1956-
1968.
Brungart, D. S., & Rabinowitz, W. M. (1999). Auditory localization of nearby sources I: Head-related
transfer functions. Journal of the Acoustical Society of America, 106(3), 1465-1479.
Calcagno, E. R., Abregú, E. L., Eguía, M. C., & Vergara, R. (2012). The role of vision in auditory
distance perception. Perception, 41(2), 175-192. doi:10.1068/p7153
Carlile, S., Hyams, S., & Delaney, S. (2001). Systematic distortions of auditory space perception
following prolonged exposure to broadband noise. Journal of the Acoustical Society of America,
110(1), 416-424.
Coleman, P. D. (1962). Failure to localize the source distance of an unfamiliar sound. Journal of the
Acoustical Society of America, 34(3), 345-346.
Coleman, P. D. (1968). Dual role of frequency spectrum in determination of auditory distance. Journal
of the Acoustical Society of America, 44(2), 631-632.
Durlach, N. I. (1963). Equalization and Cancellation Theory of Binaural Masking-Level Differences.
The Journal of the Acoustical Society of America, 35(8), 1206. doi:10.1121/1.1918675
Ferry, R. T., & Meddis, R. (2007). A computer model of medial efferent suppression in the
mammalian auditory system. The Journal of the Acoustical Society of America, 122(6), 3519-26.
doi:10.1121/1.2799914
Frissen, I., Vroomen, J., & de Gelder, B. (2012). The aftereffects of ventriloquism: the time course of
the visual recalibration of auditory localization. Seeing and perceiving, 25(1), 1-14.
doi:10.1163/187847611X620883
Gardner, M. B. (1968). Proximity image effect in sound localization. Journal of the Acoustical Society
of America, 43(6), 163.
Hartmann, W. M. (1989). Localization of sound in rooms IV: The Franssen effect. The Journal of the
Acoustical Society of America, 86(4), 1366. doi:10.1121/1.398696
Hládek, Ľ., Tomoriová, B., Kopčo, N., & Seitz, A. (2011). Short-term adaptation of auditory distance
perception in a reverberant room. Making sense of sound, Plymouth, UK.
Hofman, P. M., Van Riswick, J. G. A., & Van Opstal, A. J. (1998). Relearning sound localization with
new ears. Nature Neuroscience, 1(5), 417-421.
Kacelnik, O., Nodal, F. R., Parsons, C. H., & King, A. J. (2006). Training-induced plasticity of
auditory localization in adult mammals. PLoS biology, 4(4), e71.
doi:10.1371/journal.pbio.0040071
Karmarkar, U. R., & Buonomano, D. V. (2003). Temporal specificity of perceptual learning in an
auditory discrimination task. Learning & memory (Cold Spring Harbor, N.Y.), 10(2), 141-7.
doi:10.1101/lm.55503
Kedem, B., & Fokianos, K. (2002). Regression Models for Time Series Analysis. Hoboken, NJ, USA:
John Wiley & Sons, Inc. doi:10.1002/0471266981
Kopco, N., Silvera, P., Tskhay, K., Tomoriova, P., & A. S. (2011). Learning of reverberation cues for
auditory distance perception. J Acoust Soc Am. Seattle, WA.
Kopco, N., & Shinn-Cunningham, B. G. (2002). Auditory localization in rooms: Acoustic analysis and
behavior (pp. 109-112). Zvolen, Slovakia.
Kopco, N., & Shinn-Cunningham, B. G. (2004). Effects of Spectral Content on Distance Perception in
Reverberant Space. Daytona Beach, Florida.
Kopco, N., & Shinn-Cunningham, B. G. (2010). Effects of stimulus spectrum on distance perception.
Journal of the Acoustical Society of America, conditionally accepted for publication.
Kopco, N., Lin, I.-F., Shinn-Cunningham, B. G., & Groh, J. M. (2009). Reference frame of the
ventriloquism aftereffect. The Journal of neuroscience : the official journal of the Society for
Neuroscience, 29(44), 13809-14. doi:10.1523/JNEUROSCI.2783-09.2009
Kopčo, N., Schoolmaster, M., & Shinn-Cunningham, B. G. (2004). Learning to judge distance of
nearby sounds in reverberant and anechoic environments. Strasbourg, France: Citeseer.
Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.2952&rep=rep1&type=pdf
Larsen, E., Iyer, N., Lansing, C. R., & Feng, A. S. (2008). On the minimum audible difference in
direct-to-reverberant energy ratio. Journal of the Acoustical Society of America, 124(1), 450-461.
Litovsky, R. Y., Colburn, H. S., Yost, W. A., & Guzman, S. J. (1999). The precedence effect. Journal
of the Acoustical Society of America, 106(4), 1633-1654.
Little, A. D., Mershon, D. H., & Cox, P. H. (1992). Spectral content as a cue to perceived auditory
distance. Perception, 21, 405-416. Retrieved from
http://www.perceptionjournal.com/perception/fulltext/p21/p210405.pdf
Lu, Y., & Cooke, M. (2010). Binaural Estimation of Sound Source Distance via the Direct-to-
Reverberant Energy Ratio for Static and Moving Sources. IEEE Transactions on Audio, Speech,
and Language Processing, 18(7), 1793-1805. doi:10.1109/TASL.2010.2050687
Macmillan, N. (2005). Detection theory: A user's guide. Retrieved from
http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Detection+theory:+A+users+guide#0
Mershon, D. H., Ballenger, W. L., Little, A. D., McMurtry, P. L., & Buchanan, J. L. (1989). Effects of
room reflectance and background noise on perceived auditory distance. Perception, 18(3), 403–
416. Pion. Retrieved from http://www.perceptionweb.com/perception/fulltext/p18/p180403.pdf
Mershon, D. H., & Bowers, J. N. (1979). Absolute and relative cues for auditory perception of
egocentric distance. Perception, 8, 311-322.
Mershon, D. H., Desaulniers, D. H., & Amerson, J. (1980). Visual capture in auditory distance
perception: Proximity image effect reconsidered. Journal of Auditory Research, 20, 129-136.
Mershon, D. H., Desaulniers, D. H., Kiefer, S. A., Amerson, J., & Mills, J. T. (1981). Perceived
loudness and visually-determined auditory distance. Perception, 10, 531-543.
Mershon, D. H., & King, L. E. (1975). Intensity and reverberation as factors in auditory perception
of egocentric distance. Perception & psychophysics, 18(6), 409-415.
Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual review of
psychology, 42(1), 135-59. doi:10.1146/annurev.ps.42.020191.001031
Miller, G. A. (1947). Sensitivity to changes in the intensity of white noise and its relation to masking
and loudness. Journal of the Acoustical Society of America, 19, 609-619.
Recanzone, G. H. (1998). Rapidly induced auditory plasticity: the ventriloquism aftereffect.
Proceedings of the National Academy of Sciences of the United States of America, 95(3), 869-75.
Retrieved from
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=33810&tool=pmcentrez&rendertype=abstract
Russell, S., & Norvig, P. (2005). Artificial Intelligence: A Modern Approach (p. 1080). New Delhi:
Prentice-Hall.
Schoolmaster, M., Kopco, N., & Shinn-Cunningham, B. G. (2003). Effects of reverberation and
experience on distance perception in simulated environments. Journal of the Acoustical Society
of America, 113, 2285.
Schoolmaster, M., Kopčo, N., & Shinn-Cunningham, B. G. (2004). Auditory Distance Perception in
Fixed and Varying Simulated Acoustic Environments. J Acoust Soc Am (pp. 2459-2459). New
York. Retrieved from http://asadl.org/jasa/resource/1/jasman/v115/i5/p2459_s5?bypassSSO=1
Shinn-Cunningham, B. (2000). Adapting to remapped auditory localization cues: a decision-theory
model. Perception & psychophysics, 62(1), 33-47. Retrieved from
http://www.ncbi.nlm.nih.gov/pubmed/10703254
Shinn-Cunningham, B. G. (2000). Learning reverberation: Implications for spatial auditory displays
(pp. 126-134). Atlanta, GA.
Shinn-Cunningham, B. G. (2001). Localizing sound in rooms. Snowbird, Utah.
Shinn-Cunningham, B. (2000). Learning Reverberation: Considerations for Spatial Auditory Displays.
Society, (April), 2-5.
Simpson, W., & Stanton, L. (1973). Head movement does not facilitate perception of the distance of a
sound source. American Journal of Psychology, 86, 151-159.
Strybel, T. Z., & Perrott, D. R. (1984). Discrimination of relative distance in the auditory modality:
The success and failure of the loudness discrimination hypothesis. Journal of the Acoustical Society of
America, 76, 318-320.
Tomoriova, B., Andoga, R., & Kopco, N. (2007). Contextual Shifts in Sound Localization Induced by
an a priori-known Distractor. Society, 6-6.
Warren, R. M. (1968). Vocal compensation for change in distance. Proceedings of the 6th
International Congress of Acoustics (pp. 61-64). Tokyo.
Weinberger, N. M. (2007). Associative representational plasticity in the auditory cortex: A synthesis
of two disciplines. Learning & Memory, 14, 1-16. doi:10.1101/lm.421807
Wozny, D. R., & Shams, L. (2011). Recalibration of auditory space following milliseconds of cross-
modal discrepancy. The Journal of neuroscience : the official journal of the Society for
Neuroscience, 31(12), 4607-12. doi:10.1523/JNEUROSCI.6079-10.2011
Zahorik, P. (1996). Auditory Distance Perception: A Literature Review.
Zahorik, P. (2001). Estimating sound source distance with and without vision. Optometry and Vision
Science, 78(5), 270-275.
Zahorik, P. (2002a). Direct-to-reverberant energy ratio sensitivity. The Journal of the Acoustical
Society of America, 112(5 Pt 1), 2110-7. Retrieved from
http://www.ncbi.nlm.nih.gov/pubmed/12430822
Zahorik, P. (2002b). Assessing auditory distance perception using virtual acoustics. Journal of the
Acoustical Society of America, 111(4), 1832-1846.
Zahorik, P., Brungart, D. S., & Bronkhorst, A. W. (2005). Auditory distance perception in humans: A
summary of past and present research. Acta acustica united with Acustica, 91(3),
409-420.
Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source distance.
Nature neuroscience, 4(1), 78-83. doi:10.1038/82931