Page 1: PAVOL JOZEF ŠAFÁRIK UNIVERSITY, KOŠICE, SLOVAKIA FACULTY ...ics.upjs.sk/~hladek/pubs/minimovka_lubos.pdf · pavol jozef ŠafÁrik university, koŠice, slovakia faculty of science

PAVOL JOZEF ŠAFÁRIK UNIVERSITY, KOŠICE, SLOVAKIA

FACULTY OF SCIENCE

PLASTICITY AND CROSS-MODAL INTERACTIONS IN AUDITORY

DISTANCE PERCEPTION

2012 Ing. Ľuboš HLÁDEK


PAVOL JOZEF ŠAFÁRIK UNIVERSITY, KOŠICE, SLOVAKIA

FACULTY OF SCIENCE

PLASTICITY AND CROSS-MODAL INTERACTIONS IN

AUDITORY DISTANCE PERCEPTION

DISSERTATION PROSPECTUS

Program: Informatics

Institute: Institute of Computer Science

Advisor: Doc. RNDr. Gabriela Andrejková CSc.

Consultant: Doc. Ing. Norbert Kopčo PhD.

Reviewer: Aaron Seitz PhD.

Košice 2012 Ing. Ľuboš HLÁDEK


Abstract:

Auditory distance perception is influenced by the acoustical environment. Our senses must recalibrate each time we enter a new acoustical scene. This recalibration takes place over time intervals ranging from seconds up to days or weeks, which suggests the presence of different learning processes that remain unknown. The starting point of my project is research in the fields of auditory distance perception, plasticity of the auditory neural system and methods of artificial intelligence. Two preliminary psychophysical experiments have been conducted and two more are planned; together they will form the basis of my dissertation. However, the main goal is to propose mathematical methods that can account for the adaptation of auditory distance perception observed in acoustical, behavioral and electrophysiological data.


Table of Contents

1. Introduction
2. Distance perception
2.1.1. Intensity and loudness
2.1.2. Reverberation
2.1.3. Frequency
2.1.4. Binaural cues and acoustic parallax
2.1.5. Vision
2.1.6. Familiarity
2.1.7. Neural mechanisms
3. Plasticity and cross-modal interactions in auditory distance perception
3.1. Introduction
3.2. Plasticity of auditory localization
3.3. Room learning
3.4. Model of plasticity of auditory spatial adaptation
4. Learning and decision in uncertainty
4.1. Representation and inference
4.1.1. Bayesian networks
4.1.2. Fuzzy sets and fuzzy inference
4.1.3. Detection theory
4.1.4. Stochastic processes and time series analysis
4.1.5. Artificial neural networks
4.2. Learning process
5. Preliminary and planned experiments
5.1. Learning of reverberation cues for auditory distance perception
5.2. Short-term adaptation of auditory distance in a reverberant room
5.3. Ventriloquism aftereffect in distance
5.4. EEG study on “room learning”
6. Dissertation project
6.1. Plasticity of auditory distance perception in reverberant room
6.2. Cross-modal interactions in auditory distance perception in reverberant room
6.3. Mathematical modeling
References


1. Introduction

To understand auditory spatial perception in a broader context than the mechanical extraction of acoustical cues, research must focus on the plasticity of the auditory neural system. Subcortical and early cortical mechanisms have classically been understood as a “hardwired processor”, but in recent years this perspective has been shifting towards the view that the auditory pathway is a site of vital plasticity, which is crucial for perceptual learning and plays a significant role as a subserving mechanism for classical conditioning and reinforcement learning. This view is supported by spatial hearing studies in which strong and rapid recalibration of auditory space is observed under many conditions. Both temporary and permanent reorganization of auditory space have been observed and are well described in animal studies, e.g., in barn owls, ferrets and cats.

In human studies, several perceptual paradigms related to spatial hearing provide strong evidence of plasticity. The precedence effect (Litovsky, Colburn, Yost, & Guzman, 1999), observed for onset time differences of about 1-50 ms, reveals temporal constraints of the auditory system that affect the perception of echoes, and it points to the importance of the leading sound. This is also underlined by the Franssen effect (Hartmann, 1989), in which the lateral percept follows the leading click, with a certain delay, when the order of the leading and lagging clicks is reversed. The effect points to a process that builds up an expectation which persists for some time after the leading click starts coming from the contralateral ear. When sounds are close in time (within about 50-400 ms), the perceived position of a sound presented from varying locations is repelled or dragged by the presence of another sound that always comes from the same position. In experiments on contextual plasticity in horizontal localization in our laboratory, we observed the build-up and decay of this effect between pre-training and post-training measurements on a time scale of 5-10 minutes, which points to very rapid recalibration of spatial perception (Tomoriova, Andoga, & Kopco, 2007).

When we enter a new acoustical scene, for instance when we move from an acoustically damped to an acoustically live environment, or from a corner to the center of a room, the reverberant profile changes dramatically. Although horizontal and vertical localization are only slightly affected by reverberation, distance perception is subject to strong recalibration. This recalibration operates on different time scales, and different perceptual cues might contribute to and drive it.

Acoustical factors that contribute to distance perception include the direct-to-reverberant energy ratio (D2R), frequency fluctuations and binaural cues. These can provide absolute cues, whereas loudness or central frequency can provide only relative cues.

There are also non-acoustical factors that modulate distance perception. Expectations about a sound can be built from the vocal effort of the speaker: whispering is perceived as closer than shouting. Visual stimuli can also shift the perceived position of a sound away from its real place of origin. The interaction of vision and spatial sound perception has been studied extensively as the ventriloquism effect and aftereffect, but in distance perception there is little work that provides evidence of visually induced recalibration. Some studies found visual capture for the first presentation or compared blindfolded and visual (non-blindfolded) conditions, and others found a visually induced bias during interleaved audio-visual presentations from different locations; however, none of these studies evaluated the effect as a learning process by comparing pre-training and post-training performance.

Perception is a dynamic process that evolves over time, and our senses must recalibrate very often. Information that has been processed must afterwards be stored. The roles of consolidation and attention could also be critical for the learning process. In the visual domain, attention is usually connected with saccadic eye movements, but in the auditory domain it is difficult to evaluate which acoustic features a subject is attending to.


2. Distance perception

In the auditory domain, the perception of distance has not received as much attention as horizontal or vertical localization. People are generally worse at judging distance than at judging the angular direction of a sound's origin (Zahorik, Brungart, & Bronkhorst, 2005), and the underlying mechanisms are not described or understood in as much detail. Natural sound sources usually occur outside the listener's head, so “outsideness” is an important dimension of the percept. It can also be argued that distance perception is ecologically important: it is often useful to judge how far away a sound is, or whether it is approaching or receding, in order to avoid danger, locate a partner, or adapt one's senses for communication and interaction in noisy or reverberant environments.

It is difficult to generalize subjects' performance in distance localization tasks. Nevertheless, Zahorik et al. (2005) proposed the compressive power function

$$r' = k\,r^{a}$$

where r' is the perceived distance, r is the presented distance, and a and k are parameters, with fitted values usually around a ≈ 0.15-0.7 and k slightly larger than 1. Although these numbers may be somewhat misleading given the large number of conditions and studies, the function provides a good approximation and a framework for comparison. Near sources are usually overestimated and far sources underestimated, which is explained as a “horizon effect” or as a “margin of safety” for danger avoidance.
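The power function r' = k r^a is linear in log-log coordinates (log r' = log k + a log r), so its parameters can be estimated by ordinary least squares on log-transformed data. The sketch below is a minimal illustration under stated assumptions: the responses are hypothetical and noise-free, generated with k = 1.2 and a = 0.5 (values chosen only to fall within the ranges quoted above), and the function names are hypothetical. It also computes the distance at which r' = r; when a < 1, nearer sources are overestimated and farther ones underestimated, which is the pattern described above.

```python
import math


def fit_power_function(presented, perceived):
    """Fit r' = k * r**a by least squares on log r' = log k + a * log r."""
    xs = [math.log(r) for r in presented]
    ys = [math.log(rp) for rp in perceived]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope in log-log space is the exponent a
    k = math.exp(mean_y - a * mean_x)  # intercept in log-log space is log k
    return k, a


def crossover_distance(k, a):
    """Distance where r' == r; for a < 1, nearer sources are overestimated, farther ones underestimated."""
    return k ** (1.0 / (1.0 - a))


# Hypothetical noise-free responses generated with k = 1.2 and a = 0.5.
presented = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
perceived = [1.2 * r ** 0.5 for r in presented]

k_hat, a_hat = fit_power_function(presented, perceived)
print(round(k_hat, 3), round(a_hat, 3))            # 1.2 0.5
print(round(crossover_distance(k_hat, a_hat), 2))  # 1.44
```

With real data, the residuals of the log-log fit would also give a rough picture of response variability across distance.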

Precision, or localization “blur”, can be as high as 20%-60% according to Zahorik's review of a broad range of studies, which suggests that variability increases with distance; however, this relationship need not be linear or monotonic (N. Kopco & Shinn-Cunningham, 2010).

Recent studies reach no clear consensus on which cues contribute to the perception of distance and to what extent, but the ongoing debate points to both acoustic and non-acoustic cues. Briefly, it follows from the laws of physics that in a free field sound intensity is inversely proportional to the square of distance. In reverberant environments, reflections come into play, and they also vary systematically with increasing distance. Higher frequencies are attenuated more than lower frequencies along the propagation path, and in the near field acoustic parallax creates a set of binaural ITD and ILD cues that could also be used for distance perception. One could also consider the Doppler effect or other motion cues, but behavioral data do not seem to support this type of cue. There are also a number of non-acoustical cues: as in directional hearing, vision could potentially be involved, and familiarity with sounds seems to be an important factor.

2.1.1. Intensity and loudness

The inverse square law predicts that the intensity of a sound is inversely proportional to the square of the distance. Since intensity is proportional to the square of sound pressure, the sound pressure is inversely proportional to distance. As noted by Zahorik (1996), three facts are important. First, the law holds only in anechoic space and is severely degraded in reverberant spaces. Second, it holds only for point sources of sound. Third, it does not hold in the near field, where sound waves start to interact with the body (distances smaller than the wavelength). The law can also be stated as

$$\Delta L = 20\log_{10}\left(\frac{R}{R_0}\right)\ \mathrm{[dB]}$$

where ΔL is the decrease in sound level, R0 is the reference distance and R is the distance of interest. Hence, doubling the distance results in a 6.02 dB decrease in intensity level.
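A quick numerical check of the inverse square law in level form, ΔL = 20 log10(R/R0) dB, as a sketch that assumes free-field propagation from a point source (the only conditions under which the law holds). The function names are hypothetical.

```python
import math


def level_drop_db(r, r0):
    """Decrease in sound level (dB) from reference distance r0 to distance r under the inverse square law."""
    return 20.0 * math.log10(r / r0)


def pressure_ratio(r, r0):
    """Sound pressure falls as 1/R, so intensity (proportional to pressure squared) falls as 1/R**2."""
    return r0 / r


print(round(level_drop_db(2.0, 1.0), 2))   # 6.02: decrease per doubling of distance
print(round(level_drop_db(10.0, 1.0), 2))  # 20.0: decrease per tenfold increase
print(pressure_ratio(2.0, 1.0) ** 2)       # 0.25: intensity ratio after doubling the distance
```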

Throughout the previous century there were many attempts to capture the relationship between loudness and distance. Two aspects must be distinguished: discrimination and apparent position. Discrimination is usually expressed as a Weber ratio, i.e., the minimal detectable change in percent of the reference distance, while apparent distance is usually expressed as the increase in intensity (in dB) that halves the perceived distance.

Very early studies of discrimination thresholds reported detectable changes of 20-25% (Zahorik, 1996). More recent studies (Simpson & Stanton, 1973; Strybel & Perrott, 1984) using the method of limits found near-field thresholds of approximately 19% and 33% at 0.49 m and 0.61 m, respectively, decreasing roughly inversely with increasing distance and reaching 3-4% at distances of 6-49 m (Strybel & Perrott, 1984). These results are not consistent with the intensity discrimination performance reported by Miller (1947), which was about 0.5-1 dB for wideband noise at intensities of 20-100 dB. Discrimination thresholds for pure tones are even lower and do not exactly follow Weber's law. If the pressure-discrimination hypothesis is correct, people should be able to detect changes as small as 5%-10% of the reference distance, which corresponds to the just-detectable change in intensity. This was shown in anechoic conditions with a two-alternative forced-choice (2AFC) 2-up 1-down procedure averaging 6-20 reversals (Ashmead, LeRoy, & Odom, 1990), a result that favors the view of Warren, who obtained similar results by measuring the increase in vocal output needed to compensate for changes in distance (Warren, 1968).
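Under the pressure-discrimination hypothesis, an intensity threshold of ΔL dB maps, via the inverse square law, onto a just-detectable relative increase in distance of 10^(ΔL/20) - 1. The sketch below assumes free-field conditions and a fixed source level, and converts the 0.5-1 dB range reported by Miller (1947) into distance Weber fractions of roughly 6-12%, the same order as the 5%-10% figure quoted above. The function name is hypothetical.

```python
def distance_weber_fraction(jnd_db):
    """Relative distance increase (percent) that lowers the level by jnd_db under the inverse square law."""
    return (10.0 ** (jnd_db / 20.0) - 1.0) * 100.0


for jnd in (0.5, 1.0):
    print(jnd, "dB ->", round(distance_weber_fraction(jnd), 1), "%")
# 0.5 dB -> 5.9 %
# 1.0 dB -> 12.2 %
```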

Further insight into the loudness-distance relationship came from the work of Zahorik and Wightman (2001), who observed loudness constancy when the change in intensity was produced by a change in distance. Loudness judgments for a static source follow the sone scale:

$$L = a\,I^{k}$$

where L is the perceived loudness, a is a parameter, I is the intensity and k = 0.3. In contrast to loudness judgments, distance judgments in the same setup varied with the real displacement of the source. This work marks a departure from distance perception models based on loudness and spike counts, which are explainable at the peripheral or subcortical level; thus, more central parts of the neural system must be involved in auditory distance perception. According to Zahorik, loudness constancy is linked to the reflected acoustical energy, which does not drop with distance as much as the ratio of direct to reverberant energy does.
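Because the sone scale is a power law in intensity, the constant a cancels when comparing two sounds, and an intensity ratio maps to a loudness ratio of (intensity ratio)^k. The sketch below assumes L = a I^k with k = 0.3 as stated above and shows two consequences: a 10 dB increase (intensity times 10) roughly doubles loudness, and the free-field drop for a doubled distance (intensity divided by 4) would reduce loudness to about two thirds if no constancy mechanism were at work. The function name is hypothetical.

```python
def loudness_ratio(intensity_ratio, k=0.3):
    """Ratio of perceived loudness for a given intensity ratio under L = a * I**k (a cancels out)."""
    return intensity_ratio ** k


print(round(loudness_ratio(10.0), 2))  # 2.0: +10 dB roughly doubles loudness
print(round(loudness_ratio(0.25), 2))  # 0.66: distance doubled in a free field (intensity / 4)
```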


2.1.2. Reverberation

The acoustic properties of sound are strongly affected by reflections. Not only the pinna, head and torso, but also the floor, ceiling, walls and furniture interact with sound, so the inverse square law does not always hold. Even if the power of the direct portion of the sound decreases according to the inverse square law, reflected sound comes from an uncountable number of different sources, and its behavior depends on the acoustical features of the reflecting surfaces. Nevertheless, reverberation can be approximated as a diffuse sound field whose power is almost constant, decreasing only slightly with increasing distance. Usually, the direct and reverberant portions of the sound overlap in time, so most auditory spatial cues are degraded (N. Kopco & Shinn-Cunningham, 2002), but a new acoustic cue for distance is created: the ratio of direct to reverberant energy (D2R).
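A simplified diffuse-field sketch of why D2R carries distance information: assume the direct energy falls as 1/r² while the reverberant energy is constant (the text notes that it actually decreases slightly, which this sketch ignores). D2R then equals 20 log10(r_c / r) dB, where r_c is the distance at which the direct and reverberant energies are equal. The critical distance of 1 m and the function name are hypothetical. Under these assumptions D2R falls by about 6 dB per doubling of distance and is linear in log distance.

```python
import math


def d2r_db(r, critical_distance):
    """Direct-to-reverberant ratio (dB), assuming direct energy ~ 1/r**2 and constant reverberant energy."""
    direct = (critical_distance / r) ** 2  # direct energy, normalized to 1 at the critical distance
    reverberant = 1.0                      # idealized diffuse field: same energy everywhere in the room
    return 10.0 * math.log10(direct / reverberant)


# Hypothetical room with a critical distance of 1 m.
for r in (0.5, 1.0, 2.0, 4.0):
    print(r, round(d2r_db(r, 1.0), 1))
# 0.5 6.0
# 1.0 0.0
# 2.0 -6.0
# 4.0 -12.0
```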

Early experiments showed that people could judge distance more accurately in reverberant than in anechoic space, even without any prior experience (Mershon & King, 1975); thus, D2R provides an absolute cue for distance judgments (Mershon & Bowers, 1979). The importance of reverberation was also shown by systematically manipulating T60¹, one of the acoustic parameters of every echoic room, and by manipulating background noise, which also influences D2R (Mershon, Ballenger, Little, McMurtry, & Buchanan, 1989). The results confirmed the roles of both the room condition (dead vs. live) and background noise. Subjects were blindfolded, and the stimuli were pulse trains of white noise. Judgments in the dead condition (T60 ≈ 0.35 s) were underestimated and those in the live condition (T60 ≈ 0.35 s) overestimated, and higher background noise shifted perceived distance towards the listener, as expected from the resulting modification of D2R.
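T60 is defined as the time needed for the power of the impulse response to decay by 60 dB. In practice it is often estimated by backward (Schroeder) integration of the squared impulse response, followed by a straight-line fit to part of the resulting decay curve that is extrapolated to -60 dB. The sketch below illustrates that general technique only; it is not the procedure used by Mershon et al., and the synthetic impulse response, sampling rate, -5 to -25 dB fitting range and function names are all hypothetical.

```python
import math
import random


def decay_curve_db(ir):
    """Schroeder backward integration: energy remaining from each sample onward, in dB re. total energy."""
    remaining = []
    acc = 0.0
    for x in reversed(ir):
        acc += x * x
        remaining.append(acc)
    remaining.reverse()
    return [10.0 * math.log10(e / remaining[0]) for e in remaining]


def estimate_t60(ir, fs, upper_db=-5.0, lower_db=-25.0):
    """Fit a line to the decay curve between upper_db and lower_db and extrapolate it to -60 dB."""
    pts = [(i / fs, db) for i, db in enumerate(decay_curve_db(ir)) if lower_db <= db <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_db = sum(db for _, db in pts) / len(pts)
    slope = (sum((t - mean_t) * (db - mean_db) for t, db in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # decay rate in dB per second (negative)
    return -60.0 / slope


# Hypothetical impulse response: random-sign carrier under an exponential envelope whose
# power falls by 60 dB in 0.5 s, sampled at 8 kHz for 1.5 s.
fs, true_t60 = 8000, 0.5
decay_rate = 3.0 * math.log(10.0) / true_t60  # amplitude decay constant in 1/s
rng = random.Random(1)
ir = [rng.choice((-1.0, 1.0)) * math.exp(-decay_rate * n / fs)
      for n in range(int(3 * true_t60 * fs))]

print(round(estimate_t60(ir, fs), 3))  # 0.5
```

Fitting only part of the curve and extrapolating is common because measured decays usually reach a noise floor well before -60 dB.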

Headphone experiments that systematically varied the number of reflections led to the first quantitative model of auditory distance perception (Bronkhorst & Houtgast, 1999). The model is based on (1) a modified D2R, (2) prior knowledge of the acoustical properties of the room, and (3) the length of a perceptual window. It reads

$$d_s = A\, r_h \left(\frac{\tilde{E}_r}{\tilde{E}_d}\right)^{j}$$

where d_s is the perceived distance, j = 1/2, A is a parameter, the quotient contains the modified reverberant and direct energies, and r_h (the reverberation radius) is computed solely from the acoustical properties of the room. The model is simple and has only a few parameters. It uses a windowing technique with an arbitrary constant to compute the modified energies, rather than computing energies from the duration of reverberation. The authors argued that computing reverberation directly would add too much complexity to the model and would lack neural correlates. The perceptual window in the model is derived from the precedence effect, and it could also help to explain the “horizon effect”. The model appears to predict perceived distance successfully under the tested conditions; however, it could hardly account for adaptation processes and, more importantly, it does not address the accuracy of responses or the involvement of other possible cues such as loudness or ILD.
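To see what the perceptual window contributes, assume the model has the form d_s = A r_h (Ẽ_r/Ẽ_d)^j with j = 1/2, and evaluate it with unmodified energies in an ideal diffuse field, where the direct energy falls as 1/r² and the reverberant energy is constant, so that the two are equal at r = r_h. This is a sketch under those assumptions only:

```latex
\frac{E_d}{E_r} = \left(\frac{r_h}{r}\right)^{2}
\quad\Longrightarrow\quad
d_s = A\, r_h \left(\frac{E_r}{E_d}\right)^{1/2}
    = A\, r_h \,\frac{r}{r_h}
    = A\, r
```

With unmodified energies the prediction is simply proportional to the true distance r, so within this model any compression of perceived distance has to come from the windowed (modified) energies.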

¹ T60: the time needed for the power of the impulse response to decrease by 60 dB.


An early study of the detection threshold for D2R reported a just noticeable difference (JND) of 2 dB, but according to Zahorik (2002a) it suffered from several methodological issues. Zahorik therefore measured D2R thresholds for reference D2R values of 0, 10 and 20 dB using a 2AFC 3-up-1-down procedure in which the reverberant part of the incoming sound was manipulated. Four source signals were used, and all gave consistent JNDs of 5-6 dB. The data were fit to a logistic function with a nonlinear adaptive fitting procedure to obtain the parameters of the psychometric function. The stimuli were an impulse, a 50 ms white noise with brief onset/offset, a longer 300 ms signal with gradual onset/offset, and a speech syllable. This choice was motivated by precedence-effect studies in which different thresholds were found for different signal types, suggesting that temporal cues could play a role in D2R estimation; however, the results did not confirm this hypothesis, because D2R was equally effective under all conditions. Zahorik proposed a psychophysical model of auditory distance perception relating the D2R JND to the natural variation of D2R with distance:

$$d' = \frac{\mu_2 - \mu_1}{\sigma}$$

where the discriminability d' (“dee prime”) is the distance between the means µ1 and µ2 of two Gaussian distributions of perceived distance, expressed in units of their common standard deviation σ. With this model he concluded that listeners using only D2R could detect changes in perceived distance by a factor of 2.59 in the current experiment, or 2 in previous results (Zahorik, 2002b).
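A sketch of the equal-variance Gaussian model described above: d' is the difference between the two means divided by their common standard deviation. The second function adds a standard signal-detection-theory result that is not stated in the text: for an unbiased observer in a 2AFC task, the predicted proportion correct is Φ(d'/√2). The numbers are hypothetical, the function names are illustrative, and this is not Zahorik's full model.

```python
from math import sqrt
from statistics import NormalDist


def d_prime(mu1, mu2, sigma):
    """Separation of two equal-variance Gaussian distributions in units of their standard deviation."""
    return (mu2 - mu1) / sigma


def proportion_correct_2afc(dp):
    """Predicted 2AFC proportion correct for an unbiased observer: Phi(d' / sqrt(2))."""
    return NormalDist().cdf(dp / sqrt(2.0))


dp = d_prime(0.0, 1.0, 1.0)  # hypothetical means and standard deviation
print(dp, round(proportion_correct_2afc(dp), 3))  # 1.0 0.76
```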

Zahorik's results were reexamined with a very similar procedure by Larsen, Iyer, Lansing, and Feng (2008), who, however, manipulated the direct rather than the reverberant portion of the sound. Their results showed JNDs of 2-3 dB for reference values of 0 dB and 10 dB, which is inconsistent with the earlier results.

The authors used the idea that external changes in the sound field must correspond to changes in an internal variable, so that internal processes can be probed by manipulating an external variable (here measured as the D2R JND). This relationship depends on the relationship between the internal variable and a physical property of the signal, and on the physical relationship between the external variable and the manipulated quantity; the two relationships are combined in their quantitative model.

There are four manipulated quantities:

1. Interaural coherence: reflected sounds from more distant sources decorrelate the binaural inputs more than those from closer sources.
2. Variations in spectral fine structure: the variance of the spectral response depends on D2R.
3. Spectral shape: air or the surroundings act as a low-pass filter, and distance perception has been shown to depend on the low-pass cutoff frequency.
4. Temporal integration: the build-up and decay time of the signal in the ear canal depend on D2R.

The results did not confirm a contribution of binaural over monaural listening; signals with 150 ms onsets/offsets led to higher D2R JNDs, and removing various spectral cues led to a decrease in the D2R JND.

A more recent and technically sophisticated model was proposed by Lu and Cooke (2010). It is based on the Equalization-Cancellation (EC) model (Durlach, 1963), previously used to explain the release from masking. In the first (equalization) step, the auditory system transforms the signal in one ear relative to the total signal in the other ear until the masking component (noise) is equal in both ears. In the second (cancellation) step, the signals are subtracted from each other, which completely removes the masking component. Lu and Cooke adapted this procedure to extract the reverberant signal, treating the direct signal as the component to be removed. The input of the EC-D2R model is the signal from a pair of microphones, which is parsed into successive frames and processed by a Gammatone filterbank that models peripheral filtering. The two main blocks, the EC block and a cross-correlation block that extracts directional information, are combined across frequencies. Finally, a single direct-to-reverberant energy ratio is generated for each frame of input data. The relationship between the extracted D2R and log(distance) is approximately linear; therefore, after the directional D2R has been extracted, it is used in a stochastic framework to estimate distance from a joint distribution.
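The equalization-cancellation idea can be illustrated with a deliberately simplified toy, which is not Lu and Cooke's model: there is no Gammatone filterbank, no framing and no cross-correlation block; only a broadband delay is equalized (the gain is fixed at 1); and “reverberation” is modeled as independent noise in each ear, whereas real reverberation is a sum of delayed copies of the source and is partially correlated across ears. Under these assumptions, the delay that minimizes the residual power cancels the direct sound, the residual contains only the reverberant power of both ears, and the direct power follows by subtraction. All names and parameter values are hypothetical.

```python
import math
import random


def residual_power(left, right, lag):
    """Cancellation step: mean power of left[n] - right[n + lag] over the overlapping samples."""
    n = len(left)
    start, stop = max(0, -lag), min(n, n - lag)
    total = 0.0
    for i in range(start, stop):
        diff = left[i] - right[i + lag]
        total += diff * diff
    return total / (stop - start)


def ec_d2r_db(left, right, max_lag):
    """Toy EC estimate of D2R (dB) at the left ear; returns (d2r_db, best_lag)."""
    # Equalization: choose the interaural delay that best cancels the common (direct) component.
    best_lag = min(range(-max_lag, max_lag + 1), key=lambda lag: residual_power(left, right, lag))
    # The residual holds reverberation from both ears; assume equal, uncorrelated power per ear.
    reverberant = residual_power(left, right, best_lag) / 2.0
    total = sum(x * x for x in left) / len(left)
    direct = max(total - reverberant, 1e-12)
    return 10.0 * math.log10(direct / reverberant), best_lag


# Hypothetical binaural input: white-noise direct sound reaching the left ear 4 samples
# earlier than the right ear, plus independent "reverberation" noise in each ear.
rng = random.Random(0)
n, itd, reverb_gain = 20000, 4, 0.5
source = [rng.gauss(0.0, 1.0) for _ in range(n + itd)]
left = [source[i + itd] + reverb_gain * rng.gauss(0.0, 1.0) for i in range(n)]
right = [source[i] + reverb_gain * rng.gauss(0.0, 1.0) for i in range(n)]

estimate, lag = ec_d2r_db(left, right, max_lag=10)
print(lag, round(estimate, 1), "dB; true D2R:", round(20 * math.log10(1 / reverb_gain), 1), "dB")
```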

2.1.3. Frequency

Frequency provides important cues for horizontal and vertical localization, and in some situations it can also contribute to distance perception. Air can act as a low-pass filter, attenuating higher frequencies more than lower ones (Coleman, 1968; Little, Mershon, & Cox, 1992). These changes are subtle, approximately 3-4 dB per 100 m, so they could play a role only for more distant sources, beyond about 15 m (Blauert, 1997). In everyday situations, however, reflections from the various surfaces present in rooms can also be low-pass filtered by those surfaces (Larsen et al., 2008).

In near-field acoustics (Coleman, 1968), wave propagation cannot be approximated by plane waves because the wavefronts are markedly spherical. This affects the velocity of air molecules, but the ear is sensitive mainly to changes in pressure rather than to particle velocity. However, there is some evidence that acoustical interaction with the torso, head and pinna could provide some cues: approaching sounds are low-pass filtered, but high-frequency modulations seem to be invariant with distance (Brungart & Rabinowitz, 1999).

The total amount of spectral variation (Larsen et al., 2008), which is correlated with D2R over a limited range, is another possible source of spectral information.

Behavioral results show that people are sensitive to manipulations of high-frequency content (Coleman, 1968): a decrease in high-frequency content can increase apparent distance, but this manipulation serves only as a relative cue (Little et al., 1992). A recent study (N. Kopco & Shinn-Cunningham, 2010) showed that performance in a near-field configuration, for both frontal and lateral sources, was inversely related to the low-frequency cutoff, mainly because the response range was skewed at higher cutoffs. No effect of bandwidth was observed.


2.1.4. Binaural cues and acoustic parallax

Binaural cues can play a role in the near field, up to about 1 m. ITD seems to be independent of distance, but ILD could serve as a potential cue (Brungart & Rabinowitz, 1999). The acoustic parallax effect arises from the difference between the path from the source to the center of the head and the path from the source to the ear; it is usually expressed as the ratio of these distances and naturally varies with distance. This leads to an azimuthal shift of some high-frequency features at the ipsilateral ear, which could be used to estimate distance for sources up to 1 m (Brungart & Rabinowitz, 1999).
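The geometry behind acoustic parallax can be sketched for an idealized head: a sphere of hypothetical radius 8.75 cm with the ears on the interaural axis, ignoring head shadowing and diffraction. The function returns the ratio of the ear path to the head-centre path, and the difference between the source azimuth seen from the head centre and from the ipsilateral ear (both measured from straight ahead). Both values approach their far-field limits (1 and 0 degrees) as distance grows and are already small at 10 m. Names and values are illustrative assumptions.

```python
import math

HEAD_RADIUS = 0.0875  # metres; hypothetical spherical-head radius


def parallax(distance, azimuth_deg, head_radius=HEAD_RADIUS):
    """Return (ear path / centre path, centre azimuth minus ear azimuth in degrees) for the ipsilateral ear.

    The head centre is the origin, the ipsilateral ear sits at (head_radius, 0), and the source lies
    at azimuth_deg from straight ahead, positive towards that ear.
    """
    az = math.radians(azimuth_deg)
    sx, sy = distance * math.sin(az), distance * math.cos(az)
    ear_path = math.hypot(sx - head_radius, sy)
    ear_azimuth = math.degrees(math.atan2(sx - head_radius, sy))
    return ear_path / distance, azimuth_deg - ear_azimuth


for d in (0.25, 0.5, 1.0, 10.0):
    ratio, angle = parallax(d, 45.0)
    print(d, round(ratio, 3), round(angle, 1))
```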

It is not clear to what extent ILD contributes to distance judgments, but Kopco's data are mostly explainable by D2R (N. Kopco & Shinn-Cunningham, 2010). In a VEGA grant proposal², he gives an example of a psychophysical model of ILD sensitivity,

$$d' = \frac{\Delta \mathrm{ILD}}{\sqrt{\sigma_{\mathrm{int}}^{2} + \sigma_{\mathrm{ext}}^{2}}}$$

where ΔILD is the change in ILD, σ_int denotes the noise of the internal representation and σ_ext is the noise of the external stimulus. This is a classical discrimination model that distinguishes between internal and external noise.
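A sketch of the classical internal/external-noise discrimination model, assuming the form d' = ΔILD / sqrt(σ_int² + σ_ext²) with independent Gaussian noise sources (the symbol and function names are hypothetical). Setting d' to a criterion of 1 gives a threshold ΔILD equal to the combined noise, so adding external (stimulus) noise raises the threshold even when internal noise stays fixed. Values are illustrative only.

```python
import math


def ild_d_prime(delta_ild_db, sigma_int_db, sigma_ext_db):
    """Discriminability of an ILD change when internal and external noise are independent and Gaussian."""
    return delta_ild_db / math.sqrt(sigma_int_db ** 2 + sigma_ext_db ** 2)


def ild_threshold(sigma_int_db, sigma_ext_db, criterion=1.0):
    """ILD change (dB) needed to reach the criterion d'."""
    return criterion * math.sqrt(sigma_int_db ** 2 + sigma_ext_db ** 2)


print(ild_threshold(1.0, 0.0))            # 1.0: internal noise only
print(round(ild_threshold(1.0, 1.0), 3))  # 1.414: equal external noise raises the threshold
print(ild_d_prime(1.0, 1.0, 0.0))         # 1.0
```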

In the same document, he gives an example of a weighted perceptual combination of ILD and D2R derived from the power function model, in which the perceived distance depends on a perceptual weight and on the parameters α and β. This model assumes that the variability of the two cues, ILD and D2R, is known, that the cues are independent, and that the resulting percept is their optimal combination.
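When two independent, unbiased cues are each corrupted by Gaussian noise of known variance, the statistically optimal (maximum-likelihood) combination weights each cue by its inverse variance, and the combined estimate is less variable than either cue alone. The sketch below shows that generic scheme; it is not the specific model in the grant proposal (its parameters α and β are not reproduced), and the distance estimates, variances and function name are hypothetical.

```python
def combine_cues(est_ild, var_ild, est_d2r, var_d2r):
    """Inverse-variance weighted combination of two independent Gaussian distance estimates."""
    w_ild = (1.0 / var_ild) / (1.0 / var_ild + 1.0 / var_d2r)  # perceptual weight of the ILD cue
    combined = w_ild * est_ild + (1.0 - w_ild) * est_d2r
    combined_var = 1.0 / (1.0 / var_ild + 1.0 / var_d2r)       # never larger than either input variance
    return combined, combined_var, w_ild


# Hypothetical case: the D2R-based estimate has three times the variance of the ILD-based one.
print(combine_cues(0.8, 0.01, 1.2, 0.03))
```

Here the ILD cue receives a weight of 0.75, and the combined estimate (0.9) lies closer to the more reliable cue.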

2.1.5. Vision

In an anechoic room, listeners perceived all sounds as coming from the nearest visible target. This was named the “proximity image effect” (Gardner, 1968). The finding was later extended to reverberant conditions, although only for first-presentation data, and renamed “visual capture” (Mershon, Desaulniers, & Amerson, 1980). Another experiment by the same group investigated the relationship between apparent distance and loudness by manipulating the position of a “dummy” loudspeaker (Mershon, Desaulniers, Kiefer, Amerson, & Mills, 1981). More importantly, perceived loudness was manipulated through perceived distance, and a perceptual invariance relationship was suggested: perceived loudness depends both on perceived distance and on the change in intensity.

More recent research, however, did not find the “proximity image effect” for localization with multiple presentations in a semi-reverberant environment. In the first of these experiments (Zahorik, 2001), which tried to replicate Gardner's original results in a reverberant space, subjects localized multiple sounds both with and without visual cues. In the vision condition, the exponent of the power function fitted to perceived distance was 0.9, whereas in the no-vision condition it was 0.66. Response variability, expressed as the standard deviation of responses, was generally smaller in the vision condition, probably because of the response range, which was greater in that condition. Mean localization errors decreased over time in the no-vision condition, which suggests a learning effect.

² Unofficial document.

Another experiment (Calcagno, Abregú, Eguía, & Vergara, 2012), which tried to overcome some of the methodological issues of Zahorik's study, also did not demonstrate the “proximity image” effect, but it showed an interesting improvement in distance judgments in a blindfolded condition that followed an initial visual condition: initial overestimation was followed by more precise, almost perfect, judgments (Experiment 2B). Subjects who were able to see the test room before the experiment also performed almost perfectly in the blindfolded condition, which points to the role of a correctly perceived response range.

A famous illusion in which the position of a sound is biased towards a visual stimulus is called the “ventriloquism effect”. It is usually studied in horizontal localization (Alais & Burr, 2004; Norbert Kopco, Lin, Shinn-Cunningham, & Groh, 2009; Recanzone, 1998), and it provides a rapid adaptation paradigm (Wozny & Shams, 2011) that is used to test the perceptual mechanisms of audio-visual interactions. In auditory distance, very few works address this topic, and they study only visual capture in distance, not the temporal profile of the effect (Bowen, Ramachandran, Muday, & Schirillo, 2011).

2.1.6. Familiarity

Familiarity has two possible meanings. First, familiarity can be prior knowledge at a higher cognitive level. For example, people expect to hear whispering from a proximal region and shouting from a more distal one, which could explain why people systematically underestimate the distance of whispering and overestimate that of shouting (Blauert, 1997). Similarly, any long-term experience can shape a listener's expectations before they actually experience a specific place; for example, some spaces have similar acoustical characteristics, so a listener is prepared in advance.

Second, when people enter a new acoustical environment, their senses must recalibrate and they obtain new knowledge during the exposure. This can be considered familiarity at the perceptual level, and it can also be used to study the properties of short-term and long-term plasticity of the auditory neural system.

2.1.7. Neural mechanisms

Very few studies have focused on the neural mechanisms of auditory distance perception. Nevertheless, the proposed explanations can be divided into four major groups:

1. Explanations based on peripheral processing – spike-count models


2. Involvement of high-level areas – sensory or post-sensory processing (Zahorik et al., 2005)

3. Effect of efferent structures – effect of recurrent attenuation (Andéol et al., 2011; Ferry & Meddis, 2007)

4. Multimodal areas that combine different perceptual information (Zahorik et al., 2005)


3. Plasticity and cross-modal interactions in auditory distance perception

3.1. Introduction

Spatial hearing is not a standard paradigm for studying plasticity in the auditory pathway. Instead, research on associative learning and perceptual learning addresses the most central questions related to plasticity: To what extent is plasticity specific rather than general, and how closely is it tied to the behavior for which it was trained? How are memories created? Where do they reside, and how are they recalled? What is the trade-off between consolidation and deterioration? Where is sensory information stored, and where is it processed? Is preprocessing plastic?

Plasticity in the primary auditory cortex (A1) is usually studied using classical conditioning, operant conditioning, or their combinations to induce a change in behavior by training. The amount of plasticity is measured as the difference between two testing conditions, pre-training and post-training; given that both testing conditions are identical, the measured difference can be attributed solely to the effect of training. The difference is mostly evaluated as a change of receptive fields (RF), measured from the responses of single or multiple neurons in vivo. Training usually leads to an increase or decrease, or a sharpening or broadening, of the RF, or to a change in the best frequency (BF) relative to the conditioned stimulus (CS). However, these findings must co-vary with behavioral changes in order to be valid evidence of plasticity.

The role of A1 in learning has long been doubted, because ablation of A1 does not impair classical conditioning. This is in line with the long-standing view of the cortex as a “sensory” structure rather than a critical site for learning and memory. Therefore, A1 was not the focus of neurophysiological research on learning and memory for a long time; however, this view is being challenged by multiple findings of cortical plasticity under different conditions (Weinberger, 2007).

Perceptual learning can be defined as practice-induced improvement in the ability to perform specific perceptual tasks (Ahissar & Hochstein, 2004). It usually develops over several days or weeks and is related more to the enhancement of sensory, low-level processing than to reinforced behavior, which can be considered high-level. The ongoing debate in visual perceptual learning (PL) revolves around two positions. A low-level origin of PL is supported by its high spatial specificity, i.e., the lack of transfer of PL to untrained conditions, and by recordings from V1 that found recalibration after training. A high-level origin is supported by V1 recordings that did not find a change in the tuning of V1 cells and by the fact that attention was required in order to observe learning effects; however, recent studies found perceptual learning even in task-irrelevant conditions.

One study of perceptual learning in the auditory modality examined a temporal discrimination task (Karmarkar & Buonomano, 2003), because previous results had shown that learning generalized across untrained frequencies, and even across modalities, but not across intervals. This could have been caused either by an improvement in timing per se or by an enhanced ability to store and/or compare the standard and comparison stimuli. In the original task, listeners discriminated whether test pairs of tones were separated by a shorter or longer interval than the pair presented at the onset of the test block. In the current study, in contrast, subjects were trained on a single interval rather than on a comparison of two intervals and were then tested for transfer of learning, while a second group was trained as a control. Both groups exhibited generalization across frequencies but not across intervals, which speaks for a dedicated, interval-specific timing mechanism.

3.2. Plasticity of auditory localization

Auditory spatial plasticity has been observed in many animal studies, but less is known about humans. Barn owls raised wearing prisms reorganized their neural map of auditory space (Brainard & Knudsen, 1993). The topographic organization of visual input at the level of the optic tectum (OT) was found to drive the recalibration of auditory space. In normal barn owls, the best interaural time differences (ITDs) were correlated with the visual receptive fields (VRFs) in the OT, whereas the best ITDs in prism-reared owls were shifted from normal towards the ITD values produced by sounds at the locations of the shifted VRFs.

A distinction was also found between the subcortical processing of azimuth-related cues in two nuclei of the inferior colliculus (IC): the central nucleus (ICc) and the external nucleus (ICx), which is preceded by the ICc. While the ICc remained tuned to the actual locations, ICx responses were shifted towards the induced discrepancy. The shift in ICx appeared as early as 5-7 ms, which suggests that the change in ICx is driven by the ascending signal rather than by efferent connections from higher processing stages. Therefore, it is thought that the OT reflects plasticity at the level of the ICx, where the auditory space map is first synthesized.

Adult ferrets whose auditory cues were altered with ear inserts rapidly relearned to use the altered cues when trained on a behaviorally relevant task (Kacelnik, Nodal, Parsons, & King, 2006).

In human studies, listeners who wore custom-made ear molds were able to relearn the initially disrupted vertical localization after a couple of weeks of wearing the molds, and when the molds were removed, their original localization ability was retained (Hofman, Van Riswick, & Van Opstal, 1998). Learning was observed not only as a result of adaptation to disrupted or supernormal cues but also after prolonged exposure (Carlile, Hyams, & Delaney, 2001). The spatial position of a sound can also be altered by the presence of another, temporally close sound (50-400 ms): when one sound is presented from a fixed location and the other from varying locations, the latter can be attracted to or repelled from the fixed sound (Braasch & Hartung, 2002).

3.3. Room learning

When a subject is exposed to altered acoustical conditions, the cues for spatial hearing are altered as well. Distance perception is influenced by the amount of reverberation, expressed as the reverberation time T60 (Mershon et al., 1989). A learning effect was observed within as few as five presentations: by the fifth presentation, judgments had become more accurate in the “live” condition but did not change in the “dead” condition, which suggests that subjects utilized reverberant cues but could not improve when the reverberant cues were absent.

Another study also showed a short-term improvement in distance perception over 10 presentations in the blindfolded condition (Zahorik, 2001), but no improvement over time when subjects were allowed to become visually familiar with the test room. A similar study (Calcagno et al., 2012) used two groups of subjects in its second experiment. Group A started with the blindfolded condition and continued with the visual-cue condition; group B had the reversed order. Group A initially underestimated the more distant targets, then improved but slightly overestimated. Group B started by overestimating distance and improved in the blindfolded condition, but still showed a small bias. However, almost perfect judgments were obtained by a third group of subjects who were able to familiarize themselves with the test room, and therefore with the response range.

Reverberation degraded horizontal localization but improved distance perception in comparison with anechoic conditions. Subjects showed a certain amount of learning in reverberant conditions: horizontal-localization errors were initially above those expected from anechoic conditions, but with practice horizontal performance became approximately the same as in anechoic conditions. Distance perception in reverberant conditions outperformed that in anechoic conditions (Brungart & Durlach, 1999; Shinn-Cunningham, 2000). The experiment was conducted over multiple days (usually five), and the improvement was observable both within and across days, whereas there was no such trend in the anechoic data. A room-learning experiment showed that people who started with the change-after-trial condition learned to ignore the reverberation cue and were outperformed by people who started with the change-after-session condition. Further acoustical analysis showed that head-related transfer functions (HRTFs) are affected by reverberation. The effect is not very evident in the long-term spectra, whose mean spectral shape is similar, but extra, mostly random, fluctuations across frequency (10-20 dB around the spectrum level) are added to the signal. Also, some spectral notches that are evident in anechoic HRTFs can be flattened.

Another set of room-learning experiments was conducted in real and virtual environments (Kopčo, Schoolmaster, & Shinn-Cunningham, 2004). The first experiment, in a real classroom, studied the spatial transfer of learned reverberation cues over four days of training. Group A started training in the center of the classroom and moved towards the corner; group B trained in the reversed order. It was hypothesized that subjects should improve with practice and that the learning effect should transfer to other locations in the room, because each room was assumed to have specific characteristics that can be learned. Performance in the center of the room should be better than in the corner. Given that, group B should improve more than group A, because it transfers from the more acoustically challenging condition (corner) towards the center, where performance should be better (B. G. Shinn-Cunningham, 2001). The relative contributions of the acoustical properties of the environment and of practice could be assessed from the difference between the groups, which was found in left-right response variability (not in distance).


The second set of experiments was conducted in a virtual acoustic space (VAS). Here, subjects judged distance in several acoustical conditions and at two spatial positions (frontal vs. lateral). Stimuli with roved level were presented in either an acoustically consistent or an acoustically inconsistent environment, i.e., the room acoustics changed either between sessions (FIXED) or from trial to trial (MIXED). Two groups of listeners differed in the order of the multi-session training: FIXED-MIXED or MIXED-FIXED. In experiment A (Schoolmaster, Kopco, & Shinn-Cunningham, 2003), three acoustic conditions were used: an anechoic room, the center of a classroom, and the corner of the same classroom; the follow-up study used two acoustic conditions: a large and a small classroom (Schoolmaster, Kopčo, & Shinn-Cunningham, 2004). Results showed that subjects' performance degraded in the MIXED condition, which suggests that they were unable to adapt to room acoustics that changed from trial to trial; however, subjects who started with the FIXED condition could transfer their knowledge and outperformed subjects who started with the MIXED condition. This suggests that in the MIXED condition subjects learned to ignore the reverberant cue. Analysis of response variance showed that trial-to-trial changes lead to an increase in response variability and that subjects tend to decrease their response variability over time, showing learning without explicit feedback.

3.4. Model of plasticity of auditory spatial adaptation

Shinn-Cunningham based her preliminary model of perception (B. Shinn-Cunningham, 2000) on the older model of intensity perception described by Durlach and Braida (1969). The model works with the concept of sensitivity, defined as the difference between the mean sensations evoked by two stimuli, scaled by two sources of noise: perceptual (sensory) noise and memory noise. In comparison with the original model, it accounts for plasticity by assuming that the model is time-dependent.
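
A minimal sketch of the sensitivity computation described above, written as d'(i, j) = (μ_i - μ_j) / sqrt(σ_S² + σ_M²), where μ are mean sensations, σ_S the sensory noise and σ_M the memory noise. It assumes that the two noise sources are independent Gaussians whose variances add, and it models the time dependence as an exponential decay of memory noise over trials. The function names, the exponential form, and all parameter values are illustrative assumptions, not Shinn-Cunningham's actual parameterization.

```python
import math


def sensitivity(mu_i, mu_j, sigma_sensory, sigma_memory):
    """d' between stimuli i and j: the difference of their mean sensations,
    scaled by the combined (assumed independent) sensory and memory noise."""
    return (mu_i - mu_j) / math.sqrt(sigma_sensory ** 2 + sigma_memory ** 2)


def memory_noise(trial, sigma_start, sigma_end, tau):
    """Hypothetical time dependence: memory noise decays exponentially from
    sigma_start towards sigma_end with a time constant of tau trials."""
    return sigma_end + (sigma_start - sigma_end) * math.exp(-trial / tau)


if __name__ == "__main__":
    # Illustrative values only: sensitivity grows as memory noise shrinks.
    for trial in (0, 50, 200):
        s_m = memory_noise(trial, sigma_start=4.0, sigma_end=1.0, tau=50.0)
        print(trial, round(sensitivity(5.0, 0.0, 3.0, s_m), 3))
```

In this sketch, learning appears only through the memory term; a time-dependent sensory term could be added in the same way.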


4. Learning and decision in uncertainty

A biological neural system processes uncertain information. It is capable of storing and retrieving memories, it can make decisions based on uncertain information, and it improves its performance through learning. A precise mathematical description of the human auditory system is not possible with current knowledge, but there are many system-level models that provide sufficient predictions of human performance in various tasks. In the following text, I provide a brief overview of computational models that have properties of real neural systems and will be considered for my dissertation. Reviews of methods of artificial intelligence (Russell & Norvig, 2005) and of time series analysis (Kedem & Fokianos, 2002) can be found elsewhere.

4.1. Representation and inference

4.1.1. Bayesian networks

A joint probability distribution of random variables that factorizes according to a directed acyclic graph is called a Bayesian network; when the variables are continuous and their conditional distributions are Gaussian, it is a Gaussian Bayesian network. The properties of the Gaussian distribution and Bayes' formula for the posterior probability provide a framework for inference. The joint distribution of the system is

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big),$$

where $\mathrm{pa}(x_i)$ are the values of the variables that condition the probability of variable $x_i$, i.e., its parent (preceding) nodes in the network. This is a robust framework that can be extended to the time domain (dynamic Bayesian networks) and is suitable for a wide range of problems, but inference in such systems can be computationally difficult.
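
The factorization and Bayes' formula can be illustrated with the smallest Gaussian network: one parent x and one child y. In this sketch, x might stand for a source distance and y for a noisy distance cue; that interpretation, the linear-Gaussian form y | x ~ N(a·x + b, v_y), and the function names are assumptions for illustration only.

```python
from statistics import NormalDist


def joint_density(x, y, m0, v0, a, b, vy):
    """Two-node Gaussian Bayesian network x -> y.
    p(x, y) = p(x) * p(y | x), with x ~ N(m0, v0) and y | x ~ N(a*x + b, vy)."""
    p_x = NormalDist(m0, v0 ** 0.5).pdf(x)
    p_y_given_x = NormalDist(a * x + b, vy ** 0.5).pdf(y)
    return p_x * p_y_given_x


def posterior(y, m0, v0, a, b, vy):
    """Bayes' rule for the same network: p(x | y) is Gaussian.
    Returns its (mean, variance); precisions (inverse variances) add."""
    precision = 1.0 / v0 + a * a / vy
    var = 1.0 / precision
    mean = var * (m0 / v0 + a * (y - b) / vy)
    return mean, var


if __name__ == "__main__":
    # Prior N(0, 1), cue y = x + noise with unit variance, observed y = 2.
    print(posterior(2.0, m0=0.0, v0=1.0, a=1.0, b=0.0, vy=1.0))
```

Because every factor is Gaussian, the posterior stays Gaussian and can be computed in closed form; for larger networks the same idea applies node by node, which is where the computational cost mentioned above arises.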

4.1.2. Fuzzy sets and fuzzy inference

A fuzzy set A in a universe of discourse U is defined by its membership function

$$\mu_A : U \to [0, 1],$$

which expresses the degree to which each element of U belongs to A. This is a useful way of characterizing a vague description of a state. For example, distance can be described as near, mid, or far, which gives three fuzzy sets, and a concrete distance of 2 m can belong to near with degree 0.3, to mid with degree 1, and to far with degree 0.1. Fuzzy set theory thus generalizes classical set theory, in which an element either belongs to a set (1) or does not (0), to a concept in which an element can belong to a set with a degree from the interval [0, 1].
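
A minimal sketch of the near/mid/far example, using piecewise-linear membership functions. The shapes and breakpoints are hypothetical, chosen only so that d = 2 m reproduces the degrees given in the text; real membership functions would have to be fitted to data.

```python
def ramp_down(x, a, b):
    """Membership 1 for x <= a, 0 for x >= b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def distance_memberships(d):
    """Degrees of membership of distance d (in metres) in three fuzzy sets.
    The breakpoints are hypothetical, chosen so that d = 2 m gives
    near 0.3, mid 1 and far 0.1 as in the example."""
    return {
        "near": ramp_down(d, 0.6, 2.6),
        "mid": triangle(d, 0.5, 2.0, 3.5),
        "far": 1.0 - ramp_down(d, 1.8, 3.8),
    }


if __name__ == "__main__":
    for d in (0.5, 2.0, 3.0):
        print(d, {k: round(v, 2) for k, v in distance_memberships(d).items()})
```

For d = 2.0 the function returns approximately 0.3, 1.0 and 0.1 for near, mid and far; unlike probabilities, these degrees need not sum to 1.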

Inference is based on binary operations on fuzzy sets. In classical two-valued logic, conjunction yields true when both operands are true and false otherwise. In multi-valued logic, one must define which type of logic is applied. The binary operation of conjunction in fuzzy logic is evaluated using t-norms; commonly used are the Gödel (minimum) t-norm, the product t-norm, the Łukasiewicz t-norm, and the drastic t-norm.
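
The four t-norms named above can be written directly from their standard definitions. In the usage example, the membership degrees (near = 0.3, mid = 0.6) are hypothetical values chosen to make the differences between the t-norms visible.

```python
def t_goedel(a, b):
    """Goedel (minimum) t-norm."""
    return min(a, b)


def t_product(a, b):
    """Product t-norm."""
    return a * b


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def t_drastic(a, b):
    """Drastic t-norm: non-zero only if one operand is fully true (1)."""
    if a == 1.0:
        return b
    if b == 1.0:
        return a
    return 0.0


T_NORMS = {"Goedel": t_goedel, "product": t_product,
           "Lukasiewicz": t_lukasiewicz, "drastic": t_drastic}

if __name__ == "__main__":
    # Degree to which a source is "near AND mid" for hypothetical
    # memberships near = 0.3 and mid = 0.6.
    for name, t in T_NORMS.items():
        print(name, t(0.3, 0.6))
```

All four agree with classical conjunction on the values 0 and 1 and satisfy T(a, 1) = a; they differ in between, where the drastic t-norm is the smallest and the minimum t-norm the largest possible conjunction.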

4.1.3. Detection theory

A standard way to describe performance in psychophysical experiments is to define sensitivity (Macmillan, 2005). A sensitivity measure should equal 0 when subjects are completely insensitive, i.e., when their responses are independent of the experimental treatment, and it should be high when subjects respond optimally in the given conditions. A commonly used measure is d' (“dee prime”), which can be obtained from a simple discrimination experiment and expresses how far apart the two stimuli are on the decision axis, in units of standard deviation. If one assumes that each of the two possible stimuli in a discrimination experiment evokes a Gaussian-distributed internal response and that both distributions have equal variance, then the distance between their means in units of standard deviation expresses how well the two stimuli can be discriminated.

An alternative, equivalent description is the iso-sensitivity curve (receiver operating characteristic, ROC) of a discrimination experiment, which shows the pairs of hit and false-alarm rates that correspond to the same sensitivity. This is possible because sensitivity is independent of response bias, which describes whether a subject tended to prefer one or the other response. A common measure of response bias is the criterion c; it equals 0 when the false-alarm rate equals the miss rate, i.e., when H = 1 - F.

Both d' and c can be expressed by the Gaussian z transformation (the inverse of the standard normal cumulative distribution function) of the hit rate (H) and the false-alarm rate (F) of a discrimination experiment:

$$d' = z(H) - z(F), \qquad c = -\frac{1}{2}\left[z(H) + z(F)\right]$$
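
The two formulas translate directly into code using the inverse normal CDF from Python's standard library. Hit or false-alarm rates of exactly 0 or 1 make z infinite (here, `inv_cdf` raises an error), so they must be adjusted first, for example with the common 1/(2N) correction; this sketch leaves that adjustment to the caller.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z(p):
    """Gaussian z transformation: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)


def d_prime(hit_rate, fa_rate):
    """Sensitivity d' = z(H) - z(F)."""
    return z(hit_rate) - z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Response bias c = -[z(H) + z(F)] / 2; it is 0 when H = 1 - F."""
    return -0.5 * (z(hit_rate) + z(fa_rate))


if __name__ == "__main__":
    print(round(d_prime(0.8, 0.2), 3), round(criterion(0.8, 0.2), 3))
    print(round(d_prime(0.9, 0.5), 3), round(criterion(0.9, 0.5), 3))
```

With H = 0.8 and F = 0.2 the observer is unbiased (c close to 0) with d' of about 1.68; with H = 0.9 and F = 0.5 the criterion is negative, i.e., the observer is biased towards responding “yes”.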

4.1.4. Stochastic processes and time series analysis

The behavior of a biological neural system is stochastic in nature and time-dependent. Thus, it can be assumed that data in the form of time series are produced by a stochastic process, and by analyzing the time series we can test different classes of stochastic models.

A strongly stationary process is a process whose joint probability distribution does not change when shifted in time; thus its mean and variance (if they exist) do not change either. Analysis of such processes deals with linear time-invariant (LTI) operations, which can be performed by finite impulse response (FIR) and infinite impulse response (IIR) filters. These are analogous to the moving-average (MA) and autoregressive (AR) processes of the Box-Jenkins methodology of time series analysis.
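
The FIR/MA and IIR/AR analogy can be seen from the impulse responses of the two filter types. This is a minimal pure-Python sketch; the coefficients are arbitrary illustrative values. Driving the same filters with white noise instead of an impulse produces an MA(1) and an AR(1) process, the latter being stationary for |phi| < 1.

```python
def fir_filter(x, b):
    """FIR (moving-average) filter: y[n] = sum_k b[k] * x[n - k]."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def iir_ar1(x, phi):
    """IIR filter corresponding to AR(1): y[n] = phi * y[n - 1] + x[n]."""
    y, prev = [], 0.0
    for value in x:
        prev = phi * prev + value
        y.append(prev)
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(fir_filter(impulse, [0.5, 0.5]))  # [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
    print(iir_ar1(impulse, 0.5))            # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```

The FIR response ends after len(b) samples (finite memory), whereas the recursive AR(1) response decays geometrically but never becomes exactly zero (infinite memory).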

A hidden Markov model (HMM) is a temporal probabilistic model in which the state of the process is described by a single random variable, either discrete or continuous. The possible values of this variable are the states of the world. An HMM of a discrete variable with N states $S_1, \dots, S_N$ and M observation symbols $v_1, \dots, v_M$ in the alphabet is specified by the parameter set

$$\lambda = (A, B, \pi),$$

where, with $q_t$ denoting the state and $O_t$ the observation at time t, $A = \{a_{ij}\}$ is the set of transition probabilities such that $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ expresses the probability of a transition from state $S_i$ to state $S_j$; $B = \{b_j(k)\}$ is the observation probability distribution in each of the states, $b_j(k) = P(O_t = v_k \mid q_t = S_j, \lambda)$, where $v_k$ denotes the k-th observation symbol in the alphabet and $\lambda$ the current parameter vector; and $\pi = \{\pi_i\}$ with $\pi_i = P(q_1 = S_i)$ is the initial state distribution.
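
Given λ = (A, B, π), the probability of an observation sequence can be computed with the forward algorithm, one half of the forward-backward procedure. The following is a minimal sketch with 0-based indices (states 0..N-1, symbols 0..M-1); the two-state model and its numbers are hypothetical.

```python
def forward(obs, pi, A, B):
    """Return P(O | lambda) for a sequence of observation indices obs.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k in state j
    """
    n = len(pi)
    # alpha[i] = P(O_1..O_t, q_t = S_i | lambda), initialized at t = 1
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


if __name__ == "__main__":
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    print(forward([0, 1, 1, 0], pi, A, B))
```

The recursion costs O(N²T) operations for a sequence of length T, instead of enumerating all N^T hidden state paths.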

Martingales form a class of processes that need not be stationary. A martingale is often used as a model of a fair game, because knowledge of past outcomes cannot improve the prediction of the expected future outcome beyond the current value. A discrete-time martingale is a discrete stochastic process $X_1, X_2, \dots$ that satisfies, for any time n,

$$E\left(|X_n|\right) < \infty, \qquad E\left(X_{n+1} \mid X_1, \dots, X_n\right) = X_n.$$

An example of a martingale is the Wiener process, which has a broad range of applications in econometrics, electrical engineering, and physics (Brownian motion). Its most distinguishing property is that its increments are independent and Gaussian, with zero mean and variance equal to the length of the time increment, i.e., the process accumulates white noise.
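
A minimal simulation of the martingale property: a Gaussian random walk, i.e., a Wiener process sampled at unit time steps. Across many simulated paths, the mean of X_n stays close to the starting value X_0, while the variance grows linearly as n·σ². The number of paths, the step size, and the seed are illustrative assumptions.

```python
import random


def random_walk(n_steps, sigma, rng, x0=0.0):
    """Gaussian random walk X_n = X_{n-1} + e_n with e_n ~ N(0, sigma**2).
    It is a Wiener process sampled at unit time steps and is a martingale."""
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    finals = [random_walk(50, 1.0, rng)[-1] for _ in range(2000)]
    mean = sum(finals) / len(finals)
    var = sum((f - mean) ** 2 for f in finals) / len(finals)
    # Expected value stays at x0 = 0; variance should be close to 50 * 1.0**2.
    print(round(mean, 2), round(var, 1))
```

The growing variance is what makes the process non-stationary even though its expected value never changes.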

4.1.5. Artificial neural networks

An artificial neural network (ANN) is a biologically inspired mathematical model. It can contain many neurons, which are the fundamental building blocks of the network. As in biological systems, neurons are connected by synaptic weights, which carry the actual knowledge of the network. Neurons are organized in layers, and each neuron aggregates the outputs of other neurons. The activation of a neuron depends on its inputs and its activation function:

$$\mathrm{net}_i = \sum_j w_{ij}\, y_j, \qquad y_i = f_i(\mathrm{net}_i),$$

where $\mathrm{net}_i$ is the input function of the i-th sigma neuron, j is the index of a neuron that sends its connection to the i-th neuron, $w_{ij}$ is the weight of that connection, $y_i$ is the activation of the i-th neuron, and $f_i$ is the activation function of the i-th neuron.
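
A minimal sketch of a sigma neuron and of a layer built from such neurons. The optional bias term (a threshold) and the choice of tanh as the default activation function are assumptions added for illustration; the weights in the usage example are arbitrary.

```python
import math


def sigma_neuron(inputs, weights, bias=0.0, f=math.tanh):
    """net_i = sum_j w_ij * y_j (+ optional bias); returns y_i = f(net_i)."""
    net = sum(w * y for w, y in zip(weights, inputs)) + bias
    return f(net)


def layer(inputs, weight_rows, biases, f=math.tanh):
    """A layer of sigma neurons that all receive the same input vector."""
    return [sigma_neuron(inputs, row, b, f) for row, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    # Two inputs -> two hidden neurons -> one output neuron (arbitrary weights).
    hidden = layer([0.5, -1.0], [[1.0, 0.2], [-0.3, 0.8]], [0.0, 0.1])
    output = sigma_neuron(hidden, [0.7, -0.4])
    print(hidden, output)
```

Stacking calls to `layer` gives a multilayer perceptron; training would adjust the weight rows, which is where the network's knowledge is stored.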

An ANN can act as a filter. In connection with time series analysis, feedforward connections can act as an FIR filter and recurrent connections as an IIR filter; with a non-linear activation function, we obtain a powerful non-linear generalization of the stationary ARMA process that can be trained with standard methods for neural networks. Such networks are known as NAR networks and, with the output fed back as input, Jordan (NARMA) networks. These models can be further extended with “memory” context connections from the hidden layer, as in the Elman network.
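
A minimal sketch of the forward pass of a NAR(p) network with one hidden layer. It assumes the network already has trained weights; the weights in the example are arbitrary. With an identity activation the network collapses to an ordinary linear AR(p) filter, which makes the FIR/IIR analogy above concrete.

```python
import math


def nar_predict(history, W, b, v, c, f=math.tanh):
    """One-step prediction of a NAR(p) network with one hidden layer:
    y_hat[n] = c + sum_h v[h] * f(b[h] + sum_k W[h][k] * y[n-1-k]).
    history holds past values with the most recent last; p = len(W[0])."""
    p = len(W[0])
    if len(history) < p:
        raise ValueError("need at least p past values")
    lags = history[::-1][:p]  # y[n-1], y[n-2], ..., y[n-p]
    hidden = [f(bh + sum(w * y for w, y in zip(row, lags))) for row, bh in zip(W, b)]
    return c + sum(vh * h for vh, h in zip(v, hidden))


if __name__ == "__main__":
    # Arbitrary weights: 2 lags, 2 hidden tanh units.
    W = [[0.8, -0.2], [0.1, 0.5]]
    print(nar_predict([0.2, 0.4, 0.3], W, b=[0.0, 0.1], v=[1.0, -0.5], c=0.05))
```

Feeding each prediction back into `history` and predicting again gives closed-loop (recurrent) operation, which is the output feedback that distinguishes a Jordan network from the purely feedforward NAR form.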

Besides the multilayer perceptron, a slightly different approach is the radial basis function (RBF) network. It replaces the standard monotonically increasing activation function (e.g., tanh) with a radial basis function (e.g., a multivariate Gaussian), which makes it easier to find clusters in data. Both networks are universal approximators of continuous non-linear functions.
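
A minimal sketch of the output of an RBF network with isotropic Gaussian basis functions (a special case of the multivariate Gaussian mentioned above). The centers, widths, and weights in the example are hypothetical; in practice the centers are often placed by clustering and the output weights fitted by least squares.

```python
import math


def rbf_network(x, centers, widths, weights, bias=0.0):
    """Output of an RBF network with isotropic Gaussian basis functions:
    bias + sum_k weights[k] * exp(-||x - centers[k]||**2 / (2 * widths[k]**2))."""
    out = bias
    for c, s, w in zip(centers, widths, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-sq_dist / (2.0 * s ** 2))
    return out


if __name__ == "__main__":
    # Two hypothetical basis functions centred at 1 m and 3 m.
    for d in (0.5, 1.0, 2.0, 3.0):
        print(d, round(rbf_network([d], [[1.0], [3.0]], [0.5, 0.5], [1.0, -1.0]), 3))
```

Each unit responds only near its center, so the response to an input reveals which cluster it falls into; this locality is what distinguishes RBF units from the global sigmoid units of a multilayer perceptron.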


Competitive, Hebbian-like learning is implemented in the Kohonen network (self-organizing map), which deals easily with clustering problems. It has only one layer of neurons with a simple activation, but the training procedure is self-adaptive, and at the end the training patterns are reflected in the weights of the network.
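
A minimal sketch of a one-dimensional Kohonen map trained on scalar data. It shows the two steps of the procedure: competition (the closest unit wins) and cooperation (the winner and its map neighbors move towards the input). The linear decay schedules, the initial neighborhood radius, and the example data are assumptions for illustration.

```python
import math
import random


def train_som_1d(data, n_units, epochs, rng, lr0=0.5):
    """Train a 1-D Kohonen map on scalar data; returns the unit weights.
    Learning rate and neighbourhood radius shrink linearly during training."""
    lo, hi = min(data), max(data)
    w = [rng.uniform(lo, hi) for _ in range(n_units)]
    radius0 = n_units / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            # Competitive step: the unit closest to the input wins.
            winner = min(range(n_units), key=lambda i: abs(x - w[i]))
            # Cooperative step: the winner and its map neighbours move towards x.
            for i in range(n_units):
                h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
                w[i] += lr * h * (x - w[i])
            step += 1
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical distance responses (m) clustered around three values.
    data = [rng.gauss(m, 0.1) for m in (0.5, 1.2, 2.0) for _ in range(30)]
    print(sorted(round(u, 2) for u in train_som_1d(data, 3, 20, rng)))
```

No target values are used anywhere: after training, the unit weights typically settle near the clusters in the data, which is how the training patterns come to be reflected in the weights.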

There are many other network architectures, such as the Hopfield network, ART, the Boltzmann machine, probabilistic neural networks, “fuzzy” networks, and their combinations, but going into further detail is beyond the scope of this review.

4.2. Learning process

Finding the parameters of a model is a separate problem, and in the modeling of cognitive systems the neural plausibility of the learning method is also in question. It is important to keep in mind that the data in my experiments are not “just data”: there is real physiology that reflects the learning process in the brain. We can assume that human cognition acts optimally in terms of information processing and that its decisions are always based on all available evidence. Learning in any system can be driven by error, in which case we speak of supervised learning, or it can consist of clustering or blind signal separation, in which case we speak of unsupervised learning.

The most common method of supervised learning is minimizing the squared error. Setting the derivatives of the error function to zero tells exactly how to find the parameters, but solving the resulting equations is not always trivial. Probabilistic models do not necessarily fit optimally under least squares because of non-linearity and complexity. Well-known methods for HMMs and Bayesian networks are the forward-backward and Viterbi algorithms. Optimal parameter estimation for probabilistic models, especially for normal distributions, can be done with the maximum-likelihood method, which finds the parameter values under which the given observations are most probable. Technically, it coincides with the most probable (maximum a posteriori) Bayesian estimator for a uniform prior distribution. Sometimes more sophisticated, sub-optimal approximation algorithms are needed, for instance Monte Carlo methods.
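
Two closed-form cases illustrate the paragraph above: a least-squares straight-line fit, obtained by setting the derivatives of the summed squared error to zero, and maximum-likelihood estimates of a normal distribution. A straight line between target and response distance is only an assumed example of what might be fitted; under Gaussian noise the least-squares fit is also the maximum-likelihood fit.

```python
def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept, obtained by setting the
    derivatives of the summed squared error to zero (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def gaussian_ml(samples):
    """Maximum-likelihood estimates (mean, variance) of a normal distribution.
    The ML variance divides by N, not N - 1, so it is slightly biased."""
    n = len(samples)
    mean = sum(samples) / n
    return mean, sum((s - mean) ** 2 for s in samples) / n


if __name__ == "__main__":
    print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
    print(gaussian_ml([1.0, 2.0, 3.0, 4.0]))
```

Models without such closed forms (for example, HMMs) need the iterative algorithms named above.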

The support vector machine (SVM) is a concept and a set of supervised learning methods. The idea is to find the training samples that contribute most to the classification (the support vectors), i.e., those that define the maximum-margin decision boundary. In contrast to ANNs, its training is a convex optimization problem, so it does not suffer from local minima, it is less prone to over-fitting, and its complexity is regulated through the choice of support vectors.

There are many classification and clustering methods: k-means, expectation-maximization (EM), ISODATA, and k-nearest neighbors. A different approach is feature extraction for dimensionality reduction: slow feature analysis and principal and independent component analysis. Among neural networks, Kohonen and ART networks also use unsupervised learning to create clusters.
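
A minimal sketch of k-means (Lloyd's iteration) for scalar data, such as distance responses that might be grouped into near, mid, and far categories. The data and initial centers in the example are hypothetical.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Lloyd's k-means for scalar data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in data:
            # Assignment step: each sample joins its nearest center.
            k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[k].append(x)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    responses = [0.2, 0.3, 0.4, 1.9, 2.0, 2.1, 3.9, 4.0, 4.1]
    print(kmeans_1d(responses, [0.0, 1.0, 5.0]))
```

The result depends on the initial centers; k-means can be viewed as a hard-assignment limit of EM for a Gaussian mixture.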


5. Preliminary and planned experiments

5.1. Learning of reverberation cues for auditory distance perception

Listeners must calibrate to the room acoustics in order to judge source distance using

reverberation. A learning process underlies this calibration, resulting in improved performance when

auditory distance is examined repeatedly in the same room over the course of days. The processes of

calibration and learning are spontaneous, not requiring any feedback about the actual target location.

The current study examined whether the amount of spontaneous learning in rooms is dependent on the

relative strength with which the reverberation cue is used for the distance judgments. Listeners judged

distance of broadband noise bursts presented from distances ranging from 0.15 to 2 m directly ahead

of the listener in a small rectangular classroom. The stimulus presentation level was either roved from

trial to trial (R runs) or it was fixed within an experimental run (F runs). The subjects performed

several experimental sessions over multiple days. One subject group was trained on the F runs, one on

the R runs. Learning was observed in the R group but not in the F group, confirming that focusing on

the reverberation cue is required for the learning and room calibration to occur.

(Kopco, Silvera, Tskhay, & Tomoriova, 2011)

5.2. Short-term adaptation of auditory distance in a reverberant room

In regular rooms, sounds are received at the ears along with their reflections. The amount of

reflections provides a cue for judging auditory distance, usually characterized by measuring the direct-

to-reverberant energy ratio, D/R. Since the amount of reflections varies from room to room, listeners

must adapt to the D/R cue whenever they enter a new room. Previous experiments showed that when

the reverberant environment simulated in a virtual space is inconsistent, listeners tend to ignore the

D/R cue [Kopčo et al. (2004) Learning to Judge Distance of Nearby Sounds in Reverberant and

Anechoic Environments. In: Proc. Joint congress CFA/DAGA '04]. Here, an experiment was

performed in which subjects judged the distance of a broadband noise stimulus presented from loudspeakers placed directly in front of the listener at distances from 0.15 to 2 m in a small rectangular classroom. Two stimulus conditions were used: either the sound was presented at a fixed presentation level (F runs), or sounds were presented at a level that was equalized at the listener's ears and roved by 12 dB (R runs).

Each subject participated in a session consisting of 4 F runs interleaved with 4 R runs. Subjects were

divided into two groups, differing only by the order of conditions (FRFRFRFR or RFRFRFRF).

Listeners were instructed to ignore the overall level cue in the R runs, and only listeners who followed

this instruction were included in the analysis. Results showed that the order in which the listeners were

exposed to the two conditions had a strong effect. Listeners who started in the F condition had constant correlations between actual and response distance in both conditions. On the other hand, listeners who started in the R condition immediately improved their F-run correlations and gradually improved their R-run correlations during the whole one-hour experiment. The results suggest that the process of adaptation to room reverberation is dramatically influenced by the characteristics of the initial exposure to sounds in a given room, resulting in differences in the listener's ability to correctly interpret and optimally use the overall level and reverberation cues.

(Hládek, Tomoriová, Kopčo, & Seitz, 2011)

5.3. Ventriloquism aftereffect in distance

This is a planned experiment on the visual recalibration of auditory space in distance. The aim of the study is to determine whether there is a ventriloquism effect and a ventriloquism aftereffect in distance. It will be conducted in a similar room and with the same apparatus as the previous studies. A procedure similar to that of Norbert Kopco et al. (2009), with interleaved audio-visual (AV) and auditory-only (A) trials, is adopted, and the pre- and post-training phases will contain “aligning” runs to make sure that every subject starts and ends with the same percept. It will be a two-session experiment with a within-subject design and two counterbalanced conditions: in the first condition the induced bias will be positive and in the second negative, with a magnitude of 30% of the presented distance. We expect that subjects will fully adapt after a certain delay and that there will be an aftereffect that fades out on a time scale of several minutes.
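
To make the expected time course concrete, the following sketch simulates a simple, hypothetical model of the prediction: during AV training the relative shift of perceived auditory distance approaches the induced bias exponentially, and after training it decays back towards zero. The model form, the per-trial rates `alpha` and `decay`, and the function names are assumptions for illustration, not the analysis planned for the study; only the 30% bias comes from the design described above.

```python
def simulate_shift(n_train, n_post, bias=0.30, alpha=0.05, decay=0.02):
    """Relative shift of perceived auditory distance under a hypothetical model.

    Training (AV trials): the shift moves a fraction `alpha` of the way towards
    the induced visual bias on every trial (exponential adaptation).
    Post-training (A-only trials): the shift decays by a fraction `decay` per
    trial, modelling the fading aftereffect.
    """
    shift, trace = 0.0, []
    for _ in range(n_train):
        shift += alpha * (bias - shift)
        trace.append(shift)
    for _ in range(n_post):
        shift *= 1.0 - decay
        trace.append(shift)
    return trace


def perceived_distance(true_distance, shift):
    """Perceived distance on an A-only trial, scaled by the current shift."""
    return true_distance * (1.0 + shift)


if __name__ == "__main__":
    trace = simulate_shift(n_train=100, n_post=100)
    print(round(trace[99], 3), round(trace[-1], 3))
    print(round(perceived_distance(1.0, trace[99]), 3))
```

Fitting `alpha` and `decay` to the measured shifts would turn the qualitative expectation (full adaptation after a delay, fading aftereffect) into time constants that can be compared between the positive- and negative-bias conditions.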

5.4. EEG study on “room learning”

This study will test the specificity of “room learning”. Subjects will perform a distance-perception task in different acoustical environments. We hypothesize that knowledge from one room is not transferred to another room, but that performance should improve when subjects are tested again in the same room. The second hypothesis is that an inconsistent acoustical environment should prevent subjects from learning anything specific and can lead to a decrease in further performance, because subjects will learn to ignore reverberant cues (N. Kopco & Shinn-Cunningham, 2004). EEG recordings will be performed at the beginning and at the end of every training phase to evaluate the effect of learning at the neural level.


6. Dissertation project

A model of auditory distance perception in localization experiments will be proposed. The aim of the field of computational neuroscience is to develop mathematical tools that plausibly describe perceptual and cognitive processes at the behavioral and neural levels. In my research, I focus on two perceptual phenomena that will be the subject of mathematical modeling:

1. Plasticity of auditory distance perception in reverberant room

2. Cross-modal interactions in auditory distance perception in reverberant room

6.1. Plasticity of auditory distance perception in reverberant room

Auditory cues are the basis of spatial hearing. In auditory distance perception, reverberation and intensity are the two primary cues, but their relative contributions to the localization of sounds are unknown. They also differ in nature, because one provides absolute and the other relative information about the position of the sound. In my project, I will focus on the role of these cues through an analysis of behavioral performance and an acoustical analysis of the proposed experiments.

The usual sign of plasticity in biological neural systems is a change in the physical properties of neurons or of their connections. However, such a change is not sufficient evidence if it lacks behavioral correlates. A different approach, common among computational neuroscientists, is the “top-down” approach: a controlled change in behavior has to originate in a change at the neural level. There are three stages of processing of auditory distance cues that could be responsible for plasticity:

1. Representation of auditory space.

2. Cue reweighting mechanism.

3. Context specific strategy.

My mathematical model will account for these three stages and will help determine which of them are likely to contribute most to the recalibration of auditory distance in a reverberant room. A brain imaging technique (EEG) will be used to test the model at the neural level. An experiment on perceptual learning will test the specificity of learning between different listening conditions in different rooms, and we expect that adaptation will lead to habituation processes once a subject “learns the room”.

6.2. Cross-modal interactions in auditory distance perception in reverberant room

Cross-modal interactions mean that different modalities, specifically auditory and visual,

contribute to perception. Neural system uses two distinct structures for auditory and visual modality.

Interactions of the two modalities in spatial perception are commonly studies by ventriloquism effect.

There are many studies of horizontal localization (Alais & Burr, 2004; Frissen, Vroomen, & de Gelder, 2012; Kopco et al., 2009; Recanzone, 1998; Wozny & Shams, 2011) but fewer of distance, and these either concentrate on visual capture (Calcagno et al., 2012; Gardner, 1968; Mershon et al., 1980, 1981; Zahorik, 2001) or examine the immediate effect of an induced bias on horizontal and distance localization (Agganis, Muday, & Schirillo, 2010; Bowen et al., 2011). Although these studies examine multimodal interactions, they do not focus on the process by which the other modality recalibrates auditory distance space. Therefore, I plan a study of the ventriloquism aftereffect in distance. It could answer whether visual calibration takes place and how, and on what time scale, an induced bias affects post-training performance; the answers could reveal how auditory space is represented in distance.
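In horizontal-plane studies, a common way to quantify a ventriloquism aftereffect is the shift in auditory responses from before to after audio-visual training, expressed as a fraction of the audio-visual discrepancy. A minimal sketch of that measure applied to distance follows. Computing it on a log2 distance scale is an assumption, and `aftereffect_fraction` and the example values are hypothetical.

```python
import math
from statistics import mean


def aftereffect_fraction(pre_responses, post_responses, audio_distance, visual_distance):
    """Aftereffect as a fraction of the audio-visual discrepancy, on a log2 distance scale.

    pre_responses / post_responses: responded distances (m) to the same auditory target,
    measured before and after audio-visual training.
    A positive value means responses moved toward the visual stimulus.
    """
    shift = mean(math.log2(r) for r in post_responses) - mean(math.log2(r) for r in pre_responses)
    discrepancy = math.log2(visual_distance) - math.log2(audio_distance)
    if discrepancy == 0:
        raise ValueError("audio and visual distances must differ")
    return shift / discrepancy


# Hypothetical data: sound at 1 m paired with a visual stimulus at 2 m during training.
pre = [1.0, 1.0, 1.0, 1.0]
post = [2 ** 0.25] * 4
fraction = aftereffect_fraction(pre, post, audio_distance=1.0, visual_distance=2.0)
```

Here the post-training responses moved a quarter of the way (in log2 units) toward the visual stimulus, so the fraction is about 0.25. Tracking this fraction across training blocks would give the time course asked about above.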

6.3. Mathematical modeling

In the acoustical analysis, the effect of spectral content on perception will be examined. Signal processing methods and models of the auditory periphery will be used to examine activation at the very first stage of auditory processing.
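As an example of the acoustical analysis, the direct-to-reverberant energy ratio (DRR), the reverberation-related distance cue studied by Zahorik (2002a) and Larsen et al. (2008), can be estimated from a room impulse response by splitting its energy into a short window around the direct-sound peak and the remainder. The sketch below assumes a single-channel impulse response given as a list of samples and a hypothetical ±2.5 ms direct window. Published estimates differ in how they choose this window, so the function name and default value are illustrative only.

```python
import math


def direct_to_reverberant_ratio(ir, fs, direct_window_ms=2.5):
    """Estimate the DRR (dB) of a single-channel room impulse response.

    ir: impulse-response samples; fs: sampling rate (Hz).
    Direct energy: samples within +/- direct_window_ms of the largest absolute peak.
    Reverberant energy: all samples after that window (samples before it are ignored).
    """
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, stop = max(0, peak - half), min(len(ir), peak + half + 1)
    direct = sum(x * x for x in ir[start:stop])
    reverberant = sum(x * x for x in ir[stop:])
    if reverberant == 0:
        raise ValueError("no energy after the direct window; DRR is undefined")
    return 10.0 * math.log10(direct / reverberant)


# Synthetic example (fs = 10 kHz): unit direct impulse at 1 ms, constant 0.1 tail from 10 ms on.
fs = 10_000
ir = [0.0] * 1000
ir[10] = 1.0
for i in range(100, 1000):
    ir[i] = 0.1
drr_db = direct_to_reverberant_ratio(ir, fs)  # about -9.5 dB (energy ratio 1 : 9)
```

In a real analysis, binaural impulse responses measured at each source distance would replace the synthetic one. A model of the auditory periphery could then be applied to the same signals.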

Data from the localization experiments will be modeled. Standard psychophysical approaches measure a detectability threshold, but with the current experimental design it would be difficult to measure detectability because of methodological constraints imposed by the response method, the procedure, and serial correlation between responses.
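For reference, the detectability measure used in standard psychophysics is the signal-detection sensitivity index d' = z(H) - z(F), where H and F are the hit and false-alarm rates and z is the inverse of the standard normal cumulative distribution function (Macmillan, 2005). The sketch below only shows that computation. It presupposes yes/no (or similar) trials that yield H and F, which is precisely what the current response method does not provide. The function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F) for a yes/no task.

    Both rates must lie strictly between 0 and 1; rates of exactly 0 or 1 would
    need a correction, which is not implemented here.
    """
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates: 84% hits and 50% false alarms give d' close to 1.
sensitivity = d_prime(0.84, 0.5)
```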

There are three possible modeling approaches:

1. Transformation into a classification problem. This approach assumes that subjects create and store a set of classes against which presented stimuli are compared, for instance if subjects use near, mid, and far categories for distance judgments.

2. Analogy with a filtering problem. Current evidence is represented as a probability distribution or a fuzzy set and processed by a Bayesian network, an artificial neural network, or a fuzzy controller, which produces the result as a distribution function or as the activation of an output neuron. Polynomial models could also model the filtering problem, including in the time domain.

3. Hypothesis testing. A rule-based system could evaluate noisy evidence, and logical inference would produce the result as a probability distribution or a fuzzy set on the decision axis.
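Approach 1 can be illustrated with a nearest-prototype classifier. During training, each category stores one prototype (here, the mean of a single scalar cue), and a new stimulus is assigned to the category with the closest prototype. The use of a single DRR-like cue in dB and the helper names are hypothetical assumptions for illustration. The prospectus does not specify how categories would be stored.

```python
from statistics import mean


def learn_prototypes(training):
    """Store one prototype (mean cue value) per category.

    training: iterable of (cue_value, category) pairs from labelled trials.
    """
    by_category = {}
    for value, category in training:
        by_category.setdefault(category, []).append(value)
    return {category: mean(values) for category, values in by_category.items()}


def classify(cue_value, prototypes):
    """Return the category whose stored prototype is closest to the cue value."""
    return min(prototypes, key=lambda category: abs(cue_value - prototypes[category]))


# Hypothetical DRR-like cue values (dB): nearer sources have a higher ratio.
training = [(-2.0, "near"), (0.0, "near"), (-8.0, "mid"), (-10.0, "mid"),
            (-16.0, "far"), (-18.0, "far")]
prototypes = learn_prototypes(training)   # near: -1, mid: -9, far: -17
label = classify(-7.0, prototypes)        # "mid"
```

A shift of the stored prototypes after exposure to a new room would be one way this kind of model could express recalibration.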

The model must be able not only to represent the localization data but also to explain the adaptation paradigm. It could be either incremental or iterative. In incremental models, the model is updated after each single input is processed; in iterative models, the model is updated only after a certain amount of time. If the data analysis shows that a single presentation affected overall performance, an incremental model could be assumed; otherwise, an iterative model will be more appropriate.
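The difference between the two model classes can be sketched with a single hypothetical internal parameter (for example, a listener's bias in mapping a cue to distance) that moves toward the observed values. The learning rates, block size, and function names below are placeholders, not fitted values. The incremental version reacts to every trial. The iterative version stays constant within a block and changes only at block boundaries, which is the signature the data analysis would look for.

```python
def incremental_update(estimate, observations, rate=0.1):
    """Incremental model: update the estimate after every single observation."""
    history = []
    for x in observations:
        estimate += rate * (x - estimate)
        history.append(estimate)
    return history


def iterative_update(estimate, observations, block_size=10, rate=0.5):
    """Iterative model: update the estimate only once per completed block of observations.

    A trailing incomplete block does not trigger an update.
    """
    history = []
    for i in range(1, len(observations) + 1):
        if i % block_size == 0:
            block = observations[i - block_size:i]
            estimate += rate * (sum(block) / block_size - estimate)
        history.append(estimate)
    return history


# Hypothetical sequence: the true value jumps to 1.0 at the start of training.
obs = [1.0] * 20
inc = incremental_update(0.0, obs)   # changes from the first trial on
itr = iterative_update(0.0, obs)     # stays at 0.0 until trial 10
```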

Electrophysiological data from the EEG experiment will be used to study the effect of learning at the neural level. Statistical methods, spectral analysis of the EEG signal, and classification methods will help to analyze the results of the proposed experiment.
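As a simple instance of EEG spectral analysis, the power of an epoch within a frequency band (e.g. the alpha band, taken here as 8-12 Hz) can be computed from its discrete Fourier transform. The sketch below uses a naive O(N^2) DFT so that it needs only the Python standard library. In practice an FFT library would be used. The band edges, the scaling, and the function name are assumptions for illustration.

```python
import cmath
import math


def band_power(signal, fs, f_low, f_high):
    """Sum of |X_k|^2 / N^2 over DFT bins k whose frequency lies in [f_low, f_high] Hz.

    Only non-negative frequencies are used, so the value is suited to comparing
    bands, not to absolute calibration.
    """
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_low <= freq <= f_high:
            coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            power += abs(coeff) ** 2 / n ** 2
    return power


# Synthetic 1 s "epoch" at 250 Hz containing a pure 10 Hz oscillation.
fs = 250
epoch = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
alpha = band_power(epoch, fs, 8, 12)    # about 0.25 (all energy in the 10 Hz bin)
beta = band_power(epoch, fs, 13, 30)    # close to 0
```

Band powers computed this way per epoch and per electrode could then serve as inputs to the statistical and classification methods mentioned above.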

References

Agganis, B. T., Muday, J. A., & Schirillo, J. A. (2010). Visual Biasing of Auditory Localization in Azimuth and Depth. Perceptual and Motor Skills, 111(3), 872-892.

doi:10.2466/22.24.27.PMS.111.6.872-892

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning.

Trends in cognitive sciences, 8(10), 457-64. doi:10.1016/j.tics.2004.08.011

Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration.

Curr Biol, 14(3), 257-62. Retrieved from

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=14761661

Andéol, G., Guillaume, A., Micheyl, C., Savel, S., Pellieux, L., & Moulin, A. (2011). Auditory

efferents facilitate sound localization in noise in humans. The Journal of neuroscience : the

official journal of the Society for Neuroscience, 31(18), 6759-63.

doi:10.1523/JNEUROSCI.0248-11.2011

Ashmead, D. H., LeRoy, D., & Odom, R. D. (1990). Perception of the relative distances of nearby

sound sources. Perception & psychophysics, 47(4), 326-31. Retrieved from

http://www.ncbi.nlm.nih.gov/pubmed/2345684

Blauert, J. (1997). Spatial Hearing (2nd ed.). Cambridge, MA: MIT Press.

Bowen, A. L., Ramachandran, R., Muday, J. a, & Schirillo, J. a. (2011). Visual signals bias auditory

targets in azimuth and depth. Experimental brain research. Experimentelle Hirnforschung.

Expérimentation cérébrale, 214(3), 403-14. doi:10.1007/s00221-011-2838-1

Braasch, J., & Hartung, K. (2002). Localization in the presence of a distracter and reverberation in the

frontal horizontal plane. I. Psychoacoustical data. Acta acustica united with Acustica, 88(6), 942–

955.

Brainard, M. S., & Knudsen, E. I. (1993). Experience-dependent plasticity in the inferior colliculus: A

site for visual calibration in the neural representation of auditory space in the barn owl. Journal

of Neuroscience, 13(11), 4589-4608.

Bronkhorst, A. W., & Houtgast, T. (1999). Auditory distance perception in rooms. Nature, 397(11

February), 517-520.

Brungart, D. S., & Durlach, N. I. (1999). Auditory localization of nearby sources II: Localization of a

broadband source in the near field. Journal of the Acoustical Society of America, 106(4), 1956-

1968.

Brungart, D. S., & Rabinowitz, W. M. (1999). Auditory localization of nearby sources I: Head-related

transfer functions. Journal of the Acoustical Society of America, 106(3), 1465-1479.

Calcagno, E. R., Abregú, E. L., Eguía, M. C., & Vergara, R. (2012). The role of vision in auditory

distance perception. Perception, 41(2), 175-192. doi:10.1068/p7153

Carlile, S., Hyams, S., & Delaney, S. (2001). Systematic distortions of auditory space perception

following prolonged exposure to broadband noise. Journal of the Acoustical Society of America,

110(1), 416-424.

Coleman, P. D. (1962). Failure to localize the source distance of an unfamiliar sound. Journal of the

Acoustical Society of America, 34, 345-346.

Coleman, P. D. (1968). Dual role of frequency spectrum in determination of auditory distance. Journal

of the Acoustical Society of America, 44(2), 631-632.

Durlach, N. I. (1963). Equalization and Cancellation Theory of Binaural Masking-Level Differences.

The Journal of the Acoustical Society of America, 35(8), 1206. doi:10.1121/1.1918675

Ferry, R. T., & Meddis, R. (2007). A computer model of medial efferent suppression in the

mammalian auditory system. The Journal of the Acoustical Society of America, 122(6), 3519-26.

doi:10.1121/1.2799914

Frissen, I., Vroomen, J., & de Gelder, B. (2012). The aftereffects of ventriloquism: the time course of

the visual recalibration of auditory localization. Seeing and perceiving, 25(1), 1-14.

doi:10.1163/187847611X620883

Gardner, M. B. (1968). Proximity image effect in sound localization. Journal of the Acoustical Society

of America, 43(6), 163.

Hartmann, W. M. (1989). Localization of sound in rooms IV: The Franssen effect. The Journal of the

Acoustical Society of America, 86(4), 1366. doi:10.1121/1.398696

Hládek, Ľ., Tomoriová, B., Kopčo, N., & Seitz, A. (2011). Short-term adaptation of auditory distance

perception in a reverberant room. Making sense of sound, Plymouth, UK.

Hofman, P. M., Van Riswick, J. G. A., & Van Opstal, A. J. (1998). Relearning sound localization with

new ears. Nature Neuroscience, 1(5), 417-421.

Kacelnik, O., Nodal, F. R., Parsons, C. H., & King, A. J. (2006). Training-induced plasticity of

auditory localization in adult mammals. PLoS biology, 4(4), e71.

doi:10.1371/journal.pbio.0040071

Karmarkar, U. R., & Buonomano, D. V. (2003). Temporal specificity of perceptual learning in an

auditory discrimination task. Learning & memory (Cold Spring Harbor, N.Y.), 10(2), 141-7.

doi:10.1101/lm.55503

Kedem, B., & Fokianos, K. (2002). Regression Models for Time Series Analysis. Hoboken, NJ, USA:

John Wiley & Sons, Inc. doi:10.1002/0471266981

Kopco, N., Silvera, P., Tskhay, K., Tomoriova, P., & A. S. (2011). Learning of reverberation cues for

auditory distance perception. J Acoust Soc Am. Seattle, WA.

Kopco, N., & Shinn-Cunningham, B. G. (2002). Auditory localization in rooms: Acoustic analysis and

behavior (pp. 109-112). Zvolen, Slovakia.

Kopco, N., & Shinn-Cunningham, B. G. (2004). Effects of Spectral Content on Distance Perception in

Reverberant Space. Daytona Beach, Florida.

Kopco, N., & Shinn-Cunningham, B. G. (2010). Effects of stimulus spectrum on distance perception.

Journal of the Acoustical Society of America, conditionally accepted for publication.

Kopco, N., Lin, I.-F., Shinn-Cunningham, B. G., & Groh, J. M. (2009). Reference frame of the

ventriloquism aftereffect. The Journal of neuroscience : the official journal of the Society for

Neuroscience, 29(44), 13809-14. doi:10.1523/JNEUROSCI.2783-09.2009

Kopčo, N., Schoolmaster, M., & Shinn-Cunningham, B. G. (2004). Learning to judge distance of

nearby sounds in reverberant and anechoic environments. Strasbourg, France: Citeseer. Retrieved from

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.2952&rep=rep1&type=pdf

Larsen, E., Iyer, N., Lansing, C. R., & Feng, A. S. (2008). On the minimum audible difference in

direct-to-reverberant energy ratio. Journal of the Acoustical Society of America, 124(1), 450-461.

Litovsky, R. Y., Colburn, H. S., Yost, W. A., & Guzman, S. J. (1999). The precedence effect. Journal

of the Acoustical Society of America, 106(4), 1633-1654.

Little, A. D., Mershon, D. H., & Cox, P. H. (1992). Spectral content as a cue to perceived auditory

distance. Perception, 21, 405-416. Retrieved from

http://www.perceptionjournal.com/perception/fulltext/p21/p210405.pdf

Lu, Y., & Cooke, M. (2010). Binaural Estimation of Sound Source Distance via the Direct-to-

Reverberant Energy Ratio for Static and Moving Sources. IEEE Transactions on Audio, Speech,

and Language Processing, 18(7), 1793-1805. doi:10.1109/TASL.2010.2050687

Macmillan, N. (2005). Detection theory: A user's guide. Retrieved from

http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Detection+theory:+A+users+guide#0

Mershon, D. H., Ballenger, W. L., Little, A. D., McMurtry, P. L., & Buchanan, J. L. (1989). Effects of

room reflectance and background noise on perceived auditory distance. Perception, 18(3), 403–

416. Pion. Retrieved from http://www.perceptionweb.com/perception/fulltext/p18/p180403.pdf

Mershon, D. H., & Bowers, J. N. (1979). Absolute and relative cues for auditory perception of

egocentric distance. Perception, 8, 311-322.

Mershon, D. H., Desaulniers, D. H., & Amerson, J. (1980). Visual capture in auditory distance

perception: Proximity image effect reconsidered. Journal of Auditory Research, 20, 129-136.

Mershon, D. H., Desaulniers, D. H., Kiefer, S. A., Amerson, J., & Mills, J. T. (1981). Perceived

loudness and visually-determined auditory distance. Perception, 10, 531-543.

Mershon, D. H., & King, L. E. (1975). Intensity and reverberation as factors in auditory perception of egocentric distance. Perception & Psychophysics, 18(6), 409-415.

Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual review of

psychology, 42(1), 135-59. doi:10.1146/annurev.ps.42.020191.001031

Miller, G. A. (1947). Sensitivity to changes in the intensity of white noise and its relation to masking and loudness. Journal of the Acoustical Society of America, 19, 609-619.

Recanzone, G. H. (1998). Rapidly induced auditory plasticity: the ventriloquism aftereffect.

Proceedings of the National Academy of Sciences of the United States of America, 95(3), 869-75.

Retrieved from

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=33810&tool=pmcentrez&rendertype=abstract

Russell, S., & Norvig, P. (2005). Artificial Intelligence: A Modern Approach (p. 1080). New Delhi:

Prentice-Hall.

Schoolmaster, M., Kopco, N., & Shinn-Cunningham, B. G. (2003). Effects of reverberation and

experience on distance perception in simulated environments. Journal of the Acoustical Society

of America, 113, 2285.

Schoolmaster, M., Kopčo, N., & Shinn-Cunningham, B. G. (2004). Auditory Distance Perception in

Fixed and Varying Simulated Acoustic Environments. J Acoust Soc Am (pp. 2459-2459). New

York. Retrieved from http://asadl.org/jasa/resource/1/jasman/v115/i5/p2459_s5?bypassSSO=1

Shinn-Cunningham, B. (2000). Adapting to remapped auditory localization cues: a decision-theory

model. Perception & psychophysics, 62(1), 33-47. Retrieved from

http://www.ncbi.nlm.nih.gov/pubmed/10703254

Shinn-Cunningham, B. G. (2000). Learning reverberation: Implications for spatial auditory displays

(pp. 126-134). Atlanta, GA.

Shinn-Cunningham, B. G. (2001). Localizing sound in rooms. Snowbird, Utah.

Shinn-Cunningham, B. (2000). Learning reverberation: Considerations for spatial auditory displays.

Society, (April), 2-5.

Simpson, W., & Stanton, L. (1973). Head movement does not facilitate perception of the distance of a

sound source. American Journal of Psychology, 86, 151-159.

Strybel, T. Z., & Perrott, D. R. (1984). Discrimination of relative distance in the auditory modality: The success and failure of the loudness discrimination hypothesis. Journal of the Acoustical Society of America, 76, 318-320.

Tomoriova, B., Andoga, R., & Kopco, N. (2007). Contextual Shifts in Sound Localization Induced by

an a priori-known Distractor. Society, 6-6.

Warren, R. M. (1968). Vocal compensation for change in distance. Proceedings of the 6th

International Congress of Acoustics (pp. 61-64). Tokyo.

Weinberger, N. M. (2007). Associative representational plasticity in the auditory cortex : A synthesis

of two disciplines. Learning & Memory, 14, 1-16. doi:10.1101/lm.421807

Wozny, D. R., & Shams, L. (2011). Recalibration of auditory space following milliseconds of cross-

modal discrepancy. The Journal of neuroscience : the official journal of the Society for

Neuroscience, 31(12), 4607-12. doi:10.1523/JNEUROSCI.6079-10.2011

Zahorik, P. (1996). Auditory Distance Perception: A Literature Review.

Zahorik, P. (2001). Estimating sound source distance with and without vision. Optometry and Vision

Science, 78(5), 270-275.

Zahorik, P. (2002a). Direct-to-reverberant energy ratio sensitivity. The Journal of the Acoustical

Society of America, 112(5 Pt 1), 2110-7. Retrieved from

http://www.ncbi.nlm.nih.gov/pubmed/12430822

Zahorik, P. (2002b). Assessing auditory distance perception using virtual acoustics. Journal of the

Acoustical Society of America, 111(4), 1832-1846.

Zahorik, P., Brungart, D. S., & Bronkhorst, A. W. (2005). Auditory distance perception in humans: A

summary of past and present research. Acta Acustica united with Acustica, 91(3),

409-420.

Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source distance.

Nature neuroscience, 4(1), 78-83. doi:10.1038/82931

