
Project Number: 145
Is the Function of the Auditory System to Predict Future Inputs?

Full Write Up

Abstract

The brain evolved to process natural stimuli in order to rapidly make decisions about future actions. We gathered a high-quality natural sound database for use in two studies related to this observation. The database should also be useful in future studies, as a complement to artificial stimuli, which have the limitation that, being artificial, they may produce neural responses irrelevant to the natural function of the nervous system. Our first study examined the hypothesis that the brain's neural code is optimized to best predict future inputs, and used this database to train a mixture density network. We compared the resulting parameters of the network with past recordings from the auditory nerve and found some similarities in their properties, suggesting there may be some value in this hypothesis. In our second study, which was unrelated to the first, we compared the capacity of different artificial stimuli to characterise the spectro-temporal response function (STRF) of auditory cortical neurons. We aimed to assess this capacity by comparing how well the characterisations of the STRF obtained from different artificial stimuli could predict the neural response to our natural sounds.

Introduction

Understanding how the brain encodes natural sensory information is a key question in systems neuroscience. Over the last twenty years, the field of computational neuroscience has developed models to test ideas as to how the central nervous system is able to complete this processing, whilst in vivo studies have investigated the responses of sensory systems to stimuli. However, the artificial stimuli traditionally used in such experiments are simple and unrepresentative of the natural environment1. Therefore, any studies using such stimuli, and the conclusions they draw, are limited.

In this project, we gathered a database of 'natural' sounds, that is, sounds present in the environment of evolutionary adaptedness: the environment that the auditory systems of model animals would have evolved to respond to. We sought to use this database in two separate studies. For the first study, we used the database to train a model based on the idea that the brain evolved to predict future inputs. For the second study, we used the database to test the capacity of artificial stimuli used in past physiological studies to characterise auditory cortical neurons.

Natural Sounds Database- Databases of natural sounds do presently exist2,3, but none are of particularly high quality, suffering from recording problems such as clipping (Figure 1) and microphone noise. Those that are high quality often feature only one kind of natural sound, such as speech. Finally, all such databases have a sampling rate of 44.1 kHz or less, giving a Nyquist frequency23 of 22.05 kHz. However, most neurophysiological studies are carried out on small mammals, many of which hear frequencies well above 22.05 kHz; for example, ferrets hear in the range 0.016-44 kHz16,23. The sound database gathered will comprise both natural and anechoic clips of sounds present in the environment of evolutionary adaptedness, for instance vocalisations and foliage recorded in areas without ambient mechanical sounds. It should be of a high enough quality to be continually useful in a wide range of future physiological studies, especially in animals with hearing above the range of humans.

Computational Modeling- This sound database will be used as training data for a model related to an artificial neural network. The network will be trained to predict future values of sounds, given their past values.

One of the few well-tested principles of the nervous system is the efficient coding hypothesis4: the idea that neural systems evolved to efficiently represent natural stimuli. It is important to note that most instantiations of this idea consider only the representation of the current stimulus, e.g. the currently observed image. By comparing the predictions of models that embody this hypothesis with data from in vivo experiments, the predictive power of such models can be tested and, if borne out, can suggest what the function (but not necessarily the mechanism) of the neural system in question is. This approach has so far elucidated the function of different features of the nervous system, including visual5,6 and auditory7,8 coding, by finding efficient ways to represent stimuli.

However, this approach has limitations. For instance, the brain discards some of the information9 present in a stimulus rather than encoding all of its features. Understanding what information is important to the brain and must be processed is a key problem in neuroscience. One idea is that the brain encodes the information within stimuli that is useful for predicting the future state of those stimuli10. From a behavioral perspective this seems intuitive, as a key role of the nervous system is to make decisions about future actions that will maximize the organism's chance of survival and reproduction. The reaction time of humans to an auditory stimulus is on the order of 160 ms11, so in order to bypass this delay and make optimal decisions about future actions, the present and future state of the world must be predicted by the auditory system from past states of the world.

Figure 1: An example of clipping (taken from 'crunching leaves louder') in the Pittsburgh natural sound database

One of our aims, therefore, was to use our sound database as input for a computational model (a mixture density network) which embodies this concept of a predictive code. We trained the model to predict future values of the natural sounds we gathered, given past values of these sounds. The results of this model were compared against data from in vivo studies of the auditory nerves of model organisms, and against past7,12 models of the auditory system which attempted to produce a set of filters that minimise the statistical dependence amongst the filter outputs (i.e. efficiently represent the current sound). These efficient coding models give filters that are back to front8: steep and high-frequency in the far past, and shallow and low-frequency in the near past, a problem that we aimed to overcome with our predictive model.

Auditory Cortical Neuron Responses- We also sought to evaluate artificial stimuli commonly used in in vivo auditory neuroscience studies. Such studies use these stimuli to characterise the spectro-temporal receptive fields (STRFs) of neurons within the auditory cortex of animals. STRFs give the sensitivity of neurons to different frequencies across time. The STRF can be seen as a linear model that transforms a time-varying stimulus, such as a spectrogram, into a prediction of neural firing rate13,14. Studies have previously used dynamic random chords (DRCs)15,16 or temporally orthogonal ripple combinations (TORCs)17 to characterise the STRFs of neurons, though a cross-stimulus comparison has not previously been made. In addition, we also tested two novel stimuli: randomly modulated noise and variable-speed dynamic random chords (vsDRCs). We aimed to see which stimulus would give an STRF that could best predict the peristimulus time histogram (PSTH) of ferret cortical neurons in response to the natural sounds we had gathered. However, due to technical difficulties, we were limited in the amount of data we could gather, and so this analysis was only preliminary.

Materials and Methods

Natural Sounds Database- In order to carry out our study (and for use in future physiological studies), we required roughly an hour of sounds gathered from natural and anechoic environments. The sounds were gathered using a Zoom H2 recorder in the anechoic chamber (a sonically insulated chamber with walls lined with foam to absorb, rather than reflect, sound) of the Auditory Neuroscience laboratory in the Sherrington Building, Oxford, and also in areas of the Oxfordshire countryside isolated from non-natural sounds. The gathered files were hand-edited in Audacity to remove any artifacts or distortions, such as clipping or microphone noise, and were systematically filed in a provisional database of 1-20 s clips in .wav format.

As we could only hold a neuron for a limited time, we could only play a small set of natural sounds in our physiological experiment. Because of this, we needed to ensure that the sounds we used were a good representation of the range of natural sounds. We therefore generated a sub-database of 1-second sounds by the following method:

First, a set of 'good' 1-second sound segments was found from our gathered database. For each sound file, the sound was first resampled to the sampling rate of the physiology sound hardware (97,656 Hz). A high-pass 5th-order Butterworth filter was then applied at 200 Hz to remove any low-frequency noise. Multiple 1-second segments were then taken from the file, with starting points every 100 samples. After this, in order to favour sounds that come on and off within the 1 second, only those segments were kept in which the root mean square (RMS) magnitude in both the first 10 ms and the last 10 ms was at least 18 dB less than the RMS magnitude in the time between. To ensure that sounds (such as rain) which did not show such isolated events were included, one randomly chosen segment was also taken from each file. Next, overlapping segments were excluded, keeping one segment from each set of overlapping segments. The sounds were then placed into one of 12 categories (Table 1). This process left 709 'good' 1-second sound segments.
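The segment-selection step can be sketched as follows. This is a minimal illustration assuming SciPy is available; the 200 Hz high-pass filter, 100-sample hop and 18 dB onset/offset criterion follow the text, while the function names and file handling are our own.

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

FS_TARGET = 97656                      # physiology hardware sampling rate (Hz)
SEG_LEN = FS_TARGET                    # 1 s segments
EDGE_LEN = int(0.010 * FS_TARGET)      # 10 ms edges

def rms_db(x):
    """RMS magnitude in dB (relative, arbitrary reference)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def good_segments(sound, fs_in, hop=100, threshold_db=18):
    """Return the resampled, high-passed sound and the start indices of 1 s
    segments whose first and last 10 ms are at least `threshold_db` quieter
    (in RMS) than the middle of the segment."""
    # Resample to the physiology sampling rate, then 200 Hz high-pass
    # (5th-order Butterworth), as described in the methods.
    x = resample_poly(sound, FS_TARGET, fs_in)
    sos = butter(5, 200, btype='highpass', fs=FS_TARGET, output='sos')
    x = sosfilt(sos, x)

    starts = []
    for s in range(0, len(x) - SEG_LEN, hop):
        seg = x[s:s + SEG_LEN]
        edges = max(rms_db(seg[:EDGE_LEN]), rms_db(seg[-EDGE_LEN:]))
        middle = rms_db(seg[EDGE_LEN:-EDGE_LEN])
        if middle - edges >= threshold_db:
            starts.append(s)
    return x, starts
```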

We also carried out some preliminary analysis of the intensity of these sounds above 20 kHz (roughly the Nyquist limit of present natural sound databases). To do this, we recorded silence in the anechoic chamber using the Zoom H2 and found its power spectral density at 1 kHz. For each category of sound, we then found the power spectral density at each frequency relative to the power spectral density of silence at 1 kHz. Next, for each category of sound, we plotted a histogram of the distribution of power spectral densities at each frequency. We visually inspected the distribution of power spectral densities for frequencies above 20 kHz (Figure 5), compared to that for frequencies below 20 kHz.
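As an illustration of the relative power spectral density calculation, the sketch below uses Welch's method; the silence recording, FFT length and the way categories are pooled are assumptions for illustration rather than the exact analysis code.

```python
import numpy as np
from scipy.signal import welch

def psd_relative_to_silence(sound, silence, fs):
    """PSD of `sound` in dB relative to the PSD of the anechoic silence
    recording at 1 kHz (the reference used in the text)."""
    f, p_sound = welch(sound, fs=fs, nperseg=4096)
    _, p_silence = welch(silence, fs=fs, nperseg=4096)
    ref = p_silence[np.argmin(np.abs(f - 1000.0))]   # silence PSD at 1 kHz
    return f, 10 * np.log10(p_sound / ref)

# The values above 20 kHz for each sound in a category are what go into the
# per-category histograms, e.g.:
# f, rel_db = psd_relative_to_silence(sound, silence, fs=97656)
# high_band_values = rel_db[f > 20e3]
```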

Table 1: The sound files in our refined natural sound database

Category of Sound    Number of Files (each 1 s in length)
Leaves               32
Twigs                14
Gravel               42
Heavy Rain           12
Water                10
Birds                6
Sheep                13
Voice1               150
Voice2               213
Voice3               89
Voice4               50
Voice5               77

Computational Modeling- The task of the model is to predict future sound inputs given past sound inputs. The model will be trained to do this task, and we will then examine its parameters to see if they are similar to those found in physiological studies of the auditory nerve. We will use a model related to an artificial neural network (a.k.a. a multilayer perceptron, or MLP) to do this task. Although the cochlea and auditory nerve are not simply a network of neurons, they form an interconnected system, which can be modeled functionally as a network and implemented as an artificial neural network.

First we must preprocess the sounds. Because of limitations in computing power, we downsample our sounds to a sampling rate of 4000 Hz and high-pass them at 400 Hz.

Figure 2: A simple 3-layer backpropagation network (redrawn from ref. 30)

Then, for each sound file, we take many snippets of the sound. Each snippet is 60 samples (15 ms) long. We take 200,000 snippets at random from across all the sound files. The first 50 samples (12.5 ms) of each snippet are used as input to the model, and the model is trained to predict the corresponding last 10 samples (2.5 ms) of the snippet.
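A sketch of this snippet extraction is given below; it assumes the sounds have already been downsampled to 4 kHz and high-passed at 400 Hz, and the random-number handling is illustrative.

```python
import numpy as np

SNIP_LEN, N_PAST, N_FUTURE = 60, 50, 10   # 15 ms, 12.5 ms, 2.5 ms at 4 kHz
N_SNIPPETS = 200_000

def make_training_set(sounds, rng=np.random.default_rng(0)):
    """Draw random 60-sample snippets from a list of preprocessed sounds and
    split each into 50 past samples (input) and 10 future samples (target)."""
    X = np.empty((N_SNIPPETS, N_PAST))
    T = np.empty((N_SNIPPETS, N_FUTURE))
    for n in range(N_SNIPPETS):
        snd = sounds[rng.integers(len(sounds))]
        s = rng.integers(len(snd) - SNIP_LEN)
        snippet = snd[s:s + SNIP_LEN]
        X[n] = snippet[:N_PAST]       # past values: model input
        T[n] = snippet[N_PAST:]       # future values: prediction target
    return X, T
```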

Multilayer perceptron (MLP)

The first supervised learning technique we used involved a backpropagation18 network (Figure 2), which was used to train an MLP by minimizing a sum-of-squares error function. This method comprised two distinct stages. The predicted output z is given by:

z_k(n) = \sum_{j=1}^{J} w_{kj} \, f\!\left( \sum_{i=1}^{I} w_{ji} \, x_i(n) + b_j \right)     (1)

where x_i(n) is component i of vector x(n), the past values of a sound snippet, with i = 1 being the present and going back to i = I into the past, where I = 50; n is the snippet number, running from n = 1 to n = N, where N = 200,000; w_ji is an I x J matrix of weights on the inputs x_i, where J = 50; b_j are the biases; f( ) is some non-linearity (in our case tanh); and w_kj is a J x K matrix of weights to the output z(n). z_k(n) is the kth component of vector z(n), with k = 1 being the present and going out to k = K into the future, where K = 10. z(n) is the prediction by the network of the target t(n), the future values of x(n). Equation 1 is optimized to predict t(n) over all snippets by minimizing the squared difference between the prediction and the true values of the future sounds:

E = \sum_{n=1}^{N} \sum_{k=1}^{K} \left( z_k(n) - t_k(n) \right)^2     (2)

The derivatives of this error function are then taken with respect to the weights. In the second stage of error backpropagation, these derivatives are used to adjust the weights in equation (1) so as to minimize the difference between the predicted values of the future sound (z) and the actual values (t). The results of our MLP are not shown.
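For concreteness, equations (1) and (2) can be written in a few lines of NumPy. This shows only the forward pass and the error; the gradient computation (done by error backpropagation in the original work) is omitted, and the variable names are ours.

```python
import numpy as np

def mlp_forward(X, W_in, b, W_out):
    """Equation (1): hidden tanh layer followed by a linear output layer.
    X: (N, I) past samples; W_in: (J, I); b: (J,); W_out: (K, J)."""
    H = np.tanh(X @ W_in.T + b)     # hidden-unit activations f(.), shape (N, J)
    return H @ W_out.T              # predicted future samples z, shape (N, K)

def sum_squared_error(Z, T):
    """Equation (2): summed squared difference between predictions and targets."""
    return np.sum((Z - T) ** 2)
```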


Mixture Density Network

Error backpropagation is limited in its description of a highly variable, 'noisy' input, as it can only predict the expected value of each future sample, rather than the full distribution over future values that a mixture density network (MDN) can represent19 (Figure 3). An MDN consists of a multilayer perceptron whose output is used to parameterize a mixture of Gaussian distributions, with a likelihood function of

p(t \mid x) = \sum_{k=1}^{K} \pi_k(x) \, \mathcal{N}\!\left( t \mid \mu_k(x), \sigma_k^2(x) I \right)     (3)

where π_k represents the mixing coefficients, and μ_k(x) and σ_k(x) represent the means and variances of the mixture components respectively. These three sets of parameters are given by the output of the multilayer perceptron, which depends on the input (i.e. the past values, x(n)). p(t|x) gives the conditional probability distribution of t, given x. The likelihood of the target data (the future) under the above distribution, conditional on the input data (the past), is maximized with respect to the weights of the multilayer perceptron. This is equivalent to minimizing the negative logarithm of the likelihood function with respect to w, over all the data, where w denotes all the weight matrices, as given in equation 4:

E(w) = -\sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k(x(n)) \, \mathcal{N}\!\left( t(n) \mid \mu_k(x(n)), \sigma_k^2(x(n)) I \right) \right\}     (4)

In order to test the MDN, 100,000 snippets were taken from the files in the sound database, each comprising sixty samples (15 ms) after the sounds had been downsampled to a rate of 4 kHz. The first 50 of these samples were used as input, and the network was trained to predict the last 10 samples (t).

The MDN also had 50 input units and 50 hidden units. As output it had 30 ten-dimensional isotropic Gaussians, parameterised by a total of 360 parameters: 30 means each consisting of 10 values, 30 variances, and 30 mixing weights. As described above, the sum of these Gaussians gives the probability distribution over the 10-sample section to be predicted. The Gaussians were initialised by first applying the k-means algorithm to the target data; k-means clusters the data into k clusters (in our case k = 30) by minimising the total squared distance between the cluster centres and the data. The positions of the k means gave the initial centres of the 30 Gaussians, and the relative number of data points associated with each centre gave its weighting. The variance of each Gaussian was set to the distance to the nearest other data point (or to 1 if this distance was 0). The input weights were initialised using Gaussian random variables with a variance of 1/2000. The MDN was optimised using a scaled conjugate gradient algorithm for 100,000 iterations.
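A sketch of the negative log-likelihood of equations (3) and (4), together with the k-means initialisation described above, is given below. It assumes the network has already produced the mixture parameters for each snippet; the variable names and the use of SciPy's kmeans2 are our own choices, not the original implementation (which used a scaled conjugate gradient optimiser).

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def mdn_negative_log_likelihood(T, log_pi, mu, log_sigma2):
    """Equation (4): negative log-likelihood of targets T (N x D) under a
    mixture of K isotropic Gaussians (equation 3), one mixture per input.
    log_pi:     (N, K)    log mixing coefficients (already normalised)
    mu:         (N, K, D) component means
    log_sigma2: (N, K)    log of the isotropic variances
    """
    N, K, D = mu.shape
    # Squared distance of each target from each component mean.
    d2 = np.sum((T[:, None, :] - mu) ** 2, axis=-1)              # (N, K)
    # Log of each weighted isotropic Gaussian density.
    log_norm = -0.5 * D * (np.log(2 * np.pi) + log_sigma2)
    log_comp = log_pi + log_norm - 0.5 * d2 / np.exp(log_sigma2)
    # Log-sum-exp over components, then negative sum over snippets.
    m = np.max(log_comp, axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))
    return -np.sum(log_p)

def init_components(T, K=30):
    """k-means initialisation: cluster centres become the initial Gaussian
    means, and the fraction of points per cluster the initial weightings.
    (Variances would then be set from nearest-neighbour distances, as in the text.)"""
    centres, labels = kmeans2(T, K, minit='points')
    weights = np.bincount(labels, minlength=K) / len(T)
    return centres, weights
```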

Auditory Cortical Neuron Responses-

Experimental Set-up

To obtain STRFs, electrophysiological data were recorded from a male ferret using silicon probe electrodes (Neuronexus Technologies, Ann Arbor, Michigan) with 16 sites on a single probe, vertically spaced at 50 μm or 150 μm. Responses were elicited using Panasonic RPHV27 earphones (Bracknell, UK) coupled to otoscope specula inserted into each ear canal, driven by Tucker-Davis Technologies (Alachua, Florida) System III hardware at a 97.656 kHz sample rate. Sounds were played after being filtered through a simple cochleagram with 23 filters. Anesthesia was induced using medetomidine hydrochloride (Domitor; 0.022 mg kg-1 h-1) and ketamine (Ketaset; 5 mg kg-1 h-1). Anesthesia was maintained with an intravenous infusion (5 ml/h) of this mixture in physiological saline containing 5% glucose; animals also received a single subcutaneous dose of 0.06 mg kg-1 h-1 atropine sulfate and subcutaneous doses of 0.5 mg/kg dexamethasone every 12 h, to reduce bronchial secretions and cerebral edema respectively. All animal procedures were approved by the local ethical review committee and performed under license from the United Kingdom Home Office.

Stimuli

As stimuli we used DRCs, vsDRCs, TORCs, randomly modulated noise and our natural sounds. A DRC consists of multiple simultaneous pure tones spanning a range of frequencies. Every epoch (5 ms), the sound intensity of each tone is drawn from a uniform distribution between 10 and 70 dB. vsDRCs are the same as a standard dynamic random chord, except that each tone remains at the same intensity for 1-6 epochs, the number of epochs being chosen from a uniform distribution. A TORC is the sum of a set of ripples: Gaussian white noise modulated at a certain modulation frequency over time and at a different modulation frequency over sound frequency; for a full description see reference 31. For the randomly modulated noise, Gaussian white noise was adjusted to have, on average, a pink (1/f) power spectrum and a pink modulation spectrum over both time and frequency. This produced a stochastic stimulus with a power spectrum and modulation spectrum matching the average spectra of natural sounds; in comparison with the TORC stimuli, it has a stochastic aspect without the ordered phase structure. For each of the artificial stimuli, two 30-second sound files were played, except for the TORCs, where we played three 30-second sound files, each consisting of 10 consecutive TORCs. This was longer than for the other sounds in order to allow comparison with results from previous experiments with TORCs.

Figure 3: A cartoon of a mixture density network (redrawn from ref. 13)

To generate our natural sound stimuli, 60 sounds were randomly chosen from the database of processed sounds in Table 1 by first randomly selecting a category and then randomly selecting a file from within that category. To ensure the chosen sounds were representative of natural sounds, this process was repeated 2000 times and the set of 60 sounds was chosen whose power spectrum and modulation spectrum were closest to those of a large database of natural sounds (i.e. 1/f and 1/f^1.5 respectively; formulae taken from Singh & Theunissen20). These 60 natural sounds were then placed in two sound files, each containing 30 randomly ordered natural sound segments with a 250 ms gap between each.

All sound types were renormalized to a final RMS level of 80 dB SPL. The 11 sound files (2 DRCs, 2 vsDRCs, 3 TORCs, 2 randomly modulated noise, and 2 natural sound files) were played in a random order. This was done 10 times, with a new random order each time.
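As an illustration, a DRC of the kind described above could be generated as follows; the choice of tone frequencies, the absence of on/off ramps between epochs, and the final normalisation (before scaling to 80 dB SPL on the playback hardware) are simplifying assumptions.

```python
import numpy as np

def make_drc(freqs, duration_s=30.0, epoch_s=0.005, fs=97656,
             level_range_db=(10, 70), rng=np.random.default_rng(0)):
    """Dynamic random chord: simultaneous pure tones whose levels are
    redrawn from a uniform distribution (10-70 dB) every 5 ms epoch."""
    n_epochs = int(duration_s / epoch_s)
    n_per_epoch = int(epoch_s * fs)
    t = np.arange(n_epochs * n_per_epoch) / fs
    drc = np.zeros_like(t)
    for f in freqs:
        # One random level per epoch, held constant within the epoch.
        levels_db = rng.uniform(*level_range_db, size=n_epochs)
        amp = 10 ** (levels_db / 20)            # dB -> linear amplitude
        amp = np.repeat(amp, n_per_epoch)       # hold each level for one epoch
        drc += amp * np.sin(2 * np.pi * f * t)
    return drc / np.max(np.abs(drc))            # normalise before RMS scaling
```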

Data Analysis

The raw neural response traces from the 32 electrodes were sorted into spike trains from putative neurons using spikemonger, an in-house spike-sorting algorithm. Then, for each neuron, the response to each stimulus over time was obtained by taking the spike count in 10 ms windows placed every 5 ms over the course of the sound. This was averaged over the 10 repeats, and the average count was converted to an average rate by dividing by the window size (10 ms). This produced a PSTH, denoted by r, where rt is the spike rate at time t.
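A sketch of this PSTH computation is given below: spike counts in 10 ms windows stepped every 5 ms, averaged over the 10 repeats and converted to a rate. Spike sorting itself is not shown, and the input format (a pooled array of spike times) is an assumption for illustration.

```python
import numpy as np

def psth(spike_times, duration_s, win_s=0.010, step_s=0.005, n_repeats=10):
    """PSTH as described above. `spike_times` is a flat array of spike times
    (in seconds) pooled over all repeats of one stimulus."""
    starts = np.arange(0.0, duration_s - win_s + 1e-9, step_s)
    rate = np.empty(len(starts))
    for i, t0 in enumerate(starts):
        n = np.count_nonzero((spike_times >= t0) & (spike_times < t0 + win_s))
        rate[i] = n / n_repeats / win_s      # average count -> spikes per second
    return starts, rate
```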

The stimuli were then processed into cochleagrams, giving the power of the sound as a function of time and sound frequency. The cochleagram was constructed using a power-density short-time Fourier transform (STFT) with a 10 ms Hamming window and 5 ms overlap. The power spectral density was then summed over frequency using a number of triangularly weighted bins whose widths correspond to the equivalent rectangular bandwidth (ERB) in the cat (ERB widths in the ferret are assumed to be similar). Subsequently, the log10 was taken and all values below -2 were set to -2. This gives a very simple cochleagram. Then, for each response bin at time t, the section of cochleagram preceding it was taken, extending τ steps into the past. Thus each response rt has a corresponding preceding cochleagram segment Xf(t-τ).
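The cochleagram and the lagged segments Xf(t-τ) can be sketched as follows. The triangular band centres are placed on a logarithmic axis here as a stand-in for the cat ERB spacing used in the text, and the band edges and floor value are illustrative.

```python
import numpy as np
from scipy.signal import stft

def simple_cochleagram(sound, fs, n_bands=23, floor=-2.0):
    """Very simple cochleagram: power-density STFT (10 ms Hamming window,
    5 ms hop), power summed into triangular frequency bands, log10, floored."""
    nper = int(0.010 * fs)
    f, t, Z = stft(sound, fs=fs, window='hamming', nperseg=nper, noverlap=nper // 2)
    power = np.abs(Z) ** 2
    centres = np.geomspace(200, 0.9 * fs / 2, n_bands)   # assumed band centres
    edges = np.concatenate(([centres[0] / 1.2], centres, [centres[-1] * 1.2]))
    coch = np.empty((n_bands, power.shape[1]))
    for b in range(n_bands):
        # Triangular weighting rising from edges[b] to edges[b+1], falling to edges[b+2].
        w = np.interp(f, [edges[b], edges[b + 1], edges[b + 2]], [0.0, 1.0, 0.0])
        coch[b] = w @ power
    return np.maximum(np.log10(coch + 1e-20), floor), t

def lagged_segments(coch, n_lags):
    """For each time bin t, the preceding cochleagram segment X_{f,(t-tau)}."""
    F, T = coch.shape
    X = np.zeros((T, F, n_lags))
    for tau in range(n_lags):
        X[tau:, :, tau] = coch[:, :T - tau].T
    return X.reshape(T, F * n_lags)      # one flattened row per response bin
```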

The STRF is the matrix, W, which best predicts the PSTH from the preceding cochleagram segment, Xf(t-τ), where t is time, f is sound frequency and τ is time into the past. We want to find the W which minimises the difference between the response rates and the product of the STRF with the preceding cochleagram segment:

E = \sum_{t} \left( r_t - y_t \right)^2     (5)

where

y_t = \sum_{f} \sum_{\tau} W_{f\tau} \, X_{f(t-\tau)}     (6)

However, as there are too many parameters in W to find a clear minimum, an additional constraint is required: that only a few values of W may be large. This is reasonable, as most delays and frequencies are expected not to influence the neuron. The error function thus becomes:

E = \sum_{t} \left( r_t - y_t \right)^2 + \lambda \sum_{f,\tau} \left| W_{f\tau} \right|     (7)

We then used the MATLAB function minFunc27 to minimise E with respect to W. The data were divided into three parts: 80% was used to fit the STRF, 10% for cross-validation, and 10% to test how well the STRF predicted the PSTH. The value of λ was set by cross-validation; that is, the minimization was carried out for various values of λ, and the value with the least error in predicting the cross-validation set was used. E, appropriately scaled, was the measure of prediction error.
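The original fit used the MATLAB function minFunc; as a rough Python stand-in, the sketch below uses scikit-learn's Lasso for the L1-penalised least-squares fit of equation (7), with the 80/10/10 split and λ (alpha) chosen on the cross-validation set. The candidate λ values and error measure are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_strf(X, r, lambdas=(1e-4, 1e-3, 1e-2, 1e-1)):
    """Fit the STRF weights W by L1-regularised least squares, choosing
    lambda on a held-out cross-validation set.
    X: (T, F * n_lags) lagged cochleagram segments; r: (T,) PSTH."""
    T = len(r)
    i_fit, i_val = int(0.8 * T), int(0.9 * T)       # 80% fit, 10% val, 10% test
    best = None
    for lam in lambdas:
        model = Lasso(alpha=lam, max_iter=10000)
        model.fit(X[:i_fit], r[:i_fit])
        val_err = np.mean((model.predict(X[i_fit:i_val]) - r[i_fit:i_val]) ** 2)
        if best is None or val_err < best[0]:
            best = (val_err, lam, model)
    _, lam, model = best
    test_err = np.mean((model.predict(X[i_val:]) - r[i_val:]) ** 2)
    return model.coef_, lam, test_err   # coef_ reshaped to (F, n_lags) gives the STRF
```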

Results

Natural Sounds Database- In total, 8568 seconds of natural sounds were recorded, which were refined into a database of 12 categories of natural sounds totaling 709 s of data (Table 1). Spectrograms characteristic of each sound category in Table 1 are plotted in Figure 4.

By inspection, it is clear that there is information in our recordings above 20 kHz (Figure 4), in the form of brief transients. We examined this in more detail by taking the distribution of power spectral density over all the sound files within each category (plotted in Figure 5). These show that all stimuli have some information above 20 kHz, though this is most pronounced for the sounds in the 'leaves', 'twigs' and 'birds' categories.

Figure 4: Spectrograms characteristic of each sound category from Table 1

Computational Modeling- We began our analysis by looking at the input weights wji of the optimised mixture density network (Figure 6a). Each subfigure in this plot shows the input weights to one hidden unit of the network. These are the weights we would expect to correspond to the frequency-tuned cochlear filters whose properties can be recorded from the auditory nerve. The subplots have been reversed in time so that they appear as impulse responses, allowing comparison with impulse responses recorded from the auditory nerve of cats21,22. The input weights wji show an oscillatory form and had most of their power towards more recent time points (on the left of each subplot). These findings are in line with previous physiological measurements of cochlear filters, and in contrast to the Lewicki model7, which produced back-to-front filters.

These weights were then examined in more detail by looking at their magnitude spectra under a Fourier transform (Figure 7a); each column is the input to one hidden unit of the network. Most units show a distinct frequency peak. In Figure 7b, the number of units with a best frequency in each octave above 125 Hz is plotted. In the cochlea, we would expect a roughly equal number of units per octave; in our model, however, low frequencies were over-represented.

Each unit's frequency tuning was also examined in terms of its Q10dB (the centre frequency divided by the bandwidth, as measured by the range of frequencies with intensity no less than 10 dB below the maximal intensity), which, in auditory nerve physiological data, has a fixed relationship with the best frequency of the neuron. As Figure 8a shows, plotting the Q10dB against the centre frequency shows a clear correlation (R = 0.3) which is significant (p = 0.032). This was in line with previous data from cat auditory nerve fibres23,24, which are plotted alongside our data in the same figure. A similar relationship was found in the Lewicki model (Figure 8b).
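The Q10dB of each model unit can be computed directly from its weight vector; a minimal sketch, assuming the 4 kHz model sampling rate and a 10 dB criterion applied to the magnitude spectrum, is:

```python
import numpy as np

def q10db(weight_vector, fs=4000.0):
    """Q10dB of a hidden unit: best frequency divided by the bandwidth of the
    region of its magnitude spectrum within 10 dB of the peak."""
    spec_db = 20 * np.log10(np.abs(np.fft.rfft(weight_vector)) + 1e-12)
    freqs = np.fft.rfftfreq(len(weight_vector), d=1.0 / fs)
    peak = np.argmax(spec_db)                          # best (centre) frequency bin
    within_10db = freqs[spec_db >= spec_db[peak] - 10]
    bandwidth = within_10db.max() - within_10db.min()
    return freqs[peak] / bandwidth if bandwidth > 0 else np.inf
```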


Figure 5: The histogram of intensities (as measured by power spectral densities) across frequencies for the refined natural sound database


Figure 6: a) The weight vector to each hidden unit in our model. b) The cochlear filters measured in cats in references 21 and 22.


Figure 7: a) The magnitude spectrum of the weight vector to each hidden unit under a Fourier transform; each column is a weight vector. b) A histogram of the preferred frequency of the units, as assessed by the maximum value of each column in plot a).

Figure 8: a) Our plot of centre frequency vs. Q10dB for each hidden unit (black dots) and a line of best fit (black line). Overlaid on this are the best-fit lines from past physiological studies (references 23 [blue] and 24 [red]).


Auditory Cortical Neuron Responses- The STRFs for the five stimulus types are shown in Figures 9a and 9b. Unfortunately, the neural responses from the experiment were too noisy to allow a rigorous analysis of how well each stimulus type can predict the response to natural sounds; only two neurons showed sufficient regularity for even a preliminary analysis. We can, however, still examine the STRFs for each stimulus type, although any features noted are based on a very small sample size and so may not be representative.

From Figures 9a and 9b, the DRC STRFs look the noisiest, but have a preferred frequency similar to that of the STRFs from natural sounds. The STRFs from the TORCs and the randomly modulated noise have a width of frequency tuning more similar to that of the STRFs from natural sounds, but their preferred frequency is slightly higher. The cleanest STRFs appear to come from the randomly modulated noise stimuli. The natural sound STRFs also appear narrower in time than the STRFs from any of the artificial stimuli.

Conclusions

Natural Sounds Database- In total, 8568 seconds of natural sound were initially gathered, which translated into 709 seconds of stimulus after filtering for snippets that complied with the conditions outlined in our methods (Table 1). Most importantly, these files contained no distortions, and analysis of the spectra of these sounds showed at least some power above 20 kHz (Figure 5), with the 'leaves', 'twigs' and 'birds' categories showing pronounced power above this frequency. Such content is completely lacking in previous sound databases with a sampling rate of 44.1 kHz, reducing their relevance to studies involving model animals which can hear frequencies above 20 kHz.

Figure 9: a) and b) The STRFs (neural responsivity as a function of sound frequency and delay) for two neurons (a and b), as measured using the 5 different sound types

We have also demonstrated two uses of our natural sounds database here with encouraging, though

preliminary, results.

Computational Modeling- Our modeling studies produced a mixture density network that could predict the future values of a sound and, more importantly, contained units whose frequency tuning had some similarity to data from the auditory nerve, suggesting that the idea of prediction may have some value in describing the function of the cochlea. The weight vector to each hidden unit showed a degree of oscillation that was, to an extent, similar to that found in the auditory nerve. The steepest rise of the envelope of our weighting filters was nearer to the present, the reverse of the weighting filters produced by the Lewicki model, but similar to the experimental data we compared with. The centre frequency/Q10dB relationship per unit that the model produced showed a positive slope similar to that found in past in vivo studies, although the absolute values were lower by a factor of approximately 2.

Our model had limitations. Firstly, the weight vectors to the hidden units did not oscillate over time as much as found in the physiological literature. Secondly, we saw an over-representation of low frequencies compared to the even distribution found in the cochlea. To overcome these limitations, anisotropic Gaussians could be used (which might better allow the variance of the predictions to differ as one moves into the future), or convolutional neural networks similar to those described by LeCun25 could be used.

Auditory Cortical Neuron Responses- Although our experiment did not yield enough data for rigorous quantitative analysis, the results we did gather suggest that the randomly modulated noise might produce the cleanest, most reliable STRFs. However, which kind of stimulus would best predict the responses to natural sounds remains unclear due to the lack of clean data.

Although the results of our study were interesting, at this stage our data are still very preliminary, and further experiments are needed both to confirm the benefits of our sound database and to validate our approach to modeling the auditory system.

The hypothesis that the function of the auditory system10 (as well as of other systems26,27) is one of prediction has only recently emerged, and the use of an MDN model to investigate it is entirely novel. As mentioned, one of the primary goals of this project was to create a sound database that could be used in further investigation within the field, and so it is hoped not only that our findings might spur future studies, but that the database we gathered might be an integral part of them.

A similar paradigm of predictive coding is already emerging in the field of vision, so adapting our model to other modalities would be a worthwhile future task. As good natural image databases already exist28,29, we could readily apply our model to these.

In conclusion, we have gathered a large database of natural sounds that will prove valuable in further investigation of the auditory system and will complement artificial stimuli, which, while useful, have the limitation that they may induce neural responses without physiological relevance.


References

1. Felsen, G. and Dan, Y. "A natural approach to studying vision." Nat. Neurosci. 8(12), 1643-1646 (2005)
2. Pittsburgh Natural Sound Database, http://www.cnbc.cmu.edu/cplab/data_NaturalSounds.html
3. Cornell Lab of Ornithology, Macaulay Library of Natural Sounds, http://vivo.cornell.edu/display/individual5547
4. Barlow, H. B. "Possible principles underlying the transformations of sensory messages." In Sensory Communication (1961)
5. Olshausen, B. and Field, D. "Emergence of simple-cell receptive field properties by learning a sparse code for natural images." Nature 381(6583), 607-609 (1996)
6. Lewicki, M. and Olshausen, B. "Probabilistic framework for the adaptation and comparison of image codes." J. Opt. Soc. Am. A 16(7), 1587 (1999)
7. Lewicki, M. "Efficient coding of natural sounds." Nat. Neurosci. 5(4), 356-363 (2002)
8. Dean, I., Harper, N. and McAlpine, D. "Neural population coding of sound level adapts to stimulus statistics." Nat. Neurosci. 8(12), 1684-1689 (2005)
9. Mesgarani, N. and Chang, E. "Selective cortical representation of attended speaker in multi-talker speech perception." Nature 485, 233-236 (2012)
10. Winkler, I., Denham, S. and Nelken, I. "Modeling the auditory scene: predictive regularity representations and perceptual objects." Trends Cogn. Sci. 13(12), 532-540 (2009)
11. Welford, A. T. Reaction Times (1980)
12. Smith, E. and Lewicki, M. "Efficient auditory coding." Nature 439(7079), 978-982 (2006)
13. Zhao, L. and Zhaoping, L. "Understanding auditory spectro-temporal receptive fields and their changes with input statistics by efficient coding principles." PLoS Comput. Biol. 7, e1002123 (2011)
14. Theunissen, F. E., David, S. V., Singh, N. C., Hsu, A., Vinje, W. E. and Gallant, J. L. "Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli." Network: Computation in Neural Systems 12(3), 289-316 (2001)
15. Rabinowitz, N., Willmore, B., Schnupp, J. and King, A. "Contrast gain control in auditory cortex." Neuron 70(6), 1178-1191 (2011)
16. deCharms, R. C., Blake, D. and Merzenich, M. "Optimizing sound features for central auditory neurons." Science 280, 1439-1444 (1998)
17. Fritz, J., Shamma, S., Elhilali, M. and Klein, D. "Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex." Nat. Neurosci. 6, 1216-1223 (2003)
18. Rumelhart, D., Hinton, G. and Williams, R. "Learning representations by back-propagating errors." Nature 323, 533-536 (1986)
19. Bishop, C. M. "Mixture Density Networks." Technical report, Aston University (1994)
20. Singh, N. C. and Theunissen, F. E. "Modulation spectra of natural sounds and ethological theories of auditory processing." J. Acoust. Soc. Am. 114(6), 3394-3411 (2003)
21. de Boer, E. and de Jongh, H. R. "On cochlear encoding: potentialities and limitations of the reverse-correlation technique." J. Acoust. Soc. Am. 63(1), 115-135 (1978)
22. Carney, L. and Yin, T. "Temporal coding of resonances by low-frequency auditory nerve fibers: single-fiber responses and a population model." J. Neurophysiol. 60(5), 1653-1677 (1988)
23. Evans, E. F. "Cochlear nerve and cochlear nucleus." In Handbook of Sensory Physiology Vol. 5/2, 1-108 (1975)
24. Rhode, W. S. and Smith, P. H. "Characteristics of tone-pip response patterns in relationship to spontaneous rate in cat auditory nerve fibers." Hearing Res. 18, 159-168 (1985)
25. LeCun, Y. and Bengio, Y. "Convolutional networks for images, speech, and time series." In The Handbook of Brain Theory and Neural Networks (1995)
26. Bar, M. "The proactive brain: using analogies and associations to generate predictions." Trends Cogn. Sci. 11(7), 280-289 (2007)
27. Summerfield, C. and Egner, T. "Expectation (and attention) in visual cognition." Trends Cogn. Sci. 13(9), 403-409 (2009)
28. Kyoto natural image database, http://www.cnbc.cmu.edu/cplab/data_kyoto.html
29. Kayser, C., Einhäuser, W. and König, P. "Temporal correlations of orientations in natural scenes." Computational Neuroscience: Trends in Research, Neurocomputing 52, 117-123 (2003)
30. Depireux, D., Simon, J., Klein, D. and Shamma, S. "Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex." J. Neurophysiol. 85(3), 1220-1234 (2001)
31. McDermott, J. H. and Simoncelli, E. P. "Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis." Neuron 71, 926-940 (2011)