Overview
• Sensory systems from a mile (kilometer) up
• Different approaches to modeling neural encoding
• Lab: nuts & bolts of basic encoding models
A specialized sensory periphery …
[Diagram: transduction of EM, mechanical, and chemical signals into electrical signals (spikes)]
… feeds (more or less) standard central circuits
Sensory input guides behavior
Sensory processing is lossy information flow
Sensory inputs (many* bits)
Motor outputs (a few bits)
Brain
* In theory, the auditory nerve can encode 10^360 different 1-second sounds. The universe is about 10^18 seconds old.
All sensory processing is “active”
[Diagram: sensory inputs → perception / decision-making → motor outputs, with reward assessment / learning and “context” feeding back within the brain]
Context spans a large spatio-temporal continuum
[Figure: sensory inputs and motor outputs; context dimensions (evoked activity, network state, neuromodulators, synaptic weights, genes) span temporal scales from ~10^-2 s to ~10^8 s (10^13?) and spatial scales from ~10^-7 m to ~10^-1 m]
Context spans a large spatio-temporal continuum
[Figure: spatio-temporal context schematic annotated with phenomena at each scale]
1. Context, predictive coding
2. Attention
3. Synaptic plasticity
4. Development
5. Selection / evolution
Today: focus on encoding at the perceptual scale
[Figure: spatio-temporal context schematic]
Ken Harris
Christian Machens
Srdjan Ostojic
Today: focus on the auditory system
Visual system: Retina → LGN → V1
Auditory system: Cochlear nucleus → Superior olive → IC → MGB → A1
(Kandel, Schwartz & Jessell)
Encoding models applied to other systems
• Visual system – Jones & Palmer 1987 – Ringach, Hawken, Shapley 1997 – David, Vinje, Gallant 2004 – Pillow et al. 2008 – Yamins, DiCarlo 2016 – … and many, many more.
• Somatosensory system – DiCarlo, Johnson, Hsiao 1998
• Olfactory system – Nagel, Hong, Wilson 2015
Tools for characterizing selectivity
[Diagram: stimulus s(x,t) → filter h(s) → neural response r(t)]
• Tuning curves (best frequency, modulation rate, etc.)
• Filter-based encoding models (e.g., the STRF)
• Selectivity indices (no tuning space required; e.g., information theory)
• Optimal stimuli
How does neural activity encode stimuli?
• The fact that information exists in a neural response does not mean that it is used by the brain.
• Two perspectives:
– Classic: What information about a stimulus can be recovered from the neural response?
– More nuanced: How is a stimulus represented across the entire neural population? What impact does a neuron’s activity have on downstream neurons and behavior?
• Implicit vs. explicit coding
– Implicit: information exists in the periphery but may not be readily accessible.
– Explicit: information is available to decoding scheme “X”, typically a rate code. “Untangle” the representation (DiCarlo & Cox 2007)
Neural encoding models
• Tuning curve – Spike counting
• Spectro-temporal receptive field (STRF) – Spike-triggered averaging (linear regression)
• Arbitrary (nonlinear) stimulus-response mapping – Gradient descent (machine learning)
• Two themes:
– How is each method implemented? – How is each method useful?
[Trade-off axes: generalizability, complexity, interpretability, data requirements]
Neural encoding models
• Tuning curve – Spike counting
• Spectro-temporal receptive field (STRF) – Spike-triggered averaging (linear regression)
• Arbitrary (nonlinear) stimulus-response mapping – Gradient descent (machine learning)
• Two themes:
– How is each method implemented? – How is each method useful?
[Trade-off axes: generalizability, complexity, interpretability, data requirements]
Mammalian auditory system
Cochlea (frequency decomposition):
(http://jan.ucc.nau.edu; Kandel, Schwartz & Jessell)
Hair cells (transduction to spikes):
Ear (collector):
Auditory inputs to brain
Spectrogram: (phase-locking largely absent)
(Yang, Wang & Shamma 1992)
Place code
Frequency tuning curve
Tuning curve: mean spike rate for each parametric manipulation of a stimulus. Spikes recorded from A1 of an awake ferret during presentation of bandpass noise centered at 20 logarithmically-spaced frequencies.
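The spike-counting recipe is simple enough to sketch directly. A minimal example with invented data (the function name and toy counts are mine, not from the lecture):

```python
import numpy as np

def tuning_curve(spike_counts, stim_freqs):
    """Mean spike count for each stimulus frequency.

    spike_counts: one spike count per trial.
    stim_freqs:   stimulus frequency presented on each trial.
    Returns (unique_freqs, mean_count_per_freq).
    """
    freqs = np.unique(stim_freqs)
    rates = np.array([spike_counts[stim_freqs == f].mean() for f in freqs])
    return freqs, rates

# Hypothetical unit tuned near 2 kHz, two trials per frequency
stim_freqs = np.array([1.0, 1.0, 2.0, 2.0, 4.0, 4.0])   # kHz
counts = np.array([2, 4, 10, 12, 3, 5])
uf, tc = tuning_curve(counts, stim_freqs)
best_frequency = uf[np.argmax(tc)]   # peak of the tuning curve
```

In a real experiment the same averaging runs over repeated presentations at each of the (here, 20 logarithmically spaced) center frequencies.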
Emergent tuning properties
• Example 1: Space
– Phase-locking encodes spatial information implicitly. Spatially tuned cells encode space explicitly:
(Laback & Majdak 2008; Bala et al. 2003)
Owl midbrain: Interaural time difference (ITD):
Envelope amplitude
Emergent tuning properties
• Example 2: Amplitude modulation (AM)
Amplitude of natural vocalizations is modulated in time:
Sinusoidal amplitude modulated (SAM) noise
Time
3 Hz 10 Hz
Stimulus-locked response
Non-stimulus-locked
Temporal code: reliability of response at a fixed time in the AM cycle
Rate code: average firing rate for a given AM rate
Emergent tuning properties
• Example 2: Amplitude modulation (AM) – Temporal code: precision of spike times relative to stimulus envelope varies with AM rate
– Rate code: total spike count varies with AM rate, but timing can be imprecise
– (NB: sometimes, confusingly, responses that follow the stimulus envelope are also referred to as “phase-locked”)
(Liang & Wang 2002)
Temporal code (vector strength)
Rate code (rate tuning)
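The temporal-code metric (vector strength) can be computed directly from spike times. A minimal sketch with toy data (this is the standard vector-strength formula, not the specific Liang & Wang analysis):

```python
import numpy as np

def vector_strength(spike_times, am_rate):
    """Degree of phase-locking to a sinusoidal AM envelope.

    Each spike is a unit vector at its phase in the AM cycle; vector
    strength is the length of the mean vector (1 = perfect locking,
    ~0 = spikes uniformly spread across the cycle).
    """
    phases = 2 * np.pi * am_rate * np.asarray(spike_times, dtype=float)
    return np.abs(np.mean(np.exp(1j * phases)))

# Perfectly locked: one spike per cycle at the same phase (10 Hz AM)
vs_locked = vector_strength([0.1, 0.2, 0.3, 0.4], am_rate=10.0)
# Unlocked: spikes at four evenly spaced phases of the cycle
vs_flat = vector_strength([0.0, 0.025, 0.05, 0.075], am_rate=10.0)
```

The rate code is just the mean firing rate per AM rate, i.e. the tuning-curve computation above applied with AM rate as the stimulus parameter.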
What does the central auditory network do?
Cortex vs. periphery:
• Broader tuning
• Sloppier timing
• Less reliability
Possible explanations: invariance? Feature integration? Transformation to rate code? State dependence? Plasticity?
IC A1 (core) dPEG (belt)
Neural encoding models
• Tuning curve – Spike counting
• Spectro-temporal receptive field (STRF) – Spike-triggered averaging (linear regression)
• Arbitrary (nonlinear) stimulus-response mapping – Gradient descent (machine learning)
• Two themes:
– How is each method implemented? – How is each method useful?
[Trade-off axes: generalizability, complexity, interpretability, data requirements]
Receptive field models
• Neural selectivity is often described as a sensory filter that captures the relationship between neural activity and the preceding stimulus at each point in time.
• Common neural filter models in the auditory system are the spike-triggered average (STA) and the spectro-temporal receptive field (STRF).
• Filter models map conceptually onto the idea of lossy information flow (many input channels map to one spike rate).
Spike-triggered average
[Figure: sound pressure s(t) and spike times; cross-correlation yields the STA (spikes/dB vs. time lag), revealing the resonant frequency (de Boer 1968)]
• Derived from methods for systems identification in engineering (a.k.a. white-noise analysis).
• Basic idea: present a complex sound and derive statistically what features of the sound evoke neural activity.
• Can be applied to evoked potentials, spikes, LFPs, and BOLD signals.
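In code, the spike-triggered average is just an average of the stimulus windows preceding each spike. A minimal sketch with made-up data (window length and variable names are mine):

```python
import numpy as np

def spike_triggered_average(stimulus, spike_bins, n_lags):
    """Average the stimulus over the n_lags samples preceding each spike.

    stimulus:   1-D array, e.g. sound pressure s(t), one value per time bin.
    spike_bins: indices of time bins containing a spike.
    Returns an array over lags -n_lags .. -1 relative to the spike.
    """
    windows = [stimulus[t - n_lags:t] for t in spike_bins if t >= n_lags]
    return np.mean(windows, axis=0)

# Toy stimulus: an impulse occurs 2 bins before each of three spikes
stim = np.zeros(40)
stim[[8, 18, 28]] = 1.0
sta = spike_triggered_average(stim, spike_bins=[10, 20, 30], n_lags=4)
# sta recovers the impulse at lag -2
```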
Spectro-temporal receptive fields (STRFs)
(Aertsen et al. 1981)
Stimulus spectrogram Neural response
Linearization. Compute the spike-triggered average after transforming the stimulus into a representation that accounts for early processing. In this case, frequency tuning of the cochlea.
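After the linearizing spectrogram transform, the same averaging extends to two dimensions (frequency x time lag). A hypothetical sketch with toy data:

```python
import numpy as np

def strf_sta(spectrogram, spike_rate, n_lags):
    """Spike-triggered average of a spectrogram: for each time lag,
    cross-correlate the response with the stimulus `lag` bins earlier.

    spectrogram: (n_freq, n_time) linearized stimulus.
    spike_rate:  (n_time,) spike count per time bin.
    Returns an (n_freq, n_lags) filter estimate.
    """
    n_freq, n_time = spectrogram.shape
    strf = np.zeros((n_freq, n_lags))
    for lag in range(n_lags):
        strf[:, lag] = spectrogram[:, :n_time - lag] @ spike_rate[lag:]
    return strf / spike_rate.sum()

# Toy data: energy in frequency channel 1 precedes each spike by 2 bins
spec = np.zeros((3, 50))
spec[1, [5, 15, 25]] = 1.0
rate = np.zeros(50)
rate[[7, 17, 27]] = 1.0
strf = strf_sta(spec, rate, n_lags=4)   # peak lands at strf[1, 2]
```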
STRFs: How?
[Figure: spectrogram segments preceding each spike are summed and averaged to form the spike-triggered average]
Temporally-orthogonal ripple combinations (TORCs) are broadband noise stimuli that drive activity in AC and sample the space of possible stimuli efficiently (Klein et al 2001).
Why STRFs?
• STRFs are a general model. They can predict the neural response to any arbitrary natural stimulus.
• Perfect prediction = perfect model
• Unbiased characterization of tuning properties. Tuning curves report tuning through a pre-selected slice of stimulus space.
• Infer the biophysical mechanisms and networks that make the brain work
A snapshot of auditory cortex
STRFs for 24 channels recorded simultaneously in A1/AAF using a fixed array of platinum-iridium electrodes.
Rate/scale space
(Singh and Theunissen, 2003)
According to Fourier theory, spectrograms can be decomposed into the sum of 2-dimensional sine wave gratings, analogous to spatial gratings in the visual system.
Rate/scale space
Modulation tuning functions (MTFs)
[Figure: example STRFs (frequency vs. time lag) and their 2-D |FFT|, giving modulation tuning functions over rate (temporal modulation) and scale (spectral modulation)]
Spectro-temporal receptive fields
(Miller & Schreiner 2002)
STRFs permit the simultaneous measurement of multiple tuning properties
Matched tuning to natural stimuli?
(Miller & Schreiner 2002)
(Singh & Theunissen, 2003)
Modulation tuning vs. stimulus modulation spectra: does the STRF distribution match the modulation spectrum of natural sounds?
STRFs for single units in awake ferret auditory cortex from primary (core, A1) and secondary (belt, dPEG) fields.
STRFs describe the processing hierarchy
A1 vs. dPEG:
Increasing integration time
Increasing complexity
But some dPEG neurons look like A1: longer tails in the distributions.
STRFs describe the processing hierarchy
(Miller et al 2001)
STRFs recorded from connected pairs of MGB and A1 neurons reveal convergent inputs.
Neural activity and sensory representation
Encoding approach: How does a neural signal respond to (encode) a given stimulus?
Decoding approach: What information about a stimulus can be inferred (decoded) from a neural signal?
Stimulus reconstruction
Linear decoder (Bialek 1991; Mesgarani 2009)
Linear encoder (STRF; Theunissen 2001; David 2009)
What information is encoded in the neural population?
Stay tuned for Nima Mesgarani’s lecture in a few weeks!
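As a preview, linear decoding is just the regression run in the opposite direction: regress the stimulus on (appropriately lagged) neural responses. Everything below is simulated (fake neurons with known lags); it only illustrates the direction of the mapping:

```python
import numpy as np

rng = np.random.default_rng(2)
stim = rng.standard_normal(2000)

# Simulate three neurons, each responding to the stimulus at a different lag
resp = np.stack([np.roll(stim, k) + 0.3 * rng.standard_normal(2000)
                 for k in (1, 2, 3)])

# Decoder design matrix: shift each response back so its column aligns
# with the stimulus sample it encodes
R = np.column_stack([np.roll(resp[i], -(i + 1)) for i in range(3)])
g, *_ = np.linalg.lstsq(R, stim, rcond=None)   # decoder weights
stim_hat = R @ g                               # reconstructed stimulus
r_decode = np.corrcoef(stim_hat, stim)[0, 1]   # reconstruction accuracy
```

Combining the three noisy, lag-corrected copies recovers the stimulus more accurately than any single neuron could.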
Spike-triggered average as linear regression
[Figure: example static output function (spikes/s vs. stimulus amplitude)]
STRF = cross-correlation between stimulus spectrogram and time-varying spike rate.
[Figure: TORC spectrograms (0.5–16 kHz), spike rasters over 10 repetitions, and PSTHs (spikes/s vs. time, 0–250 ms) for TORCs 1–30; actual vs. predicted PSTH (normalized amplitude)]
Correlate stimulus and response at preferred frequency and time-lag:
Spike-triggered average
STRF = cross-correlation between stimulus spectrogram and time-varying spike rate.
[Figure: example spike-triggered average STRF (frequency 0.5–16 kHz vs. time lag 20–100 ms) with static output function (spikes/s vs. stimulus amplitude)]
STRFs can predict neural responses
[Figure: TORC spectrograms, rasters, and actual vs. predicted PSTHs illustrating STRF estimation and prediction]
Reverse correlation as linear algebra
Linear algebraic form (using a delay-line, so that all stimulus samples influencing response at time t occur in row t of stimulus matrix S):
STRF estimation: h = (SᵀS)⁻¹ Sᵀr
STRF prediction: r̂ = Sh
(For white noise, SᵀS ∝ I, so estimation reduces to the spike-triggered average, h ∝ Sᵀr.)
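The delay-line formulation can be checked numerically. A toy sketch with a known filter and white-noise stimulus; the normal-equation solution h = (SᵀS)⁻¹Sᵀr is exact here because the simulated response is noiseless and linear:

```python
import numpy as np

def delay_line(stim, n_lags):
    """Build the delay-line matrix S: row t holds stim[t], stim[t-1], ..."""
    n = len(stim)
    S = np.zeros((n, n_lags))
    for lag in range(n_lags):
        S[lag:, lag] = stim[:n - lag]
    return S

rng = np.random.default_rng(0)
h_true = np.array([0.0, 1.0, 0.5, 0.0, 0.0])   # hypothetical ground-truth filter
stim = rng.standard_normal(5000)               # white-noise stimulus
S = delay_line(stim, n_lags=5)
r = S @ h_true                                 # noiseless linear response

h_est = np.linalg.solve(S.T @ S, S.T @ r)      # estimation: h = (S'S)^-1 S'r
r_pred = S @ h_est                             # prediction: r_hat = S h
```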
What about natural stimuli?
Speech (natural)
TORC (parametric)
(David et al. 2009)
Natural stimuli and STRFs
(Woolley & Theunissen 2004)
STRFs in the birdsong system depend on the estimation stimulus
Natural stimuli and STRFs
STRFs are piecewise-linear estimates of a nonlinear function
What about natural stimuli?
Unlike rippled noise, speech and other natural sounds are correlated in frequency and time:
(Theunissen et al 2001)
[Figure: STA from speech vs. STA from TORCs (500–8000 Hz vs. time lag 10–110 ms)]
Simply computing the spike-triggered average produces an artefactually smoothed STRF:
Normalized reverse correlation
[Figure: NRC STRF estimates at tolerance values 0.9, 0.99, 1 − 10⁻³ … 1 − 10⁻⁶ (normalized gain, 500–8000 Hz vs. time lag 10–110 ms)]
Optimal-tolerance estimate (based on cross-validation)
Normalized reverse correlation. Natural stimuli contain spectro-temporal correlations, which must be accounted for to obtain the correct regression solution.
(Theunissen et al 2001)
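The published method truncates small eigenvalues of the stimulus autocorrelation matrix at a tolerance chosen by cross-validation; the sketch below substitutes a ridge penalty, which plays the same role, on simulated correlated data (all values invented):

```python
import numpy as np

def nrc_strf(S, r, lam):
    """Regularized reverse correlation: divide out the stimulus
    autocorrelation S'S, with a ridge penalty standing in for the
    eigenvalue-truncation 'tolerance' of the original method."""
    n_lags = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(n_lags), S.T @ r)

# Correlated (smoothed) stimulus: the plain STA is biased here
rng = np.random.default_rng(3)
raw = rng.standard_normal(4000)
stim = np.convolve(raw, np.ones(5) / 5, mode="same")   # temporal correlations
S = np.column_stack([np.roll(stim, k) for k in range(6)])
h_true = np.array([0.0, 1.0, 0.5, 0.0, 0.0, 0.0])
r = S @ h_true

h_nrc = nrc_strf(S, r, lam=1e-6)   # recovers h_true despite the correlations
h_sta = S.T @ r / len(r)           # plain STA: smeared and rescaled
```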
Speech vs. noise STRFs
[Figure: STA from speech, STA from TORCs, and STRF from speech estimated by NRC (500–8000 Hz vs. time lag 10–110 ms)]
(Careful!) application of normalized reverse correlation reveals that TORC and speech STRFs have similar spectral tuning and differ primarily in their temporal dynamics.
(David et al. 2009)
Alternative: STRF estimation by boosting
[Figure: STRF estimate evolving across boosting iterations (David & Shamma 2007)]
Boosting (a.k.a. coordinate descent) is a specific implementation of a gradient descent algorithm that works by iteratively updating the single STRF coefficient that best improves prediction accuracy.
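A minimal coordinate-descent sketch on toy data; the fixed step size and iteration count are my simplifications (the published algorithm searches step signs and uses early stopping against a validation set):

```python
import numpy as np

def boost_strf(S, r, step=0.01, n_iter=2000):
    """Boosting / coordinate descent: at each iteration, nudge the single
    coefficient whose change most reduces mean squared error."""
    h = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * S.T @ (r - S @ h) / len(r)   # MSE gradient
        j = np.argmax(np.abs(grad))                # most helpful coefficient
        h[j] -= step * np.sign(grad[j])
    return h

# Recover a sparse toy filter from a white-noise stimulus
rng = np.random.default_rng(4)
S = rng.standard_normal((3000, 6))
h_true = np.array([0.4, 0.0, -0.2, 0.0, 0.0, 0.0])
h_boost = boost_strf(S, S @ h_true)
```

Because each update touches only one coefficient, the solution tends to stay sparse, which is part of the method's appeal for natural stimuli.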
[Figure: boosted STRF estimate (normalized gain, 500–8000 Hz vs. time lag 10–110 ms)]
Reverse correlation vs. boosting?
[Figure: STRF from boosting vs. STRF from reverse correlation for an example neuron; scatter of prediction correlations, boosted vs. NRC STRF, n=164]
Prediction accuracy (for single neurons in A1 during speech presentation) is slightly higher for boosted STRFs.
Neural encoding models
• Tuning curve – Spike counting
• Spectro-temporal receptive field (STRF) – Spike-triggered averaging (linear regression)
• Arbitrary (nonlinear) stimulus-response mapping – Gradient descent (machine learning)
• Two themes:
– How is each method implemented? – How is each method useful?
[Trade-off axes: generalizability, complexity, interpretability, data requirements]
A1 neurons are not linear
[Figure: example A1 STRF and measured static nonlinearity (spikes/s vs. stimulus amplitude); scatter of prediction correlations, boosted vs. NRC STRF, n=164]
Moving beyond the linear STRF
Problems with existing models
• Classical STRF (LN model) has limited accuracy.
• Several alternatives have been proposed, but no single model has been established as a replacement.
• Behavior makes things more complicated: data from behaving animals are often limited, and a larger parameter count makes fitting harder.
Strategy (Thorson & David 2015):
• Compare a large variety of model architectures on a standard data set
• Reduce dimensionality while maximizing prediction accuracy
• Starting point: LN STRF
[Diagram: stimulus cochleogram → linear filter → static nonlinearity → response (all states)]
Maximum a posteriori (MAP) estimation
David & Gallant 2004; Wu, David & Gallant 2007
θ* = argmin_θ { E[ ℒ(r(t), h(s(t))) ] + R(θ) }
Loss function: smaller prediction error = more probable (e.g., mean squared error)
Model class: function that predicts response to any stimulus (e.g., the STRF)
Prior: penalize unlikely fits (e.g., flat, smooth, sparse prior).
What are the most probable fit parameters, θ*, given available data and existing knowledge of the system?
Estimation stimulus: sample of stimuli used for fitting
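A toy illustration of assembling the three ingredients (model, loss, prior) into one objective. The data, the λ value, and the smoothness penalty are all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data: stimulus matrix S, smooth ground-truth filter, noisy response
rng = np.random.default_rng(1)
S = rng.standard_normal((500, 8))
h_true = np.exp(-np.arange(8) / 2.0)
r = S @ h_true + 0.1 * rng.standard_normal(500)

def neg_log_posterior(h, lam=0.01):
    loss = np.mean((r - S @ h) ** 2)        # smaller error = more probable
    prior = lam * np.sum(np.diff(h) ** 2)   # weak penalty on non-smooth filters
    return loss + prior

# MAP estimate: most probable parameters given data and prior
h_map = minimize(neg_log_posterior, np.zeros(8)).x
```

Swapping the prior term (flat, smooth, sparse) changes which fits are considered plausible without touching the loss or the model class.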
Model class
Basic model:
“Engine”:
[Diagram: stimulus cochleogram → linear filter → static nonlinearity → response (all states)]
The obvious first nonlinearity to try: apply a static nonlinearity to the output of the STRF (a.k.a. the Generalized Linear Model, or GLM; Paninski et al. 2004):
This model can be thought of as an instance of a cascade of transformations from stimulus to response:
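A sketch of the LN cascade; the rectifier here is one hypothetical choice of static nonlinearity (GLMs often use an exponential or sigmoid instead):

```python
import numpy as np

def ln_predict(strf, spectrogram, threshold=0.0, slope=1.0):
    """LN model: linear STRF filtering followed by a static nonlinearity.

    strf:        (n_freq, n_lags) filter.
    spectrogram: (n_freq, n_time) stimulus.
    Returns the predicted response, rectified at `threshold`.
    """
    n_freq, n_lags = strf.shape
    n_time = spectrogram.shape[1]
    lin = np.zeros(n_time)
    for lag in range(n_lags):
        # add each lag's contribution: strf[:, lag] . spec[:, t - lag]
        lin[lag:] += strf[:, lag] @ spectrogram[:, :n_time - lag]
    return np.maximum(slope * (lin - threshold), 0.0)

# One channel, one lag: the nonlinearity rectifies the negative sample
out = ln_predict(np.array([[1.0]]), np.array([[-1.0, 2.0, 3.0]]))
```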
Fitter
Basic model:
Engine:
Fitter: Choose values of θ for which the engine produces the minimum prediction error.
Gradient:
Reduced rank spectro-temporal filters
Full rank STRF
Rank 1
Rank 2
Rank 3
Rank 4
(Simon et al. 2007; Park & Pillow 2012)
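Reduced-rank (factorized) filters can be illustrated with the SVD. This is a sketch of the idea, not the fitting procedure of the cited papers, which estimate the spectral and temporal factors directly during optimization:

```python
import numpy as np

def reduce_rank(strf, rank):
    """Approximate an STRF as a sum of `rank` separable
    (spectral vector x temporal vector) components via the SVD."""
    U, s, Vt = np.linalg.svd(strf, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# A separable (rank-1) STRF is reproduced exactly by a single component
strf1 = np.outer([1.0, 2.0, 0.5], [3.0, -1.0, 0.0, 0.5])
err1 = np.abs(reduce_rank(strf1, 1) - strf1).max()
```

Each retained component costs only n_freq + n_lags parameters instead of n_freq × n_lags, which is the source of the dimensionality savings.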
Reduced rank STRFs
Model prediction accuracy compared for a standard single-unit data set recorded in ferret A1 during presentation of natural vocalizations.
(Thorson & David 2015)
Reduced rank STRFs
[Figure: prediction correlation of the D=2 factorized model vs. the full FIR model for N=176 neurons; mean prediction correlation vs. parameter count for FIR and factorized models, D = 1–4 (***)]
Reduced-rank model performs better* and requires fewer parameters
* “better” only because the full-rank model suffers from over-fitting.
(Thorson & David 2015)
Parametric STRFs
Assume frequency tuning is Gaussian (2 free parameters):
[Figure: Gaussian spectral tuning curve (gain vs. frequency, parameters μ and σ); temporal filter gain vs. time lag]
Assume temporal tuning is pole-zero filter (3-5 parameters):
(Thorson & David 2015)
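A sketch of a separable parametric STRF. The Gaussian spectral weighting follows the slide; the damped exponential is my simplified stand-in for the pole-zero temporal filter:

```python
import numpy as np

def parametric_strf(freqs_khz, lags_ms, mu_khz, sigma_oct, tau_ms, gain=1.0):
    """Separable parametric STRF: Gaussian spectral tuning (mean mu,
    width sigma in octaves) times a damped-exponential temporal kernel."""
    ws = np.exp(-0.5 * ((np.log2(freqs_khz) - np.log2(mu_khz)) / sigma_oct) ** 2)
    wt = np.exp(-np.asarray(lags_ms, dtype=float) / tau_ms)
    return gain * np.outer(ws, wt)

freqs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
strf = parametric_strf(freqs, lags_ms=[0, 10, 20, 30],
                       mu_khz=2.0, sigma_oct=0.5, tau_ms=15.0)
bf = freqs[np.argmax(strf[:, 0])]   # best frequency equals mu
```

The whole filter is described by a handful of parameters (μ, σ, τ, gain) instead of one coefficient per frequency-lag bin.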
Parametric STRFs
[Figure: mean prediction correlation vs. parameter count (0–277) for FIR, factorized (Ws & Wt), and parameterized (Gaussian Ws, pole-zero Wt) models; example STRFs for cell por053a-06-01 (R = 0.58 / 0.61 / 0.64) and cell oni013b-b1 (R = 0.37 / 0.42 / 0.55), stimulus frequency 0.2–20 kHz vs. response latency 0–150 ms]
(Thorson & David 2015)
How many degrees of freedom are required?
Performance of n=1061 LN STRF architectures, compared for A1 neurons during presentation of vocalizations. A 29-dimensional model had best prediction accuracy.
(Thorson & David 2015)
“Standard” STRF (278 free parameters) vs. parameterized STRF (29 free parameters)
Focus on temporal dynamics
Speech (many spectral channels)
Speech-modulated noise (one spectral channel, still naturalistic)
Encoding of vocalization-modulated noise
(David & Shamma 2013)
Encoding of vocalization-modulated noise
Linear (“LN”) receptive field model
A role for short-term plasticity?
• Evidence for strong influence of nonlinear synaptic depression in A1
– Fine timing of responses to amplitude-modulated stimuli (Elhilali et al. 2004)
– Dynamics of forward suppression (Wehr & Zador 2005)
– Changes in STRFs estimated from speech and rippled noise (David et al. 2009)
• Synaptic depression is a well-modeled process at the single-synapse level
– Vesicle depletion rate, u, proportional to presynaptic input
– Recovery time constant, τ, independent of input
(from Tsodyks & Markram 1997)
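A minimal discrete-time sketch of the depression dynamics just described; the parameter values and the clipping are invented for illustration:

```python
import numpy as np

def stp_depress(stim, u=5.0, tau=0.2, dt=0.01):
    """Short-term synaptic depression in the spirit of Tsodyks & Markram:
    the available-resource fraction d is depleted at rate u in proportion
    to the input and recovers toward 1 with time constant tau (seconds).
    Returns the depressed input, d * stim."""
    d = np.ones(len(stim))
    for t in range(1, len(stim)):
        dd = (1.0 - d[t - 1]) / tau - u * stim[t - 1] * d[t - 1]
        d[t] = min(max(d[t - 1] + dt * dd, 0.0), 1.0)
    return d * stim

# A sustained input depresses toward a steady state of 1 / (1 + u*tau) = 0.5
out = stp_depress(np.ones(300))
```

Inserting this nonlinearity before the temporal filter is what turns the LN model into the STP STRF discussed next.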
Nonlinear STP receptive field model
STP STRF has improved predictive power
(David & Shamma 2013)
STP STRF expanded for fully natural sounds
[Diagram: stimulus spectrogram → spectral weighting (gain vs. frequency) → STP → temporal filter → static nonlinearity → spike response]
Nonlinear encoding models
[Figure: median prediction correlation vs. parameter count for n=117 A1 neurons; nonlinear filters define a Pareto frontier (performance vs. complexity) above linear filters, from the standard LN STRF to the best-performing nonlinear STRF]
>1000 encoding model architectures compared using 20 minutes of data recorded from n=117 A1 single units in awake ferrets during presentation of natural vocalizations.
Long-term plan: put the model-fitting system online to allow other labs to fit and compare models on their own data.
Other variants of the STRF
• Linearize the input space – Gill, Woolley, Fremouw, Theunissen 2006 (cochlear model) – David, Shamma 2013 (STP example above) – Willmore, Schoppe, King, Schnupp, Harper 2016 (midbrain model)
• Subspace models
– Atencio, Sharpee, Schreiner 2008 (information theory-based) – Kozlov, Gentner 2016 – Atencio, Sharpee 201
• Gain control – Rabbinowitz, Willmore, Schnupp 2012 – Williamson, Ahrens, Linden, Sahani 2016
• Neural networks, deep networks – Harper, Willmore, Cui, Schnupp 2016 (similar to subspace models) – Yamins, DiCarlo 2016 (visual cortex) – Kell, Yamins, Norman-Haignere, McDermott (in prep!)
Take-home messages
• Neural encoding models span a wide range of complexity and generalizability
• Important factors: model architecture, fit stimulus, fit algorithm, cost function, priors
Prep for lab
• Download and install Anaconda: – https://www.anaconda.com/download/ (Python v3.6) – Follow instructions for installation – Required packages (if using a different Python distribution):
• numpy, scipy, matplotlib, ipython
• Download STRF demo code & data:
– https://bitbucket.org/lbhb/strf_demo/downloads/ – Click “Download repository” and unzip to desired directory
• Run ipython in terminal window from directory where STRF demo files are installed. Then test:
In [1]: %pylab
In [2]: run cartoon_rc.py