
Learning in large-scale spiking neural networks

Trevor Bekolay

Center for Theoretical Neuroscience, University of Waterloo

Master’s thesis presentation, August 17, 2011

Outline

Unsupervised learning

Supervised learning

Reinforcement learning

Unsupervised learning

The problem

How does the brain learn without feedback?

Neurons connect at synapses

[Figure: synapse diagram — axon terminal (presynaptic), synaptic cleft, and dendritic spine (postsynaptic), with neurotransmitter, neurotransmitter receptors, and a voltage-gated calcium channel labelled]

Synaptic strengths change

Definitions

This is called synaptic plasticity.

When a synapse gets stronger, it is potentiated.

When a synapse gets weaker, it is depressed.

Modelling LTP/LTD

[Figure: left — % change in population EPSP amplitude over time (hours) following two ~100 Hz stimulations for 10 s; right — % change in EPSP slope over time (minutes) following 3 Hz stimulation for 5 min]

BCM rule

$\Delta\omega_{ij} = \kappa\, a_i a_j (a_j - \theta)$
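As a reading aid, here is a minimal rate-based sketch of that update (NumPy; the function names, parameter values, and the squared-activity sliding threshold are illustrative assumptions, not details from the thesis):

```python
import numpy as np

def bcm_update(w, a_pre, a_post, theta, kappa=1e-4):
    """One BCM step: dw_ij = kappa * a_i * a_j * (a_j - theta_j).

    w      : (n_post, n_pre) weight matrix
    a_pre  : (n_pre,) presynaptic activities a_i
    a_post : (n_post,) postsynaptic activities a_j
    theta  : (n_post,) modification thresholds
    """
    return w + kappa * np.outer(a_post * (a_post - theta), a_pre)

def update_threshold(theta, a_post, tau=100.0, dt=1.0):
    # Sliding threshold: theta tracks a running average of squared postsynaptic
    # activity, so sustained high activity raises the threshold and favours depression.
    return theta + (dt / tau) * (a_post ** 2 - theta)
```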


Modelling LTP/LTD with timing

[Figure: presynaptic neuron i and postsynaptic neuron j connected by weight w_ij, with spike times t_pre^i and t_post^j; STDP curves show the change in w_ij over a ±40 ms window for pre-post and post-pre pairings]
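For comparison, a minimal pair-based STDP window of the standard exponential form (the amplitudes and time constants below are generic illustrative values, not parameters from the thesis):

```python
import numpy as np

def stdp_dw(dt, A_plus=0.01, A_minus=0.012, tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, with dt = t_post - t_pre in seconds.

    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    """
    if dt >= 0:
        return A_plus * np.exp(-dt / tau_plus)
    return -A_minus * np.exp(dt / tau_minus)
```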

What’s the difference?

Theorem: If spike trains have Poisson statistics, STDP can be mapped onto a BCM rule (Izhikevich, 2003; Pfister & Gerstner, 2006).


Test with a simulation

$\Delta\omega_{ij} = \kappa\, a_i\, \phi(a_j, \theta)$

[Figure: simulation protocol — a presynaptic pulse and a postsynaptic pulse separated by a controlled pulse delay and repeated at a given pulse rate]

Recreated STDP curve

[Figure: % change in ω_ij vs t_post − t_pre (seconds) at a 20 Hz pulse rate, for θ = 0.30, 0.33, and 0.36]

Exhibits frequency dependence

[Figure: % change in ω_ij vs t_post − t_pre (seconds) at a 40 Hz pulse rate]

Frequency dependence details

[Figure: % change in ω_ij vs stimulation frequency (1–100 Hz). Panels: simulation with post-pre pairs (θ = 0.30 and θ = 0.33); experiment (Kirkwood), light-reared vs dark-reared; simulation with pre-post pairs (θ = 0.30 and θ = 0.33)]

Supervised learning

The problem

Given inputs x and outputs y, find G such that G(x) = y

Learning nonlinear functions

Function                              Input dims   Output dims   Neurons (learning network)   Neurons (control network)
f(x) = x1x2                           2            1             420                          420
f(x) = x1x2 + x3x4                    4            1             756                          756
f(x) = [x1x2, x1x3, x2x3]             3            3             756                          756
f(x) = [x1, x2] ⊗ [x3, x4]            4            2             700                          704 (2-layer), 1500 (3-layer)
f(x) = [x1, x2, x3] ⊗ [x4, x5, x6]    6            3             800                          804 (2-layer), 1700 (3-layer)
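Assuming ⊗ above denotes circular convolution (the usual binding operation in NEF models; this is an interpretation, not stated on the slide), it can be computed with FFTs:

```python
import numpy as np

def circ_conv(a, b):
    # Circular convolution via the FFT: a (x) b = IFFT(FFT(a) * FFT(b)).
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Example: binding two 2-D vectors yields a 2-D vector, matching the table above.
print(circ_conv(np.array([1.0, 0.5]), np.array([0.2, -0.3])))
```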

Simulation results

[Figure: relative error (learning vs. analytic) over learning time (seconds) for f(x) = x1x2, f(x) = x1x2 + x3x4, f(x) = [x1x2, x1x3, x2x3], f(x) = [x1, x2] ⊗ [x3, x4], and f(x) = [x1, x2, x3] ⊗ [x4, x5, x6]; the last two are compared against 2-layer and 3-layer controls]

Simulation results

[Figure: time to learn (seconds) as a function of input dimensions (1–7), output dimensions (1–3), and input + output dimensions (3–9)]

~2.25 hours to learn a 500-D → 500-D function

Large-scale models

Neural Engineering Framework

1. Representation (encoding and decoding)

2. Transformation

Representation

[Figure: encoding and decoding of a time-varying signal over 0.5 seconds — spike trains encode the signal and a decoded estimate is reconstructed from them]

Encoding vectors represent neuron sensitivity

[Figure: LIF tuning curves, and the same tuning curves projected in 2-D space]
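A minimal sketch of both halves of representation for a 1-D value: tuning curves encode x into rates, and regularized least-squares decoders recover an estimate of x. For brevity a rectified-linear response stands in for the LIF curve, and the gains, biases, and regularization constant are illustrative choices, not the thesis's:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_points = 50, 200

# Encoding: rate_i(x) = G[alpha_i * e_i * x + b_i], with a rectified-linear G
# standing in for the LIF response curve.
encoders = rng.choice([-1.0, 1.0], size=n_neurons)   # e_i (in 1-D, just a sign)
gains = rng.uniform(0.5, 2.0, size=n_neurons)        # alpha_i
biases = rng.uniform(-1.0, 1.0, size=n_neurons)      # b_i

x = np.linspace(-1, 1, n_points)
A = np.maximum(0.0, gains[:, None] * encoders[:, None] * x[None, :] + biases[:, None])

# Decoding: find decoders d minimizing ||d^T A - x||^2 with L2 regularization.
reg = 0.1 * A.max()
gram = A @ A.T + (reg ** 2) * n_points * np.eye(n_neurons)
decoders = np.linalg.solve(gram, A @ x)

x_hat = decoders @ A   # decoded estimate of x across the sampled range
```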

Plasticity in the NEF

$\Delta\omega_{ij} = \kappa\, a_i\, \alpha_j\, e_j \cdot E$

where

- $e_j$ is the encoding vector and
- $E$ is the error to be minimized.
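A minimal sketch of applying that rule for a vector error E, assuming the presynaptic activities, per-neuron gains α_j, and encoders e_j are available (the names and the learning rate are illustrative):

```python
import numpy as np

def nef_error_update(w, a_pre, gains, encoders, error, kappa=1e-4):
    """dw_ij = kappa * a_i * alpha_j * (e_j . E)

    w        : (n_post, n_pre) connection weights
    a_pre    : (n_pre,) presynaptic activities a_i
    gains    : (n_post,) gains alpha_j
    encoders : (n_post, dims) encoding vectors e_j
    error    : (dims,) error signal E to be minimized
    """
    ej_dot_E = encoders @ error   # e_j . E for each postsynaptic neuron
    return w + kappa * np.outer(gains * ej_dot_E, a_pre)
```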

Calculating E online

[Figure: network diagram for calculating E online — input x, output y, an error population, a connection computing f(y) = −y, and a plastic connection computing G(x) = ?; legend: excitatory/inhibitory, modulatory, and plastic connections]

Our solution

[Figure: the same network diagram — input x, output y, error population, f(y) = −y, and the plastic connection G(x) = ?]

Network provides E

+ EM-rule: $\Delta\omega_{ij} = \kappa\, a_i\, \alpha_j\, e_j \cdot E$

Reinforcement learning

[Figure: reinforcement learning task diagram — states D, G/Go, and A, a left/right (L, R) choice, and outcomes Rw and Rt]


Behavioural results

[Figure: probability of choosing each direction over trials, between "always move left" and "always move right", as reward contingencies change (L:21% R:63%, L:63% R:21%, L:12% R:72%, L:72% R:12%); shown as an average over 200 runs, one experimental run, and one simulation run]

Action selection (basal ganglia)

$\pi(s) = \arg\max_a Q(s_{t+1}, a_{t+1})$

Changing Q-values

Ventral striatum calculates reward prediction error
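A minimal tabular sketch of the computation this corresponds to: the reward prediction error drives the Q-value update, and action selection takes the highest-valued action. The learning rate, discount factor, and greedy policy are standard textbook choices, not details of the thesis model:

```python
import numpy as np

def select_action(Q, s):
    # pi(s) = argmax_a Q(s, a)
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # Reward prediction error: delta = r + gamma * max_a' Q(s', a') - Q(s, a)
    delta = reward + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * delta
    return delta
```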

Neural results

[Figure: experimental and simulation spike rasters (neuron × trial) across the Delay, Approach, Reward, and Delay task phases]

Contributions

- Shown evidence that BCM captures qualitative features of STDP

- Proposed a biologically plausible solution to the supervised learning problem

- Created a model that solves a stochastic reinforcement learning task and matches biology

A unifying learning rule

$\Delta\omega_{ij} = \alpha_j a_i \left[ \kappa_u a_j (a_j - \theta) + \kappa_s\, e_j \cdot E \right]$
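A minimal sketch of applying the unified rule above, combining the unsupervised (BCM-like) term weighted by κ_u with the supervised (error-modulated) term weighted by κ_s (names and values are illustrative):

```python
import numpy as np

def unified_update(w, a_pre, a_post, gains, encoders, error, theta,
                   kappa_u=1e-5, kappa_s=1e-4):
    """dw_ij = alpha_j * a_i * [kappa_u * a_j * (a_j - theta) + kappa_s * (e_j . E)]"""
    unsup = kappa_u * a_post * (a_post - theta)   # BCM-like term, per postsynaptic neuron
    sup = kappa_s * (encoders @ error)            # error-modulated term, e_j . E
    return w + np.outer(gains * (unsup + sup), a_pre)
```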

Acknowledgements

Thank you for listening and (possibly) reading!
