
Neuromorphic Computing and Learning: A Stochastic Signal Processing Perspective

Osvaldo Simeone, joint work with Hyeryung Jang (KCL), Bipin Rajendran (NJIT), Brian Gardner and André Grüning (Univ. of Surrey)

King's College London

December 11, 2018

Overview

Motivation

Models

Algorithms

Examples


Machine Learning Today

Breakthroughs in ML have come at the expense of massive memory, energy, and time requirements, making many state-of-the-art solutions not suitable for mobile or embedded devices.

[Rajendran '18]

Machine Learning at the Edge

A solution is mobile edge or cloud computing: offload computations to an edge or cloud server.

Possible privacy and latency issues.

Machine Learning on Mobile Devices

Scaling down energy and memory requirements for implementation on mobile devices requires exploring trade-offs between accuracy and complexity.

Active field: many new chips released by established players and start-ups to implement artificial neural networks...

Human vs Machine

Beyond artificial neural networks...

Machine (supercomputer): 13 million Watts, 5600 sq. ft. & 340 tons, ~10^10 ops/J
Human (brain): ~20 Watts, 2 sq. ft. & 1.4 kg, ~10^15 ops/J

Source: https://www.olcf.ornl.gov, Google Images

Neuromorphic Computing

Neurons in the brain process and communicate over time using sparse binary signals (spikes or action potentials). This results in a dynamic, sparse, and event-driven operation.

[Gerstner]

Spiking Neural Networks

Spiking Neural Networks (SNNs) are networks of spiking neurons.

Topic at the intersection of computational neuroscience (focused on biological plausibility) and machine learning (focused on accuracy and efficiency).

[Figure: an Artificial Neural Network (ANN) neuron combines static inputs x_1, ..., x_n through weights w_1, ..., w_n into an output y; a Spiking Neural Network (SNN) neuron combines spike trains x_1(t), ..., x_n(t) through weights w_1, ..., w_n into an output spike train y(t) evolving over time.]

Spiking Neural Networks

Proof-of-concept hardware implementations of SNNs have demonstrated significant energy savings as compared to ANNs, generating significant (and perhaps premature) press coverage and positive market predictions.


I/O Interfaces

[Figure: a neuromorphic sensor (e.g., silicon cochlea, retina) feeds spikes into the SNN, which drives a neuromorphic actuator.]

I/O Interfaces

[Figure: a conventional source (the digit "5" in the example) is converted into spikes by an encoder, processed by the SNN, and converted back by a decoder that drives an actuator.]

Rate encoding, time encoding, population encoding, ...

Rate decoding, first-to-spike decoding, ...
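To make the rate-based interface concrete, here is a minimal Python sketch of a rate encoder and a spike-count decoder; the Bernoulli spiking model, the rate_encode/rate_decode names, and the parameter choices are illustrative assumptions rather than the scheme used in the deck.

```python
import numpy as np

def rate_encode(x, T, max_rate=0.5, rng=np.random.default_rng(0)):
    """Rate-encode features x in [0, 1] into binary spike trains of length T:
    each input spikes i.i.d. Bernoulli(max_rate * x[i]) at every time step."""
    probs = np.clip(x, 0.0, 1.0) * max_rate
    return (rng.random((T, x.size)) < probs).astype(np.int8)   # shape (T, n_inputs)

def rate_decode(output_spikes):
    """Rate-decode: pick the output neuron with the largest spike count."""
    return int(np.argmax(output_spikes.sum(axis=0)))

# Toy usage: encode a 4-pixel input, then decode a dummy 3-neuron output train.
spikes_in = rate_encode(np.array([0.9, 0.1, 0.5, 0.0]), T=20)
spikes_out = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 1]])
print(spikes_in.shape, rate_decode(spikes_out))   # (20, 4) 0
```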

Internal Operation

An SNN is a network of spiking neurons.

Unlike ANNs, in SNNs neurons operate sequentially over time by processing and communicating via spikes.

Discrete-time vs continuous-time: most hardware implementations follow the former (e.g., Intel's Loihi).

[Figure: ANN neuron with static inputs and output vs. SNN neuron with spike-train inputs x_1(t), ..., x_n(t) and spike-train output y(t), as above.]

Internal Operation

Internal operation defined by:
- topology (connectivity)
- spiking mechanism

[Figure: ANN vs. SNN neuron, as above.]

Topology

Two types of connections between neurons:
- Synaptic links
  - Causal dependency of a post-synaptic neuron on a pre-synaptic neuron
  - Possibly recurrent: long-term memory
- Lateral dependencies
  - Instantaneous correlation between the spiking of two neurons
  - Excitatory or inhibitory

Topology

An important example is a multi-layer SNN with lateral connections within each layer.

Focus on this topology in the following, although many considerations generalize.

Spiking Mechanism

Each neuron is characterized by an internal state known as the membrane potential $u_{i,t}^{(l)}$ [Gerstner and Kistler '02].

Generally, a higher membrane potential entails a larger probability of spiking.

It evolves over time as a function of the past behavior of the pre-synaptic neurons and of the neuron itself.

Membrane Potential

$$u_{i,t}^{(l)} = \sum_{j \in \mathcal{V}^{(l-1)}} w_{j,i}^{(l)} \big(a_t \ast s_{j,t}^{(l-1)}\big) + w_i^{(l)} \big(b_t \ast s_{i,t}^{(l)}\big) + \gamma_i^{(l)}$$

- Feedforward filter (kernel) $a_t$ with learnable synaptic weight $w_{j,i}^{(l)}$
- Feedback filter (kernel) $b_t$ with learnable weight $w_i^{(l)}$ (e.g., refractory period)
- Bias (threshold) $\gamma_i^{(l)}$
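As a concrete reading of this expression, the sketch below computes the membrane potentials of one layer in discrete time from the spike histories; the exponentially decaying kernels, array shapes, and function names are illustrative assumptions, not the deck's choices.

```python
import numpy as np

def causal_filter(spikes, kernel):
    """Causal convolution (kernel * spikes) per neuron: output at t uses spikes up to t-1.
    spikes: (T, n) binary array; kernel: (K,) filter taps; returns (T, n)."""
    T, n = spikes.shape
    out = np.zeros((T, n))
    for t in range(T):
        for k, h in enumerate(kernel, start=1):   # tap k looks back k time steps
            if t - k >= 0:
                out[t] += h * spikes[t - k]
    return out

def membrane_potential(s_prev, s_self, W, w_fb, gamma, a, b):
    """u_{i,t} = sum_j W[j,i] (a * s_prev_j)_t + w_fb[i] (b * s_self_i)_t + gamma[i]."""
    return causal_filter(s_prev, a) @ W + w_fb * causal_filter(s_self, b) + gamma

# Toy example: 3 pre-synaptic neurons, 2 neurons in the layer, T = 5 time steps.
rng = np.random.default_rng(0)
T, n_pre, n_post = 5, 3, 2
a = 0.8 ** np.arange(1, 6)          # assumed exponentially decaying feedforward kernel
b = -1.0 * 0.5 ** np.arange(1, 6)   # assumed inhibitory feedback kernel (refractoriness)
W = rng.normal(size=(n_pre, n_post))
u = membrane_potential(rng.integers(0, 2, (T, n_pre)),
                       rng.integers(0, 2, (T, n_post)),
                       W, w_fb=np.ones(n_post), gamma=np.zeros(n_post), a=a, b=b)
print(u.shape)  # (T, n_post)
```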

Membrane Potential

Kernels can more generally be parameterized via multiple basis functions and learnable weights [Pillow et al '08].

This allows learning of temporal processing, e.g., by adapting synaptic delays.
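For instance, a kernel can be written as a weighted sum of fixed basis functions with learnable weights; the raised-cosine basis below follows the general idea in [Pillow et al '08], but its exact parameterization here is an illustrative assumption.

```python
import numpy as np

def raised_cosine_basis(K, T):
    """K raised-cosine bumps spread over a window of T time steps (one basis per row)."""
    t = np.arange(T)
    centers = np.linspace(0, T - 1, K)
    width = (T - 1) / max(K - 1, 1)
    phase = np.clip((t[None, :] - centers[:, None]) * np.pi / width, -np.pi, np.pi)
    return 0.5 * (1 + np.cos(phase))            # shape (K, T)

# A learnable kernel is then a_t = sum_k theta_k * basis_k(t).
theta = np.array([0.2, 1.0, -0.3])              # per-synapse learnable weights (assumed)
a = theta @ raised_cosine_basis(K=3, T=10)      # kernel of length 10
print(a.round(2))
```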

Deterministic Models

The most common model is leaky integrate-and-fire [Gerstner and Kistler '02]:
- Spike when the membrane potential is positive
- Non-differentiable with respect to model parameters
- Heuristic training algorithms based on ideas such as the surrogate gradient [Neftci '18] [Anwani and Rajendran '18]

Probabilistic models are more flexible and yield principled, differentiable learning rules [Koller and Friedman '09].
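For reference, a minimal discrete-time leaky integrate-and-fire step might look as follows; the leak constant, the reset-to-zero rule, and the hard u > 0 test (exactly the non-differentiable operation mentioned above) are illustrative assumptions.

```python
import numpy as np

def lif_step(u, w, spikes_in, leak=0.9, reset=0.0):
    """One discrete-time leaky integrate-and-fire update for a single neuron.
    u: membrane potential, w: input weights, spikes_in: binary inputs at this step."""
    u = leak * u + w @ spikes_in          # leaky integration of weighted input spikes
    spike = 1 if u > 0 else 0             # hard threshold: not differentiable in w
    if spike:
        u = reset                         # reset after firing (assumed reset-to-zero)
    return u, spike

u, out = -1.0, []
w = np.array([0.6, 0.4, -0.2])
for x in np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0]]):
    u, s = lif_step(u, w, x)
    out.append(s)
print(out)  # e.g., [0, 1, 0]
```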

Probabilistic Models

Basic probabilistic model: Generalized Linear Model (GLM)
- There are no lateral connections, and the conditional spiking probability is [Pillow et al '08]

$$p\big(s_{i,t}^{(l)} = 1 \,\big|\, s_{\le t-1}^{(l-1)}, s_{\le t-1}^{(l)}\big) = \sigma\big(u_{i,t}^{(l)}\big)$$
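Under this GLM, simulating a layer at time t reduces to drawing independent Bernoulli spikes from the sigmoid of the membrane potentials; a minimal sketch (assuming discrete time and given potentials) is:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def glm_sample_layer(u, rng=np.random.default_rng(0)):
    """Given membrane potentials u (shape (n,)) at time t, draw spikes
    s_i ~ Bernoulli(sigmoid(u_i)) independently across neurons (no lateral terms)."""
    p = sigmoid(u)
    return (rng.random(u.shape) < p).astype(np.int8), p

spikes, p = glm_sample_layer(np.array([-2.0, 0.0, 3.0]))
print(p.round(3), spikes)   # low, 0.5, and high spiking probabilities
```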

Probabilistic Models

More general energy-based model (e.g., Boltzmann machines for time series [Osogami '17]):
- With lateral correlations defined by parameters $r_{i,j}^{(l)}$, the joint probability of spiking for layer $l$ is

$$p_{\theta^{(l)}}\big(s_t^{(l)} \,\big|\, s_{\le t-1}^{(l-1)}, s_{\le t-1}^{(l)}\big) \propto \exp\bigg\{ \sum_{i \in \mathcal{V}^{(l)}} u_{i,t}^{(l)} s_{i,t}^{(l)} + \sum_{i,j \in \mathcal{V}^{(l)}} r_{i,j}^{(l)} s_{i,t}^{(l)} s_{j,t}^{(l)} \bigg\}$$
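For a small layer, this joint distribution can be evaluated exactly by enumerating all 2^n spike patterns and normalizing; the brute-force sketch below only illustrates the role of the lateral parameters r_{i,j} (for larger layers one would use Gibbs sampling or variational approximations instead, not shown here).

```python
import itertools
import numpy as np

def layer_joint_pmf(u, R):
    """Exact joint pmf over binary spike vectors s for one layer at one time step:
    p(s) ∝ exp(sum_i u_i s_i + sum_{i,j} R[i,j] s_i s_j).  Feasible only for small n."""
    n = len(u)
    patterns = np.array(list(itertools.product([0, 1], repeat=n)))
    energies = patterns @ u + np.einsum('ki,ij,kj->k', patterns, R, patterns)
    p = np.exp(energies - energies.max())      # subtract max for numerical stability
    return patterns, p / p.sum()

u = np.array([0.5, -0.5, 0.0])                 # membrane potentials at time t
R = np.array([[0.0, 1.0, 0.0],                 # strong excitatory coupling between neurons 0 and 1
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
patterns, pmf = layer_joint_pmf(u, R)
print(patterns[np.argmax(pmf)], pmf.max().round(3))   # most likely joint spike pattern
```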


Training SNNs

Supervised learning:
- training set = {(input, output)} → generalization

Unsupervised learning:
- training set = {input or output} → compression, sample generation, clustering, ...

Reinforcement learning:
- active (iterative) training = state (input) ↦ action (output) ↦ reward and new input

Training SNNs

Training is carried out by following a learning rule.

A learning rule describes how the model parameters are updated on the basis of data in order to carry out a given task.

- Online or batch
- Local vs global information

Learning Rules

The general form of many learning rules for synaptic weights follows the three-factor format [Fremaux and Gerstner '16]:

$$\theta \leftarrow \theta + \eta \times \text{learning signal} \times \text{pre-syn} \times \text{post-syn}$$

Pre-synaptic and post-synaptic terms are local to each neuron.

The learning signal, aka neuromodulator, is global.

The product pre-syn × post-syn tends to be large when the two neurons spike at nearly the same time.

"Neurons that fire together, wire together" (Hebbian theory, STDP, BCM theory) [Hebb '49] [Bienenstock et al '82].
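In code, the three-factor format is simply a product of one global and two local terms; the sketch below keeps the factors abstract (the specific pre-/post-synaptic quantities and learning signal are instantiated in the following slides), and all names and shapes are illustrative.

```python
import numpy as np

def three_factor_update(theta, lr, learning_signal, pre_syn, post_syn):
    """theta <- theta + eta * (global learning signal) * (local pre-syn) x (local post-syn).
    pre_syn: (n_pre,) local pre-synaptic terms; post_syn: (n_post,) local post-synaptic terms."""
    return theta + lr * learning_signal * np.outer(pre_syn, post_syn)

# One update of a 3x2 weight matrix with a global (e.g., reward-like) signal.
theta = np.zeros((3, 2))
theta = three_factor_update(theta, lr=0.01, learning_signal=1.0,
                            pre_syn=np.array([0.8, 0.0, 0.3]),
                            post_syn=np.array([0.5, -0.2]))
print(theta.round(4))
```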

Deriving the Learning Rules

Three-factor rules can be derived as a form of stochastic gradient descent for the probabilistic model:

$$\theta \leftarrow \theta + \underbrace{\eta}_{\text{learning rate}} \times \underbrace{M}_{\text{learning signal}} \times \nabla \ln p(\text{output} \,|\, \text{input})$$

Deriving the Learning Rules

Gradient for the synaptic weights under the GLM (no lateral connections), obtained by summing over time t:

$$\nabla_{w_{j,i}^{(l)}} \log p_{\theta}\big(s_{i,t}^{(l)} \,\big|\, s_{\le t-1}^{(l-1)}, s_{\le t-1}^{(l)}\big) = \underbrace{\big(a_t \ast s_{j,t}^{(l-1)}\big)}_{\text{pre-synaptic trace}} \; \underbrace{\big(s_{i,t}^{(l)} - \sigma\big(u_{i,t}^{(l)}\big)\big)}_{\text{post-synaptic error}}$$

Post-synaptic error = desired/observed behavior − model average behavior [Bienenstock et al '82]
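Putting the two factors together, a minimal online SGD sketch for the fully observed GLM case (output spikes clamped to training data, learning signal M = 1) could look like this; the array shapes and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def glm_sgd_step(W, pre_trace, s_target, u, lr=0.05, M=1.0):
    """One online update of the synaptic weights W (shape (n_pre, n_post)).
    pre_trace: filtered pre-synaptic spikes (a_t * s_j,t), shape (n_pre,)
    s_target:  observed/desired post-synaptic spikes at time t, shape (n_post,)
    u:         membrane potentials at time t, shape (n_post,)."""
    post_error = s_target - sigmoid(u)              # desired minus model-average behavior
    return W + lr * M * np.outer(pre_trace, post_error)

# Toy usage: 3 pre-synaptic neurons, 2 post-synaptic neurons, one time step.
W = np.zeros((3, 2))
W = glm_sgd_step(W, pre_trace=np.array([0.9, 0.0, 0.4]),
                 s_target=np.array([1, 0]), u=np.array([0.2, -1.0]))
print(W.round(3))
```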

Deriving the Learning Rules: Supervised Learning

$$\theta \leftarrow \theta + \eta \times \underbrace{M}_{=\,1\ \text{or from VI}} \times \nabla \ln p(\underbrace{\text{output}}_{\leftarrow\ \text{training}} \,|\, \underbrace{\text{input}}_{\leftarrow\ \text{training}})$$

Variational Inference (VI) needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].

Deriving the Learning Rules: Unsupervised Learning

For generative (unsupervised) models:

$$\theta \leftarrow \theta + \eta \times \underbrace{M}_{\text{from VI}} \times \nabla \ln p(\underbrace{\text{output}}_{\leftarrow\ \text{training}} \,|\, \underbrace{\text{input}}_{\leftarrow\ \emptyset})$$

Unsupervised learning models always have hidden layers.

Deriving the Learning Rules: Reinforcement Learning

Using policy gradient to learn an SNN policy:

$$\theta \leftarrow \theta + \eta \times \underbrace{M}_{\text{reward/return \& VI}} \times \nabla \ln p(\underbrace{\text{output}}_{\leftarrow\ \text{action}} \,|\, \underbrace{\text{input}}_{\leftarrow\ \text{state}})$$

Variational Inference (VI) needed if there are intermediate layers [Rezende et al '11] [Osogami '17] [Jang et al '18].
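In the reinforcement learning case, the same per-step gradient terms are accumulated over an episode and scaled by the return, which acts as the global learning signal; the REINFORCE-style sketch below (single episode, no baseline, GLM output layer, illustrative shapes) is only meant to convey the structure of the update.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def policy_gradient_update(W, pre_traces, actions, potentials, ret, lr=0.01):
    """REINFORCE-style update: sum per-step GLM gradients over the episode,
    then scale by the episode return (the global learning signal M).
    pre_traces: (T, n_pre), actions: (T, n_out) binary spikes, potentials: (T, n_out)."""
    grad = np.zeros_like(W)
    for x, a, u in zip(pre_traces, actions, potentials):
        grad += np.outer(x, a - sigmoid(u))      # pre-synaptic trace * post-synaptic error
    return W + lr * ret * grad

W = np.zeros((3, 2))
T = 4
rng = np.random.default_rng(1)
W = policy_gradient_update(W, rng.random((T, 3)), rng.integers(0, 2, (T, 2)),
                           rng.normal(size=(T, 2)), ret=1.5)
print(W.round(3))
```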


Supervised Learning

[Figure: a source digit "5" is passed through an encoder, the SNN, and a decoder; the decoded outputs show the digit "5" and a rotated version.]

[Results figure from [Jang et al '18-2].]

Unsupervised Learning

[Figure: a generative SNN model with a decoder; training uses a variational SNN encoder.]

Reinforcement Learning

[Figure: the state is encoded into spikes, processed by the SNN, and decoded into an action (e.g., "up") for the actuator.]

[Results figure from [Rosenfeld et al '18].]

Concluding Remarks

Statistical signal processing review of neuromorphic computing via Spiking Neural Networks.

Additional topics:
- recurrent SNNs for long-term memory [Maass '11]
- neural sampling: information encoded in steady-state behavior [Buesing et al '11]
- Bayesian learning via Langevin dynamics [Pecevski et al '11] [Kappel et al '15]

Some open problems:
- meta-learning, life-long learning, transfer learning [Bellec et al '18]
- learning I/O interfaces [Lazar and Toth '03]

Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 725731) and from the US National Science Foundation (NSF) under grant ECCS 1710009.

References

[Gerstner and Kistler '02] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[Pillow et al '08] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. Chichilnisky, and E. P. Simoncelli, "Spatio-temporal correlations and visual signalling in a complete neuronal population," Nature, vol. 454, no. 7207, p. 995, 2008.

[Osogami '17] T. Osogami, "Boltzmann machines for time-series," arXiv preprint arXiv:1708.06004, 2017.

[Ibnkahla '00] M. Ibnkahla, "Applications of neural networks to digital communications - a survey," Signal Processing, vol. 80, no. 7, pp. 1185-1215, 2000.

[Koller and Friedman '09] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[Fremaux and Gerstner '16] N. Fremaux and W. Gerstner, "Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules," Frontiers in Neural Circuits, vol. 9, p. 85, 2016.

[Jang et al '18] H. Jang, O. Simeone, B. Gardner, and A. Gruning, "Spiking neural networks: A stochastic signal processing perspective," ...

[Rezende et al '11] D. J. Rezende, D. Wierstra, and W. Gerstner, "Variational learning for recurrent spiking networks," in Advances in Neural Information Processing Systems, 2011, pp. 136-144.

[Brea et al '13] J. Brea, W. Senn, and J.-P. Pfister, "Matching recall and storage in sequence learning with spiking neural networks," Journal of Neuroscience, vol. 33, no. 23, pp. 9565-9575, 2013.

[Hebb '49] D. Hebb, The Organization of Behavior. New York: Wiley and Sons, 1949.

[Bienenstock et al '82] E. L. Bienenstock, L. N. Cooper, and P. W. Munro, "Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex," Journal of Neuroscience, vol. 2, no. 1, pp. 32-48, 1982.

[Pecevski et al '11] D. Pecevski, L. Buesing, and W. Maass, "Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons," PLOS Computational Biology, vol. 7, no. 12, pp. 1-25, 2011.

[Rosenfeld et al '18] B. Rosenfeld, O. Simeone, and B. Rajendran, "Learning first-to-spike policies for neuromorphic control using policy gradients," arXiv preprint arXiv:1810.09977, 2018.

[Jang et al '18-2] H. Jang and O. Simeone, "Training dynamic exponential family models with causal and lateral dependencies for generalized neuromorphic computing," arXiv preprint arXiv:1810.08940, 2018.

[Bellec et al '18] G. Bellec, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass, "Long short-term memory and learning-to-learn in networks of spiking neurons," arXiv preprint arXiv:1803.09574, 2018.

[Kappel et al '15] D. Kappel, S. Habenschuss, R. Legenstein, and W. Maass, "Synaptic sampling: a Bayesian approach to neural network plasticity and rewiring," in Advances in Neural Information Processing Systems, 2015, pp. 370-378.

[Buesing et al '11] L. Buesing, J. Bill, B. Nessler, and W. Maass, "Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons," PLoS Computational Biology, vol. 7, no. 11, e1002211, 2011.

[Lazar and Toth '03] A. A. Lazar and L. T. Toth, "Time encoding and perfect recovery of bandlimited signals," in Proc. ICASSP, 2003.

[Maass '11] W. Maass, "Liquid state machines: motivation, theory, and applications," in Computability in Context: Computation and Logic in the Real World, 2011, pp. 275-296.

[Neftci '18] E. O. Neftci, "Data and power efficient intelligence with neuromorphic learning machines," iScience, vol. 5, p. 52, 2018.

[Anwani and Rajendran '18] N. Anwani and B. Rajendran, "Training multilayer spiking neural networks using NormAD based spatio-temporal error backpropagation," arXiv preprint arXiv:1811.10678, 2018.